
Re: [Condor-users] strange gridmanager behaviour with proxies

On Jun 16, 2008, at 4:01 AM, Smith, Ian wrote:

-----Original Message-----

I've recently noticed that the condor_gridmanager daemon is acting oddly
when I specify proxy attributes in the job submission file. The job
completes OK, but the condor_gridmanager hangs around afterwards and soaks
up pretty much all of the CPU on one processor (this is a four-way SMP
machine). This sounds suspiciously like it's stuck in a busy loop.
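One way to confirm the busy-loop suspicion before reaching for a debugger is to sample the process's CPU time twice and compare. The Python sketch below assumes a Linux host with /proc, and its function names are hypothetical helpers, not part of Condor. It reads utime and stime (fields 14 and 15 of /proc/<pid>/stat, counted in clock ticks) and reports the fraction of one CPU the process used over the interval. A value near 1.0 for a gridmanager that should be idle would be consistent with a spin.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (clock ticks) from one /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesized and may itself contain spaces or ')',
    # so only tokenize what follows the LAST ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so utime (field 14) is rest[11]
    # and stime (field 15) is rest[12].
    return int(rest[11]) + int(rest[12])


def read_cpu_ticks(pid: int) -> int:
    """Linux-only: read the cumulative CPU ticks of `pid` from procfs."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())


def cpu_fraction(pid: int, interval: float = 5.0) -> float:
    """Fraction of one CPU that `pid` consumed over `interval` seconds."""
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    before = read_cpu_ticks(pid)
    time.sleep(interval)
    after = read_cpu_ticks(pid)
    return (after - before) / ticks_per_sec / interval
```

For example, cpu_fraction(9658) would sample the gridmanager from the log below; 9658 is simply the PID in that particular log and will differ on another run.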

The GridManager log file seems to stall at:

6/12 10:30:41 [9658] Fetched 1 job ads from schedd
6/12 10:30:41 [9658] Updating classad values for 114314.0:
6/12 10:30:41 [9658]    GridJobId = UNDEFINED
6/12 10:30:41 [9658]    Managed = "ScheddDone"
6/12 10:30:41 [9658] Deleting job 114314.0 from schedd
6/12 10:30:41 [9658] GAHP[9663] <- 'UNCACHE_PROXY 1'
6/12 10:30:41 [9658] GAHP[9663] -> 'S'
6/12 10:30:41 [9658] GAHP[9663] <- 'USE_CACHED_PROXY 2'
6/12 10:30:41 [9658] GAHP[9663] -> 'S'
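The log stops right after a pair of proxy commands sent to the GAHP server. The hypothetical Python helper below pulls out the last GAHP exchange and the two PIDs involved ([9658] appears to be the gridmanager and GAHP[9663] the GAHP server it talks to), which are the processes worth inspecting. It assumes every GAHP line has exactly the shape shown in this excerpt, and it reads '<-' as a command sent to the GAHP and '->' as the GAHP's reply (apparently 'S' for success); that reading is inferred from this excerpt, not from documentation.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed line shape, copied from the excerpt above:
#   "6/12 10:30:41 [9658] GAHP[9663] <- 'UNCACHE_PROXY 1'"
GAHP_LINE = re.compile(
    r"^(?P<stamp>\S+ \S+) \[(?P<gm_pid>\d+)\] "
    r"GAHP\[(?P<gahp_pid>\d+)\] (?P<arrow><-|->) '(?P<text>.*)'$"
)


class GahpExchange(NamedTuple):
    stamp: str
    gridmanager_pid: int
    gahp_pid: int
    to_gahp: bool  # True for '<-' lines (assumed: command sent to the GAHP)
    text: str


def last_gahp_exchange(lines: Iterable[str]) -> Optional[GahpExchange]:
    """Return the final GAHP line in the log, or None if there is none."""
    last = None
    for line in lines:
        m = GAHP_LINE.match(line.rstrip("\n"))
        if m:
            last = GahpExchange(
                m["stamp"], int(m["gm_pid"]), int(m["gahp_pid"]),
                m["arrow"] == "<-", m["text"],
            )
    return last
```

Run against the excerpt, it would report the GAHP's 'S' reply to USE_CACHED_PROXY 2 as the last thing logged, with 9658 and 9663 as the candidate PIDs.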

When I submit the next job, it just sits in the idle state indefinitely
and nothing gets written to the GridManager log file. If I kill the
condor_gridmanager (which requires root permission?!), the next job
submission works.

The job parameters that are set are:


If I take out MyProxyHost, it works fine.

Well, job submission works fine, but the proxy no longer gets renewed from
the MyProxy server, so the problem looks like it has something to do with
this (I set all the MyProxy parameters inside the refresh script).

This is not a known problem. When the gridmanager becomes stuck, can you attach to it with a debugger and see where in the code it is?
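A minimal sketch of that step in Python, assuming gdb is installed and the caller has ptrace rights over the process (since killing the gridmanager needed root, attaching probably will too). The helper names are hypothetical. It attaches in batch mode, prints a backtrace for every thread, and explicitly detaches so the daemon keeps running. The backtraces are far more informative if the Condor binaries carry debugging symbols.

```python
import shutil
import subprocess
from typing import List


def gdb_backtrace_cmd(pid: int) -> List[str]:
    """Build a gdb invocation that attaches to `pid`, prints all thread
    backtraces, then detaches without killing the process."""
    return [
        "gdb", "-nx", "-batch",        # ignore .gdbinit, exit when done
        "-p", str(pid),                # attach to the running process
        "-ex", "thread apply all bt",  # backtrace of every thread
        "-ex", "detach",               # leave the daemon running
    ]


def dump_backtrace(pid: int, timeout: float = 60.0) -> str:
    """Run gdb against `pid` and return its combined output."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb not found on PATH")
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr
```

Taking two or three backtraces a few seconds apart, e.g. print(dump_backtrace(9658)), would show whether the gridmanager keeps spinning in the same function.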

Thanks and regards,
Jaime Frey
UW-Madison Condor Team