[Condor-users] strange gridmanager behaviour with proxies



Dear All,

I've recently noticed that the condor_gridmanager daemon
acts oddly when I specify proxy attributes in the job submission
file. The job completes OK, but the condor_gridmanager hangs around afterwards
and soaks up pretty much all of the CPU on one processor (of a four-way
SMP machine). This sounds suspiciously like it's got stuck in a busy loop.

The GridManager log file seems to stall at:

6/12 10:30:41 [9658] Fetched 1 job ads from schedd
6/12 10:30:41 [9658] Updating classad values for 114314.0:
6/12 10:30:41 [9658]    GridJobId = UNDEFINED
6/12 10:30:41 [9658]    Managed = "ScheddDone"
6/12 10:30:41 [9658] Deleting job 114314.0 from schedd
6/12 10:30:41 [9658] GAHP[9663] <- 'UNCACHE_PROXY 1'
6/12 10:30:41 [9658] GAHP[9663] -> 'S'
6/12 10:30:41 [9658] GAHP[9663] <- 'USE_CACHED_PROXY 2'
6/12 10:30:41 [9658] GAHP[9663] -> 'S'

When I submit the next job, it just sits in the idle state indefinitely
and nothing gets written to the GridManager log file. If I kill the
condor_gridmanager (which requires root permission?!), then the next
job submission works.

The proxy-related parameters set in the job submission file are:

MyProxyHost
MyProxyServerDN 
MyProxyCredentialName 
MyProxyRefreshThreshold
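
For illustration only, a submit description using these attributes would
look roughly like the sketch below. Every value is a made-up placeholder
(host names, DN, credential name, paths and threshold), and the
grid_resource line is an assumption about the job type rather than
something taken from the actual file.

    # Sketch only: all values below are hypothetical placeholders.
    universe                = grid
    # Assumed resource type and gatekeeper; substitute the real one.
    grid_resource           = gt2 gatekeeper.example.org/jobmanager-fork
    executable              = /bin/hostname
    output                  = job.out
    error                   = job.err
    log                     = job.log
    # Placeholder path to the user's proxy file.
    x509userproxy           = /tmp/x509up_u1000

    # The four MyProxy attributes listed above.
    # Removing the MyProxyHost line is what avoids the problem.
    MyProxyHost             = myproxy.example.org:7512
    MyProxyServerDN         = /O=Example/CN=myproxy.example.org
    MyProxyCredentialName   = example_credential
    # Threshold in seconds.
    MyProxyRefreshThreshold = 3600

    queue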

If I take out MyProxyHost, it works fine.

I've seen this on 7.0.x and 6.8.4, on both Solaris 10 and Solaris 8.
Has anyone else seen it? Is it a known bug (the bug fixes listed for 7.0.2
do mention clearing up daemons)?

regards,

-ian.

-------------------------------------------
Dr. Ian C. Smith,
e-Science Team,
University of Liverpool
Computing Services Department.