
Re: [Condor-users] Condor-G caches expired proxy certificate?



condor-users-bounces@xxxxxxxxxxx wrote on 10/29/2008 05:01:36 PM:

> I didn't notice in the previous E-mail that Jan was doing gt4,
> WS-Gram.  There may be some complications there.  I haven't done
> much with what happens in that case.
> 
> What I do know is that, at the beginning of a gt4 job, condor-G will
> automatically delegate your proxy to the GT4 Delegation server on
> the remote gatekeeper and any further authentication during
> the job is using that delegated proxy which is already there.
> the question is--is condor-g smart enough to re-delegate the new
> proxy for gt4 jobs once it detects there's a new one?

Hi,

Looking at my log more closely, I can see that it is trying to refresh the 
original delegation resource (534c0150-a5d5-11dd-afec-95dd6b917206), which 
was created during submission of the now expired first job 
(548f56c0-a5d5-11dd-90ab-d457d1ca8d16)...

10/29 17:29:07 [21362] Checking proxies
10/29 17:29:07 [21362] JPL: Checking proxies for subject
10/29 17:29:07 [21362] JPL: Checking proxy for subject: /tmp/x509up_u1002
10/29 17:29:07 [21362] JPL: curr_proxy->expiration_time=1225297440, 
now=1225297747, new_expiration=1225301058
10/29 17:29:07 [21362] JPL: new_expiration > curr_proxy->expiration_time
10/29 17:29:07 [21362] JPL: curr_proxy->expiration_time > 
new_master->expiration_time
10/29 17:29:07 [21362] JPL: next_check (relative time) = 600
10/29 17:29:07 [21362] JPL: SetMasterProxy
10/29 17:29:07 [21362] Received CHECK_LEASES signal
10/29 17:29:07 [21362] Evaluating periodic job policy expressions.
10/29 17:29:07 [21362] (22528.0) doEvaluateState called: gmState 
GM_PROXY_EXPIRED, globusState Active
10/29 17:29:07 [21362] (22528.0) gm state change: GM_PROXY_EXPIRED -> 
GM_START
10/29 17:29:07 [21362] (22528.0) gm state change: GM_START -> GM_REGISTER
10/29 17:19:15 [21362] Updating classad values for 22528.0:
10/29 17:19:15 [21362]    GridftpUrlBase = 
"gsiftp://srvgrid01.offis.uni-oldenburg.de:20000";
10/29 17:19:15 [21362]    GlobusDelegationUri = 
"https://juggle-glob.fz-juelich.de:8443/wsrf/services/DelegationService?534c0150-a5d5-11dd-afec-95dd6b917206";
10/29 17:19:15 [21362]    JobLeaseExpiration = 1225340355
10/29 17:19:15 [21362]    GridJobId = "gt4 
uuid:548f56c0-a5d5-11dd-90ab-d457d1ca8d16"
10/29 17:19:15 [21362] leaving doContactSchedd()
...
...
10/29 17:29:07 [21362] GAHP[21368] <- 'CACHE_PROXY_FROM_FILE 1 
/tmp/x509up_u1002'
10/29 17:29:07 [21362] GAHP[21368] -> 'S'
10/29 17:29:07 [21362] GAHP[21368] <- 'USE_CACHED_PROXY 1'
10/29 17:29:07 [21362] GAHP[21368] -> 'S'
10/29 17:29:07 [21362] GAHP[21368] <- 'GT4_GRAM_JOB_CALLBACK_REGISTER 9 
https://juggle-glob.fz-juelich.de:8443/wsrf/services/ManagedExecutableJobService?548f56c0-a5d5-11dd-90ab-d457d1ca8d16 
1'
10/29 17:29:07 [21362] GAHP[21368] -> 'S'
10/29 17:29:07 [21362] *** checkDelegation()
10/29 17:29:07 [21362]     refreshing 
https://juggle-glob.fz-juelich.de:8443/wsrf/services/DelegationService?534c0150-a5d5-11dd-afec-95dd6b917206
10/29 17:29:07 [21362] GAHP[21368] <- 'GT4_REFRESH_CREDENTIAL 10 
https://juggle-glob.fz-juelich.de:8443/wsrf/services/DelegationService?534c0150-a5d5-11dd-afec-95dd6b917206'
10/29 17:29:07 [21362] GAHP[21368] -> 'S'
10/29 17:29:07 [21362] (22529.0) doEvaluateState called: gmState 
GM_PROXY_EXPIRED, globusState
10/29 17:29:07 [21362] (22529.0) gm state change: GM_PROXY_EXPIRED -> 
GM_START
10/29 17:29:07 [21362] (22529.0) gm state change: GM_START -> 
GM_GENERATE_ID
10/29 17:29:07 [21362] (22529.0) gm state change: GM_GENERATE_ID -> 
GM_SUBMIT_ID_SAVE
10/29 17:29:07 [21362] (22529.0) gm state change: GM_SUBMIT_ID_SAVE -> 
GM_SUBMIT

Then it submits the new job using the same, allegedly refreshed, 
delegation resource (as I can see in the logged RSL, not quoted here).
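
To make the raw timestamps in the "JPL:" log lines easier to read, here is a
small sketch that converts them and computes the remaining lifetime. The
three constants are copied from the log above; the helper names are my own
and only for illustration.

```python
from datetime import datetime, timezone

# Values copied from the "JPL:" log lines (Unix seconds).
CURR_EXPIRATION = 1225297440   # curr_proxy->expiration_time
NOW = 1225297747               # now
NEW_EXPIRATION = 1225301058    # new_expiration


def seconds_left(expiration, now):
    """Remaining proxy lifetime in seconds; negative means already expired."""
    return expiration - now


def as_utc(ts):
    """Render a Unix timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    print(as_utc(CURR_EXPIRATION), seconds_left(CURR_EXPIRATION, NOW))
    print(as_utc(NEW_EXPIRATION), seconds_left(NEW_EXPIRATION, NOW))
```

So at the time of the check the cached proxy had been expired for about five
minutes, while the new proxy file still had roughly 55 minutes left.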

The Globus error message buried in the monstrous NoSuchResourceException 
stack trace reads "Error getting delegation resource", so maybe GT4 has 
discarded the expired delegation resource in the meantime. That would 
certainly be legitimate, as it can't keep expired jobs/delegation 
resources on the server forever. If that's the case, it is a bit strange 
that the GT4_REFRESH_CREDENTIAL step before the actual job submission does 
not already report an error. Also, rescuing a job whose proxy expired 
during its lifetime would then appear to be unsolvable in general. However, 
I believe that the newly submitted job should NOT be referencing the old 
(expired) delegation resource; Condor should be smart enough to create a 
new delegation resource rather than trying in vain to refresh the old one.
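
The behaviour I have in mind could be sketched roughly as follows. This is
NOT Condor's actual code: the Delegation record and the function name are
hypothetical, and the rule that a delegation is unusable once the proxy last
delegated to it has expired is my assumption about what GT4 does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delegation:
    """Hypothetical record of a delegation resource known to the client."""
    uri: str
    proxy_expiration: int  # expiry (Unix seconds) of the proxy last delegated


def choose_delegation_action(delegation: Optional[Delegation], now: int) -> str:
    """Return "create" or "refresh" for a job that is about to be submitted.

    Assumption: once the delegated proxy has expired, the server may have
    discarded the resource, so refreshing it cannot be relied upon.
    """
    if delegation is None or delegation.proxy_expiration <= now:
        return "create"
    return "refresh"


if __name__ == "__main__":
    # Expiration and "now" values taken from the log; the URI is shortened.
    old = Delegation("https://.../DelegationService?534c0150-...", 1225297440)
    print(choose_delegation_action(old, now=1225297747))
```

With the values from my log, the old delegation is already past its expiry
at check time, so under this rule the new job would get a fresh delegation
resource instead of a refresh attempt.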

I hope that Jaime or a colleague responsible for Condor-GT4 integration is 
reading this and can confirm/record/fix it. It's not urgent, as it can be 
worked around by never allowing proxies to expire during a job's lifetime 
(i.e. by relying on MyProxy refresh).

Regards,
Jan Ploski

--
Dipl.-Inform. (FH) Jan Ploski
OFFIS
FuE Bereich Energie | R&D Division Energy
Escherweg 2  - 26121 Oldenburg - Germany
Phone/Fax: +49 441 9722 - 184 / 202
E-Mail: Jan.Ploski@xxxxxxxx
URL: http://www.offis.de