
[Condor-users] Jobs fetched with a hook being killed after 20 minutes



In a nutshell: they're being axed because the startd thinks the claim
has timed out. From the StartLog:

3/25 13:01:15 Return from HandleReq <HandleChildAliveCommand> (handler: 0.000s, sec: 0.001s)
3/25 13:01:45 State change: claim lease expired (condor_schedd gone?)
3/25 13:01:45 Changing state and activity: Claimed/Busy -> Preempting/Killing
3/25 13:01:45 Calling Handler <receiveJobClassAdUpdate>
3/25 13:01:45 Return from Handler <receiveJobClassAdUpdate>
3/25 13:01:45 DaemonCore: pid 21416 exited with status 0, invoking reaper 3 <reaper>
3/25 13:01:45 Starter pid 21416 exited with status 0
3/25 13:01:45 State change: starter exited
3/25 13:01:45 State change: No preempting claim, returning to owner
3/25 13:01:45 Changing state and activity: Preempting/Killing -> Owner/Idle
3/25 13:01:45 State change: IS_OWNER is false
3/25 13:01:45 Changing state: Owner -> Unclaimed

The second entry in that output says it all, really. I did not have a
schedd running in this pool; I didn't think I needed one because hooks
were fetching the work for me. I have since started one, but that hasn't
stopped the problem: the lease still expires.

Right now the jobs are not passing a JobLeaseDuration attribute when the
fetch work hook assigns them to the machine.
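
For reference, a minimal sketch of a fetch work hook that does include
JobLeaseDuration in the job ClassAd it writes to stdout. Everything in
the job dictionary (command, owner, working directory, the one-day lease
value) is a placeholder chosen for illustration, and format_classad is a
hypothetical helper, not part of Condor. Whether the startd honours this
attribute for fetched jobs is an assumption that would need testing.

```python
#!/usr/bin/env python3
"""Sketch of a fetch work hook that emits a job ClassAd on stdout."""
import sys


def format_classad(attrs):
    """Render a dict as 'Name = value' lines; handles strings and integers."""
    lines = []
    for name, value in attrs.items():
        if isinstance(value, str):
            # String values are double-quoted; escape backslashes and quotes.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{name} = "{escaped}"')
        else:
            lines.append(f"{name} = {value}")
    return "\n".join(lines) + "\n"


# Placeholder job description; none of these values come from the real setup.
job = {
    "Cmd": "/bin/sleep",
    "Args": "60",
    "Owner": "nobody",
    "Iwd": "/tmp",
    "JobUniverse": 5,           # vanilla universe
    "JobLeaseDuration": 86400,  # seconds; hypothetical one-day lease
}

if __name__ == "__main__":
    sys.stdout.write(format_classad(job))
```

Running the script by hand and reading its output is a quick way to
check that the attribute is really present before pointing the startd's
fetch work hook setting at it.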

I have no other hooks currently defined. Only a fetch work hook.

From my configs:

ALIVE_INTERVAL = 239
MAX_CLAIM_ALIVES_MISSED = 6
MaxJobRetirementTime = 2147483640
PREEMPT = False

I set no JobLeaseDuration default in any config file, so it *should* be
undefined. My lease duration should therefore be 6 * 239 = 1434 seconds,
or about 24 minutes. But I'm seeing the claim end at exactly 20 minutes,
which makes me think JobLeaseDuration is defaulting to 20 minutes for my
jobs. Either way, I'd like to stop the claim from expiring.
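
The arithmetic can be checked with a few lines of Python. This is only a
sketch: it assumes the lease is simply ALIVE_INTERVAL multiplied by
MAX_CLAIM_ALIVES_MISSED, which is the reasoning above rather than
something confirmed from the startd source, and expected_lease_seconds
is a hypothetical helper name.

```python
# Values copied from the configuration shown in this message.
ALIVE_INTERVAL = 239         # seconds between keepalives
MAX_CLAIM_ALIVES_MISSED = 6  # keepalives that may be missed


def expected_lease_seconds(alive_interval, max_missed):
    """Lease length if it is just interval * allowed misses (assumption)."""
    return alive_interval * max_missed


expected = expected_lease_seconds(ALIVE_INTERVAL, MAX_CLAIM_ALIVES_MISSED)
observed = 20 * 60  # the claim was seen to end after 20 minutes

print(f"expected:   {expected} s ({expected / 60:.1f} min)")
print(f"observed:   {observed} s ({observed / 60:.1f} min)")
print(f"difference: {expected - observed} s")
```

The observed expiry falls 234 seconds short of the computed value, so
the two config settings alone do not account for it.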

When I'm fetching jobs with a hook should I make MAX_CLAIM_ALIVES_MISSED
be some ridiculously large integer? Is there a more elegant way to
prevent the claim from expiring? This approach seems a mite hack-ish.

Thanks!

- Ian
