
Re: [Condor-users] Jobs fetched with a hook being killed after 20 minutes



> When you turn on D_FULLDEBUG you don't see the startd at least
> attempting to fix up the lease time?

Whoops. You bet.

> You can also throw D_DAEMONCORE into the mix to see what is being done
> with the timer.

Done. Send me a private email and I'll ship you a tarball with the logs.
I don't want to paste them in here. I've included a snippet below.

- Ian


3/26 11:47:00 HookClient /tools/arc/scripts/hooks/arc_job_fetch (pid
32691) exited with status 0
3/26 11:47:00 Warning, hook /tools/arc/scripts/hooks/arc_job_fetch (pid
32691) printed to stderr: DEBUG: Slot State="Unclaimed"
Found job 40900
Cmd = "/tools/arc/scripts/arc_execute.sh"
Owner = "ichesal"
Args = "40900"
JobUniverse = 5
Requirements = True
JobLeaseDuration = 60
ClusterId = 40900
ProcId = 0
ARCJob = 40900
IWD = "/data/ichesal/arc/sleeper"
Out = "/data/ichesal/job/20090326/1100/40900/stdout.txt"
Err = "/data/ichesal/job/20090326/1100/40900/stderr.txt"

3/26 11:47:00 Rank of this fetched claim is: 1.000000
3/26 11:47:00 Hook ARC_HOOK_REPLY_FETCH: UNDEFINED
3/26 11:47:00 State change: Finished fetching work successfully
3/26 11:47:00 Set destination state to Claimed
3/26 11:47:00 Changing state: Unclaimed -> Claimed
3/26 11:47:00 Warning: starting ClaimLease timer before lease duration
set.
3/26 11:47:00 in DaemonCore NewTimer()
3/26 11:47:00
3/26 11:47:00 DaemonCore--> Timers
3/26 11:47:00 DaemonCore--> ~~~~~~
3/26 11:47:00 DaemonCore--> id = 9, when = 1238093223, period = 0,
handler_descrip=<do_update>
3/26 11:47:00 DaemonCore--> id = 0, when = 1238093227, period = 120,
handler_descrip=<check_parent>
3/26 11:47:00 DaemonCore--> id = 8, when = 1238093279, period = 60,
handler_descrip=<eval_and_update_all>
3/26 11:47:00 DaemonCore--> id = 10, when = 1238093279, period = 0,
handler_descrip=<dc_touch_log_file>
3/26 11:47:00 DaemonCore--> id = 5, when = 1238093459, period = 240,
handler_descrip=<self_monitor>
3/26 11:47:00 DaemonCore--> id = 3, when = 1238093519, period = 300,
handler_descrip=<check_session_cache>
3/26 11:47:00 DaemonCore--> id = 7, when = 1238094389, period = 1170,
handler_descrip=<DaemonCore::SendAliveToParent>
3/26 11:47:00 DaemonCore--> id = 12, when = 1238094420, period = 0,
handler_descrip=<Claim::leaseExpired>
3/26 11:47:00 DaemonCore--> id = 4, when = 1238095020, period = 1801,
handler_descrip=<handle_cookie_refresh>
3/26 11:47:00 DaemonCore--> id = 11, when = 1238122019, period = 0,
handler_descrip=<dc_touch_lock_files>
3/26 11:47:00 DaemonCore--> id = 6, when = 1238122595, period = 29383,
handler_descrip=<DaemonCore::refreshDNS()>
3/26 11:47:00
3/26 11:47:00 leaving DaemonCore NewTimer, id=12
3/26 11:47:00 Started ClaimLease timer (12) w/ 1200 second lease
duration
3/26 11:47:00 Remote job ID is 40900.0
3/26 11:47:00 In reset_timer(), id=12, time=1200, period=0
3/26 11:47:00 in DaemonCore NewTimer()
3/26 11:47:00 In cancel_timer(), id=12
3/26 11:47:00 Renewing timer id 12 for 1200 secs
3/26 11:47:00
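
For anyone reading the snippet, here is a rough sketch of the arithmetic behind the Claim::leaseExpired timer (id 12). The three constants are copied from the log above; the variable names are made up for illustration, and the UTC-7 remark is only an inference from the 11:47:00 log timestamps. The point is that the lease timer is armed for 1200 seconds (20 minutes), not the 60 seconds given as JobLeaseDuration in the fetched job ad.

```python
from datetime import datetime, timezone

# Values copied from the log snippet; the names are illustrative only.
LEASE_TIMER_WHEN = 1238094420  # "when" of timer id 12, <Claim::leaseExpired>
LEASE_SECONDS = 1200           # "Started ClaimLease timer (12) w/ 1200 second lease duration"
JOB_LEASE_DURATION = 60        # JobLeaseDuration in the fetched job ad

# Work backwards from the expiry time to the moment the timer was armed.
started = LEASE_TIMER_WHEN - LEASE_SECONDS
started_utc = datetime.fromtimestamp(started, tz=timezone.utc)
expires_utc = datetime.fromtimestamp(LEASE_TIMER_WHEN, tz=timezone.utc)

print("timer armed  :", started_utc.isoformat())   # 2009-03-26T18:47:00+00:00
print("lease expires:", expires_utc.isoformat())   # 2009-03-26T19:07:00+00:00
print("lease minutes:", LEASE_SECONDS // 60)       # 20
print("gap vs job ad:", LEASE_SECONDS - JOB_LEASE_DURATION, "seconds")  # 1140 seconds
```

18:47:00 UTC lines up with the 11:47:00 log timestamps if the machine is at UTC-7, so the timer would fire at about 12:07:00 local time, which matches the 20 minutes in the subject line.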
