
[Condor-users] SUSPEND/CONTINUE puzzle



condor_version
$CondorVersion: 7.5.3 Jun 24 2010 BuildID: 250654 $
$CondorPlatform: INTEL-WINNT50 $

On this Windows platform I am implementing a SUSPEND policy: if the keyboard has been touched within the last 5 minutes, or if the non-Condor load reaches a high value, I want the job to SUSPEND. The job should then CONTINUE once the keyboard has been untouched for 5 minutes and the load is below the limit.
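
A policy of this kind is typically written along the following lines. This is only a sketch modeled on the macros in the example condor_config shipped with Condor; the helper macro names and the 0.5 load threshold are assumptions for illustration, not the actual settings in use on this machine.

```
# Sketch only: helper names and thresholds are assumed, not the real config.
MINUTE           = 60
HighLoad         = 0.5
# Load on the machine that is not caused by Condor jobs
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
# Keyboard touched within the last 5 minutes
KeyboardBusy     = (KeyboardIdle < 5 * $(MINUTE))
CPUBusy          = ($(NonCondorLoadAvg) >= $(HighLoad))

# SUSPEND is only consulted when WANT_SUSPEND evaluates to True
WANT_SUSPEND     = True
SUSPEND          = ($(KeyboardBusy) || $(CPUBusy))
CONTINUE         = (KeyboardIdle >= 5 * $(MINUTE)) && ($(NonCondorLoadAvg) < $(HighLoad))
```

In this sketch CONTINUE is the exact negation of SUSPEND, so the two expressions cannot both be TRUE for the same attribute values.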

Unfortunately I have something wrong, and the jobs alternate between SUSPEND and CONTINUE every 5 seconds:

07/12/11 16:32:21 slot1: Sent update to 1 collector(s)
07/12/11 16:32:22 slot1: State change: SUSPEND is TRUE
07/12/11 16:32:22 slot1: Changing activity: Busy -> Suspended
07/12/11 16:32:22 slot1: In Starter::kill() with pid 5372, sig 100 (DC_SIGSUSPEND)
07/12/11 16:32:23 slot1: Received job ClassAd update from starter.
07/12/11 16:32:26 Trying to update collector <123.456.78.910:9618>
07/12/11 16:32:26 Attempting to send update via UDP to collector delta-mod.water.ca.gov <123.456.78.910:9618>
07/12/11 16:32:26 slot1: Sent update to 1 collector(s)
07/12/11 16:32:27 slot1: State change: CONTINUE is TRUE
07/12/11 16:32:27 slot1: In Starter::kill() with pid 5372, sig 101 (DC_SIGCONTINUE)
07/12/11 16:32:27 slot1: Changing activity: Suspended -> Busy
07/12/11 16:32:27 slot1: Received job ClassAd update from starter.


Attempting to debug this, I set

STARTD_DEBUG        = D_FULLDEBUG

This does give more information (see above), but it doesn't state why Condor decides to SUSPEND or CONTINUE a job, and that is the piece of information I need in order to see what is wrong with my condition statements. What can I do to see why Condor is changing the state of a job?

Ralph Finch
Calif. Dept. of Water Resources
Sacramento, CA USA