
[Condor-users] Bug with TotalJobSuspendTime?



condor -version
$CondorVersion: 6.7.13 Nov  7 2005 $
$CondorPlatform: INTEL-WINNT50 $

# suspend job on VM1 if keyboard is touched 
# and VM2 has a Condor job or high load;
# but don't suspend if job suspension time exceeds limit
SUSPEND  = (VirtualMachineID == 1) \
  && ($(KeyboardBusy) ) \
  && ( (vm2_Activity == "Busy") || (vm2_LoadAvg > $(HighLoad)) ) \
  && (TotalJobSuspendTime <= $(MaxSuspendTime))

The ClassAd expression above, from our condor_config.local, worked before
(6.7.13 I think) but doesn't now.  After some testing, I found that the
last line, the one involving TotalJobSuspendTime, is the problem.  The
behavior is peculiar:

- If a job that has never been suspended tries to suspend, the StartLog
reports this error:

1/19 10:44:51 ERROR "Can't evaluate SUSPEND" at line 1061 in file ..\src\condor_startd.V6\Resource.C

and kills all jobs on that machine.

- If I comment out that line and reconfig condor on that machine, then
it suspends properly.
- If I then uncomment the line and reconfig again, it again suspends
properly.

In other words, once TotalJobSuspendTime has been defined, the line
works OK.
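
One way to picture the symptom is with a toy three-valued-logic model in Python. This is only a sketch and not Condor's evaluator: it assumes that a comparison against a missing attribute yields UNDEFINED, and that an AND with no FALSE term and at least one UNDEFINED term is itself UNDEFINED. The names `le`, `and3`, `suspend`, `vm2_busy` and the 600-second limit are made up for illustration.

```python
UNDEFINED = None  # stand-in for an attribute that is missing from the ad


def le(a, b):
    """a <= b, except that an UNDEFINED operand makes the result UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a <= b


def and3(*terms):
    """Three-valued AND: any False wins, then UNDEFINED wins over True."""
    if any(t is False for t in terms):
        return False
    if any(t is UNDEFINED for t in terms):
        return UNDEFINED
    return True


def suspend(ad, max_suspend_time=600):
    """Toy version of the SUSPEND expression; 600 is a hypothetical limit."""
    return and3(
        ad.get("VirtualMachineID") == 1,
        ad.get("KeyboardBusy", False),
        ad.get("vm2_busy", False),  # collapses the vm2 activity/load test
        le(ad.get("TotalJobSuspendTime"), max_suspend_time),
    )


busy = {"VirtualMachineID": 1, "KeyboardBusy": True, "vm2_busy": True}

never_suspended = suspend(busy)                                   # attribute missing
suspended_before = suspend({**busy, "TotalJobSuspendTime": 120})  # attribute present
keyboard_idle = suspend({**busy, "KeyboardBusy": False})          # no suspend attempt
```

In this model `never_suspended` is UNDEFINED rather than a boolean, while `suspended_before` is a plain True and `keyboard_idle` a plain False. That would be consistent with the error appearing only when a job that has never been suspended actually tries to suspend.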

So then I tried this line:

  && (TotalJobSuspendTime =!= UNDEFINED) \
  && (TotalJobSuspendTime <= $(MaxSuspendTime))

but got the same error on new jobs.

 
Ralph Finch, P.E.
Dept. of Water Resources
Bay-Delta Office, Room 215-13
Sacramento, CA  95814
916-653-7552
rfinch@xxxxxxxxxxxx