[Condor-users] Can't find WANT_SUSPEND in internal ClassAd



I'm getting the error in the subject line from some of my machines when parallel universe jobs try to start on them. Any idea what might be causing it? After a few false starts like these, the jobs usually run. All of the machines in the parallel pool are similar and are running the same Condor config.

______________________________


This is an automated email from the Condor system
on machine "[machine]".  Do not reply.

"/lusr/condor/sbin/condor_startd" on "[machine]" exited with status 4.
Condor will automatically restart this process in 13 seconds.

*** Last 20 line(s) of file StartLog:
2/21 10:31:41 vm5: Changing state: Owner -> Unclaimed
2/21 10:31:41 vm6: State change: IS_OWNER is false
2/21 10:31:41 vm6: Changing state: Owner -> Unclaimed
2/21 10:31:41 vm7: State change: IS_OWNER is false
2/21 10:31:41 vm7: Changing state: Owner -> Unclaimed
2/21 10:31:41 vm8: State change: IS_OWNER is false
2/21 10:31:41 vm8: Changing state: Owner -> Unclaimed
2/21 10:36:18 DaemonCore: Command received via TCP from host <[IP]:53561>
2/21 10:36:18 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
2/21 10:36:18 vm1: Request accepted.
2/21 10:36:18 vm1: Remote owner is DedicatedScheduler@[machine].cs.utexas.edu
2/21 10:36:18 vm1: State change: claiming protocol successful
2/21 10:36:18 vm1: Changing state: Unclaimed -> Claimed
2/21 10:36:18 ERROR "Can't find WANT_SUSPEND in internal ClassAd" at line 992 in file Resource.C
2/21 10:36:18 vm1: Changing state and activity: Claimed/Idle -> Preempting/Killing
2/21 10:36:18 vm1: State change: No preempting claim, returning to owner
2/21 10:36:18 vm1: Changing state and activity: Preempting/Killing -> Owner/Idle
2/21 10:36:18 vm1: State change: IS_OWNER is false
2/21 10:36:18 vm1: Changing state: Owner -> Unclaimed
2/21 10:36:18 startd exiting because of fatal exception.
*** End of file StartLog