[Condor-users] jobs matched to slots in OWNER state when startd_cron used



Hi,

I want to implement a "startd_cron" job that runs some checks on batch node "health" and publishes the results in the node's ClassAd for use in the START macro.

I have this running in a test setup, and jobs are not started when the "health" result is False, so it is mostly working.

However, when the health status is included in START, the node's slots go into the OWNER state for 5 minutes after the startd is launched, and again after the health status changes from False to True. That by itself isn't so bad, but jobs are being matched to the slots during this 5-minute window and then fail to start. This seems like wasted work that could cause problems at larger scale. I've talked with admins at another site that uses this mechanism; they see the same 5-minute periods of slots in OWNER state but don't get jobs matched during that time.

I have a mix of Condor versions in the test setup: the startd is 7.6.6, the schedd is 7.4.2, and the collector is 7.6.0.

The START macro looks like:

START = ( Startd_Cron_Health =?= True )
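
For reference, the startd_cron wiring that would publish such an attribute might look roughly like the sketch below. The job name HEALTH, the script path /usr/local/libexec/node_health, and the 5m period are hypothetical placeholders, not taken from the setup described here; the sketch assumes the script prints a line such as "Health = True" on stdout, which the PREFIX setting turns into Startd_Cron_Health in the slot ClassAds.

# Hypothetical health-check job: name, path and period are placeholders
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) HEALTH
STARTD_CRON_HEALTH_EXECUTABLE = /usr/local/libexec/node_health
STARTD_CRON_HEALTH_MODE = Periodic
STARTD_CRON_HEALTH_PERIOD = 5m
# Prepended to each attribute the script prints, so "Health = True" is published as Startd_Cron_Health
STARTD_CRON_HEALTH_PREFIX = Startd_Cron_

Because =?= yields False rather than UNDEFINED when Startd_Cron_Health has not been published yet, the START expression above is False on a slot until the check has reported True at least once.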

Any suggestions on how to avoid the matches to slots in OWNER state? My next idea is to try a later Condor version for the schedd.

Thanks,
Tom Rockwell
Michigan State U.