
Re: [Condor-users] jobs matched to slots in OWNER state when startd_cron used



Hi Tom,

There is a race condition between decisions made by the negotiator and the startd. The negotiator makes its decisions based on the state of machines as it observed them at the beginning of each negotiation cycle, whereas the startd makes its decisions based on its current state.

The behavior you describe makes me wonder if the Startd_Cron_Health attribute is not getting published in a timely manner. You can query the published state and the current state by doing something like this:

# published state
condor_status -f "%s\n" Startd_Cron_Health <machine>
# current state
condor_status -f "%s\n" Startd_Cron_Health -direct <machine>
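
If it helps, here is a rough sketch of a helper for comparing the two results. The function name compare_health and the MACHINE placeholder are made up for illustration, and the commented usage assumes the two condor_status commands above work as written against your pool:

```shell
#!/bin/sh
# compare_health PUBLISHED DIRECT
# Hypothetical helper: prints "in sync" when the value published to the
# collector matches the value reported directly by the startd, and
# "stale" otherwise.
compare_health() {
    if [ "$1" = "$2" ]; then
        echo "in sync"
    else
        echo "stale"
    fi
}

# Example usage (needs a Condor pool; MACHINE is a placeholder):
# published=$(condor_status -f "%s\n" Startd_Cron_Health "$MACHINE")
# direct=$(condor_status -f "%s\n" Startd_Cron_Health -direct "$MACHINE")
# compare_health "$published" "$direct"
```

A "stale" result shortly after a health transition would suggest that the collector's copy of the ad is lagging behind the startd.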

--Dan

On 4/18/12 3:54 PM, Tom Rockwell wrote:
Hi,

I want to implement a "startd_cron" job that runs some checks on batch node "health" and publishes the results in the node's ClassAd for use in the START macro.
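
For illustration only, a health-check script of this kind might look roughly like the sketch below. It is not the actual script from this setup. The scratch-directory check, the function names, and the SCRATCH_DIR variable are hypothetical, and it assumes the cron job is configured with the attribute prefix "Startd_Cron_", so that the printed "Health" attribute shows up as Startd_Cron_Health:

```shell
#!/bin/sh
# Hypothetical health check for a startd cron job.
# Assumption: the job's configured prefix is "Startd_Cron_", so the line
# "Health = ..." printed on stdout is published as Startd_Cron_Health.

# check_scratch DIR: succeed if DIR exists and is writable.
check_scratch() {
    [ -d "$1" ] && [ -w "$1" ]
}

# report_health DIR: print one ClassAd attribute line on stdout.
report_health() {
    if check_scratch "$1"; then
        echo "Health = True"
    else
        echo "Health = False"
    fi
}

# SCRATCH_DIR is a made-up environment variable; /tmp is the fallback.
report_health "${SCRATCH_DIR:-/tmp}"
```

The startd would run a script like this periodically and merge whatever attribute lines it prints into the slot ads.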

I have this going in a test setup and jobs are not started when the "health" result is false, so it is mostly working.

However, I see that when the health status is included in START, the node/slots go into OWNER state for 5 minutes after the startd is launched and also after health transitions from False to True.  This isn't so bad, except that jobs are being matched to the slots during this 5-minute period and then fail to start.  This seems like wasted work that might lead to problems at larger scale.  I've talked with admins at another site that uses this mechanism, and they see the same 5-minute periods of slots in OWNER but don't get jobs matched during this time.

I have a mix of Condor versions in the test setup: the startd is 7.6.6, the schedd is 7.4.2, and the collector is 7.6.0.

The START macro looks like:

START = ( Startd_Cron_Health =?= True )

Any suggestions on how to avoid the matches to slots in OWNER state?  My next guess is to try a later Condor version for the schedd.

Thanks,
Tom Rockwell
Michigan State U.