
Re: [Condor-users] Copying job attrs into slot attrs



> I'm inclined to think that the only way to make what people
> expect to happen, happen is to negotiate one slot at a time
> on an SMP machine and let the admin indicate that negotiation
> should not occur on that machine again till it spots an
> update on the machine indicating the start (or failure) of
> said job or a timeout occurs.

I think it's not even that complicated. You just don't want another slot on the machine to try to run any new jobs until it's certain that all the other slots have updated their ads and are in a steady state. This means you can get mismatches, where the negotiator thought it could run 4 jobs on a machine but it could really only run 2, with the other 2 rejected because the graphics cards were used up and the negotiator didn't know about it.

That would work, and it would probably be easier to implement.

A slot that has been assigned a job doesn't try to evaluate whether it can run it until all the other slots are done evaluating whether they can run their jobs. Do it round robin, slot 1 --> slot 2 --> slot 3 --> etc., for simplicity, and you'd probably be close enough to a decent solution for this problem.

So you accept that the negotiator might get a match wrong from time to time, sending jobs to machines that will reject them when they get there, but you have the promise that the machine itself won't ever get it wrong. It'll always operate in a correct state.
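
To make the idea concrete, here is a rough sketch of the round-robin evaluation in Python. It is only a toy model, not Condor code: the Machine class, the free_gpus counter, and the evaluate_round_robin method are all names I made up for illustration, and a real startd would be evaluating ClassAd expressions rather than comparing integers. It uses the 4-jobs, 2-graphics-cards case from above.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical model of an SMP machine whose slots share a GPU pool."""
    free_gpus: int
    # slot id -> number of GPUs requested by the job matched to that slot
    pending: dict = field(default_factory=dict)

    def evaluate_round_robin(self):
        """Let each slot decide in turn: slot 1 -> slot 2 -> slot 3 -> ...

        Each slot sees the machine state left behind by the slots before
        it, so the machine never over-commits, even if the negotiator
        matched more jobs than the machine can actually run.
        """
        accepted, rejected = [], []
        for slot in sorted(self.pending):
            needed = self.pending[slot]
            if needed <= self.free_gpus:
                self.free_gpus -= needed  # state is updated before the next slot looks
                accepted.append(slot)
            else:
                rejected.append(slot)  # mismatch: job goes back to be rematched
        self.pending.clear()
        return accepted, rejected


# The negotiator believed 4 jobs would fit, but only 2 GPUs are free.
machine = Machine(free_gpus=2, pending={1: 1, 2: 1, 3: 1, 4: 1})
accepted, rejected = machine.evaluate_round_robin()
print("accepted slots:", accepted)  # accepted slots: [1, 2]
print("rejected slots:", rejected)  # rejected slots: [3, 4]
```

The point of doing it serially is that the accept/reject decision is always made against current machine state, so the only cost of a stale negotiator view is a rejected match, never an over-committed machine.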

I'd be okay with that kind of solution. Hopefully I explained it clearly!

- Ian
