
Re: [Condor-users] Copying job attrs into slot attrs



What might help is evaluating slots in breadth-first rather than depth-first order, which is typically how most people want it to work anyway.
So negotiate one slot from every machine in the first cycle, then one slot from every machine that still has slots left, and repeat until finished.
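To make the two orderings concrete, here is a minimal Python sketch. It assumes the negotiator simply walks a list of (machine, slot) pairs; the machine and slot names are hypothetical, and this is not how the Condor negotiator is actually structured internally.

```python
def depth_first(machines):
    """Hand out every slot on one machine before moving to the next."""
    return [(machine, slot) for machine, slots in machines.items() for slot in slots]


def breadth_first(machines):
    """One slot from every machine per cycle, then one from each machine that
    still has slots left, until every slot has been handed out."""
    order = []
    cycles = max((len(slots) for slots in machines.values()), default=0)
    for i in range(cycles):
        for machine, slots in machines.items():
            if i < len(slots):  # this machine still has a slot left this cycle
                order.append((machine, slots[i]))
    return order


# Hypothetical pool: three machines with different slot counts.
machines = {
    "nodeA": ["slot1", "slot2", "slot3"],
    "nodeB": ["slot1"],
    "nodeC": ["slot1", "slot2"],
}
print(breadth_first(machines))
# [('nodeA', 'slot1'), ('nodeB', 'slot1'), ('nodeC', 'slot1'), ('nodeA', 'slot2'), ('nodeC', 'slot2'), ('nodeA', 'slot3')]
```

Depth-first would instead hand out all three nodeA slots before touching nodeB.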

This would reduce the probability of an incorrect match if negotiation were slow enough to leave time for the collector to receive an update in the interim. In many cases, though, I think negotiation is fast enough that this would not actually help unless you put in an explicit delay after each cycle.

The rejection idea, coupled with this, wouldn't be too bad, because the top-priority jobs (as defined by their being assigned first) would be much less likely to be assigned to the same machine, and thus less likely to end up being rejected (and subsequently falling behind other jobs they are unable to pre-empt).
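A small simulation illustrates why the two ideas combine well. This is a sketch under stated assumptions, not Condor behaviour: two hypothetical machines each advertise 4 slots but really have only 2 GPUs, the negotiator does not know about the GPUs, and each machine accepts claims in the order they were matched and rejects any beyond its real GPU count.

```python
def depth_first(machines):
    return [(m, s) for m, slots in machines.items() for s in slots]


def breadth_first(machines):
    cycles = max((len(slots) for slots in machines.values()), default=0)
    return [(m, slots[i]) for i in range(cycles)
            for m, slots in machines.items() if i < len(slots)]


def match_and_reject(jobs, order, gpus):
    """Pair jobs (highest priority first) with slots in `order`. Each machine
    then accepts jobs only while it really has a GPU free, rejecting the rest."""
    in_use = {machine: 0 for machine in gpus}
    accepted, rejected = [], []
    for job, (machine, _slot) in zip(jobs, order):
        if in_use[machine] < gpus[machine]:
            in_use[machine] += 1
            accepted.append(job)
        else:
            rejected.append(job)
    return accepted, rejected


# Two machines that each advertise 4 slots but really have only 2 GPUs.
machines = {m: ["slot1", "slot2", "slot3", "slot4"] for m in ("nodeA", "nodeB")}
gpus = {"nodeA": 2, "nodeB": 2}
jobs = [1, 2, 3, 4, 5, 6]  # job 1 has the highest priority

print(match_and_reject(jobs, depth_first(machines), gpus))    # ([1, 2, 5, 6], [3, 4])
print(match_and_reject(jobs, breadth_first(machines), gpus))  # ([1, 2, 3, 4], [5, 6])
```

With depth-first matching, jobs 3 and 4 pile onto nodeA and get bounced; with breadth-first matching the rejections fall on the lowest-priority jobs instead.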

I'd still take total control, but that's just me :)

Matt

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 08 December 2009 17:29
To: Condor-Users Mail List
Subject: Re: [Condor-users] Copying job attrs into slot attrs

> I'm inclined to think that the only way to make what people
> expect to happen, happen is to negotiate one slot at a time
> on an SMP machine and let the admin indicate that negotiation
> should not occur on that machine again till it spots an
> update on the machine indicating the start (or failure) of
> said job or a timeout occurs.
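The one-slot-at-a-time idea quoted above can be sketched as negotiator-side bookkeeping. This is a minimal Python sketch under assumptions: NegotiationGate, its method names and the 60-second default timeout are all hypothetical and not part of Condor; the clock is injectable only so the timeout can be tested.

```python
import time


class NegotiationGate:
    """Hypothetical bookkeeping: after handing out one slot on a machine, skip
    that machine until its ad reports the job started (or failed), or until a
    timeout expires."""

    def __init__(self, timeout_s=60.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._pending = {}  # machine name -> time the last match was made

    def can_negotiate(self, machine):
        started = self._pending.get(machine)
        if started is None:
            return True
        if self.clock() - started >= self.timeout_s:
            del self._pending[machine]  # no update arrived in time; unblock
            return True
        return False

    def record_match(self, machine):
        self._pending[machine] = self.clock()

    def record_update(self, machine):
        """Call when an ad arrives showing the matched job started or failed."""
        self._pending.pop(machine, None)


fake_now = [0.0]
gate = NegotiationGate(timeout_s=30.0, clock=lambda: fake_now[0])
gate.record_match("nodeA")
print(gate.can_negotiate("nodeA"))  # False: still waiting for nodeA to report
fake_now[0] = 31.0
print(gate.can_negotiate("nodeA"))  # True: the timeout has expired
```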

I think it's not even that complicated. You just don't want another slot on the machine to try to run any new jobs until it's certain that all the other slots have updated their ads and are in some steady state. This means you can get mismatches, where the negotiator thought it could run 4 jobs on a machine but it could really only run 2, the other 2 being rejected because the graphics cards were used up and the negotiator didn't know about it.

That would work, and it would probably be easier to implement.

A slot that has been assigned a job doesn't try to evaluate whether it can run it until all the other slots are done evaluating whether they can run their jobs. Do it round robin (slot 1 --> slot 2 --> slot 3 --> etc.) for simplicity, and you'd probably be close enough to a decent solution for this problem.
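The round-robin idea can be sketched on the machine side with a condition variable: slot i waits until the slots before it have decided, so each one sees an up-to-date GPU count. This is a minimal Python sketch; RoundRobinEvaluator, the GPU counter and the thread-per-slot layout are hypothetical illustrations, not how the Condor startd actually works, and it assumes every slot has a claim to evaluate each round (an idle slot would have to pass its turn).

```python
import threading


class RoundRobinEvaluator:
    """Hypothetical machine-side coordinator: slots decide one at a time, in
    order, so each slot sees the resource count left by the slots before it."""

    def __init__(self, num_slots, free_gpus):
        self.num_slots = num_slots
        self.free_gpus = free_gpus  # resource the negotiator may not know about
        self.turn = 0               # index of the slot allowed to decide next
        self.cond = threading.Condition()
        self.decisions = {}

    def evaluate(self, slot_index, job):
        with self.cond:
            # Wait until every earlier slot in this round has finished deciding.
            self.cond.wait_for(lambda: self.turn == slot_index)
            if self.free_gpus > 0:
                self.free_gpus -= 1
                self.decisions[slot_index] = (job, "run")
            else:
                # The negotiator over-matched; reject rather than over-commit.
                self.decisions[slot_index] = (job, "reject")
            self.turn = (self.turn + 1) % self.num_slots  # hand over to the next slot
            self.cond.notify_all()


# Four slots were matched, but the machine only has two GPUs.
evaluator = RoundRobinEvaluator(num_slots=4, free_gpus=2)
# Start the threads in reverse order: the turn counter still forces slot 0 first.
threads = [threading.Thread(target=evaluator.evaluate, args=(i, f"job{i}"))
           for i in reversed(range(4))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(evaluator.decisions)
# {0: ('job0', 'run'), 1: ('job1', 'run'), 2: ('job2', 'reject'), 3: ('job3', 'reject')}
```

The negotiator can still be wrong (it matched four jobs), but the machine never runs more jobs than it has GPUs.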

So you accept that the negotiator might get a match wrong from time to time, sending jobs to machines that will reject them when they get there, but you have the guarantee that the machine itself will never get it wrong. It will always operate in the correct state.

I'd be okay with that kind of solution. Hopefully I explained it clearly!

- Ian


