
Re: [Condor-users] Condor Slots Understanding Question....

On Tuesday, 20 September, 2011 at 8:43 AM, David Rebatto wrote:

On 09/14/2011 08:43 PM, Ian Chesal wrote:
In your case I'd try:

NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) * (KFlops - SlotID)

So that favours machines where the slot is unoccupied and the slot ID
is lower. That would fill all empty slot1 slots, then all empty slot2
slots and so on. Adjust to suit your needs.
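
To see the fill order this expression produces, here is a rough Python model of the arithmetic. It is only a sketch: the `Slot` class, the machine names, and the KFlops values are made up for illustration, `remote_owner=None` stands in for RemoteOwner being UNDEFINED, and the real negotiator may break ties differently than Python's stable sort does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    machine: str                        # hypothetical machine name
    slot_id: int                        # models SlotID
    kflops: int                         # models KFlops
    remote_owner: Optional[str] = None  # None models RemoteOwner =?= UNDEFINED


def post_job_rank(slot: Slot) -> int:
    """Model of (RemoteOwner =?= UNDEFINED) * (KFlops - SlotID)."""
    unoccupied = 1 if slot.remote_owner is None else 0
    return unoccupied * (slot.kflops - slot.slot_id)


# Three identical 2-slot machines; slot1 on machine "b" is already claimed.
slots = [
    Slot("a", 1, 1_000_000),
    Slot("a", 2, 1_000_000),
    Slot("b", 1, 1_000_000, remote_owner="someone"),
    Slot("b", 2, 1_000_000),
    Slot("c", 1, 1_000_000),
    Slot("c", 2, 1_000_000),
]

# Highest rank first; equal ranks keep their input order in this sketch.
ordered = sorted(slots, key=post_job_rank, reverse=True)
fill_order = [f"slot{s.slot_id}@{s.machine}" for s in ordered]
print(fill_order)
```

With equal KFlops everywhere, the empty slot1 slots come first, then the empty slot2 slots, and the claimed slot lands at the bottom with rank 0. Note that slot2 on "b" gets exactly the same rank as slot2 on "a" and "c", even though "b" is already running a job.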

I was looking for something similar, in order to avoid jobs from piling
up on the same machine while other ones are empty.
I'm afraid that your solution can work only if all the jobs have similar duration.
It's not perfect, but it's generally considered to be good enough. In most cases the time to fill an empty system with jobs is less than the time it takes for those jobs to complete, same duration or not. So it works. If your system is completely full, the fill order doesn't really matter. This ranking has diminishing returns as the use of your system increases towards 100%. And the goal should be 100% use 100% of the time, no? I don't advise spending much time trying to optimize fill order for empty systems. 
If not (e.g. because jobs can fail randomly during execution, or because
the duration is not predictable at all), there's no relation between the
lower free slot ID and the load of the machine in terms of claimed slots.
In this case, it would be useful to rank against the number of free
slots, possibly weighted with other attributes, but I haven't figured
out how to do that... Any hint?
With the static allocation approach to slots there isn't a great way to do this as all slots are treated as relatively independent units on the machine. You can cross-share some information between slots, but there's a good chance that information is stale at the collector when you do matchmaking and you have to write some pretty hairy expressions to make it work. For example: you could cross-advertise the State attribute on a 2-slot machine and you'd end up with 

slot1_state = "…" in the slot2 ad
slot2_state = "…" in the slot1 ad

You could then write something like:

NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) * ((SlotID == 1 && slot2_state == "unclaimed") || (SlotID == 2 && slot1_state == "unclaimed"))

That's simplistic, but it's mainly there to prove my point: I don't think this approach is really viable. Perhaps you can write that more elegantly with ClassAd expression functions, but that's a rabbit hole I don't want to go down.

If you use dynamic slots you've got a chance at making this work. You know TotalCpus for the machine, and the Cpus attribute tells you how many CPUs are still being advertised as available. So TotalCpus - Cpus gives you some idea of how busy the machine is with other work. You could write:

NEGOTIATOR_POST_JOB_RANK = (RemoteOwner =?= UNDEFINED) * KFlops * ifThenElse((TotalCpus - Cpus) == 0, 2, 1.0/(TotalCpus - Cpus))

I think that would work out. I haven't tested it with a dynamic slot, so proceed with caution. But the idea is that machines with more remaining (and faster) CPUs end up at the top of the list, and that, for machines of the same speed, a completely empty machine is preferred over one that already has any other job on it.
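
To make the weighting concrete, here is a small Python model of that expression. Treat it as a sketch under assumptions: `dynamic_rank` and the sample numbers are hypothetical, `remote_owner=None` stands in for RemoteOwner being UNDEFINED, and the division is real-valued (with integer division, 1/n would truncate to 0 for any n greater than 1, which would flatten the ranking).

```python
from typing import Optional


def dynamic_rank(kflops: int, total_cpus: int, cpus: int,
                 remote_owner: Optional[str] = None) -> float:
    """Model of (RemoteOwner =?= UNDEFINED) * KFlops * ifThenElse(used == 0, 2, 1/used)."""
    unoccupied = 1 if remote_owner is None else 0
    used = total_cpus - cpus            # CPUs already handed out to other jobs
    weight = 2 if used == 0 else 1 / used
    return unoccupied * kflops * weight


# Hypothetical 8-core machines with the same KFlops and different load.
KFLOPS = 1000
for free_cpus in (8, 7, 4, 1):
    print(free_cpus, "free ->", dynamic_rank(KFLOPS, 8, free_cpus))
```

For equal KFlops the rank drops as more CPUs are in use: 2000 for an empty machine, 1000 with one CPU busy, 250 with four busy. KFlops still multiplies the whole thing, so a machine twice as fast with one CPU busy ties with an empty machine of the base speed.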

Again though, I don't think this is worth much more time and effort to optimize. Diminishing returns and all that...

- Ian

Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools