
Re: [HTCondor-users] RANK question; slots unclaimed on a single computer



Ben and Todd, thanks much for the helpful replies.

A little more explanation:

We have a single Windows 7 x64 HTC pool of about 90 cores spread across about 20 deskside machines. Most have 4 cores, but two have 16 cores, and those two run at 2.2 and 3.3 GHz, enough to make a difference in job speed. Typical submissions are 1000+ jobs at a time, and each job takes 5-10 minutes to finish. Of the 90 cores, up to 75-80 are usable; the rest are on machines that don't have enough disk space or whose HTC install is boogered, and I need to fix that.

We've noticed that once the last core on one of the 4-core machines is claimed and running a job, interactive use of that machine becomes fairly dismal.

Let's say my submission wants 65 slots and there are 80 available. To avoid bothering people, I want to avoid claiming the last slot on their machines if possible. But I don't want to ignore KFlops either, as there's a noticeable difference in task throughput depending on whether I claim faster or slower machines. Thus, my imaginary rank:

RANK = Kflops * $(NumberOfCoresUnclaimed)

would do what I want. However,

RANK = 10000 - SlotID

would do the opposite, I think, as it would fill all the 4-core machines first and the 16-core machines last. But perhaps something like

RANK = SlotID * Kflops

would be simple and achieve my goal. I'll ponder both of your suggestions some more.
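To sanity-check the reasoning above, here is a rough Python sketch that places 65 jobs on a made-up 80-slot pool (two 16-core machines and twelve 4-core desktops; the machine names and KFlops values are invented) and counts how many desktops end up with every core claimed. It is not how the HTCondor negotiator actually matches jobs: it assumes each job simply takes the highest-ranked free static slot, and for the imaginary cores-unclaimed rank it assumes the count is recomputed after every claim.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Slot:
    machine: str
    slot_id: int          # 1-based, like HTCondor's SlotID for static slots
    kflops: int           # made-up per-machine KFlops benchmark
    claimed: bool = False


def make_pool() -> List[Slot]:
    # Hypothetical pool: two 16-core machines plus twelve 4-core desktops,
    # 80 static slots in total. Names and KFlops values are invented.
    machines = [("big-fast", 16, 1_500_000), ("big-slow", 16, 1_000_000)]
    machines += [(f"desk{i:02d}", 4, 1_200_000) for i in range(12)]
    return [Slot(name, sid, kf)
            for name, cores, kf in machines
            for sid in range(1, cores + 1)]


def unclaimed_on(pool: List[Slot], machine: str) -> int:
    return sum(1 for s in pool if s.machine == machine and not s.claimed)


# Each rank takes a candidate slot plus the current pool state; higher is better.
RankFn = Callable[[Slot, List[Slot]], float]
RANKS: Dict[str, RankFn] = {
    # The imaginary rank; the unclaimed-core count is recomputed on every claim.
    "Kflops * cores_unclaimed": lambda s, pool: s.kflops * unclaimed_on(pool, s.machine),
    "10000 - SlotID": lambda s, pool: 10000 - s.slot_id,
    "SlotID * Kflops": lambda s, pool: s.slot_id * s.kflops,
}


def place(rank: RankFn, n_jobs: int) -> List[Slot]:
    """Greedy model: each job takes the highest-ranked free slot (first one wins ties)."""
    pool = make_pool()
    for _ in range(n_jobs):
        free = [s for s in pool if not s.claimed]
        best = max(free, key=lambda s: rank(s, pool))
        best.claimed = True
    return pool


def claimed_per_machine(pool: List[Slot]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for s in pool:
        if s.claimed:
            counts[s.machine] = counts.get(s.machine, 0) + 1
    return counts


def full_desktops(pool: List[Slot]) -> int:
    """Number of 4-core desktops with every core claimed (none left for the user)."""
    desks = {s.machine for s in pool if s.machine.startswith("desk")}
    return sum(1 for m in desks if unclaimed_on(pool, m) == 0)


if __name__ == "__main__":
    for label, rank in RANKS.items():
        pool = place(rank, n_jobs=65)
        counts = claimed_per_machine(pool)
        print(f"{label}: big-fast={counts.get('big-fast', 0)}, "
              f"big-slow={counts.get('big-slow', 0)}, "
              f"fully claimed desktops={full_desktops(pool)}")
```

In this invented pool, 10000 - SlotID fills all twelve desktops (taking slots 1-9 on big-fast and 1-8 on big-slow), while both the imaginary cores-unclaimed rank and SlotID * Kflops leave exactly one core free on every desktop. SlotID * Kflops only works as a stand-in here because, with static slots, preferring high slot IDs means Slot1 is always the last one taken on each machine; the numbers say nothing about a real pool, so swap in your own machine list to try other mixes.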

Ralph Finch
Calif. Dept. of Water Resources

On Fri, Aug 8, 2014 at 9:29 AM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 8/7/2014 5:03 PM, Ralph Finch wrote:
In the condor submit file, I want jobs to go to desktop computers as a
function of both computer speed and number of cores left,
something like:

RANK = Kflops * $(NumberOfCoresUnclaimed)


Curious, why do you want to rank based on number of cores left? Is the idea that you want to fill machines breadth-first (i.e. to run eight jobs, you want to use one core on eight machines instead of eight cores on one machine)? If you want to run breadth-first with static slots, perhaps you could just do something really simple like

 RANK = 10000 - SlotID

so that you prefer to use Slot1 on all machines, then Slot2 on all machines, etc. You could even stick this into NEGOTIATOR_POST_JOB_RANK so it happens for all jobs, without users having to put anything in their submit files...


My question is, how do I get the number of unclaimed cores in a single
computer?

I think Ben answered that nicely, assuming you are using static slots of course.

regards,
Todd

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/