
Re: [Condor-users] Fill nodes breadth-first



On 4/6/2012 4:00 PM, Sarah Williams wrote:
Hi,

I was following this recipe to enable breadth-first filling of nodes on
the cluster:
https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToSteerJobs

I added this to my condor_config.local files and ran condor_reconfig:
NEGOTIATOR_POST_JOB_RANK = isUndefined(RemoteOwner) * (KFlops - SlotID)

I can see in the Negotiator log that it took effect, but it is still
filling all the slots on one host before moving to another. Any ideas why?


Hi Sarah,

Regards from Wisconsin. One possibility: do most of your submitted jobs specify their own Rank? As explained in the HOWTO, the Rank specified in the submit file will trump whatever NEGOTIATOR_POST_JOB_RANK says. If you want your breadth-first rule to trump whatever your users request, use NEGOTIATOR_PRE_JOB_RANK instead.
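
To illustrate the ordering, here is a rough Python sketch (a toy model, not Condor code) in which candidate slots are compared first on NEGOTIATOR_PRE_JOB_RANK, then on the job's own Rank, then on NEGOTIATOR_POST_JOB_RANK. The slot dictionaries, the `best_slot` helper, and the memory-based job Rank are all made up for the example, and the real negotiator considers more than these three values.

```python
def best_slot(slots, pre_rank, job_rank, post_rank):
    """Pick the slot with the highest (pre, job, post) rank tuple.

    Tuples compare left to right, so a later rank only breaks ties
    left by the earlier ones.
    """
    return max(slots, key=lambda s: (pre_rank(s), job_rank(s), post_rank(s)))


# Hypothetical unclaimed slots on two machines.
slots = [
    {"machine": "a", "slot_id": 1, "memory": 2048},
    {"machine": "b", "slot_id": 2, "memory": 4096},
]

breadth_first = lambda s: 500 - s["slot_id"]   # the steering rule
prefers_memory = lambda s: s["memory"]         # a Rank set in the submit file
no_opinion = lambda s: 0

# Steering rule as POST rank: the job's Rank wins, slot 2 on "b" is chosen.
post = best_slot(slots, no_opinion, prefers_memory, breadth_first)

# Steering rule as PRE rank: it wins, slot 1 on "a" is chosen.
pre = best_slot(slots, breadth_first, prefers_memory, no_opinion)

print(post["machine"], post["slot_id"])
print(pre["machine"], pre["slot_id"])
```

If none of the jobs set a Rank, the middle element is the same for every slot and the POST expression decides on its own.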

Another possibility: perhaps for whatever reason the machines in your pool report a lot of small variations in kflops? I think the above expression will only breadth-first fill across machines with the same kflops, because any difference in kflops larger than the number of slots per machine outweighs the SlotID term. Take a peek at the output from
   condor_status -server -sort kflops
and see if the reported kflops value varies slightly every few machines... and/or, if you are on Unix, you could do
   condor_status -format "%d\n" kflops | sort | uniq | wc -l
to see how many different "classes" of kflops machines you have. If large, perhaps you'd prefer something like:
  NEGOTIATOR_POST_JOB_RANK = isUndefined(RemoteOwner) * (500 - SlotID)
to simply ignore the kflops value.
(I know on a pool here at UW-Madison with 1951 slots, there are 163 different kflop values reported....)
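
To make the effect concrete, here is a small Python sketch (again a toy model, not Condor code) that ranks every idle slot with the expression and lists them highest rank first. The machine names, the kflops numbers, and the `fill_order` helper are invented for the example; it assumes all slots are unclaimed, so the isUndefined(RemoteOwner) factor is 1, and it breaks ties by listing order, which the real negotiator does not promise.

```python
def fill_order(machines, slots_per_machine, rank):
    """Order in which a highest-rank-first matcher would hand out idle slots.

    machines: dict of machine name -> reported kflops (made-up values).
    rank: function (kflops, slot_id) -> number, standing in for the
          NEGOTIATOR_POST_JOB_RANK expression on an unclaimed slot.
    """
    slots = [
        (name, slot_id, kflops)
        for name, kflops in machines.items()
        for slot_id in range(1, slots_per_machine + 1)
    ]
    # sorted() is stable, so equally ranked slots keep their listing order.
    ordered = sorted(slots, key=lambda s: rank(s[2], s[1]), reverse=True)
    return [(name, slot_id) for name, slot_id, _ in ordered]


def with_kflops(kflops, slot_id):
    return kflops - slot_id


def without_kflops(kflops, slot_id):
    return 500 - slot_id


same = {"a": 1000000, "b": 1000000}    # identical kflops
noisy = {"a": 1000000, "b": 1000030}   # "b" reports slightly more

print(fill_order(same, 2, with_kflops))      # slot 1 everywhere, then slot 2
print(fill_order(noisy, 2, with_kflops))     # all of "b" before any of "a"
print(fill_order(noisy, 2, without_kflops))  # breadth-first again
```

With only two slots per machine, a difference of 30 kflops is already enough to put every slot on "b" ahead of every slot on "a".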

hope the above makes sense,
regards,
Todd

p.s. Extra credit: for the real Condor geeks, another approach would be to bucket the kflops value in the NEGOTIATOR_POST_JOB_RANK expression, so this breadth-first recipe would still work even if the reported kflops varies by some small value like 30k or so. In Condor v7.7.6 (to be released next week) there is a spiffy quantize() ClassAd function to assist in this sort of bucketing, so in Condor v7.7.6 you could do:

NEGOTIATOR_POST_JOB_RANK = isUndefined(RemoteOwner) * (quantize(kflops,{30000}) - SlotID)
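
For a feel of what the bucketing does, here is a Python sketch that assumes quantize(kflops,{30000}) rounds kflops up to the next multiple of 30000. The Python `quantize` below is only my stand-in for the ClassAd function, restricted to integers, and `bucketed_rank` and the sample kflops values are invented for illustration.

```python
def quantize(value, step):
    """Round an integer value up to the next integral multiple of step."""
    # Ceiling division done with integers to avoid float rounding.
    return -(-value // step) * step


def bucketed_rank(kflops, slot_id, step=30000):
    """Stand-in for (quantize(kflops,{30000}) - SlotID) on an unclaimed slot."""
    return quantize(kflops, step) - slot_id


# Two machines whose kflops differ slightly land in the same bucket,
# so slot 1 on either machine outranks slot 2 on either machine.
print(bucketed_rank(1000000, 1), bucketed_rank(1000030, 1))
print(bucketed_rank(1000000, 2), bucketed_rank(1000030, 2))

# Values that straddle a multiple of 30000 still end up in different buckets.
print(quantize(1019990, 30000), quantize(1020010, 30000))
```

The last line shows the limit of the approach: two machines that sit just either side of a bucket boundary are still treated as different classes.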

Maybe I'll update this HOWTO recipe based on your feedback (or this is open source, feel free to ask for a condor-wiki account by emailing condor-admin@xxxxxxxxxxx, and then you could edit the recipe yourself!)...



--Sarah

--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
Condor Project Technical Lead          1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685