
Re: [Condor-users] Fill nodes breadth-first



On 4/6/12 6:00 PM, Todd Tannenbaum wrote:
> On 4/6/2012 4:00 PM, Sarah Williams wrote:
>> Hi,
>>
>> I was following this recipe to enable breadth-first filling of nodes on
>> the cluster:
>> https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToSteerJobs
>>
>> I added this to my condor_config.local files and ran condor_reconfig:
>> NEGOTIATOR_POST_JOB_RANK = isUndefined(RemoteOwner) * (KFlops - SlotID)
>>
>> I can see in the Negotiator log that it took effect, but it is still
>> filling all the slots on one host before moving to another. Any ideas
>> why?
>>
> 
> Hi Sarah,
> 
> Regards from Wisconsin. One possibility:  do most of your submitted jobs
> specify their own Rank?  As explained in the HOWTO, the Rank specified
> in the submit file will trump whatever NEGOTIATOR_POST_JOB_RANK says. If
> you want your breadth-first rule to trump whatever your users request,
> use NEGOTIATOR_PRE_JOB_RANK instead.

In condor_q -long, I see:
Rank = 0.0
Would that be the same as unspecified?


> Another possibility: perhaps for whatever reason the machines in your
> pool have a lot of small variance in the reported kflops?  I think
> the above expression will breadth-first fill across machines with the
> same kflops.  Take a peek at the output from
>    condor_status -server -sort kflops
> and see if the reported kflops value slightly varies every few
> machines... and/or if on Unix you could do
>    condor_status -format "%d" kflops | sort | uniq | wc -l
> to see how many different "classes" of kflops machines you have.  

This may be it, because they're all over the place. (For the sake of
folks googling this: the format string needs a newline, otherwise all
the values are printed on a single line:
condor_status -format "%d\n" kflops | sort | uniq | wc -l )
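
To illustrate why scattered kflops values defeat the recipe, here is a
minimal sketch in Python. It is a toy model, not the negotiator's actual
algorithm: the machine names, kflops figures, and slot counts are made up,
and ties are broken by list order, which the real negotiator does not
promise. Each unclaimed slot is ranked by (kflops - SlotID) and the
highest-ranked slot is handed out first.

```python
def fill_order(machines, jobs):
    """Return the (machine, slot_id) chosen for each job, highest rank first."""
    # All slots start unclaimed, so isUndefined(RemoteOwner) is 1 and the
    # rank reduces to kflops - SlotID.
    free = [(name, slot_id, kflops - slot_id)
            for name, (kflops, slots) in machines.items()
            for slot_id in range(1, slots + 1)]
    picked = []
    for _ in range(jobs):
        best = max(free, key=lambda slot: slot[2])  # first maximum wins ties
        free.remove(best)
        picked.append(best[:2])
    return picked

# Hypothetical pools: name -> (kflops, number of slots).
varied = {"a": (1_000_300, 4), "b": (1_000_100, 4), "c": (1_000_200, 4)}
uniform = {"a": (1_000_000, 4), "b": (1_000_000, 4), "c": (1_000_000, 4)}

depth_first = fill_order(varied, 4)
breadth_first = fill_order(uniform, 4)
print(depth_first)    # [('a', 1), ('a', 2), ('a', 3), ('a', 4)]
print(breadth_first)  # [('a', 1), ('b', 1), ('c', 1), ('a', 2)]
```

When the kflops gap between machines is larger than the slot count, every
slot on the fastest machine outranks every slot elsewhere, so that machine
fills completely first. With identical kflops, SlotID is the only thing
that differs and slot 1 of each machine wins in turn.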

> If
> large, perhaps you'd prefer something like:
>   NEGOTIATOR_POST_JOB_RANK = isUndefined(RemoteOwner) * (500 - SlotID)
> to simply ignore the kflops value.
> (I know on a pool here at UW-Madison with 1951 slots, there are 163
> different kflop values reported....)
> 
> hope the above makes sense,

It does indeed!  I've set both NEGOTIATOR_PRE_JOB_RANK and
NEGOTIATOR_POST_JOB_RANK to
  isUndefined(RemoteOwner) * (TotalCpus - SlotID)
to see what that does, and it seems to be working while favoring the
24-core nodes, which is good.
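
A rough Python sketch of why (TotalCpus - SlotID) both spreads jobs out
and favors the bigger nodes. The pool below is hypothetical (two 24-core
nodes and one 8-core node), and sorting all slots by rank is a
simplification of what the negotiator really does.

```python
def rank(total_cpus, slot_id):
    # Mirrors isUndefined(RemoteOwner) * (TotalCpus - SlotID) for an
    # unclaimed slot, where isUndefined(RemoteOwner) evaluates to 1.
    return total_cpus - slot_id

# Hypothetical pool: machine name -> TotalCpus.
pool = {"big1": 24, "big2": 24, "small1": 8}

slots = [(name, slot_id, rank(cpus, slot_id))
         for name, cpus in pool.items()
         for slot_id in range(1, cpus + 1)]

# Highest rank first; sorted() is stable, so ties keep pool order.
order = sorted(slots, key=lambda slot: -slot[2])

first_four = [(name, slot_id) for name, slot_id, _ in order[:4]]
first_small = next(i for i, slot in enumerate(order) if slot[0] == "small1")

print(first_four)   # [('big1', 1), ('big2', 1), ('big1', 2), ('big2', 2)]
print(first_small)  # position at which the 8-core node first appears
```

In this model the two 24-core nodes alternate slot by slot, and slot 1 of
the 8-core node (rank 7) is not preferred until 16 slots on each 24-core
node (ranks 23 down to 8) have been handed out.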

> regards,
> Todd
> 
> p.s. Extra credit:  for the real Condor geeks, another approach would be
> to bucket the kflops value in the NEGOTIATOR_POST_JOB_RANK expression,
> so this breadth-first recipe would still work even if the reported
> kflops varies by some small value like 30k or so. In Condor v7.7.6 (to
> be released next week) there is a spiffy quantize() ClassAd function to
> assist in this sort of bucketing, so in Condor v7.7.6 you could do:
> 
>   NEGOTIATOR_POST_JOB_RANK = isUndefined(RemoteOwner) * (quantize(kflops,{30000}) - SlotID)
> 
> Maybe I'll update this HOWTO recipe based on your feedback (or this is
> open source, feel free to ask for a condor-wiki account by emailing
> condor-admin@xxxxxxxxxxx, and then you could edit the recipe yourself!)...

Will do!
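
For anyone curious about the bucketing idea in the p.s., here is a small
Python sketch. The quantize() below is my own stand-in, written on the
assumption that the ClassAd function rounds a value up to the next
multiple of the quantum; check the manual for the exact semantics,
especially for list arguments. The kflops figures are invented.

```python
def quantize(value, quantum):
    # Assumed behaviour: round value up to the next multiple of quantum.
    # Integer arithmetic avoids floating-point rounding surprises.
    return -(-value // quantum) * quantum

# Hypothetical kflops values as a pool might report them.
reported = [1_000_100, 1_004_500, 1_012_900, 1_041_000]
buckets = [quantize(k, 30_000) for k in reported]

print(buckets)            # [1020000, 1020000, 1020000, 1050000]
print(len(set(buckets)))  # 2
```

Four distinct kflops values collapse into two classes, so machines in the
same bucket tie on the kflops term and SlotID decides between them. Note
that two machines whose values sit just either side of a bucket boundary
still end up in different buckets.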

> 
> 
> 
>> --Sarah