
Re: [HTCondor-users] About choosing nodes over slots



Hi Max,

Thanks for your detailed reply.
We may try to play with the knobs.

Some general thoughts (I assume these apply independently of the batch system, e.g. HTCondor, LSF, SLURM, PBS):

Is there an advantage to filling all slots on a single node first? Is that perhaps why it is the default?

Or perhaps, from an admin point of view, it is better to manage jobs on a single node first and then move on to the next node?

Or perhaps in the end it does not matter: so many jobs arrive in a relatively short time that, either way, all slots on all nodes are eventually taken.
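Using the pool from the original question (100 nodes with 32 slots each), here is a back-of-the-envelope sketch of how many nodes end up running jobs under the two strategies. It assumes every job needs exactly one slot and ignores timing and job run lengths; `nodes_touched` is a hypothetical helper, not part of any batch system.

```python
import math


def nodes_touched(n_jobs: int, n_nodes: int, slots_per_node: int) -> dict:
    """Count occupied nodes for depth-first vs breadth-first placement (toy model)."""
    capacity = n_nodes * slots_per_node
    running = min(n_jobs, capacity)  # jobs beyond capacity wait in the queue
    # Depth-first: fill every slot of a node before starting on the next one.
    depth_first = math.ceil(running / slots_per_node)
    # Breadth-first: one job per node before any node gets a second job.
    breadth_first = min(running, n_nodes)
    return {"depth_first": depth_first, "breadth_first": breadth_first}


for n in (100, 1000, 3200):
    print(n, nodes_touched(n, n_nodes=100, slots_per_node=32))
```

For 100 jobs this gives 4 nodes depth-first versus 100 breadth-first, and for 1000 jobs 32 versus 100. Once 3200 or more jobs are queued, both strategies occupy all 100 nodes, which is the saturation case described in the last question.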

Thanks,
Vikas




On 8/14/17, 12:54 AM, "HTCondor-users on behalf of Fischer, Max (SCC)" <htcondor-users-bounces@xxxxxxxxxxx on behalf of max.fischer@xxxxxxx> wrote:

>Hi Vikas,
>
>rule of thumb: HTCondor has a knob for it. ;)
>
>Which node to run on is determined by the RANK settings of the negotiator, each job, and each worker node. For a start, let me just dump a comment from our local configuration:
>
># -- How does HTCondor schedule jobs? --
># Jobs are scheduled to StartDs via a sequence of filtering and sorting. Condor
># matches *jobs* to *workers*, not the other way around! For each job, the
># following sequence is used:
>#  - Find all startds which match the job REQUIREMENTS and vice versa
>#  - Sort startds by NEGOTIATOR_PRE_JOB_RANK
>#  - Sort startds by the job's RANK
>#  - Sort startds by NEGOTIATOR_POST_JOB_RANK
>#  - If preemption is required, sort startds by PREEMPTION_RANK
>#  - Assign Job to the highest ranked startd
># Sorting preserves the order of the previous step, so later steps only
># have an effect if there was a tie.
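The filter-then-sort sequence in the comment above can be sketched as follows. This is a simplified Python model, not HTCondor code: jobs and slots are plain dicts standing in for ClassAds, the requirement and rank expressions are passed in as hypothetical Python functions, and the PREEMPTION_RANK step is left out. The key observation is that successive stable sorts where each later step only breaks ties are equivalent to comparing (pre, job rank, post) tuples.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical data model: ClassAd-like attributes as plain dicts.
Ad = dict
Rank = Callable[[Ad, Ad], float]
Match = Callable[[Ad, Ad], bool]


@dataclass
class Slot:
    name: str
    attrs: Ad


def pick_slot(job: Ad, slots: list, job_req: Match, slot_req: Match,
              pre_job_rank: Rank, job_rank: Rank, post_job_rank: Rank) -> Optional[Slot]:
    # Step 1: keep slots where the job's and the slot's requirements both hold.
    candidates = [s for s in slots if job_req(job, s.attrs) and slot_req(s.attrs, job)]
    if not candidates:
        return None

    # Steps 2-4: lexicographic comparison, so later ranks only break ties.
    def key(s: Slot) -> tuple:
        return (pre_job_rank(job, s.attrs), job_rank(job, s.attrs),
                post_job_rank(job, s.attrs))

    # Highest tuple wins; max() keeps the first of equally ranked slots.
    return max(candidates, key=key)


# Example expressions (assumptions for illustration only).
def fits(job: Ad, slot: Ad) -> bool:
    return slot["Cpus"] >= job["RequestCpus"] and slot["Memory"] >= job["RequestMemory"]


def accept_all(slot: Ad, job: Ad) -> bool:
    return True


def pre(job: Ad, slot: Ad) -> float:
    # Only the Cpus/Memory part of the default NEGOTIATOR_PRE_JOB_RANK.
    return -100000 * slot["Cpus"] - slot["Memory"]


def no_rank(job: Ad, slot: Ad) -> float:
    return 0


slots = [Slot("slot1@node-a", {"Cpus": 4, "Memory": 8000}),
         Slot("slot1@node-b", {"Cpus": 16, "Memory": 64000})]
job = {"RequestCpus": 1, "RequestMemory": 2000}
chosen = pick_slot(job, slots, fits, accept_all, pre, no_rank, no_rank)
print(chosen.name)  # slot1@node-a: fewer free cores ranks higher
```

Folding the stable sorts into one tuple comparison keeps the sketch short and makes the "later steps only matter on a tie" rule explicit.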
>
>Each ALL_CAPS word is something set in the configuration or at submission. Note that the startd (node) Rank has no step of its own: if neither the negotiator nor the job expressions use it, it is ignored.
>On a fresh HTCondor installation, the most influential setting is NEGOTIATOR_PRE_JOB_RANK, which defaults to:
>	NEGOTIATOR_PRE_JOB_RANK = (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED)) - (100000 * Cpus) - Memory
>
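>My.Rank : this is the Rank of the startd. Its large multiplier gives it the highest precedence by default, but if all nodes share the same policy, it does not distinguish between them.
>RemoteOwner =?= UNDEFINED : this prefers slots that are not already running a job, i.e. "avoid kicking out running jobs". If your cluster has free capacity, this effectively means "do not kill running jobs".
>- Cpus : this prefers slots with fewer *unused* cores (for a partitionable slot, Cpus is the number of cores still free on the node). In effect, this means depth-first filling.
>- Memory : among slots with the same number of free cores, this prefers the one with less free memory.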
>
>In other words, the default is to fill up a single node first before moving on to the next.
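To see why the default expression leads to depth-first filling, here is a Python transcription of it applied to a toy pool. It is a sketch under several assumptions: each node is modeled as a single partitionable slot whose Cpus and Memory are the node's free resources, all nodes have My.Rank = 0, and a slot's free resources are updated immediately after each match (the real negotiator/startd handling of partitionable slots is more involved). With 32 static single-core slots per node, every slot would advertise Cpus = 1, so the Cpus term alone would not separate the nodes. The names `default_pre_job_rank` and `place_jobs` are hypothetical.

```python
from collections import Counter
from typing import Optional


def default_pre_job_rank(my_rank: int, remote_owner: Optional[str],
                         cpus: int, memory: int) -> int:
    """Python transcription of the default NEGOTIATOR_PRE_JOB_RANK quoted above."""
    unclaimed = 1 if remote_owner is None else 0  # RemoteOwner =?= UNDEFINED
    return 10000000 * my_rank + 1000000 * unclaimed - 100000 * cpus - memory


def place_jobs(n_jobs: int, n_nodes: int, cores: int, mem_mb: int,
               job_cpus: int = 1, job_mem: int = 2000) -> Counter:
    # Toy model: one partitionable slot per node, tracking *free* cores/memory.
    free = [[cores, mem_mb] for _ in range(n_nodes)]
    placed = Counter()
    for _ in range(n_jobs):
        fits = [i for i in range(n_nodes)
                if free[i][0] >= job_cpus and free[i][1] >= job_mem]
        if not fits:
            break  # pool is full; remaining jobs stay idle
        # All nodes share My.Rank = 0 and are unclaimed, so Cpus/Memory decide.
        best = max(fits, key=lambda i: default_pre_job_rank(0, None, free[i][0], free[i][1]))
        free[best][0] -= job_cpus
        free[best][1] -= job_mem
        placed[best] += 1
    return placed


print(place_jobs(100, n_nodes=100, cores=32, mem_mb=64000))
```

For 100 one-core jobs on 100 nodes of 32 cores, this places 32 jobs on each of the first three nodes and the remaining 4 on a fourth. When all nodes are tied (as at the very start), `max()` simply takes the first, and from then on the partially filled node always ranks highest.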
>
>There are a *lot* of knobs to tweak, and going through all their effects has a lot to do with your cluster setup. Most of the time, keeping a simple policy works best.
>The manual has some pretty decent info on configuration settings
>	http://research.cs.wisc.edu/htcondor/manual/current/3_5Configuration_Macros.html#SECTION004516000000000000000
>the effects and examples of scheduling policy configuration
>	http://research.cs.wisc.edu/htcondor/manual/current/3_7Policy_Configuration.html
>and how user/group priorities are used
>	http://research.cs.wisc.edu/htcondor/manual/current/3_6User_Priorities.html
>
>Cheers,
>Max
>
>> On 13.08.2017 at 01:13, Bansal, Vikas <Vikas.Bansal@xxxxxxxx> wrote:
>> 
>> Hi,
>> 
>> I have an HTCondor batch system available at my site.
>> 
>> $ condor_version
>> $CondorVersion: 8.2.10 Oct 27 2015 $
>> $CondorPlatform: X86_64-CentOS_6.7 $
>> 
>> I have 3200 slots on it.
>> 100 nodes each with 32 slots.
>> 
>> I am wondering how jobs are scheduled to the nodes and slots.
>> 
>> E.g. if 100 jobs are submitted to the queue within 1-5 minutes, how will they be distributed across the nodes/slots?
>> 
>> Will each job occupy one slot on a NEW node or will they fill up all 32 slots on one node and then move to next node and so on?
>> 
>> Is this configurable? I.e., which resource is chosen first, node or slot?
>> 
>> Thanks for any help on this.
>> 
>> Vikas
>> 
>