
Re: [HTCondor-users] how to use rank with memory



On 6/24/2021 8:42 PM, Myunggi Yi wrote:
Dear users,

I have installed HTCondor 9.0.1.
I want the jobs to run on machines with more memory.

I submitted a job with the following script, but the job always goes to node07, which has less memory.
How can I achieve my goal?


Hi,

A couple of thoughts:

1. I assume the node with more memory was available (unclaimed) when you submitted your job? Realize that a Rank expression in your submit file only sorts among resources that are currently available (unless you tell HTCondor to allow preemption of jobs based on rank; you can do this, but in practice very few sites would want to). Rank is really only helpful on lightly loaded pools. On busy pools, where most of the nodes are doing something all of the time, Rank becomes pretty useless without preemption, because at any given moment there may be only one or two free slots to choose from. In these scenarios, you really want to use Requirements instead of Rank, so that the job waits for a suitable machine rather than taking whatever slot happens to be free.
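To make point 1 concrete, here is a toy Python model of a two-node pool. It is not HTCondor code and simplifies what the negotiator really does; node08, the memory sizes, and the 64000 MB threshold are all hypothetical. It shows that Rank only chooses among free slots, while Requirements excludes the small node outright.

```python
# Toy model of point 1: Rank only orders slots that are free right now.
# Not HTCondor code; node names, memory sizes and the threshold are hypothetical.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Slot:
    name: str
    memory_mb: int
    claimed: bool  # True if another job is already running here


def match(slots: List[Slot],
          requirements: Callable[[Slot], bool],
          rank: Callable[[Slot], float]) -> Optional[str]:
    # Without preemption, only unclaimed slots that satisfy Requirements are candidates.
    candidates = [s for s in slots if not s.claimed and requirements(s)]
    if not candidates:
        return None  # no match: the job stays idle until a suitable slot frees up
    # Rank merely orders the candidates; the highest value wins.
    return max(candidates, key=rank).name


pool = [
    Slot("node07", memory_mb=32_000, claimed=False),
    Slot("node08", memory_mb=128_000, claimed=True),  # the big node is busy
]

# Rank = Memory alone: node08 is busy, so the job still lands on node07.
print(match(pool, requirements=lambda s: True, rank=lambda s: s.memory_mb))  # prints: node07

# Requirements = Memory >= 64000: node07 is excluded, so the job waits instead.
print(match(pool, requirements=lambda s: s.memory_mb >= 64_000,
            rank=lambda s: s.memory_mb))  # prints: None
```

In a real submit file the second case would correspond to something like requirements = (Memory >= 64000), where the machine's Memory attribute is in MB; the threshold is only an example.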

2. The administrator of your central manager gets to take first stab at ranking slots before the job does. Basically, it is a multi-level sort: matching slots are first sorted by the config knob NEGOTIATOR_PRE_JOB_RANK, ties from that sort are then broken by the job's Rank expression, and any remaining ties are broken by the config knob NEGOTIATOR_POST_JOB_RANK (details in the Manual at https://tinyurl.com/yzcr5ocq). At every level, higher values are preferred. I suggest you log in to your central manager and enter the command:
    condor_config_val -v NEGOTIATOR_PRE_JOB_RANK
On my machine, this returns the following:
  NEGOTIATOR_PRE_JOB_RANK = (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED)) - (100000 * Cpus) - Memory
   # at: <Default>
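Note that the final two terms reference Cpus and Memory, in an attempt to do depth-first allocation of nodes: slots with fewer CPUs and less memory score higher, so they are filled first. Because two slots with different amounts of memory will almost never tie on this expression, the job's Rank is effectively never consulted, which would explain why your job keeps landing on node07. On your pool, perhaps you want to get rid of this behavior by setting the following in the config of your central manager: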
  NEGOTIATOR_PRE_JOB_RANK = (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED))
and then running condor_reconfig as usual.
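To illustrate point 2, the sketch below models the multi-level sort as a Python tuple comparison and evaluates both the default NEGOTIATOR_PRE_JOB_RANK shown above and the trimmed version on two hypothetical, unclaimed 8-core nodes. node08 and all the numbers are made up, each node is treated as a single slot, My.Rank is assumed to be 0, and NEGOTIATOR_POST_JOB_RANK is modelled as a constant. It is a toy model, not the negotiator's actual code.

```python
# Toy model of point 2: the negotiator's multi-level slot sort.
# Not HTCondor code; node names and resource sizes are hypothetical.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Slot:
    name: str
    cpus: int
    memory_mb: int
    machine_rank: int = 0   # stands in for My.Rank (assumed 0 here)
    claimed: bool = False   # True would mean RemoteOwner is defined


def default_pre_job_rank(s: Slot) -> int:
    # (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED)) - (100000 * Cpus) - Memory
    return (10_000_000 * s.machine_rank + 1_000_000 * (not s.claimed)
            - 100_000 * s.cpus - s.memory_mb)


def trimmed_pre_job_rank(s: Slot) -> int:
    # (10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED))
    return 10_000_000 * s.machine_rank + 1_000_000 * (not s.claimed)


def job_rank(s: Slot) -> int:
    # The job's own Rank = Memory: prefer machines with more memory.
    return s.memory_mb


def post_job_rank(s: Slot) -> int:
    # NEGOTIATOR_POST_JOB_RANK modelled as a constant, so it never breaks ties here.
    return 0


def pick(slots: List[Slot], pre_job_rank: Callable[[Slot], int]) -> str:
    # Tuples compare element by element, so the job's Rank only matters when the
    # pre-job rank ties, and the post-job rank only when both earlier levels tie.
    # Higher values win at every level.
    best = max(slots, key=lambda s: (pre_job_rank(s), job_rank(s), post_job_rank(s)))
    return best.name


slots = [Slot("node07", cpus=8, memory_mb=32_000),
         Slot("node08", cpus=8, memory_mb=128_000)]

print(pick(slots, default_pre_job_rank))  # prints: node07
print(pick(slots, trimmed_pre_job_rank))  # prints: node08
```

With the default expression node07 scores 1000000 - 800000 - 32000 = 168000 and node08 scores 1000000 - 800000 - 128000 = 72000, so node07 wins before the job's Rank is even looked at. With the trimmed expression both score 1000000, the tie is broken by the job's Rank, and node08 wins.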

Hope the above helps,
Todd