
Re: [HTCondor-users] Fill the htcondor pool breadth first



Hello Experts,

This issue is easily reproducible: if we submit a single job per batch, every batch goes to the same node. If we submit multiple jobs in a batch, they use different worker nodes.

Changing RANK or changing the following expressions doesn't help.

# condor_config_val NEGOTIATOR_PRE_JOB_RANK
(10000000 * My.Rank) + (1000000 * (RemoteOwner =?= UNDEFINED)) - (100000 * Cpus) - Memory

# condor_config_val NEGOTIATOR_POST_JOB_RANK
(RemoteOwner =?= UNDEFINED) * (ifthenElse(isUndefined(KFlops), 1000, Kflops) - SlotID - 1.0e10*(Offline=?=True))
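To see which slot the NEGOTIATOR_PRE_JOB_RANK value above prefers, here is a minimal Python sketch that transcribes only its arithmetic. The slot names and resource numbers are hypothetical. Cpus and Memory are treated as the free resources a partitionable slot advertises, My.Rank is assumed to evaluate to 0, and "unclaimed" models RemoteOwner =?= UNDEFINED. It does not model claim reuse or any other negotiator or schedd behaviour. The only assumption about the negotiator itself is that a higher value is preferred.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    name: str                           # hypothetical slot name
    cpus: int                           # free Cpus advertised by the slot (assumed)
    memory: int                         # free Memory in MB advertised by the slot (assumed)
    rank: float = 0.0                   # value of My.Rank; assumed 0 (no preference)
    remote_owner: Optional[str] = None  # None models RemoteOwner =?= UNDEFINED


def pre_job_rank(slot: Slot) -> float:
    """Arithmetic of the quoted NEGOTIATOR_PRE_JOB_RANK expression."""
    unclaimed = 1 if slot.remote_owner is None else 0
    return (10000000 * slot.rank) + (1000000 * unclaimed) - (100000 * slot.cpus) - slot.memory


slots = [
    Slot("busy-node", cpus=2, memory=8000),    # mostly used machine
    Slot("idle-node", cpus=16, memory=64000),  # mostly free machine
]

# Higher value is preferred, so sort in descending order.
ordered = sorted(slots, key=pre_job_rank, reverse=True)
for s in ordered:
    print(s.name, pre_job_rank(s))
```

With these numbers, busy-node scores 792000 and idle-node scores -664000. The "- (100000 * Cpus) - Memory" terms therefore favor the slot with fewer free resources, which packs jobs onto a node instead of spreading them.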



Thanks & Regards,
Vikrant Aggarwal


On Tue, Feb 27, 2024 at 11:47 AM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
Hello Experts,

In an HTCondor pool with dynamic slots, I followed article [1] to fill the pool breadth first, but the schedd still runs the jobs on a single worker machine. For example, if a batch of 10 jobs is submitted, all 10 jobs land on one worker node instead of spreading across the 5-6 available worker nodes. These jobs are I/O intensive on the local disk, so we want to distribute them across worker nodes. Is there anything else I need to do to make this work reliably?

In case it matters, this schedd is also used to flock jobs to another pool, where we want to fill depth first. However, the job requirements name the pool that has the breadth-first configuration (the primary central manager pool), so I am not sure whether the flocking part is relevant.


[1] https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToFillPoolBreadthFirst


Thanks & Regards,
Vikrant Aggarwal