Ok, first of all, there is no priority cutoff. If a low priority user is the only one with jobs that match the available resources, then that user WILL be negotiated for, their jobs will match, and they will start.
Imagine if this were not true, and a high priority user submitted a job that could never match ANY of your resources (maybe it requires a Windows machine). You would not want the existence of that one job to leave your entire pool idle, but that’s exactly what would happen if negotiation for low priority users stopped whenever a high priority user had unmatched jobs.
All of the ways HTCondor has to deal with your use case involve enabling preemption in some form. The most straightforward is to enable partitionable-slot (pslot) preemption. The Negotiator will then pick a machine with at least 8 cores, evict the 1-core jobs from that machine, and run the high priority 8-core job instead.
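As a rough sketch, enabling pslot preemption in the negotiator's configuration might look like this. The 1.2 factor in PREEMPTION_REQUIREMENTS is only an illustrative threshold, and since preemption is currently disabled in your pool, check which of these knobs you have already overridden:

```
# Negotiator config (sketch): let the negotiator consider preemption at all
NEGOTIATOR_CONSIDER_PREEMPTION = True
# Allow it to evict several dynamic slots on one partitionable machine
# so that their cores can be combined to satisfy a larger request
ALLOW_PSLOT_PREEMPTION = True
# Preempt only when the running user's priority is worse (numerically higher)
# than the requesting user's by at least 20% (illustrative threshold)
PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmitterUserPrio * 1.2
```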
Now, we realize that users don’t exactly *like* having their jobs preempted, so you could set MaxJobRetirementTime for the low priority jobs to some insanely large number (a year, maybe). In that case, the Negotiator would still match the high priority 8-core job with a specific machine and put all of the 1-core slots on that machine into the preempting/retiring state. The 1-core jobs would be allowed to finish, but their slots would then be reclaimed back into the partitionable slot; once all of the 1-core jobs had finished, the 8-core job would run.
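A minimal sketch of the retirement setting, assuming it goes into the execute nodes' startd configuration. As written it applies to every job on the machine; if you only want it for the low priority user, you could make it an expression on TARGET.Owner, with whatever user name applies in your pool:

```
# startd config (sketch): a job being preempted may keep running for up to
# one year (365 days * 24 h * 60 min * 60 s = 31536000 seconds) before eviction
MAXJOBRETIREMENTTIME = 365 * 24 * 60 * 60
```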
The downside of this configuration is that the 8-core job has to wait for the slowest of the 1-core jobs on that machine, but in the meantime the rest of the pool is not prevented from matching new jobs, even low priority ones.
Now, as to your question about the defrag daemon configuration: if you want to drain only machines that have at least one single-core slot, you can do that by changing the DEFRAG_REQUIREMENTS expression so that it matches only such machines.
DEFRAG_REQUIREMENTS = PartitionableSlot && Offline=!=True && Min(ChildCpus) == 1
I think you probably also want to define a "whole machine" as one with at least 8 free cores.
DEFRAG_WHOLE_MACHINE_EXPR = (Cpus >= 8)
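Put together, the defrag part of the configuration might look something like the following. This is only a sketch: the two expressions are the ones suggested above, while the rate and limit values are illustrative placeholders you would tune for a pool of your size:

```
# condor_defrag config (sketch)
# Only consider partitionable slots that are online and currently
# host at least one single-core dynamic slot
DEFRAG_REQUIREMENTS = PartitionableSlot && Offline=!=True && Min(ChildCpus) == 1
# Treat a machine as "whole" once it has at least 8 unclaimed cores
DEFRAG_WHOLE_MACHINE_EXPR = (Cpus >= 8)
# Illustrative limits: start draining at most 2 machines per hour,
# never more than 4 at once, and stop once 10 whole machines exist
DEFRAG_DRAINING_MACHINES_PER_HOUR = 2.0
DEFRAG_MAX_CONCURRENT_DRAINING = 4
DEFRAG_MAX_WHOLE_MACHINES = 10
# Let running jobs finish (up to their retirement time) while draining
DEFRAG_SCHEDULE = graceful
```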
I am wondering if there is any negotiator setting whereby users whose priority is high (bad) enough will not be negotiated for at all. My issue is as follows:
0) The cluster is set up with partitionable slots. Preemption is disabled.
1) User F is the primary user of the cluster, with a prio_factor of 1. That priority factor is better than those of all the other users of the cluster, so if they kept submitting jobs continuously,
they would always be able to claim the whole cluster.
They exclusively request 8-core slots.
2) User A is an opportunistic user with prio_factor of 10^18. They request single-core partitionable slots. They only manage to get any of them when user F does not have enough jobs to keep the
3) At the moment user A has 1554 single-core slots out of a pool of 21784 cores, and effective prio factor is 1.10x10^21.
User F has effective prio factor of 12612, current resource count of 15256, and 1000 more jobs pending.
4) Instead, the negotiator keeps letting more of user A's jobs start on the existing single-core slots.
5) I thought there used to be a priority cutoff in the negotiator such that, under extreme load like this, low-priority users would not even be considered. I can't find it now.
6) The condor_defrag daemon is configured with the following settings:
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, DEFRAG, GANGLIAD, HAD, REPLICATION