
Re: [HTCondor-users] Low priorities vs. partitionable slots



Ok, first of all, there is no priority cutoff.   If a low priority user is the only one that has jobs that match available resources, then that user WILL get negotiated for, will match and their jobs will start.

 

Imagine if this were not true, and a high priority user submitted a job that could never match ANY of your resources (maybe it requires a WINDOWS machine). You would not want the existence of that job to drain your entire pool, but that is exactly what would happen if there were a way to stop negotiation for low priority users whenever a high priority user had unmatched jobs.

 

All the ways that HTCondor has to deal with your use case involve enabling preemption in some form.  The most straightforward is to enable pslot preemption: the Negotiator will then pick a machine with at least 8 cores, evict the 1-core jobs from that machine, and run the high priority 8-core job instead.
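For concreteness, a minimal negotiator-side sketch (the PREEMPTION_REQUIREMENTS policy below is just an illustrative assumption and should be tuned for your pool) might look like:

# Negotiator configuration (sketch only)
ALLOW_PSLOT_PREEMPTION = True
# Illustrative policy: only preempt when the running user's priority is at
# least 20% worse than the candidate submitter's priority
PREEMPTION_REQUIREMENTS = RemoteUserPrio > TARGET.SubmitterUserPrio * 1.2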

 

Now, we realize that users don't exactly *like* having their jobs preempted, so you could choose to set MaxJobRetirementTime for the low priority jobs to some insanely large number (a year, maybe).  In that case, the Negotiator would still match the high priority 8-core job with a specific machine and put all of the 1-core slots on that machine into the preempting/retiring state.  The 1-core jobs would be allowed to finish, but then the slots would be reclaimed back into the partitionable slot; once all of the 1-core jobs had finished, the 8-core job would run.
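A minimal startd-side sketch (assuming a flat value is acceptable; as written this applies to every job on the machine unless you make it conditional on some job attribute):

# Startd configuration (sketch only): let running jobs retire for up to
# roughly one year before they may be evicted
MAXJOBRETIREMENTTIME = 60 * 60 * 24 * 365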

 

The downside with this configuration is that the 8-core job will have to wait for the slowest of the 1-core jobs on that machine, but in the meantime the rest of the pool would not be prevented from matching new jobs – even low priority ones.

 

Now, as to your question about the defrag daemon configuration: if you want to drain only machines that have at least one single-core slot, you could do that by changing the DEFRAG_REQUIREMENTS expression so that it only matches such machines.

 

DEFRAG_REQUIREMENTS = PartitionableSlot && Offline=!=True && Min(ChildCpus) == 1
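As a sanity check (assuming the min() ClassAd list function used above is available in your HTCondor version), something like this should list the partitionable slots that would currently match:

condor_status -constraint 'PartitionableSlot && Offline =!= True && min(ChildCpus) == 1' -af:h Machine Cpus ChildCpus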

 

I think you also probably want to define a whole machine as at least 8 free cores.

 

DEFRAG_WHOLE_MACHINE_EXPR = (Cpus >= 8)

 

-tj

 

 

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Steven C Timm
Sent: Monday, January 30, 2017 9:58 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Low priorities vs. partitionable slots

 

I am wondering if there is any negotiator setting whereby users of a certain high (bad) enough priority will absolutely not get negotiated.  The following is my issue:

 

0) The cluster is set up with partitionable slots.  Preemption is disabled.

 

1) User F is the primary user of the cluster, with a prio_factor of 1.  That priority factor is better than all the rest of the users of the cluster, such that if they kept submitting jobs continuously, they would always be able to claim the whole cluster.

They run exclusively requesting 8-core slots.

 

2) User A is an opportunistic user with prio_factor of 10^18.  They request single-core partitionable slots.  They only manage to get any of them when user F does not have enough jobs to keep the cluster full.

 

3) At the moment user A has 1554 single-core slots out of a pool of 21784 cores, and an effective priority of 1.10x10^21.

User F has an effective priority of 12612, a current resource count of 15256, and 1000 more jobs pending.

 

4) The negotiator instead chooses to let more jobs from user A start on the existing single-core slots.

 

5) There used to be, I thought, a priority cutoff in the negotiator such that in cases of extreme load such as this the low-priority users would not even be considered.  I can't find it now.

 

6) The condor_defrag daemon is configured with the following settings:

 

DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, DEFRAG, GANGLIAD, HAD, REPLICATION
DEFRAG = $(LIBEXEC)/condor_defrag
DEFRAG_CANCEL_REQUIREMENTS = $(DEFRAG_WHOLE_MACHINE_EXPR)
DEFRAG_DRAINING_MACHINES_PER_HOUR = 2.0
DEFRAG_DRAINING_SCHEDULE = graceful
DEFRAG_INTERVAL = 3600
DEFRAG_LOG = $(LOG)/DefragLog
DEFRAG_MAX_CONCURRENT_DRAINING = 5
DEFRAG_MAX_WHOLE_MACHINES = 20
DEFRAG_NAME =
DEFRAG_RANK = -ExpectedMachineGracefulDrainingBadput
DEFRAG_REQUIREMENTS = PartitionableSlot && Offline=!=True
DEFRAG_STATE_FILE = $(LOCK)/defrag_state
DEFRAG_UPDATE_INTERVAL = 300
DEFRAG_WHOLE_MACHINE_EXPR = (Cpus >= 4)

As far as I can tell from the logs, the DEFRAG daemon has never defragged anything at these settings.

7) My question is twofold:
a) Is there a way to get the defrag daemon working preferentially to defrag the single-core slots, which are only ever used by low-priority opportunistic users?
b) Is there a way, either temporarily or permanently, to make sure that the jobs of opportunistic users at some priority-factor differential do not get negotiated as long as there are jobs of much higher priority in the queue?  In this scenario the single-core slots, which in general run fairly short jobs, would exit on their own.

Steve Timm