
Re: [HTCondor-users] Low priorities vs. partitionable slots



Hi Timm,

have you tried setting the quota for User A to 0, but allowing regrouping? That way, User A gets no slots reserved, but can still pick up slots that are not claimed by other users.
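A minimal sketch of that quota setup, assuming you put the opportunistic users into an accounting group (the group name below is made up; the GROUP_* knobs go in the negotiator's configuration):

```
# Hypothetical group name for user A's jobs -- adjust to your setup.
GROUP_NAMES = group_opportunistic
# Reserve zero slots for the group ...
GROUP_QUOTA_group_opportunistic = 0
# ... but let its members "regroup" and pick up any surplus
# that no other user claims.
GROUP_AUTOREGROUP_group_opportunistic = True
```

User A's submit files would then need to tag jobs with that group, e.g. accounting_group = group_opportunistic.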
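On your defrag question: one possible reason (an untested guess) that nothing ever drains is that DEFRAG_WHOLE_MACHINE_EXPR = (Cpus >= 4) counts any machine with 4 free cores as already "whole", so the DEFRAG_MAX_WHOLE_MACHINES = 20 ceiling may be satisfied immediately and DEFRAG_CANCEL_REQUIREMENTS stops drains before they accomplish anything. A sketch that instead targets machines too fragmented for an 8-core job (the thresholds are assumptions based on your 8-core workload):

```
# Count a machine as "whole" only when a full 8-core job fits.
DEFRAG_WHOLE_MACHINE_EXPR = (Cpus >= 8)
# Only consider draining partitionable slots that are too
# fragmented to fit an 8-core job.
DEFRAG_REQUIREMENTS = PartitionableSlot && Offline =!= True && Cpus < 8
```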

Cheers,
Max

> Am 30.01.2017 um 16:58 schrieb Steven C Timm <timm@xxxxxxxx>:
> 
> I am wondering if there is any negotiator setting whereby users whose priority is bad (high) enough will absolutely not get negotiated.  My issue is the following:
> 
> 0) the cluster is set up with partitionable slots.  Pre-emption is disabled.
> 
> 1) User F is the primary user of the cluster, with a prio_factor of 1.  That priority factor is better than that of all other users of the cluster, so that if they kept submitting jobs continuously they could always claim the whole cluster.
> Their jobs exclusively request 8-core slots.
> 
> 2) User A is an opportunistic user with a prio_factor of 10^18.  They request single-core slots from the partitionable slots, and only manage to get any when user F does not have enough jobs to keep the cluster full.
> 
> 3) At the moment user A holds 1554 single-core slots out of a pool of 21784 cores, with an effective priority of 1.10x10^21.
> User F has an effective priority of 12612, a current resource count of 15256, and 1000 more jobs pending.
> 
> 4) The negotiator nevertheless chooses to let more jobs from user A start on the existing single-core slots.
> 
> 5) There used to be, I thought, a priority cutoff in the negotiator such that in cases of extreme load such as this, the low-priority users would not even be considered.  I can't find it now.
> 
> 6) the condor_defrag daemon is configured with following settings:
> 
> DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, DEFRAG, GANGLIAD, HAD, REPLICATION
> DEFRAG = $(LIBEXEC)/condor_defrag
> DEFRAG_CANCEL_REQUIREMENTS = $(DEFRAG_WHOLE_MACHINE_EXPR)
> DEFRAG_DRAINING_MACHINES_PER_HOUR = 2.0
> DEFRAG_DRAINING_SCHEDULE = graceful
> DEFRAG_INTERVAL = 3600
> DEFRAG_LOG = $(LOG)/DefragLog
> DEFRAG_MAX_CONCURRENT_DRAINING = 5
> DEFRAG_MAX_WHOLE_MACHINES = 20
> DEFRAG_NAME = 
> DEFRAG_RANK = -ExpectedMachineGracefulDrainingBadput
> DEFRAG_REQUIREMENTS = PartitionableSlot && Offline=!=True
> DEFRAG_STATE_FILE = $(LOCK)/defrag_state
> DEFRAG_UPDATE_INTERVAL = 300
> DEFRAG_WHOLE_MACHINE_EXPR = (Cpus >= 4)
> 
> As far as I can tell from the logs, with these settings the DEFRAG daemon has never defragged anything.
> 
> 7) My question is twofold:
> a) Is there a way to get the defrag daemon to preferentially defrag the single-core slots, which are only ever used by the low-priority opportunistic users?
> b) Is there a way, either temporarily or permanently, to make sure that jobs of opportunistic users beyond some priority-factor differential do not get negotiated as long as there are jobs of much higher priority in the queue?  In this scenario the single-core slots, which generally run fairly short jobs, would exit on their own.
> 
> Steve Timm
> 
> 
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/
