Re: [HTCondor-users] peaceful node drain and shutdown
- Date: Wed, 13 Jul 2016 17:12:53 -0400
- From: Michael V Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] peaceful node drain and shutdown
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Date: 07/13/2016 04:28 PM
> You sure about this?
> I also recall the same behavior that Bob describes - if START goes
> FALSE instead of UNDEFINED, then the node transitions to Owner state,
> which then kills off running jobs.
> (Again, might have changed at some point)
I use it to manage machine oversubscription, among
However, I've always set PREEMPT to false and used
slots, so perhaps once I start trying to use preemption
going to fall apart on me.
For instance, I have the START expression go false when the load
average of the machine exceeds 125% of the CPU capacity
dynamic slots continue to run. Likewise if a remote
runs low on disk space. This has been quite handy.
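
A minimal sketch of the load-based part of such a policy, written as an HTCondor configuration fragment. The macro name MACHINE_NOT_OVERLOADED is hypothetical and not from the original message; TotalLoadAvg and TotalCpus are standard machine ClassAd attributes. A real configuration would most likely combine this with whatever START policy is already in place, and the disk-space check is left out because its details are not given here.

```
# Hypothetical helper macro (name is an assumption, not from the post):
# true while the whole-machine load average is at or below 125% of the
# number of CPUs on the machine.
MACHINE_NOT_OVERLOADED = (TotalLoadAvg <= 1.25 * TotalCpus)

# START evaluates to False once the machine is oversubscribed,
# so no new jobs are matched to the slot until the load drops again.
START = $(MACHINE_NOT_OVERLOADED)
```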
Maybe I need to add a check of SlotType to limit this
of the START expression to Partitionable slots only, or look at
the state and activity so that a non-Unclaimed slot
into Owner and then try to preempt when low disk space
pulls my START expression false?
Or is it just a matter of moving it to UNDEFINED instead
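
One way the SlotType idea might look, as a sketch only. It assumes the load check is the only condition in START; SlotType, TotalLoadAvg, and TotalCpus are standard machine ClassAd attributes. Whether this actually keeps claimed dynamic slots out of the Owner state is the open question in this thread, so it would need testing before being relied on.

```
# Apply the load check to the partitionable slot only.
# For any other slot type (e.g. "Dynamic"), the first clause is True,
# so START stays True regardless of load.
# =!= is the ClassAd "is not identical to" operator, which yields
# True or False even if SlotType is undefined.
START = (SlotType =!= "Partitionable") || (TotalLoadAvg <= 1.25 * TotalCpus)
```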