
Re: [HTCondor-users] peaceful node drain and shutdown



From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Date: 07/13/2016 04:28 PM
 
> You sure about this?
>
> I also recall the same behavior that Bob describes - if START goes to
> FALSE instead of UNDEFINED, then the node transitions to Owner state,
> which then kills off running jobs.
>
> (Again, might have changed at some point)

I use it to manage machine oversubscription, among other things.

However, I've always set PREEMPT to false and used partitionable
slots, so perhaps once I start trying to use preemption all that's
going to fall apart on me.

For instance, I have the START expression go false when the load
average of the machine exceeds 125% of the CPU capacity, while the
dynamic slots continue to run. Likewise if a remote filesystem
runs low on disk space. This has been quite handy.
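Roughly, that policy looks something like the fragment below. It is a minimal sketch, not my exact config. TotalLoadAvg and TotalCpus are standard machine ClassAd attributes. The macro names, the 10 GB threshold, and the RemoteScratchFreeMB attribute are hypothetical. That attribute is assumed to be published by a site-specific STARTD_CRON job.

```
# Hypothetical helper macros; names and thresholds are illustrative only.
# TotalLoadAvg and TotalCpus are standard machine ClassAd attributes.
OVERLOADED  = (TotalLoadAvg > 1.25 * TotalCpus)

# RemoteScratchFreeMB is NOT built in; assume a site STARTD_CRON job
# publishes it into the machine ad.
SCRATCH_LOW = (RemoteScratchFreeMB < 10240)

# Refuse new work while either condition holds. The =!= True test keeps
# START from going UNDEFINED if the cron attribute is missing.
START = !$(OVERLOADED) && ($(SCRATCH_LOW) =!= True)
```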

Maybe I need to add a check of SlotType to limit this application
of the START expression to Partitionable slots only, or look at
the state and activity so that a non-Unclaimed slot doesn't go
into Owner and then try to preempt when low disk space or whatever
pulls my START expression false?
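The SlotType variant might look like the sketch below. This is an untested assumption on my part. SlotType and State are standard machine attributes. The MACHINE_BUSY macro name is hypothetical. I haven't verified whether this by itself keeps claimed slots out of Owner state under every startd policy.

```
# Hypothetical throttle condition; TotalLoadAvg, TotalCpus and SlotType
# are standard machine ClassAd attributes.
MACHINE_BUSY = (TotalLoadAvg > 1.25 * TotalCpus)

# Only the partitionable slot refuses new work; for dynamic and static
# slots the first clause is True, so START stays True for them.
START = (SlotType =!= "Partitionable") || !$(MACHINE_BUSY)

# Alternative keyed on slot state instead of slot type:
# START = (State =!= "Unclaimed") || !$(MACHINE_BUSY)
```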

Or is it just a matter of moving it to UNDEFINED instead of
False?
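The UNDEFINED approach would be something like the fragment below. It is a sketch, assuming the default IS_OWNER of (START =?= False) is in effect. Under that default, an UNDEFINED START shouldn't trigger the Owner transition by itself. The slot still shouldn't match new jobs, because matching needs START to evaluate to True. MACHINE_BUSY is a hypothetical macro name.

```
# Hypothetical throttle condition.
MACHINE_BUSY = (TotalLoadAvg > 1.25 * TotalCpus)

# When throttled, evaluate to UNDEFINED rather than False, so that a
# default IS_OWNER = (START =?= False) does not fire.
START = ifThenElse($(MACHINE_BUSY), UNDEFINED, True)
```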

        -Michael Pelletier.