
Re: [HTCondor-users] condor_defrag only some machines?

Having let the pool run for a while longer, it does appear to have
pulled in some of the nodes that originally weren't being drained.

So I guess what this really boils down to is that I don't understand what

DEFRAG_RANK = -ExpectedMachineGracefulDrainingBadput

really means as it relates to the current state of my pool.

I can see that ExpectedMachineGracefulDrainingBadput is a ClassAd
attribute attached to each machine in my pool, and that it's a
calculated number, but I don't fully understand what it represents.

I see the explanation in the manual, but it's still not clear.  Does
anyone have a pointer to something that would make it clearer how this
expression actually chooses which machines to put into the draining state?
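For what it's worth, my current reading (an assumption on my part, not
gospel): the defrag daemon evaluates DEFRAG_RANK on each candidate
machine and drains the highest-ranked ones first, so with the default
DEFRAG_RANK = -ExpectedMachineGracefulDrainingBadput the machines with
the *least* expected badput (CPU time that currently running jobs would
lose on a graceful drain) get drained first.  A toy sketch of that
ordering, with made-up machine names and badput values:

```python
# Toy model of how condor_defrag orders drain candidates under
# DEFRAG_RANK = -ExpectedMachineGracefulDrainingBadput.
# The machines and badput numbers below are invented for illustration.

machines = {
    "node01": 7200,  # lots of running-job time would be lost -> low rank
    "node02": 0,     # nothing running to lose               -> high rank
    "node03": 600,
}

def defrag_rank(badput):
    # The configured rank expression: negate the expected badput.
    return -badput

# condor_defrag drains the highest-ranked machines first.
drain_order = sorted(machines,
                     key=lambda m: defrag_rank(machines[m]),
                     reverse=True)
print(drain_order)  # node02 first: least badput to throw away
```

If that reading is right, it would explain why some busy nodes never get
picked: draining them would waste too much running-job time relative to
quieter nodes.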

On Thu, Jan 12, 2017 at 11:57 AM, Michael Di Domenico
<mdidomenico4@xxxxxxxxx> wrote:
> I just turned on condor_defrag on my pool.  We have a mixture of
> single-core jobs and whole-node jobs in the queue at present, and as
> expected the whole-node jobs are backed up, stuck behind the
> single-core jobs.
> The defragging seems to be churning along, but it only seems to drain
> a subset of nodes over all the others.
> My defrag config in condor is the out-of-the-box defaults, and the
> pool is 100% partitionable slots (cpu/memory/gpus), which is the same
> on all nodes.  We're running condor 8.4.7 on Linux.
> Is there something I can run that will tell me why defrag is picking
> certain nodes, or conversely something that will tell me why defrag
> is ignoring other nodes?
> As best I can tell it's trying to defrag nodes, except the waiting
> whole-node jobs require a certain set of nodes which defrag doesn't
> seem to be touching.