Re: [HTCondor-users] negotiator memory usage keeps climbing after enabling preemption



On 2/3/2016 9:52 AM, Vladimir Brik wrote:
Hello.

Memory usage of our condor_negotiator process started a continuous slow
climb after I enabled preemption. It takes about 48 hours for it to go
from 0 to 25GB at a fairly constant rate (at which point our central
manager runs out of memory). Before preemption, condor_negotiator used
at most 1GB of memory.
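
For scale, those figures work out to roughly half a gigabyte per hour if the growth really is linear. A minimal sketch of that arithmetic follows; the function names are illustrative only (not part of HTCondor), and 1 GB is taken as 1024 MB.

```python
# Back-of-the-envelope growth rate for the figures reported above.
# Assumptions: growth is linear, and 1 GB = 1024 MB.

def growth_rate_mb_per_hour(start_gb: float, end_gb: float, hours: float) -> float:
    """Average memory growth in MB per hour over the given interval."""
    return (end_gb - start_gb) * 1024 / hours


def hours_until_limit(current_gb: float, limit_gb: float, rate_mb_per_hour: float) -> float:
    """Hours left before usage reaches limit_gb at a constant growth rate."""
    return (limit_gb - current_gb) * 1024 / rate_mb_per_hour


# 0 GB to 25 GB over about 48 hours, as described above.
rate = growth_rate_mb_per_hour(0, 25, 48)
print(f"{rate:.0f} MB/hour")
```

Comparing this average against periodic readings of the negotiator's resident memory would show whether the climb is steady or arrives in steps.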

Is that normal? Our pool has about 6000 cores and about 20k jobs in the
queue. Upgrading the central manager from 8.3.8 to 8.5.1 didn't help
(all other machines in our pool run 8.3.8). I didn't see anything
obviously wrong in the logs.

This behavior started when I replaced
NEGOTIATOR_CONSIDER_PREEMPTION = False
with
NEGOTIATOR_CONSIDER_PREEMPTION = True
ALLOW_PSLOT_PREEMPTION = True
PREEMPTION_REQUIREMENTS = False
(we only do rank-based preemption)

Anybody know what might be going on?


Hi Vlad,

The above looks like a valid config to me. We will investigate this issue and report back this week (in fact, Greg is setting up a run under valgrind w/ your config this afternoon...).

Thanks for reporting this as it certainly looks like a bug at this point,

regards,
Todd