
[HTCondor-users] Preempted jobs not carrying their ImageSize for the next match?

On Wed, Mar 04, 2015 at 09:37:04AM +0100, Steffen Grunewald wrote:
> On Tue, Mar 03, 2015 at 10:36:30AM -0600, Greg Thain wrote:
> > On 03/03/2015 05:31 AM, Steffen Grunewald wrote:
> > >I have a couple of users who underestimate the memory their jobs
> > >would attempt to allocate, and as a result some worker nodes end
> > >up swapping heavily.
> > >I tried to get those jobs preempted, and sent back into the queue
> > >with their updated (ImageSize) request_memory:
> > >
> > ># Let job use its declared amount of memory and some more
> > >MEMORY_EXTRA            = 2048
> > >MEMORY_ALLOWED          = (Memory + $(MEMORY_EXTRA)*Cpus)
> > ># Get the current footprint
> > >MEMORY_CURRENT          = (ImageSize/1024)
> > ># Exceeds expectations?
> > >MEMORY_EXCEEDED         = ($(MEMORY_CURRENT) > $(MEMORY_ALLOWED))
> > ># If exceeding, preempt
> > >#[preset]PREEMPT        = False
> > >PREEMPT                 = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
> > >WANT_SUSPEND            = False
> > >
> > >
> > This should all work.

And indeed it does, see below. No need to debug right away.

> For the "exclude parallel universe from preemption" part, I will now use
> PREEMPT                 = ($(PREEMPT)) || ($(MEMORY_EXCEEDED) && (JobUniverse =!= 11))

I have had no opportunity to test this part yet...

> (and I'm afraid "PREEMPT_VANILLA = False" was the cause for preemption not
> happening to vanilla universe jobs... removed that one from the config now)

This one seems to have made the real difference.
Now preemption *does* happen, as one of the users quickly learned:

> Hi Steffen,
> This is a plotting job that looks like it's trying to plot a vast
> number of points .... and the job does seem to get kicked pretty
> quickly. However, the memory requirement doesn't get above 15GB (even
> though the reported condor size is bigger) and so it keeps matching,
> evicting, matching, evicting.
> 15GB is the original memory request of this job.

As the job's actual footprint grew beyond the 15000 MB asked for in
request_memory and, at 17090 MB, also exceeded the additional 2 GB
margin, the job was evicted from the slot it was running on.

Now I would have expected that the job would go back to the queue,
with its updated ImageSize as new RequestMemory setting, but this
apparently didn't happen.
Well, it got re-scheduled, but kept its request size at 15000 MB,
resulting in a never-ending loop that visits more and more worker nodes.
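One way to break such an evict/rematch loop might be to park the job on
hold instead of returning it to the queue. This is only an untested sketch
using the schedd-side SYSTEM_PERIODIC_HOLD knob and the job's MemoryUsage
attribute; the 2048 MB margin mirrors MEMORY_EXTRA from the config above:

# Untested sketch: hold jobs whose measured memory use exceeds their
# request by more than the allowed margin (values in MB), rather than
# letting them cycle through match/evict indefinitely.
SYSTEM_PERIODIC_HOLD = (MemoryUsage =!= undefined) && \
                       (MemoryUsage > RequestMemory + 2048)
SYSTEM_PERIODIC_HOLD_REASON = "Memory usage exceeded request_memory plus margin"

The user (or an admin) could then fix request_memory and release the job,
instead of it silently hopping from node to node.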
I have extracted the corresponding messages from the overall log file.

Condor version is 8.3.3 (this isn't supposed to behave differently from
8.2.7 in this particular respect, is it?)

Log file extract attached.

Any ideas on how to help the user (other than suggesting to specify
50000 and wait for a machine big enough to take the job)?
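In the meantime, the stuck job's request could presumably be bumped by
hand with condor_qedit (the job id 1234.0 below is made up; the value is
in MB and chosen to cover the observed ~17 GB footprint):

# Hypothetical job id: raise RequestMemory so the next match
# reflects the memory the job was actually seen to use.
condor_qedit 1234.0 RequestMemory 18000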
I'm afraid this is only the tip of a bigger iceberg...


Steffen Grunewald * Cluster Admin * steffen.grunewald(*)aei.mpg.de
MPI f. Gravitationsphysik (AEI) * Am Mühlenberg 1, D-14476 Potsdam
http://www.aei.mpg.de/ * ------- * +49-331-567-{fon:7274,fax:7298}