[Condor-users] preempt and then hold?
For a long time we have set up our pool with
PREEMPT = False
so that the nodes in our cluster would never preempt a running
job on their own (of course, the negotiator could still
cause jobs to be preempted).
Lately, however, a few users have been running jobs that
malloc() a lot of memory and eventually drive the machine
fully into swap, which takes it into the weeds.
So we plan to change our configuration to
PREEMPT = (TARGET.ImageSize > ( 512 * 1024))
since each machine has 512 MB of physical memory and ImageSize
is reported in KB, so 512 * 1024 KB is 512 MB (yes, the OS
uses some but we don't mind a little use of swap).
The idea is that when the job's memory usage grows, and Condor
notices, it will preempt the running job.
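To spell out the arithmetic behind that expression, here is a
small Python sketch. It is not Condor code: the helper name
would_preempt is made up for illustration, and it assumes the
job's ImageSize attribute is an integer number of KB.

```python
# Hypothetical helper, not part of Condor: it only mirrors the
# comparison made by the proposed PREEMPT expression.

PHYSICAL_MEMORY_MB = 512                    # per-machine RAM, from the post
THRESHOLD_KB = PHYSICAL_MEMORY_MB * 1024    # assumption: ImageSize is in KB


def would_preempt(image_size_kb: int) -> bool:
    """Return True when the job image is strictly larger than physical RAM."""
    return image_size_kb > THRESHOLD_KB


if __name__ == "__main__":
    # A job below, exactly at, and above the threshold.
    for size_kb in (100_000, THRESHOLD_KB, 600_000):
        print(size_kb, would_preempt(size_kb))
```

Note that the comparison is strict, so a job sitting at exactly
512 MB would not trigger preemption.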
1) Will this work?
2) Is there any way to get the preempted job placed on
hold so that the schedd doesn't have to keep trying to
match these jobs over and over?
I would like to mail the user and say "your memory-hog jobs
are on hold now, please remove them"...