
Re: [Condor-users] question about preemption policy



On 6/6/06, Ilya Narsky <narsky@xxxxxxxxxxxxxxx> wrote:

In our condor pool, we would like the negotiator to preempt user A's jobs
in favor of users with lower priority factors only if user A takes more
than 25% of the cpu's and leave user A's jobs running otherwise. Can this
be accomplished? What are the condor variables that can be used to specify
PREEMPTION_REQUIREMENTS? We are using condor 6.7.18.  Thanks, Ilya

A VM (in the current Condor sense of the word) can only service one job at a time.
If you want (as I understand it) a job to remain running when it is
a low-CPU task, and to allow another job to run at the same time,
then you must have two VMs per CPU (so four VMs on a 2-way SMP machine,
etc.), where one VM is for low-CPU jobs only and the other is for
high-CPU jobs.*

The trick is in ensuring that the relevant jobs go to the proper
places. If your jobs are very well segmented (one set will always be
low CPU utilization, the others always high), then you can achieve this
provided your users mark and direct their jobs appropriately. If not,
you have no way to control it.

On a side note, and not wanting to teach anyone to suck eggs,
this setup is often not the best for throughput. Most tasks tend
to be CPU, memory, disk or network bound, so even though a job seems
to be taking 'only' 25% of the CPU time, whatever resource it is
actually bound on may well slow down the other, more CPU-intensive job
more than you expect. Of course, in a non-checkpointing environment
where you need lower latency on those high-CPU jobs, the reduction in
preemption costs may outweigh the loss of theoretical ideal throughput.

As far as Condor goes, if you wish to use preemption to reduce latency
on certain jobs, the best** way to reduce the cost of preemption is to
get some form of checkpointing (even if done by hand via the signal
mechanisms in the vanilla universe). Obviously this will sometimes just
not be possible.
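
For what it's worth, here is a rough sketch of what "by hand" checkpointing
in a vanilla universe job could look like. It is only an illustration under
some assumptions: that the job is sent SIGTERM when it is asked to vacate
(check which signal your Condor version and submit file actually use), that
the checkpoint file ends up somewhere the restarted job can read it, and
that the work splits into small steps. The file name, the function names
and the summing loop are all made up for the example.

```python
import json
import os
import signal
import threading

CHECKPOINT_FILE = "checkpoint.json"  # hypothetical name, not a Condor convention

# Set by the signal handler; the work loop polls it between steps.
stop_event = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only record that we were asked to stop."""
    stop_event.set()


def load_checkpoint(path=CHECKPOINT_FILE):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(state, path=CHECKPOINT_FILE):
    """Write to a temporary file, then rename, so a kill mid-write
    cannot leave a truncated checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(total_steps, path=CHECKPOINT_FILE):
    """Resume from the checkpoint and work until done or asked to stop.
    Returns True if all steps finished, False if interrupted."""
    state = load_checkpoint(path)
    while state["next_step"] < total_steps:
        if stop_event.is_set():
            save_checkpoint(state, path)
            return False
        state["total"] += state["next_step"]  # stand-in for one unit of real work
        state["next_step"] += 1
    save_checkpoint(state, path)
    return True


def main():
    # Assumption: eviction arrives as SIGTERM.
    signal.signal(signal.SIGTERM, request_stop)
    return 0 if run(1_000_000) else 1
```

The real job script would end with something like `sys.exit(main())`. When
the job is restarted after being preempted, it picks up from the saved step
instead of from zero, so the cost of a preemption is at most one unit of
work rather than the whole run so far.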

Matt

* Note that in an ideal world the second VM would accept _either_ job
type, but there are some nasty, subtle behaviours that would make this
tricky to achieve.

** By this I mean it is probably the best in terms of real-world benefit
but, in some ways more importantly, it is the best way to leverage Condor,
working with it rather than fighting it.