
Re: [Condor-users] question about preemption policy



On Wed, 7 Jun 2006, Ilya Narsky wrote:


Sorry, but I am afraid I have not stated my question clearly then. We have
50 nodes, 4 vm's per node, that is, a pool of about 200 vm's. We want to
preempt jobs from a certain user (and only from this user) if they take
more than 25% of the slots in the entire pool, that is, more than 50 vm's.
These jobs allocate one vm per job and are likely to be cpu-intensive.
Thanks,  Ilya


It's my understanding that this cannot be done at the moment.  The
number of jobs a given user is running at any given time is not
available for use in the PREEMPTION_REQUIREMENTS expression.

I am working on something a bit different to accomplish much the
same thing.  What I would do is set GROUP_QUOTA_group_user=50 for
that user; jobs submitted under that group would not be pre-empted.
The user would then have a second way to submit jobs which are not
part of group_user, and those could be pre-empted.  In other words,
users who are not to be pre-empted are part of some group or other,
and any user submitting with a null group name can be pre-empted.
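
As a rough illustration of the two-tier idea above, here is a toy model
in Python. It is not Condor configuration and calls no Condor API; the
function names are hypothetical, the pool size is taken from the
question (50 nodes, 4 vm's each), and it simply assumes the first 50
jobs go in under the group and the rest go in with no group name.

```python
# Toy model only: NOT Condor code, and none of these names are Condor APIs.
NODES = 50             # from the question
VMS_PER_NODE = 4       # from the question
QUOTA_FRACTION = 0.25  # user may hold 25% of the pool without preemption


def pool_quota(nodes=NODES, vms_per_node=VMS_PER_NODE, fraction=QUOTA_FRACTION):
    """Number of vm's the group may hold before extra jobs are preemptible."""
    return int(nodes * vms_per_node * fraction)


def split_jobs(job_ids, quota):
    """Return (protected, preemptible) lists of job ids.

    Assumption: the first `quota` jobs are submitted under the group and
    are protected; the remainder are submitted with no group name.
    """
    return job_ids[:quota], job_ids[quota:]


quota = pool_quota()
protected, preemptible = split_jobs(list(range(70)), quota)
print(quota, len(protected), len(preemptible))  # prints: 50 50 20
```

With 70 jobs from the user, 50 fall under the quota and the remaining
20 are the ones that could be pre-empted.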

It would be better to have a transparent way to do this, but based
on my conversations with experts at the last Condor Week, it isn't
possible currently.

Steve



On Wed, 7 Jun 2006, Matt Hope wrote:

On 6/6/06, Ilya Narsky <narsky@xxxxxxxxxxxxxxx> wrote:

In our condor pool, we would like the negotiator to preempt user A's jobs
in favor of users with lower priority factors only if user A takes more
than 25% of the cpu's and leave user A's jobs running otherwise. Can this
be accomplished? What are the condor variables that can be used to specify
PREEMPTION_REQUIREMENTS? We are using condor 6.7.18.  Thanks, Ilya

A vm (in the current condor sense of the word) can only service one job
at a time. If you want (as I understand it) a job that uses little CPU
to keep running while another job runs alongside it, then you must
have two VMs (or, on a 2-way SMP machine, 4 VMs, etc.), where one VM
is for low-CPU jobs only and the other is for high-CPU-load ones.*

The trick is in ensuring that the relevant jobs go to the proper
places. If your jobs are very well segmented (one set always has low
CPU utilization, the other always high), then you can achieve this
provided your users mark and direct their jobs appropriately. If not,
you have no way to control it.

On a side note, and not wanting to be teaching the sucking of eggs,
this setup is often not the best for throughput. Most tasks tend to be
CPU, memory, disk or network bound, so even though a job seems to
'only' be taking 25% CPU time, its use of those other resources may
well slow down the other, more CPU intensive job more than you expect.
Of course, in a non-checkpointing environment where you need less
latency on those high CPU jobs, it may be that the reduction in
preemption costs outweighs the reduction in theoretical ideal
throughput.

As far as condor goes, if you wish to use preemption to reduce latency
on certain jobs, the best** way to reduce this cost is to get some form
of checkpointing (even if done by hand via the signal mechanisms in
the vanilla universe). Obviously this will sometimes just not be
possible.

Matt

* Note that in an ideal world the second vm would be for _either_ job
type, but there are some nasty subtle behaviours that would make trying
to achieve this tricky.

** By this I mean probably the best in real world benefit but, in some
ways more importantly, the best way to leverage condor so that you are
working with it rather than fighting it.
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR



--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team