
RE: [Condor-users] On-demand priority access to resources

This question has been asked in various forms so many times that it should be a FAQ.

in short:

If you can handle preemption, use it and have your execute machines rank the urgent jobs higher in some way. (If this should be under the users' control, search the archive for "TIER" for one example of doing this.)

If you can't handle preemption you're in a nasty position. Look for some posts by myself and others over the last few weeks about this.

6.7.2 should add some nice functionality which will mean you can enable preemption and use any technique from above to differentiate jobs without causing too much thrashing. It does this by guaranteeing a minimum amount of time for a machine to stay claimed, so you accept some (controllable) latency in exchange for losing less throughput to preemption. (See the previous thread about MaxJobRetirementTime named "How to have schedd drop claim after each job".)
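To make the latency/throughput trade-off concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes the retirement time is counted from the moment the job started, that a vanilla job loses all its work when evicted, and the function and parameter names are made up for illustration rather than taken from Condor.

```python
def preemption_outcome(elapsed_s, total_runtime_s, retirement_s):
    """Model a preemption request arriving for a running vanilla job.

    Assumption: the job may keep running until it has run for
    retirement_s seconds in total, then it is evicted and its work is lost.
    Returns (wait_s, lost_work_s): how long the urgent job waits for the
    machine, and how much computation is thrown away.
    """
    remaining = total_runtime_s - elapsed_s
    grace = max(0, retirement_s - elapsed_s)
    if remaining <= grace:
        # The running job finishes inside its guaranteed time: nothing lost.
        return remaining, 0
    # The job is evicted when the grace period ends; all its work is wasted.
    return grace, elapsed_s + grace


# A 2 hour job that has run for 1 hour when the urgent job shows up:
print(preemption_outcome(3600, 7200, 0))     # immediate preemption
print(preemption_outcome(3600, 7200, 7200))  # retirement covers the whole job
```

Under these assumptions a retirement time shorter than the job only delays the eviction, so the guaranteed time wants to be at least as long as your typical job for the throughput benefit to show up.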

There is currently no easy solution. I have had _MUCH_ pain trying to achieve this on my farm, which is similar to the one you describe, and there are several gotchas for the unwary due to the user priority mechanism, even if it is effectively disabled.

Best answer: wait for 6.7.2 if you can.

If you really must do it now, disable preemption and run a cron job/service somewhere which periodically scans the queue for more important jobs, picks the shortest-running/lowest-priority jobs on the farm whose machines could run them, and sends a condor_vacate to those machines.

The latter solution is a lot of hassle to set up and maintain, so I'd think twice about it.
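For what it's worth, the selection step of such a cron job could look roughly like the Python sketch below. The RunningJob record, its field names and both helper functions are hypothetical; how you fill the list (for example by parsing condor_q / condor_status output) is left out, and the sketch only builds the condor_vacate command lines rather than running them.

```python
from dataclasses import dataclass


@dataclass
class RunningJob:
    machine: str    # execute machine the job is running on
    owner: str      # submitting user
    prio: int       # job priority, higher means more important
    runtime_s: int  # seconds the job has been running so far


def pick_victims(running, urgent_idle_count, urgent_owner):
    """Choose which running jobs to vacate for the urgent user's idle jobs.

    Never evicts the urgent user's own jobs. Prefers the lowest priority
    first and, within a priority, the job that has run for the shortest
    time, since evicting it wastes the least work.
    """
    candidates = [j for j in running if j.owner != urgent_owner]
    candidates.sort(key=lambda j: (j.prio, j.runtime_s))
    return candidates[:max(0, urgent_idle_count)]


def vacate_commands(victims):
    """Build one 'condor_vacate <machine>' argument list per victim."""
    return [["condor_vacate", j.machine] for j in victims]


running = [
    RunningJob("m1", "bob", 0, 5000),
    RunningJob("m2", "carol", 0, 100),
    RunningJob("m3", "alice", 5, 50),
    RunningJob("m4", "dave", -5, 9000),
]
for cmd in vacate_commands(pick_victims(running, 2, "alice")):
    print(" ".join(cmd))
```

A real version would also need to check that the vacated machine can actually run the urgent job, and to avoid vacating the same machine again on the next pass before the negotiator has matched it, which is where most of the hassle comes from.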


> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Ian Chesal
> Sent: 03 August 2004 17:38
> To: Condor-Users Mail List
> Subject: [Condor-users] On-demand priority access to resources
> A colleague asked a similar question without any answers, so I thought
> I'd try rephrasing things.
> We are wondering how, in the Condor world, the following scenario is
> handled:
> I have lots of users and lots of machines in the Condor pool. All users
> run vanilla jobs of varying length, between 1-10 hours of computation
> time long. User A suddenly has a very, very important experiment that
> must take priority over all the other jobs in the system. How do you let
> User A trump all other jobs? Based on Condor's manual and the
> description of the decaying user priority it would seem that User A
> would have to "save up" system time by not submitting jobs for a few days
> and letting their user priority reset to the highest value.
> What if waiting for the user priority to reset wasn't an option? How
> does User A gain full access to all the resources and trump everyone
> else? Do the condor admins have to intervene to broker this type of
> priority access or can User A do something to indicate this one
> experiment is the top priority regardless of user priorities?
> Any insight into how Condor behaves, or how others manage large Condor
> installations with a large user base, would be greatly appreciated.
> Ian
> --
> Ian R. Chesal <ichesal@xxxxxxxxxx>
> Senior Software Engineer
> Altera Corporation
> Toronto Technology Center
> Tel: (416) 926-8300
