Re: [condor-users] Questions about resource allocation and user priorities...
- Date: Tue, 17 Feb 2004 15:41:53 -0800
- From: "Michael S. Root" <mike@xxxxxxxxxxxxxx>
- Subject: Re: [condor-users] Questions about resource allocation and user priorities...
Actually, I did look at PREEMPTION_REQUIREMENTS. I should have been more
specific in my initial email, where I lazily referred to it simply as one
of "the PREEMPTION values".
The thing is, the individual jobs that make up each DAG usually take only
1 to 20 minutes each, but there can be several hundred jobs per DAG. Why
don't the second user's jobs acquire resources as the sub-jobs of the
first user's DAG finish, which they do fairly often?
Since all our jobs are in the vanilla universe, I can't preempt a job
without losing whatever work it has already done. In fact, the longer a
job runs, the more "costly" it becomes to kill. So I'm trying to avoid
using preemption if possible.
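For reference, here is roughly what that knob looks like in condor_config.
The first expression is, as far as I can tell, the default that ships with
Condor 6.6 (it matches the one-hour / 1.2x behavior Mark describes below);
the second is a sketch of how one might switch priority-based preemption
off entirely. You'd pick one, not both:

```
## Default shipped with Condor 6.6 (roughly): allow preemption only
## after the job has run an hour AND the running user's priority value
## is more than 1.2 times the claiming user's
PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) \
    && RemoteUserPrio > SubmittorPrio * 1.2

## Sketch: never preempt for priority, so vanilla jobs keep their work
PREEMPTION_REQUIREMENTS = False
```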
On Tuesday 17 February 2004 12:33, Mark Silberstein wrote:
> You should take a look at the PREEMPTION_REQUIREMENTS expression. It
> controls the behavior of the negotiator when it decides whether or not
> to preempt the currently running job. By default the expression
> evaluates to True (that is, allows the job to be preempted) only if the
> current job has been running for more than one hour and the priority
> value of the user running it (RemoteUserPrio) is more than 1.2 times
> that of the user trying to claim the machine (SubmittorPrio) - note
> that in Condor a larger priority value means worse priority.
> Changing that expression will do what you want, but there is a reason
> behind the default policy: if you just set the expression to
> RemoteUserPrio > SubmittorPrio, you may end up with job thrashing.
> By the way, the PREEMPT expression has nothing to do with preemption
> due to priority. It is evaluated by the startd (as opposed to the
> negotiator) to decide whether the job is still allowed to run on that
> resource according to the resource's own rules, not relative to
> another job.
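(An aside on Mark's point above, for the archive: a startd-side PREEMPT
policy looks something like the following, e.g. evicting any job as soon
as someone uses the machine's console. KeyboardIdle is a real machine
attribute; the 60-second threshold is just for illustration.)

```
## Illustrative startd policy - nothing to do with user priority:
## evict any job once the keyboard has been used in the last minute
PREEMPT = KeyboardIdle < 60
```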
> On Tue, 2004-02-17 at 22:14, Michael S. Root wrote:
> > Hi. I've been running Condor 6.6.0 for a couple of months now, and
> > I'm curious about how to more effectively control resource allocation
> > among users.
> > If a user submits a DAGMan job, it will be correctly distributed
> > across our machines (currently 11 of them). However, if a second user
> > then also submits a DAGMan job, that user will not get ANY resources
> > until all of the first user's jobs have completed. Both users have
> > the same priority factor (1.0). The second user has an Effective
> > Priority of 0.5, while the first user's Effective Priority is 11.0
> > (or 1.0 times the number of machines). Why don't any of the machines
> > get allocated to the second user over time as some of the jobs in
> > the first user's DAG finish?
> > As an experiment, I've set the PRIORITY_HALFLIFE to just 5 seconds, so
> > a user's Real Priority reflects only the resources they are currently
> > using (more or less). This doesn't seem to have affected the behavior
> > in the slightest.
> > I also tried fiddling with some of the PREEMPTION values. However,
> > this isn't really what we want. All of our jobs are in the vanilla
> > universe, so we definitely do NOT want to just kill jobs that have
> > been running for a while. We don't need the instant access to
> > resources that preemption provides; we just want resources to
> > equalize across users over time as jobs finish.
> > Any idea what I'm missing here? Thanks for any insight you can
> > provide.
> > -Mike
> > mike@xxxxxxxxxxxxxx
> > Condor Support Information:
> > http://www.cs.wisc.edu/condor/condor-support/
> > To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
> > unsubscribe condor-users <your_email_address>