Re: [condor-users] Questions about resource allocation and user priorities...



Actually, I did look at PREEMPTION_REQUIREMENTS.  I should have been more 
specific in my initial email, where I lazily referred to it simply as 
"PREEMPTION".

The thing is, the individual jobs that make up each dag usually only take 
between 1 and 20 minutes each, but there can be several hundred jobs per 
dag.  Why don't the second user's jobs acquire resources as the first 
user's dag's individual "sub-jobs" finish, which they do fairly often?

Since all our jobs are in the Vanilla universe, I can't preempt jobs 
without losing any work that's already been done.   In fact, the longer a 
job runs, the more "costly" it becomes to kill.  Thus, I'm trying to avoid 
using preemption if possible.

-Mike


On Tuesday 17 February 2004 12:33, Mark Silberstein wrote:
> You should take a look at the PREEMPTION_REQUIREMENTS expression. It
> controls whether the negotiator may preempt the currently running job.
> By default that expression evaluates to true (that is, allows the job
> to be preempted) only if the current job has been running for more than
> one hour and the running user's priority value is more than 1.2 times
> that of the submitting user (a larger value means a worse priority).
> Changing that expression does what you want. But there is a reason
> behind such default policy - if you just set that expression to
> RemoteUserPrio > SubmittorPrio, it might end up with job thrashing.
>
> By the way, the PREEMPT expression has nothing to do with preemption
> due to priority. It is evaluated by the startd (as opposed to the
> negotiator) to decide whether the job is still allowed to run on the
> resource according to that resource's own rules, not in comparison
> with another job.
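>
> For reference, a sketch of that default as it would appear in the
> config file. This is from memory of the stock condor_config, so treat
> the MINUTE, HOUR and StateTimer macro definitions as assumptions and
> check them against your own file:
>
> ```
> # Assumed helper macros (as in the example condor_config)
> MINUTE     = 60
> HOUR       = (60 * $(MINUTE))
> StateTimer = (CurrentTime - EnteredCurrentState)
>
> # Allow preemption only after an hour in the current state, and only
> # if the running user's priority value exceeds 1.2x the submitter's
> PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) \
>     && RemoteUserPrio > SubmittorPrio * 1.2
> ```
>
> Setting PREEMPTION_REQUIREMENTS = False instead should disable
> priority-based preemption altogether.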
>
> On Tue, 2004-02-17 at 22:14, Michael S. Root wrote:
> > Hi.  I've been running Condor 6.6.0 for a couple of months now, and
> > I'm curious about how to more effectively control resource allocation
> > among users.
> >
> > If a user submits a dagman job, it will be correctly distributed
> > across our machines (currently 11 of them).  However, if a second user
> > then also submits a dagman job, that user will not get ANY resources
> > until all of the first user's jobs have been completed.  Both of the
> > users have the same priority factor (1.0).  The second user has an
> > Effective Priority of 0.5, while the first user's Effective Priority is
> > 11.0 (or 1.0 times the number of machines).  Why don't any of the
> > machines get allocated to the other user over time as some of the jobs
> > in the first user's dag finish?
> >
> > As an experiment, I've set the PRIORITY_HALFLIFE to just 5 seconds, so
> > a user's Real Priority reflects only the resources they are currently
> > using (more or less).  This doesn't seem to have affected the behavior
> > in the slightest.
> >
> > I also tried fiddling with some of the PREEMPTION values.  However,
> > this isn't really what we want.  All of our jobs are in the Vanilla
> > universe, so we definitely do NOT want to just kill jobs that have run
> > for a while. We don't really want the instant resources that
> > PREEMPTION provides, just to have the resources equalize across users
> > over time as jobs finish.
> >
> > Any idea what I'm missing here?  Thanks for any insight you can
> > provide.
> >
> > -Mike
> > mike@xxxxxxxxxxxxxx
> >
> > Condor Support Information:
> > http://www.cs.wisc.edu/condor/condor-support/
> > To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
> > unsubscribe condor-users <your_email_address>