[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Condor-users] Jobs priority



Daniel,
If I understand this properly, when you submit multiple clusters you
want Condor to fair-share the pool among them. If you have 100 nodes
and 10 clusters in the pool, each cluster should ideally get 10
machines. Are these jobs, policy-wise, equivalent (you have START =
True and RANK = 0 on all of your execute nodes, and
PREEMPTION_REQUIREMENTS is the default)?

Given that the above is true, if you want Condor to fair-share amongst
clusters, then you need to tell the accountant to pay attention to
clusters rather than users (it fair-shares based upon users by
default, but you can configure it to use any arbitrary "accounting
group"). Use the Accounting Group setting:
http://www.cs.wisc.edu/condor/manual/v7.0/3_4User_Priorities.html#19302

In each submit file, set:
+AccountingGroup = "SomeUniqueID"
For example, if everyone is logging into the same submit node, then
"$ENV(HOSTNAME).$(Cluster)" might work as the unique ID.
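Concretely, a submit file along these lines would tag every job in the
cluster with a group unique to that cluster (the executable name here
is just a placeholder):

```
# sketch of a per-cluster submit file; my_job.sh is a placeholder
universe         = vanilla
executable       = my_job.sh
# one accounting group per cluster, so the negotiator fair-shares
# between clusters rather than between users
+AccountingGroup = "$ENV(HOSTNAME).$(Cluster)"
queue 100
```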

Then each cluster will be "fair-shared". This should work regardless
of whether you have multiple users. You may also want to set:
PREEMPTION_REQUIREMENTS = False, to avoid preemption
CLAIM_WORKLIFE = something small, to guarantee churn on the matches
(we didn't have CLAIM_WORKLIFE back in the day, so we had to preempt
the match and set MaxJobRetirementTime very high :-) )
DEFAULT_PRIO_FACTOR = some base number for all of the new accounting
groups you're about to create per cluster
PRIORITY_HALFLIFE = 1, or some other very low number, since you don't
care about accumulated usage for an individual cluster
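Pulled together, those condor_config knobs might look like this (the
specific numbers are illustrative, not recommendations):

```
# never preempt running jobs
PREEMPTION_REQUIREMENTS = False
# release claims after a couple of minutes so freed slots go back
# through negotiation and other clusters get a turn
CLAIM_WORKLIFE          = 120
# common base priority factor for the new per-cluster groups
DEFAULT_PRIO_FACTOR     = 1000
# forget accumulated usage almost immediately, since usage history
# for an individual cluster doesn't matter here
PRIORITY_HALFLIFE       = 1
```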

You may need to be careful with too many accounting groups, so ideally
you'd have some fixed number of round-robin accounts that jobs would
use/re-use.
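One way to keep the number of groups bounded (a sketch, untested: the
wrapper, counter file, and group prefix are all made up) is a small
submit wrapper that rotates through N fixed accounts:

```shell
#!/bin/sh
# hypothetical wrapper: rotate over N fixed accounting groups so the
# accountant only ever sees N distinct group names
N=16
STATE=/tmp/rr_account_counter
COUNT=$(cat "$STATE" 2>/dev/null || echo 0)
GROUP="rr$(( COUNT % N ))"
echo $(( COUNT + 1 )) > "$STATE"
# condor_submit's -a/-append option adds a line to the submit description
condor_submit -a "+AccountingGroup = \"$GROUP\"" "$1"
```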

There are a few other ways to get close to round-robin (Job Priority,
machine RANK, and PREEMPTION_REQUIREMENTS come to mind), but none of
them is as elegant as accounting groups (Job Priority), they don't
necessarily work effectively (PREEMPTION_REQUIREMENTS/RANK), and they
require preemption (PREEMPTION_REQUIREMENTS/RANK).

I hope this helps!

Good luck,
Jason


-- 
===================================
Jason A. Stowe
cell: 607.227.9686
main: 888.292.5320

Cycle Computing, LLC
Leader in Condor Grid and Cloud Solutions,
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com







On Sat, Aug 30, 2008 at 12:31 PM, Matthew Farrellee <matt@xxxxxxxxxx> wrote:
> When a Schedd is running a job it is because it was given a claim to
> some resource, where the job is running. The Schedd, by default, can
> recycle that claim as many times as it likes to run other jobs. Those
> other jobs need to be similar enough to the one the claim was originally
> for. Being in the same cluster is often similar enough. This is
> typically a good thing as it avoids negotiation overhead for your jobs.
>
> There are a few ways you can stop a Schedd from recycling claims. One is
> that a negotiation cycle can give the claimed resource to another user for
> their jobs, based on user priorities and negotiation policy. This
> effectively breaks the claim. If you're submitting clusters A, B and C as
> the same user, or have preemption disabled, this won't help you much. A
> second way is to limit the amount of time a claim can be recycled, via
> the CLAIM_WORKLIFE configuration option.
>
> http://www.cs.wisc.edu/condor/manual/v7.1/3_3Configuration.html#13583
>
> Be aware that the profile of your clusters matters. For instance, if they
> are all very short-running jobs you may in fact see them run
> sequentially, because the A cluster may completely finish before a
> negotiation cycle can give resources to the B cluster. Also, when you
> say "immediately after", the timing actually matters: if there was no
> negotiation cycle between A being submitted and B, then you may in
> fact get a mixture of A and B jobs run.
>
> Best,
>
>
> matt
>
> Daniel Tardón wrote:
>> Hello, thank you for your response. I'm reopening this thread because I've
>> been absent.
>> I think I haven't explained the situation properly.
>> Our problem is that we always send jobs from the same machine as the same
>> user.
>> Our problem is not prioritizing some jobs over others; what we want is
>> for all the jobs to execute concurrently as they start arriving at
>> the pool.
>> In other words, if we have a pool with 10 machines and a cluster A is
>> submitted to the pool with 100 jobs, then immediately after, a cluster B is
>> submitted to the pool with, for example, 30 jobs. Now there are 10 jobs
>> running and all are from cluster A. One job finishes, and the next job to
>> be executed is always a job from cluster A. This is what we don't want.
>>
>> We want to imitate a round-robin scheduling with the jobs of different
>> clusters.
>>
>> We want the jobs from the cluster B to execute at the same time as jobs
>> from cluster A.
>> Not to execute cluster B only when cluster A has finished.
>>
>> We would like the same thing to happen if a new cluster C entered the
>> queue, so that, for example, the first job of cluster C could execute
>> before the last job of cluster A (obviously depending on when cluster C
>> was submitted).
>>
>> We want the resources to be shared equally.
>>
>> I hope our doubts are clearly expressed, and that someone can help us.
>> Thank you very much.
>>
>>> Steven Timm wrote:
>>>> You can either add
>>>>
>>>> Priority=nnn
>>>>
>>>> to the submit file of the second job,
>>> Could it be used in the config file of the submit host so that any job
>>> submitted from that particular host always has the higher priority?
>>>
>>> Cheers,
>>> Santanu
>>> _______________________________________________
>>> Condor-users mailing list
>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/condor-users/
>>>
>>
>>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>