Re: [HTCondor-users] How to always avoid evictions of jobs belonging to GROUP_NAMES



Hi Todd, All,

Thanks for this comprehensive explanation, everything is much clearer to me now.

But I would like to take advantage of your expertise to create a correct configuration.

In practice we have three "pipelines" (X, Y, Z), each corresponding to a group in GROUP_NAMES.

I know that when they run, X uses 20 slots, Y uses 14 slots, and Z uses all the other slots.

I would like X and Y jobs never to be evicted.

When X or Y jobs start, only Z jobs may be evicted.

I have been struggling with this problem.

Can you point me to a simple solution?

Thanks again

Giuseppe






[...] must run on the same cluster. One of these must never be evicted. I already know the number of slots that each queue uses when it is running.

On 11/15/19 4:05 PM, Todd Tannenbaum wrote:
On 11/15/2019 2:04 AM, Giuseppe Di Biase wrote:
Hi All,

Hi Giuseppe!

I think we will need more clarification on what policy you desire. We
could help better if you could explain the scheduling policy you have in
mind as simply as you can, ignoring HTCondor configuration issues at the
moment.  Once we understand what you want to happen, as a second step we
can suggest what you add to your HTCondor config.

More below...

I would like to avoid evictions of jobs belonging to a *user* in a
group defined in GROUP_NAMES.

Under what conditions are you seeing jobs being evicted now?

Under what conditions do you *want* HTCondor to evict jobs?  As food for
thought, some potential example answers are:
    1. Never
    2. If a job runs more than X amount of time (e.g. kill a "run-away" job)
    3. If a job runs more than X amount of time and some other job from a
higher priority group or user is waiting to run
    4. If a job runs more than X amount of time, and HTCondor is trying
to drain this node
    5. If a job uses more memory (RAM) than requested
    6. If a machine prefers to run specific types of jobs (e.g. GPU
jobs), and such a preferred job is waiting to run

Most people like to start out simple with (1), then add (2) and (5).
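As a rough sketch of options (2) and (5), assuming the limits are enforced in the condor_schedd configuration, a SYSTEM_PERIODIC_HOLD expression can put a running job on hold when it has been running longer than a chosen limit, or when its measured MemoryUsage exceeds its RequestMemory. The 48-hour limit is an arbitrary assumption; pick a value that fits your workload.

```
# Sketch (schedd config): hold running jobs (JobStatus == 2) that look like
# run-aways, option (2), or that use more memory than requested, option (5).
# The 48-hour limit is an assumed example value, not a recommendation.
SYSTEM_PERIODIC_HOLD = (JobStatus == 2) && \
    ( ((time() - EnteredCurrentStatus) > 48 * 3600) || \
      ((MemoryUsage =!= undefined) && (MemoryUsage > RequestMemory)) )
```

Holding rather than removing keeps the job in the queue so its owner can inspect it and release or remove it; running condor_reconfig on the schedd host should pick up the change.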

I defined GROUP_NAMES on the node running the Negotiator/Collector
daemons, but how, and in which daemons, must I define each group's priority
relative to jobs belonging to the other groups in GROUP_NAMES?

So you defined some accounting groups (with GROUP_NAMES), and now you
wish to tell HTCondor how many resources each group should get?

Let's say you have three groups X, Y, and Z.  Do you want a strict
priority across groups so that for instance X always gets machines ahead
of Y, and Y always gets machines ahead of Z?   Alternatively perhaps you
want to say that X should get 50% of cpus in pool, Y should get 40%, and
Z should get 10%.  Either policy is possible with HTCondor.  Different
users within the same group will get a proportional share (i.e.
'fair-share' across users in the same group). For explanation and
examples see

https://htcondor.readthedocs.io/en/v8_9_3/admin-manual/user-priorities-negotiation.html#accounting-groups-with-hierarchical-group-quotas
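For the proportional variant (X gets 50%, Y 40%, Z 10%), a minimal negotiator-side sketch might look like the following. The group names group_X, group_Y and group_Z are assumptions; use whatever names you actually list in GROUP_NAMES.

```
# Sketch (negotiator config): proportional shares across three groups.
GROUP_NAMES = group_X, group_Y, group_Z
GROUP_QUOTA_DYNAMIC_group_X = 0.5
GROUP_QUOTA_DYNAMIC_group_Y = 0.4
GROUP_QUOTA_DYNAMIC_group_Z = 0.1
# Let a group use slots beyond its share when other groups leave them idle.
GROUP_ACCEPT_SURPLUS = True
```

Dynamic quotas are fractions, so the shares follow the pool size automatically; static GROUP_QUOTA_<name> values are the alternative when you think in fixed slot counts.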

Once you have groups the way you want, you can decide if you want
preemption or not.  Most people do NOT want preemption.  For instance,
imagine you have two groups, A and B, and you want each to get 50% of
your pool.  Imagine users in group A are using the entire pool, and then
suddenly a group B job is submitted.  Without preemption, the group B
job will wait until some group A job exits and then it will start.  With
preemption, HTCondor will kill a group A job and give the slot it was
running on over to group B.  Most organizations prefer to avoid
preemption, because all the cycles consumed by the killed job might be
wasted (if the job cannot checkpoint).
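If you decide against preemption, as most sites do, a minimal sketch of the relevant settings follows. It assumes you want neither priority-based preemption in the negotiator nor RANK-based preemption on the execute nodes; please check the manual for how these knobs interact in your HTCondor version.

```
# Sketch: turn off preemption.
# Negotiator config: never preempt a running job for a higher-priority
# user or group; waiting jobs get slots only as running jobs exit.
PREEMPTION_REQUIREMENTS = False
NEGOTIATOR_CONSIDER_PREEMPTION = False
# Startd config: treat every job as equally preferred, so RANK never preempts.
RANK = 0
```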

Can I use a formula like this in the SCHEDD?

IsX = (Experiment =?= "X")
IsY = (Experiment =?= "Y")
IsZ = (Experiment =?= "Z")

RANK = $(IsX)*70 + $(IsY)*10 + $(IsZ)*8

No, RANK expression in your condor_config file is only used by the
STARTD, not the schedd, and it always implies preemption (which I am
guessing you do not want). If you wish to define priorities / quotas
across different groups, you will want to use GROUP_QUOTA_xxxx settings
in the configuration of your condor_negotiator.  See the above reference
to the HTCondor Manual for more info...
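Using the numbers from the follow-up message (X uses 20 slots, Y uses 14, Z takes the rest), a static-quota sketch could look like this. The group names, the one-core-per-slot assumption and the submit-file lines are illustrative assumptions. Note that quotas only decide who gets a slot when one frees up: without preemption, running Z jobs are not evicted, so newly submitted X or Y jobs wait for Z jobs to finish.

```
# Sketch (negotiator config): static quotas, assuming one-core slots
# so that a quota of 20 corresponds to 20 slots.
GROUP_NAMES = group_X, group_Y, group_Z
GROUP_QUOTA_group_X = 20
GROUP_QUOTA_group_Y = 14
# group_Z gets no explicit quota here; with surplus enabled it is meant
# to run on the slots that X and Y leave unused.
GROUP_ACCEPT_SURPLUS_group_Z = True

# Submit-file side: each pipeline tags its jobs with its group, e.g.
#   accounting_group = group_X
#   accounting_group = group_Y
#   accounting_group = group_Z
```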

Hope the above helps,
Todd