Re: [HTCondor-users] How to always avoid evictions of jobs belonging to GROUP_NAMES



Hello,

If you are using startd-based eviction, then the following configuration may work for you.

It assumes you are setting Groupname with the JOB_TRANSFORM feature of the schedd:

STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) Groupname
RetirementTime = 60 * $(MINUTE)
LOCAL_Groupname = (MY.Groupname =?= "A" || MY.Groupname =?= "B")
TARGET_Groupname = (TARGET.Groupname =?= "A" || TARGET.Groupname =?= "B")
RANK = ifthenelse(!isundefined(TARGET.Groupname), $(TARGET_Groupname), $(LOCAL_Groupname))

You also need to set ALLOW_PSLOT_PREEMPTION to TRUE on the negotiator.
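
For reference, here is a minimal sketch of the two supporting pieces mentioned above: the negotiator setting, and one way the schedd could populate Groupname with a job transform. The transform name SetGroupname is hypothetical, and the sketch assumes jobs are submitted with accounting_group so that the AcctGroup job attribute is defined. Please check the copy_ syntax against the manual for your HTCondor version before relying on it.

```
# Negotiator (central manager): allow preemption of dynamic slots
# carved out of partitionable slots.
ALLOW_PSLOT_PREEMPTION = True

# Schedd (submit node): copy each job's accounting group name into the
# custom attribute Groupname that the startd RANK expression inspects.
# "SetGroupname" is a hypothetical transform name chosen for this example.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) SetGroupname
JOB_TRANSFORM_SetGroupname = [ copy_AcctGroup = "Groupname"; ]
```

After a condor_reconfig, newly submitted jobs should carry Groupname, which you can confirm with condor_q -af Groupname.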

Thanks & Regards,
Vikrant Aggarwal


On Fri, Nov 15, 2019 at 9:17 PM Giuseppe Di Biase <giuseppe.dibiase@xxxxxxxxx> wrote:
Hi Todd, All,

Thanks for this comprehensive explanation, everything is much clearer to
me now.

But I would like to take advantage of your expertise to create a
correct configuration.

In practice we have three "Pipelines" (X, Y, Z) corresponding to Group_Names.

I know that, when running, X uses 20 slots, Y uses 14 slots and Z uses all
the other slots.

I would like X and Y jobs never to be evicted.

When X or Y jobs start, only Z jobs may be evicted.


I'm fighting with this problem.

Can you point me to a simple solution?

Thanks again

Giuseppe






must run on the same cluster. One of these must never be evicted. I
already know the number of slots that each queue uses when it is in run.

On 11/15/19 4:05 PM, Todd Tannenbaum wrote:
> On 11/15/2019 2:04 AM, Giuseppe Di Biase wrote:
>> Hi All,
>>
> Hi Giuseppe!
>
> I think we will need more clarification on what policy you desire. We
> could help better if you could explain the scheduling policy you have in
> mind as simply as you can, ignoring HTCondor configuration issues at the
> moment. Once we understand what you want to happen, as a second step we
> can suggest what you add to your HTCondor config.
>
> More below...
>
>> i would like to avoid evictions of jobs belonging to a *user* in a
>> GROUP_NAMES defined.
>>
> Under what conditions are you seeing jobs being evicted now?
>
> Under what conditions do you *want* HTCondor to evict jobs? As food for
> thought, some potential example answer(s) are
>   1. Never
>   2. If a job runs more than X amount of time (e.g. kill a "run-away" job)
>   3. If a job runs more than X amount of time and some other job from a
> higher priority group or user is waiting to run
>   4. If a job runs more than X amount of time, and HTCondor is trying
> to drain this node
>   5. If a job uses more memory (RAM) than requested
>   6. If a machine prefers to run specific types of jobs (e.g. GPU
> jobs), and such a preferred job is waiting to run
>
> Most people like to start out simple with (1), then add (2) and (5).
>
>> I defined this GROUP_NAMES in the node running Negotiator/Collector
>> daemon but how and where (daemons) i must define his priority respect
>> jobs belonging to others GROUP_NAMES?
>>
> So you defined some accounting groups (with GROUP_NAMES), and now you
> wish to tell HTCondor how many resources each group should get?
>
> Let's say you have three groups X, Y, and Z. Do you want a strict
> priority across groups so that for instance X always gets machines ahead
> of Y, and Y always gets machines ahead of Z? Alternatively perhaps you
> want to say that X should get 50% of cpus in pool, Y should get 40%, and
> Z should get 10%. Either policy is possible with HTCondor. Different
> users within the same group will get a proportional share (i.e.
> 'fair-share' across users in the same group). For explanation and
> examples see
>
> https://htcondor.readthedocs.io/en/v8_9_3/admin-manual/user-priorities-negotiation.html#accounting-groups-with-hierarchical-group-quotas
>
> Once you have groups the way you want, you can decide if you want
> preemption or not. Most people do NOT want preemption. For instance,
> imagine you have two groups, A and B, and you want each to get 50% of
> your pool. Imagine users in group A are using the entire pool, and then
> suddenly a group B job is submitted. Without preemption, the group B
> job will wait until some group A job exits and then it will start. With
> preemption, HTCondor will kill a group A job and give the slot it was
> running on over to group B. Most organizations prefer to avoid
> preemption, because all the cycles consumed by the killed job might be
> wasted (if the job cannot checkpoint).
>
>> Can i use a formula like this in SCHEDD?
>>
>> IsX = (Experiment =?= "X")
>> IsY = (Experiment =?= "Y")
>> IsZ = (Experiment =?= "Z")
>>
>> RANK = $(X)*70 + $(Y)*10 + $(Z)*8
>>
> No, the RANK expression in your condor_config file is only used by the
> STARTD, not the schedd, and it always implies preemption (which I am
> guessing you do not want). If you wish to define priorities / quotas
> across different groups, you will want to use GROUP_QUOTA_xxxx settings
> in the configuration of your condor_negotiator. See the above reference
> to the HTCondor Manual for more info...
>
> Hope the above helps,
> Todd
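
A minimal sketch of the group-quota approach Todd describes, applied to the X/Y/Z pipelines from this thread. The quotas for X and Y are the slot counts Giuseppe gave; the quota for Z is an assumption based on a hypothetical 100-slot pool, since the pool size is not stated in the thread. This is only a starting point, not a complete policy.

```
# Negotiator configuration (sketch; adjust to your pool).
GROUP_NAMES = X, Y, Z

# Static quotas, using the slot counts given in the thread.
GROUP_QUOTA_X = 20
GROUP_QUOTA_Y = 14

# ASSUMPTION: a 100-slot pool, which leaves 66 slots for Z.
GROUP_QUOTA_Z = 66

# Let Z also use slots that X and Y leave idle.
GROUP_ACCEPT_SURPLUS_Z = True

# No negotiator-driven preemption: running jobs are never killed to
# rebalance usage between groups.
PREEMPTION_REQUIREMENTS = False
```

Jobs join a group with accounting_group = X (or Y, Z) in the submit description file. Note that with preemption disabled, an X or Y job that arrives while Z holds surplus slots waits for a Z job to finish rather than evicting it, which is the trade-off described above.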