
Re: [HTCondor-users] GROUP_PRIO_FACTOR and condor_userprio



Hi Greg,

thanks for the detailed answer.

To answer your question: I want the jobs submitted by the users of a given group (cms.admin) to run as soon as possible, ideally without touching the quota of the parent group (cms) or of other groups. The jobs are very short, so I guess it is not so much a matter of group quota: the quota used by these jobs is infinitesimal compared to other categories.
I just want them scheduled in the next free slot.

That said, with the standard setup and normal activity, my jobs are already being scheduled with a pretty short delay. So this is more about understanding how this works and how I can make it more robust against peaks of activity.


Thanks!
Cheers,
Andrea

On 2/20/20 19:07, Greg Thain wrote:
On 2/20/20 7:53 AM, Beyer, Christoph wrote:
Hi,

I think when it comes to user_prios a lower prio-number/factor will give you more priority:


Yes, this is correct -- for user prio, lower is better, and 0.5 is the best you can get.


GROUP_PRIO_FACTOR is confusing. In HTCondor, when we have accounting groups configured, the prioritization of jobs happens in two very different stages. In the first stage, all jobs are grouped together by accounting group. The system looks at each accounting group, and calculates how many idle jobs in that group are requesting slots, and how many jobs in the group are running. Each group has a target quota that should not be exceeded. If the group is below that quota, HTCondor allocates resources to the group based on the quota and available resources.
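A rough sketch of that first stage, written as a toy Python model rather than HTCondor's actual negotiator code: each group may grow up to its quota, limited by how many idle jobs it has and how many slots are free. The function name `allocate_to_groups`, the "chemistry" group, and the alphabetical visiting order are hypothetical simplifications; the real negotiator also deals with surplus sharing and group ordering, which this ignores.

```python
def allocate_to_groups(quotas, running, idle, free_slots):
    """Toy first-stage model: grant free slots to accounting groups below quota.

    Hypothetical simplification: groups are visited alphabetically, and a
    group never receives more than (quota - running) or its idle job count.
    """
    grants = {}
    for group in sorted(quotas):
        headroom = max(0, quotas[group] - running.get(group, 0))
        grant = min(headroom, idle.get(group, 0), free_slots)
        grants[group] = grant
        free_slots -= grant
    return grants


# The physics group from the example below is 90 slots under its quota.
grants = allocate_to_groups(
    quotas={"physics": 100, "chemistry": 50},
    running={"physics": 10, "chemistry": 45},
    idle={"physics": 200, "chemistry": 3},
    free_slots=120,
)
print(grants)  # {'chemistry': 3, 'physics': 90}
```

Which physics submitters receive those 90 slots is deliberately left out here; that is decided in the second stage.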


Then, in the second stage, HTCondor tries to assign slots to each user with jobs within that accounting group. So, if alice, bob and charlie all have jobs in the PHYSICS accounting group, and the physics group should get 90 more slots, HTCondor tries to assign slots "fairly" to the three. Unlike the first stage, this is done based on the condor "user prio". HTCondor keeps track of the historical usage of each submitter within the group, and tries to give out an even number of slots over time. So, if alice has used millions of hours in the last few days, and bob has used none, bob will probably get most of the idle slots.
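The "historical usage" here is what the HTCondor manual calls the real user priority (RUP): an exponentially decaying average of the slots a submitter has been using, with a half-life set by PRIORITY_HALFLIFE (one day by default). The sketch below assumes that update rule, pi(t) = beta * pi(t - dt) + (1 - beta) * rho(t) with beta = 0.5^(dt / halflife), plus the 0.5 floor mentioned above. The function `update_rup` and the hourly update interval are illustrative, not the negotiator's actual bookkeeping.

```python
HALF_LIFE_S = 86400.0  # assumption: the default PRIORITY_HALFLIFE of one day
MIN_RUP = 0.5          # best (lowest) possible user priority


def update_rup(rup, slots_in_use, dt_s, half_life_s=HALF_LIFE_S):
    """Decay the old priority toward current usage over an interval of dt_s seconds."""
    beta = 0.5 ** (dt_s / half_life_s)
    return max(MIN_RUP, beta * rup + (1 - beta) * slots_in_use)


# alice keeps 100 slots busy for three days; bob runs nothing.
alice = bob = MIN_RUP
for _ in range(72):  # hourly updates
    alice = update_rup(alice, slots_in_use=100, dt_s=3600)
    bob = update_rup(bob, slots_in_use=0, dt_s=3600)

print(round(alice, 2), bob)  # 87.56 0.5
```

Since lower is better, bob's 0.5 beats alice's roughly 87.6, which is why bob would be offered most of the group's idle slots.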

Now, maybe the administrator doesn't want to be fair. Perhaps Alice is more important than Bob. HTCondor has a priority factor setting that allows a pool administrator to say that Alice should get twice as many jobs as a "fair" share would give her. This is usually set with the "condor_userprio" command line tool. However, GROUP_PRIO_FACTOR can set the initial value of the priority factor when a user arrives in the system for the first time. Changing GROUP_PRIO_FACTOR does not affect anyone who has already submitted a job. After the entry appears in condor_userprio, you can change the factor with condor_userprio -setfactor.
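The point that GROUP_PRIO_FACTOR only seeds new entries can be modeled in a few lines. This is a hypothetical Python sketch: the `PriorityFactors` class and its methods are invented for illustration, `set_factor` merely plays the role of `condor_userprio -setfactor`, and the factor values are made up.

```python
class PriorityFactors:
    """Hypothetical model of per-submitter priority factors."""

    def __init__(self, default_factor, group_factors):
        self.default_factor = default_factor       # plays the role of DEFAULT_PRIO_FACTOR
        self.group_factors = dict(group_factors)   # plays the role of GROUP_PRIO_FACTOR
        self.factors = {}                          # submitters already known

    def factor_for(self, submitter, group=None):
        # Only the first appearance picks up the configured initial value.
        if submitter not in self.factors:
            self.factors[submitter] = self.group_factors.get(group, self.default_factor)
        return self.factors[submitter]

    def set_factor(self, submitter, factor):
        # Stand-in for `condor_userprio -setfactor`: overrides an existing entry.
        self.factors[submitter] = factor


pf = PriorityFactors(default_factor=1000.0, group_factors={"physics": 1000.0})
pf.factor_for("bob", "physics")            # bob's entry is created at 1000.0
pf.group_factors["physics"] = 500.0        # the group setting is changed afterwards
print(pf.factor_for("bob", "physics"))     # 1000.0: bob keeps his old factor
print(pf.factor_for("alice", "physics"))   # 500.0: a newcomer gets the new value
pf.set_factor("bob", 500.0)
print(pf.factor_for("bob", "physics"))     # 500.0 after an explicit override
```

Lower factors are better, so giving alice 500.0 while bob stays at 1000.0 is how an administrator would favour her.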


So, the question is -- do you want to change the way slots are allocated to groups as a whole, or do you want to change the way that submitters within a group are allocated their part of their group's quota? GROUP_PRIO_FACTOR can help a little bit with the latter, but does not impact the first part.


-greg


3.6.2 Effective User Priority (EUP)
The effective user priority (EUP) of a user is used to determine how many resources that user may receive. The EUP is linearly related to the RUP by a priority factor which may be defined on a per-user basis. Unless otherwise configured, the initial priority factor for all users as they first submit jobs is set by the configuration variable DEFAULT_PRIO_FACTOR, and defaults to the value 1000.0. If desired, the priority factors of specific users can be increased using condor_userprio, so that other users are served preferentially.
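Written out, the linear relation is a product, and the inverse-ratio rule in the next paragraph follows from it. The symbols f_u (priority factor of user u), n_u (slots user u receives) and N (slots being shared) are notation introduced here, not the manual's, and the numbers in the second line are a made-up example.

```latex
\mathrm{EUP}_u = f_u \cdot \mathrm{RUP}_u,
\qquad
n_u = N \cdot \frac{1/\mathrm{EUP}_u}{\sum_v 1/\mathrm{EUP}_v}

f_A = 500,\; f_B = 1000,\; \mathrm{RUP}_A = \mathrm{RUP}_B = 10
\;\Rightarrow\; \mathrm{EUP}_A = 5000,\; \mathrm{EUP}_B = 10000
\;\Rightarrow\; \frac{n_A}{n_B} = \frac{\mathrm{EUP}_B}{\mathrm{EUP}_A} = 2
```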
The number of resources that a user may receive is inversely related to the ratio between the EUPs of submitting users. Therefore user A with EUP=5 will receive twice as many resources as user B with EUP=10 and four times as many resources as user C with EUP=20. However, if A does not use the full number of resources that A may be given, the available resources are repartitioned and distributed among remaining users according to the inverse ratio rule.
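The inverse-ratio rule, including the repartitioning of slots a user does not need, can be sketched as a small water-filling loop. This is an illustrative Python model that assumes every user's EUP and demand are known up front; `split_by_eup` is a hypothetical name, and the negotiator's real matchmaking is more involved. Exact fractions avoid rounding noise in the shares.

```python
from fractions import Fraction


def split_by_eup(eups, demands, total_slots):
    """Share total_slots in inverse proportion to each user's EUP.

    A missing demand means "unlimited". Users who need less than their
    share get exactly their demand, and the leftover is repartitioned
    among the others by the same inverse-ratio rule.
    """
    alloc = {}
    remaining = Fraction(total_slots)
    active = set(eups)
    while active:
        weight_sum = sum(1 / Fraction(eups[u]) for u in active)
        shares = {u: remaining * (1 / Fraction(eups[u])) / weight_sum for u in active}
        capped = {u for u in active
                  if demands.get(u) is not None and demands[u] <= shares[u]}
        if not capped:
            alloc.update(shares)  # everyone left can use their full share
            break
        for u in capped:
            alloc[u] = Fraction(demands[u])
            remaining -= demands[u]
        active -= capped
    return alloc


def show(alloc):
    return {user: str(n) for user, n in sorted(alloc.items())}


eups = {"A": 5, "B": 10, "C": 20}
print(show(split_by_eup(eups, {}, 70)))         # {'A': '40', 'B': '20', 'C': '10'}
print(show(split_by_eup(eups, {"A": 10}, 70)))  # {'A': '10', 'B': '40', 'C': '20'}
```

In the second call A needs only 10 slots, so the remaining 60 are split 2:1 between B and C, matching the repartitioning described above.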

Best
Christoph


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/