
Re: [HTCondor-users] condor_userprio



Hi Collin,

I was arriving at the same conclusion, that I can ignore group priorities and beef up GROUP_SORT_EXPR, but your detailed explanation is very helpful.

To simplify the configuration, I wonder if removing those priorities altogether and leaving only a default one would have any negative consequences.

cheers
alessandra

On Mon, 27 Aug 2018, 22:49 Collin Mehring, <collin.mehring@xxxxxxxxxxxxxx> wrote:
Hi Alessandra,

The groups themselves do not really have priority factors; they simply have a default value that is used by new users submitting to that group for the first time. I'm not sure why your condor_userprio output is even displaying values for them in the 'Priority Factor' column. It's blank in our output, which more accurately represents things:

% condor_userprio -allusers -hierarchical
Group                       Config      Use  Effective  Priority     Res   Total Usage  Time Since  Requested
 User Name                   Quota  Surplus   Priority    Factor  In Use  (wghted-hrs)  Last Usage  Resources
-------------------------- ------- -------- ---------- --------- ------- ------------- ----------- ----------
prod                          0.95  Regroup                            0          0.00  17770+20:4          0
prod.btr                      0.00  Regroup                            0          0.00  17770+20:4          0
prod.btr.btr_anim             6.00  Regroup                            0         40.76     3+00:14          0
 user@xxxxxxxxxxxxxxxxxxx                       500.00   1000.00       0         40.76     3+00:14

Note that only the actual user in the group has a priority factor.

This is important because it means changing the defaults for new users in the config files has no effect on existing users in that group (even with a reconfig/restart). You can still change the priority factor for existing users with condor_userprio -setfactor <user> <val>. That said, I think the effective priority the factor contributes to is being used differently than you think.
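To make the "default for new users only" behaviour concrete, here is a toy Python sketch. It is not HTCondor's accountant code; the `Accountant` class, its method names, and the user names are all made up for illustration. It only models the rule described above: the group default is copied to a user on first submission and is never consulted again for that user.

```python
from dataclasses import dataclass, field


@dataclass
class Accountant:
    """Toy model of per-user priority factors (hypothetical, not HTCondor code)."""
    group_defaults: dict  # group name -> default factor for NEW users
    user_factors: dict = field(default_factory=dict)

    def factor_for(self, user, group):
        # The group default is only read the first time a user is seen.
        if user not in self.user_factors:
            self.user_factors[user] = self.group_defaults[group]
        return self.user_factors[user]

    def set_factor(self, user, value):
        # Plays the role of: condor_userprio -setfactor <user> <val>
        self.user_factors[user] = value


acct = Accountant(group_defaults={"group_atlas": 10000.0})
first = acct.factor_for("jenny", "group_atlas")      # stored on first submission
acct.group_defaults["group_atlas"] = 10.0            # config change plus reconfig
after_reconfig = acct.factor_for("jenny", "group_atlas")
newcomer = acct.factor_for("tom", "group_atlas")     # new user picks up the new default
acct.set_factor("jenny", 10.0)                       # explicit per-user change
after_setfactor = acct.factor_for("jenny", "group_atlas")
```

In this model jenny keeps her old factor after the config change, tom gets the new default, and only the explicit `set_factor` call changes jenny's value.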

When a new negotiation cycle starts it will use GROUP_SORT_EXPR to decide the order in which it considers groups. Since you have that defined to a non-default value in 11_fairshares, it looks like your pool will always consider group_atlas.admin, group_northgrid.manchester, and group_skatelescope first, followed by any group with "multicore" in its name in starvation order, then any other group in starvation order, and finally any user not in a group. During negotiation with any of these schedds it will order the users in the group using their effective priorities. The group ordering happens first and is therefore more important. During this round of negotiation, because you have GROUP_ACCEPT_SURPLUS = true, any surplus allocation from a group is redistributed according to your group hierarchy.
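The tiered ordering described above can be sketched in Python as a sort key. This is only an illustration of the ordering, not the actual GROUP_SORT_EXPR from 11_fairshares. The usage and quota numbers, the names `group_cms` and `group_lhcb.multicore`, and the choice to break ties inside the pinned tier by starvation ratio are all assumptions.

```python
# Groups that are always considered first (names taken from the thread).
PINNED = ("group_atlas.admin", "group_northgrid.manchester", "group_skatelescope")


def starvation_ratio(group):
    # Usage as a fraction of quota; a smaller value means a more starved group.
    if group["quota"] <= 0:
        return float("inf")
    return group["usage"] / group["quota"]


def sort_key(group):
    name = group["name"]
    if name in PINNED:
        tier = 0
    elif name == "<none>":       # users that are not in any group go last
        tier = 3
    elif "multicore" in name:
        tier = 1
    else:
        tier = 2
    # Assumption: within a tier, order by starvation ratio.
    return (tier, starvation_ratio(group))


groups = [  # hypothetical usage/quota values
    {"name": "group_cms", "usage": 10, "quota": 100},
    {"name": "group_atlas.multicore", "usage": 80, "quota": 100},
    {"name": "<none>", "usage": 0, "quota": 0},
    {"name": "group_skatelescope", "usage": 50, "quota": 100},
    {"name": "group_lhcb.multicore", "usage": 20, "quota": 100},
    {"name": "group_atlas.admin", "usage": 1, "quota": 10},
]

order = [g["name"] for g in sorted(groups, key=sort_key)]
```

Note that `group_cms` is the most starved group in this made-up data, yet it is still considered after every pinned and multicore group, which is the sense in which the group ordering dominates.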

In the same file you also have GROUP_AUTOREGROUP = true, which complicates things a little more. Because of this, after each round of negotiating every group it will do a 'Regroup' negotiation round. During this 'Regroup' round it will negotiate every user a second time outside of their group, strictly ordered by their effective priority. After the regroup round it repeats the whole process a number of times determined by GROUP_QUOTA_MAX_ALLOCATION_ROUNDS. In my experience the subsequent rounds matter much less when AUTOREGROUP is enabled, because it tends to schedule almost all of the remaining resources during the 'Regroup' round if able. This round is why the different group priority factors matter to you, as they are loosely enforcing your group ordering when jobs are being considered outside of their groups.
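A small Python sketch of why the priority factors only bite in the 'Regroup' round. It relies on effective priority being the real priority multiplied by the priority factor, with a lower value being better. The users, groups, and numbers are hypothetical, and the two functions only model the order in which users are considered, not the actual matchmaking.

```python
users = [
    # (user, group, real_priority, priority_factor), all values hypothetical
    ("alice", "group_a", 2.0, 10.0),
    ("bob", "group_a", 0.5, 10.0),
    ("carol", "group_b", 0.5, 1000.0),
]


def effective_priority(real_priority, factor):
    # Lower effective priority is better.
    return real_priority * factor


def group_round(users, group_order):
    # Group position comes first; effective priority only orders users
    # inside the same group.
    ranked = sorted(
        users,
        key=lambda u: (group_order.index(u[1]), effective_priority(u[2], u[3])),
    )
    return [u[0] for u in ranked]


def regroup_round(users):
    # Groups are ignored; users are ordered strictly by effective priority.
    ranked = sorted(users, key=lambda u: effective_priority(u[2], u[3]))
    return [u[0] for u in ranked]


in_groups = group_round(users, group_order=["group_b", "group_a"])
regrouped = regroup_round(users)
```

Here carol is considered first in the group round because group_b sorts first, but her factor of 1000 pushes her to the back in the regroup round.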

I would suggest trying GROUP_AUTOREGROUP = false, and instead adding a more specific ordering to GROUP_SORT_EXPR. This would make it easier to change the ordering in the future, and you could then largely ignore the group priority factors.

Hope this helps,
Collin

On Mon, Aug 27, 2018 at 6:26 AM, Alessandra <afortiorama@xxxxxxxxx> wrote:
Hi Todd,

The original configuration comes from RAL, and it does have a GROUP_PRIO_FACTOR and a GROUP_QUOTA_DYNAMIC entry for each group and subgroup. I started to look into it because the original algorithm mapped VO names to group names, and some of the VO names have dots in them, which clearly makes a mess. After remapping the VO names to something without dots I have a clearer picture in condor_userprio, but the numbers are still wrong.

> All setting GROUP_PRIO_FACTOR_<groupname> does is
> set the default priority factor for a new user that submits into that
> group for the first time.

I don't understand this. Isn't it an initial condition for each job, which then maybe gets dumped by excessive usage? Or does this number really apply only to the very first job? In any case, shouldn't the tool report the value as configured, and not some number that crept in at some point? I thought this was the meaning of the columns: "priority factor" as configured, and "effective priority factor" after the calculations are applied.

In any case, the point is to configure quotas and priorities for each group. I don't mind whether they are hierarchical or not (though ATLAS still asks for it), but there are groups with small quotas and a large priority, and there are also groups that will occasionally have their priority bumped up (or reduced) quite a lot for a period. From what you are telling me, this is not possible in condor.

I've attached my new configuration files, as they might help in understanding. In this configuration I also removed the Owner from AcctGroup, because they are pool users anyway and I have no say in whether user jenny has priority over user tom; it also makes things simpler to follow. I left it in ConcurrencyLimits for now, however.

thanks

cheers
alessandra

On Mon, 27 Aug 2018 at 12:57, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 8/26/2018 8:40 AM, Alessandra wrote:
> Hi,
>
> I'm trying to adjust the fair shares and priorities in condor. One of
> the things that I don't understand is why the condor_userprio tool
> doesn't report the configured priorities but seems to report random numbers.
>
> for example I have condor_userprio reporting
>
> group_atlas.pilot = 10000
> group_atlas.production = 10000
>
> But the priorities configured are
>
> GROUP_PRIO_FACTOR_group_atlas = 10.0
> GROUP_PRIO_FACTOR_group_atlas.pilot = 10.0
> GROUP_PRIO_FACTOR_group_atlas.production = 10.0
>
> it really should be 10 even if I didn't explicitly declare them
> because the group_atlas is 10. 10k is the default and 1000 I'm not sure
> where it comes from.

Hi Alessandra,

Do the below groups have a group quota? In other words, do you have in
your config a GROUP_NAMES entry and for each group a GROUP_QUOTA_* entry ?

If so, please be aware that these hierarchical groups themselves do not
have a priority factor. The historical fair share scheduling in HTCondor
is only applied to users, not to groups, so all the parameters dealing
with historical fair share scheduling (like real user priority,
effective user priority, and priority factor) have no meaning with
respect to groups. All setting GROUP_PRIO_FACTOR_<groupname> does is
set the default priority factor for a new user that submits into that
group for the first time. I am guessing this is not what you thought it
did.

Instead of historical fair share, the group scheduling is controlled
solely by GROUP_SORT_EXPR, which defaults to "starvation group order."
That is, the group whose current usage is the smallest fraction of its
quota goes first, then the next, and so on.
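As a rough illustration of the default "starvation group order" described above, the following Python sketch sorts groups by current usage as a fraction of quota. The group names and numbers are hypothetical, and it assumes every quota is greater than zero.

```python
def starvation_order(groups):
    """groups maps name -> (usage, quota); returns names, most starved first."""
    # Assumes quota > 0 for every group.
    return sorted(groups, key=lambda name: groups[name][0] / groups[name][1])


groups = {
    "group_a": (30, 100),  # using 30% of its quota
    "group_b": (5, 50),    # using 10% of its quota
    "group_c": (40, 40),   # using 100% of its quota
}

order = starvation_order(groups)
```

With these numbers group_b goes first, then group_a, then group_c, regardless of any priority factor.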

I realize I did not directly answer your question(s) below, but given the
above information, I am guessing that you really didn't mean to apply a
priority factor to groups in the first place. Perhaps if you told us
your desired scheduling policy someone could better assist...

Hope the above helps,
regards,
Todd



--
Well you'll still need a tray. (Eddie Izzard)

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Collin Mehring | PE-JoSE - Software Engineer
