Re: [HTCondor-users] Limiting max number of running jobs for a group

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Hola José,

Thanks! I thought of that myself, but I can't think of a proper group hierarchy that achieves what we want and still feels 'natural'.

E.g. if we just have 'cms' (90% quota) and 'other' (10%), what hierarchy do we define? If we want the hard limit to be 50% of the machines, we should have other's parent group get 50% quota, but then, how do we give 50% to 'cms'?

All I can think of is to have two cms groups, one outside and one inside this parent, like this:

GROUP_QUOTA_DYNAMIC_group_cms1 = 0.50
GROUP_QUOTA_DYNAMIC_group_parent = 0.50

GROUP_QUOTA_DYNAMIC_group_parent.cms2 = 0.80
GROUP_QUOTA_DYNAMIC_group_parent.other = 0.20

But then I have the problem of mapping users to cms1 or parent.cms2, and it is also very ugly for the resulting accounting.
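
For reference, a minimal sketch of how this two-level layout might be written out in full, combined with the surplus settings José suggested. The group names are the ones above; the GROUP_NAMES line and the GROUP_ACCEPT_SURPLUS values are assumptions added for illustration, and whether a parent that refuses surplus really caps its children this way should be checked on a test pool:

```
# All groups, including the nested ones, must be listed.
GROUP_NAMES = group_cms1, group_parent, group_parent.cms2, group_parent.other

# Top level: half of the pool each.
GROUP_QUOTA_DYNAMIC_group_cms1 = 0.50
GROUP_QUOTA_DYNAMIC_group_parent = 0.50

# Children: fractions of the parent's quota.
#   parent.cms2  -> 0.80 * 0.50 = 0.40 of the pool
#   parent.other -> 0.20 * 0.50 = 0.10 of the pool
# CMS in total:    0.50 + 0.40 = 0.90 of the pool
GROUP_QUOTA_DYNAMIC_group_parent.cms2 = 0.80
GROUP_QUOTA_DYNAMIC_group_parent.other = 0.20

# Assumption: cms1 may grow beyond its quota when the rest of the pool is idle.
GROUP_ACCEPT_SURPLUS_group_cms1 = True

# Assumption: the parent refuses surplus, so its subtree stays within 50%.
GROUP_ACCEPT_SURPLUS_group_parent = False

# The children may share unused quota with each other inside the parent.
GROUP_ACCEPT_SURPLUS_group_parent.cms2 = True
GROUP_ACCEPT_SURPLUS_group_parent.other = True
```

Under these assumptions 'other' has a fair share of 10% and should not grow past the parent's 50%, while CMS keeps a 90% share in total.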

Antonio

On 09/21/2017 12:03 PM, Jose Caballero wrote:

2017-09-21 11:06 GMT+02:00 Antonio Delgado Peris
<antonio.delgadoperis@xxxxxxxxx>:

Dear all,

This is my first message to the list, so I'll start by introducing myself :-)
I am writing from the CIEMAT institute in Madrid, Spain, where we have recently
installed an HTCondor cluster (with an HTCondor-CE in front of it). We're
still in the testing phase, but should be moving to production fairly soon.
We'll be serving mostly (but not exclusively) the LHC CMS experiment.

So moving to my question... we've defined some hierarchical dynamic group
quotas, with surplus allowed, which is nice because we want minor groups to
be able to use the farm if CMS is not running for some reason. However, we
also would like to limit their expansion, so that they cannot occupy the
whole farm (to speed up CMS taking over the farm when their jobs come back).

Naively, this would be like having both dynamic (soft, fair share-like)
quotas and static (hard) quotas for some groups. But the manual says that if
you define both dynamic and static quotas, the dynamic one is ignored.
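
As a small illustration of that conflict, a sketch in which both kinds of quota are set for the same group. The group name group_other and the value 100 are hypothetical:

```
# Static (hard) quota: an absolute number of slots for the group.
GROUP_QUOTA_group_other = 100

# Dynamic quota: a fraction of the pool (or of the parent group's quota).
# Per the manual, this setting is ignored here because a static quota
# is also defined for the same group.
GROUP_QUOTA_DYNAMIC_group_other = 0.10
```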

I have looked for another parameter like 'MAX_RUNNING_JOBS_PER_GROUP' but
haven't found anything like that. I have also tried to code some logic in
the START expression using 'SubmitterGroupResourcesInUse', but it didn't
work (I think that attribute is only usable for preemption... which we don't
allow).

We have worked around this by simply reserving some named nodes for CMS, but
I was still curious if there might be a less static solution to the
problem--i.e.: not tied to a fixed set of nodes, but just stating a max
number of simultaneous running jobs.

Thanks for any hints. (And sorry if this question has been answered
earlier... I couldn't find it.)

Cheers,

    Antonio

Hola Antonio,

I'm not an expert myself, but I believe that if you use groups and
subgroups, in a scenario where the parent group does not accept
surplus but the children do, then the children can use idle
resources but never go beyond the hard limit imposed by the parent.
Would that work?

Cheers,
Jose
