
Re: [HTCondor-users] Limiting max number of running jobs for a group



Another option could be to allow eviction ONLY for those "low priority" non-CMS jobs. Then all the resources could be used when there are no (or only a few) CMS jobs in the queue...
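For instance, something along these lines on the execute nodes might do it (an untested sketch; "group_cms" stands in for the real accounting-group name):

# Rank CMS jobs above all others, so an incoming CMS job can preempt a
# running non-CMS ("low priority") job, but never the other way around.
RANK = (TARGET.AcctGroup =?= "group_cms") * 100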


Quoting Jose Caballero <jcaballero.hep@xxxxxxxxx>:

And no eviction, right?
I see your problem...



On Sep 21, 2017, at 12:57, Antonio Delgado Peris <antonio.delgadoperis@xxxxxxxxx> wrote:

Hola José,

Thanks! I thought of that myself, but I can't think of a proper group hierarchy that achieves what we want and still feels 'natural'.

E.g. if we just have 'cms' (90% quota) and 'other' (10%), what hierarchy do we define? If we want the hard limit to be 50% of the machines, we should have the parent group of 'other' get a 50% quota, but then, how do we give 50% to 'cms'?

All I can think of is to have two cms groups, one outside and one inside this parent, like this:

GROUP_QUOTA_DYNAMIC_group_cms1         = 0.50
GROUP_QUOTA_DYNAMIC_group_parent       = 0.50

GROUP_QUOTA_DYNAMIC_group_parent.cms2  = 0.80
GROUP_QUOTA_DYNAMIC_group_parent.other = 0.20

But then I have the problem of mapping users to cms1 or parent.cms2, and it is also very ugly for the resulting accounting.
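For illustration, the submit-file side of that mapping would be something like this (a sketch; user and group names are the hypothetical ones from above):

# Each CMS submitter has to pick one of the two CMS groups by hand:
accounting_group      = group_cms1       # or group_parent.cms2
accounting_group_user = some_cms_user    # hypothetical user name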

Antonio


On 09/21/2017 12:03 PM, Jose Caballero wrote:
2017-09-21 11:06 GMT+02:00 Antonio Delgado Peris
<antonio.delgadoperis@xxxxxxxxx>:
Dear all,

This is my first message to the list, so I'll start by presenting myself :-) I am writing from the CIEMAT institute in Madrid, Spain, where we have recently
installed an HTCondor cluster (with an HTCondor-CE in front of it). We're
still in the testing phase, but we should be moving to production fairly soon.
We'll be serving mostly (but not exclusively) the LHC CMS experiment.

So, moving on to my question: we've defined some hierarchical dynamic group
quotas, with surplus allowed, which is nice because we want minor groups to
be able to use the farm if CMS is not running for some reason. However, we
would also like to limit their expansion, so that they cannot occupy the
whole farm (to speed up CMS taking the farm back when its jobs return).
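For illustration, the setup is along these lines (a simplified sketch; the real group names and quotas may differ):

GROUP_NAMES = group_cms, group_other
GROUP_QUOTA_DYNAMIC_group_cms   = 0.90
GROUP_QUOTA_DYNAMIC_group_other = 0.10
# Surplus sharing lets the minor groups grow into CMS's share when idle.
GROUP_ACCEPT_SURPLUS = True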

Naively, this would be like having both dynamic (soft, fair share-like)
quotas and static (hard) quotas for some groups. But the manual says that if
you define both dynamic and static quotas, the dynamic one is ignored.
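That is, what we would naively write is something like the following (a sketch; the static cap of 50 slots is an invented number), where, per the manual, the dynamic line would be ignored:

GROUP_QUOTA_group_other         = 50    # static (hard) cap, in slots
GROUP_QUOTA_DYNAMIC_group_other = 0.10  # dynamic (soft) share: ignored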

I have looked for another parameter like 'MAX_RUNNING_JOBS_PER_GROUP' but
haven't found anything of the sort. I have also tried to code some logic into
the START expression using 'SubmitterGroupResourcesInUse', but it didn't
work (I think that attribute is only usable for preemption, which we don't
allow).
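The attempted expression was roughly this (a sketch; the threshold of 100 is an arbitrary example, and, as said, SubmitterGroupResourcesInUse seems to be defined only while the negotiator evaluates preemption expressions, not in START):

# Always accept CMS jobs; accept others only while their group is small.
START = (TARGET.AcctGroup =?= "group_cms") || (SubmitterGroupResourcesInUse < 100)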

We have solved the situation by just reserving some named nodes for CMS, but
I was still curious whether there might be a less static solution to the
problem; i.e., one not tied to a fixed set of nodes, but simply stating a
maximum number of simultaneously running jobs.
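(The workaround amounts to something like this in the reserved nodes' local configuration; a sketch, with "group_cms" again as a placeholder group name:)

# Reserved nodes only ever start CMS jobs.
START = (TARGET.AcctGroup =?= "group_cms")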

Thanks for any hints. (And sorry if this question has been answered
before... I couldn't find it.)

Cheers,

    Antonio


Hola Antonio,

I'm not an expert myself, but I believe that if you use groups and
subgroups, in a scenario where the parent group does not allow surplus
but the children do, then you allow the children to use idle resources
but never beyond the hard limit imposed by the parent.
Would that work?
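If it helps, I imagine something like this (untested; group names and numbers are invented):

GROUP_NAMES = group_all, group_all.cms, group_all.other
# The parent gets a hard ceiling: with surplus off, its children can
# never spill beyond its 50% of the pool.
GROUP_QUOTA_DYNAMIC_group_all        = 0.50
GROUP_ACCEPT_SURPLUS_group_all       = False
# The children split the parent's allocation and may borrow idle
# resources from each other, within the parent's limit.
GROUP_QUOTA_DYNAMIC_group_all.cms    = 0.80
GROUP_QUOTA_DYNAMIC_group_all.other  = 0.20
GROUP_ACCEPT_SURPLUS_group_all.cms   = True
GROUP_ACCEPT_SURPLUS_group_all.other = True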

Cheers,
Jose
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Antonio Dorta
Servicios Informáticos Específicos (SIE)
Investigación y Enseñanza
Instituto de Astrofísica de Canarias (IAC)
C/ Vía Láctea, s/n. 38205 - La Laguna, Santa Cruz de Tenerife
Office: 1124. Phone: 922 60 5278. email: adorta@xxxxxx
Supercomputing at IAC: http://www.iac.es/sieinvens/SINFIN/Main/supercomputing.php