
Re: [HTCondor-users] Understanding RequestCpus for HTCondor-CE



Sorry, it seems I did not explain the use-case properly.

We don't want to override OriginalCPUs, RequestCPUs or any of the others. We want to *use* them to set different attributes, namely accounting groups and concurrency limits. For example, we want to separate 1-core and 8-core jobs into different sub-groups, e.g. ATLAS.SC and ATLAS.MC.
So we're looking for the <x> in ``eval_set_Attribute = IfThenElse(<x> == 8, "MC", "SC")``.

Hope that makes it clearer what we're trying to do.

Cheers,
Max

On 22. Jun 2020, at 16:17, Brian Lin <blin@xxxxxxxxxxx> wrote:

Unfortunately, no. This is possible for intermediate expressions that you're generating yourself with a combination of set_ and eval_set_ [1], but not for ones in JOB_ROUTER_DEFAULTS_GENERATED. You can override it in the JRD by setting `eval_set_OriginalCpus` in the route itself, but you'll be doing so at your own risk, as this overrides behavior intrinsic to HTCondor-CE [2].

If I may, why do you want to override `eval_set_OriginalCpus`?

- Brian

[1] Slides 23 and 24 here:

https://indico.cern.ch/event/817927/contributions/3570557/attachments/1915512/3166681/2019-09-26.htcondorce-config.pdf

[2] https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#quirks-and-pitfalls

On 6/22/20 9:04 AM, Fischer, Max (SCC) wrote:
Hi Brian,

thanks for the information. That clears up a lot already.

Is the evaluation order inside one group of job router functions well-defined? Say, if we only need the CPU count to compute `eval_set_…` attributes, can we reliably use OriginalCpus set by `eval_set_OriginalCpus` from JOB_ROUTER_DEFAULTS_GENERATED?

Cheers,
Max

On 22. Jun 2020, at 15:37, Brian Lin <blin@xxxxxxxxxxx> wrote:

Hi Max,

You should use `set_default_xcount` in a job route (https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#number-of-cores-to-request) to set the default RequestCpus for that route. orig_RequestCpus is set to the original value of RequestCpus from the remote submitter; if the submitter doesn't set RequestCpus, it defaults to 1.
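
As a minimal sketch of the above (not part of the original message), a route that sets a default core count might look roughly like this. The route name and the `TargetUniverse` value are illustrative assumptions; only `set_default_xcount` comes from the text above.

```
JOB_ROUTER_ENTRIES @=jre
[
  /* Hypothetical route name, chosen for illustration */
  name = "Local_Condor";
  /* Assumption: jobs are routed to a local HTCondor pool (vanilla universe) */
  TargetUniverse = 5;
  /* Default RequestCpus for jobs on this route */
  set_default_xcount = 8;
]
@jre
```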

Depending on how you have your CE configured, the order of the job routes may indeed be random, so I suggest specifying the order via JOB_ROUTER_ROUTE_NAMES as documented here: https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#how-jobs-match-to-job-routes. Additionally, it's important to note that the job router ClassAd functions (copy_, set_, etc.) have an order of operations and I've seen this trip up other users when writing routes: https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#editing-attributes.
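
To illustrate the order-of-operations point, here is a hedged sketch, assuming the order described in the linked documentation (copy_, then delete_, then set_, then eval_set_), which applies regardless of the order in which the lines are written in the route. The route name and the attributes `Foo`, `OldFoo` and `FooPlusOne` are hypothetical.

```
JOB_ROUTER_ENTRIES @=jre
[
  name = "Order_Example";
  TargetUniverse = 5;
  /* Written first but applied last: evaluated after set_Foo below,
     so it sees the value that set_Foo wrote */
  eval_set_FooPlusOne = Foo + 1;
  /* Applied after copy_: overwrites Foo on the routed job */
  set_Foo = 2;
  /* Applied first: saves the incoming value of Foo as OldFoo */
  copy_Foo = "OldFoo";
]
@jre
```

The practical consequence is that an `eval_set_` expression cannot be made to run before a `set_` or `copy_` line simply by moving it higher up in the route.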

- Brian

On 6/22/20 8:24 AM, Fischer, Max (SCC) wrote:
Hi all,

we've just had an HTCondor-CE Job Router expression behave weirdly because we seem to mishandle the number of CPUs requested. This seems to be wildly different from regular Condor.
Since the evaluation order of a JRE seems random, we sometimes end up with the correct value (evaluated by the CE) and sometimes not (the initial job value).

In short, what *is* the correct job attribute to check the number of requested cpus in HTCondor-CE?

Looking at a known 8-Core job:
It seems job RequestCpus is a dummy. Using it in the JRE leads to the unexpected behaviour depending on evaluation order, and orig_RequestCpus always ends up as 1. Is this correct? This is what we usually use in Condor, so that came as a surprise.

Other candidate attributes are: OriginalCpus, xcount, remote_SMPGranularity (from GlideinWMS?), but none of these seem documented either for HTCondor-CE or HTCondor itself. Can we use them? Should we use them?

Cheers,
Max
