
Re: [HTCondor-users] Understanding RequestCpus for HTCondor-CE

Hi Max,

In that case, you should use `set_*` with an expression that includes the `OriginalCpus` attribute. `eval_set_*` evaluates its expression in the context of the job route and the incoming job, while `set_*` places the expression directly into the routed job. So if you had the following in your job route:

set_ConcurrencyLimits = strcat("foo", OriginalCpus);

And if the OriginalCpus expression for the original job evaluates to 1, then your resultant job would have the following attributes:

ConcurrencyLimits = strcat("foo", OriginalCpus)
OriginalCpus = 1
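
Applied to the sub-group use case quoted below, a minimal and untested sketch of the same pattern could look like this. The attribute name `SubGroup` is a hypothetical placeholder, not something HTCondor-CE defines; substitute whichever attribute you actually want to set:

```
set_SubGroup = IfThenElse(OriginalCpus == 8, "MC", "SC");
```

Because `set_*` copies the expression as written, the routed job would carry `SubGroup = IfThenElse(OriginalCpus == 8, "MC", "SC")` alongside its own `OriginalCpus`, and the expression would only be evaluated later, against the routed job.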

- Brian

On 6/22/20 9:47 AM, Fischer, Max (SCC) wrote:
Sorry, it seems I did not explain the use-case properly.

We don't want to override OriginalCPUs, RequestCPUs or any of the others. We want to *use* them to set a different attribute, namely accounting groups and concurrency limits. For example, we want to separate 1-core and 8-core jobs into different sub-groups, e.g. ATLAS.SC and ATLAS.MC.
So we're looking for the <x> in ``eval_set_Attribute = IfThenElse(<x> == 8, "MC", "SC")``.

Hope that makes it clearer what we're trying to do.


On 22. Jun 2020, at 16:17, Brian Lin <blin@xxxxxxxxxxx> wrote:

Unfortunately, no. This is possible for intermediate expressions that you're generating yourself with a combination of `set_` and `eval_set_` [1] but not for ones in JOB_ROUTER_DEFAULTS_GENERATED. You can override it in the JRD by setting `eval_set_OriginalCpus` in the route itself, but you'll be doing so at your own risk, as this overrides behavior intrinsic to HTCondor-CE [2].

If I may, why do you want to override `eval_set_OriginalCpus`?

- Brian

[1] Slides 23 and 24 here:


[2] https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#quirks-and-pitfalls

On 6/22/20 9:04 AM, Fischer, Max (SCC) wrote:
Hi Brian,

thanks for the information. That clears up a lot already.

Is the evaluation order inside one group of job router functions well-defined? Say, if we only need the CPU count to compute `eval_set_...` attributes, can we reliably use OriginalCpus set by `eval_set_OriginalCpus` from JOB_ROUTER_DEFAULTS_GENERATED?


On 22. Jun 2020, at 15:37, Brian Lin <blin@xxxxxxxxxxx> wrote:

Hi Max,

You should use `set_default_xcount` in your job routes (https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#number-of-cores-to-request) to set a default RequestCpus for a given route. orig_RequestCpus gets set to the original value of RequestCpus from the remote submitter, and if they don't bother to set this, it will default to 1.

Depending on how you have your CE configured, the order of the job routes may indeed be random, so I suggest specifying the order via JOB_ROUTER_ROUTE_NAMES as documented here: https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#how-jobs-match-to-job-routes. Additionally, it's important to note that the job router ClassAd functions (copy_, set_, etc.) have an order of operations and I've seen this trip up other users when writing routes: https://htcondor-ce.readthedocs.io/en/latest/batch-system-integration/#editing-attributes.

- Brian

On 6/22/20 8:24 AM, Fischer, Max (SCC) wrote:
Hi all,

we've just had an HTCondor-CE Job Router expression behave weirdly because we seem to mishandle the number of CPUs requested. This seems to be wildly different from regular Condor.
Since the evaluation order of a JRE seems random, we sometimes end up with the correct value (evaluated by the CE) and sometimes not (the initial job value).

In short, what *is* the correct job attribute to check the number of requested cpus in HTCondor-CE?

Looking at a known 8-Core job:
It seems job RequestCpus is a dummy. Using it in the JRE leads to the unexpected behaviour depending on evaluation order, and orig_RequestCpus always ends up as 1. Is this correct? This is what we usually use in Condor, so that came as a surprise.

Other candidate attributes are: OriginalCpus, xcount, remote_SMPGranularity (from GlideinWMS?), but none of these seem documented either for HTCondor-CE or HTCondor itself. Can we use them? Should we use them?

