Re: [HTCondor-users] [Issue][v8.6.11] - Setting the NiceUser parameter to "TRUE" breaks group quotas.
- Date: Fri, 05 Feb 2021 09:22:21 -0600
- From: Greg Thain <gthain@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] [Issue][v8.6.11] - Setting the NiceUser parameter to "TRUE" breaks group quotas.
On 2/5/21 6:25 AM, Maël Lefeuvre wrote:
We're currently struggling to make Group-Quotas and the "nice-user"
feature of HTCondor coexist within our pool (v8.6.11). I've heard this
is a bug that has just recently been fixed in v8.9.9 & v8.9.10, but
I'm nonetheless posting the issue to see if anyone can help us in
designing a workaround.
If you want to stick with 8.6, you might consider a concurrency limits
solution instead of a group quota solution. This requires a static
limit on the number of long jobs, but if that's acceptable, you could
configure something like:
On the central manager:
  LONGJOB_LIMIT = 100
In submit files that want long jobs:
  concurrency_limits = LongJob
And in your hold expressions on the worker nodes, check for jobs that
have 'ConcurrencyLimits = "LongJob"' in them.
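One way that check could look on the worker nodes is sketched below. It is
only an illustration, not a tested 8.6 policy: it reuses the RUNTIME_EXCEEDED
macro name and the one-hour threshold from the original post, and it assumes
the job's ConcurrencyLimits attribute is a plain comma-separated list of limit
names. stringListIMember() compares case-insensitively, and the "=!= True"
test also catches jobs that define no concurrency limits at all.

```
# Sketch only: flag running jobs after 1 hour unless they requested the
# LongJob concurrency limit. Assumes ConcurrencyLimits is a plain list of names.
RUNTIME_EXCEEDED = (stringListIMember("LongJob", TARGET.ConcurrencyLimits) =!= True) && \
                   (TARGET.JobStatus == 2) && \
                   ((time() - TARGET.EnteredCurrentStatus) > (1 * 3600))
```

Whatever policy expressions already reference RUNTIME_EXCEEDED would not need
to change.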
We're trying to create a "dynamic" queuing system on our computer
cluster where our entire pool is designed to limit the runtime of
submitted jobs by default, while still allowing for a limited number
of "unrestricted jobs" to exist (unlimited lifetime).
The solution we designed makes use of accounting groups and group quotas:
- The accounting group "LongJobs" is accessible to all users.
- A dynamic group quota then sets a hard limit on these LongJobs of
~75% of our pool (no surplus allowed).
- Jobs will get held if they exceed a runtime of 1 hour UNLESS the user
is a member of the "LongJobs" group.
GROUP_NAMES = LongJobs
GROUP_QUOTA_DYNAMIC_LongJobs = 0.75
GROUP_ACCEPT_SURPLUS_LongJobs = false
RUNTIME_EXCEEDED = (TARGET.AcctGroup=!="LongJobs" && (JobStatus==2) &&
(time() - EnteredCurrentStatus) >(1*3600))
PREEMPT      = [...]
WANT_SUSPEND = [...]
WANT_HOLD    = [...]
This ensures at least 25% of our pool stays available to run short
jobs, while still giving users the ability to submit (very) long jobs.
  accounting_group = "LongJobs"
  nice_user        = True
...within a submit description file will override the group quota:
the user becomes "nice-user.LongJobs.<user>@<domain>" and is not
recognized as a valid accounting group when looking at "condor_userprio".
Thus, "nice-user.LongJobs.<user>" jobs are able to completely fill our
pool, while retaining the privileged policies that are attached to the
LongJobs group.
An obvious and straightforward solution would be to disable the
nice-user setting entirely, but this feature is very popular with
our users, so keeping it intact remains a priority.
My questions are therefore:
 - Does anyone know of a way to make group quotas and nice-user
policies coexist within a v8.6.11 HTCondor pool?
 - If the answer to the first question is "No", are there viable
alternatives to implement our desired policy while keeping the
nice-user parameter intact?
Any help or suggestions would be greatly appreciated and thanks in
advance to anyone willing to take a closer look at this issue (or who
has kept reading until this point).