
Re: [HTCondor-users] [Issue][v8.6.11] - Setting the NiceUser parameter to "TRUE" breaks group quotas.




On 2/5/21 6:25 AM, Maël Lefeuvre wrote:
Greetings,

We're currently struggling to make Group-Quotas and the "nice-user" feature of HTCondor coexist within our pool (v8.6.11). I've heard this is a bug that has just recently been fixed in v8.9.9 & v8.9.10, but I'm nonetheless posting the issue to see if anyone can help us in designing a workaround.


If you want to stick with 8.6, you might consider a concurrency limits solution instead of a group quota solution. This requires a static limit on the number of long jobs, but if that's acceptable, you could configure something like:

On the central manager

LONGJOB_LIMIT = 100

In submit files that want long jobs

concurrency_limits = LongJob


And in your hold expressions on the worker nodes, check for jobs that have 'ConcurrencyLimits = "LongJob"' in them.
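
As a rough sketch of that last step, assuming the runtime check lives in a RUNTIME_EXCEEDED macro like the one in the original message, the accounting-group test could be swapped for a test on the job's ConcurrencyLimits attribute. IS_LONG_JOB is a hypothetical macro name, and the case-insensitive list match is there because the exact case of the stored attribute value is an assumption here; the PREEMPT / WANT_HOLD expressions that consume RUNTIME_EXCEEDED would stay as they are.

```
# Sketch only; IS_LONG_JOB is a hypothetical macro name.
# True when the job's ConcurrencyLimits list contains "longjob"
# (stringListIMember compares case-insensitively; jobs without the
# attribute are treated as short jobs).
IS_LONG_JOB = ifThenElse(isUndefined(TARGET.ConcurrencyLimits), False, stringListIMember("longjob", TARGET.ConcurrencyLimits))

# Same one-hour runtime test as before, keyed on the concurrency limit
# instead of the accounting group.
RUNTIME_EXCEEDED = ( !($(IS_LONG_JOB)) && (JobStatus == 2) && (time() - EnteredCurrentStatus) > (1 * 3600) )
```

Note that this plain list match would not recognize a job that requests the limit with an explicit count (for example "LongJob:2"), so it only fits the simple submit line shown above.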

-greg




CONTEXT:

We're trying to create a "dynamic" queuing system on our computer cluster, where the entire pool limits the runtime of submitted jobs by default while still allowing a limited number of "unrestricted" jobs (unlimited lifetime) to exist.

The solution we designed makes use of accounting groups and group quotas:

- The accounting group "LongJobs" is accessible to all users.
- A dynamic group quota then sets a hard limit on these LongJobs of ~75% of our pool (no surplus allowed).
- Jobs will get held if they exceed a runtime of 1 hour UNLESS the user is a member of the "LongJobs" group.

-----------------------
GROUP_NAMES = LongJobs
GROUP_QUOTA_DYNAMIC_LongJobs = 0.75
GROUP_ACCEPT_SURPLUS_LongJobs = false

RUNTIME_EXCEEDED = (TARGET.AcctGroup =!= "LongJobs" && (JobStatus == 2) && (time() - EnteredCurrentStatus) > (1*3600))
PREEMPT      = [...]
WANT_SUSPEND = [...]
WANT_HOLD    = [...]
-----------------------

This ensures at least 25% of our pool stays available to run short jobs, while still giving users the ability to submit (very) long jobs.


PROBLEM:

Setting...
   accounting_group = "LongJobs"
   nice_user        = True

...within a submit description file will override the group quota: the user becomes "nice-user.LongJobs.<user>@<domain>", which is not recognized as belonging to a valid accounting group when looking at "condor_userprio".

Thus, "nice-user.LongJobs.<user>" jobs are able to completely fill our pool, while retaining the privileged policies that are attached to the "LongJobs" group...

An obvious and straightforward solution would be to disable the nice-user setting entirely, but this feature is very popular with our users, so keeping it intact remains a priority.


My questions are therefore:

[1] - Does anyone know of a way to make group quotas and nice-user policies coexist within a v8.6.11 HTCondor pool?
[2] - If the answer to the first question is "No", are there viable alternatives to implement our desired policy while keeping the nice-user parameter intact?


Any help or suggestions would be greatly appreciated and thanks in advance to anyone willing to take a closer look at this issue (or has kept reading 'till this point).

Cheers,

Maël


