
[HTCondor-users] [Issue][v8.6.11] - Setting the NiceUser parameter to "TRUE" breaks group quotas.



Greetings,

We're currently struggling to make group quotas and HTCondor's "nice-user" feature coexist in our pool (v8.6.11). I've heard this is a bug that was only recently fixed in v8.9.9 and v8.9.10, but I'm posting the issue nonetheless in case anyone can help us design a workaround.


CONTEXT:

We're trying to create a "dynamic" queuing system on our computer cluster, where the pool limits the runtime of submitted jobs by default while still allowing a limited number of "unrestricted" jobs (unlimited lifetime) to run.

The solution we designed uses accounting groups and group quotas:

- The accounting group "LongJobs" is accessible to all users.
- A dynamic group quota sets a hard limit on "LongJobs" of ~75% of our pool (no surplus allowed).
- Jobs are held if they exceed a runtime of 1 hour, UNLESS they were submitted under the "LongJobs" accounting group.

-----------------------
GROUP_NAMES = LongJobs
GROUP_QUOTA_DYNAMIC_LongJobs = 0.75
GROUP_ACCEPT_SURPLUS_LongJobs = false

RUNTIME_EXCEEDED = (TARGET.AcctGroup=!="LongJobs" && (JobStatus==2) && (time() - EnteredCurrentStatus) >(1*3600))
PREEMPT      = [...]
WANT_SUSPEND = [...]
WANT_HOLD    = [...]
-----------------------

This ensures at least 25% of our pool stays available to run short jobs, while still giving users the ability to submit (very) long jobs.
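To make the intended behaviour concrete, here is a small Python model of the policy above. It is only a sketch under our own assumptions: the names `Job`, `runtime_exceeded` and `long_jobs_cap` are hypothetical, and the cap is a plain floor(fraction x slots) approximation that ignores how the negotiator actually weights slots and rounds dynamic quotas.

```python
import math
from dataclasses import dataclass
from typing import Optional

RUNNING = 2           # JobStatus value HTCondor uses for "Running"
RUNTIME_LIMIT = 3600  # 1 hour, as in RUNTIME_EXCEEDED


@dataclass
class Job:
    acct_group: Optional[str]    # job's AcctGroup attribute; None if undefined
    job_status: int              # JobStatus
    entered_current_status: int  # EnteredCurrentStatus (Unix time)


def runtime_exceeded(job: Job, now: int) -> bool:
    """Model of the RUNTIME_EXCEEDED expression.

    ClassAd '=!=' is a case-sensitive "not identical to" test that is
    also True when AcctGroup is undefined, so None behaves like any
    other non-"LongJobs" value here.
    """
    return (
        job.acct_group != "LongJobs"
        and job.job_status == RUNNING
        and (now - job.entered_current_status) > RUNTIME_LIMIT
    )


def long_jobs_cap(total_slots: int, fraction: float = 0.75) -> int:
    """Hypothetical approximation of the LongJobs ceiling with
    GROUP_QUOTA_DYNAMIC_LongJobs = 0.75 and no surplus; the real
    negotiator accounts for slot weights and may round differently."""
    return math.floor(total_slots * fraction)


if __name__ == "__main__":
    start = 0
    print(runtime_exceeded(Job(None, RUNNING, start), now=2 * 3600))        # True
    print(runtime_exceeded(Job("LongJobs", RUNNING, start), now=2 * 3600))  # False
    cap = long_jobs_cap(200)
    print(cap, 200 - cap)  # 150 50
```

With a hypothetical 200-slot pool, this gives at most 150 slots to "LongJobs", leaving 50 for short jobs that get held after one hour of running.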


PROBLEM:

Setting...
   accounting_group = "LongJobs"
   nice_user        = True

...in a submit description file bypasses the group quota: the submitter becomes "nice-user.LongJobs.<user>@<domain>", which "condor_userprio" does not recognize as a member of a valid accounting group.

Thus, "nice-user.LongJobs.<user>" jobs are able to completely fill our pool, while retaining the privileged policies that are attached to the "LongJobs" group...
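The following is a deliberately simplified, hypothetical model of what we observe; it is not HTCondor's negotiator code. If the group is taken to be everything before the last "." of the submitter name, the "nice-user." prefix yields "nice-user.LongJobs", which is not a configured group, so the job falls back to the default group instead of "LongJobs" and the LongJobs quota no longer applies. The function `group_of` and the names "alice" and "example.org" are placeholders for illustration.

```python
from typing import Optional

CONFIGURED_GROUPS = {"LongJobs"}  # mirrors GROUP_NAMES = LongJobs


def group_of(submitter: str) -> Optional[str]:
    """Hypothetical, simplified mapping from a submitter name to a
    configured accounting group.

    Assumes the group is everything before the last '.' of the part
    before '@'. Returns None when that prefix is not a configured
    group, i.e. the LongJobs quota would not be applied.
    """
    name = submitter.split("@", 1)[0]
    if "." not in name:
        return None
    prefix = name.rsplit(".", 1)[0]
    return prefix if prefix in CONFIGURED_GROUPS else None


for s in ("LongJobs.alice@example.org",
          "nice-user.LongJobs.alice@example.org"):
    print(s, "->", group_of(s))
```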

An obvious and straightforward solution would be to disable the nice-user setting entirely, but this feature is very popular with our users, so keeping it intact remains a priority.


My questions are therefore:

[1] - Does anyone know of a way to make group quotas and nice-user policies coexist in a v8.6.11 HTCondor pool?
[2] - If the answer to the first question is "No", are there viable alternatives for implementing our desired policy while keeping the nice-user parameter intact?


Any help or suggestions would be greatly appreciated, and thanks in advance to anyone willing to take a closer look at this issue (or who has kept reading this far).

Cheers,

Maël