
Re: [HTCondor-users] the slot is occupied by one user all the time



On 5/4/2017 9:04 PM, jiangxw@xxxxxxxxxxxxxxx wrote:
Dear all,
    In our environment, some slots are being used by a single user. When
this user's jobs complete, the slots are normally released and return to
idle. But sometimes these slots remain allocated to the same user. How
can we solve this?
    What's more, this user's priority value is a large number, which maps
to a very low priority, and at the same time there were many users with
higher priority.
    What could be the possible reasons?
    We look forward to your reply. Thanks.


One possible reason is the CLAIM_WORKLIFE associated with the slot.

In the absence of job preemption, when a slot is claimed by a user, HTCondor will not reassign that slot to a different user until the amount of time specified by the configuration parameter CLAIM_WORKLIFE expires (by default it is set to 1200 seconds, i.e. 20 minutes). Once a user has held the slot for more than CLAIM_WORKLIFE, the slot will go back to unclaimed as soon as the current job on that slot exits.

For example, with CLAIM_WORKLIFE at the default of 20 minutes, imagine a low priority user submits a lot of 15 minute jobs at T=0. Then at T=5 minutes a high priority user submits some jobs. When the low priority user's first job exits at T=15 minutes, the claim is only 15 minutes old, so the slot stays claimed by the low priority user and starts another 15 minute job. At T=30 minutes, when the second low priority job exits, the slot switches to unclaimed (because it has been claimed for more than 20 minutes) and gets matched to the high priority user.
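
To make the arithmetic in that example concrete, here is a rough Python sketch of the timeline. It is only a toy model, not HTCondor code: the function name slot_release_time is made up for illustration, and it assumes one user running back-to-back jobs of equal length on the slot, no preemption, and that the claim is given up at the first job exit after the claim is more than CLAIM_WORKLIFE old.

```python
def slot_release_time(job_length_min, claim_worklife_min):
    """Toy model (hypothetical helper, not part of HTCondor).

    Returns the time in minutes, measured from when the slot was claimed,
    at which the slot goes back to unclaimed. Assumes the same user runs
    equal-length jobs back to back and that nothing preempts them.
    """
    if job_length_min <= 0:
        raise ValueError("job_length_min must be positive")
    claim_age = 0
    while True:
        claim_age += job_length_min          # the current job exits
        if claim_age > claim_worklife_min:   # claim is older than CLAIM_WORKLIFE
            return claim_age                 # slot returns to unclaimed here
        # otherwise the claim is reused and the next job starts right away


if __name__ == "__main__":
    # 15 minute jobs with the default 20 minute CLAIM_WORKLIFE
    print(slot_release_time(15, 20))
    # the same jobs with CLAIM_WORKLIFE lowered to 10 minutes
    print(slot_release_time(15, 10))
```

With 15 minute jobs and a 20 minute CLAIM_WORKLIFE the model releases the slot at T=30, matching the example above; with a 10 minute CLAIM_WORKLIFE the slot would already be released at T=15, when the first job exits.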

You could always lower the value for CLAIM_WORKLIFE in the condor_config on your execute nodes, but this could negatively impact utilization if users submit large numbers of short jobs.

See

http://research.cs.wisc.edu/htcondor/manual/v8.7/3_5Configuration_Macros.html#param:ClaimWorklife

If anyone is really interested, several years ago I did some research to better understand the tradeoffs of lower versus higher values for CLAIM_WORKLIFE. I could post the results here.

regards
Todd