Re: [HTCondor-users] the slot is occupied by one user all the time
- Date: Fri, 05 May 2017 10:44:58 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] the slot is occupied by one user all the time
On 5/4/2017 9:04 PM, jiangxw@xxxxxxxxxxxxxxx wrote:
In our environment, some slots have been used by one user. When this
user's jobs complete, the slots used by this user are normally
released and return to idle. But sometimes these slots remain
allocated to the same user. How can we solve this?
What's more, this user's priority value is a large number, mapping
to very low priority. And at the same time, there were many users with
What could be the possible reasons?
Wish your reply. Thanks.
One possible reason is the CLAIM_WORKLIFE associated with the slot.
In the absence of job preemption, when a slot is claimed by a user,
HTCondor will not reassign that slot to a different user until the
amount of time specified by configuration parameter CLAIM_WORKLIFE
expires (the default is 1200 seconds, i.e. 20 minutes). After a
user has had the slot for more than CLAIM_WORKLIFE, the slot will
go back to unclaimed as soon as the current job on that slot exits.
For example, with CLAIM_WORKLIFE at the default of 20 minutes,
imagine a low priority user submits a lot of 15 min long jobs. Then at
T=5 minutes a high priority user submits some jobs. When the low
priority user's job exits at T=15min, the slot will stay claimed by the
low priority user and will start another 15 min job. At T=30, when the
second low priority user job exits, the slot will switch to unclaimed
(because it was claimed for more than 20 min) and will get matched to
the high priority user.
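To make the timeline above concrete, here is a minimal Python sketch of the rule as described in this message. It is only a toy model, not HTCondor code: the function name release_time is made up for illustration, it assumes back-to-back jobs of equal length with no preemption, and it assumes a claim whose age exactly equals CLAIM_WORKLIFE at job exit is released (the boundary case is my assumption).

```python
def release_time(claim_worklife_s: int, job_length_s: int) -> int:
    """Seconds after the claim began at which the slot goes back to unclaimed.

    Toy model (hypothetical helper, not part of HTCondor): the claiming user
    runs equal-length jobs back to back, and the claim is only re-examined
    when a job exits.
    """
    if job_length_s <= 0:
        raise ValueError("job_length_s must be positive")
    elapsed = 0
    while True:
        elapsed += job_length_s          # the current job always runs to completion
        if elapsed >= claim_worklife_s:  # claim too old: no further job is started
            return elapsed


MINUTE = 60
# The example from this message: 20 minute CLAIM_WORKLIFE, 15 minute jobs.
print(release_time(20 * MINUTE, 15 * MINUTE) // MINUTE)  # prints 30
```

In this model the high priority user waits until the first job exit at or after the 20 minute mark, which is T=30 for 15 minute jobs.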
You could always lower the value for CLAIM_WORKLIFE in the condor_config
on your execute nodes, but this could negatively impact utilization if
users submit large numbers of short jobs.
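A rough way to see the utilization cost is the Python sketch below. It is a back-of-the-envelope model under stated assumptions, not a measurement: the function utilization and the parameter rematch_delay_s are hypothetical, with rematch_delay_s standing in for however long a released slot sits idle before it is matched and claimed again, which depends on your pool's negotiation settings and load.

```python
import math


def utilization(claim_worklife_s: int, job_length_s: int, rematch_delay_s: int) -> float:
    """Fraction of time the slot is busy in a toy model (hypothetical helper).

    Assumptions: equal-length jobs run back to back inside one claim, the
    claim ends at the first job exit at or after claim_worklife_s, and the
    slot then idles for rematch_delay_s before the next claim starts.
    """
    if job_length_s <= 0:
        raise ValueError("job_length_s must be positive")
    # Number of jobs that fit before the claim is old enough to be released.
    jobs_per_claim = max(1, math.ceil(claim_worklife_s / job_length_s))
    busy_s = jobs_per_claim * job_length_s
    return busy_s / (busy_s + rematch_delay_s)


# One minute jobs, with an assumed one minute idle gap after each released claim.
for worklife_s in (1200, 300, 60):
    print(worklife_s, round(utilization(worklife_s, 60, 60), 3))
```

With short jobs, every reduction in CLAIM_WORKLIFE means the idle gap is paid more often, which is why utilization drops in this model. With long jobs the gap is paid once per job regardless, so the setting matters much less.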
If anyone is really interested, several years ago I did some research to
better understand all the tradeoffs of lower -vs- higher values for
CLAIM_WORKLIFE. I could post the results here....