
Re: [Condor-users] Preemption question



On Fri, 24 Mar 2006, Dan Bradley wrote:


There are a number of different possible causes of preemption in Condor,
and your policy eliminates most but not all of them.  The startd RANK
expression is treated as an overriding directive by the negotiator,
trumping the normal user-priority based calculations (and therefore
PREEMPTION_REQUIREMENTS).  This means that your policy will cause
precisely the kind of preemption that you have observed--members of
group "numi" will preempt other users.

All the various tutorials I've been to and manuals I have read
didn't tell me that.  Interesting.  I thought as long as we
had PREEMPTION_REQUIREMENTS false we wouldn't preempt.
There are various pieces of documentation, particularly dealing
with the DedicatedScheduler, that seem to indicate that.

Most of our machines have Rank = 0 (the default). It's only
on the machines with the non-standard rank that we see the preemption happening.

The effect we want to have is the following:

these 15 machines are owned by group_numi.
If the queue is full and all machines are claimed, and there
are jobs waiting from both group_numi and from others, then
on these 15 machines we want the job from group_numi to start,
independent of what user priority group_numi may have at the time.


One solution to this is to use MaxJobRetirementTime.  This allows you to
have preemption of resource claims without having killing of jobs.  The
expression specifies the number of seconds since the job started running
that the job will be allowed to run without interruption from kill
signals, even if the claim is preempted.  This applies to all forms of
preemption, including startd RANK preemption.
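
As a rough illustration, the startd configuration on the 15 group_numi
machines might look like the sketch below. The RANK line is the one
reported in the original message; the two-day retirement value is only
an assumed placeholder, not a figure from this thread.

```
# Startd-side sketch. The two-day bound is an assumption; choose a
# value that covers the longest job you expect to run.
RANK = (agroup == "group_numi") * 1000
MAXJOBRETIREMENTTIME = 3600 * 24 * 2
```

With a setting like this, a group_numi claim would still preempt the
existing claim, but the running job should not receive kill signals
until it has run for that many seconds since it started.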

I'm aware of MaxJobRetirementTime; it is being used at several other
clusters here at Fermilab.  We tend to have extra-long jobs
run on this cluster from time to time, and since it is severely
overbooked, pre-emption would be happening so much that jobs
would not get a chance to finish.  We would really rather not
have pre-emption happen at all, even if the cost is some idle
time on the cluster every once in a while.


If you do decide to use this policy mechanism, then you could consider
turning back on PREEMPTION_REQUIREMENTS, which allows the normal fair
share algorithm to adjust resource claims.  If you disable this form of
preemption, then the problem is that once a user gets a claim to a
machine, the schedd may hang on to it indefinitely if the user keeps
enough jobs waiting in the queue.

I had no idea up until now that a user, through the schedd, could keep
a claim on a machine between the finishing of a job and the start of a
new one.  Where is there more information in the condor docs that
describes this situation?  We may have to rethink our whole
strategy on how we do our batch system here.

Another way to solve that is to use
CLAIM_WORKLIFE to set an upper bound on how long a claim will keep
accepting more jobs.
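
A minimal sketch of that setting, assuming your Condor version supports
CLAIM_WORKLIFE; the one-hour value is an illustrative assumption, not a
recommendation from this thread.

```
# Assumed value: after a claim is one hour old, it stops accepting
# additional jobs and the machine goes back to the negotiator.
CLAIM_WORKLIFE = 3600
```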

I will have to look at this one.

Steve


--Dan

Steven Timm wrote:

I have a condor pool where most of the machines are set to
never pre-empt.  I thought that this setting would mean that
pre-emption doesn't happen, but it appears I am wrong.

On 15 of my machines I have the following settings
(and condor_config_val acknowledges they are seen both
by the startd on the machine and by the negotiator/collector).

[root@fnpc182 log]# condor_config_val -startd PREEMPT
FALSE
[root@fnpc182 log]# condor_config_val -startd PREEMPTION_REQUIREMENTS
FALSE
[root@fnpc182 log]# condor_config_val -startd START
TRUE
[root@fnpc182 log]# condor_config_val -startd RANK
(agroup == "group_numi" ) * 1000


What I want to happen is to give this machine a preference for
starting jobs from group_numi (agroup is a group attribute that
I set in the classads of all jobs).  But I don't want it to
pre-empt an existing job of some other group if that job has
not finished yet.

What is actually happening is the following:

From StartLog
3/23 13:22:56 DaemonCore: Command received via UDP from host <131.225.167.42:19821>
3/23 13:22:56 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
3/23 13:22:56 vm1: match_info called
3/23 13:22:56 DaemonCore: Command received via UDP from host <131.225.167.42:19821>
3/23 13:22:56 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
3/23 13:22:56 vm2: match_info called
3/23 13:22:56 DaemonCore: Command received via TCP from host <131.225.167.42:30785>
3/23 13:22:56 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
3/23 13:22:56 vm1: Preempting claim has correct ClaimId.
3/23 13:22:56 vm1: New claim has sufficient rank, preempting current claim.
3/23 13:22:56 vm1: State change: preempting claim based on machine rank
3/23 13:22:56 vm1: State change: retiring due to preempting claim
3/23 13:22:56 vm1: Changing activity: Busy -> Retiring
3/23 13:22:56 vm1: State change: retirement ended/expired
3/23 13:22:56 vm1: Changing state and activity: Claimed/Retiring -> Preempting/Vacating
3/23 13:22:56 DaemonCore: Command received via TCP from host <131.225.167.42:30786>
3/23 13:22:56 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
3/23 13:22:56 vm2: Preempting claim has correct ClaimId.
3/23 13:22:56 vm2: New claim has sufficient rank, preempting current claim.
3/23 13:22:56 vm2: State change: preempting claim based on machine rank
3/23 13:22:56 vm2: State change: retiring due to preempting claim
3/23 13:22:56 vm2: Changing activity: Busy -> Retiring
3/23 13:22:56 vm2: State change: retirement ended/expired
3/23 13:22:56 vm2: Changing state and activity: Claimed/Retiring -> Preempting/Vacating
3/23 13:22:56 DaemonCore: Command received via TCP from host <131.225.167.42:30794>
3/23 13:22:56 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
3/23 13:22:56 vm1: Got KILL_FRGN_JOB while in Preempting state, ignoring.
3/23 13:22:56 Starter pid 4093 exited with status 0
3/23 13:22:56 vm1: State change: preempting claim exists - START is true or undefined
3/23 13:22:56 vm1: Remote owner is rubin@xxxxxxxx
3/23 13:22:56 vm1: State change: claiming protocol successful
3/23 13:22:56 vm1: Changing state and activity: Preempting/Vacating -> Claimed/Idle
3/23 13:22:56 DaemonCore: Command received via TCP from host <131.225.167.42:30796>
3/23 13:22:56 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
3/23 13:22:56 vm2: Got KILL_FRGN_JOB while in Preempting state, ignoring.
3/23 13:22:56 DaemonCore: Command received via UDP from host <131.225.167.42:19849>
3/23 13:22:56 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
3/23 13:22:56 Warning: can't find resource with ClaimId (<131.225.167.182:22866>#1142441053#75)
3/23 13:22:57 DaemonCore: Command received via UDP from host <131.225.167.42:19849>
3/23 13:22:57 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
3/23 13:22:57 vm2: Got RELEASE_CLAIM while in Preempting state, ignoring.
3/23 13:22:57 DaemonCore: Command received via UDP from host <131.225.167.42:19849>
3/23 13:22:57 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
3/23 13:22:57 vm2: Got RELEASE_CLAIM while in Preempting state, ignoring.
3/23 13:23:01 DaemonCore: Command received via TCP from host <131.225.167.42:30856>
3/23 13:23:01 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)

The NegotiatorLog confirms that a job from a user in group_numi,
with priority 16, pre-empted the existing job of a user not in
group_numi, who had a priority of 160 at the time.

How do we beat this?  Is there any way to give preference for
starting jobs without having pre-emption go on?

Steve Timm





_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team