Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


[Condor-users] Preemption question

Date: Fri, 24 Mar 2006 10:51:48 -0600 (CST)
From: Steven Timm <timm@xxxxxxxx>
Subject: [Condor-users] Preemption question


I have a Condor pool where most of the machines are set to
never pre-empt.  I thought that this setting would mean that
pre-emption doesn't happen, but it appears I am wrong.

On 15 of my machines I have the following settings
(and condor_config_val acknowledges they are seen both
by the startd on the machine and by the negotiator/collector).

[root@fnpc182 log]# condor_config_val -startd PREEMPT
FALSE
[root@fnpc182 log]# condor_config_val -startd PREEMPTION_REQUIREMENTS
FALSE
[root@fnpc182 log]# condor_config_val -startd START
TRUE
[root@fnpc182 log]# condor_config_val -startd RANK
(agroup == "group_numi" ) * 1000


What I want is for this machine to give priority to
starting jobs from group_numi (agroup is a group attribute that
I set in the classads of all jobs).  But I don't want it to
pre-empt an existing job from some other group if that job has
not yet finished.

What is actually happening is the following:

From StartLog

3/23 13:22:56 DaemonCore: Command received via UDP from host <131.225.167.42:19821>
3/23 13:22:56 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
3/23 13:22:56 vm1: match_info called
3/23 13:22:56 DaemonCore: Command received via UDP from host <131.225.167.42:19821>
3/23 13:22:56 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
3/23 13:22:56 vm2: match_info called
3/23 13:22:56 DaemonCore: Command received via TCP from host <131.225.167.42:30785>
3/23 13:22:56 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
3/23 13:22:56 vm1: Preempting claim has correct ClaimId.
3/23 13:22:56 vm1: New claim has sufficient rank, preempting current claim.
3/23 13:22:56 vm1: State change: preempting claim based on machine rank
3/23 13:22:56 vm1: State change: retiring due to preempting claim
3/23 13:22:56 vm1: Changing activity: Busy -> Retiring
3/23 13:22:56 vm1: State change: retirement ended/expired
3/23 13:22:56 vm1: Changing state and activity: Claimed/Retiring -> Preempting/Vacating
3/23 13:22:56 DaemonCore: Command received via TCP from host <131.225.167.42:30786>
3/23 13:22:56 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
3/23 13:22:56 vm2: Preempting claim has correct ClaimId.
3/23 13:22:56 vm2: New claim has sufficient rank, preempting current claim.
3/23 13:22:56 vm2: State change: preempting claim based on machine rank
3/23 13:22:56 vm2: State change: retiring due to preempting claim
3/23 13:22:56 vm2: Changing activity: Busy -> Retiring
3/23 13:22:56 vm2: State change: retirement ended/expired
3/23 13:22:56 vm2: Changing state and activity: Claimed/Retiring -> Preempting/Vacating
3/23 13:22:56 DaemonCore: Command received via TCP from host <131.225.167.42:30794>
3/23 13:22:56 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
3/23 13:22:56 vm1: Got KILL_FRGN_JOB while in Preempting state, ignoring.
3/23 13:22:56 Starter pid 4093 exited with status 0
3/23 13:22:56 vm1: State change: preempting claim exists - START is true or undefined
3/23 13:22:56 vm1: Remote owner is rubin@xxxxxxxx
3/23 13:22:56 vm1: State change: claiming protocol successful
3/23 13:22:56 vm1: Changing state and activity: Preempting/Vacating -> Claimed/Idle
3/23 13:22:56 DaemonCore: Command received via TCP from host <131.225.167.42:30796>
3/23 13:22:56 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
3/23 13:22:56 vm2: Got KILL_FRGN_JOB while in Preempting state, ignoring.
3/23 13:22:56 DaemonCore: Command received via UDP from host <131.225.167.42:19849>
3/23 13:22:56 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
3/23 13:22:56 Warning: can't find resource with ClaimId (<131.225.167.182:22866>#1142441053#75)
3/23 13:22:57 DaemonCore: Command received via UDP from host <131.225.167.42:19849>
3/23 13:22:57 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
3/23 13:22:57 vm2: Got RELEASE_CLAIM while in Preempting state, ignoring.
3/23 13:22:57 DaemonCore: Command received via UDP from host <131.225.167.42:19849>
3/23 13:22:57 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
3/23 13:22:57 vm2: Got RELEASE_CLAIM while in Preempting state, ignoring.
3/23 13:23:01 DaemonCore: Command received via TCP from host <131.225.167.42:30856>
3/23 13:23:01 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)

The NegotiatorLog confirms that a job from a user in group_numi,
with priority 16, pre-empted the existing job from a user not in
group_numi, who at the time had a priority of 160.
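
The rank-based pre-emptions in the StartLog excerpt above can be picked
out mechanically.  The following is only a minimal sketch in Python,
assuming log lines of the form "date time vmN: message" as shown in the
excerpt; the function name rank_preemptions and the regular expression
are illustrative and not part of Condor.

```python
import re

# Matches StartLog lines such as:
#   3/23 13:22:56 vm1: State change: preempting claim based on machine rank
# The pattern is an assumption based on the excerpt above, not a Condor API.
RANK_PREEMPT = re.compile(
    r"^(?P<date>\d+/\d+) (?P<time>\d+:\d+:\d+) (?P<slot>vm\d+): "
    r"State change: preempting claim based on machine rank\s*$"
)


def rank_preemptions(lines):
    """Return (date, time, slot) for each rank-based preemption line."""
    events = []
    for line in lines:
        match = RANK_PREEMPT.match(line.strip())
        if match:
            events.append(
                (match.group("date"), match.group("time"), match.group("slot"))
            )
    return events


# A few lines copied from the excerpt above.
sample = [
    "3/23 13:22:56 vm1: New claim has sufficient rank, preempting current claim.",
    "3/23 13:22:56 vm1: State change: preempting claim based on machine rank",
    "3/23 13:22:56 vm1: Changing activity: Busy -> Retiring",
    "3/23 13:22:56 vm2: State change: preempting claim based on machine rank",
]

for date, time, slot in rank_preemptions(sample):
    print(date, time, slot)
```

Run against a full StartLog (for example, by passing an open file
object as lines), this would list which slots were pre-empted on
machine rank and when.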

How do we beat this?  Is there any way to give preference for
starting jobs without having pre-emption go on?

Steve Timm



--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525  timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Div/Core Support Services Dept./Scientific Computing Section
Assistant Group Leader, Farms and Clustered Systems Group
Lead of Computing Farms Team

Follow-Ups:
- Re: [Condor-users] Preemption question
  - From: Dan Bradley
