
Re: [Condor-users] RANK & PREEMPTion question



Mark,
The startd's RANK expression is preemptive by nature, so this policy
will always preempt a running job in favor of one it ranks higher. The
PREEMPTION_RANK and PREEMPTION_REQUIREMENTS expressions are evaluated
only by the negotiator, and as you have set them they simply disable
preemption based on user priority.
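Put in configuration terms, a minimal sketch of that split might look
like this (assuming the execute machine and the central manager each
read their own local configuration file; the comments note which
daemon uses each setting):

```
# Execute machine (startd): a constant RANK scores every job the same,
# so a newly arriving job can never outrank the one already running.
RANK = 0

# Central manager (negotiator): False disables preemption based on
# user priority, as in your original policy.
PREEMPTION_REQUIREMENTS = False
```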

You might be able to accomplish your goal using the job's Rank
preference instead. Job Rank is non-preemptive, so you can write your
jobs so that they prefer running on those machines. Place an attribute
in your startds' ClassAds marking them as a group that this user's jobs
may prefer (be sure to list the attribute in the STARTD_ATTRS setting
so that it is advertised). Then, in the jobs' submit description files,
rank machines with that attribute higher than others. When a job is
negotiated, Condor sorts the machines that meet the job's requirements
by the job's Rank value, so whenever possible the job will run on those
machines rather than any others. Whether the jobs in your pool are
user-centric or application-centric may influence what you name the
attribute on the startds. There may be other ways to achieve this, but
the above is a straightforward mechanism. Hope this helps.
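A minimal sketch of this approach, under stated assumptions: the
attribute name PreferMcal00 and the executable name my_job are
hypothetical placeholders, so pick whatever suits your pool. The first
part goes in the local configuration of each preferred machine, the
second in the submit description file for mcal00's jobs:

```
# --- Local config on each preferred execute machine ---
# Hypothetical attribute marking this machine as preferred by mcal00.
PreferMcal00 = True
# Advertise it in the machine ClassAd (append it instead if
# STARTD_ATTRS already lists other attributes).
STARTD_ATTRS = PreferMcal00

# --- Submit description file for mcal00's jobs ---
# Machines advertising PreferMcal00 = True get Rank 1; machines without
# the attribute evaluate to False (0). Job Rank only orders candidate
# machines during matchmaking; it never preempts a running job.
universe   = vanilla
executable = my_job
rank       = (TARGET.PreferMcal00 =?= True)
queue
```

After reconfiguring the machines (for example with condor_reconfig),
condor_status -constraint 'PreferMcal00 =?= True' should list only the
preferred machines, which is a quick way to confirm the attribute is
being advertised.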

Best of luck,
Jason

--

===================================
Jason A. Stowe

Phone: 607.227.9686
jstowe@xxxxxxxxxxxxxxxxxx

Cycle Computing, LLC
http://www.cyclecomputing.com

On 11/25/06, Mark Calleja <M.Calleja@xxxxxxxxxxxxxxx> wrote:
Hi,

I wanted to set up a machine such that it preferred new jobs from a
particular user (say, "mcal00"), but not to preempt any existing jobs.
Hence I set up the following policy on that machine:

KILL = False
PREEMPT = False
PREEMPTION_RANK = False
PREEMPTION_REQUIREMENTS = False
RANK =  (Owner == "mcal00")
START = True
SUSPEND = False
WANT_SUSPEND = True
WANT_VACATE = False

However, when user mcal00 submitted a job, it did preempt the existing
job, as shown by the following snippet of the StartLog:

11/25 07:07:16 match_info called
11/25 07:07:16 DaemonCore: Command received via TCP from host <172.24.116.7:9637>
11/25 07:07:16 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
11/25 07:07:16 Preempting claim has correct ClaimId.
11/25 07:07:16 New claim has sufficient rank, preempting current claim.
11/25 07:07:16 State change: preempting claim based on machine rank
11/25 07:07:16 State change: retiring due to preempting claim
11/25 07:07:16 Changing activity: Busy -> Retiring
11/25 07:07:16 State change: retirement ended/expired
11/25 07:07:16 Changing state and activity: Claimed/Retiring -> Preempting/Vacating
11/25 07:07:16 DaemonCore: Command received via TCP from host <172.24.116.196:9739>
11/25 07:07:16 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
11/25 07:07:16 Got KILL_FRGN_JOB while in Preempting state, ignoring.
11/25 07:07:16 DaemonCore: Command received via UDP from host <172.24.116.196:10395>
11/25 07:07:16 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
11/25 07:07:16 Got RELEASE_CLAIM while in Preempting state, ignoring.
11/25 07:07:16 DaemonCore: Command received via UDP from host <172.24.116.196:9856>
11/25 07:07:16 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
11/25 07:07:16 Got RELEASE_CLAIM while in Preempting state, ignoring.
11/25 07:07:16 Starter pid 31071 exited with status 0
11/25 07:07:16 State change: starter exited
11/25 07:07:16 State change: preempting claim exists - START is true or undefined
11/25 07:07:16 Remote owner is mcal00@xxxxxxxxxxxxxxxxxxxxxxxxx
11/25 07:07:16 State change: claiming protocol successful
11/25 07:07:16 Changing state and activity: Preempting/Vacating -> Claimed/Idle
11/25 07:07:18 DaemonCore: Command received via TCP from host <172.24.116.7:9611>
11/25 07:07:18 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)
11/25 07:07:18 Got activate_claim request from shadow (<172.24.116.7:9611>)
11/25 07:07:18 Remote job ID is 119.0

All resources are running v6.8.2 of Condor. Can anyone suggest where
I've gone wrong with my policy?

Cheers,
Mark