
Re: [Condor-users] How can a user bypass the negotiator?



Once a user has a claim to a slot, they are allowed to send jobs to it
without going through a negotiation cycle (reducing latency and
improving throughput).

This can lead to starvation in cases where preemption is not allowed.

Previously people worked around this by allowing preemption but using
extremely long retirement periods; thus a job will last as long as it
needs, but the claim will not.
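
A rough sketch of that older workaround, for illustration only. The
variable names are standard Condor configuration settings, but the values
are assumptions and not taken from any real pool:

```
# Negotiator side: let a user with better priority preempt a claim.
NEGOTIATOR_CONSIDER_PREEMPTION = True
# Assumed policy for illustration; real pools usually restrict this expression.
PREEMPTION_REQUIREMENTS = True

# Startd side: give a preempted job a very long retirement period (in
# seconds) so it can finish; the claim is released once the job ends.
# The one-year figure is an arbitrary assumption.
MAXJOBRETIREMENTTIME = 365 * 24 * 3600
```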

In (IIRC) the 6.8 series, the CLAIM_WORKLIFE config variable was added,
which allows the lifetime of this claim to be altered, trading off the
throughput gains on smaller jobs for the immediacy of other users' jobs
taking the slot.
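
As a minimal sketch (the value is an assumption chosen for illustration,
not a recommendation), limiting claim reuse on the execute nodes might
look like this:

```
# Stop accepting additional jobs on a claim this many seconds after it
# was made. A job already running is not killed; the claim is released
# when that job finishes, and the slot goes back through negotiation.
# 1200 seconds (20 minutes) is an assumed example value.
CLAIM_WORKLIFE = 1200
```

Smaller values hand slots back to the negotiator sooner, at the cost of
more negotiation overhead for users running many short jobs.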

Matt 

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Carsten Aulbert
Sent: 10 March 2009 16:30
To: Condor-Users Mail List
Subject: [Condor-users] How can a user bypass the negotiator?

Hi all,

one of our "power" users just discovered something which I don't
understand. Our cluster has many nodes and two submit machines with
almost the same setup (HA). Box A is currently running the negotiator.
We only preempt the backfill, but no user jobs; even
negotiator_consider_preemption is false.

Power user 1 (U1) submits 10000 jobs on A, power user 2 (U2) a DAG job
spawning many jobs on machine B. Currently the situation is as follows:

U2
effective prio:  1623673.62
running jobs:    4114
idle:            13000

U1
effective prio:  33489760.00
running jobs:    2410
idle:            0
hold:            12000

Now the fun part.

U1 submits 10 short-running jobs on both A and B. Since his prio is much
worse than U2's and CLAIM_WORKLIFE is set to zero, I would expect that
none of his jobs would run, however:

on B:
3/10 17:06:13 (pid:24448) Negotiating for owner: U1
3/10 17:06:13 (pid:24448) Lost priority - 0 jobs matched

looks good

on A:
3/10 17:13:05 (pid:26999) Starting add_shadow_birthdate(541094.0)
3/10 17:13:05 (pid:26999) Started shadow for job 541094.0 on slot2@xxxxxxxxxxxxxxxxx <10.10.13.65:35449> for U2, (shadow pid = 28027)

[...]

However, user U1 never shows up in the negotiator.

Now my question, how can U1 bypass the negotiator and also U2's jobs? Or
maybe my understanding of Condor needs to be improved again ;)


Cheers

Carsten
