Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] negotiator "poor" performance issue

Date: Fri, 14 Mar 2014 13:09:04 -0500
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] negotiator "poor" performance issue

On 3/14/2014 5:42 AM, Pek Daniel wrote:

> Hi,


Hi Daniel, some thoughts inline...


> I assigned to the jobs I submitted randomized priorities, because
> otherwise the negotiator would go through the schedds sequentially
> (first, it runs all the jobs from schedd1, then from schedd2, etc).
> I've also set:
> USE_GLOBAL_JOB_PRIOS = true

Just FYI - the negotiator communicates with schedds in user priority order regardless of schedd. So if your jobs were submitted from different users (or with different accounting_groups), the negotiator would not go through all the schedds sequentially.

> I don't use job arrays or clusters and I can't consider using them,
> this is a constraint.


^^^ This is a bummer...


> In this way, I could achieve ~10 jobs / sec negotiation (dispatching)
> rate (not using priorities doesn't change this).
>
> My questions:
> - did anybody measure before a higher dispatch rate?
> - is this 10 jobs / sec considered a "normal" or "good enough" value
> in case of HTCondor?

Of course we are always working to improve the rate at which the negotiator makes matches, and we have several ideas/plans on the horizon.

However, negotiator match rate for most real-world scenarios is not as important as it may seem. The reason is that negotiator match rate has little to do with job start rate in HTCondor. When the negotiator makes a match, it hands it out to a schedd. This schedd then claims the slot, and starts a job. A key point is that when the job completes, the schedd will find another job from that same user that matches the slot and start it **without any involvement from the condor_negotiator**. The schedd will keep using and reusing a slot it has claimed for job after job until the match is broken. With a default CLAIM_WORKLIFE (see http://goo.gl/VOg9nm ) of an hour, there are not typically that many Unclaimed machines on any given negotiation cycle (i.e. machines that are not already assigned to a schedd) that the negotiator has to worry about. In other words, the negotiator is not typically involved at job boundaries, but only when claims need to move from one user/schedd to another due to priorities...


Hope the above makes sense...
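
The claim reuse described above can be put into rough numbers with a small Python sketch. It is a simplified model, not HTCondor code: it assumes identical jobs run back to back on one claimed slot, and that a claim stops accepting new jobs once its age reaches the worklife. The function name and the job runtimes are made up for illustration; the 3600 second worklife is the one-hour figure mentioned above.

```python
def jobs_started_per_claim(claim_worklife_s, job_runtime_s):
    """Count jobs started back to back on one claim under the simplified model.

    Assumption: a new job may start only while the claim's age is still
    below claim_worklife_s; a job that is already running is left to finish.
    """
    if job_runtime_s <= 0:
        raise ValueError("job_runtime_s must be positive")
    started = 0
    claim_age_s = 0
    while claim_age_s < claim_worklife_s:
        started += 1                  # the schedd starts the next job itself
        claim_age_s += job_runtime_s  # the claim ages by one job runtime
    return started


# One-hour worklife, three hypothetical job lengths.
for runtime_s in (60, 600, 7200):
    n = jobs_started_per_claim(3600, runtime_s)
    print(f"{runtime_s:>5}s jobs: {n} per claim, {1 / n:.3f} negotiator matches per job")
```

Under these assumptions, one-minute jobs need only one match per 60 job starts, while jobs longer than the worklife need a fresh match every time.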

> - can I do anything without touching the source to increase the
> negotiation performance?

Tuning knobs like NEGOTIATOR_INFORM_STARTD could help, but I am not sure how much. I guess you also need to think about how important/relevant a metric negotiator dispatch rate is for your scenario. Maybe sustained job completion rate makes more sense. See


http://research.cs.wisc.edu/htcondor/CondorWeek2011/presentations/tannenba-roadmap.pdf

for a bunch of performance graphs starting around slide 18. For example, tests back with v7.6.0 showed a negotiator matchmaking rate of 8 per second (close to what you found), but because the schedd reuses matches, the sustained job completion rate for just one schedd was 80 jobs/second. And of course, you can scale job completion rate horizontally by adding more schedds.
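
As a back-of-the-envelope check on those two figures, here is a short Python sketch. The 8 and 80 per second values are the ones quoted above; the helper names, the pool of 4 schedds, and the assumption of perfectly linear scaling across schedds are illustrative only, and a real pool will likely fall short of the linear estimate.

```python
def jobs_per_match(completion_rate, match_rate):
    """Average number of jobs each match would have to serve for both rates to hold."""
    return completion_rate / match_rate


def pool_completion_rate(per_schedd_rate, num_schedds):
    """Idealized aggregate rate, assuming (hypothetically) linear scaling."""
    return per_schedd_rate * num_schedds


print(jobs_per_match(80, 8))        # 10.0
print(pool_completion_rate(80, 4))  # 320
```

In other words, if each match is reused for about ten jobs, a matchmaking rate of 8 per second is already enough to feed 80 job completions per second.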

You may find the following paper of interest, even though it is getting a bit old:

Dan Bradley, Timothy St Clair, Matthew Farrellee, Ziliang Guo, Miron Livny, Igor Sfiligoi, and Todd Tannenbaum, "An update on the scalability limits of the Condor batch system", Journal of Physics: Conference Series, Vol. 331, No. 6, 2011


http://research.cs.wisc.edu/htcondor/doc/chep10_condor_scalability.pdf

regards,
Todd

p.s. Also be aware the negotiator classad ("condor_status -negotiator -l") publishes a number of statistics related to matchmaking performance, see http://goo.gl/BbIp9R . Useful for graphing with condor_gangliad.
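
For pulling those statistics into a script, a minimal Python sketch follows. It assumes the long-format output consists of `Attr = value` lines and that the per-cycle statistics share the `LastNegotiationCycle` name prefix; the sample text and its values are made up so the sketch runs without a pool, and the helper names are hypothetical. Check the attribute names against your own negotiator ad before relying on them.

```python
import subprocess


def parse_classad_long(text):
    """Parse `Attr = value` lines (long ClassAd format) into a dict of strings."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip blank lines and anything that is not an assignment
            ad[name.strip()] = value.strip()
    return ad


def negotiation_cycle_stats(ad, prefix="LastNegotiationCycle"):
    """Keep only the attributes whose names start with the given prefix."""
    return {name: value for name, value in ad.items() if name.startswith(prefix)}


def fetch_negotiator_ad():
    """Run the command from the text; needs condor_status on PATH and a reachable pool."""
    result = subprocess.run(
        ["condor_status", "-negotiator", "-l"],
        capture_output=True, text=True, check=True,
    )
    return parse_classad_long(result.stdout)


# Made-up sample output, used here instead of a live pool.
SAMPLE = """\
Name = "negotiator@example"
LastNegotiationCycleDuration0 = 12
LastNegotiationCycleMatches0 = 96
"""

stats = negotiation_cycle_stats(parse_classad_long(SAMPLE))
for name, value in sorted(stats.items()):
    print(name, value)
```

On a live pool, `negotiation_cycle_stats(fetch_negotiator_ad())` would take the place of the sample. The parser assumes a single negotiator ad; with several negotiators, later ads would overwrite earlier ones.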


--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685
