
Re: [HTCondor-users] condor_shadow start rate



On Fri, May 9, 2014 at 4:57 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
> What are you using for JOB_START_DELAY ?  We run with JOB_START_DELAY set at
> zero (which has been the default for quite some time).  If JOB_START_DELAY
> is set in your config to something greater than zero, that could explain
> what you are seeing.

For reasons that I haven't yet dug out from the version control
history, we set JOB_START_DELAY = 1 by default. However, we did try
letting it revert to zero on the production cluster and it still
averaged three shadow starts per second. I had the same thought
(blaming JOB_START_DELAY), but even with JOB_START_DELAY set, we see no
more than three shadows per second, so long as JOB_START_COUNT is at
least 3. It seems like there is another limiting factor in play, but I
haven't been able to determine what it might be.
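
As a rough sketch of the ceiling those two knobs impose on their own,
assuming the schedd starts at most JOB_START_COUNT shadows every
JOB_START_DELAY seconds and that a delay of zero means no throttle at
all. The function name is hypothetical and not part of HTCondor:

```python
import math


def throttle_ceiling(job_start_count: int, job_start_delay: int) -> float:
    """Upper bound on shadow starts per second implied by the throttle alone.

    Assumption: at most job_start_count shadows are started every
    job_start_delay seconds; a delay of zero disables the throttle.
    """
    if job_start_delay <= 0:
        return math.inf  # no throttle, so no limit from these settings
    return job_start_count / job_start_delay
```

With JOB_START_DELAY = 1 and JOB_START_COUNT = 3 this gives 3.0 per
second. With an illustrative JOB_START_COUNT = 10 it gives 10.0, so a
plateau at three regardless of the count would point to a limit outside
these two settings.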

I know Matt Farrellee did some testing [1] and got much higher
numbers, so it's not just that the schedd is incapable of doing this.
One thing that's noteworthy is that depending on your definition of
"start", we did see high start rates. The schedd got matches from the
negotiator quickly and jobs moved into the "running" state. However,
there was up to a 20-minute lag between the time a match was made and
when the shadow started, so we saw a lot of slots in Claimed/Idle.
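
A back-of-the-envelope sketch of how that lag relates to the start
rate, assuming claims queue up first-in, first-out and shadows start at
a steady rate. The function name is hypothetical, and the result is an
estimate under those assumptions, not a measurement:

```python
def implied_backlog(lag_minutes: float, shadows_per_second: float) -> float:
    """Claims waiting ahead of a match that sees the given lag.

    Assumption: a constant shadow start rate and a FIFO queue of claims.
    """
    return lag_minutes * 60 * shadows_per_second
```

At three shadows per second, a 20-minute lag would correspond to about
3600 claims waiting for a shadow, which would fit with seeing many
slots in Claimed/Idle.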

[1] http://spinningmatt.wordpress.com/2011/04/15/quick-walk-with-condor-looking-at-scheduler-performance/

Thanks,
BC

-- 
Ben Cotton