
Re: [HTCondor-users] sharing resources without preempt, suspend and kill ?



Laurent,

I would expect the priority mechanism to take care of this for you.
What version of HTCondor are your execute nodes running? One thing
that comes to mind is perhaps you have an infinite CLAIM_WORKLIFE, so
the schedd/startd keep working together without going back to the
negotiator for a match. The default value of CLAIM_WORKLIFE changed
from -1 (infinite) in 7.8 to 3600 (1 hour) in 8.0 to 1200 (20 minutes)
in 8.2. So if your CLAIM_WORKLIFE is infinite, consider shortening it
to see if that helps. The tradeoff is more scheduling overhead, since
slots go back to the negotiator more often, but for a pool of your
size that's not going to be an issue.
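
As a rough sketch of what that change could look like in the local
configuration on the execute nodes (the 1200 below is only the 8.2
default mentioned above, used as an assumed starting point, not a value
tuned for your pool):

```
# Check the value currently in effect on an execute node with:
#   condor_config_val CLAIM_WORKLIFE
#
# Stop reusing a claim for new jobs after 20 minutes, so the slot
# goes back to the negotiator. A value of -1 means the claim never
# expires.
CLAIM_WORKLIFE = 1200
```

After editing the config, a condor_reconfig on the node should pick up
the new value; claims that were already running under the old setting
may take a while to cycle out.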

I wrote a post on the Cycle Computing blog about this last year that
might be helpful:
http://www.cyclecomputing.com/blog/how-to-use-htcondors-claim_worklife-to-optimize-cluster-throughput/


Thanks,
BC

-- 
Ben Cotton
main: 888.292.5320

Cycle Computing
Better Answers. Faster.

http://www.cyclecomputing.com
twitter: @cyclecomputing