
Re: [HTCondor-users] sharing resources without preempt, suspend and kill ?



On 22/04/2015 17:28, Ben Cotton wrote:
Laurent,

I would expect the priority mechanism to take care of this for you.
What version of HTCondor are your execute nodes running? One thing
that comes to mind is perhaps you have an infinite CLAIM_WORKLIFE, so
the schedd/startd keep working together without going back to the
negotiator for a match. The default value of CLAIM_WORKLIFE changed
from -1 (infinite) in 7.8 to 3600 (1 hour) in 8.0 to 1200 (20 minutes)
in 8.2. So if your CLAIM_WORKLIFE is infinite, consider shortening it
to see if that helps. The tradeoff is increased scheduling overhead,
but for a pool of your size that's not going to be an issue.
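
As a rough illustration only, a setting along these lines could go in the
local configuration of the execute nodes. The value of 1200 seconds simply
mirrors the 8.2 default mentioned above, and the choice of file is an
assumption, not something taken from this pool's setup:

```
# Hypothetical entry in the execute nodes' local config file.
# After this many seconds a claim stops accepting new jobs, so the
# slot goes back to the negotiator for a fresh match. A job already
# running when the limit is reached is not killed by this setting.
CLAIM_WORKLIFE = 1200
```

Running condor_config_val CLAIM_WORKLIFE on an execute node shows the value
currently in effect, and condor_reconfig makes the daemons pick up a change.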

I wrote a post on the Cycle Computing blog about this last year that
might be helpful:
http://www.cyclecomputing.com/blog/how-to-use-htcondors-claim_worklife-to-optimize-cluster-throughput/


Thanks,
BC

Hello Ben.

We’re running 8.2.8; we started using Condor with 7.8.x.
I guess my users were not patient enough ;) CLAIM_WORKLIFE has never been changed.
Thanks for the tip anyway, I’ll tweak it if needed.

Best,

--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C