
Re: [Condor-users] Number of computes limitation



>
> We just set up a 128-node (dual CPUs, so 256 nodes) Linux/Condor 6.6.10
> cluster and seem to be running into problems with the head node handing
> out work to the compute nodes. It looks like the head node won't hand out
> more than 70 or 80 jobs at a time, and waits for those 70 or 80 jobs to
> complete before handing out the other jobs in the queue.
> 
> We have a test cluster of 75 nodes and have no problems with that, so I am
> wondering whether we are hitting some limitation in Condor itself when
> handling more than 70-80 nodes per head node.

What exactly do you mean by 'head node'?  Is it your
collector/negotiator and a schedd (and is that your only schedd)?

It's hard to say much without more information.  Turn up the logging to
D_FULLDEBUG (set ALL_DEBUG = D_FULLDEBUG) and take a look at the
ScheddLog and NegotiatorLog.  Also, what are your values for
MAX_JOBS_RUNNING and RESERVED_SWAP?  See

  http://docs.optena.com/display/CONDOR/How+To+Increase+Debugging+Messages
  http://docs.optena.com/display/CONDOR/MAX_JOBS_RUNNING
  http://docs.optena.com/display/CONDOR/RESERVED_SWAP
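If the cap turns out to be swap-related, a back-of-the-envelope check may help. The sketch below rests on an assumption I have not verified against the 6.6.10 source: each running job needs one condor_shadow on the submit machine, and the schedd stops starting shadows once the free swap left above RESERVED_SWAP (in MB) can no longer fit another shadow of SHADOW_SIZE_ESTIMATE (in KB, a setting not mentioned above; check your version's manual). MAX_JOBS_RUNNING caps the count independently. The function names and the example numbers are hypothetical, not Condor defaults.

```python
def swap_shadow_cap(free_swap_mb: int, reserved_swap_mb: int, shadow_size_kb: int) -> int:
    """Rough number of shadows that fit in free swap above the reserve.

    Assumed model: (free swap - RESERVED_SWAP) / SHADOW_SIZE_ESTIMATE.
    """
    if shadow_size_kb <= 0:
        raise ValueError("shadow_size_kb must be positive")
    # Convert MB to KB so both sides of the division use the same unit.
    headroom_kb = (free_swap_mb - reserved_swap_mb) * 1024
    # If free swap is already below the reserve, no new shadows can start.
    return max(0, headroom_kb // shadow_size_kb)


def effective_job_cap(free_swap_mb: int, reserved_swap_mb: int,
                      shadow_size_kb: int, max_jobs_running: int) -> int:
    """Whichever limit is hit first bounds the number of running jobs."""
    return min(max_jobs_running,
               swap_shadow_cap(free_swap_mb, reserved_swap_mb, shadow_size_kb))


if __name__ == "__main__":
    # Hypothetical submit machine: 150 MB free swap, 5 MB reserved,
    # 2048 KB per shadow, MAX_JOBS_RUNNING = 200.
    print(effective_job_cap(150, 5, 2048, 200))
```

With those made-up numbers the swap check, not MAX_JOBS_RUNNING, would be the binding limit, which is the kind of thing the ScheddLog should confirm or rule out.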

Mike Yoder
Principal Member of Technical Staff
Ask Mike: http://docs.optena.com
Direct  : +1.408.321.9000
Fax     : +1.408.321.9030
Mobile  : +1.408.497.7597
yoderm@xxxxxxxxxx

Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
http://www.optena.com