Re: [Condor-users] Limitations for Windows Submit Machines
- Date: Wed, 10 Nov 2010 09:23:50 -0600
- From: "John (TJ) Knoeller" <johnkn@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Limitations for Windows Submit Machines
Condor 7.5 for Windows has some code to re-use the shadow process
rather than starting a new one for each job. That should help quite a
bit with your problems communicating between the shadow processes and
the schedd.
As for overall limits, we are still waiting for word from the field, but
I did some testing. I submitted a set of 5000 10-minute jobs with
MAX_JOBS_RUNNING of about 2300* from a machine with 4 cores and 16 GB
of RAM. The submit machine was sluggish, and condor_q took about a
minute to respond, but I saw no other problems.
We can't often test at large scale with jobs that do real work, and this
particular test was probably the lightest possible work per job on the
submit machine. I'm eager to hear from Condor sites that have realistic
workloads and a variety of job sizes.
*The cluster could only run about 2300 jobs at a time.
On 11/10/2010 3:44 AM, Julian Exner wrote:
Does anyone have experience with the limitations for Windows submit machines
with recent versions of Condor and Windows?
We are running a pool of potentially 520 available nodes with Windows XP. The
machine used to submit into this cluster is a fairly powerful (8 cores,
24 GiB RAM) server running Windows Server 2008 x64. The Condor version is 7.4.1.
For long-running jobs (about 30 minutes), submitting a cluster of about
8000 jobs with MAX_JOBS_RUNNING at 520 works. But with shorter jobs
(5 minutes), we run into issues with the communication between the
condor_shadow processes and condor_schedd, and condor_q no longer responds
correctly. With MAX_JOBS_RUNNING at 250 these problems don't appear.
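
As a rough illustration of why job length matters here, the rate at which
running jobs finish and have to be replaced can be estimated as the number of
running jobs divided by the job duration. The sketch below is a
back-of-the-envelope model, not a measurement: it assumes every slot stays
busy and all jobs take exactly the stated time, and the function name is
made up for this example.

```python
def job_starts_per_second(max_jobs_running, job_minutes):
    """Estimate steady-state job turnover on the submit machine.

    Assumption: all max_jobs_running slots stay busy and every job runs
    for exactly job_minutes, so each slot needs a new job (and, before
    shadow re-use, a new shadow process) once per job duration.
    """
    return max_jobs_running / (job_minutes * 60.0)


# Scenarios taken from the figures quoted in this message.
scenarios = [
    ("30-minute jobs, MAX_JOBS_RUNNING=520", 520, 30),
    ("5-minute jobs, MAX_JOBS_RUNNING=520", 520, 5),
    ("5-minute jobs, MAX_JOBS_RUNNING=250", 250, 5),
]

for label, running, minutes in scenarios:
    rate = job_starts_per_second(running, minutes)
    print(f"{label}: about {rate:.2f} job starts per second")
```

Under these assumptions the 5-minute case at 520 running jobs turns over
about six times as fast as the 30-minute case, which is consistent with the
problems showing up only for the short jobs.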
Another problem seems to be the overall size of the queue managed by Condor.
With about 20k jobs in the queue, condor_q takes minutes to respond, if it does
at all.
Is there anything one can do to improve the performance in this Windows
environment, or are we at the end of the line here? The bottleneck seems to be
the performance of condor_schedd. Were there any improvements in newer versions?
With best regards,
PS: Sorry for the previous incomplete posts, but I had some issues with my
company's web mailer.