
Re: [Condor-users] Limitations for Windows Submit Machines



Hi,

Thanks for the tip on using Condor 7.5. I've updated the submit machine
to 7.5.4 and tried it with MAX_JOBS_RUNNING = 520. It all worked fine with
several submitted clusters and a total of about 20k jobs in the queue. This
is a massive performance improvement for us.
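
For a rough sense of why shorter jobs were so much harder on the submit machine, the job turnover can be estimated from the numbers in this thread. This is only a back-of-the-envelope sketch: it assumes every slot is refilled as soon as its job finishes, and the function name is made up for illustration, not part of Condor.

```python
def job_starts_per_second(max_jobs_running: int, job_minutes: float) -> float:
    """Steady-state job starts per second, assuming all slots stay busy."""
    return max_jobs_running / (job_minutes * 60.0)


# (MAX_JOBS_RUNNING, job length in minutes) taken from the thread
scenarios = {
    "520 running, 30 min jobs": (520, 30),
    "520 running, 5 min jobs": (520, 5),
    "250 running, 5 min jobs": (250, 5),
}

for label, (slots, minutes) in scenarios.items():
    rate = job_starts_per_second(slots, minutes)
    print(f"{label}: {rate:.2f} job starts/s")
```

Under these assumptions the 5-minute case needs about six times as many job starts per second as the 30-minute case at the same MAX_JOBS_RUNNING, which would fit with the problems only showing up for short jobs.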

Cheers,
  Julian

----- Original Message -----
> From: "John (TJ) Knoeller" <johnkn@xxxxxxxxxxx>
> To: condor-users@xxxxxxxxxxx
> Sent: Wednesday, 10 November 2010 16:23:50
> Subject: Re: [Condor-users] Limitations for Windows Submit Machines
> Condor 7.5 for Windows has some code in it to re-use the shadow process
> rather than starting a new one for each job. That should help quite a
> bit with your problems communicating between the shadow processes and
> condor_schedd.
> 
> As for overall limits, we are still waiting for word from the field,
> but I did some testing. I submitted a set of 5000 10-minute jobs with
> MAX_JOBS_RUNNING of about 2300* from a machine with 4 cores and 16 GB
> of RAM. The submit machine was sluggish, and condor_q took about a
> minute to respond, but I saw no other problems.
> 
> We can't often test at large scale with jobs that do real work, and
> this particular test was probably the lightest possible work per job
> on the submit machine. I'm eager to hear from Condor sites that have
> realistic workloads and a variety of job sizes.
> 
> -tj
> 
> *The cluster could only run about 2300 jobs at a time
> 
> On 11/10/2010 3:44 AM, Julian Exner wrote:
> > Hi,
> >
> > does anyone have experience with the limitations of Windows submit
> > machines with recent versions of Condor and Windows?
> > We are running a pool of potentially 520 available nodes with
> > Windows XP. The machine used to submit into this cluster is a fairly
> > powerful server (8 cores, 24 GiB RAM) running Windows Server 2008 x64.
> > The Condor version is 7.4.1.
> > For long-running jobs (about 30 minutes), submitting a cluster of
> > about 8000 jobs with MAX_JOBS_RUNNING at 520 works. But with shorter
> > jobs (5 minutes), we run into issues with the communication between
> > the condor_shadow processes and condor_schedd, and condor_q no longer
> > responds correctly. With MAX_JOBS_RUNNING at 250 these problems don't
> > appear.
> > Another problem seems to be the overall size of the queue managed by
> > Condor. With about 20k jobs in the queue, condor_q takes minutes to
> > respond, if it responds at all.
> > Is there anything one can do to improve the performance in this
> > Windows environment, or are we at the end of the line here? The
> > bottleneck seems to be the performance of condor_schedd. Were there
> > any improvements in newer versions of Condor?
> >
> > With best regards,
> > Julian
> >
> > PS: Sorry for the previous incomplete posts, but I had some issues
> > with my company's web mailer.
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
> > with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/