
Re: [Condor-users] Limitations for Windows Submit Machines



Those numbers tally with the limits we have experienced on Windows in the 7.2 series.

In fact, going over 250 is very significant; we never try going over that from one box (though we tend to always have some quick/slow jobs in any batch).

There are changes in the 7.5 development release intended to raise the Windows limits beyond this (though not to the same level as the *nix ones), so you may want to risk that. I haven't used it, so I can't recommend it either way (though I'd love to know).

I would point out that using Condor this way means you have a central point of failure, which is likely to bite you later; you may not mind about that, though.

Matt

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Julian Exner
Sent: 10 November 2010 09:45
To: Condor-Users Mail List
Subject: [Condor-users] Limitations for Windows Submit Machines

Hi,

does anyone have experience with the limitations for Windows submit machines 
with recent versions of Condor and Windows?
We are running a pool of potentially 520 available nodes with Windows XP. The
machine used to submit into this cluster is a fairly powerful (8 cores, 
24 GiB RAM) server running Windows Server 2008 x64. The Condor version is 7.4.1.
For long running jobs (about 30 minutes), it works to submit a cluster with 
about 8000 jobs with MAX_JOBS_RUNNING at 520. But with shorter jobs (5 minutes),
we run into issues with the communication between the condor_shadow processes 
and condor_schedd, and condor_q no longer responds correctly. With 
MAX_JOBS_RUNNING at 250 these problems don't appear.
Another problem seems to be the overall size of the queue managed by Condor.
With about 20k jobs in the queue, condor_q takes minutes to respond if it does
at all.
Is there anything one can do to improve the performance in this Windows 
environment, or are we at the end of the line here? The bottleneck seems to be
the performance of condor_schedd. Were there any improvements in newer versions
of Condor?
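
A rough back-of-the-envelope sketch (Python) of why the shorter jobs might stress the submit machine more. It assumes every running slot is kept busy and is refilled as soon as a job finishes, so the steady-state turnover is simply running jobs divided by job duration. The function name is hypothetical, and this ignores all per-job startup and file transfer overhead; it only illustrates the arithmetic, not Condor's actual behaviour.

```python
def job_starts_per_second(max_jobs_running: int, job_minutes: float) -> float:
    """Steady-state rate at which finished jobs must be replaced.

    Assumption: all max_jobs_running slots stay busy, so on average
    max_jobs_running jobs complete (and restart) every job_minutes.
    """
    return max_jobs_running / (job_minutes * 60.0)


# Scenarios taken from the figures quoted in this message.
scenarios = [
    ("30 min jobs, MAX_JOBS_RUNNING=520", 520, 30),
    ("5 min jobs,  MAX_JOBS_RUNNING=520", 520, 5),
    ("5 min jobs,  MAX_JOBS_RUNNING=250", 250, 5),
]

for label, running, minutes in scenarios:
    rate = job_starts_per_second(running, minutes)
    print(f"{label}: ~{rate:.2f} job starts/s")
```

Under these assumptions the 5 minute jobs at 520 need about six times as many job starts (and shadow start/exit cycles) per second as the 30 minute jobs, while capping at 250 roughly halves that rate again.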

With best regards,
Julian

PS: Sorry for the previous incomplete posts, but I had some issues with my
company's web mailer.
