It’s been a while since I did much with Condor, but there was a
registry setting that needed to be adjusted to allow many concurrent jobs on
Windows. My memory is rather vague, but I think there is a limit on the
number of concurrent jobs in Condor.ini. Worth looking at.
Philip Crawford B Comp Sc, MIEEE
Medicine Computer Support Unit
The University of NSW, Sydney, NSW, 2052
Phone: +61-2-9385 2564
Fax: +61-2-9385 1258
From: [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dr Alan O Cais
Sent: Thursday, 30 August 2007 11:39 AM
To: Condor-Users Mail List
Subject: [Condor-users] Networking problems
I manage a large(ish) cluster of Windows machines with only a few users
submitting a large number (hundreds) of jobs from a single machine through a
Cygwin ssh server. When about 200 jobs are started, the ssh server begins to
fall over and only accepts new connections intermittently (as in, it doesn't
work most of the time) until the jobs have completed. Also, about 50 of the
machines are currently stuck in a preempting state (and have been for over a
day, even though I deleted the actual jobs yesterday afternoon).
I figured this is probably a port/network problem but was hoping someone might
throw some light my way. I'm running XP on all boxes and Condor 6.8.3, and it's
a Uni-wide 10/100 Ethernet network. Each job at present involves the transfer
of about 50 MB of data.