
Re: [Condor-users] Networking problems

It’s been a while since I did much with Condor, but I recall a registry setting that needed to be adjusted to allow many concurrent jobs on Windows. My memory is rather vague, but I think Condor.ini has a limit on the number of concurrent jobs. Worth looking at.





Philip Crawford B Comp Sc, MIEEE
Medicine Computer Support Unit
The University of NSW, Sydney, NSW, 2052

Phone: +61-2-9385 2564

Fax: +61-2-9385 1258

Email: p.crawford@xxxxxxxxxxx


From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dr Alan O Cais
Sent: Thursday, 30 August 2007 11:39 AM
To: Condor-Users Mail List
Subject: [Condor-users] Networking problems


Hi all,

I manage a large(ish) cluster of Windows machines, with only a few users submitting a large number (100s) of jobs from a single machine through a Cygwin ssh server. When about 200 jobs are started, the ssh server begins to fall over and accepts new connections only intermittently (that is, it doesn't work most of the time) until the jobs have completed. Also, about 50 of the machines are currently stuck in a preempting state, and have been for over a day, even though I deleted the actual jobs yesterday afternoon.

I figured this is probably a port/network problem, but was hoping someone might throw some light my way. I'm running XP on all boxes and Condor 6.8.3, and it's a Uni-wide 10/100 Ethernet network. Each job at present involves the transfer of about 50MB of data.
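
For a rough sense of the network load described above, here is a back-of-envelope sketch. It assumes (this is not stated in the message) that all job data passes through the single submit machine, that its link runs at the full 100 Mbit/s, and that there is no protocol overhead or contention, so the result is a lower bound at best. The function name is made up for illustration.

```python
def transfer_seconds(jobs: int, mb_per_job: float, link_mbit_per_s: float) -> float:
    """Ideal time to push all job data over one link, ignoring overhead."""
    total_megabits = jobs * mb_per_job * 8  # 1 MB = 8 Mbit
    return total_megabits / link_mbit_per_s


# Figures from the message: about 200 jobs, about 50MB each.
total_mb = 200 * 50
seconds = transfer_seconds(200, 50, 100)  # assumed 100 Mbit/s link
print(f"{total_mb} MB total, at least {seconds:.0f} s on a 100 Mbit/s link")
```

Even under these generous assumptions, the transfers alone would keep the submit machine's link saturated for several minutes, which may be relevant to the ssh server becoming unresponsive.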