[Condor-users] Networking problems

Hi all,

I manage a large(ish) cluster of Windows machines. Only a few users submit jobs, but they submit a large number (100s) from a single machine through a Cygwin ssh server. Once about 200 jobs have started, the ssh server begins to fall over: it accepts new connections only intermittently (as in, it doesn't work most of the time) until the jobs have completed. Also, about 50 of the machines are currently stuck in a preempting state, and have been for over a day, even though I deleted the actual jobs yesterday afternoon.

I figure this is probably a port/network problem, but I was hoping someone might shed some light on it. All boxes run Windows XP and Condor 6.8.3, and it's a Uni-wide 10/100 Ethernet network. Each job at present involves the transfer of about 50 MB of data.
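
For a rough sense of the load, here is a back-of-the-envelope sketch in Python. It assumes (and this is only an assumption) that all 200 jobs move their 50 MB at the same time through the submit machine's single network link, that 1 MB is 10^6 bytes, and that protocol overhead is ignored. The function name transfer_seconds is made up for illustration.

```python
# Rough estimate only: assumes every job's data crosses one shared link
# at once and ignores protocol overhead. 1 MB is taken as 8 megabits.

def transfer_seconds(jobs, mb_per_job, link_mbit_per_s):
    """Ideal time in seconds to move jobs * mb_per_job over one link."""
    total_megabits = jobs * mb_per_job * 8
    return total_megabits / link_mbit_per_s

if __name__ == "__main__":
    # The network is 10/100, so try both link speeds.
    for link in (10, 100):
        secs = transfer_seconds(200, 50, link)
        print(f"{link} Mbit/s: {secs:.0f} s (~{secs / 60:.0f} min)")
```

Under those assumptions the data alone would keep a 100 Mbit/s link busy for about 13 minutes, and a 10 Mbit/s link for over two hours, so the link on the submit machine could plausibly be saturated while the jobs start.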