
Re: [Condor-users] Network problems



Hi Adam,

My guess is that your shadows have used all of the (100) available ports; see section 3.10.8.2 of the manual. With 100 ports, the maximum number of running jobs you can support is 19, and I suspect you are limited to 100 ports by firewall restrictions. Your two options are to define MAX_JOBS_RUNNING so that no more than 19 jobs run at once, or to enlarge the firewall hole and increase the number of ports by changing HIGHPORT and LOWPORT.
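
As a rough sketch of the arithmetic behind the figure of 19: if you assume the submit machine needs about 5 ports for its daemons plus 5 ports per running shadow, then 100 ports gives (100 - 5) / 5 = 19 jobs. Both constants are my assumptions, chosen because they reproduce the number above, so check them against the manual section for your Condor version. The function names are hypothetical helpers, not part of Condor.

```python
# Sketch only: BASE_PORTS and PORTS_PER_SHADOW are assumed values that
# happen to reproduce the "19 jobs for 100 ports" figure; verify them
# against the port-usage section of the manual for your Condor version.
BASE_PORTS = 5         # assumed ports used by the daemons on the submit node
PORTS_PER_SHADOW = 5   # assumed ports used by each running shadow


def max_running_jobs(available_ports: int) -> int:
    """Largest MAX_JOBS_RUNNING that fits in the given number of ports."""
    if available_ports < BASE_PORTS:
        return 0
    return (available_ports - BASE_PORTS) // PORTS_PER_SHADOW


def ports_needed(jobs: int) -> int:
    """Number of ports the firewall hole must cover to run `jobs` jobs."""
    return BASE_PORTS + PORTS_PER_SHADOW * jobs


if __name__ == "__main__":
    print(max_running_jobs(100))  # jobs supportable with 100 ports
    print(ports_needed(200))      # ports required to run 200 jobs
```

Under the same assumptions, ports_needed() tells you how wide the HIGHPORT/LOWPORT range would have to be for whatever job count you actually want.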

Cheers,

Andrew


On 12 May 2006, at 14:48, Adam Thorn wrote:

I've noticed various messages in some of my logs that look rather like
networking problems to me, but the network management is handled by
people other than myself. Can anyone suggest what the problem(s) might
be, so I can complain more effectively? Many of the MasterLogs on my
pool have frequent messages like:

5/10 17:36:17 Sock::bindWithin - failed to bind any port within (9600 ~ 9700)
5/10 17:36:17 SafeSock::connect bind() failed: _state = 1

And having just submitted a cluster of jobs, the SchedLog on my submit
node has multiple entries like:

5/12 14:38:28 IO: Failed to read packet header
5/12 14:38:29 Started shadow for job.. etc

which repeat for each shadow that has been started.

Adam