
Re: [Condor-users] Computers missing from Condor pool



Rob,

> Thank you for your reply. I've been wary of changing to TCP because of 
> the warnings in condor_config and the manual, as well as the effect it 
> might have on network / system load, but I'm willing to explore this 
> option further.

There are some other things to look at with UDP.  Monitor the output
of "netstat -su" on the collector machine and watch the "packet
receive errors" counter.  If this number keeps going up, you are
losing packets.
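As a rough sketch of that check, the Python below pulls the counter out of "netstat -su" text so that two samples can be compared over time. The function names, the sample text, and the assumption that the output follows the usual Linux net-tools layout (a "Udp:" header followed by indented counters) are mine, not part of Condor.

```python
import re
import subprocess


def udp_receive_errors(netstat_output):
    """Return the UDP 'packet receive errors' count, or None if absent."""
    in_udp = False
    for line in netstat_output.splitlines():
        stripped = line.strip()
        # Section headers such as "Udp:" start in column 0 and end with ":".
        if stripped.endswith(":") and not line[:1].isspace():
            in_udp = (stripped == "Udp:")
            continue
        if in_udp:
            match = re.match(r"(\d+) packet receive errors", stripped)
            if match:
                return int(match.group(1))
    return None


def read_live():
    """Run netstat on this host (assumes net-tools is installed)."""
    result = subprocess.run(["netstat", "-su"], capture_output=True,
                            text=True, check=True)
    return udp_receive_errors(result.stdout)


# Hypothetical sample in the assumed layout, for illustration only.
SAMPLE = """\
Udp:
    1200 packets received
    3 packets to unknown port received
    42 packet receive errors
    1100 packets sent
UdpLite:
"""

if __name__ == "__main__":
    print(udp_receive_errors(SAMPLE))  # 42
```

Calling read_live() twice, a minute or so apart, and comparing the two values shows whether the counter is still rising.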


One fix is to increase the size of the collector's socket buffer:

COLLECTOR_SOCKET_BUFSIZE = 10000000


Another is to raise the kernel's socket buffer limits:

/etc/sysctl.conf:

net.core.rmem_default = 65535
net.core.wmem_default = 65535
net.core.rmem_max = 8388607
net.core.wmem_max = 8388607
net.ipv4.tcp_wmem = 4096 65536 8388607
net.ipv4.tcp_rmem = 4096 65536 8388607
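To see what these limits mean in practice, here is a small sketch that asks the kernel for a large UDP receive buffer and prints what was actually granted. On Linux the request is capped at net.core.rmem_max and the value reported back is double the size that was set. The helper name is made up for illustration, and whether the collector requests its buffer in exactly this way is an assumption on my part.

```python
import socket


def granted_rcvbuf(requested):
    """Request a UDP receive buffer and return the size the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 10000000  # same figure as COLLECTOR_SOCKET_BUFSIZE above
    print("requested", requested, "granted", granted_rcvbuf(requested))
```

If the granted figure comes back far below the request, the rmem_max limit is the likely bottleneck rather than the Condor setting.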


We have also done this:

MASTER_UPDATE_INTERVAL = $RANDOM_CHOICE(290,291,292,293,294,295,296,297,298,299,301,302,303,304,305,306,307,308,309,310)
UPDATE_INTERVAL        = $RANDOM_CHOICE(290,291,292,293,294,295,296,297,298,299,301,302,303,304,305,306,307,308,309,310)

... on the compute nodes, so that they do not all flood the collector
at the same time (they tend to sync up if you ever do a
condor_reconfig -all).
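To illustrate why the randomized intervals help, the sketch below gives a handful of simulated nodes one interval each from the same list and prints when their updates would land after a common restart at t=0. It is only a toy model: the assumption that each node picks once and then keeps that interval, and all of the names in the code, are mine.

```python
import random

# The 20 values from the $RANDOM_CHOICE list above: 290..310 without 300.
INTERVALS = [n for n in range(290, 311) if n != 300]


def pick_interval(rng):
    """Pick one update interval (seconds) for a node."""
    return rng.choice(INTERVALS)


def update_times(interval, horizon):
    """Times (seconds after t=0) at which a node sends updates."""
    return list(range(interval, horizon + 1, interval))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    for node in range(5):
        interval = pick_interval(rng)
        print(node, interval, update_times(interval, 1500))
```

With a fixed 300 second interval every node would report at 300, 600, 900 and so on. With different intervals the update times drift further apart with each cycle.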

-- 
Daniel K. Forrest	Laboratory for Molecular and
forrest@xxxxxxxxxxxxx	Computational Genomics
(608) 262 - 9479	University of Wisconsin, Madison