Re: [Condor-users] Computers missing from Condor pool



Hello,

Daniel Forrest wrote:
There are some other things to look at with UDP.  Monitor the output
of "netstat -su" looking at "packet receive errors".  If this number
is going up then you are losing packets.

As it turns out, we were getting ~10% UDP loss during peak hours. We've increased the kernel buffer limits, and haven't lost a packet since.
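For anyone who wants to watch that counter over time, here is a minimal Python sketch. It assumes the Linux net-tools layout of "netstat -su" output (an unindented "Udp:" header followed by indented counter lines); the function names udp_receive_errors and errors_since are my own, not part of Condor or netstat.

```python
import re


def udp_receive_errors(netstat_output: str) -> int:
    """Return the 'packet receive errors' counter from the Udp: section.

    Assumes net-tools style `netstat -su` text: unindented section
    headers ending in ':' and indented counter lines beneath them.
    """
    in_udp = False
    for line in netstat_output.splitlines():
        stripped = line.strip()
        # An unindented line ending in ':' starts a new protocol section.
        if not line.startswith((" ", "\t")) and stripped.endswith(":"):
            in_udp = stripped == "Udp:"
            continue
        if in_udp:
            match = re.match(r"(\d+) packet receive errors", stripped)
            if match:
                return int(match.group(1))
    raise ValueError("no UDP 'packet receive errors' line found")


def errors_since(before: str, after: str) -> int:
    """Difference in receive errors between two netstat snapshots."""
    return udp_receive_errors(after) - udp_receive_errors(before)
```

Feed it two snapshots taken some minutes apart (for example, captured with subprocess.run(["netstat", "-su"], capture_output=True, text=True).stdout); a positive difference means packets were dropped in between.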

James wrote:
Do a 'condor_status -l | condor_updates_stats | grep "Stats:"' and
check for lost updates.

Increasing the UDP buffer has considerably reduced the percentage of lost updates reported by condor_updates_stats: it is now mostly 0-2%, with occasional spikes to 10%, compared to 10-30% across the board before the change. While this is definitely an improvement, we're still not satisfied with the number of hosts in the pool. Ping sweeps still show some 20% additional live hosts that the collector doesn't know about.
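The comparison itself is just a set difference. A rough Python sketch, assuming both lists have already been collected and use the same naming scheme (for instance fully qualified hostnames); missing_from_pool is a hypothetical helper name, not a Condor tool:

```python
def missing_from_pool(live_hosts, collector_hosts):
    """Return hosts that answered the ping sweep but are absent from the collector.

    Both arguments are iterables of hostname strings. Names are compared
    case-insensitively after trimming whitespace; blank entries are ignored.
    Assumes both lists use the same form of name (e.g. all FQDNs).
    """
    live = {h.strip().lower() for h in live_hosts if h.strip()}
    known = {h.strip().lower() for h in collector_hosts if h.strip()}
    return sorted(live - known)
```

Running this after each sweep gives a concrete list of machines to inspect, rather than just a percentage.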

Is there anything else we could try?

Regards,

Rob de Graaf