
Re: [Condor-users] incomplete udp for command 0 / command 2



Rob,

> After monitoring condor udp traffic for a while, I've found an
> interesting problem.. sometimes clients will start misbehaving, and
> send only part of the data needed for update_master_ad /
> update_startd_ad commands.
> 
> Has anyone seen this? Any ideas on what's causing it?

Yes, and my first thought is, is this Windows?

<snip>

> Both clients are windows XP, condor version 6.8.6. I've noticed this
> behavior on both, sometimes one, sometimes the other. The problem is
> gone after a condor_restart, but will eventually re-occur. The
> client logfiles don't show anything interesting.
> 
> Any ideas on how to debug / fix this would be welcome.

This is a problem with UDP under Windows: it considers a packet "sent"
when the sendto() call is made, not when the packet has actually hit
the wire.  So if sendto() is called too rapidly (e.g. when collector
update packets are split) you can lose the previous UDP packet if it
hasn't really been sent yet.

What we did was add "D_NETWORK" to the MASTER_DEBUG and STARTD_DEBUG
flags in the config file.  The added delay of logging the UDP packets
seems to be enough to keep this from happening.
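
As a rough sketch, the change in the local config file on each affected
Windows client would look something like the lines below.  This assumes
you have no other debug flags set for these daemons; if you do, keep
them on the same line, since each assignment replaces the previous
value.

    # Log network activity; the extra logging slows down the UDP sends.
    # Note: each line REPLACES any existing debug flags for that daemon.
    MASTER_DEBUG = D_NETWORK
    STARTD_DEBUG = D_NETWORK

The daemons have to re-read their configuration (a condor_reconfig or
condor_restart on that machine) before the new flags take effect.
Expect the MasterLog and StartLog to grow noticeably faster with
D_NETWORK on.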

You can alternatively use "UPDATE_COLLECTOR_WITH_TCP = True" and avoid
UDP entirely.
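
A minimal sketch of that setting, again in the config file of the
machines sending the updates:

    # Send collector updates over TCP instead of UDP.
    UPDATE_COLLECTOR_WITH_TCP = True

If I remember the manual correctly, the collector side also needs
COLLECTOR_SOCKET_CACHE_SIZE set to something sensible for the size of
the pool before TCP updates will work, so check the section on TCP
updates for your version before rolling this out.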

-- 
Daniel K. Forrest	Laboratory for Molecular and
forrest@xxxxxxxxxxxxx	Computational Genomics
(608) 262 - 9479	University of Wisconsin, Madison