
Re: [condor-users] Condor shuts down our network



UDP, which is used by default for sending ad updates, does not guarantee that each packet arrives exactly once. Could this flood be caused by a fault in the network between the machine and the CM? I think this could be verified in one of two ways: (1) switching to TCP for ad updates (this requires reconfiguring the remote machine), or (2) checking whether the timestamps contained in the received ads are all identical (assuming the ads carry timestamps). Last but not least, it is also possible that the logging software itself is faulty.
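Option (2) could be scripted roughly as below. This is only a sketch: it assumes you have already pulled one timestamp value out of each captured ad (which ad attribute carries that timestamp, and how it is extracted from the capture, are assumptions not covered here). If every captured packet carries the same timestamp, the flood looks like one ad duplicated in transit; many distinct timestamps would mean the sender really produced many separate updates.

```python
from collections import Counter


def classify_flood(timestamps):
    """Classify a burst of captured ads by the timestamps they carry.

    timestamps: one value per captured ad (hypothetical input; extracting
    it from the packet capture is left to the reader).
    """
    if not timestamps:
        return "no ads"
    counts = Counter(timestamps)
    if len(counts) == 1:
        # Every packet has the same timestamp: most likely a single ad
        # that was duplicated somewhere between the node and the CM.
        return "duplicates"
    # Several distinct timestamps: the node genuinely sent many updates.
    return "distinct updates"


if __name__ == "__main__":
    print(classify_flood([1000] * 5))          # duplicates
    print(classify_flood([1000, 1001, 1002]))  # distinct updates
```

A mixed result (a few distinct timestamps, each repeated thousands of times) would point to both effects at once, so looking at `Counter(timestamps).most_common()` directly may be more informative than the single label.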

Alexander Klyubin

David Vestal wrote:
It appears to be an update of machine status. Our logging software logged 57874 of them in just over 56 seconds.

That's certainly not right.


You're right. A small number of other messages were mixed in. When I filtered out everything except the messages sent from this one node to the central manager, I found that the node had sent 56159 messages to the CM over the course of three minutes and 34 seconds. That is still clearly far too many. You could argue that the logging may be faulty, but the fact remains that this traffic brought our network, and the network of the grid node, to a halt, so a significant amount of traffic was definitely being exchanged.


We've never seen it before. I'm certainly not willing to rule out a Condor
bug, but I'd expect to see it somewhere else.

The ad you're seeing is the Master Ad; the interval at which it gets sent is
controlled by MASTER_UPDATE_INTERVAL. It should default to 300 seconds.


We have not changed this default value.
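As a back-of-the-envelope check, the figures reported in this thread can be compared with what the default interval implies. The sketch below assumes exactly one Master Ad per MASTER_UPDATE_INTERVAL and ignores ads from any other daemon on the node.

```python
# Default interval quoted in this thread for the Master Ad.
MASTER_UPDATE_INTERVAL = 300  # seconds

# Figures reported for traffic from the one node to the central manager.
observed_messages = 56159
observed_seconds = 3 * 60 + 34  # "three minutes, 34 seconds" = 214 s

# Observed send rate versus the number of ads the default interval allows.
rate = observed_messages / observed_seconds
expected_ads = observed_seconds / MASTER_UPDATE_INTERVAL
excess = observed_messages / expected_ads

print(f"observed rate: {rate:.1f} ads/s")
print(f"expected: {expected_ads:.2f} ads in {observed_seconds} s")
print(f"excess factor: {excess:,.0f}x")
```

With these numbers the node was sending roughly 262 ads per second, where the default setting would produce less than one ad in the same window, an excess of about 78,700 times.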


The best thing to check would be the machine that's sending all of
the ads.


I'll see if we can get in tomorrow; the computer is not under our control.


It's also worth looking in the CollectorLog on the central manager, and seeing
how many ads it's receiving.


CollectorLog has no mention of receiving ads from that node at that time.

-David
