
Re: [condor-users] Condor shuts down our network



> > 
> > It appears to be an update of machine status.  Our logging software 
> > logged 57874 of them in just over 56 seconds.
> 
> That's certainly not right.

You're right.  A few other messages were mixed in.  When I filtered out 
everything except the messages sent from this one node to the central manager, 
I found that the node had sent 56159 messages to the CM over three minutes 
and 34 seconds.  Still, that's clearly far too many.  You could argue that 
the logging is faulty, but the fact remains that it brought our network, 
and the grid node's network, to a halt, so there was definitely significant 
traffic being exchanged.
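The filtering step described above (keep only traffic from one node to the CM, then count it and measure the time span) can be sketched in Python. This is a minimal sketch assuming a hypothetical log export with one record per packet holding a timestamp in seconds, a source host and a destination host; the `Packet` record, `flow_stats` function and host names are illustrative, not taken from any particular logging tool.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """One record from a traffic log (hypothetical schema)."""
    timestamp: float  # seconds since the start of the capture
    src: str          # sending host
    dst: str          # receiving host


def flow_stats(packets, src, dst):
    """Return (count, span_seconds) for packets sent from src to dst."""
    times = [p.timestamp for p in packets if p.src == src and p.dst == dst]
    if not times:
        return 0, 0.0
    return len(times), max(times) - min(times)


if __name__ == "__main__":
    # Hypothetical host names; "other.example" stands in for the unrelated
    # messages that were mixed into the log.
    log = [
        Packet(0.0, "node.example", "cm.example"),
        Packet(0.5, "other.example", "cm.example"),
        Packet(1.0, "node.example", "cm.example"),
        Packet(2.0, "node.example", "cm.example"),
    ]
    count, span = flow_stats(log, "node.example", "cm.example")
    print(count, span)  # 3 2.0
```

Dividing the count by the span gives the per-second rate for that one flow, which is the figure to compare against the expected update interval.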

> 
> We've never seen it before. I'm certainly not willing to rule out a Condor
> bug, but I'd expect to see it somewhere else.
> 
> The ad you're seeing is the Master Ad - the interval at which it gets sent is
> controlled by MASTER_UPDATE_INTERVAL. It should default to 300 seconds.

We have not changed this default value.
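To put the numbers in perspective, here is a quick back-of-the-envelope comparison, assuming the 300-second default mentioned above and counting only master ads (other daemons send their own ads on separate intervals, so this is a rough bound rather than an exact expectation). The function name `excess_factor` is illustrative.

```python
def excess_factor(messages, duration_s, interval_s=300.0):
    """Ratio of observed updates to the count expected at one update per interval."""
    expected = duration_s / interval_s
    return messages / expected


observed = 56159
duration = 3 * 60 + 34  # 3 minutes 34 seconds = 214 s
rate = observed / duration
print(f"{rate:.1f} updates/s, about {excess_factor(observed, duration):,.0f}x the expected rate")
# prints "262.4 updates/s, about 78,728x the expected rate"
```

At one master ad every 300 seconds, a 214-second window should contain at most one update, so tens of thousands of messages points to something other than the normal update timer.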

> 
> The best thing to check would be the machine that's sending all of
> the ads.

I'll see if we can get access to it tomorrow; the computer is not under our control.

> 
> It's also worth looking in the CollectorLog on the central manager, and seeing
> how many ads it's receiving. 

CollectorLog has no mention of receiving ads from that node at that time.

-David