Re: [Condor-users] Collector error - ERROR: receiving new UDP message but found a long message still waiting to be closed




For the curious, the cause of the problem in the specific case appearing in your logs is this:

1. Your collector is receiving a UDP message from a daemon that is trying to use a security session that the collector does not recognize. This could happen, for example, if the collector has restarted since the last time that daemon communicated with the collector. The collector informs the daemon of the problem and the two then set up a new security session for exchanging UDP messages. Should be no problem.

2. But here's the problem: in 6.8.8 and earlier versions, when rejecting the initial UDP message that was using the invalid security session, the collector would leave behind some half-finished business. In versions prior to 6.8.8, this could potentially cause problems when reading subsequent UDP messages. In 6.8.8, the problem is detected when the next message arrives, and other than the annoying message in the logs, all should be fine. In 7.0.0, the root cause of the problem has been fixed, so the error message is no longer triggered.
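
To make the sequence above concrete, here is a rough toy model in Python. It is only a sketch of the behavior described in points 1 and 2, not Condor's actual code: the class ToyUdpReceiver, its methods, and the close_on_reject flag are all hypothetical names invented for illustration. With close_on_reject=False it mimics the 6.8.8 behavior (the leftover message is noticed and closed when the next message arrives); with close_on_reject=True it mimics the 7.0.0 fix (the rejected message is closed right away).

```python
class ToyUdpReceiver:
    """Hypothetical toy model of a receiver that handles one UDP message at a time.

    This is NOT Condor source code; it only illustrates the sequence of events.
    """

    def __init__(self, known_sessions, close_on_reject):
        self.known_sessions = set(known_sessions)
        # False models 6.8.8 (leftover detected later); True models the 7.0.0 fix.
        self.close_on_reject = close_on_reject
        self.pending = None  # a message that was opened but not yet closed
        self.log = []

    def establish_session(self, session_id):
        # Stands in for the daemon and collector negotiating a new session.
        self.known_sessions.add(session_id)

    def receive(self, session_id, payload):
        # Detection step: a previous message was never closed.
        if self.pending is not None:
            self.log.append(
                "ERROR: receiving new UDP message but found a long message "
                "still waiting to be closed. Closing it now."
            )
            self.pending = None

        self.pending = payload  # open the new message

        if session_id not in self.known_sessions:
            self.log.append(f"attempt to open invalid session {session_id}, failing.")
            if self.close_on_reject:
                self.pending = None  # the fix: clean up before returning
            return None  # otherwise the half-finished message is left behind

        data = self.pending
        self.pending = None  # normal path: message fully read and closed
        return data


if __name__ == "__main__":
    for fixed in (False, True):
        receiver = ToyUdpReceiver(known_sessions=[], close_on_reject=fixed)
        receiver.receive("old-session", b"ad")   # rejected: unknown session
        receiver.establish_session("new-session")
        receiver.receive("new-session", b"ad")   # accepted
        print("close_on_reject =", fixed)
        for line in receiver.log:
            print("   ", line)
```

Running it prints the "still waiting to be closed" line only for the close_on_reject=False case, which matches the pattern in your logs: an invalid-session message followed immediately by the UDP error.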

--Dan

Wojtek Goscinski wrote:
Thanks for your help Dan.
For interest's sake, can you comment on the actual cause?

On Feb 20, 2008 2:29 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:


    Upgrading to 7.0.0 should make this error message go away.  However, I
    wouldn't recommend upgrading for this reason alone, because 6.8.8
    automatically recovers from this unexpected state, and the specific case
    of it that you are encountering has a known cause that is benign.
    Versions of Condor prior to 6.8.8 did not emit this error message, but
    only because the error went undetected.  The only known case where the
    undetected error could cause serious problems is when submitting jobs to
    a 6.9.3 or later schedd to be run on 6.8.7 and earlier startds.

    --Dan

    Wojtek Goscinski wrote:
    >
    > Howdy,
    >
    > Can anyone tell me what might be causing the following errors in the
    > Collector, Sched and Match logs? In particular, I'm interested in the
    > "ERROR: receiving new UDP..." messages.
    > I believe this has started occurring since upgrading to Condor 6.8.8
    > installed through VDT. I'm running under Scientific Linux SL release
    > 5.1 and the pool is managing around a hundred hosts without too many
    > issues.
    >
    > 2/19 16:05:29 DC_AUTHENTICATE: attempt to open invalid session
    > sloth1:4772:1202982643:4157, failing.
    > 2/19 16:05:29 ERROR: receiving new UDP message but found a long
    > message still waiting to be closed (consumed=0). Closing it now.
    > 2/19 16:05:29 WARNING:  No master ad for < vm1@MONASH-F9AD285F >
    > 2/19 16:05:29 StartdAd     : Inserting ** "< vm1@MONASH-F9AD285F ,
    > 118.138.174.24 >"
    > 2/19 16:05:29 stats: Inserting new hashent for
    > 'Start':'vm1@MONASH-F9AD285F':'118.138.174.24'
    > 2/19 16:05:29 StartdPvtAd  : Inserting ** "< vm1@MONASH-F9AD285F ,
    > 118.138.174.24 >"
    > 2/19 16:05:29 stats: Inserting new hashent for
    > 'StartdPvt':'vm1@MONASH-F9AD285F':'118.138.174.24'
    > 2/19 16:05:38 DC_AUTHENTICATE: attempt to open invalid session
    > sloth1:5870:1203380734:1165, failing.
    > 2/19 16:05:39 ERROR: receiving new UDP message but found a long
    > message still waiting to be closed (consumed=0). Closing it now.
    > 2/19 16:05:39 DC_AUTHENTICATE: attempt to open invalid session
    > sloth1:5870:1203380734:1165, failing.
    > 2/19 16:05:39 ERROR: receiving new UDP message but found a long
    > message still waiting to be closed (consumed=0). Closing it now.
    >
    >
    > Any hints on what might cause this error are most welcome!
    >
    > Regards,
    >
    > James
    >
    >
    >
    ------------------------------------------------------------------------
    >
    > _______________________________________________
    > Condor-users mailing list
    > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
    > subject: Unsubscribe
    > You can also unsubscribe by visiting
    > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
    >
    > The archives can be found at:
    > https://lists.cs.wisc.edu/archive/condor-users/
    >

