
Re: [Condor-users] condor_status stuck



Andrew,
>  - a few other of the same PERMISSION DENIED for QUERY_STARTD_PVT_ADS
Based upon the info you've given, all signs point to the collector
needing to be restarted, or to your security settings having changed
or preventing the querying of classads.
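If you want to see which hosts and commands are being refused, a quick tally of the PERMISSION DENIED lines in the CollectorLog can help. Below is a minimal Python sketch. The regular expression assumes the line format quoted in this thread ("PERMISSION DENIED to <user> from host <addr> for command <n> (<NAME>)") and may need adjusting for other Condor versions. The name tally_denials() is purely illustrative.

```python
import re
from collections import Counter

# ASSUMPTION: denial lines look like the ones quoted in this thread, e.g.
#   DaemonCore: PERMISSION DENIED to unknown user from host <a.b.c.d:9625>
#   for command 49 (UPDATE_NEGOTIATOR_AD)
# The exact wording may vary between Condor versions.
DENIED = re.compile(
    r"PERMISSION DENIED to (?P<user>.+?) from host "
    r"<?(?P<host>[^>\s]+)>? for command (?P<num>\d+) \((?P<name>\w+)\)"
)


def tally_denials(lines):
    """Count how often each (host, command name) pair was denied.

    `lines` is any iterable of log lines, such as an open log file.
    Lines that don't match the assumed format are ignored.
    """
    counts = Counter()
    for line in lines:
        match = DENIED.search(line)
        if match:
            counts[(match.group("host"), match.group("name"))] += 1
    return counts
```

For example, passing an open CollectorLog to tally_denials() and printing counts.most_common() would list the most frequently denied host/command pairs first. Those pairs are the ones worth checking against your HOSTALLOW settings. The log path is a placeholder; use wherever LOG points on your central manager.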

You probably need to look at the security settings, and if those
haven't changed since condor_status was last working, try restarting
your collector process. Hope that helps!
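Before and after the restart, it can help to tell an outright hang apart from a failure by running condor_status under a timeout instead of waiting on it by hand. Here is a rough Python 3.7+ sketch. It assumes condor_status is on your PATH. The probe() helper and its 60-second default are illustrative choices, not anything Condor itself provides.

```python
import subprocess


def probe(cmd, timeout_s=60):
    """Run a query command and classify the outcome.

    Returns "ok" if it exits with status 0, "failed" for a non-zero exit,
    "hung" if it is still running after timeout_s seconds (it is then
    killed), and "missing" if the program can't be found.
    The 60-second default is an arbitrary, illustrative choice.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising this
        return "hung"
    except FileNotFoundError:
        return "missing"
    return "ok" if result.returncode == 0 else "failed"
```

Usage would be something like probe(["condor_status"]). When the timeout expires, subprocess.run kills the child, so the probe doesn't leave a stuck condor_status behind. Getting "hung" both before and after restarting the collector would suggest the restart alone isn't the fix.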

Good Luck,
Jason


-- 
===================================
Jason A. Stowe

Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com

On Thu, Mar 27, 2008 at 12:04 PM, Pleat, Andrew C. <andrew.pleat@xxxxxxx> wrote:
> One other unusual message which most likely is unrelated is:
>
>  on the execution machine's CollectorLog, periodically (~every 10 minutes):
>  - Trying to query collector < (central manager) : 9618 >
>  - condor_read(): Socket closed when trying to read 5 bytes from ... 9618
>  - IO: EOF reading packet header
>  - Couldn't fetch ads: communication error
>  - Aborting negotiation cycle
>
>  and on central manager at same time:
>  - DaemonCore: PERMISSION DENIED to unknown user from host <(the execution
>  machine above):9625> for command 49 (UPDATE_NEGOTIATOR_AD)
>  - a few other of the same PERMISSION DENIED for QUERY_STARTD_PVT_ADS
>
>  again, no idea if it's related but something to fix...
>
>  thanks again
>
>
>
>  -----Original Message-----
>  From: condor-users-bounces@xxxxxxxxxxx
>  [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Pleat, Andrew C.
>  Sent: Thursday, March 27, 2008 11:35 AM
>  To: Condor-Users Mail List
>  Subject: Re: [Condor-users] condor_status stuck
>
>  - condor_status on central manager is hanging
>  - condor_status is hanging on other machines as well
>  - CollectorLog
>         - lots of apparently normal messages up until 10:30 and then silence
>         - only unusual message is at 10:17:
>                 - can't send UPDATE_COLLECTOR_AD to collector ((nul): Failed to send UDP update command to collector
>                 - Housekeeper: Ready to clean old ads
>                 -   <bunch of 'Cleaning' messages>
>                 - then resume normal messages up until 10:30 silence
>  - condor_status eventually failed (tens of minutes later):
>         - SECMAN:2003:TCP connection to <... : 9618> failed
>  - subsequently CollectorLog shows:
>         - condor_collector (CONDOR_COLLECTOR) STARTING UP
>         - this must be the master restarting it (as Steve Timm indicated)
>  - reissued 'condor_status' - again stuck
>  - MasterLog
>         - at 11:25 shows:
>                 - NEGOTIATOR recovered
>                 - COLLECTOR recovered
>                 - SCHEDD recovered
>  - the 'condor_restart -subsystem schedd' that I issued initially finally
>  went through (although I now understand it wasn't the likely culprit)
>  - reissued 'condor_q' and same result : Failed to fetch ads ... : 9679
>         - note the port changed
>
>  thanks for the responses
>
>  -----Original Message-----
>  From: condor-users-bounces@xxxxxxxxxxx
>  [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
>  Sent: Thursday, March 27, 2008 11:15 AM
>  To: Condor-Users Mail List
>  Subject: Re: [Condor-users] condor_status stuck
>
>
>
>  Hi Andrew -
>
>  Based upon your clues below, everything points to the condor_collector
>  process not responding.    What does the CollectorLog on your central
>  manager machine have to say for itself?    Can you run "condor_status"
>  on your central manager?
>
>  thanks,
>  Todd
>
>
>  Pleat, Andrew C. wrote:
>  >
>  >
>  > Condor 6.8.5
>  >
>  > Occasionally, there's some sort of lock-up occurring in my cluster.
>  > The symptoms are:
>  >
>  > - condor_status hangs indefinitely
>  > - condor_q hangs for about a minute and returns 'Failed to fetch ads
>  > from: <... : 9683> : ..'
>  > - condor_restart -subsystem schedd hangs
>  >         - I tried this based on posts on the condor-users mailing list
>  > - condor processes still running (although no apparent activity)
>  >
>  > Logs:
>  > - MasterLog shows normal activity
>  > - NegotiatorLog seems to have stopped reporting
>  >         - normally it writes messages every 5 minutes
>  >         - the last report was "Getting all public ads ..."
>  > - SchedLog reports 'Called reschedule_negotiator()' as last message
>  >         - a condor_submit_dag had been performed in the same time frame
>  >         - normally, the next message is "Activity on stashed negotiator socket"
>  > - StartLog has nothing special (although file is still being touched)
>  >         - the only other file still being touched is MasterLog
>  >
>  > My conclusion would be the negotiator is somehow stuck.
>  >
>  > Any ideas?
>  >
>  > thank you
>  > andy pleat
>  >
>  >
>  >
>  >
>  > ------------------------------------------------------------------------
>  >
>  > _______________________________________________
>  > Condor-users mailing list
>  > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>  > subject: Unsubscribe
>  > You can also unsubscribe by visiting
>  > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>  >
>  > The archives can be found at:
>  > https://lists.cs.wisc.edu/archive/condor-users/
>
>
>  --
>  Todd Tannenbaum                       University of Wisconsin-Madison
>  Condor Project Research               Department of Computer Sciences
>  tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
>  Phone: (608) 263-7132                 Madison, WI 53706-1685