[Condor-users] condor_status stuck



Condor 6.8.5

Occasionally, some sort of lock-up occurs in my cluster.  The symptoms are:

- condor_status hangs indefinitely
- condor_q hangs for about a minute and returns 'Failed to fetch ads from: <... : 9683> : ..'
- condor_restart -subsystem schedd hangs
        - I tried this after reading earlier condor-users mail
- the condor processes are still running (although with no apparent activity)

Logs:
- MasterLog shows normal activity
- NegotiatorLog seems to have stopped reporting
        - normally it writes messages every 5 minutes
        - the last report was "Getting all public ads ..."
- SchedLog reports 'Called reschedule_negotiator()' as last message
        - a condor_submit_dag had been performed in the same time frame
        - normally, the next message is "Activity on stashed negotiator socket"
- StartLog has nothing special (although the file is still being touched)
        - the only other file still being touched is MasterLog
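
The check I did by hand on the logs above could be sketched roughly as follows. This is only a minimal sketch, not a Condor tool: stale_logs is a hypothetical helper, the log directory is an assumption (it varies per installation), and the 600-second threshold is just twice the 5-minute negotiator interval mentioned above.

```python
import os
import time


def stale_logs(log_dir, names, max_age_seconds, now=None):
    """Return {log name: age in seconds} for logs not modified recently.

    Hypothetical helper: it only compares file modification times and
    knows nothing about Condor itself.
    """
    now = time.time() if now is None else now
    stale = {}
    for name in names:
        path = os.path.join(log_dir, name)
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            continue  # log not present on this host; skip it
        if age > max_age_seconds:
            stale[name] = age
    return stale


if __name__ == "__main__":
    # Assumed log directory; substitute the one used by your installation.
    logs = ["MasterLog", "NegotiatorLog", "SchedLog", "StartLog"]
    for name, age in stale_logs("/var/log/condor", logs, 600).items():
        print("%s: no writes for %d seconds" % (name, age))
```

A daemon whose log normally updates on a fixed interval but shows up in this list is a candidate for being the stuck one.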

My conclusion is that the negotiator is somehow stuck.

Any ideas?

Thank you,
Andy Pleat