
Re: [Condor-users] condor_status stuck

Hi Andrew -

Based upon your clues below, everything points to the condor_collector process not responding. What does the CollectorLog on your central manager machine have to say for itself? Can you run "condor_status" on your central manager?
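
One way to run that check without tying up a terminal is to wrap the command in a timeout. The following is only a minimal sketch, assuming Python 3.7 or later is available on the central manager; the helper name `probe` and the 30-second limit are illustrative choices, not anything Condor provides.

```python
import subprocess


def probe(cmd, timeout_s=30):
    """Run cmd and classify the result.

    Returns (status, output), where status is one of:
      'ok'      - command exited with code 0
      'failed'  - command exited with a non-zero code
      'hung'    - command did not finish within timeout_s seconds
      'missing' - command was not found on PATH
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "hung", ""
    except FileNotFoundError:
        return "missing", ""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout + result.stderr


# Example use on the central manager (needs the Condor tools on PATH):
#   status, output = probe(["condor_status"])
#   print("condor_status ->", status)
```

A 'hung' result on the central manager itself would be consistent with the collector not answering queries, while 'ok' there would suggest looking at the network path from the other machines instead.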


Pleat, Andrew C. wrote:

Condor 6.8.5

Occasionally, some sort of lock-up occurs in my cluster. The symptoms are:

- condor_status hangs indefinitely
- condor_q hangs for about a minute and returns 'Failed to fetch ads from: <... : 9683> : ..'
- condor_restart -subsystem schedd hangs
        - I tried this based on looking at condor_users mail
- condor processes still running (although no apparent activity)

- MasterLog shows normal activity
- NegotiatorLog seems to have stopped reporting
        - normally it writes messages every 5 minutes
        - the last report was "Getting all public ads ..."
- SchedLog reports 'Called reschedule_negotiator()' as last message
        - a condor_submit_dag had been performed in the same time frame
        - normally, the next message is "Activity on stashed negotiator socket"
- StartLog has nothing special (although file is still being touched)
        - the only other file still being touched is MasterLog
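
The "which logs are still being touched" observation above can be checked mechanically by comparing file modification times. This is a rough sketch only: the function names and the 600-second threshold are assumptions chosen for illustration (ten minutes is two missed cycles of a log that normally writes every 5 minutes), and the log directory has to be supplied by the caller, for example from the output of `condor_config_val LOG`.

```python
import os
import time


def log_ages(log_dir, now=None):
    """Return {file name: seconds since last modification} for files in log_dir."""
    now = time.time() if now is None else now
    ages = {}
    for entry in os.scandir(log_dir):
        if entry.is_file():
            ages[entry.name] = now - entry.stat().st_mtime
    return ages


def stale_logs(log_dir, max_age_s=600, now=None):
    """Return the sorted names of logs not modified within max_age_s seconds."""
    ages = log_ages(log_dir, now)
    return sorted(name for name, age in ages.items() if age > max_age_s)


# Example use (log_dir is whatever directory your daemons log to):
#   for name in stale_logs(log_dir):
#       print(name, "has not been written to recently")
```

A daemon whose log has gone quiet while MasterLog keeps updating is a reasonable first suspect, though a quiet log alone does not prove the daemon is stuck.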

My conclusion would be the negotiator is somehow stuck.

Any ideas?

thank you
andy pleat



Todd Tannenbaum                       University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                 Madison, WI 53706-1685