
[Condor-users] CondorMaster restarting sometime


We are running Condor 7.4.1 and have observed that, on nodes running a startd,
the condor_master process exits with exit code 0 and starts again from time to time.
This happens on arbitrary nodes at arbitrary times. We have not yet been
able to correlate it with a particular kind of job.
We increased the verbosity on some nodes and collected the logs.

I took the logs from the time around one such event and put the CKPTLog,
MasterLog and StartLog of the startd node and the CollectorLog of the
submit host into a tarball:


Unfortunately, we were too slow: log rotation had already erased the
corresponding events in the StarterLogs.

If you need the configuration or more logging, please tell us.

Thank you and cheers,
