
Re: [Condor-users] central manager crash

Hi Donald,

On Tuesday, 4 September, 2012 at 4:09 PM, Shrum, Donald C wrote:

> My central manager has crashed. Assuming I build a new machine, will all running jobs be lost as a result, or will the submit nodes negotiate everything back with the new central manager?

Condor was designed to be pretty resilient. When the CM goes down, your schedulers lose the ability to form new claims on machines, but their existing claims should hold until CLAIM_WORKLIFE expires them. That means running jobs are not lost, and the schedulers keep running jobs from their queues on the claims they already hold.
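
If you want existing claims to carry more work while the CM is being rebuilt, one option is to raise CLAIM_WORKLIFE on the execute machines. This is only a sketch: the value below is an arbitrary illustration, not a recommendation, and the execute nodes have to re-read their configuration before it takes effect.

```
# condor_config.local on the execute nodes (illustrative value only)
# Number of seconds a claim keeps accepting new jobs from the
# scheduler that holds it. A job that is already running is not
# killed when this expires; the claim just stops taking further jobs.
CLAIM_WORKLIFE = 3600
```

Check the manual for your version for the default before changing it.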

When you bring up a new central manager and point everything at it, negotiation should resume more or less as normal. You'll have lost the old central manager's state (the collector's contents and the accumulated usage history), so expect some churn because submitter EUPs (effective user priorities) will be inaccurate for a while. But for the most part, it just starts to work as expected again.
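
"Pointing everything at it" usually comes down to one setting on every submit and execute node. A minimal sketch, assuming the replacement machine is called new-cm.example.com (a made-up hostname) and that your pool takes its collector address from CONDOR_HOST rather than setting COLLECTOR_HOST separately:

```
# condor_config.local on every submit and execute node
# new-cm.example.com is a placeholder for the rebuilt central manager
CONDOR_HOST = new-cm.example.com
# If COLLECTOR_HOST is set explicitly anywhere in your configuration,
# update it as well; otherwise it follows CONDOR_HOST.
```

Afterwards, have the daemons re-read their configuration, for example with condor_reconfig on each node. If the new machine keeps the old hostname and address, this step may not be needed at all.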

- Ian
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools