Re: [Condor-users] condor fault tolerance

Paul Marshall wrote:

I haven't been able to find any more up-to-date information on this issue:


Could someone point me in the right direction? What is the best way to
decrease the time that it takes Condor to recognize a node has failed
and drop it from the system?

There's work going on to reverse the keepalive message direction:


You can experiment with that on your own by setting the following on both your startds and schedds:


-- Lans Carstensen