
Re: [Condor-users] Restart a Daemon That Exited



On Mon, Mar 8, 2010 at 11:59 AM, Adam Smola <adam.smola@xxxxxxxxx> wrote:
> Hello All,
>
> I have added several daemons to the list of daemons managed by
> condor_master by adding them to the condor_config file. Typically when
> these processes crash, condor restarts them within 10-30 seconds;
> however, on occasion (I think after they have repeatedly failed after
> starting) it takes substantially longer for condor to start them
> again. Is there any way to use a command-line operation to force
> condor_master to start the process before the timer has expired?

IIRC Condor uses an exponential backoff for daemon restarts, so a daemon that keeps failing waits longer and longer before each restart. The idea is to avoid a situation where an overloaded system just keeps killing the daemon as soon as it comes back up.
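To make the backoff concrete, here is a rough sketch of how the delay grows. It assumes the formula described in the HTCondor manual, delay = c + k^n capped at a ceiling, where c, k and the ceiling correspond to MASTER_BACKOFF_CONSTANT, MASTER_BACKOFF_FACTOR and MASTER_BACKOFF_CEILING (defaults of 9 s, 2.0 and 3600 s), and n counts restarts already attempted, starting at 0. Treat the names and defaults as assumptions and check the manual for your version. The function names are hypothetical, and the sketch leaves out the master resetting n once a daemon has stayed up long enough (MASTER_RECOVER_FACTOR).

```python
"""Sketch of condor_master-style exponential restart backoff.

Assumption: delay = constant + factor ** n, capped at ceiling, mirroring
MASTER_BACKOFF_CONSTANT / MASTER_BACKOFF_FACTOR / MASTER_BACKOFF_CEILING.
The function names here are hypothetical, for illustration only.
"""


def restart_delay(n: int, constant: float = 9.0, factor: float = 2.0,
                  ceiling: float = 3600.0) -> float:
    """Seconds to wait before restart attempt n (0 for the first restart)."""
    try:
        growth = factor ** n
    except OverflowError:
        # factor ** n no longer fits in a float; the cap applies anyway.
        return ceiling
    return min(constant + growth, ceiling)


def delay_schedule(failures: int, **params: float) -> list[float]:
    """Delays seen by a daemon that crashes `failures` times in a row."""
    return [restart_delay(n, **params) for n in range(failures)]


if __name__ == "__main__":
    for n, delay in enumerate(delay_schedule(14)):
        print(f"restart {n}: wait {delay:.0f} s")
```

With those defaults the first few waits come out to 10, 11, 13, 17 and 25 seconds, which would match the 10-30 seconds you usually see. After a dozen or so consecutive failures the wait reaches the one-hour ceiling.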

You could try telling the master to turn on all sub-daemons on the machine with:

condor_on -name <hostname>

That might work to override the backoff timer.

- Ian