[Condor-users] STARTD_CRON stops running
- Date: Mon, 07 Nov 2011 11:29:34 -0500
- From: Sarah Williams <saewill@xxxxxxxxx>
- Subject: [Condor-users] STARTD_CRON stops running
Hello Condor users & experts,
I am using STARTD_CRON to do periodic health checks on worker nodes.
However, on some of the nodes I see that the script is no longer logging
any output, and the nodes do not detect unhealthy states. Using
condor_config_val -dump, I see the CRON settings are in place:
CRON_JOBLIST = nodecheck
CRON_NODECHECK_EXECUTABLE = /usr/local/sbin/condor_node_check.sh
CRON_NODECHECK_KILL = true
CRON_NODECHECK_MODE = periodic
CRON_NODECHECK_PERIOD = 15m
CRON_NODECHECK_RECONFIG = false
STARTD_CRON_NAME = CRON
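When comparing settings across many worker nodes, it can help to pull one job's settings out of the dump mechanically. The following is a minimal sketch that assumes the `NAME = value` line format shown above. The helper names `cron_job_settings` and `period_seconds` are hypothetical, and so is the assumption that a period is an integer with an optional `s`/`m`/`h` suffix:

```python
import re


def cron_job_settings(dump_lines, prefix, job):
    """Hypothetical helper: collect <PREFIX>_<JOB>_* settings for one cron job
    from `condor_config_val -dump`-style "NAME = value" lines."""
    key_prefix = f"{prefix}_{job}_".upper()
    settings = {}
    for line in dump_lines:
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        name = name.strip().upper()
        if name.startswith(key_prefix):
            # Key by the part after the prefix, e.g. "PERIOD", "MODE".
            settings[name[len(key_prefix):]] = value.strip()
    return settings


# Assumption: a period is an integer with an optional s/m/h suffix (default seconds).
_UNITS = {"": 1, "s": 1, "m": 60, "h": 3600}


def period_seconds(period):
    match = re.fullmatch(r"\s*(\d+)\s*([smh]?)\s*", period.lower())
    if match is None:
        raise ValueError(f"unrecognised period: {period!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


dump = """CRON_JOBLIST = nodecheck
CRON_NODECHECK_EXECUTABLE = /usr/local/sbin/condor_node_check.sh
CRON_NODECHECK_KILL = true
CRON_NODECHECK_MODE = periodic
CRON_NODECHECK_PERIOD = 15m
CRON_NODECHECK_RECONFIG = false
STARTD_CRON_NAME = CRON"""

settings = cron_job_settings(dump.splitlines(), "CRON", "nodecheck")
print(settings["MODE"], period_seconds(settings["PERIOD"]))  # periodic 900
```

Running the same extraction on a healthy node and a stuck node makes any configuration difference obvious. If the settings are identical, the problem is more likely in the startd's runtime state.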
The script is world-executable, and the log file is world-writable. I am
running Condor version 7.6.0-1.
I wonder if I am being affected by the following bug.
Is there any way to expose the current value of CRON_*_SENT? I see many
instances of this message in StartLog:
StartLog.old:11/01/11 05:08:27 CronJob: Job 'nodecheck' not idle!
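To see how often this message recurs and when it last appeared on each node, the log lines can be tallied per job. The following is a minimal sketch. It assumes the message format is exactly as quoted above, and it tolerates a leading `grep` filename prefix such as `StartLog.old:`. The function name `not_idle_report` is hypothetical. The idea that repeated messages mean the startd still considers an earlier run of the job active is an interpretation, not something the log confirms:

```python
import re
from collections import Counter

# Assumed format, e.g.:
#   11/01/11 05:08:27 CronJob: Job 'nodecheck' not idle!
NOT_IDLE = re.compile(
    r"(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) CronJob: Job '([^']+)' not idle!"
)


def not_idle_report(log_lines):
    """Hypothetical helper: map job name -> (count, timestamp of last message)."""
    counts = Counter()
    last_seen = {}
    for line in log_lines:
        match = NOT_IDLE.search(line)  # search() skips any "StartLog.old:" prefix
        if match:
            stamp, job = match.groups()
            counts[job] += 1
            last_seen[job] = stamp  # log lines are chronological, so keep the latest
    return {job: (n, last_seen[job]) for job, n in counts.items()}


sample = [
    "StartLog.old:11/01/11 05:08:27 CronJob: Job 'nodecheck' not idle!",
    "StartLog.old:11/01/11 05:23:27 CronJob: Job 'nodecheck' not idle!",
    "StartLog.old:11/01/11 05:23:28 some unrelated startd message",
]
print(not_idle_report(sample))  # {'nodecheck': (2, '11/01/11 05:23:27')}
```

A node that reports a steadily growing count with no successful runs in between is a candidate for the stuck state described here.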
Is there a way to reset CRON_*_SENT without killing running jobs?