
Re: [HTCondor-users] Fast shutdown happens frequently on one node



Hi, Jaime:

    I believe something odd is happening at the network or hardware level, which triggers the default setting of this condor parameter.

   Cheers,Gang

On 26/08/2015 09:20, Gang Qin wrote:
Hi, Jaime:

  Neither DAEMON_SHUTDOWN nor DAEMON_SHUTDOWN_FAST is defined:

node029:~# condor_config_val -master -verbose DAEMON_SHUTDOWN_FAST
Not defined: DAEMON_SHUTDOWN_FAST
node029:~# condor_config_val -master -verbose DAEMON_SHUTDOWN
Not defined: DAEMON_SHUTDOWN


node029:~# condor_config_val -master -config
Configuration source:
    /etc/condor/condor_config
Local configuration sources:
    /etc/condor/config.d/security.config
    /etc/condor/config.d/wn-slots.config
    /etc/condor/config.d/wn-wn.config
    /etc/condor/condor_config.local

 I also checked all files under /etc/condor and found nothing related to those two parameters.

  Cheers,Gang


From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Jaime Frey [jfrey@xxxxxxxxxxx]
Sent: Tuesday, August 25, 2015 8:02 PM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Fast shutdown happens frequently on one node

On Aug 25, 2015, at 11:01 AM, Gang Qin <Gang.Qin@xxxxxxxxxxxxx> wrote:

Dear condor Experts:

  From time to time the condor service on one machine turns off automatically, and in the MasterLog I can see: 

08/25/15 15:39:54 The DaemonShutdownFast expression "1000000" evaluated to TRUE: starting fast shutdown
08/25/15 15:39:54 Got SIGQUIT.  Performing fast shutdown.
08/25/15 15:39:54 Sent SIGQUIT to STARTD (pid 3333)
08/25/15 15:40:00 AllReaper unexpectedly called on pid 3333, status 0.
08/25/15 15:40:00 The STARTD (pid 3333) exited with status 0
08/25/15 15:40:00 All daemons are gone.  Exiting.
08/25/15 15:40:00 **** condor_master (condor_MASTER) pid 3306 EXITING WITH STATUS 99
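
To see how often this happens, the trigger line can be pulled out of the log. The following is only a sketch: the function name list_fast_shutdowns is made up for illustration, and the MasterLog path is passed in as an argument because its location depends on the installation.

```shell
# List the date and time of every fast shutdown that was triggered by the
# DaemonShutdownFast expression, as recorded in a MasterLog file.
# $1: path to the MasterLog (installation-specific, so not hard-coded here).
list_fast_shutdowns() {
    # Match only the trigger line, not the SIGQUIT lines that follow it,
    # then keep the first two fields (date and time).
    grep 'DaemonShutdownFast .* evaluated to TRUE' "$1" | awk '{print $1, $2}'
}
```

Calling list_fast_shutdowns with the path to the MasterLog prints one timestamp per triggered shutdown, which can then be compared with network or hardware events on the node.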


However, I don't see DAEMON_SHUTDOWN or DAEMON_SHUTDOWN_FAST set anywhere in the configuration files.


node029:~# condor_config_val -dump | grep SHUTDOWN
EVENTD_SHUTDOWN_CLEANUP_INTERVAL = 3600
EVENTD_SHUTDOWN_CONSTRAINT = 
EVENTD_SHUTDOWN_SLOW_START_INTERVAL = 0
EVENTD_SHUTDOWN_TIME = 
EVENTD_SIMULATE_SHUTDOWNS = 
NEGOTIATOR_TRIM_SHUTDOWN_THRESHOLD = 0
SHUTDOWN_FAST_TIMEOUT = 300
SHUTDOWN_GRACEFUL_TIMEOUT = 
STARTD_FACTORY_SCRIPT_SHUTDOWN_PARTITION = 
STARTD_NOCLAIM_SHUTDOWN = 0

  Any idea what led to this fast shutdown?

Try running ‘condor_config_val -master -verbose DAEMON_SHUTDOWN_FAST’. That will query the master daemon for the value of that parameter and where the value was set, rather than reading the config files directly.
Your daemons may be using a different configuration file than the command line tools.
You can also run ‘condor_config_val -master -config’ to see which configuration files the master read.
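
A rough sketch of that comparison, assuming condor_config_val is on the PATH and the master is running on the local machine. The function name compare_param is hypothetical; the only HTCondor invocations used are condor_config_val with and without -master.

```shell
# Compare the value the command-line tool reads from the config files with
# the value the running condor_master reports for the same parameter.
# $1: parameter name, e.g. DAEMON_SHUTDOWN_FAST.
compare_param() {
    param="$1"
    # Value as the tool itself reads it from the configuration files.
    tool_view=$(condor_config_val "$param" 2>&1)
    # Value as reported by the running master daemon.
    master_view=$(condor_config_val -master "$param" 2>&1)
    if [ "$tool_view" = "$master_view" ]; then
        echo "$param: tool and master agree ($tool_view)"
    else
        echo "$param: tool and master differ"
        echo "  tool:   $tool_view"
        echo "  master: $master_view"
    fi
}
```

Running compare_param for both DAEMON_SHUTDOWN and DAEMON_SHUTDOWN_FAST would show whether the daemons picked up a setting that the config files on disk do not contain.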

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/