
[Condor-users] Problem with condor_off using -name vs -addr - different behaviors



Hello,

I'm trying to use condor_off -peaceful on execution nodes to let their current jobs finish before shutting down. However, I'm running into some unexpected behavior.

$ /usr/sbin/condor_off -v
$CondorVersion: 7.5.5 Jan 26 2011 BuildID: 308936 $
$CondorPlatform: I686-LINUX_x86_rhap_5 $

Trying to shut down the master using -addr:
$ /usr/sbin/condor_off -peaceful -addr <x.x.x.41:40926> -subsystem master
Sent "Kill-Daemon-Peacefully" command to master at <x.x.x.41:40926>

1/28 10:58:53 DaemonCore: Command Socket at <x.x.x.41:40926>
1/28 10:58:53 Started DaemonCore process "/opt/condor-7.2.4/sbin/condor_startd", pid and pgroup = 1589
1/28 11:01:58 The STARTD (pid 1589) exited with status 0
1/28 11:01:58 restarting /opt/condor-7.2.4/sbin/condor_startd in 10 seconds
1/28 11:02:09 Started DaemonCore process "/opt/condor-7.2.4/sbin/condor_startd", pid and pgroup = 1703

The master does not shut down. The startd exits, but the master restarts it.

Trying to shut down just the startd using -addr:
$ /usr/sbin/condor_off -peaceful -addr <x.x.x.41:40926> -subsystem startd
Sent "Set-Peaceful-Shutdown" command to startd at <x.x.x.41:40926>

Nothing new appears in MasterLog.

However, if I use -name instead of -addr:
$ /usr/sbin/condor_off -peaceful -name "vm41" -subsystem master
Sent "Kill-Daemon-Peacefully" command to master vm41

1/28 12:56:38 Got SIGTERM. Performing graceful shutdown.
1/28 12:56:38 Sent SIGTERM to STARTD (pid 7351)
1/28 12:56:38 The STARTD (pid 7351) exited with status 0
1/28 12:56:38 All daemons are gone.  Exiting.
1/28 12:56:38 **** condor_master (condor_MASTER) pid 1576 EXITING WITH STATUS 0

Shutting down just the startd using -name:
$ /usr/sbin/condor_off -peaceful -name "vm41" -subsystem startd
Sent "Set-Peaceful-Shutdown" command to startd vm41
Sent "Kill-Daemon-Peacefully" command to master vm41

1/28 13:02:45 Handling daemon-specific command for "STARTD"
1/28 13:02:45 Sent SIGTERM to STARTD (pid 7605)
1/28 13:02:45 The STARTD (pid 7605) exited with status 0

Not all execution nodes have a valid hostname, so I need to use their addresses, but -addr gives different results from -name. Is this undocumented behavior or a bug in condor_off?
Is there a known workaround to get the same behavior with -addr as with -name?
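
In case it helps anyone reproduce this, the invocations above can be generated with a small shell sketch like the following. It only prints the command lines and does not run them. build_off_cmd is a made-up helper name, the address and name are the same placeholders used above, and only flags that already appear in this message are used.

```shell
#!/bin/sh
# Sketch: print the condor_off command line for one target so the
# -addr and -name variants can be compared side by side.
# build_off_cmd is a hypothetical helper, not part of Condor.

build_off_cmd() {
    # $1 = "addr" or "name", $2 = target value, $3 = subsystem (master or startd)
    mode=$1
    target=$2
    subsys=$3
    case "$mode" in
        addr|name) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # The target is wrapped in single quotes so that the < and > in a
    # sinful-string address are not treated as shell redirections.
    printf '/usr/sbin/condor_off -peaceful -%s %s -subsystem %s\n' \
        "$mode" "'$target'" "$subsys"
}

# Print the master and startd variants for both addressing modes.
for subsys in master startd; do
    build_off_cmd addr '<x.x.x.41:40926>' "$subsys"
    build_off_cmd name 'vm41' "$subsys"
done
```

Each printed line can then be pasted into a shell on the submit host while watching MasterLog on the execution node.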

Thanks!