
[HTCondor-users] Job disconnected



Dear all,

My job's log showed:
22 (044.000.000) 08/21 12:59:51 Job disconnected, attempting to reconnect
    Local schedd and job shadow died, schedd now running again
    Trying to reconnect to slot1_1@xxxxxxxxxxxxxxxxxxxxxx <10.42.0.25:9618?addrs=10.42.0.25-9618+[--1]-9618&noUDP&sock=2544_8d06_4>

When I looked in the log directory, I found core files, and the MasterLog shows:

08/21/17 12:38:48 Can't open directory "/condor/local/paraty/config" as PRIV_UNKNOWN, errno: 2 (No such file or directory)
08/21/17 12:38:48 Cannot open /condor/local/paraty/config: No such file or directory
08/21/17 12:38:48 ******************************************************
08/21/17 12:38:48 ** condor_master (CONDOR_MASTER) STARTING UP
08/21/17 12:38:48 ** /condor/install_ubuntu/sbin/condor_master
08/21/17 12:38:48 ** SubsystemInfo: name=MASTER type=MASTER(2) class="DAEMON"(1)
08/21/17 12:38:48 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
08/21/17 12:38:48 ** $CondorVersion: 8.6.3 May 08 2017 BuildID: 404928 $
08/21/17 12:38:48 ** $CondorPlatform: x86_64_Ubuntu14 $
08/21/17 12:38:48 ** PID = 4380
08/21/17 12:38:48 ** Log last touched 8/21 12:37:46
08/21/17 12:38:48 ******************************************************
08/21/17 12:38:48 Using config source: /condor/install_centos/etc/condor_config
08/21/17 12:38:48 Using local config sources:
08/21/17 12:38:48    /condor/local/paraty/condor_config.local
08/21/17 12:38:48 config Macros = 72, Sorted = 72, StringBytes = 2083, TablesBytes = 2640
08/21/17 12:38:48 CLASSAD_CACHING is OFF
08/21/17 12:38:48 Daemon Log is logging: D_ALWAYS D_ERROR
08/21/17 12:38:49 Removed /tmp/condor-lock.0.533212494411604/shared_port_ad (assuming it is left over from previous run)
08/21/17 12:38:49 SharedPortEndpoint: waiting for connections to named socket 4380_e4e7
08/21/17 12:38:49 SharedPortEndpoint: failed to open /tmp/condor-lock.0.533212494411604/shared_port_ad: No such file or directory
08/21/17 12:38:49 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
08/21/17 12:38:49 DaemonCore: private command socket at <10.42.0.25:0?sock=4380_e4e7>
08/21/17 12:38:49 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
08/21/17 12:38:49 Master restart (GRACEFUL) is watching /condor/install_ubuntu/sbin/condor_master (mtime:1502961752)
08/21/17 12:38:49 Collector port not defined, will use default: 9618
08/21/17 12:38:49 Started DaemonCore process "/condor/install_ubuntu/libexec/condor_shared_port", pid and pgroup = 4403
08/21/17 12:38:49 Waiting for /tmp/condor-lock.0.533212494411604/shared_port_ad to appear.
08/21/17 12:38:49 systemd watchdog notification support not available.
08/21/17 12:38:50 Found /tmp/condor-lock.0.533212494411604/shared_port_ad.
08/21/17 12:38:50 Started DaemonCore process "/condor/install_ubuntu/sbin/condor_schedd", pid and pgroup = 4404
08/21/17 12:38:50 Started DaemonCore process "/condor/install_ubuntu/sbin/condor_startd", pid and pgroup = 4405
08/21/17 12:38:54 Setting ready state 'Ready' for STARTD
Stack dump for process 4380 at timestamp 1503313127 (9 frames)
/condor/install_ubuntu/sbin/../lib/libcondor_utils_8_6_3.so(dprintf_dump_stack+0x72)[0x7f8af261fd32]
/condor/install_ubuntu/sbin/../lib/libcondor_utils_8_6_3.so(_Z18linux_sig_coredumpi+0x24)[0x7f8af27f5ca4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f8af0d10390]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x13)[0x7f8af0a32573]
/condor/install_ubuntu/sbin/../lib/libcondor_utils_8_6_3.so(_ZN8Selector7executeEv+0xa6)[0x7f8af261ca16]
/condor/install_ubuntu/sbin/../lib/libcondor_utils_8_6_3.so(_ZN10DaemonCore6DriverEv+0x1052)[0x7f8af27e8fb2]
/condor/install_ubuntu/sbin/../lib/libcondor_utils_8_6_3.so(_Z7dc_mainiPPc+0x13a4)[0x7f8af27f9314]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f8af0955830]
/condor/install_ubuntu/sbin/condor_master[0x40a70f]
08/21/17 12:59:48 Can't open directory "/condor/local/paraty/config" as PRIV_UNKNOWN, errno: 2 (No such file or directory)
08/21/17 12:59:48 Cannot open /condor/local/paraty/config: No such file or directory
08/21/17 12:59:48 ******************************************************
08/21/17 12:59:48 ** condor_master (CONDOR_MASTER) STARTING UP
08/21/17 12:59:48 ** /condor/install_ubuntu/sbin/condor_master
08/21/17 12:59:48 ** SubsystemInfo: name=MASTER type=MASTER(2) class="DAEMON"(1)
08/21/17 12:59:48 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
08/21/17 12:59:48 ** $CondorVersion: 8.6.3 May 08 2017 BuildID: 404928 $
08/21/17 12:59:48 ** $CondorPlatform: x86_64_Ubuntu14 $
08/21/17 12:59:48 ** PID = 4718
08/21/17 12:59:48 ** Log last touched 8/21 12:58:47
08/21/17 12:59:48 ******************************************************
08/21/17 12:59:48 Using config source: /condor/install_centos/etc/condor_config
08/21/17 12:59:48 Using local config sources:
08/21/17 12:59:48    /condor/local/paraty/condor_config.local
08/21/17 12:59:48 config Macros = 72, Sorted = 72, StringBytes = 2083, TablesBytes = 2640
08/21/17 12:59:48 CLASSAD_CACHING is OFF
08/21/17 12:59:48 Daemon Log is logging: D_ALWAYS D_ERROR
08/21/17 12:59:49 Removed /tmp/condor-lock.0.533212494411604/shared_port_ad (assuming it is left over from previous run)
08/21/17 12:59:49 SharedPortEndpoint: waiting for connections to named socket 4718_bbea
08/21/17 12:59:49 SharedPortEndpoint: failed to open /tmp/condor-lock.0.533212494411604/shared_port_ad: No such file or directory
08/21/17 12:59:49 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
08/21/17 12:59:49 DaemonCore: private command socket at <10.42.0.25:0?sock=4718_bbea>
08/21/17 12:59:49 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
08/21/17 12:59:49 Master restart (GRACEFUL) is watching /condor/install_ubuntu/sbin/condor_master (mtime:1502961752)
08/21/17 12:59:49 Collector port not defined, will use default: 9618
08/21/17 12:59:49 Started DaemonCore process "/condor/install_ubuntu/libexec/condor_shared_port", pid and pgroup = 4740
08/21/17 12:59:49 Waiting for /tmp/condor-lock.0.533212494411604/shared_port_ad to appear.
08/21/17 12:59:49 systemd watchdog notification support not available.
08/21/17 12:59:50 Found /tmp/condor-lock.0.533212494411604/shared_port_ad.
08/21/17 12:59:50 Started DaemonCore process "/condor/install_ubuntu/sbin/condor_schedd", pid and pgroup = 4743
08/21/17 12:59:50 Started DaemonCore process "/condor/install_ubuntu/sbin/condor_startd", pid and pgroup = 4744
08/21/17 12:59:54 Setting ready state 'Ready' for STARTD

It seems that my daemons are restarting about every 15 minutes (the two start-ups shown above are 21 minutes apart), and I do not know why.
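
To pin down the restart interval, one option is to pull the start-up banners and stack dump headers out of the MasterLog. The following is only a minimal sketch: the function names are hypothetical, it is not an HTCondor tool, and it assumes the log lines look exactly like the excerpt above (a "MM/DD/YY HH:MM:SS" prefix on the banner, and an epoch value in seconds after "at timestamp").

```python
import re
from datetime import datetime, timezone

# Patterns are assumptions based on the MasterLog excerpt in this message.
STARTUP = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \*\* condor_master \(CONDOR_MASTER\) STARTING UP"
)
DUMP = re.compile(r"^Stack dump for process (\d+) at timestamp (\d+)")


def master_startups(lines):
    """Return the (local, naive) datetime of every condor_master start-up banner."""
    times = []
    for line in lines:
        m = STARTUP.match(line)
        if m:
            times.append(datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S"))
    return times


def restart_gaps(times):
    """Return the time differences between consecutive start-ups."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def stack_dumps(lines):
    """Return (pid, UTC datetime) for every stack dump header."""
    dumps = []
    for line in lines:
        m = DUMP.match(line)
        if m:
            when = datetime.fromtimestamp(int(m.group(2)), tz=timezone.utc)
            dumps.append((int(m.group(1)), when))
    return dumps


def report(lines):
    """Build a short human-readable summary of restarts and stack dumps."""
    lines = list(lines)
    out = [f"start-up gap: {gap}" for gap in restart_gaps(master_startups(lines))]
    out += [f"stack dump: pid {pid} at {when.isoformat()}" for pid, when in stack_dumps(lines)]
    return out
```

It could be used as `print("\n".join(report(open("MasterLog"))))`, where the file path is whatever your LOG directory holds. For the excerpt above it yields a single gap of 0:21:00, and the stack dump timestamp 1503313127 decodes to 2017-08-21 10:58:47 UTC, which would line up with the "Log last touched 8/21 12:58:47" line if the host clock runs at UTC+2 (an assumption).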


Université Paris-Sud
Hervé LEMAITRE

U1000 "Neuroimagerie en Psychiatrie"
Service hospitalier Frédéric Joliot - 4, Place du Général Leclerc
91401 Orsay