
[HTCondor-users] ShadowLog: Failed to send EOM to the startd ??



Hi,

For the past couple of days I have been getting these lines in the ShadowLog:


03/28/14 14:25:18 ******************************************************
03/28/14 14:25:18 ** condor_shadow (CONDOR_SHADOW) STARTING UP
03/28/14 14:25:18 ** /usr/sbin/condor_shadow
03/28/14 14:25:18 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
03/28/14 14:25:18 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
03/28/14 14:25:18 ** $CondorVersion: 8.1.1 Oct 25 2013 BuildID: RH-8.1.1-0.3.fc20 $
03/28/14 14:25:18 ** $CondorPlatform: I686-Fedora_20 $
03/28/14 14:25:18 ** PID = 24932
03/28/14 14:25:18 ** Log last touched 3/28 14:25:17
03/28/14 14:25:18 ******************************************************
03/28/14 14:25:18 Using config source: /etc/condor/condor_config
03/28/14 14:25:18 Using local config sources: 
03/28/14 14:25:18    /etc/condor/config.d/00personal_condor.config
03/28/14 14:25:18    /etc/condor/config.d/90skku_condor.config
03/28/14 14:25:18 CLASSAD_CACHING is OFF
03/28/14 14:25:18 DaemonCore: command socket at <xxx.xxx.140.72:43578?noUDP>
03/28/14 14:25:18 DaemonCore: private command socket at <xxx.xxx.140.72:43578>
03/28/14 14:25:18 Initializing a VANILLA shadow for job 25.0
03/28/14 14:26:18 (25.0) (24932): condor_write(): Socket closed when trying to write 3264 bytes to startd slot1@comnet-PC086, fd is 5
03/28/14 14:26:18 (25.0) (24932): Buf::write(): condor_write() failed
03/28/14 14:26:18 (25.0) (24932): slot1@comnet-PC086: DCStartd::activateClaim: Failed to send EOM to the startd
03/28/14 14:26:18 (25.0) (24932): Job 25.0 is being evicted from slot1@comnet-PC086
03/28/14 14:26:18 (25.0) (24932): logEvictEvent with unknown reason (108), aborting
03/28/14 14:26:18 (25.0) (24932): **** condor_shadow (condor_SHADOW) pid 24932 EXITING WITH STATUS 108


None of the jobs make any progress.
In the 'condor_q' output, their status keeps flipping between R (running) and I (idle), ad infinitum.
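
To check whether every start attempt fails the same way, the ShadowLog can be tallied with a short script. This is only a rough sketch. It assumes the line format shown in the excerpt above, and the helper name summarize_shadow_failures is hypothetical, not part of HTCondor. It counts "Failed to send EOM" errors per slot and records the exit status of each shadow, keyed by job ID.

```python
import re
from collections import Counter

# Assumed format (taken from the excerpt above):
# 03/28/14 14:26:18 (25.0) (24932): slot1@comnet-PC086: DCStartd::activateClaim: Failed to send EOM to the startd
EOM_RE = re.compile(
    r"\((\d+\.\d+)\) \(\d+\): (\S+): DCStartd::activateClaim: Failed to send EOM"
)
# 03/28/14 14:26:18 (25.0) (24932): **** condor_shadow (condor_SHADOW) pid 24932 EXITING WITH STATUS 108
EXIT_RE = re.compile(
    r"\((\d+\.\d+)\) \(\d+\): \*+ condor_shadow .* EXITING WITH STATUS (\d+)"
)


def summarize_shadow_failures(lines):
    """Hypothetical helper: count EOM failures per slot and map job IDs to shadow exit codes."""
    eom_by_slot = Counter()
    exit_codes = {}
    for line in lines:
        m = EOM_RE.search(line)
        if m:
            eom_by_slot[m.group(2)] += 1  # group 2 is the slot name, e.g. slot1@host
            continue
        m = EXIT_RE.search(line)
        if m:
            exit_codes[m.group(1)] = int(m.group(2))  # last exit status seen for this job
    return eom_by_slot, exit_codes
```

To use it, open the ShadowLog (its location is whatever `condor_config_val SHADOW_LOG` reports) and pass the file object to summarize_shadow_failures. If the same slot shows a steadily growing count and every exit status is 108, each activation attempt is dying at the same point on the execute node, not at random.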

What could be the issue here?

Thank you.
Rob.