
Re: [HTCondor-users] ShadowLog: Failed to send EOM to the startd ??



Hi Rob,

Typically, that indicates a security-related configuration issue in the startd or starter.

Can you find the corresponding startd/starter logs and check whether they show an authorization-denied error?
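If it helps, here is a rough sketch of one way to scan those logs for lines that look like authorization failures. The search phrases are assumptions on my part (illustrative guesses, not a complete list of what the daemons print), and the helper names find_denials and scan_log are made up for this example, so adjust them to whatever your logs actually contain.

```python
import re
import sys

# Assumed phrases that tend to accompany authorization failures.
# These are illustrative guesses, not an exhaustive list of HTCondor messages.
DENIAL_PATTERNS = re.compile(
    r"PERMISSION DENIED|not authorized|authorization.*(denied|failed)",
    re.IGNORECASE,
)


def find_denials(lines):
    """Return (line_number, text) pairs for lines matching DENIAL_PATTERNS."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if DENIAL_PATTERNS.search(line)
    ]


def scan_log(path):
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with open(path, errors="replace") as handle:
        return find_denials(handle)


if __name__ == "__main__":
    # Each command-line argument is treated as a path to a log file.
    for log_path in sys.argv[1:]:
        for number, text in scan_log(log_path):
            print(f"{log_path}:{number}: {text}")
```

You would run it on the execute machine, passing the log files as arguments, for example StartLog and StarterLog.slot1 in the directory reported by `condor_config_val LOG`. Those file names are the usual defaults and may differ on your installation. Matches around 14:26:18 would be the ones to look at first, since that is when the shadow reports the socket being closed.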

Brian

On Mar 28, 2014, at 12:37 AM, Stub <spamrefuse@xxxxxxxxx> wrote:

Hi,

For the past couple of days I have been getting these lines in the ShadowLog:


03/28/14 14:25:18 ******************************************************
03/28/14 14:25:18 ** condor_shadow (CONDOR_SHADOW) STARTING UP
03/28/14 14:25:18 ** /usr/sbin/condor_shadow
03/28/14 14:25:18 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
03/28/14 14:25:18 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
03/28/14 14:25:18 ** $CondorVersion: 8.1.1 Oct 25 2013 BuildID: RH-8.1.1-0.3.fc20 $
03/28/14 14:25:18 ** $CondorPlatform: I686-Fedora_20 $
03/28/14 14:25:18 ** PID = 24932
03/28/14 14:25:18 ** Log last touched 3/28 14:25:17
03/28/14 14:25:18 ******************************************************
03/28/14 14:25:18 Using config source: /etc/condor/condor_config
03/28/14 14:25:18 Using local config sources: 
03/28/14 14:25:18    /etc/condor/config.d/00personal_condor.config
03/28/14 14:25:18    /etc/condor/config.d/90skku_condor.config
03/28/14 14:25:18 CLASSAD_CACHING is OFF
03/28/14 14:25:18 DaemonCore: command socket at <xxx.xxx.140.72:43578?noUDP>
03/28/14 14:25:18 DaemonCore: private command socket at <xxx.xxx.140.72:43578>
03/28/14 14:25:18 Initializing a VANILLA shadow for job 25.0
03/28/14 14:26:18 (25.0) (24932): condor_write(): Socket closed when trying to write 3264 bytes to startd slot1@comnet-PC086, fd is 5
03/28/14 14:26:18 (25.0) (24932): Buf::write(): condor_write() failed
03/28/14 14:26:18 (25.0) (24932): slot1@comnet-PC086: DCStartd::activateClaim: Failed to send EOM to the startd
03/28/14 14:26:18 (25.0) (24932): Job 25.0 is being evicted from slot1@comnet-PC086
03/28/14 14:26:18 (25.0) (24932): logEvictEvent with unknown reason (108), aborting
03/28/14 14:26:18 (25.0) (24932): **** condor_shadow (condor_SHADOW) pid 24932 EXITING WITH STATUS 108


None of the jobs is getting anywhere.
The 'condor_q' output keeps flipping from status R to status I, ad infinitum.

What could be the issue here?

Thank you.
Rob.



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/