
Re: [HTCondor-users] SharedPortEndpoint error in dag.dagman.out



Duncan,


Are the DAGs completing correctly in spite of the error messages?


DAGMan is built on DaemonCore, so it may be trying to locate the shared port daemon even though it doesn't actually need to (DAGMan doesn't open a command port).


I'll see if I can reproduce this...


Kent

--

R. Kent Wenger (wenger@xxxxxxxxxxx, 608-262-6627,

http://www.cs.wisc.edu/~wenger/)

Computer Sciences Department

University of Wisconsin-Madison




From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Duncan Brown <dabrown@xxxxxxx>
Sent: Friday, February 10, 2017 12:39 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] SharedPortEndpoint error in dag.dagman.out
 
Hi all,

Since upgrading to 8.6, users are reporting the following error in their dag.dagman.out files:

02/10/17 13:20:41 SharedPortEndpoint: failed to open ./shared_port_ad: No such file or directory
02/10/17 13:20:41 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
02/10/17 13:21:41 SharedPortEndpoint: failed to open ./shared_port_ad: No such file or directory
02/10/17 13:21:41 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.

I see some discussion about this in the archives for the regular daemons, but not for dagman.

The first occurrence is at the top of the log, then it repeats:

02/08/17 11:34:32 ******************************************************
02/08/17 11:34:32 ** condor_scheduniv_exec.5452054.0 (CONDOR_DAGMAN) STARTING UP
02/08/17 11:34:32 ** /usr/bin/condor_dagman
02/08/17 11:34:32 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1)
02/08/17 11:34:32 ** Configuration: subsystem:DAGMAN local:<NONE> class:DAEMON
02/08/17 11:34:32 ** $CondorVersion: 8.6.0 Jan 26 2017 BuildID: 395190 $
02/08/17 11:34:32 ** $CondorPlatform: x86_64_RedHat7 $
02/08/17 11:34:32 ** PID = 1344275
02/08/17 11:34:32 ** Log last touched 2/8 11:03:46
02/08/17 11:34:32 ******************************************************
02/08/17 11:34:32 Using config source: /etc/condor/condor_config
02/08/17 11:34:32 Using local config sources:
02/08/17 11:34:32    /etc/condor/config.d/00_gwms_general.config
02/08/17 11:34:32    /etc/condor/config.d/02_gwms_schedds.config
02/08/17 11:34:32    /etc/condor/config.d/03_gwms_local.config
02/08/17 11:34:32    /etc/condor/config.d/90_gwms_dns.config
02/08/17 11:34:32    /etc/condor/config.d/92_flocking_osg_ligo.config
02/08/17 11:34:32    /etc/condor/config.d/99_gratia-gwms.conf
02/08/17 11:34:32    /etc/condor/config.d/99_gratia.conf
02/08/17 11:34:32    /etc/condor/condor_config.local
02/08/17 11:34:32 config Macros = 170, Sorted = 170, StringBytes = 8181, TablesBytes = 6224
02/08/17 11:34:32 CLASSAD_CACHING is ENABLED
02/08/17 11:34:32 Daemon Log is logging: D_ALWAYS D_ERROR
02/08/17 11:34:32 DaemonCore: No command port requested.
02/08/17 11:34:32 SharedPortEndpoint: waiting for connections to named socket 1344275_18a2
02/08/17 11:34:32 SharedPortEndpoint: failed to open ./shared_port_ad: No such file or directory
02/08/17 11:34:32 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
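To gauge how often these messages appear in a given dag.dagman.out, something like the following could be used. This is only a rough sketch: it assumes every line of interest has the same "MM/DD/YY HH:MM:SS SharedPortEndpoint: ..." shape as the excerpts above, and the function name and category labels are made up for illustration; they are not part of HTCondor.

```python
import re
import sys
from collections import Counter

# Assumed line shape, taken from the excerpts above:
#   "MM/DD/YY HH:MM:SS SharedPortEndpoint: <message>"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) SharedPortEndpoint: (.*)$"
)


def summarize_shared_port_messages(lines):
    """Count SharedPortEndpoint messages by kind (labels are hypothetical)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a SharedPortEndpoint line
        message = match.group(2)
        if message.startswith("failed to open"):
            counts["failed_to_open"] += 1
        elif message.startswith(
            "did not successfully find SharedPortServer address"
        ):
            counts["server_not_found"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_shared_port.py dag.dagman.out
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as log_file:
            for kind, count in sorted(
                summarize_shared_port_messages(log_file).items()
            ):
                print(kind, count)
```

This only tallies the messages; it says nothing about whether the DAG itself completed.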

Any ideas?

Cheers,
Duncan.

--

Duncan Brown                         http://dbrown10.expressions.syr.edu
Charles Brightman Professor of Physics     Room 263-1 Physics Department
Director of the Graduate Program      Syracuse University, NY 13244, USA
Phone: 315 443 5993                                    Fax: 315 443 9103
