
[HTCondor-users] Collector not responding using shared port daemon



I'm trying to set up a cluster to use the shared port daemon, with the Collector also going through the shared port daemon. The collector is not responding to any requests.

I get messages like this in the shared port log:
06/24/15 17:09:19 SharedPortClient - server response deadline has passed for collector as requested by SCHEDD <X.X.X.X:9619?noUDP&sock=57033_c968_3> on <X.X.X.X:56458>

Here are the relevant configuration options:
AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST = true
COLLECTOR_USES_SHARED_PORT = true
MAX_SHARED_PORT_LOG = $(MAX_DEFAULT_LOG)
SHARED_PORT = $(LIBEXEC)/condor_shared_port
SHARED_PORT_ADDRESS_REWRITING = false
SHARED_PORT_ARGS = -p 9619
SHARED_PORT_DAEMON_AD_FILE = $(LOCK)/shared_port_ad
SHARED_PORT_DEBUG = D_FULLDEBUG
SHARED_PORT_DEFAULT_ID = 
SHARED_PORT_LOG = $(LOG)/SharedPortLog
SHARED_PORT_MAX_FILE_DESCRIPTORS = 4096
SHARED_PORT_PORT = 9619
USE_SHARED_PORT = True
COLLECTOR_HOST = X.X.X.X:9619?sock=collector
COLLECTOR_ARGS = -sock collector
COLLECTOR_PORT = 9618

And the version information:
06/24/15 16:58:49 ** $CondorVersion: 8.3.5 Apr 28 2015 $
06/24/15 16:58:49 ** $CondorPlatform: X86_64-CentOS_6.6 $

From the top of the CollectorLog:
06/24/15 17:15:56 DaemonCore: non-shared command socket at <X.X.X.X:33947>
06/24/15 17:15:56 Daemoncore: Listening at <0.0.0.0:33947> on TCP (ReliSock) and UDP (SafeSock).
06/24/15 17:15:56 DaemonCore: command socket at <X.X.X.X:9619?noUDP&sock=collector>
06/24/15 17:15:56 DaemonCore: private command socket at <X.X.X.X:9619?noUDP&sock=collector>
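
To compare the addresses in these log lines with the configured COLLECTOR_HOST, a small script can split each one into host, port, and parameters. This is only a sketch: parse_sinful is a hypothetical helper, not part of HTCondor, and it assumes IPv4-style <host:port?params> addresses as they appear in the logs above.

```python
import re
from urllib.parse import parse_qs

# Matches the first <host:port> or <host:port?params> address in a line.
# Assumption: IPv4-style hosts only; bracketed IPv6 addresses are not handled.
SINFUL_RE = re.compile(r"<([^:<>]+):(\d+)(?:\?([^<>]*))?>")


def parse_sinful(text):
    """Return (host, port, params) for the first address in text, or None."""
    match = SINFUL_RE.search(text)
    if match is None:
        return None
    host = match.group(1)
    port = int(match.group(2))
    query = match.group(3) or ""
    # keep_blank_values keeps flag-style parameters such as "noUDP".
    params = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
    return host, port, params


if __name__ == "__main__":
    line = "DaemonCore: command socket at <X.X.X.X:9619?noUDP&sock=collector>"
    print(parse_sinful(line))
```

Running it over both the CollectorLog and the shared port log makes it easier to see which port and sock name each daemon is advertising and requesting.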

Let me know if any more information is required to help debug.

-Derek



