
Re: [HTCondor-users] Collector not responding using shared port daemon



On 6/24/2015 5:18 PM, Derek Weitzel wrote:
I’m trying to set up a cluster to use the shared port daemon, with the Collector also going through the shared port daemon.  The collector is not responding to any requests.


Is your goal to have everything listen on port 9618 (the default), or to have everything listen on port 9619? From your config knobs below it looks like you want everything on 9619, so I am not certain why COLLECTOR_PORT is set to 9618... i.e. at first blush your config below seems a bit conflicting.
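
One way to spot this kind of mismatch is to pull the port number out of each knob and compare them. Below is a minimal sketch in Python, assuming the knobs have already been collected into a plain dict of strings. The helper names (port_of, ports_in_config, has_conflict) are hypothetical and not part of HTCondor, and the parsing only handles the simple host:port?params form seen in this thread (no IPv6 addresses). It only reports that the knobs name different ports; it does not decide which port is the right one.

```python
import re

def port_of(value):
    """Return the port from a 'host:port' or 'host:port?params' string, or None."""
    match = re.search(r":(\d+)(?:\?|$)", value.strip())
    return int(match.group(1)) if match else None

def ports_in_config(config):
    """Map each port-bearing knob found in `config` to the port it names."""
    ports = {}
    # COLLECTOR_HOST carries the port after the host name or address.
    host_port = port_of(config.get("COLLECTOR_HOST", ""))
    if host_port is not None:
        ports["COLLECTOR_HOST"] = host_port
    # These knobs are plain port numbers.
    for knob in ("COLLECTOR_PORT", "SHARED_PORT_PORT"):
        value = config.get(knob, "").strip()
        if value.isdigit():
            ports[knob] = int(value)
    # SHARED_PORT_ARGS may pass the port as "-p <port>".
    arg_match = re.search(r"-p\s+(\d+)", config.get("SHARED_PORT_ARGS", ""))
    if arg_match:
        ports["SHARED_PORT_ARGS"] = int(arg_match.group(1))
    return ports

def has_conflict(config):
    """True if the knobs in `config` name more than one distinct port."""
    return len(set(ports_in_config(config).values())) > 1

# Knob values copied from the configuration quoted in this thread.
quoted = {
    "COLLECTOR_HOST": "X.X.X.X:9619?sock=collector",
    "COLLECTOR_PORT": "9618",
    "SHARED_PORT_PORT": "9619",
    "SHARED_PORT_ARGS": "-p 9619",
}
print(ports_in_config(quoted))
print(has_conflict(quoted))
```

With the values quoted in this thread, COLLECTOR_PORT is the only knob that names 9618, so the check flags it.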

With v8.3.5, if you want to use the shared port daemon on your central manager (including the collector), I think all you need to change from the default config is:
  USE_SHARED_PORT = True
And I think using a non-standard port (e.g. 9619) would just need the following changes from the default config:
  COLLECTOR_HOST = $(CONDOR_HOST):9619
  USE_SHARED_PORT = TRUE

By "default config", I mean the defaults built into the HTCondor v8.3.5 daemons, not what is sitting around in an old v8.2 condor_config file :). And yes, the settings to do this in v8.3.x are much simpler than (and unfortunately somewhat different from) those in v8.2.x. Be warned, I am not at a machine where I can test my bold claims above, but am working from memory of ticket https://goo.gl/9ReCch

Early in v8.5, the plan is for everything (including the collector) to be set up to use shared_port by default out of the box, so hopefully discussions like this will soon become extinct.

hope the above helps,
Todd


I get these messages in the shared port log:
06/24/15 17:09:19 SharedPortClient - server response deadline has passed for collector as requested by SCHEDD <X.X.X.X:9619?noUDP&sock=57033_c968_3> on <X.X.X.X:56458>

Here are the relevant configuration options:
AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST = true
COLLECTOR_USES_SHARED_PORT = true
MAX_SHARED_PORT_LOG = $(MAX_DEFAULT_LOG)
SHARED_PORT = $(LIBEXEC)/condor_shared_port
SHARED_PORT_ADDRESS_REWRITING = false
SHARED_PORT_ARGS = -p 9619
SHARED_PORT_DAEMON_AD_FILE = $(LOCK)/shared_port_ad
SHARED_PORT_DEBUG = D_FULLDEBUG
SHARED_PORT_DEFAULT_ID =
SHARED_PORT_LOG = $(LOG)/SharedPortLog
SHARED_PORT_MAX_FILE_DESCRIPTORS = 4096
SHARED_PORT_PORT = 9619
USE_SHARED_PORT = True
COLLECTOR_HOST = X.X.X.X:9619?sock=collector
COLLECTOR_ARGS = -sock collector
COLLECTOR_PORT = 9618

And the version information:
06/24/15 16:58:49 ** $CondorVersion: 8.3.5 Apr 28 2015 $
06/24/15 16:58:49 ** $CondorPlatform: X86_64-CentOS_6.6 $

From the top of the CollectorLog:
06/24/15 17:15:56 DaemonCore: non-shared command socket at <X.X.X.X:33947>
06/24/15 17:15:56 Daemoncore: Listening at <0.0.0.0:33947> on TCP (ReliSock) and UDP (SafeSock).
06/24/15 17:15:56 DaemonCore: command socket at <X.X.X.X:9619?noUDP&sock=collector>
06/24/15 17:15:56 DaemonCore: private command socket at <X.X.X.X:9619?noUDP&sock=collector>

Let me know if any more information is required to help debug.

-Derek









--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685