
Re: [Condor-users] Changed connection port......but now there are problems....




If you run your collector on a non-standard port, you must configure COLLECTOR_HOST with that port number on every host in your Condor pool. Verify that the following command shows the correct port number on each host:

condor_config_val COLLECTOR_HOST
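
To make that check less error-prone, here is a minimal Python sketch that pulls the port out of a COLLECTOR_HOST value. The helper names (collector_port, local_collector_host) are made up for illustration, and it assumes a single "host" or "host:port" value rather than a list of collectors. The fallback of 9618 is the default collector port mentioned in the config file comments.

```python
import subprocess

# Default collector port, per the comments in the stock condor_config.
DEFAULT_COLLECTOR_PORT = 9618


def collector_port(collector_host: str) -> int:
    """Return the port from a 'host' or 'host:port' string.

    Assumes one collector and no IPv6 literal; falls back to the
    default port when no numeric port suffix is present.
    """
    _host, sep, port = collector_host.strip().rpartition(":")
    if sep and port.isdigit():
        return int(port)
    return DEFAULT_COLLECTOR_PORT


def local_collector_host() -> str:
    """Ask the local Condor configuration for COLLECTOR_HOST."""
    result = subprocess.run(
        ["condor_config_val", "COLLECTOR_HOST"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Running collector_port(local_collector_host()) on each machine should give the same number everywhere (5009 in the configuration quoted below).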

When you see this in the logs:

04/21 17:23:36 Sock::bindWithin - failed to bind any port within (5900 ~ 5905)

it means your port range is not big enough: 5900 through 5905 provides only six ports.

According to the manual, a rough guideline is that a submit machine needs a port range that is 5 + 5*NumRunningJobs and an execute machine needs 5 + 5*NumberOfSlots.
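
As a rough illustration of that guideline, the arithmetic can be sketched in Python. The function names are hypothetical, and the formulas are only the rule of thumb quoted above, not an exact accounting of what Condor binds.

```python
def submit_ports_needed(num_running_jobs: int) -> int:
    """Rule-of-thumb port count for a submit machine."""
    return 5 + 5 * num_running_jobs


def execute_ports_needed(num_slots: int) -> int:
    """Rule-of-thumb port count for an execute machine."""
    return 5 + 5 * num_slots


def range_size(lowport: int, highport: int) -> int:
    """Number of ports in the inclusive LOWPORT..HIGHPORT range."""
    return highport - lowport + 1


def range_is_sufficient(lowport: int, highport: int, needed: int) -> bool:
    """True when the configured range covers the estimated need."""
    return range_size(lowport, highport) >= needed
```

For the two-slot machine in the StartLog below, execute_ports_needed(2) is 15, while range_size(5900, 5905) is 6, so the configured range falls short even before any jobs are submitted.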

In 7.5.0 (development branch), a new feature was added that allows all Condor daemons to share a single incoming network port. This feature currently works only under Unix.

--Dan

michele pierri wrote:
Hi,
My network has a firewall, so I have to configure Condor to work through it.
Without the firewall, my Condor configuration works properly.

So I have placed it behind the firewall.
I found these open ports:
5900-5905
5009
5190

I opened my central manager's $CONDOR_CONFIG and modified it as follows:

##  This setting primarily allows you to change the port that the
##  collector is listening on.  By default, the collector uses port
##  9618, but you can set the port with a ":port", such as:
##  COLLECTOR_HOST = $(CONDOR_HOST):1234
COLLECTOR_HOST  = $(CONDOR_HOST):5009

## The NEGOTIATOR_HOST parameter has been deprecated.  The port where
## the negotiator is listening is now dynamically allocated and the IP
## and port are now obtained from the collector, just like all the
## other daemons.  However, if your pool contains any machines that
## are running version 6.7.3 or earlier, you can uncomment this
## setting to go back to the old fixed-port (9614) for the negotiator.
NEGOTIATOR_HOST = $(CONDOR_HOST):5190

## HIGHPORT and LOWPORT let you set the range of ports that Condor
## will use. This may be useful if you are behind a firewall. By
## default, Condor uses port 9618 for the collector, 9614 for the
## negotiator, and system-assigned (apparently random) ports for
## everything else. HIGHPORT and LOWPORT only affect these
## system-assigned ports, but will restrict them to the range you
## specify here. If you want to change the well-known ports for the
## collector or negotiator, see COLLECTOR_HOST or NEGOTIATOR_HOST.
## Note that both LOWPORT and HIGHPORT must be at least 1024 if you
## are not starting your daemons as root.  You may also specify
## different port ranges for incoming and outgoing connections by
## using IN_HIGHPORT/IN_LOWPORT and OUT_HIGHPORT/OUT_LOWPORT.
HIGHPORT = 5905
LOWPORT = 5900


The problem is that when I run condor_status on the central manager, it reports that it can't connect to the central manager.

My Collector Log:
04/21 17:23:28 ******************************************************
04/21 17:23:28 ** condor_collector (CONDOR_COLLECTOR) STARTING UP
04/21 17:23:28 ** /home/michele/condor-7.4.2/sbin/condor_collector
04/21 17:23:28 ** SubsystemInfo: name=COLLECTOR type=COLLECTOR(3) class=DAEMON(1)
04/21 17:23:28 ** Configuration: subsystem:COLLECTOR local:<NONE> class:DAEMON
04/21 17:23:28 ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
04/21 17:23:28 ** $CondorPlatform: I386-LINUX_DEBIAN50 $
04/21 17:23:28 ** PID = 3591
04/21 17:23:28 ** Log last touched 4/21 17:23:28
04/21 17:23:28 ******************************************************
04/21 17:23:28 Using config source: /home/michele/condor-7.4.2/etc/condor_config
04/21 17:23:28 Using local config sources:
04/21 17:23:28 /home/michele/condor-7.4.2/local.hermes/condor_config.local
04/21 17:23:28 DaemonCore: Command Socket at <xxx.xxx.xxx:1180>
04/21 17:23:28 In ViewServer::Init()
04/21 17:23:28 In CollectorDaemon::Init()
04/21 17:23:28 In ViewServer::Config()
04/21 17:23:28 In CollectorDaemon::Config()
04/21 17:23:28 OfflineCollectorPlugin::configure: no persistent store was defined for off-line ads.
04/21 17:23:28 enable: Creating stats hash table
04/21 17:23:28 Enabling CCB Server.
04/21 17:23:41 (Sending 0 ads in response to query)
04/21 17:23:41 MasterAd     : Inserting ** "< hermes.pin.unifi.it >"
04/21 17:23:41 stats: Inserting new hashent for 'Master':'hermes.pin.unifi.it':'xxx.xxx.xxx'
04/21 17:24:37 ScheddAd : Inserting ** "< hermes.pin.unifi.it , xxx.xxx.xxx >"
04/21 17:24:37 stats: Inserting new hashent for 'Schedd':'hermes.pin.unifi.it':'xxx.xxx.xxx'


My SchedLog:
04/21 17:24:32 (pid:3624) ******************************************************
04/21 17:24:32 (pid:3624) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
04/21 17:24:32 (pid:3624) ** /home/michele/condor-7.4.2/sbin/condor_schedd
04/21 17:24:32 (pid:3624) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
04/21 17:24:32 (pid:3624) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
04/21 17:24:32 (pid:3624) ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
04/21 17:24:32 (pid:3624) ** $CondorPlatform: I386-LINUX_DEBIAN50 $
04/21 17:24:32 (pid:3624) ** PID = 3624
04/21 17:24:32 (pid:3624) ** Log last touched 4/21 17:23:41
04/21 17:24:32 (pid:3624) ******************************************************
04/21 17:24:32 (pid:3624) Using config source: /home/michele/condor-7.4.2/etc/condor_config
04/21 17:24:32 (pid:3624) Using local config sources:
04/21 17:24:32 (pid:3624) /home/michele/condor-7.4.2/local.hermes/condor_config.local
04/21 17:24:32 (pid:3624) DaemonCore: Command Socket at <xxx.xxx.xxx:5903>
04/21 17:24:32 (pid:3624) History file rotation is enabled.
04/21 17:24:32 (pid:3624)   Maximum history file size is: 20971520 bytes
04/21 17:24:32 (pid:3624)   Number of rotated history files is: 2

MasterLog:
04/21 17:23:28 ******************************************************
04/21 17:23:28 ** condor_master (CONDOR_MASTER) STARTING UP
04/21 17:23:28 ** /home/michele/condor-7.4.2/sbin/condor_master
04/21 17:23:28 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
04/21 17:23:28 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
04/21 17:23:28 ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
04/21 17:23:28 ** $CondorPlatform: I386-LINUX_DEBIAN50 $
04/21 17:23:28 ** PID = 3462
04/21 17:23:28 ** Log last touched 4/21 17:23:28
04/21 17:23:28 ******************************************************
04/21 17:23:28 Using config source: /home/michele/condor-7.4.2/etc/condor_config
04/21 17:23:28 Using local config sources:
04/21 17:23:28 /home/michele/condor-7.4.2/local.hermes/condor_config.local
04/21 17:23:28 DaemonCore: Command Socket at <xxx.xxx.xxx:5904>
04/21 17:23:28 Started DaemonCore process "/home/michele/condor-7.4.2/sbin/condor_collector", pid and pgroup = 3591
04/21 17:23:31 Started DaemonCore process "/home/michele/condor-7.4.2/sbin/condor_negotiator", pid and pgroup = 3592
04/21 17:23:31 Started DaemonCore process "/home/michele/condor-7.4.2/sbin/condor_schedd", pid and pgroup = 3593
04/21 17:23:36 Sock::bindWithin - failed to bind any port within (5900 ~ 5905)
04/21 17:23:36 Failed to start non-blocking update to unknown.
04/21 17:23:41 The SCHEDD (pid 3593) exited with status 4
04/21 17:23:41 Sending obituary for "/home/michele/condor-7.4.2/sbin/condor_schedd"
04/21 17:23:41 restarting /home/michele/condor-7.4.2/sbin/condor_schedd in 10 seconds
04/21 17:23:51 Sock::bindWithin - failed to bind any port within (5900 ~ 5905)
04/21 17:23:51 Failed to bind to command ReliSock
04/21 17:23:51 (Make sure your IP address is correct in /etc/hosts.)
04/21 17:23:51 BindAnyCommandPort() failed
04/21 17:23:51 ERROR: Create_Process failed trying to start /home/michele/condor-7.4.2/sbin/condor_schedd
04/21 17:23:51 restarting /home/michele/condor-7.4.2/sbin/condor_schedd in 11 seconds
04/21 17:24:02 Sock::bindWithin - failed to bind any port within (5900 ~ 5905)
04/21 17:24:02 Failed to bind to command ReliSock
04/21 17:24:02 (Make sure your IP address is correct in /etc/hosts.)
04/21 17:24:02 BindAnyCommandPort() failed
04/21 17:24:02 ERROR: Create_Process failed trying to start /home/michele/condor-7.4.2/sbin/condor_schedd
04/21 17:24:02 restarting /home/michele/condor-7.4.2/sbin/condor_schedd in 13 seconds
04/21 17:24:15 Sock::bindWithin - failed to bind any port within (5900 ~ 5905)
04/21 17:24:15 Failed to bind to command ReliSock
04/21 17:24:15 (Make sure your IP address is correct in /etc/hosts.)
04/21 17:24:15 BindAnyCommandPort() failed
04/21 17:24:15 ERROR: Create_Process failed trying to start /home/michele/condor-7.4.2/sbin/condor_schedd
04/21 17:24:15 restarting /home/michele/condor-7.4.2/sbin/condor_schedd in 17 seconds
04/21 17:24:32 Started DaemonCore process "/home/michele/condor-7.4.2/sbin/condor_schedd", pid and pgroup = 3624

NegotiatorLog:
04/21 17:26:41 ---------- Started Negotiation Cycle ----------
04/21 17:26:41 Phase 1:  Obtaining ads from collector ...
04/21 17:26:41   Getting all public ads ...
04/21 17:26:41   Sorting 2 ads ...
04/21 17:26:41   Getting startd private ads ...
04/21 17:26:41 Sock::bindWithin - failed to bind any port within (5900 ~ 5905)
04/21 17:26:41 Couldn't fetch ads: communication error
04/21 17:26:41 Aborting negotiation cycle

StartLog:
04/20 17:06:04 ******************************************************
04/20 17:06:04 ** condor_startd (CONDOR_STARTD) STARTING UP
04/20 17:06:04 ** /home/michele/condor-7.4.2/sbin/condor_startd
04/20 17:06:04 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
04/20 17:06:04 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
04/20 17:06:04 ** $CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
04/20 17:06:04 ** $CondorPlatform: I386-LINUX_DEBIAN50 $
04/20 17:06:04 ** PID = 9648
04/20 17:06:04 ** Log last touched 4/20 17:06:01
04/20 17:06:04 ******************************************************
04/20 17:06:04 Using config source: /home/michele/condor-7.4.2/etc/condor_config
04/20 17:06:04 Using local config sources:
04/20 17:06:04 /home/michele/condor-7.4.2/local.hermes/condor_config.local
04/20 17:06:04 DaemonCore: Command Socket at <xxx.xxx.xxx:47057>
04/20 17:06:05 VM-gahp server reported an internal error
04/20 17:06:05 VM universe will be tested to check if it is available
04/20 17:06:05 History file rotation is enabled.
04/20 17:06:05   Maximum history file size is: 20971520 bytes
04/20 17:06:05   Number of rotated history files is: 2
04/20 17:06:05 slot1: New machine resource allocated
04/20 17:06:05 slot2: New machine resource allocated
04/20 17:06:05 About to run initial benchmarks.
04/20 17:06:10 Completed initial benchmarks.
04/20 17:06:10 slot1: State change: IS_OWNER is false
04/20 17:06:10 slot1: Changing state: Owner -> Unclaimed
04/20 17:06:10 slot2: State change: IS_OWNER is false
04/20 17:06:10 slot2: Changing state: Owner -> Unclaimed
04/20 17:06:57 Got SIGHUP.  Re-reading config files.
04/20 17:06:57 History file rotation is enabled.
04/20 17:06:57   Maximum history file size is: 20971520 bytes
04/20 17:06:57   Number of rotated history files is: 2
04/20 17:07:00 Got SIGTERM. Performing graceful shutdown.
04/20 17:07:00 shutdown graceful
04/20 17:07:00 Deleting Cronmgr
04/20 17:07:00 attempt to connect to <xxx.xxx.xxx:9618> failed: Connection refused (connect errno = 111).
04/20 17:07:00 Failed to send update to collector hermes.pin.unifi.it.
04/20 17:07:00 attempt to connect to <xxx.xxx.xxx:9618> failed: Connection refused (connect errno = 111).
04/20 17:07:00 Failed to send update to collector hermes.pin.unifi.it.
04/20 17:07:00 All resources are free, exiting.
04/20 17:07:00 **** condor_startd (condor_STARTD) pid 9648 EXITING WITH STATUS 0


Thanks in advance.


