[HTCondor-users] Unable to set up shared port



I'm trying to set up a shared port. I've tried several configurations, but none of them works. Could someone please take a look at my config?

(I)

CONDOR_HOST = $(HOSTNAME)
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
ALLOW_WRITE = $(HOSTNAME), *.iehk.rwth-aachen.de


Original configuration. No shared port. Everything works fine.

[root@ax_condor /]# condor_status <--------------------- No error

[root@ax_condor /]# nmap localhost <-------------------- 9618 is open
PORT     STATE SERVICE
9618/tcp open  condor

[root@ax_condor /]# ps -ef | grep condor
condor         8       1  0 10:53 ?        00:00:00 /usr/sbin/condor_master -f -t
root          33       8  0 10:53 ?        00:00:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 997
condor        34       8  0 10:53 ?        00:00:00 condor_collector -f
condor        35       8  0 10:53 ?        00:00:00 condor_negotiator -f
condor        36       8  0 10:53 ?        00:00:00 condor_schedd -f


(II)

CONDOR_HOST = $(HOSTNAME)
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
ALLOW_WRITE = $(HOSTNAME), *.iehk.rwth-aachen.de
USE_SHARED_PORT = TRUE
SHARED_PORT_ARGS = -p 9614
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT


As soon as I add a shared port to my config, my collector stops listening on 9618. I had assumed that the shared port wouldn't affect the collector port...

[root@ax_condor /]# condor_status <--------------------- Error
Error: communication error
CEDAR:6001:Failed to connect to <172.17.0.2:9618>

[root@ax_condor /]# nmap localhost <-------------------- 9618 is closed
All 1000 scanned ports on localhost (127.0.0.1) are closed

[root@ax_condor /]# ps -ef | grep condor
condor         8       1  0 10:56 ?        00:00:00 /usr/sbin/condor_master -f -t
root          33       8  0 10:56 ?        00:00:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 997
condor        34       8  0 10:56 ?        00:00:00 condor_shared_port -f -p 9614
condor        35       8  0 10:56 ?        00:00:00 condor_collector -f
condor        36       8  0 10:56 ?        00:00:00 condor_negotiator -f
condor        37       8  0 10:56 ?        00:00:00 condor_schedd -f
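
For reference, nmap by default scans only a set of 1000 common ports, which may not include 9614, so the output above does not necessarily say anything about the shared port itself. A minimal Python sketch of a direct probe of both ports follows; the helper name port_is_open and the 2-second timeout are my own choices, not anything HTCondor provides.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection tries every address the host name resolves to.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 9614 is the shared port from the config above, 9618 the collector port.
    for port in (9614, 9618):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"{port}/tcp {state}")
```

This only shows whether something accepts TCP connections on the port, not which daemon is behind it.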


(III)

CONDOR_HOST = $(HOSTNAME)
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
ALLOW_WRITE = $(HOSTNAME), *.iehk.rwth-aachen.de
USE_SHARED_PORT = TRUE
SHARED_PORT_ARGS = -p 9614
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
COLLECTOR_HOST = $(COLLECTOR_HOST):9618
COLLECTOR_ARGS = -p 9618


So I tried to define the collector port explicitly, but the collector seems to ignore this option when a shared port is enabled.

[root@ax_condor /]# condor_status <--------------------- Error
Error: communication error
CEDAR:6001:Failed to connect to <172.17.0.2:9618>

[root@ax_condor /]# nmap localhost <-------------------- 9618 is closed
All 1000 scanned ports on localhost (127.0.0.1) are closed

[root@ax_condor /]# ps -ef | grep condor
condor         7       1  0 10:59 ?        00:00:00 /usr/sbin/condor_master -f -t
root          32       7  0 10:59 ?        00:00:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 997
condor        33       7  0 10:59 ?        00:00:00 condor_shared_port -f -p 9614
condor        34       7  0 10:59 ?        00:00:00 condor_collector -f -p 9618
condor        35       7  0 10:59 ?        00:00:00 condor_negotiator -f
condor        36       7  0 10:59 ?        00:00:00 condor_schedd -f


(IV)


CONDOR_HOST = $(HOSTNAME)
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD
ALLOW_WRITE = $(HOSTNAME), *.iehk.rwth-aachen.de
USE_SHARED_PORT = TRUE
SHARED_PORT_ARGS = -p 9614
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
COLLECTOR_HOST = $(COLLECTOR_HOST):9614?sock=collector
COLLECTOR_ARGS = -p 9614 -sock collector


So, I've defined a unique socket for my collector. This fixes the problem...

[root@ax_condor /]# condor_status <--------------------- No error

[root@ax_condor /]# nmap localhost <-------------------- 9618 is closed
All 1000 scanned ports on localhost (127.0.0.1) are closed

[root@ax_condor /]# ps -ef | grep condor
condor         8       1  0 11:01 ?        00:00:00 /usr/sbin/condor_master -f -t
root          33       8  0 11:01 ?        00:00:00 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 997
condor        34       8  0 11:01 ?        00:00:00 condor_shared_port -f -p 9614
condor        35       8  0 11:01 ?        00:00:00 condor_collector -f -p 9614 -sock collector
condor        36       8  0 11:01 ?        00:00:00 condor_negotiator -f
condor        37       8  0 11:01 ?        00:00:00 condor_schedd -f


(V)

... however, only on localhost. On any other host I get the following error:

C:\Users\lkosch>condor_status
Error: communication error
CEDAR:6001:Failed to connect to <ax_condor:9614?sock=collector>
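
To narrow this down from the remote host, it may help to separate name resolution from TCP reachability for the address in the error message. Below is a rough Python sketch; parse_address and diagnose are hypothetical helpers I made up for illustration, and the parsing assumes only the <host:port?key=value> shape seen in the errors above, not the full HTCondor address syntax.

```python
import socket
from urllib.parse import parse_qsl


def parse_address(addr):
    """Split '<host:port?key=value&...>' into (host, port, params)."""
    body = addr.strip().lstrip("<").rstrip(">")
    hostport, _, query = body.partition("?")
    host, _, port = hostport.rpartition(":")
    return host, int(port), dict(parse_qsl(query))


def diagnose(addr, timeout=2.0):
    """Report whether the host resolves and whether the port accepts TCP."""
    host, port, _params = parse_address(addr)
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        return f"cannot resolve host name {host!r}"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return f"{host}:{port} accepts TCP connections"
    except OSError as exc:
        return f"{host} resolves, but connecting to port {port} failed: {exc}"


if __name__ == "__main__":
    # Address copied from the error message above.
    print(diagnose("<ax_condor:9614?sock=collector>"))
```

If the name resolves but the connection fails, the problem is more likely on the network side (firewall, port forwarding) than in the socket name.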