
Re: [HTCondor-users] Issues with firewall when USE_SHARED_PORT = True



Hi Todd,

Sorry for the late reply; I was trying a few things, none of which helped. I have done a clean install again, this time without enabling the automatic config file option.


Execute-only host:
sudo netstat -tlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9618            0.0.0.0:*               LISTEN      2190/condor_shared_
tcp        0      0 0.0.0.0:54011           0.0.0.0:*               LISTEN      2190/condor_shared_
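
The listing above shows a second listener (54011) next to the shared port 9618. As a rough sketch, such extra listeners can be picked out of `netstat` output with a small filter. The function name `extra_condor_ports` is hypothetical, and the field positions assume the Linux `netstat -tlp` layout shown above.

```shell
# Hypothetical helper: read `netstat -tlp` output on stdin and print every TCP
# port that a condor process listens on, except the shared port given as $1.
extra_condor_ports() {
    shared="$1"
    awk -v shared="$shared" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $7 ~ /condor/ {
            n = split($4, parts, ":")      # port is the last ":"-separated piece
            if (parts[n] != shared) print parts[n]
        }'
}

# Usage (the -n flag keeps ports numeric):
#   sudo netstat -tlpn | extra_condor_ports 9618
```

An empty result would mean that only the shared port is open for condor processes.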

Using the standard (Debian) config with the following overrides:
cat 10myconfig
# HT Condor - Execute-only node
# which HTCondor daemons to run on this machine
DAEMON_LIST = STARTD, MASTER
# who receives emails when something goes wrong
CONDOR_ADMIN = root@localhost
# how much memory should NOT be available to HTCondor
RESERVED_MEMORY =
# label to identify the local filesystem in a HTCondor pool
FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
# label to identify the user id of the system in a HTCondor pool
# (this need to be a fully qualified domain name)
UID_DOMAIN = $(FULL_HOSTNAME)
# which machine is the central manager of this HTCondor pool
CONDOR_HOST = storage1
# what machines can access HTCondor daemons on this machine
ALLOW_WRITE = 127.0.0.1,node01,storage1
# contact information where HTCondor sends usage statistics
CONDOR_DEVELOPERS = htcondor-admin@xxxxxxxxxxx
CONDOR_DEVELOPERS_COLLECTOR = condor.cs.wisc.edu
# Use shared port
USE_SHARED_PORT = True
SHARED_PORT_ARGS = -p 9618
DAEMON_SOCKET_DIR = /var/lib/condor_sock_dir


SharedPortLog attached



sudo iptables -L
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     all  --  loopback/8           loopback/8
ACCEPT     tcp  --  node01               node01               tcp dpt:9618
ACCEPT     tcp  --  storage1             node01               tcp dpt:9618
Chain FORWARD (policy DROP)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
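
One possible reading of the rules above, offered as an assumption rather than a confirmed diagnosis: the loopback rule matches on 127.0.0.0/8 addresses only, so a connection from this host to its own LAN address on a port other than 9618 falls through to the DROP policy. A common alternative is to accept everything arriving on the `lo` interface. The sketch below only prints candidate commands for review; `print_input_rules` is a hypothetical helper, and the peer names are taken from the config above.

```shell
# Hypothetical helper: print (not run) iptables commands for an INPUT chain
# that accepts all traffic on the loopback interface plus the shared port
# from the listed peers. $1 is the port (required); the rest are peer hosts.
print_input_rules() {
    port="$1"
    shift
    echo "iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT"
    # Match on the interface, not on 127.0.0.0/8 addresses.
    echo "iptables -A INPUT -i lo -j ACCEPT"
    for peer in "$@"; do
        echo "iptables -A INPUT -p tcp -s $peer --dport $port -j ACCEPT"
    done
}

# Usage: review the output before applying anything as root.
#   print_input_rules 9618 storage1
```

Whether this resolves the timeouts seen here is untested.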

MarkJ


________________________________
From: Todd L Miller <tlmiller@xxxxxxxxxxx>
To: TarotApprentice <tarotapprentice@xxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx> 
Sent: Tuesday, 3 April 2018, 4:20
Subject: Re: [HTCondor-users] Issues with firewall when USE_SHARED_PORT = True



> Config is one Central manager + submit node and one Execute-only node. I 
> have USE_SHARED_PORT enabled on both nodes. I can do condor_q, 
> condor_status commands fine. I need to enable a firewall. I used 
> iptables on the central manager node and allowed port 9618 as input. As 
> soon as I do this it is unable to complete the above commands which time 
> out and give the following error

    Did you restart HTCondor after enabling USE_SHARED_PORT?  I 
wouldn't expect daemons configured to use shared port to have any listen 
ports of their own.  What does 'condor_config_val USE_SHARED_PORT' say?

    Is the directory DAEMON_SOCKET_DIR writeable by the condor user, 
or whichever user you're running the HTCondor daemon as?
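
The two checks asked about above could be scripted roughly as follows. This is a minimal sketch: `check_socket_dir` is a hypothetical helper that tests writability for the current user only, so it would need to be run as the user the daemons run as.

```shell
# Hypothetical helper: succeed if $1 is a directory that the *current* user
# can write to; otherwise report on stderr and return 1.
check_socket_dir() {
    dir="$1"
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir" >&2
        return 1
    fi
}

# Usage on a real node (condor_config_val ships with HTCondor):
#   condor_config_val USE_SHARED_PORT
#   check_socket_dir "$(condor_config_val DAEMON_SOCKET_DIR)"
```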

> I gather the daemons use randomly allocated ports. Do I need to use a 
> fixed port for each one and allow it through as well?

    No.  When everything's working right, all the daemons will share a 
single port (hence the name of the knob).


> Do I need to use SHARED_PORT on both the central manager and the execute 
> nodes or is only required on one of them?

    It depends on your firewall requirements.  If your execute node 
doesn't need a firewall, you don't need to use shared port on it.

- ToddM

[Attachment: SharedPortLog]
04/09/18 21:34:24 Setting maximum file descriptors to 4096.
04/09/18 21:34:24 ******************************************************
04/09/18 21:34:24 ** condor_shared_port (CONDOR_SHARED_PORT) STARTING UP
04/09/18 21:34:24 ** /usr/lib/condor/libexec/condor_shared_port
04/09/18 21:34:24 ** SubsystemInfo: name=SHARED_PORT type=SHARED_PORT(11) class=DAEMON(1)
04/09/18 21:34:24 ** Configuration: subsystem:SHARED_PORT local:<NONE> class:DAEMON
04/09/18 21:34:24 ** $CondorVersion: 8.4.11 Feb 06 2017 BuildID: Debian-8.4.11~dfsg.1-1 Debian-8.4.11~dfsg.1-1 $
04/09/18 21:34:24 ** $CondorPlatform: ARMV7L-Raspbian_ $
04/09/18 21:34:24 ** PID = 2190
04/09/18 21:34:24 ** Log last touched time unavailable (No such file or directory)
04/09/18 21:34:24 ******************************************************
04/09/18 21:34:24 Using config source: /etc/condor/condor_config
04/09/18 21:34:24 Using local config sources:
04/09/18 21:34:24    /etc/condor/config.d/10myconfig
04/09/18 21:34:24    /etc/condor/condor_config.local
04/09/18 21:34:24 config Macros = 61, Sorted = 61, StringBytes = 1650, TablesBytes = 1736
04/09/18 21:34:24 CLASSAD_CACHING is ENABLED
04/09/18 21:34:24 Daemon Log is logging: D_ALWAYS D_ERROR
04/09/18 21:34:24 Daemoncore: Listening at <0.0.0.0:9618> on TCP (ReliSock).
04/09/18 21:34:24 DaemonCore: command socket at <192.168.1.8:9618?addrs=192.168.1.8-9618&noUDP>
04/09/18 21:34:24 DaemonCore: private command socket at <192.168.1.8:9618?addrs=192.168.1.8-9618>
04/09/18 21:34:24 main_init() called
04/09/18 21:34:24 About to update statistics in shared_port daemon ad file at /var/lock/condor/shared_port_ad :
ForkedChildrenPeak = 0
ForkedChildrenCurrent = 0
RequestsFailed = 0
RequestsSucceeded = 0
MyAddress = "<192.168.1.8:9618?addrs=192.168.1.8-9618&noUDP>"
RequestsBlocked = 0
SharedPortCommandSinfuls = "<192.168.1.8:9618>"
RequestsPendingPeak = 0
RequestsPendingCurrent = 0
04/09/18 21:36:36 attempt to connect to 192.168.1.8 <192.168.1.8:47863> failed: Connection timed out (connect errno = 110).  Will keep trying for 402 total seconds (270 to go).

04/09/18 21:41:06 attempt to connect to 192.168.1.8 <192.168.1.8:47863> failed: Connection timed out (connect errno = 110).
04/09/18 21:41:06 connect_socketpair(): failed to connect() to that.
04/09/18 21:41:06 Failed to connect to loopback socket, so failing to connect via local shared port access to daemon at <192.168.1.8:0>.
04/09/18 21:41:06 ChildAliveMsg: failed to send DC_CHILDALIVE to parent daemon at <192.168.1.8:0> (try 1 of 3): CEDAR:6001:Failed to connect to <192.168.1.8:0?sock=2161_d840>
04/09/18 21:43:21 attempt to connect to 192.168.1.8 <192.168.1.8:54011> failed: Connection timed out (connect errno = 110).  Will keep trying for 402 total seconds (267 to go).

04/09/18 21:47:48 attempt to connect to 192.168.1.8 <192.168.1.8:54011> failed: Connection timed out (connect errno = 110).
04/09/18 21:47:48 connect_socketpair(): failed to connect() to that.
04/09/18 21:47:48 Failed to connect to loopback socket, so failing to connect via local shared port access to daemon at <192.168.1.8:0>.
04/09/18 21:47:48 ChildAliveMsg: failed to send DC_CHILDALIVE to parent daemon at <192.168.1.8:0> (try 2 of 3): CEDAR:6001:Failed to connect to <192.168.1.8:0?sock=2161_d840>|CEDAR:6001:Failed to connect to <192.168.1.8:0?sock=2161_d840>
04/09/18 21:47:48 ChildAliveMsg: giving up because deadline expired for sending DC_CHILDALIVE to parent.
04/09/18 21:47:48 ERROR "FAILED TO SEND INITIAL KEEP ALIVE TO OUR PARENT <192.168.1.8:0?sock=2161_d840>" at line 9902 in file /build/condor-Mcn4Qc/condor-8.4.11~dfsg.1/src/condor_daemon_core.V6/daemon_core.cpp