
[HTCondor-users] Shared Port Directives not being used? Standard universe jobs failing when firewall active between machines



Hello,

We appear to be having exactly the same problem with our Condor pool as described in the unanswered posting below, in which the shared port directives seem to be ignored:

http://permalink.gmane.org/gmane.comp.distributed.condor.user/28184

Both the submitting machine and the central manager are configured to use shared port 9618, and the iptables firewall on each machine allows both UDP and TCP traffic on that port:

Central Manager Machine:
CONDOR_HOST = starpulse.star.uclan.ac.uk
COLLECTOR_NAME = UCLan - Starlink
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
SHARED_PORT_ARGS = -p 9618
COLLECTOR_HOST = $(CONDOR_HOST):9618?sock=collector
USE_SHARED_PORT = TRUE

Submitting Machine:
CONDOR_HOST = starpulse.star.uclan.ac.uk
COLLECTOR_NAME = UCLan - Starlink
DAEMON_LIST = MASTER, SCHEDD, STARTD
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
SHARED_PORT_ARGS = -p 9618
COLLECTOR_HOST = $(CONDOR_HOST):9618?sock=collector
USE_SHARED_PORT = TRUE
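
As a rough way to confirm what the firewall actually lets through, independently of Condor, a TCP connection test can be run from one machine against the other. The following is only a minimal sketch in Python: the helper name tcp_port_open is hypothetical, it checks TCP only (not UDP), and it says nothing about which daemon, if any, answers on the port.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, calling tcp_port_open("starpulse.star.uclan.ac.uk", 9618) from the submitting machine would be expected to return True if the iptables rule for 9618 is in effect and something is listening there, while a high-numbered port such as 42024 would be expected to return False with the firewall active.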

For a standard universe HelloWorld executable (C code built with condor_compile), we see the following in the submitting machine's ShadowLog:

failed to connect to scheduler on <192.168.70.56:42024>

Does this indicate that the shared port is not in fact being used?

High-numbered ports such as 42024 are of course blocked by iptables, so standard universe jobs do not run on our pool while the firewall on each machine is active.
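
To check a whole log for this pattern rather than a single line, the address in angle brackets can be pulled out and its port compared with the configured shared port. The sketch below is in Python and rests on assumptions: the function names are hypothetical, the pattern handles only IPv4-style <host:port> addresses with an optional ?parameters suffix (modelled on the ?sock=collector form in COLLECTOR_HOST above), and the default of 9618 is simply the value from our configuration.

```python
import re

# Matches addresses such as <192.168.70.56:42024> or <host:9618?sock=collector>.
# Assumption: IPv4 or host-name form only; bracketed IPv6 is not handled.
ADDRESS_RE = re.compile(
    r"<(?P<host>[^:<>]+):(?P<port>\d+)(?:\?(?P<params>[^>]*))?>"
)


def parse_address(log_line):
    """Return (host, port, params) for the first address in log_line, or None."""
    match = ADDRESS_RE.search(log_line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port")), match.group("params")


def bypasses_shared_port(log_line, shared_port=9618):
    """Return True if the address in log_line uses a port other than shared_port.

    Returns None when the line contains no recognisable address.
    """
    parsed = parse_address(log_line)
    if parsed is None:
        return None
    _host, port, _params = parsed
    return port != shared_port
```

Applied to the ShadowLog line above, parse_address yields the host 192.168.70.56 and port 42024 with no parameters, so bypasses_shared_port reports True for it.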

When the firewalls on both the central manager and the submitting machine are disabled, standard universe jobs run and complete without problems.

Any advice would be very helpful,

Many thanks,
Paul Browne


*********************
Starlink/SDO System Administrator
Leighton Building LE114
x3564
+44-1772-893564
star@xxxxxxxxxxx
*********************