
Re: [HTCondor-users] Shared Port Directives not being used? Standard universe jobs failing when firewall active between machines



Ah, I see! Thanks very much for that info, I missed that entirely.

- Paul

*********************
Starlink/SDO System Administrator
Leighton Building LE114
x3564
+44-1772-893564
star@xxxxxxxxxxx
*********************
________________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Dan Bradley [dan@xxxxxxxxxxxx]
Sent: 11 October 2013 16:11
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Shared Port Directives not being used? Standard universe jobs failing when firewall active between machines

Hi Paul,

In the fine print in the manual, it says, "The TCP connections required to manage standard universe jobs do not make use of shared ports." Sorry!

http://research.cs.wisc.edu/htcondor/manual/v8.0/3_7Networking_includes.html#SECTION00472000000000000000

--Dan

On 10/11/13 9:34 AM, Star wrote:

Hello,

We seem to be having exactly the same problem with our Condor pool as the one described in the unanswered posting below, in which Shared Port directives seemingly are not being used:

http://permalink.gmane.org/gmane.comp.distributed.condor.user/28184

Both the submitting machine and the central manager are configured to use shared port 9618, and the iptables firewall on each of them allows both UDP and TCP traffic on that port:

Central Manager Machine:
CONDOR_HOST = starpulse.star.uclan.ac.uk
COLLECTOR_NAME = UCLan - Starlink
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
SHARED_PORT_ARGS = -p 9618
COLLECTOR_HOST = $(CONDOR_HOST):9618?sock=collector
USE_SHARED_PORT = TRUE

Submitting Machine:
CONDOR_HOST = starpulse.star.uclan.ac.uk
COLLECTOR_NAME = UCLan - Starlink
DAEMON_LIST = MASTER, SCHEDD, STARTD
DAEMON_LIST = $(DAEMON_LIST), SHARED_PORT
SHARED_PORT_ARGS = -p 9618
COLLECTOR_HOST = $(CONDOR_HOST):9618?sock=collector
USE_SHARED_PORT = TRUE

For a standard universe HelloWorld executable (C code built with condor_compile), the submitting machine's ShadowLog shows:

failed to connect to scheduler on <192.168.70.56:42024>

Does this indicate that the Shared Port is not in fact being used?
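One quick way to check this from a log line is to pull the port out of the address and compare it with the configured shared port. The following is only a minimal sketch, assuming the address always appears in the `<ip:port>` form shown above (optionally followed by `?params`); the helper names `port_from_log_line` and `uses_shared_port` are hypothetical and not part of HTCondor:

```python
import re

# Assumption: 9618 is the shared port, as set by SHARED_PORT_ARGS = -p 9618 above.
SHARED_PORT = 9618

# Matches an address of the form <ip:port> or <ip:port?params>.
ADDRESS_RE = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)(?:\?[^>]*)?>")


def port_from_log_line(line):
    """Return the port of the first <ip:port> address on the line, or None."""
    match = ADDRESS_RE.search(line)
    return int(match.group(2)) if match else None


def uses_shared_port(line, shared_port=SHARED_PORT):
    """Return True if the address on the line targets the shared port."""
    return port_from_log_line(line) == shared_port


if __name__ == "__main__":
    line = "failed to connect to scheduler on <192.168.70.56:42024>"
    print(port_from_log_line(line), uses_shared_port(line))
```

A port other than 9618 in the address suggests that the connection in question is not going through the shared port.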

These high-numbered ports are of course blocked by iptables, so standard universe jobs do not run on our pool while the iptables firewall is active on each machine.

When firewalls on both central manager & submitting machine are disabled, standard universe jobs run & complete fine.

Any advice would be very helpful,

Many thanks,
Paul Browne

