
Re: [HTCondor-users] I can not run condor_master on the 2nd node



Unfortunately, I still have a problem with it. I have set these parameters on both nodes:

BIND_ALL_INTERFACES = FALSE
NETWORK_INTERFACE =  172.19.37.*
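
A quick way to sanity-check this setting on each node is to test whether the machine actually has an address that fits the pattern. The sketch below is only an illustration: matches_pattern and local_addrs_matching are hypothetical helper names, not HTCondor tools, and it assumes the wildcard in NETWORK_INTERFACE behaves like a shell glob.

```shell
#!/bin/sh
# Hypothetical helper (not part of HTCondor): succeed if address $1
# matches glob pattern $2, e.g. 172.19.37.*
# Assumption: the NETWORK_INTERFACE wildcard is treated like a shell glob here.
matches_pattern() {
    case "$1" in
        $2) return 0 ;;   # left unquoted so that * acts as a wildcard
        *)  return 1 ;;
    esac
}

# Print every local IPv4 address that matches the pattern in $1.
# Relies on the iproute2 "ip" tool being installed.
local_addrs_matching() {
    ip -4 -o addr show | awk '{ print $4 }' | cut -d/ -f1 |
    while read -r addr; do
        if matches_pattern "$addr" "$1"; then
            echo "$addr"
        fi
    done
}
```

Running local_addrs_matching '172.19.37.*' on a node and getting no output would suggest that no local interface fits the pattern on that node.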


From both nodes, condor_status shows only the main node, where all daemons are running (emperor is the main node, magellan the second one).

labounek@magellan:~$ condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot10@xxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 2682 11+01:41:46
slot11@xxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 2682 11+01:41:47
slot12@xxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 2682 11+01:41:48
slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      1.000 2682 11+01:41:43
slot2@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      1.000 2682 11+01:41:46
slot3@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      1.000 2682 11+01:41:47
slot4@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      1.000 2682 11+01:41:48
slot5@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.380 2682 11+01:41:49
slot6@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 2682 11+01:41:50
slot7@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 2682 11+01:41:51
slot8@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 2682 11+01:41:44
slot9@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle      0.000 2682 11+01:41:45
                     Total Owner Claimed Unclaimed Matched Preempting Backfill

        X86_64/LINUX    12     0       0        12       0          0        0

               Total    12     0       0        12       0          0        0
labounek@magellan:~$


But condor_master and condor_startd are still not running on magellan.

labounek@magellan:~$ sudo condor_master
labounek@magellan:~$ ps -ef | egrep condor_
labounek    3368    2174  0 14:37 pts/0    00:00:00 grep -E condor_
labounek@magellan:~$
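
Since condor_master exits without printing anything, its log is the likely place to find the reason. A minimal sketch, assuming the master writes to the default file name MasterLog inside the directory reported by condor_config_val LOG (MASTER_LOG may be configured differently); show_master_log is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the last lines of MasterLog found in directory $1.
# $2 is the number of lines to show (default 20).
show_master_log() {
    log_file="$1/MasterLog"   # default file name; MASTER_LOG may point elsewhere
    if [ -r "$log_file" ]; then
        tail -n "${2:-20}" "$log_file"
    else
        echo "no readable MasterLog in $1" >&2
        return 1
    fi
}
```

On magellan this could be called as show_master_log "$(condor_config_val LOG)", as root if the log directory is not readable by ordinary users.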


I checked the port availability. emperor looks OK:

labounek@emperor:~$ netstat -an | grep 9618 | grep LISTEN
tcp        0      0 172.19.37.11:9618       0.0.0.0:*               LISTEN    
labounek@emperor:~$ netstat -an | grep 9618 | grep udp
udp        0      0 172.19.37.11:9618       0.0.0.0:*                         
labounek@emperor:~$


For magellan, I am not getting any output.

labounek@magellan:~$ netstat -an | grep 9618 | grep LISTEN
labounek@magellan:~$ netstat -an | grep 9618 | grep udp
labounek@magellan:~$
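
A plain grep for 9618 also matches any other field that happens to contain those digits, so a stricter check can be useful. The sketch below reads netstat -an style output on stdin and looks only at the local-address column of listening TCP sockets; has_listener is a hypothetical helper name, and the column layout is assumed to be the Linux one shown above.

```shell
#!/bin/sh
# Hypothetical helper: read "netstat -an" style output on stdin and succeed
# only if a TCP socket in state LISTEN is bound to local port $1.
# Assumes the Linux column layout: proto recv-q send-q local foreign state.
has_listener() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}
```

Usage would be netstat -an | has_listener 9618 && echo "listening"; no output on magellan would be consistent with no daemon having bound the port there.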


I looked at magellan's iptables rules, and the list is empty, so port 9618 should not be blocked.

labounek@magellan:~$ sudo iptables -nL
Chain INPUT (policy ACCEPT)
target     prot opt source               destination        

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination        

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination        
labounek@magellan:~$


Please, any ideas? I am stuck.

Regards,
Rene

On 15.2.2016 at 11:28, Steffen Grunewald wrote:
On Mon, Feb 15, 2016 at 09:52:53AM +0100, René Labounek wrote:
And how should I change this parameter? Just comment it out and not
use it? Or set a specific IP address here?

When I comment out the NETWORK_INTERFACE variable, the computers
still do not communicate.

labounek@node1:~$ sudo condor_reconfig
Sent "Reconfig" command to local master
labounek@node1:~$ sudo service condor start
labounek@node1:~$ condor_status
Error: communication error
CEDAR:6001:Failed to connect to <node1_IP_adress:9618>
labounek@node1:~$

labounek@node2:~$ sudo condor_reconfig
Can't connect to local master
labounek@node2:~$

Regards,
Rene

On 15.2.2016 at 09:21, Steffen Grunewald wrote:
# the following settings will restrict HTCondor's network access to the
# internal network
BIND_ALL_INTERFACES = FALSE
NETWORK_INTERFACE =  127.0.0.1
This is the local loopback interface - which cannot connect to any
other machine...

# make HTCondor ignore UID domain name mismatch on systems without a fully
# qualified domain name (safe because the personal HTCondor does not allow
# remote access)
TRUST_UID_DOMAIN = TRUE


      
I see two options:
- use 
	BIND_ALL_INTERFACES = True
  (that should work immediately, but perhaps is too permissive) or
- set NETWORK_INTERFACE to something that makes sense for your network setup, e.g.
	NETWORK_INTERFACE="172.19.37.*"
  (if that's the network both machines share)

- S