
Re: [HTCondor-users] I can not run condor_master on the 2nd node



On Mon, Feb 15, 2016 at 09:14:58AM +0100, Labounek René wrote:
> Dear Condor users,
> 
> I am trying to set up a small grid of 2 computers to begin with. I
> am not able to run the master and startd on the 2nd node.
> 
> System and condor versions for both nodes:
> 
> labounek@node1$ condor_version
> $CondorVersion: 8.4.0 Sep 23 2015 BuildID:
> Debian-8.4.0~dfsg.1-1~nd80+1 Debian-8.4.0~dfsg.1-1~nd80+1 $
> $CondorPlatform: X86_64-Debian_8 $
> labounek@node1$
> 
> I have set up the /etc/condor/config.d/00debconf files based on this guide:
> 
> https://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/
> 
> node1 file (changed CONDOR_HOST and ALLOW_WRITE):
> DAEMON_LIST = STARTD, SCHEDD, COLLECTOR, NEGOTIATOR, MASTER
> # who receives emails when something goes wrong
> CONDOR_ADMIN = root@localhost
> # how much memory should NOT be available to HTCondor
> RESERVED_MEMORY =
> # label to identify the local filesystem in a HTCondor pool
> FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
> # label to identify the user id of the system in a HTCondor pool
> # (this needs to be a fully qualified domain name)
> UID_DOMAIN = $(FULL_HOSTNAME)
> # which machine is the central manager of this HTCondor pool
> # CONDOR_HOST = 127.0.0.1
> CONDOR_HOST = node1_IP_adress
> # what machines can access HTCondor daemons on this machine
> # ALLOW_WRITE = 127.0.0.1
> ALLOW_WRITE = node1_IP_adress, node2_IP_adress
> # contact information where HTCondor sends usage statistics
> CONDOR_DEVELOPERS = htcondor-admin@xxxxxxxxxxx
> CONDOR_DEVELOPERS_COLLECTOR = condor.cs.wisc.edu
> # the following settings will restrict HTCondor's network access to the
> # internal network
> BIND_ALL_INTERFACES = FALSE
> NETWORK_INTERFACE =  127.0.0.1

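This is the local loopback interface, which cannot connect to any
other machine. Together with BIND_ALL_INTERFACES = FALSE, it restricts
the daemons to 127.0.0.1, so the two nodes cannot talk to each other.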
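
One possible fix, sketched under the assumption that each node has an
address on the internal network that the other node can reach (written
here with the same placeholders as in the files above): point
NETWORK_INTERFACE at that address on each node. Alternatively, remove
both settings so that HTCondor falls back to its default of binding to
all interfaces.

```
# In /etc/condor/config.d/00debconf on node1
# (node1_IP_adress is a placeholder for node1's own address)
BIND_ALL_INTERFACES = FALSE
NETWORK_INTERFACE = node1_IP_adress

# In /etc/condor/config.d/00debconf on node2
# (node2_IP_adress is a placeholder for node2's own address)
BIND_ALL_INTERFACES = FALSE
NETWORK_INTERFACE = node2_IP_adress
```

A change like this most likely needs a restart of the condor service
rather than just condor_reconfig. Running
condor_config_val NETWORK_INTERFACE on each node should show whether
the new value has been picked up.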

> # make HTCondor ignore UID domain name mismatch on systems without a fully
> # qualified domain name (safe because the personal HTCondor does not allow
> # remote access)
> TRUST_UID_DOMAIN = TRUE
> # allow HTCondor jobs to run with the same priority as any other machine
> # activity
> # always start jobs once they are submitted
> START = TRUE
> # never suspend jobs
> SUSPEND = FALSE
> # always continue jobs
> CONTINUE = TRUE
> # never preempt
> PREEMPT = FALSE
> # never kill
> KILL = FALSE
> 
> node2 file (changed DAEMON_LIST, CONDOR_HOST and ALLOW_WRITE):
> DAEMON_LIST = MASTER, STARTD
> # who receives emails when something goes wrong
> CONDOR_ADMIN = root@localhost
> # how much memory should NOT be available to HTCondor
> RESERVED_MEMORY =
> # label to identify the local filesystem in a HTCondor pool
> FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
> # label to identify the user id of the system in a HTCondor pool
> # (this needs to be a fully qualified domain name)
> UID_DOMAIN = $(FULL_HOSTNAME)
> # which machine is the central manager of this HTCondor pool
> # CONDOR_HOST = 127.0.0.1
> CONDOR_HOST = node1_IP_adress
> # what machines can access HTCondor daemons on this machine
> # ALLOW_WRITE = 127.0.0.1
> ALLOW_WRITE = node2_IP_adress
> ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)
> # contact information where HTCondor sends usage statistics
> CONDOR_DEVELOPERS = htcondor-admin@xxxxxxxxxxx
> CONDOR_DEVELOPERS_COLLECTOR = condor.cs.wisc.edu
> # the following settings will restrict HTCondor's network access to the
> # internal network
> BIND_ALL_INTERFACES = FALSE
> NETWORK_INTERFACE =  127.0.0.1
> # make HTCondor ignore UID domain name mismatch on systems without a fully
> # qualified domain name (safe because the personal HTCondor does not allow
> # remote access)
> TRUST_UID_DOMAIN = TRUE
> # allow HTCondor jobs to run with the same priority as any other machine
> # activity
> # always start jobs once they are submitted
> START = TRUE
> # never suspend jobs
> SUSPEND = FALSE
> # always continue jobs
> CONTINUE = TRUE
> # never preempt
> PREEMPT = FALSE
> # never kill
> KILL = FALSE
> 
> condor_reconfig problem:
> 
> labounek@node1:~$ sudo condor_reconfig
> [sudo] password for labounek:
> ERROR
> SECMAN:2010:Received "DENIED" from server for user
> unauthenticated@unmapped using method (no authentication).
> Can't send Reconfig command to local master
> labounek@node1:~$
> 
> labounek@node2:~$ sudo condor_reconfig
> Can't connect to local master
> labounek@node2:~$
> 
> On node1, all daemons are running. On node2, no daemon is running.
> 
> Could somebody help, please?
> 
> Regards,
> Rene

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1
D-14476 Potsdam-Golm
Germany
~~~
Fon: +49-331-567 7274
Fax: +49-331-567 7298
Mail: steffen.grunewald(at)aei.mpg.de
~~~