
[HTCondor-users] I can not run condor_master on the 2nd node



Dear Condor users,

I am trying to set up a small grid of two computers to begin with. I am not able to run the master and startd daemons on the second node.

System and HTCondor versions (the same on both nodes):

labounek@node1$ condor_version
$CondorVersion: 8.4.0 Sep 23 2015 BuildID: Debian-8.4.0~dfsg.1-1~nd80+1 Debian-8.4.0~dfsg.1-1~nd80+1 $
$CondorPlatform: X86_64-Debian_8 $
labounek@node1$

I have set up the /etc/condor/config.d/00debconf file on each node following this guide:

https://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/

node1 file (changed CONDOR_HOST and ALLOW_WRITE):
DAEMON_LIST = STARTD, SCHEDD, COLLECTOR, NEGOTIATOR, MASTER
# who receives emails when something goes wrong
CONDOR_ADMIN = root@localhost
# how much memory should NOT be available to HTCondor
RESERVED_MEMORY =
# label to identify the local filesystem in a HTCondor pool
FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
# label to identify the user id of the system in a HTCondor pool
# (this need to be a fully qualified domain name)
UID_DOMAIN = $(FULL_HOSTNAME)
# which machine is the central manager of this HTCondor pool
# CONDOR_HOST = 127.0.0.1
CONDOR_HOST = node1_IP_adress
# what machines can access HTCondor daemons on this machine
# ALLOW_WRITE = 127.0.0.1
ALLOW_WRITE = node1_IP_adress, node2_IP_adress
# contact information where HTCondor sends usage statistics
CONDOR_DEVELOPERS = htcondor-admin@xxxxxxxxxxx
CONDOR_DEVELOPERS_COLLECTOR = condor.cs.wisc.edu
# the following settings will restrict HTCondor's network access to the internal
# network
BIND_ALL_INTERFACES = FALSE
NETWORK_INTERFACE =  127.0.0.1
# make HTCondor ignore UID domain name mismatch on systems without a fully
# qualified domain name (safe because the personal HTCondor does not allow
# remote access
TRUST_UID_DOMAIN = TRUE
# allow HTCondor jobs to run with the same priority as any other machine activity
# always start jobs once they are submitted
START = TRUE
# never suspend jobs
SUSPEND = FALSE
# always continue jobs
CONTINUE = TRUE
# never preempt
PREEMPT = FALSE
# never kill
KILL = FALSE

node2 file (changed DAEMON_LIST, CONDOR_HOST and ALLOW_WRITE):
DAEMON_LIST = MASTER, STARTD
# who receives emails when something goes wrong
CONDOR_ADMIN = root@localhost
# how much memory should NOT be available to HTCondor
RESERVED_MEMORY =
# label to identify the local filesystem in a HTCondor pool
FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
# label to identify the user id of the system in a HTCondor pool
# (this need to be a fully qualified domain name)
UID_DOMAIN = $(FULL_HOSTNAME)
# which machine is the central manager of this HTCondor pool
# CONDOR_HOST = 127.0.0.1
CONDOR_HOST = node1_IP_adress
# what machines can access HTCondor daemons on this machine
# ALLOW_WRITE = 127.0.0.1
ALLOW_WRITE = node2_IP_adress
ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)
# contact information where HTCondor sends usage statistics
CONDOR_DEVELOPERS = htcondor-admin@xxxxxxxxxxx
CONDOR_DEVELOPERS_COLLECTOR = condor.cs.wisc.edu
# the following settings will restrict HTCondor's network access to the internal
# network
BIND_ALL_INTERFACES = FALSE
NETWORK_INTERFACE =  127.0.0.1
# make HTCondor ignore UID domain name mismatch on systems without a fully
# qualified domain name (safe because the personal HTCondor does not allow
# remote access
TRUST_UID_DOMAIN = TRUE
# allow HTCondor jobs to run with the same priority as any other machine activity
# always start jobs once they are submitted
START = TRUE
# never suspend jobs
SUSPEND = FALSE
# always continue jobs
CONTINUE = TRUE
# never preempt
PREEMPT = FALSE
# never kill
KILL = FALSE
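
To make clear what I intend with the two ALLOW_WRITE lines on node2: the second line is meant to append the central manager to the list defined on the first line. Below is a minimal Python sketch of the substitution I expect. It only mimics in-order $(NAME) replacement and is not HTCondor's own parser; the helper name expand is made up for illustration, and the addresses are the same placeholders as in the files above.

```python
import re


def expand(lines):
    """Apply 'NAME = value' lines in order, replacing $(NAME) with the value known so far."""
    macros = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments, as in the config files above.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        # Substitute each $(X) with the value of X defined so far (empty if undefined).
        value = re.sub(r"\$\((\w+)\)", lambda m: macros.get(m.group(1), ""), value)
        macros[name] = value
    return macros


# The three relevant lines from the node2 file, in the order they appear.
node2 = [
    "CONDOR_HOST = node1_IP_adress",
    "ALLOW_WRITE = node2_IP_adress",
    "ALLOW_WRITE = $(ALLOW_WRITE), $(CONDOR_HOST)",
]

print(expand(node2)["ALLOW_WRITE"])
```

Under this assumption, ALLOW_WRITE on node2 should end up listing node2 first and then node1, which is the same set of hosts as on node1.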

condor_reconfig problem:

labounek@node1:~$ sudo condor_reconfig
[sudo] password for labounek:
ERROR
SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
Can't send Reconfig command to local master
labounek@node1:~$

labounek@node2:~$ sudo condor_reconfig
Can't connect to local master
labounek@node2:~$

On node1, all daemons are running. On node2, no daemon is running.

Could somebody help, please?

Regards,
Rene