
[HTCondor-users] Condor Configuration Trouble



Hi,

I've tried to set up Condor between two nodes.

When I run "condor_status" I get:

Error: communication error
CEDAR:6001:Failed to connect to <159.203.152.145:9618>


My /etc/condor/condor_config file:


MY_FULL_HOSTNAME = abxx.xxx (here I put my hostname)

##  Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/lib/condor/spool
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE   = $(RELEASE_DIR)/share/condor

PROCD_ADDRESS = $(RUN)/procd_pipe

JAVA_CLASSPATH_DEFAULT = $(SHARE) $(SHARE)/scimark2lib.jar .

##  What machine is your central manager?

CONDOR_HOST = $(MY_FULL_HOSTNAME)

##  This macro determines what daemons the condor_master will start and keep its watchful eyes on.
##  The list is a comma or space separated list of subsystem names

NETWORK_INTERFACE = 10.0.x.x (here I put my IP address)

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD


My /etc/condor/condor_config.local file:


CONDOR_ADMIN                   = prometheus.abxx.xxx

#FILESYSTEM_DOMAIN             = 10.0.x.x
#CONDOR_ADMIN                  = prometheus@xxxxxxxx

FILESYSTEM_DOMAIN              = abxx.xxx
UID_DOMAIN                     = abxx.xxx

# each slot gets a CPU
NUM_SLOTS                      = 1
NUM_SLOTS_TYPE_1               = 1
SLOT_TYPE_1                    = cpus=100%
SLOT_TYPE_1_PARTITIONABLE      = True
USE_NFS                        = True
DAGMAN_LOG_ON_NFS_IS_ERROR     = FALSE

KEEP_POOL_HISTORY              = True
POOL_HISTORY_DIR               = /var/spool/condor
POOL_HISTORY_MAX_STORAGE       = 100000000
POOL_HISTORY_SAMPLING_INTERVAL = 60


ALLOW_READ                     = abxx.xxx
ALLOW_WRITE                    = abxx.xxx
ALLOW_ADMINISTRATOR            = $(CONDOR_HOST)
ALLOW_OWNER                    = abxx.xxx, $(ALLOW_ADMINISTRATOR)
HOSTALLOW_ADMINISTRATOR        = abuo.com

DAEMON_LIST                    = $(DAEMON_LIST)
#START                         = ($(START)) && target.AcctGroup =?= "group_pseudo_operational_processing"
NEGOTIATOR_MATCHLIST_CACHING   = FALSE
NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION = TRUE
PRIORITY_HALFLIFE              = 1.79769e+308


Condor MasterLog:

02/03/17 15:56:46 restarting /usr/sbin/condor_collector in 10 seconds
02/03/17 15:56:46 attempt to connect to <10.0.2.15:9618> failed: Connection refused (connect errno = 111).
02/03/17 15:56:46 ERROR: SECMAN:2003:TCP connection to collector abxx.xxx failed.
02/03/17 15:56:46 Failed to start non-blocking update to <10.0.2.15:9618>.
02/03/17 15:56:56 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 65480
02/03/17 15:56:58 SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:56:58 ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:56:58 Failed to start non-blocking update to <10.0.2.15:9618>.
02/03/17 15:57:11 WARNING: forward resolution of abxx.xxx doesn't match 10.0.0.30!
02/03/17 15:57:11 Got SIGTERM. Performing graceful shutdown.
02/03/17 15:57:18 SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:57:18 ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:57:18 Failed to send update to collector abuo.com.
02/03/17 15:57:18 Sent SIGTERM to STARTD (pid 64673)
02/03/17 15:57:18 AllReaper unexpectedly called on pid 64673, status 0.
02/03/17 15:57:18 The STARTD (pid 64673) exited with status 0
02/03/17 15:57:19 All STARTDs are gone. Stopping other daemons Gracefully
02/03/17 15:57:19 Sent SIGTERM to COLLECTOR (pid 65480)
02/03/17 15:57:19 Sent SIGTERM to NEGOTIATOR (pid 64671)
02/03/17 15:57:19 Sent SIGTERM to SCHEDD (pid 64672)
02/03/17 15:57:19 AllReaper unexpectedly called on pid 65480, status 0.
02/03/17 15:57:19 The COLLECTOR (pid 65480) exited with status 0
02/03/17 15:57:19 AllReaper unexpectedly called on pid 64671, status 0.
02/03/17 15:57:19 The NEGOTIATOR (pid 64671) exited with status 0
02/03/17 15:57:19 AllReaper unexpectedly called on pid 64672, status 0.
02/03/17 15:57:19 The SCHEDD (pid 64672) exited with status 0
02/03/17 15:57:19 All daemons are gone. Exiting.
02/03/17 15:57:19 **** condor_master (condor_MASTER) pid 4179 EXITING WITH STATUS 0


My CollectorLog:

02/03/17 15:56:58 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: ADVERTISE_MASTER authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15, hostname size = 0, original ip address = 10.0.2.15
02/03/17 15:56:58 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:56:58 CollectorAd  : Inserting ** "< My Pool - abxx.xxx@xxxxxxxx >"
02/03/17 15:56:58 stats: Inserting new hashent for 'Collector':'My Pool - abxx.xxx@xxxxxxxx':'10.0.x.x'
02/03/17 15:57:18 attempt to connect to <159.203.152.145:9618> failed: timed out after 20 seconds.
02/03/17 15:57:18 Failed to send update to collector abxx.xxx.
02/03/17 15:57:18 Unable to send UPDATE_COLLECTOR_AD to all configured collectors
02/03/17 15:57:18 WARNING: forward resolution of abxx.xxx doesn't match 10.0.2.15!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 10 (QUERY_STARTD_PVT_ADS), access level NEGOTIATOR: reason: NEGOTIATOR authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15, hostname size = 0, original ip address = 10.0.2.15
02/03/17 15:57:18 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 15 (INVALIDATE_MASTER_ADS), access level ADVERTISE_MASTER: reason: cached result for ADVERTISE_MASTER; see first case for the full reason
02/03/17 15:57:18 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:18 WARNING: forward resolution of abxx.xxx doesn't match 10.0.2.15!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 13 (INVALIDATE_STARTD_ADS), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15, hostname size = 0, original ip address = 10.0.2.15
02/03/17 15:57:18 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:19 Got SIGTERM. Performing graceful shutdown.
02/03/17 15:57:19 **** condor_collector (condor_COLLECTOR) pid 65480 EXITING WITH STATUS 0
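

In case it makes the CollectorLog easier to read, here is a rough Python sketch that counts the "PERMISSION DENIED" entries per host and access level. It is not part of HTCondor; the names summarize_denials, DENIED_RE and SAMPLE are made up for this example, and the regular expression only assumes the line format shown in the log above. The sample lines are copied from that log with the trailing "identifiers used for this host" part cut off.

```python
import re
from collections import Counter

# Assumed line shape, taken from the CollectorLog excerpt above:
#   ... PERMISSION DENIED to <user> from host <host> for command <n> (<NAME>), access level <LEVEL>: ...
DENIED_RE = re.compile(
    r"PERMISSION DENIED to (?P<user>\S+) from host (?P<host>\S+) "
    r"for command \d+ \((?P<command>\w+)\), access level (?P<level>\w+):"
)


def summarize_denials(log_text):
    """Count denied requests per (host, access level) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DENIED_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("level"))] += 1
    return counts


# Shortened copies of the four denial lines from the log above.
SAMPLE = """\
02/03/17 15:56:58 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: ADVERTISE_MASTER authorization policy contains no matching ALLOW entry for this request
02/03/17 15:56:58 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 10 (QUERY_STARTD_PVT_ADS), access level NEGOTIATOR: reason: NEGOTIATOR authorization policy contains no matching ALLOW entry for this request
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 15 (INVALIDATE_MASTER_ADS), access level ADVERTISE_MASTER: reason: cached result for ADVERTISE_MASTER; see first case for the full reason
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 13 (INVALIDATE_STARTD_ADS), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy contains no matching ALLOW entry for this request
"""

if __name__ == "__main__":
    for (host, level), count in sorted(summarize_denials(SAMPLE).items()):
        print(host, level, count)
```

To run it against the real file, read the log into a string and pass that to summarize_denials instead of SAMPLE.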