
Re: [HTCondor-users] Condor Configuration Trouble



Hi Uchenna,
Looking at your log file, it looks like you're using VirtualBox with a first NAT interface (which always gets the IP 10.0.2.15) and a second interface that has the real network IP.
With that setup you must tell HTCondor which IP it should use for its connections. You do that with this line in the condor_config.local file:
NETWORK_INTERFACE = 159.203.152.145
From your example, this is the real IP of the master.
On the nodes you have to do the same, but with each node's own IP in the 159.203 network.
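
As a rough sketch of how this could look on each machine, assuming the worker nodes also have an address in the 159.203 network: the node address below is a placeholder, since it is not given in this thread, and the comments are kept on their own lines.

```
# condor_config.local on the master (central manager)
NETWORK_INTERFACE = 159.203.152.145

# condor_config.local on a worker node
# (placeholder: replace with that node's own address in the 159.203 network)
NETWORK_INTERFACE = 159.203.x.x
```

After changing the file, the daemons need to re-read the configuration (for example with condor_reconfig, or by restarting condor) before the new address is used.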

As Zach said, remember to add the hostnames (with the correct IPs) to the /etc/hosts file, or to the DNS if you have one.

Hope this helps.

On Wed, Feb 8, 2017 at 11:47 AM, Zach Miller <zmiller@xxxxxxxxxxx> wrote:

It appears your ALLOW_READ, ALLOW_WRITE, etc. configuration settings are causing the "PERMISSION DENIED" errors you are seeing.

Do you have DNS entries for what you call "abxx.xxx"? Reverse DNS? This line in your log seems like trouble:

    WARNING: forward resolution of abxx.xxx doesn't match 10.0.0.30!

If the DNS isn't set up, try setting the configuration like this:

ALLOW_READ = 10.0.*

[similar for other ALLOW_ settings]
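
For illustration, the other settings might be filled in along these lines. This is only a sketch: it assumes every machine in the pool has an address matching 10.0.*, and it lists ALLOW_NEGOTIATOR, ALLOW_ADVERTISE_MASTER and ALLOW_ADVERTISE_STARTD explicitly only because those are the access levels named in the PERMISSION DENIED lines of the CollectorLog quoted further down. If the pool ends up using the 159.203 addresses instead, the pattern would have to change accordingly.

```
# Assumption: all pool members have addresses in 10.0.*
ALLOW_READ             = 10.0.*
ALLOW_WRITE            = 10.0.*
ALLOW_NEGOTIATOR       = 10.0.*
ALLOW_ADVERTISE_MASTER = 10.0.*
ALLOW_ADVERTISE_STARTD = 10.0.*
```

An address wildcard like this is permissive and is meant for getting a test pool talking; it can be tightened once name resolution works.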

Cheers,

-zach


On 2/7/17, 11:18 PM, "HTCondor-users on behalf of Uchenna Ojiaku" <htcondor-users-bounces@cs.wisc.edu on behalf of ucojiaku@xxxxxxxxx> wrote:

Hi,

I've tried to set up condor between two nodes.

When I run "condor_status" I get:

Error: communication error
CEDAR:6001:Failed to connect to <159.203.152.145:9618>

My /etc/condor/condor_config file:

MY_FULL_HOSTNAME = abxx.xxx (here I put my hostname)

##  Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/lib/condor/spool
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE   = $(RELEASE_DIR)/share/condor

PROCD_ADDRESS = $(RUN)/procd_pipe

JAVA_CLASSPATH_DEFAULT = $(SHARE) $(SHARE)/scimark2lib.jar .

##  What machine is your central manager?

CONDOR_HOST = $(MY_FULL_HOSTNAME)

##  This macro determines what daemons the condor_master will start and keep its watchful eyes on.
##  The list is a comma or space separated list of subsystem names

NETWORK_INTERFACE = 10.0.x.x (here I put my ip address)

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD

My /etc/condor/condor_config.local file:

CONDOR_ADMIN                    = prometheus.abxx.xxx

#FILESYSTEM_DOMAIN              = 10.0.x.x
#CONDOR_ADMIN                   = prometheus@xxxxxxxx

FILESYSTEM_DOMAIN               = abxx.xxx
UID_DOMAIN                      = abxx.xxx

# each slot gets a CPU
NUM_SLOTS                       = 1
NUM_SLOTS_TYPE_1                = 1
SLOT_TYPE_1                     = cpus=100%
SLOT_TYPE_1_PARTITIONABLE       = True
USE_NFS                         = True
DAGMAN_LOG_ON_NFS_IS_ERROR      = FALSE

KEEP_POOL_HISTORY               = True
POOL_HISTORY_DIR                = /var/spool/condor
POOL_HISTORY_MAX_STORAGE        = 100000000
POOL_HISTORY_SAMPLING_INTERVAL  = 60

ALLOW_READ                      = abxx.xxx
ALLOW_WRITE                     = abxx.xxx
ALLOW_ADMINISTRATOR             = $(CONDOR_HOST)
ALLOW_OWNER                     = abxx.xxx, $(ALLOW_ADMINISTRATOR)
HOSTALLOW_ADMINISTRATOR         = abuo.com

DAEMON_LIST                     = $(DAEMON_LIST)
#START                          = ($(START)) && target.AcctGroup =?= "group_pseudo_operational_processing"
NEGOTIATOR_MATCHLIST_CACHING    = FALSE
NEGOTIATOR_ALLOW_QUOTA_OVERSUBSCRIPTION = TRUE
PRIORITY_HALFLIFE               = 1.79769e+308

Condor MasterLog:

02/03/17 15:56:46 restarting /usr/sbin/condor_collector in 10 seconds
02/03/17 15:56:46 attempt to connect to <10.0.2.15:9618> failed: Connection refused (connect errno = 111).
02/03/17 15:56:46 ERROR: SECMAN:2003:TCP connection to collector abxx.xxx failed.
02/03/17 15:56:46 Failed to start non-blocking update to <10.0.2.15:9618>.
02/03/17 15:56:56 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 65480
02/03/17 15:56:58 SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:56:58 ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:56:58 Failed to start non-blocking update to <10.0.2.15:9618>.
02/03/17 15:57:11 WARNING: forward resolution of abxx.xxx doesn't match 10.0.0.30!
02/03/17 15:57:11 Got SIGTERM. Performing graceful shutdown.
02/03/17 15:57:18 SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:57:18 ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/03/17 15:57:18 Failed to send update to collector abuo.com.
02/03/17 15:57:18 Sent SIGTERM to STARTD (pid 64673)
02/03/17 15:57:18 AllReaper unexpectedly called on pid 64673, status 0.
02/03/17 15:57:18 The STARTD (pid 64673) exited with status 0
02/03/17 15:57:19 All STARTDs are gone. Stopping other daemons Gracefully
02/03/17 15:57:19 Sent SIGTERM to COLLECTOR (pid 65480)
02/03/17 15:57:19 Sent SIGTERM to NEGOTIATOR (pid 64671)
02/03/17 15:57:19 Sent SIGTERM to SCHEDD (pid 64672)
02/03/17 15:57:19 AllReaper unexpectedly called on pid 65480, status 0.
02/03/17 15:57:19 The COLLECTOR (pid 65480) exited with status 0
02/03/17 15:57:19 AllReaper unexpectedly called on pid 64671, status 0.
02/03/17 15:57:19 The NEGOTIATOR (pid 64671) exited with status 0
02/03/17 15:57:19 AllReaper unexpectedly called on pid 64672, status 0.
02/03/17 15:57:19 The SCHEDD (pid 64672) exited with status 0
02/03/17 15:57:19 All daemons are gone. Exiting.
02/03/17 15:57:19 **** condor_master (condor_MASTER) pid 4179 EXITING WITH STATUS 0

My CollectorLog:

02/03/17 15:56:58 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: ADVERTISE_MASTER authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15, hostname size = 0, original ip address = 10.0.2.15
02/03/17 15:56:58 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:56:58 CollectorAd  : Inserting ** "< My Pool - abxx.xxx@xxxxxxxx >"
02/03/17 15:56:58 stats: Inserting new hashent for 'Collector':'My Pool - abxx.xxx@xxxxxxxx':'10.0.x.x'
02/03/17 15:57:18 attempt to connect to <159.203.152.145:9618> failed: timed out after 20 seconds.
02/03/17 15:57:18 Failed to send update to collector abxx.xxx.
02/03/17 15:57:18 Unable to send UPDATE_COLLECTOR_AD to all configured collectors
02/03/17 15:57:18 WARNING: forward resolution of abxx.xxx doesn't match 10.0.2.15!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 10 (QUERY_STARTD_PVT_ADS), access level NEGOTIATOR: reason: NEGOTIATOR authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15, hostname size = 0, original ip address = 10.0.2.15
02/03/17 15:57:18 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 15 (INVALIDATE_MASTER_ADS), access level ADVERTISE_MASTER: reason: cached result for ADVERTISE_MASTER; see first case for the full reason
02/03/17 15:57:18 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:18 WARNING: forward resolution of abxx.xxx doesn't match 10.0.2.15!
02/03/17 15:57:18 PERMISSION DENIED to unauthenticated@unmapped from host 10.0.2.15 for command 13 (INVALIDATE_STARTD_ADS), access level ADVERTISE_STARTD: reason: ADVERTISE_STARTD authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 10.0.2.15, hostname size = 0, original ip address = 10.0.2.15
02/03/17 15:57:18 DC_AUTHENTICATE: Command not authorized, done!
02/03/17 15:57:19 Got SIGTERM. Performing graceful shutdown.
02/03/17 15:57:19 **** condor_collector (condor_COLLECTOR) pid 65480 EXITING WITH STATUS 0


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@cs.wisc.edu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



--
Edier Alberto Zapata Hernández
Infrastructure Support Engineer
CIER - Sur