[HTCondor-users] Installing HTCondor on Centos 7. SECMAN errors on startup



Good morning,

I am trying to get a standalone Condor installation running, but I am encountering SECMAN and permission denied errors in the condor logs when condor starts. Additionally, when I try to run condor_status, I get SECMAN:2007, communication error. I am starting from the CentOS 7 "Minimal" Installation. I have reviewed the Troubleshooting section in the Condor guide and past mailing list issues. It is unclear what I should try next -- perhaps uninstall the YUM/RPM and install from source?

Any help or suggestions would be appreciated.

Best regards,

Thomas Taylor

#
# My Installation:
#
Host OS: Windows 8.1 Pro, Hyper-V (16 GB RAM, 1 TB HDD)
OS: CentOS Linux release 7.1.1503 (Core) "Minimal" install, x86_64
CPU: Intel i7
Memory: 2 GB
Disk: 40 GB
Condor: 8.2.8, x86_64, installed from Condor rhel7/YUM repo


Here are some of the things I've tried:
1. Turned off firewalld (firewall-cmd --state #not running)
2. Set condor_config.local ALLOW_* and HOSTALLOW_* to *
3. Changed use SECURITY : HOST_BASED to use SECURITY : STRONG
4. Confirmed condor port 9618 open (nmap 10.0.0.148 #9618/tcp open condor)
5. Removed, cleaned, and reinstalled condor from the YUM repo (as root)
6. Added a DNS entry for the machine to the domain's DNS server
7. Confirmed I can connect to other services running on machine, even on port 9618 (ssh, python SimpleHTTPServer 9618)
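The connectivity checks in steps 4 and 7 can also be scripted. What follows is only a minimal sketch, not part of the original report: the helper name tcp_port_open is hypothetical, and it uses nothing beyond Python's standard socket module.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Try a plain TCP connect; return (True, None) or (False, the OSError)."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, exc
```

For example, tcp_port_open("10.0.0.148", 9618) would test the collector address used in this post. A successful connect only shows that something accepts connections on the port; it says nothing about whether the HTCondor security handshake completes afterwards.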



#
# condor_status -debug
#
05/21/15 10:01:55 condor_read(): timeout reading 5 bytes from collector at <10.0.0.148:9618>.
05/21/15 10:01:55 IO: Failed to read packet header
05/21/15 10:01:55 SECMAN: no classad from server, failing
Error: communication error
SECMAN:2007:Failed to end classad message.


#
# /etc/condor/condor_config.local
#
CONDOR_HOST = 10.0.0.148
# use SECURITY : HOST_BASED
use SECURITY : STRONG

ALLOW_ADMINISTRATOR = $(CONDOR_HOST)
HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST)
ALLOW_OWNER = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR)
HOSTALLOW_OWNER = $(FULL_HOSTNAME), $(ALLOW_ADMINISTRATOR)
ALLOW_READ = *
HOSTALLOW_READ = *
ALLOW_WRITE = *
HOSTALLOW_WRITE = *
ALLOW_NEGOTIATOR = $(COLLECTOR_HOST)
HOSTALLOW_NEGOTIATOR = $(COLLECTOR_HOST)
ALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
HOSTALLOW_NEGOTIATOR_SCHEDD = $(COLLECTOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
ALLOW_WRITE_COLLECTOR = $(ALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_COLLECTOR = $(ALLOW_WRITE), $(FLOCK_FROM)
ALLOW_WRITE_STARTD = $(ALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD = $(ALLOW_WRITE), $(FLOCK_FROM)
ALLOW_READ_COLLECTOR = $(ALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR = $(ALLOW_READ), $(FLOCK_FROM)
ALLOW_READ_STARTD = $(ALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD = $(ALLOW_READ), $(FLOCK_FROM)
ALLOW_CLIENT = *
HOSTALLOW_CLIENT = *


#
# /var/log/condor - Condor Logs
#
# List all errors reported in the Condor Logs
# grep -i error /var/log/condor/*
#
CollectorLog:05/21/15 09:22:58 Daemon Log is logging: D_ALWAYS D_ERROR
CollectorLog:05/21/15 09:35:42 ERROR "FAILED TO SEND INITIAL KEEP ALIVE TO OUR PARENT <10.0.0.148:47494>" at line 9477 in file /slots/06/dir_25022/userdir/.tmpcAsU6A/BUILD/condor-8.2.8/src/condor_daemon_core.V6/daemon_core.cpp
MasterLog:05/21/15 09:22:58 Daemon Log is logging: D_ALWAYS D_ERROR
MasterLog:05/21/15 09:25:04 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2003:deadline for security handshake with <10.0.0.148:9618> has expired.
MasterLog:05/21/15 09:35:42 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2003:TCP connection to <10.0.0.148:9618> failed.
MasterLog:05/21/15 09:36:12 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|AUTHENTICATE:1002:Failure performing handshake
NegotiatorLog:05/21/15 09:22:59 Daemon Log is logging: D_ALWAYS D_ERROR D_MATCH
NegotiatorLog:05/21/15 09:24:00 ERROR: SECMAN:2007:Failed to end classad message.
NegotiatorLog:05/21/15 09:24:00 Couldn't fetch ads: communication error
NegotiatorLog:05/21/15 09:26:01 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2003:deadline for security handshake with <10.0.0.148:9618> has expired.
NegotiatorLog:05/21/15 09:35:42 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2007:Failed to end classad message.
SchedLog:05/21/15 09:22:59 (pid:1194) Daemon Log is logging: D_ALWAYS D_ERROR
SchedLog:05/21/15 09:25:06 (pid:1194) ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2003:deadline for security handshake with <10.0.0.148:9618> has expired.
SchedLog:05/21/15 09:49:22 (pid:1194) ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2007:Failed to end classad message.
StartLog:05/21/15 09:22:59 Daemon Log is logging: D_ALWAYS D_ERROR
StartLog:05/21/15 09:22:59 You can safely ignore the above error if you're not using hibernation
StartLog:05/21/15 09:23:00 VM-gahp server reported an internal error
StartLog:05/21/15 09:25:05 ERROR: SECMAN:2004:Was waiting for TCP auth session to <10.0.0.148:9618>, but it failed.
StartLog:05/21/15 09:25:05 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2003:deadline for security handshake with <10.0.0.148:9618> has expired.
StartLog:05/21/15 09:49:22 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2007:Failed to end classad message.
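To see at a glance which error codes each daemon log reports, the grep output above can be tallied. This is a rough sketch that assumes every line has the "LogName:rest" shape shown above; the function name tally_codes and the regular expression are illustrative only, not anything HTCondor ships.

```python
import re
from collections import Counter

# Matches tokens such as SECMAN:2004 or AUTHENTICATE:1002.
CODE_RE = re.compile(r"\b(SECMAN|AUTHENTICATE):(\d+)")


def tally_codes(lines):
    """Count (log name, error code) pairs in 'LogName:message' lines."""
    counts = Counter()
    for line in lines:
        # Everything before the first colon is the log file name.
        log_name = line.split(":", 1)[0]
        for subsystem, code in CODE_RE.findall(line):
            counts[(log_name, "%s:%s" % (subsystem, code))] += 1
    return counts
```

Feeding it the lines above would show, per log, how often the handshake deadline (SECMAN:2003) occurs compared with SECMAN:2007.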


#
# Excerpts from the Condor Logs
#
05/21/15 09:22:58 ******************************************************
05/21/15 09:22:58 ** condor_master (CONDOR_MASTER) STARTING UP
05/21/15 09:22:58 ** /usr/sbin/condor_master
05/21/15 09:22:58 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
05/21/15 09:22:58 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
05/21/15 09:22:58 ** $CondorVersion: 8.2.8 Apr 07 2015 BuildID: UW_development $
05/21/15 09:22:58 ** $CondorPlatform: X86_64-RedHat_7.0 $
05/21/15 09:22:58 ** PID = 1190
05/21/15 09:22:58 ** Log last touched time unavailable (No such file or directory)
05/21/15 09:22:58 ******************************************************
---
05/21/15 09:22:59 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 1193
05/21/15 09:22:59 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 1194
05/21/15 09:22:59 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 1195
05/21/15 09:25:04 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2003:deadline for security handshake with <10.0.0.148:9618> has expired.
05/21/15 09:25:04 Failed to start non-blocking update to <10.0.0.148:9618>.
05/21/15 09:30:04 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2003:deadline for security handshake with <10.0.0.148:9618> has expired.
---
05/21/15 09:35:04 Failed to start non-blocking update to <10.0.0.148:9618>.
05/21/15 09:35:42 DefaultReaper unexpectedly called on pid 1192, status 1024.
05/21/15 09:35:42 The COLLECTOR (pid 1192) exited with status 4
05/21/15 09:35:42 Sending obituary for "/usr/sbin/condor_collector"
05/21/15 09:35:42 restarting /usr/sbin/condor_collector in 10 seconds
05/21/15 09:35:42 attempt to connect to <10.0.0.148:9618> failed: Connection refused (connect errno = 111).
05/21/15 09:35:42 ERROR: SECMAN:2004:Failed to create security session to <10.0.0.148:9618> with TCP.|SECMAN:2003:TCP connection to <10.0.0.148:9618> failed.
05/21/15 09:35:42 Failed to start non-blocking update to <10.0.0.148:9618>.
05/21/15 09:35:52 Collector port not defined, will use default: 9618
05/21/15 09:35:52 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 1316
05/21/15 09:36:12 condor_read(): timeout reading 5 bytes from <10.0.0.148:9618>.
05/21/15 09:36:12 IO: Failed to read packet header
05/21/15 09:36:12 AUTHENTICATE: handshake failed!

05/21/15 09:49:32 ******************************************************
05/21/15 09:49:32 ** condor_collector (CONDOR_COLLECTOR) STARTING UP
05/21/15 09:49:32 ******************************************************
---
05/21/15 09:49:32 DaemonCore: command socket at <10.0.0.148:9618>
05/21/15 09:49:32 DaemonCore: private command socket at <10.0.0.148:9618>
05/21/15 09:49:32 In ViewServer::Init()
05/21/15 09:49:32 In CollectorDaemon::Init()
05/21/15 09:49:32 In ViewServer::Config()
05/21/15 09:49:32 In CollectorDaemon::Config()
05/21/15 09:49:32 ABSENT_REQUIREMENTS = None
05/21/15 09:49:32 OfflineCollectorPlugin::configure: no persistent store was defined for off-line ads.
05/21/15 09:49:32 enable: Creating stats hash table
05/21/15 09:49:32 Enabling CCB Server.
05/21/15 09:49:32 attempt to connect to <10.0.0.148:47494> failed: Permission denied (connect errno = 13). Will keep trying for 387 total seconds (387 to go).
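The excerpts show two different connect failures: errno 111 (Connection refused) and errno 13 (Permission denied). A small sketch for decoding such numbers with Python's standard errno module follows; describe_errno is a hypothetical helper, and the numeric values are assumed to be the Linux ones.

```python
import errno
import os


def describe_errno(code):
    """Return 'number SYMBOL: message' for an OS error number."""
    # errno.errorcode maps numbers to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%d %s: %s" % (code, name, os.strerror(code))
```

On Linux, describe_errno(13) names EACCES and describe_errno(111) names ECONNREFUSED. The first means the connect call itself was denied on the local machine, while the second means nothing was listening on the target port at that moment.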