
[HTCondor-users] Couldn't fetch ads: communication error




I installed a Condor pool.
This machine runs Ubuntu; the Condor version is 8.0.5.
I see some "Failed" messages in Condor's logs.
Is this normal?

condor log:
NegotiatorLog:
11/22/15 16:33:52 ---------- Started Negotiation Cycle ----------
11/22/15 16:33:52 Phase 1:  Obtaining ads from collector ...
11/22/15 16:33:52   Getting Scheduler, Submitter and Machine ads ...
11/22/15 16:33:53   Sorting 47 ads ...
11/22/15 16:33:53   Getting startd private ads ...
11/22/15 16:33:53 condor_write(): Socket closed when trying to write 87 bytes to collector at <10.1.1.101:9618>, fd is 12
11/22/15 16:33:53 Buf::write(): condor_write() failed
11/22/15 16:33:53 Couldn't fetch ads: communication error
11/22/15 16:33:53 Aborting negotiation cycle

SchedLog:
11/22/15 16:33:52 (pid:9434) Failed to execute /usr/sbin/condor_shadow.std, ignoring

ProcLog:
11/22/15 17:14:50 : ProcAPI: new boottime = 1446086402; old_boottime = 1446086402; /proc/stat boottime = 1446086402; /proc/uptime boottime = 1446086402
11/22/15 17:14:50 : process 9621 (not in monitored family) has exited
11/22/15 17:14:50 : no methods have determined process 9655 to be in a monitored family
11/22/15 17:14:50 : ...snapshot complete

condor config:
DAEMON_LIST = COLLECTOR, NEGOTIATOR, SCHEDD, STARTD, MASTER
# who receives emails when something goes wrong
#CONDOR_ADMIN = root@localhost
# how much memory should NOT be available to HTCondor
RESERVED_MEMORY = 
# label to identify the local filesystem in a HTCondor pool
FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
# label to identify the user id of the system in a HTCondor pool
# (this need to be a fully qualified domain name)
UID_DOMAIN = $(FULL_HOSTNAME)
# which machine is the central manager of this HTCondor pool
CONDOR_HOST = 10.1.1.101
# what machines can access HTCondor daemons on this machine
ALLOW_WRITE = *



The Condor host machine:
CentOS, Condor version is 8.5.0
condor config:
use ROLE: Submit, Execute , CentralManager
CONDOR_HOST = 10.1.1.101
use SECURITY: HOST_BASED
ALLOW_ADMINISTRATOR = 10.1.1.101
ALLOW_DAEMON = *
ALLOW_WRITE = *
ALLOW_ADVERTISE_MASTER = *
ALLOW_NEGOTIATOR = $(CONDOR_HOST)
ALLOW_READ = *

--------
Thanks,
Allen