
[HTCondor-users] stuck submit jobs



Dear Condor administrators,

I submitted a job, but it is stuck in the idle state.
The machine runs Scientific Linux CERN 6.5 with condor 8.2.3, and the host has both a global IP address and a local IP address.
I suspect that something is wrong with the IP address or DNS configuration.
I have included the logs below and attached the configuration file.
Could you tell me how to fix it?

Thank you in advance.
Best regards,

---
MasterLog
10/07/14 15:30:43 ******************************************************
10/07/14 15:30:43 ** condor_master (CONDOR_MASTER) STARTING UP
10/07/14 15:30:43 ** /usr/sbin/condor_master
10/07/14 15:30:43 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
10/07/14 15:30:43 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
10/07/14 15:30:43 ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
10/07/14 15:30:43 ** $CondorPlatform: x86_64_RedHat6 $
10/07/14 15:30:43 ** PID = 2358
10/07/14 15:30:43 ** Log last touched 10/7 15:30:43
10/07/14 15:30:43 ******************************************************
10/07/14 15:30:43 Using config source: /etc/condor/condor_config
10/07/14 15:30:43 Using local config sources:
10/07/14 15:30:43    /etc/condor/config.d/condor_config.local
10/07/14 15:30:43    /etc/condor/config.d/condor_config.local
10/07/14 15:30:43 config Macros = 65, Sorted = 65, StringBytes = 2025, TablesBytes = 2396
10/07/14 15:30:43 CLASSAD_CACHING is OFF
10/07/14 15:30:43 Daemon Log is logging: D_ALWAYS D_ERROR
10/07/14 15:30:43 DaemonCore: command socket at <192.168.12.1:41030>
10/07/14 15:30:43 DaemonCore: private command socket at <192.168.12.1:41030>
10/07/14 15:30:43 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1412124630)
10/07/14 15:30:43 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 8370
10/07/14 15:30:43 Waiting for /var/log/condor/.collector_address to appear.
10/07/14 15:30:43 PERMISSION DENIED to unauthenticated@unmapped from host 192.168.12.1 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 192.168.12.1,bepp01,bepp01.bepp.rcapp.kyushu-u.ac.jp, hostname size = 2, original ip address = 192.168.12.1
10/07/14 15:30:44 Found /var/log/condor/.collector_address.
10/07/14 15:30:44 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 8372
10/07/14 15:30:44 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 8373
10/07/14 15:30:44 PERMISSION DENIED to unauthenticated@unmapped from host 192.168.12.1 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
10/07/14 15:30:44 PERMISSION DENIED to unauthenticated@unmapped from host 192.168.12.1 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason


CollectorLog
10/07/14 15:45:43 Housekeeper:  Done cleaning
10/07/14 15:45:44 PERMISSION DENIED to unauthenticated user from host 192.168.12.1 for command 49 (UPDATE_NEGOTIATOR_AD), access level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for the full reason
10/07/14 15:45:45 PERMISSION DENIED to unauthenticated@unmapped from host 192.168.12.1 for command 10 (QUERY_STARTD_PVT_ADS), access level NEGOTIATOR: reason: cached result for NEGOTIATOR; see first case for the full reason
10/07/14 15:45:48 PERMISSION DENIED to unauthenticated user from host 192.168.12.1 for command 2 (UPDATE_MASTER_AD), access level ADVERTISE_MASTER: reason: cached result for ADVERTISE_MASTER; see first case for the full reason
10/07/14 15:45:51 PERMISSION DENIED to unauthenticated user from host 192.168.12.1 for command 1 (UPDATE_SCHEDD_AD), access level ADVERTISE_SCHEDD: reason: cached result for ADVERTISE_SCHEDD; see first case for the full reason
10/07/14 15:45:51 PERMISSION DENIED to unauthenticated user from host 192.168.12.1 for command 11 (UPDATE_SUBMITTOR_AD), access level ADVERTISE_SCHEDD: reason: cached result for ADVERTISE_SCHEDD; see first case for the full reason
10/07/14 15:46:06 DC_AUTHENTICATE: attempt to open invalid session bepp01:2365:1412654380:10, failing; this session was requested by <192.168.12.65:58496> with return address <192.168.12.1:37289>
10/07/14 15:46:06 attempt to connect to <192.168.12.1:37289> failed: Connection refused (connect errno = 111).
10/07/14 15:46:06 Failed to send DC_INVALIDATE_KEY to daemon at <192.168.12.1:37289>: SECMAN:2003:TCP connection to daemon at <192.168.12.1:37289> failed.
10/07/14 15:46:07 DC_AUTHENTICATE: attempt to open invalid session bepp01:2365:1412654379:5, failing; this session was requested by <192.168.12.53:54534> with return address <192.168.12.1:47275>
10/07/14 15:46:07 attempt to connect to <192.168.12.1:47275> failed: Connection refused (connect errno = 111).
10/07/14 15:46:07 Failed to send DC_INVALIDATE_KEY to daemon at <192.168.12.1:47275>: SECMAN:2003:TCP connection to daemon at <192.168.12.1:47275> failed.
10/07/14 15:46:07 DC_AUTHENTICATE: attempt to open invalid session bepp01:2365:1412654380:12, failing; this session was requested by <192.168.12.56:55342> with return address <192.168.12.1:46531>
10/07/14 15:46:07 attempt to connect to <192.168.12.1:46531> failed: Connection refused (connect errno = 111).
10/07/14 15:46:07 Failed to send DC_INVALIDATE_KEY to daemon at <192.168.12.1:46531>: SECMAN:2003:TCP connection to daemon at <192.168.12.1:46531> failed.
10/07/14 15:46:08 DC_AUTHENTICATE: attempt to open invalid session bepp01:2365:1412654380:7, failing; this session was requested by <192.168.12.55:34093> with return address <192.168.12.1:48268>
10/07/14 15:46:08 attempt to connect to <192.168.12.1:48268> failed: Connection refused (connect errno = 111).
10/07/14 15:46:08 Failed to send DC_INVALIDATE_KEY to daemon at <192.168.12.1:48268>: SECMAN:2003:TCP connection to daemon at <192.168.12.1:48268> failed.
10/07/14 15:46:09 DC_AUTHENTICATE: attempt to open invalid session bepp01:2365:1412654379:3, failing; this session was requested by <192.168.12.52:51590> with return address <192.168.12.1:44349>
10/07/14 15:46:09 attempt to connect to <192.168.12.1:44349> failed: Connection refused (connect errno = 111).
10/07/14 15:46:09 Failed to send DC_INVALIDATE_KEY to daemon at <192.168.12.1:44349>: SECMAN:2003:TCP connection to daemon at <192.168.12.1:44349> failed.


NegotiatorLog
10/07/14 15:40:45 ---------- Started Negotiation Cycle ----------
10/07/14 15:40:45 Phase 1:  Obtaining ads from collector ...
10/07/14 15:40:45   Getting startd private ads ...
10/07/14 15:40:45 Couldn't fetch ads: communication error
10/07/14 15:40:45 Aborting negotiation cycle

SchedLog
10/07/14 15:30:43 (pid:8272) **** condor_schedd (condor_SCHEDD) pid 8272 EXITING WITH STATUS 0
10/07/14 15:30:44 (pid:8373) Setting maximum file descriptors to 4096.
10/07/14 15:30:44 (pid:8373) ******************************************************
10/07/14 15:30:44 (pid:8373) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
10/07/14 15:30:44 (pid:8373) ** /usr/sbin/condor_schedd
10/07/14 15:30:44 (pid:8373) ** SubsystemInfo: name=SCHEDD type=SCHEDD(5) class=DAEMON(1)
10/07/14 15:30:44 (pid:8373) ** Configuration: subsystem:SCHEDD local:<NONE> class:DAEMON
10/07/14 15:30:44 (pid:8373) ** $CondorVersion: 8.2.3 Sep 30 2014 BuildID: 274619 $
10/07/14 15:30:44 (pid:8373) ** $CondorPlatform: x86_64_RedHat6 $
10/07/14 15:30:44 (pid:8373) ** PID = 8373
10/07/14 15:30:44 (pid:8373) ** Log last touched 10/7 15:30:43
10/07/14 15:30:44 (pid:8373) ******************************************************
10/07/14 15:30:44 (pid:8373) Using config source: /etc/condor/condor_config
10/07/14 15:30:44 (pid:8373) Using local config sources:
10/07/14 15:30:44 (pid:8373) /etc/condor/config.d/condor_config.local
10/07/14 15:30:44 (pid:8373) /etc/condor/config.d/condor_config.local
10/07/14 15:30:44 (pid:8373) config Macros = 66, Sorted = 66, StringBytes = 2068, TablesBytes = 2432
10/07/14 15:30:44 (pid:8373) CLASSAD_CACHING is ENABLED
10/07/14 15:30:44 (pid:8373) Daemon Log is logging: D_ALWAYS D_ERROR
10/07/14 15:30:44 (pid:8373) DaemonCore: command socket at <192.168.12.1:54168>
10/07/14 15:30:44 (pid:8373) DaemonCore: private command socket at <192.168.12.1:54168>
10/07/14 15:30:44 (pid:8373) History file rotation is enabled.
10/07/14 15:30:44 (pid:8373) Maximum history file size is: 20971520 bytes
10/07/14 15:30:44 (pid:8373)   Number of rotated history files is: 2
10/07/14 15:30:49 (pid:8373) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
10/07/14 15:30:49 (pid:8373) TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:30:49 (pid:8373) TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:30:49 (pid:8373) Sent ad to central manager for hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:30:49 (pid:8373) Sent ad to 1 collectors for hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:35:50 (pid:8373) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
10/07/14 15:35:50 (pid:8373) TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:35:50 (pid:8373) TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:35:50 (pid:8373) Sent ad to central manager for hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:35:50 (pid:8373) Sent ad to 1 collectors for hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:40:51 (pid:8373) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
10/07/14 15:40:51 (pid:8373) TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:40:51 (pid:8373) TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
10/07/14 15:40:51 (pid:8373) Sent ad to central manager for hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10/07/14 15:40:51 (pid:8373) Sent ad to 1 collectors for hyamaguc@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
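
In case it helps when reading the logs above, here is a minimal Python sketch that tallies the PERMISSION DENIED entries by access level, command, and host. The regular expression is only modelled on the line format seen in MasterLog and CollectorLog in this message, and the names DENIED and tally_denials are hypothetical helpers for illustration; nothing here is part of HTCondor itself.

```python
import re
from collections import Counter

# Pattern modelled on the "PERMISSION DENIED" lines quoted in this message.
# Assumption: other HTCondor versions may format these lines differently.
DENIED = re.compile(
    r"PERMISSION DENIED to (?P<who>.+?) from host (?P<host>\S+) "
    r"for command (?P<num>\d+) \((?P<cmd>\w+)\), access level (?P<level>\w+):"
)


def tally_denials(lines):
    """Count denied requests per (access level, command name, host)."""
    counts = Counter()
    for line in lines:
        match = DENIED.search(line)
        if match:
            counts[(match["level"], match["cmd"], match["host"])] += 1
    return counts


if __name__ == "__main__":
    # Two entries copied from the logs above (reason text shortened).
    sample = [
        "10/07/14 15:30:44 PERMISSION DENIED to unauthenticated@unmapped "
        "from host 192.168.12.1 for command 60008 (DC_CHILDALIVE), "
        "access level DAEMON: reason: cached result for DAEMON",
        "10/07/14 15:45:48 PERMISSION DENIED to unauthenticated user "
        "from host 192.168.12.1 for command 2 (UPDATE_MASTER_AD), "
        "access level ADVERTISE_MASTER: reason: cached result",
    ]
    for (level, cmd, host), n in sorted(tally_denials(sample).items()):
        print(f"{level:18} {cmd:22} {host:15} {n}")
```

Run over the full MasterLog and CollectorLog, a tally like this shows at a glance which access levels (DAEMON, NEGOTIATOR, ADVERTISE_MASTER, ADVERTISE_SCHEDD) are refusing requests from 192.168.12.1.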

Hiroshi Yamaguchi

######################################################################
##
##  condor_config
##
##  This is the global configuration file for condor. This is where
##  you define where the local config file is. Any settings
##  made here may potentially be overridden in the local configuration
##  file.  KEEP THAT IN MIND!  To double-check that a variable is
##  getting set from the configuration file that you expect, use
##  condor_config_val -v <variable name>
##
##  condor_config.annotated is a more detailed sample config file
##
##  Unless otherwise specified, settings that are commented out show
##  the defaults that are used if you don't define a value.  Settings
##  that are defined here MUST BE DEFINED since they have no default
##  value.
##
######################################################################

CONDOR_ADMIN  = root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

#NETWORK_INTERFACE = 192.168.12.1

SEC_DEFAULT_AUTHENTICATION = NEVER
SEC_DEFAULT_NEGOTIATION    = NEVER

CONDOR_HOST   = bepp01.bepp.rcapp.kyushu-u.ac.jp
FULL_HOSTNAME = bepp01.bepp.rcapp.kyushu-u.ac.jp

RELEASE_DIR = /usr

LOCAL_DIR = /var

LOCAL_CONFIG_FILE = /etc/condor/config.d/condor_config.local
#REQUIRE_LOCAL_CONFIG_FILE = true

LOCAL_CONFIG_DIR = /etc/condor/config.d
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$

use SECURITY : HOST_BASED

ALLOW_READ              = bepp01.bepp.rcapp.kyushu-u.ac.jp
ALLOW_WRITE             = hkt*.bepp.rcapp.kyushu-u.ac.jp
FLOCK_FROM              =
FLOCK_TO                =
#ALLOW_ADMINISTRATOR     = $(CONDOR_HOST)
#ALLOW_NEGOTIATOR        = $(CONDOR_HOST), $(IP_ADDRESS)
#ALLOW_NEGOTIATOR_SCHEDD = $(CONDOR_HOST), $(IP_ADDRESS)
#ALLOW_WRITE_COLLECTOR   = $(ALLOW_WRITE) 
#ALLOW_WRITE_STARTD      = $(ALLOW_WRITE)
#ALLOW_READ_COLLECTOR    = $(ALLOW_READ)
#ALLOW_READ_STARTD       = $(ALLOW_READ)
#HOSTALLOW_READ          = $(ALLOW_READ)
#HOSTALLOW_WRITE         = $(ALLOW_WRITE)
ALLOW_DAEMON            = condor_pool@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/*, condor@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/$(IP_ADDRESS)
ALLOW_NEGOTIATOR        = condor_pool@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/$(CONDOR_HOST)


##--------------------------------------------------------------------
## Values set by the rpm patch script:
##--------------------------------------------------------------------

## For Unix machines, the path and file name of the file containing
## the pool password for password authentication.
#SEC_PASSWORD_FILE = $(LOCAL_DIR)/lib/condor/pool_password

##  Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/lib/condor/spool
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib64/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec/condor
SHARE   = $(RELEASE_DIR)/share/condor

PROCD_ADDRESS = $(RUN)/procd_pipe
NETWORK_INTERFACE = 192.168.12.1
CONDOR_HOST = $(FULL_HOSTNAME)

COLLECTOR_NAME = Personal Condor at $(FULL_HOSTNAME)

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD