
[HTCondor-users] job take 20min to start



How long does it normally take jobs to start running?

I have reinstalled Condor using the Ubuntu deb package, and things seem to be working better, but it still takes 20 minutes for my sleep.sub example job to start. I have just the basic condor_config:

lgramling@osboxes:condor$ cat /etc/condor/condor_config
######################################################################
##
## condor_config
##
## This is the global configuration file for condor. This is where
## you define where the local config file is. Any settings
## made here may potentially be overridden in the local configuration
## file. KEEP THAT IN MIND! To double-check that a variable is
## getting set from the configuration file that you expect, use
## condor_config_val -v <variable name>
##
## condor_config.annotated is a more detailed sample config file
##
## Unless otherwise specified, settings that are commented out show
## the defaults that are used if you don't define a value. Settings
## that are defined here MUST BE DEFINED since they have no default
## value.
##
######################################################################

## Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = /usr

## Where is the local condor directory for each host? This is where the local config file(s), logs and
## spool/execute directories are located. This is the default for Linux and Unix systems.
LOCAL_DIR = /var

## Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = /etc/condor/condor_config.local
## If your configuration is on a shared file system, then this might be a better default
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local
## If the local config file is not present, is it an error? (WARNING: This is a potential security issue.)
REQUIRE_LOCAL_CONFIG_FILE = false

## The normal way to do configuration with RPMs is to read all of the
## files in a given directory that don't match a regex as configuration files.
## Config files are read in lexicographic order.
LOCAL_CONFIG_DIR = /etc/condor/config.d
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$

## Use a host-based security policy. By default CONDOR_HOST and the local machine will be allowed
use SECURITY : HOST_BASED
## To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts
#ALLOW_WRITE = *.cs.wisc.edu
## FLOCK_FROM defines the machines that grant access to your pool via flocking. (i.e. these machines can join your pool).
#FLOCK_FROM =
## FLOCK_TO defines the central managers that your schedd will advertise itself to (i.e. these pools will give matches to your schedd).
#FLOCK_TO = condor.cs.wisc.edu, cm.example.edu

##--------------------------------------------------------------------
## Values set by the debian patch script:
##--------------------------------------------------------------------

## For Unix machines, the path and file name of the file containing
## the pool password for password authentication.
#SEC_PASSWORD_FILE = $(LOCAL_DIR)/lib/condor/pool_password

## Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/lib/condor/spool
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/lib/condor/libexec
SHARE   = $(RELEASE_DIR)/share/condor

PROCD_ADDRESS = $(RUN)/procd_pipe

## What machine is your central manager?

CONDOR_HOST = $(FULL_HOSTNAME)

## This macro determines what daemons the condor_master will start and keep its watchful eyes on.
## The list is a comma or space separated list of subsystem names

DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD


###########


lgramling@osboxes:condor$ condor_config_val -config
Configuration source:
        /etc/condor/condor_config
Local configuration sources:
        /home/lgramling/.condor/user_config

lgramling@osboxes:condor$ cat /home/lgramling/.condor/user_config
CONDOR_HOST = osboxes

#####################################################################
## This is a Configuration that will cause your Condor jobs to
## always run. This is intended for testing only.
######################################################################

## This mode will cause your jobs to start on a machine and will let
## them run to completion. Condor will ignore all of what is going
## on in the machine (load average, keyboard activity, etc.)

TESTINGMODE_WANT_SUSPEND = False
TESTINGMODE_WANT_VACATE = False
TESTINGMODE_START = True
TESTINGMODE_SUSPEND = False
TESTINGMODE_CONTINUE = True
TESTINGMODE_PREEMPT = False
TESTINGMODE_KILL = False
TESTINGMODE_PERIODIC_CHECKPOINT = False
TESTINGMODE_PREEMPTION_REQUIREMENTS = False
TESTINGMODE_PREEMPTION_RANK = 0
START=True
RANK=0
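
As far as I understand it, config sources are read in order and a later definition replaces an earlier one, so the CONDOR_HOST in user_config should win over the one in /etc/condor/condor_config. Here is a minimal Python sketch of that "last definition wins" idea. It is not HTCondor's real parser: parse_config and merge_configs are made-up names, and it ignores "use" lines, macro expansion and conditionals.

```python
def parse_config(text):
    """Collect NAME = value pairs, skipping comments and blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Treat names as case-insensitive by normalising to upper case.
        settings[name.strip().upper()] = value.strip()
    return settings


def merge_configs(*texts):
    """Merge sources in read order; a later definition replaces an earlier one."""
    merged = {}
    for text in texts:
        merged.update(parse_config(text))
    return merged


# Shortened copies of the two files shown above (illustration only).
global_cfg = """
## What machine is your central manager?
CONDOR_HOST = $(FULL_HOSTNAME)
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD, STARTD
"""
user_cfg = """
CONDOR_HOST   = osboxes
START=True
"""

effective = merge_configs(global_cfg, user_cfg)
print(effective["CONDOR_HOST"])
```

On the real install, the authoritative check is the one the condor_config header itself suggests: condor_config_val -v CONDOR_HOST.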



02/18/16 16:25:01 (pid:3413) Number of Active Workers 0
02/18/16 16:25:56 (pid:3413) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
02/18/16 16:25:56 (pid:3413) TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:25:56 (pid:3413) TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:25:56 (pid:3413) Sent ad to central manager for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:25:56 (pid:3413) Sent ad to 1 collectors for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:25:56 (pid:3413) Can't find address for negotiator
02/18/16 16:25:56 (pid:3413) Failed to send RESCHEDULE to unknown daemon:
02/18/16 16:25:56 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:25:56 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:25:56 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:25:56 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:25:56 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:25:56 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:26:08 (pid:3413) Number of Active Workers 0
02/18/16 16:30:56 (pid:3413) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
02/18/16 16:30:56 (pid:3413) TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:30:56 (pid:3413) TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:30:56 (pid:3413) Sent ad to central manager for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:30:56 (pid:3413) Sent ad to 1 collectors for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:30:56 (pid:3413) Can't find address for negotiator
02/18/16 16:30:56 (pid:3413) Failed to send RESCHEDULE to unknown daemon:
02/18/16 16:30:56 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:30:56 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:30:56 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:30:56 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:30:56 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:30:56 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:35:56 (pid:3413) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
02/18/16 16:35:56 (pid:3413) TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:35:56 (pid:3413) TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:35:56 (pid:3413) Sent ad to central manager for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:35:56 (pid:3413) Sent ad to 1 collectors for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:35:56 (pid:3413) Can't find address for negotiator
02/18/16 16:35:56 (pid:3413) Failed to send RESCHEDULE to unknown daemon:
02/18/16 16:35:56 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:35:56 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:35:56 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:35:56 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:35:56 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:35:56 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:36:55 (pid:3413) Number of Active Workers 0
02/18/16 16:37:01 (pid:3413) Number of Active Workers 0
02/18/16 16:40:57 (pid:3413) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
02/18/16 16:40:57 (pid:3413) TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:40:57 (pid:3413) TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:40:57 (pid:3413) Sent ad to central manager for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:40:57 (pid:3413) Sent ad to 1 collectors for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:40:57 (pid:3413) Can't find address for negotiator
02/18/16 16:40:57 (pid:3413) Failed to send RESCHEDULE to unknown daemon:
02/18/16 16:40:57 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:40:57 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:40:57 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:40:57 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:40:57 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:40:57 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:44:17 (pid:3413) Number of Active Workers 0
02/18/16 16:44:43 (pid:3413) Number of Active Workers 0
02/18/16 16:45:58 (pid:3413) TransferQueueManager stats: active up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
02/18/16 16:45:58 (pid:3413) TransferQueueManager upload 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:45:58 (pid:3413) TransferQueueManager download 1m I/O load: 0 bytes/s 0.000 disk load 0.000 net load
02/18/16 16:45:58 (pid:3413) Sent ad to central manager for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:45:58 (pid:3413) Sent ad to 1 collectors for lgramling@xxxxxxxxxxxxxxxxxxx
02/18/16 16:45:58 (pid:3413) Haven't heard from negotiator, trying to claim local startd @ <10.0.0.182:27456?addrs=10.0.0.182-27456>
02/18/16 16:45:58 (pid:3413) Checking consistency running and runnable jobs
02/18/16 16:45:58 (pid:3413) Tables are consistent
02/18/16 16:45:58 (pid:3413) Rebuilt prioritized runnable job list in 0.000s.
02/18/16 16:45:58 (pid:3413) Claiming local startd slot 2 at <10.0.0.182:27456?addrs=10.0.0.182-27456>
02/18/16 16:45:58 (pid:3413) Checking consistency running and runnable jobs
02/18/16 16:45:58 (pid:3413) Tables are consistent
02/18/16 16:45:58 (pid:3413) Rebuilt prioritized runnable job list in 0.000s. (Expedited rebuild because no match was found)
02/18/16 16:45:58 (pid:3413) Negotiator gone, trying to use our local startd
02/18/16 16:45:58 (pid:3413) Can't find address for negotiator
02/18/16 16:45:58 (pid:3413) Failed to send RESCHEDULE to unknown daemon:
02/18/16 16:45:58 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:45:58 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:45:58 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:45:58 (pid:3413) SECMAN: FAILED: Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:45:58 (pid:3413) ERROR: SECMAN:2010:Received "DENIED" from server for user unauthenticated@unmapped using method (no authentication).
02/18/16 16:45:58 (pid:3413) Failed to start non-blocking update to <127.0.1.1:9618>.
02/18/16 16:45:58 (pid:3413) Starting add_shadow_birthdate(2.0)
02/18/16 16:45:58 (pid:3413) Started shadow for job 2.0 on slot2@xxxxxxxxxxxxxxxxxxx <10.0.0.182:27456?addrs=10.0.0.182-27456> for lgramling, (shadow pid = 3776)
02/18/16 16:46:06 (pid:3413) Shadow pid 3776 for job 2.0 reports job exit reason 100.
02/18/16 16:46:06 (pid:3413) Checking consistency running and runnable jobs
02/18/16 16:46:06 (pid:3413) Tables are consistent
02/18/16 16:46:06 (pid:3413) Rebuilt prioritized runnable job list in 0.000s.
02/18/16 16:46:06 (pid:3413) match (slot2@xxxxxxxxxxxxxxxxxxx <10.0.0.182:27456?addrs=10.0.0.182-27456> for lgramling) out of jobs; relinquishing
02/18/16 16:46:06 (pid:3413) Match record (slot2@xxxxxxxxxxxxxxxxxxx <10.0.0.182:27456?addrs=10.0.0.182-27456> for lgramling, 2.0) deleted
02/18/16 16:46:06 (pid:3413) Completed RELEASE_CLAIM to startd slot2@xxxxxxxxxxxxxxxxxxx <10.0.0.182:27456?addrs=10.0.0.182-27456> for lgramling
02/18/16 16:47:33 (pid:3413) Number of Active Workers 0
02/18/16 16:49:41 (pid:3413) Received a superuser command
lgramling@osboxes:condor$ condor_history
 ID     OWNER          SUBMITTED   RUN_TIME     ST COMPLETED   CMD
   2.0   lgramling       2/18 16:25   0+00:00:08 C   2/18 16:46 /home/lgramling/condor/sleep.sh
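
To put a number on the delay, here is a rough Python sketch that measures the time from the first log line to the point where the schedd gives up on the negotiator and claims the local startd. The function names are my own, the sample lines are copied from the log above, and it assumes every line starts with the MM/DD/YY HH:MM:SS stamp shown there.

```python
from datetime import datetime

FALLBACK_MARKER = "trying to claim local startd"
STAMP_FORMAT = "%m/%d/%y %H:%M:%S"
STAMP_LENGTH = 17  # length of "02/18/16 16:25:01"


def parse_stamp(line):
    """Return the timestamp at the start of a log line."""
    return datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)


def minutes_until_fallback(lines):
    """Minutes between the first log line and the local-startd fallback.

    Returns None if the fallback message never appears.
    """
    start = parse_stamp(lines[0])
    for line in lines:
        if FALLBACK_MARKER in line:
            return (parse_stamp(line) - start).total_seconds() / 60
    return None


sample = [
    "02/18/16 16:25:01 (pid:3413) Number of Active Workers 0",
    "02/18/16 16:25:56 (pid:3413) Can't find address for negotiator",
    "02/18/16 16:45:58 (pid:3413) Haven't heard from negotiator, "
    "trying to claim local startd @ <10.0.0.182:27456?addrs=10.0.0.182-27456>",
]
print(minutes_until_fallback(sample))
```

For the excerpt above this comes out at just under 21 minutes, which matches the gap between SUBMITTED and COMPLETED in condor_history minus the 8 seconds of run time.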