
Re: [Condor-users] jobs stuck in queue



Hi:

On Fri, 2011-08-19 at 17:36 -0300, Fabricio Cannini wrote:

> *nodes:*
> CONDOR_HOST = master
> UID_DOMAIN = internal.domain
> FILESYSTEM_DOMAIN = internal.domain
> SEC_DEFAULT_NEGOTIATION = OPTIONAL
> ALLOW_READ = $(CONDOR_HOST),172.17.8.*
> ALLOW_WRITE = $(CONDOR_HOST),172.17.8.*
> ALLOW_NEGOTIATOR = $(CONDOR_HOST)
> ALLOW_CONFIG = $(CONDOR_HOST),$(FULL_HOSTNAME)
> ENABLE_RUNTIME_CONFIG = True
> ENABLE_PERSISTENT_CONFIG = True
> PERSISTENT_CONFIG_DIR = /etc/condor/config.d
> SETTABLE_ATTRS_CONFIG = *
> USE_NFS         = True
> DEFAULT_DOMAIN_NAME = internal.domain
> ALLOW_DAEMON = *@$(CONDOR_HOST)
> SOFT_UID_DOMAIN = TRUE
> START = TRUE
> TRUST_UID_DOMAIN = TRUE
> STARTD_EXPRS=$(STARTD_EXPRS), DedicatedScheduler, ParallelSchedulingGroup
> SCHEDD_NAME = $(CONDOR_HOST)

> Any tips to what may (not) be going on are very, very, veeeeery welcome.

It doesn't look like you defined DedicatedScheduler on your execute
nodes. It likely needs to look like:

DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxx"

Without this attribute, your scheduler will not match parallel jobs with
dedicated execute nodes.
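
As a rough sketch only: assuming the machine running your schedd has the
full hostname master.internal.domain (my guess from your CONDOR_HOST and
DEFAULT_DOMAIN_NAME, so substitute the real name), the local config on
each execute node might contain something like the lines below. Your
STARTD_EXPRS line already advertises the attribute, so only the
definition itself appears to be missing. The RANK line follows the
manual's example for dedicated nodes and is optional.

# Hypothetical hostname: replace with the full hostname of your schedd machine
DedicatedScheduler = "DedicatedScheduler@master.internal.domain"
# Already in your config: publishes the attribute in the machine ClassAd
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler, ParallelSchedulingGroup
# Optional: prefer jobs coming from the dedicated scheduler
RANK = Scheduler =?= $(DedicatedScheduler)

After editing the config, running condor_reconfig on the nodes should
be enough for the startd to pick up the change, though a restart of
the daemons would not hurt if the attribute still does not show up.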

Take a look at
http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#SECTION0041310100000000000000
for more information.

Best of luck,
DJH