Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [Condor-users] jobs stuck in queue

Date: Mon, 22 Aug 2011 14:26:55 -0400
From: "David J. Herzfeld" <herzfeldd@xxxxxxxxx>
Subject: Re: [Condor-users] jobs stuck in queue

Hello:

On Mon, 2011-08-22 at 15:07 -0300, Fabricio Cannini wrote:
> > > Any tips to what may (not) be going on are very, very, veeeeery welcome.
> > 
> > It doesn't look like you defined DedicatedScheduler on your execute
> > nodes. Likely needs to look like:
> > 
> > DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxx"
> > 
> > Without this attribute, your scheduler will not match parallel jobs with
> > dedicated execute nodes.
> > 
> > Take a look at
> > http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#SECTION0041310100000000000000
> > for more information.
> > 
> > Best of luck,
> > DJH
> 
> Hi.
> 
> I've tried that, but unfortunately it didn't solve. Worse, now i can't see the 
> pool!

Well, you are going to have to define DedicatedScheduler on your execute
nodes in order to match in the parallel universe (there's no way around
it that I know of).
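
A minimal sketch of what that could look like in the local config on
each execute node, along the lines of the dedicated scheduling section
of the 7.6 manual. The hostname submit.example.org is a placeholder,
not your real machine; use the full hostname of your own dedicated
scheduler. The RANK line is optional policy, not a requirement.

```
# Local config on each dedicated execute node (hostname is a placeholder)
DedicatedScheduler = "DedicatedScheduler@submit.example.org"
# Publish the attribute in the machine ClassAd so the scheduler can match on it
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
# Optional: prefer jobs coming from the dedicated scheduler
RANK = Scheduler =?= $(DedicatedScheduler)
```

After running condor_reconfig on the node, the DedicatedScheduler
attribute should appear in the condor_status -long output for that
machine.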

As for the pool problems, I would start with your security settings.
I would never recommend leaving security wide open on production
systems, but until you get everything up and running I would set
ALLOW_READ = *
ALLOW_WRITE = *
and leave ALLOW_NEGOTIATOR, ALLOW_DAEMON, etc. at the values in the
standard UW config. You can begin to scale back access to these
subsystems once things work appropriately (also making sure that you
have authentication turned on).
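
As a sketch of what scaling back might look like once the pool is
working, access could be limited to hosts in your own domain. The
domain *.example.org is a placeholder I made up for illustration, not
something taken from your config:

```
# Restrict read and write access to one domain (placeholder domain)
ALLOW_READ = *.example.org
ALLOW_WRITE = *.example.org
```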

Also - is it possible that the dedicated scheduler machine has two
network interfaces? You can use condor_status -schedd to confirm that
the hostname used in your ALLOW_ and DedicatedScheduler configuration
settings is appropriate (if the hostname is incorrect, specify
NETWORK_INTERFACE). More information can be found in the manual:
http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#SECTION00436000000000000000
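
One way to check this: running condor_config_val DedicatedScheduler on
an execute node prints the value that node is configured with, and the
hostname part should match the name the schedd is advertised under. If
the scheduler machine is picking the wrong interface, a sketch of the
fix in its local config would be the following, where the address is a
placeholder for the IP of the interface that faces the pool:

```
# On the dedicated scheduler machine (address is a placeholder)
NETWORK_INTERFACE = 192.0.2.10
```

The daemons on that machine need to be restarted for a change like
this to take effect.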

Best of luck,
DJH
