
Re: [Condor-users] jobs stuck in queue



On Monday, 22 August 2011, at 15:26:55, David J. Herzfeld wrote:
> Hello:
> 
> On Mon, 2011-08-22 at 15:07 -0300, Fabricio Cannini wrote:
> > > > Any tips to what may (not) be going on are very, very, veeeeery
> > > > welcome.
> > > 
> > > It doesn't look like you defined DedicatedScheduler on your execute
> > > nodes. Likely needs to look like:
> > > 
> > > DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxx"
> > > 
> > > Without this attribute, your scheduler will not match parallel jobs
> > > with dedicated execute nodes.
> > > 
> > > Take a look at
> > > http://www.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#SECTION0041310100000000000000
> > > for more information.
> > > 
> > > Best of luck,
> > > DJH
> > 
> > Hi.
> > 
> > I've tried that, but unfortunately it didn't solve the problem. Worse,
> > now I can't see the pool!
> 
> Well, you are going to have to define DedicatedScheduler on your execute
> nodes in order to match in the parallel universe (there's no way around
> it that I know of).
> 
> As for the pool problems, I would start with your security settings.
> I would never recommend setting security wide open on production
> systems, but until you get everything up and running I would set
> ALLOW_READ = *
> ALLOW_WRITE = *
> and don't change any of the values of ALLOW_NEGOTIATOR, ALLOW_DAEMON,
> etc. from the values in the standard UW config. You can begin to scale
> back access to these subsystems once things work appropriately (also
> making sure that you have authentication turned on).

I was avoiding it, but well, here we go.
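
For reference, a minimal sketch of what the suggested settings might look like in the local configuration file of an execute node. The hostname submit.example.org is a placeholder for the machine running the dedicated scheduler. The STARTD_ATTRS and RANK lines are assumptions based on the dedicated-scheduling section of the manual rather than anything stated in this thread:

```
# Placeholder hostname: replace with the full hostname of the submit machine
DedicatedScheduler = "DedicatedScheduler@submit.example.org"
# Advertise the attribute in the machine ClassAd so the schedd can match on it
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
# Prefer jobs coming from the dedicated scheduler
RANK = Scheduler =?= $(DedicatedScheduler)

# Temporary, wide-open access for debugging only; tighten once the pool works
ALLOW_READ = *
ALLOW_WRITE = *
```

After editing the file, the daemons have to reread their configuration (for example with condor_reconfig) before the change takes effect.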

> Also - is it possible that the dedicated scheduler machine has two
> network interfaces? You can use condor_status -schedd to confirm that
> the hostname used in your ALLOW_ and DedicatedScheduler configuration
> settings is appropriate (if the hostname is incorrect, specify
> NETWORK_INTERFACE). More information can be found in the manual:
> http://www.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#SECTION00436000000000000000

Both the master and the nodes have 2 network interfaces, but only one is
working. Do I need to define NETWORK_INTERFACE?
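
If Condor turns out to be picking the wrong interface, a sketch of pinning it explicitly might look like the following. The address is a placeholder for the IP of the working interface on each machine:

```
# Placeholder address: use the IP of the interface that is actually up
NETWORK_INTERFACE = 192.168.0.10
```

Running condor_config_val FULL_HOSTNAME and condor_config_val NETWORK_INTERFACE on a machine shows the hostname and interface setting Condor is using, which can be compared against the name given in DedicatedScheduler.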

> Best of luck,
> DJH