
Re: [Condor-users] offline compute nodes and Rooster



> From: Paul Haldane
> Sent: 17 October 2010 16:49
> 
> > From: Paul Haldane
> > Sent: 16 October 2010 14:24
> >
> > > > > 3.  Offline slots _should_ (I think they should, but would like
> > > > > confirmation) continue to appear in the output of condor_status (using
> > > > > -constraint Offline to just see offline slots).  In our environment
> > > > > they only appear for 10 to 20 minutes after powering off.  This isn't what
> > > > > I expect because OFFLINE_EXPIRE_ADS_AFTER defaults to maxint.
> > > >
> > > > Yes, the offline ads should remain visible in condor_status.  They
> > > > should not expire in 30 minutes if you are using the default
> > > > OFFLINE_EXPIRE_ADS_AFTER.
> >
> > I've just been able to grab (using condor_status -l
> > yard10.campus.ncl.ac.uk) the ad for a machine that's unpingable (so it
> > is hibernating) but still visible in condor_status output.
> >
> > I won't include all 109 lines of output here (unless that would be
> > useful - full version is at
> > http://www.staff.ncl.ac.uk/paul.haldane/yard10.txt).  All looks
> > plausible to me apart from
> >
> > Offline = ((CurrentTime - EnteredCurrentState) >= 60 &&
> >         MachineLastMatchTime =?= UNDEFINED && State =?= "Unclaimed")
> >
> > Is that correct or should it just be a simple Boolean value?
> >
> > I know why it's showing that value ("Offline = $(ShouldHibernate)" in
> > the config file on the compute nodes), but I'm perfectly willing to
> > believe that it's rubbish.
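Because "Offline = $(ShouldHibernate)" advertises the expression itself rather than a fixed True/False, its value depends on when, and against which ad, it is evaluated. The Python sketch below evaluates that same expression against a hypothetical machine ad held in a dict. It assumes simplified ClassAd semantics (a three-valued `&&`, `=?=` that never yields UNDEFINED, a missing attribute reading as UNDEFINED, and `CurrentTime` supplied by the caller) and ignores ERROR values; `offline_expr` and the other helpers are invented for illustration and are not HTCondor's evaluator.

```python
# Illustrative model of evaluating the advertised Offline expression:
#   (CurrentTime - EnteredCurrentState) >= 60 &&
#   MachineLastMatchTime =?= UNDEFINED && State =?= "Unclaimed"
# Simplified ClassAd semantics; ERROR values and type coercion are ignored.

UNDEFINED = object()  # stands in for the ClassAd UNDEFINED value


def and3(a, b):
    """ClassAd '&&': False dominates, otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def meta_eq(a, b):
    """ClassAd '=?=' (is identical to): never returns UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


def offline_expr(ad, current_time):
    """Evaluate the Offline expression quoted above against a dict-based ad."""
    entered = ad.get("EnteredCurrentState", UNDEFINED)
    if entered is UNDEFINED:
        idle_long_enough = UNDEFINED  # arithmetic on UNDEFINED stays UNDEFINED
    else:
        idle_long_enough = (current_time - entered) >= 60
    never_matched = meta_eq(ad.get("MachineLastMatchTime", UNDEFINED), UNDEFINED)
    unclaimed = meta_eq(ad.get("State", UNDEFINED), "Unclaimed")
    return and3(and3(idle_long_enough, never_matched), unclaimed)


if __name__ == "__main__":
    ad = {"EnteredCurrentState": 1000, "State": "Unclaimed"}  # hypothetical ad
    print(offline_expr(ad, 1030))
    print(offline_expr(ad, 1100))
    print(offline_expr({**ad, "MachineLastMatchTime": 990}, 1100))
```

With `EnteredCurrentState = 1000` and `State = "Unclaimed"`, the sketch gives False at time 1030 and True at 1100, and it goes back to False as soon as the ad carries a `MachineLastMatchTime`. Under these assumptions, a query such as `condor_status -constraint Offline` would be sensitive to both the evaluation time and the match history recorded in the ad.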
> 
> I've made progress on a couple of fronts.
> 
> 1. Realised that we'd changed ROOSTER_UNHIBERNATE to a daft setting.
> 
> We had
> 
>  ROOSTER_UNHIBERNATE = Unhibernate && Offline =?= False
> 
> ... which I don't think would ever match.  Changing it to the default value of
> 
>  ROOSTER_UNHIBERNATE = Unhibernate && Offline == True
> 
> worked better, but because I don't think we're setting Unhibernate properly
> yet, I've currently got
> 
>  ROOSTER_UNHIBERNATE = Offline == True

I may as well point out myself that that's a really dumb idea: it leads Rooster to wake up every offline machine, even when none of them is needed to service jobs.
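To make the comparison between the three ROOSTER_UNHIBERNATE settings concrete, here is a small Python model that evaluates each one against a few hypothetical machine ads and lists which machines would be woken. It assumes simplified ClassAd semantics (three-valued `&&` and `==`, `=?=` never yielding UNDEFINED, a missing attribute reading as UNDEFINED) and that Rooster wakes a machine only when the expression is exactly True. `Unhibernate` is given as an already-evaluated value, and the ad names, `ADS`, `POLICIES` and `woken` are invented for illustration.

```python
# Illustrative model comparing candidate ROOSTER_UNHIBERNATE expressions.
# Simplified ClassAd semantics; ERROR values and type coercion are ignored.

UNDEFINED = object()  # stands in for the ClassAd UNDEFINED value


def and3(a, b):
    """ClassAd '&&': False dominates, otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def eq(a, b):
    """ClassAd '==': UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def meta_eq(a, b):
    """ClassAd '=?=' (is identical to): never returns UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


# Hypothetical machine ads; a missing key models an attribute that is not set.
ADS = {
    "offline, job waiting": {"Offline": True, "Unhibernate": True},
    "offline, no demand": {"Offline": True, "Unhibernate": False},
    "offline, Unhibernate unset": {"Offline": True},
    "online": {"Offline": False, "Unhibernate": True},
}


def attr(ad, name):
    return ad.get(name, UNDEFINED)


POLICIES = {
    "Unhibernate && Offline =?= False":
        lambda ad: and3(attr(ad, "Unhibernate"), meta_eq(attr(ad, "Offline"), False)),
    "Unhibernate && Offline == True":
        lambda ad: and3(attr(ad, "Unhibernate"), eq(attr(ad, "Offline"), True)),
    "Offline == True":
        lambda ad: eq(attr(ad, "Offline"), True),
}


def woken(policy, ads):
    """Names of ads for which the policy is exactly True (UNDEFINED does not count)."""
    return [name for name, ad in ads.items() if policy(ad) is True]


if __name__ == "__main__":
    for text, policy in POLICIES.items():
        print(f"{text:35} -> {woken(policy, ADS)}")
```

Run as written, the first setting selects only the machine that is already online, so it can never wake a hibernating one. The second selects only the offline machine whose Unhibernate is true; an unset Unhibernate leaves the result UNDEFINED, so that machine stays asleep. The third, `Offline == True`, selects every offline machine, including the one with no demand, which is the behaviour described above.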

Paul