
Re: [Condor-users] autoclusterattrs (fwd)



On Tue, Mar 06, 2007 at 01:19:54PM -0600, Steven Timm wrote:
> > Steve -
> >
> > I don't think autoclusters are the problem.  We've recently discovered that Condor-G matching only once per resource ad is caused by an issue with the matchlist caching (which is yet another mechanism that speeds up the matchmaking process in the "regular" case).  Fixing this for Condor-G matching is on our list.  Discovery of this bug is credited to our friends in Italy at INFN working on EGEE (thanks Francesco!).  For now, you can work around the problem by putting the following into your negotiator's condor_config:
> >   NEGOTIATOR_MATCHLIST_CACHING = False
> >  (normally it defaults to true)
> > Then restart the negotiator.  Hopefully this will restore the ability to match a given resource multiple times per cycle (assuming you've set things up this way) until we get around to fixing the issue in the caching code.
> >
> > hope this helps
> > Todd
> 
> I have made the change to turn off NEGOTIATOR_MATCHLIST_CACHING,
> and the match rate has gone up.
> But I asked the larger question about autoclustering because
> I observed two condor pools, one in which the autoclusterattr
> of all the outbound grid jobs were the same, and
> one where all the outbound grid jobs had no autoclusterattr
> field at all.
> 
> They were both running Condor 6.8.3 and I didn't make
> any changes to autoclusterattrs on either machine.
> 

Were you looking at one of the pools with Quill? Some of the
autocluster-related attributes of a job exist only in the schedd's
memory, and autoclusterattrs may be one of them. (That sentence is a bit
confusing, because "autocluster" and "attributes" each appear twice: the
job classad has attributes, and one of those attributes is
"autoclusterattrs", which is a list of the attributes used in
autoclustering.)

Those attributes are not stored on disk, since they're just internal to
the schedd and recomputed as needed. Since they're not stored on disk,
they never appear in job_queue.log, so Quill never reads them.

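You could confirm this by dumping the same job's ad from both sources
and comparing the two:

condor_q -l cluster.proc -direct schedd | sort > job.schedd
condor_q -l cluster.proc -direct rdbms | sort > job.rdbms
comm -3 job.schedd job.rdbms

comm -3 drops the lines common to both files. Whatever is left in the
first (unindented) column appears only in the schedd's copy.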
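
If the line-by-line comparison of job.schedd and job.rdbms turns out to
be noisy, it may help to compare attribute names only. The following is
a rough sketch, not something from the Condor distribution: the helper
names attr_names and only_in_first are made up for illustration, and it
assumes every attribute line in the dumps has the form "Name = value".

```shell
#!/bin/sh
# Hypothetical helpers (not part of Condor): compare two ClassAd dumps by
# attribute name only. Assumes each attribute line looks like "Name = value".

# Print the sorted, de-duplicated attribute names found in a dump file.
attr_names() {
    # Keep the text before the first "=", without surrounding whitespace.
    # Lines that contain no "=" (blank lines, banners) are skipped.
    sed -n 's/^[[:space:]]*\([^=[:space:]][^=[:space:]]*\)[[:space:]]*=.*/\1/p' "$1" | sort -u
}

# Print attribute names present in the first dump but missing from the second.
only_in_first() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    attr_names "$1" > "$tmp_a"
    attr_names "$2" > "$tmp_b"
    # comm -23 keeps only the lines unique to the first file.
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

With those functions defined, "only_in_first job.schedd job.rdbms" would
list the attributes the schedd reports that Quill does not. If
autoclusterattrs shows up in that list, that would support the
explanation above.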