
Re: [Condor-users] condor_status taking ages to report



On Wed, 23 Mar 2005 15:44:43 +0000, Dr Ian C. Smith
<i.c.smith@xxxxxxxxxxxxxxx> wrote:
> I think I've finally got to the root of this. The condor view server
> was rebooted but the condor daemons didn't come up on it. The collector
> on the manager was so busy trying to contact the (now defunct) view
> server that nothing else got a look in. I'm not sure why it just didn't
> give up as the condor stats are hardly mission-critical.

The old three-finger salute still works :)
 
> I'm still puzzled as to why the collector is taking up so much memory
> ( getting on for 500 MB ). I've restarted the daemons, rebooted the
> machine but no change. How does this scale with the number of startds
> in the pool ? At present we have ~ 100 but this is small compared to
> some sites. If we run out of real memory and are into swap presumably it's
> going to crawl along.

Take a look at a cross-section of the startd ads in the collector
with condor_status -l vm1@xxxxxxx
(make sure you include some which have an active claim).
Are you perhaps exposing a job attribute which can be extremely large?
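
One way to spot an oversized attribute is to total the characters per
attribute across a dump of the ads. This is only a rough sketch: it assumes
the output of condor_status -l has been saved to a file, and that each ad is
a block of "Name = value" lines with blank lines between ads. The function
names parse_ads and largest_attributes are made up for illustration and are
not part of Condor.

```python
from collections import defaultdict


def parse_ads(text):
    """Split long-format output into a list of {attribute: raw value} dicts.

    Assumes one 'Name = value' pair per line and blank lines between ads.
    """
    ads, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the ad being collected, if any.
            if current:
                ads.append(current)
                current = {}
            continue
        # Split on the first '=' only, so expressions containing '==' survive.
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that carry no '=' at all
            current[name.strip()] = value.strip()
    if current:
        ads.append(current)
    return ads


def largest_attributes(ads, top=5):
    """Return (attribute, total characters) pairs, largest first."""
    totals = defaultdict(int)
    for ad in ads:
        for name, value in ad.items():
            totals[name] += len(name) + len(value)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Feeding the saved dump through largest_attributes(parse_ads(text)) gives a
short list of the attributes that account for the most text. The character
counts are only a proxy for what the collector actually holds in memory.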

Note that the collector also holds an ad for each schedd and master.
Do you have any unnecessary schedds cluttering things up (and possibly
slowing down negotiation, albeit not by much, since they will be passed
over pretty quickly)?
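
To see how many ads of each kind the collector is carrying, something like
the sketch below could tally them. It assumes you have concatenated the
output of condor_status -l, condor_status -schedd -l and
condor_status -master -l into one text, and that every ad carries a MyType
attribute. The function name count_ads_by_type is hypothetical.

```python
from collections import Counter


def count_ads_by_type(text):
    """Count ads per MyType value in long-format ClassAd output."""
    counts = Counter()
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        # Only the MyType line of each ad contributes to the tally.
        if sep and name.strip() == "MyType":
            counts[value.strip().strip('"')] += 1
    return counts
```

A schedd count well above the number of machines that actually submit jobs
would suggest some schedds could be switched off.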

Matt