
Re: [HTCondor-users] Memory leak in collectd in 8.8.5?


(all the e-mail, this time).

Sorry about the slow reply; last week was school holidays and I was off looking after the kids.

I'll send you a link to the configs and logs directly rather than clog up the list (and in case there's anything sensitive in there).

> How many ads are in the collector? If you pick a few at random,
> how long is their long (-l) form?

Is there a simple way to get that number? This is what I used:

# condor_status -collector -l | grep Ads
MachineAds = 1121
MachineAdsPeak = 1162
SubmitterAds = 42
SubmitterAdsPeak = 43
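If the goal is just the ad counts, a minimal Python sketch can run the same `condor_status -collector -l` query and keep only the integer attributes whose names end in `Ads`, which drops the `...Peak` values. It assumes the HTCondor command-line tools are on `PATH` and that `-l` prints one `Name = Value` pair per line, as in the output above. The function names are hypothetical helpers, not part of HTCondor.

```python
import subprocess


def parse_ad_counts(long_form: str) -> dict[str, int]:
    """Collect integer attributes whose names end in 'Ads' from -l output.

    Assumes one 'Name = Value' pair per line (as condor_status -l prints).
    Names such as 'MachineAdsPeak' end in 'Peak', so they are skipped.
    """
    counts: dict[str, int] = {}
    for line in long_form.splitlines():
        name, sep, value = line.partition(" = ")
        name = name.strip()
        if sep and name.endswith("Ads"):
            try:
                counts[name] = int(value.strip())
            except ValueError:
                pass  # ignore non-integer values
    return counts


def collector_ad_counts() -> dict[str, int]:
    # Same command as in the transcript; needs the HTCondor tools installed.
    out = subprocess.run(
        ["condor_status", "-collector", "-l"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ad_counts(out)


if __name__ == "__main__":
    counts = collector_ad_counts()
    for name, value in sorted(counts.items()):
        print(f"{name} = {value}")
    print("Total =", sum(counts.values()))
```

Applied to the four lines above, the parser keeps MachineAds = 1121 and SubmitterAds = 42 (total 1163); a full collector ad may carry further `*Ads` counters, which would be summed too.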

There are currently about 5000 jobs in the queue. Most are from ArcCEs, so they have a fairly standard ClassAd of about 120 lines when queued and 150 lines when running.

condor_status -l <worker_node> returns 700 to 2500 lines depending on the number of slots on the machine and the job mix.
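To answer the second question (how long a few randomly chosen ads are), a rough Python sketch could split `condor_status -l` output on the blank lines that separate ads in the long form, then report the line counts of a random sample. Since `condor_status -l` emits one ad per slot, these per-ad counts will likely be smaller than the 700 to 2500 lines per worker node quoted above, which cover every slot on a machine. `split_ads` and `sample_ad_lengths` are hypothetical names used only for illustration.

```python
import random
import subprocess


def split_ads(long_form: str) -> list[list[str]]:
    """Split condor_status -l output into ads (separated by blank lines)."""
    ads: list[list[str]] = []
    current: list[str] = []
    for line in long_form.splitlines():
        if line.strip():
            current.append(line)
        elif current:  # a blank line ends the current ad
            ads.append(current)
            current = []
    if current:  # last ad may not be followed by a blank line
        ads.append(current)
    return ads


def sample_ad_lengths(ads: list[list[str]], k: int, seed=None) -> list[int]:
    """Return the line counts of up to k randomly chosen ads."""
    rng = random.Random(seed)
    picked = rng.sample(ads, min(k, len(ads)))
    return [len(ad) for ad in picked]


if __name__ == "__main__":
    out = subprocess.run(
        ["condor_status", "-l"], capture_output=True, text=True, check=True
    ).stdout
    ads = split_ads(out)
    print(f"{len(ads)} ads; sampled lengths: {sample_ad_lengths(ads, k=5)}")
    print(f"total lines across all ads: {sum(len(a) for a in ads)}")
```

Passing a fixed `seed` makes the sample repeatable between runs, which helps when comparing the pool before and after a configuration change.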

The config is a little overcomplicated at the moment, as I'm in the middle of simplifying things with what I learned at the HTCondor Workshop. We recently enabled SSL authentication for the daemons but haven't yet got around to disabling pool password authentication, and we're switching from heavily nested ifThenElse statements to schedd transforms to set the AccountingGroup. I can try backing out those changes if they're likely to be an issue, but I think the first incident happened after I upgraded to 8.8.5 and before I made any other changes.

Let me know if you want any more info or logs.


On 28/10/2019, 18:53, "HTCondor-users on behalf of Todd L Miller" <htcondor-users-bounces@xxxxxxxxxxx on behalf of tlmiller@xxxxxxxxxxx> wrote:

    > I've doubled the memory (now at 8GB) on the Collector/Negotiator VM 
    > twice in case it just needed more space to work in
     	How many ads are in the collector?  If you pick a few at random, 
    how long is their long (-l) form?
    > Any ideas what in my config could be causing this or additional 
    > diagnostics you want?
     	Not off the top of my head.  As usual, go ahead and send your 
    configuration and logs, and I'll take a look.
    - ToddM