Re: [Condor-users] memory leak in Condor 7.4.2 schedd ???



Smith, Ian wrote:
Dear All,

I've recently moved to Condor 7.4.2 on our central manager/submit host running
Solaris 10 and found that the schedd seems to be using a worrying amount
of memory. For instance, at present there are only ~150 jobs in the queue and
the schedd is taking over 900 MB. The documentation seems to suggest that it
should only be using around 10 kB per job! Since this has been rising monotonically,
seemingly since I restarted the daemons just a few days ago, I can only
assume that this is down to a leak.

The net result of this is that condor_q etc. can be very slow to respond (more
than five minutes on occasion), and it is difficult to submit more than ~1,000 jobs
at once, whereas before there was no problem with 10,000 jobs. As far as I can
see, the auto-clustering is working fine, although I sometimes see messages in the
schedd log about rebuilding tables.

Has anyone else seen this on other systems?

Any suggestions for a fix or workaround?

regards,

-ian.

--------------------------------------------
Dr Ian C. Smith,
Advanced Research Computing (e-Science) Team,
The University of Liverpool,
Computing Services Department.

Is that virtual, resident or private memory usage?

Could you post the output of:

 top -n1 -b -p $(pidof condor_schedd)
 pmap -d $(pidof condor_schedd)
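(A sketch, since the original poster is on Solaris 10, where pidof and top's
batch mode may not be available: the rough equivalents using the stock Solaris
pgrep, prstat, and pmap tools would be something like

 prstat -p $(pgrep condor_schedd) 1 1
 pmap -x $(pgrep condor_schedd)

prstat reports SIZE and RSS separately, and pmap -x breaks the address space
down per mapping, which should show whether the growth is in the heap or
somewhere else.)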

FYI, Condor uses string interning to minimize the memory footprint of jobs (of all classads, actually) but, IIRC, it does not always garbage-collect the string pool. If you have a lot of jobs passing through your schedd, say with large, unique Environment attributes, you could certainly see memory usage grow over time. Then of course there could also just be a memory leak.
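As a rough way to gauge that (just a sketch; it assumes the job ads carry the
standard Env/Environment attributes and that condor_q can still answer in a
reasonable time), you could total the size of the environment strings currently
in the queue:

 condor_q -long | awk '/^(Env|Environment) = / { n++; total += length($0) } END { printf "%d environment attributes, %d bytes\n", n, total }'

If that total is large relative to the number of jobs, interned environment
strings could plausibly account for a fair chunk of the 900 MB.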

Best,


matt