
Re: [Condor-users] memory leak in Condor 7.4.2 schedd ???



> Is that virtual, resident or private memory usage?
> 
> Output of,
> 
>   top -n1 -b -p $(pidof condor_schedd)

I think some of the options may be different under Solaris, but
from what I can see most of it is resident memory:

$ top

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 15373 root       1  59    0   31M   27M sleep    0:07  0.06% condor_schedd

(I've since restarted the schedd but this still looks a bit excessive).

>   pmap -d $(pidof condor_schedd)

This is the output of pmap (we actually run two Condor instances separately; one is for Condor-G).
It looks to me as though there is ~20 MB of dynamically allocated storage used by the schedd, which
makes up most of the memory footprint.

00010000    6744K r-x--  /opt1/condor_7.4.2/sbin/condor_schedd
006B4000     544K rwx--  /opt1/condor_7.4.2/sbin/condor_schedd
0073C000     784K rwx--    [ heap ]
00800000   20480K rwx--    [ heap ]
FEF00000     608K r-x--  /lib/libm.so.2
FEFA6000      24K rwx--  /lib/libm.so.2
FF000000    1216K r-x--  /lib/libc.so.1
FF130000      40K rwx--  /lib/libc.so.1
FF13A000       8K rwx--  /lib/libc.so.1
FF160000      64K rwx--    [ anon ]
FF180000     584K r-x--  /lib/libnsl.so.1
FF222000      40K rwx--  /lib/libnsl.so.1
FF22C000      24K rwx--  /lib/libnsl.so.1
FF240000      64K rwx--    [ anon ]
FF260000      64K rwx--    [ anon ]
FF280000      16K r-x--  /lib/libm.so.1
FF292000       8K rwx--  /lib/libm.so.1
FF2A0000     240K r-x--  /lib/libresolv.so.2
FF2E0000      24K rwx--    [ anon ]
FF2EC000      16K rwx--  /lib/libresolv.so.2
FF300000      48K r-x--  /lib/libsocket.so.1
FF310000       8K rwx--    [ anon ]
FF31C000       8K rwx--  /lib/libsocket.so.1
FF320000     128K r-x--  /lib/libelf.so.1
FF340000       8K rwx--  /lib/libelf.so.1
FF350000       8K rwx--    [ anon ]
FF360000       8K r-x--  /lib/libkstat.so.1
FF372000       8K rwx--  /lib/libkstat.so.1
FF380000       8K r-x--  /lib/libdl.so.1
FF392000       8K rwx--  /lib/libdl.so.1
FF3A0000       8K r-x--  /platform/sun4u-us3/lib/libc_psr.so.1
FF3A4000       8K rwxs-    [ anon ]
FF3B0000     208K r-x--  /lib/ld.so.1
FF3F0000       8K r--s-  dev:32,12 ino:70306
FF3F4000       8K rwx--  /lib/ld.so.1
FF3F6000       8K rwx--  /lib/ld.so.1
FFBEC000      80K rwx--    [ stack ]
 total     32160K
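
To make the heap share explicit, here is a minimal sketch that totals a pmap listing by mapping type. It assumes the Solaris-style columns shown above (address, size in K, permissions, mapping name); the function name summarize_pmap and the shortened sample are illustrative only and are not part of Condor or pmap.

```python
import re
from collections import defaultdict

# Matches lines such as "00800000   20480K rwx--    [ heap ]".
# The trailing "total" line does not start with a hex address, so it is skipped.
PMAP_LINE = re.compile(r"^([0-9A-Fa-f]+)\s+(\d+)K\s+(\S+)\s+(.*)$")

def summarize_pmap(text):
    """Return total size in KB per mapping type (heap, stack, anon, file-backed)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = PMAP_LINE.match(line.strip())
        if not m:
            continue
        size_kb = int(m.group(2))
        mapping = m.group(4).strip()
        if mapping.startswith("["):
            key = mapping.strip("[] ")   # "[ heap ]" -> "heap"
        else:
            key = "file-backed"          # shared libraries, the binary, device maps
        totals[key] += size_kb
    return dict(totals)

# Shortened sample taken from the listing above.
sample = """\
00010000    6744K r-x--  /opt1/condor_7.4.2/sbin/condor_schedd
006B4000     544K rwx--  /opt1/condor_7.4.2/sbin/condor_schedd
0073C000     784K rwx--    [ heap ]
00800000   20480K rwx--    [ heap ]
FFBEC000      80K rwx--    [ stack ]
 total     32160K
"""

print(summarize_pmap(sample)["heap"])  # 784 + 20480 = 21264
```

Running the same function on the full listing, and again some hours later, would show whether only the heap entry grows while everything else stays constant.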


> ?
> 
> FYI, Condor uses string interning to minimize the memory footprint of
> jobs (all classads actually), but, iirc, does not always garbage collect
> the string pool. If you have a lot of jobs passing through your Schedd,
> say with large unique Environments, you could certainly see memory usage
> increase. Then of course there could just be a memory leak.
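
The effect described in the quoted paragraph can be illustrated with a toy intern pool. This is only a sketch of the general technique, not Condor's actual implementation; the class name StringPool and the attribute strings are made up for the example.

```python
class StringPool:
    """Toy intern pool: one shared copy per distinct string, never freed."""

    def __init__(self):
        self._pool = {}

    def intern(self, s):
        # Return the stored copy if present, otherwise store and return s.
        return self._pool.setdefault(s, s)

    def __len__(self):
        return len(self._pool)


pool = StringPool()
for job_id in range(1000):
    pool.intern("Universe=vanilla")               # identical for every job: stored once
    pool.intern(f"Environment=JOB_ID={job_id}")   # unique per job: one entry each

print(len(pool))  # 1 shared string + 1000 unique strings = 1001
```

Because nothing is ever removed from the pool, the unique per-job strings stay in memory after the jobs have left the queue, which would look like steady heap growth from the outside.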

All of the jobs are separate clusters but with the same requirements, and
as I said the clustering does seem to be working fine. AFAIK everything
was OK with Condor 7.4.0, and this only surfaced when I moved to 7.4.2 to get rid of the
"long message still waiting to be closed" problem when restarting the
daemons. Incidentally, I get the same thing with a pre-release 7.4.3 from Dan.

Thanks for the speedy reply,

regards,

-ian.