
Re: [Condor-users] memory leak in Condor 7.4.2 schedd ???



Ian,

We might be able to tell where the problem is by looking at a core file from the bloated schedd process. One way to generate one is this:

gdb -p <PID of schedd>
(gdb) gcore
(gdb) quit

It will write the core file into your current working directory, so make sure there is enough space. Also, it will take some time (a minute or two, I imagine), during which the schedd will be unresponsive.
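Once you have it, you can open the core alongside the schedd binary to see what the process was doing (a rough sketch; adjust the binary path to your install, and note that gcore normally names the file core.<PID>):

gdb /opt1/condor_7.4.3/sbin/condor_schedd core.<PID>
(gdb) info threads
(gdb) thread apply all bt
(gdb) quit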

--Dan

Smith, Ian wrote:
Apologies for the rather long-running thread, but I've just now seen a
repeat of the excessive schedd memory usage described earlier.

Running top:

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND

 17281 root       1  59    0 1043M 1038M sleep   55:11  0.50% condor_schedd

and pmap:

17281:  condor_schedd -f
00010000    6752K r-x--  /opt1/condor_7.4.3/sbin/condor_schedd
006B6000     536K rwx--  /opt1/condor_7.4.3/sbin/condor_schedd
0073C000     784K rwx--    [ heap ]
00800000 1056768K rwx--    [ heap ]
FEF00000     608K r-x--  /lib/libm.so.2
FEFA6000      24K rwx--  /lib/libm.so.2
FF000000    1216K r-x--  /lib/libc.so.1
FF130000      40K rwx--  /lib/libc.so.1
FF13A000       8K rwx--  /lib/libc.so.1
FF160000      64K rwx--    [ anon ]
FF180000     584K r-x--  /lib/libnsl.so.1
FF222000      40K rwx--  /lib/libnsl.so.1
FF22C000      24K rwx--  /lib/libnsl.so.1
FF240000      64K rwx--    [ anon ]
FF260000      64K rwx--    [ anon ]
FF280000      16K r-x--  /lib/libm.so.1
FF292000       8K rwx--  /lib/libm.so.1
FF2A0000     240K r-x--  /lib/libresolv.so.2
FF2E0000      24K rwx--    [ anon ]
FF2EC000      16K rwx--  /lib/libresolv.so.2
FF300000      48K r-x--  /lib/libsocket.so.1
FF310000       8K rwx--    [ anon ]
FF31C000       8K rwx--  /lib/libsocket.so.1
FF320000     128K r-x--  /lib/libelf.so.1
FF340000       8K rwx--  /lib/libelf.so.1
FF350000       8K rwx--    [ anon ]
FF360000       8K r-x--  /lib/libkstat.so.1
FF372000       8K rwx--  /lib/libkstat.so.1
FF380000       8K r-x--  /lib/libdl.so.1
FF38E000       8K rwxs-    [ anon ]
FF392000       8K rwx--  /lib/libdl.so.1
FF3A0000       8K r-x--  /platform/sun4u-us3/lib/libc_psr.so.1
FF3B0000     208K r-x--  /lib/ld.so.1
FF3F0000       8K r--s-  dev:32,12 ino:70306
FF3F4000       8K rwx--  /lib/ld.so.1
FF3F6000       8K rwx--  /lib/ld.so.1
FFBEC000      80K rwx--    [ stack ]
 total   1068448K

So it does look to me as though around 1 GB of heap is allocated to the schedd.
Currently I have 889 jobs in total, 450 idle and 439 running, which seems
pretty modest.
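
(As a quick check, assuming the Solaris pmap output format shown above, the heap segments can be totalled with:

  pmap 17281 | awk '/\[ heap \]/ { sub(/K$/, "", $2); sum += $2 } END { print sum "K" }'

which gives 1057552K here, matching the roughly 1 GB figure.)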

regards,

-ian.

Is that virtual, resident or private memory usage?

Output of,

  top -n1 -b -p $(pidof condor_schedd)
  pmap -d $(pidof condor_schedd)

?

FYI, Condor uses string interning to minimize the memory footprint of
jobs (all classads, actually) but, if I recall correctly, it does not
always garbage collect the string pool. If you have a lot of jobs passing
through your Schedd, say with large unique Environments, you could
certainly see memory usage increase. Then, of course, there could just be
a memory leak.
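
To make the interning point concrete, here is a minimal sketch of an intern pool with no garbage collection (a hypothetical illustration, not the actual classad code):

  #include <string>
  #include <unordered_set>

  // Each distinct string is stored exactly once in a shared pool.
  static std::unordered_set<std::string> pool;

  // Return a reference to the pooled copy of s. Nothing is ever
  // erased, so memory grows with every unique string the process
  // has ever seen (e.g. large unique Environment values), even
  // after the jobs that carried them have left the queue.
  const std::string& intern(const std::string& s) {
      return *pool.insert(s).first;
  }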

Best,


matt
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/