
Re: [Condor-users] memory leak in Condor 7.4.2 schedd ???



Hi Dan,

I've copied this here:

http://pcwww.liv.ac.uk/~smithic/core.17281.Z

It's about 500 MB, so I'm not sure how much luck you will have
downloading it. As I write, the scheduler is using a stonking
1700 MB and we have only one job in the queue!

regards,

-ian.

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-
> bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
> Sent: 23 June 2010 15:12
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] memory leak in Condor 7.4.2 schedd ???
> 
> Ian,
> 
> We might be able to tell where the problem is by looking at a core file
> from the bloated schedd process.  One way to generate one is this:
> 
> gdb -p <PID of schedd>
> (gdb) gcore
> (gdb) quit
> 
> It will write the core file into your current working directory, so make
> sure there is enough space.  Also, it will take some time (a minute or
> two, I imagine), during which the schedd will be unresponsive.
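> 
> If you would rather not drive gdb interactively, something along these
> lines should produce the same core file in one shot (just a sketch,
> assuming a reasonably recent GNU gdb with batch mode; adjust the output
> path to a filesystem with enough free space):
> 
> # attach to the schedd, dump a core to the given file, then detach and exit
> gdb -p <PID of schedd> -batch -ex "gcore /path/with/space/schedd.core"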
> 
> --Dan
> 
> Smith, Ian wrote:
> > Apologies for the rather long-running thread, but I've just now seen a
> > repeat of the excessive schedd memory usage described earlier.
> >
> > Running top:
> >
> >    PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
> >
> >  17281 root       1  59    0 1043M 1038M sleep   55:11  0.50% condor_schedd
> >
> > and pmap:
> >
> > 17281:  condor_schedd -f
> > 00010000    6752K r-x--  /opt1/condor_7.4.3/sbin/condor_schedd
> > 006B6000     536K rwx--  /opt1/condor_7.4.3/sbin/condor_schedd
> > 0073C000     784K rwx--    [ heap ]
> > 00800000 1056768K rwx--    [ heap ]
> > FEF00000     608K r-x--  /lib/libm.so.2
> > FEFA6000      24K rwx--  /lib/libm.so.2
> > FF000000    1216K r-x--  /lib/libc.so.1
> > FF130000      40K rwx--  /lib/libc.so.1
> > FF13A000       8K rwx--  /lib/libc.so.1
> > FF160000      64K rwx--    [ anon ]
> > FF180000     584K r-x--  /lib/libnsl.so.1
> > FF222000      40K rwx--  /lib/libnsl.so.1
> > FF22C000      24K rwx--  /lib/libnsl.so.1
> > FF240000      64K rwx--    [ anon ]
> > FF260000      64K rwx--    [ anon ]
> > FF280000      16K r-x--  /lib/libm.so.1
> > FF292000       8K rwx--  /lib/libm.so.1
> > FF2A0000     240K r-x--  /lib/libresolv.so.2
> > FF2E0000      24K rwx--    [ anon ]
> > FF2EC000      16K rwx--  /lib/libresolv.so.2
> > FF300000      48K r-x--  /lib/libsocket.so.1
> > FF310000       8K rwx--    [ anon ]
> > FF31C000       8K rwx--  /lib/libsocket.so.1
> > FF320000     128K r-x--  /lib/libelf.so.1
> > FF340000       8K rwx--  /lib/libelf.so.1
> > FF350000       8K rwx--    [ anon ]
> > FF360000       8K r-x--  /lib/libkstat.so.1
> > FF372000       8K rwx--  /lib/libkstat.so.1
> > FF380000       8K r-x--  /lib/libdl.so.1
> > FF38E000       8K rwxs-    [ anon ]
> > FF392000       8K rwx--  /lib/libdl.so.1
> > FF3A0000       8K r-x--  /platform/sun4u-us3/lib/libc_psr.so.1
> > FF3B0000     208K r-x--  /lib/ld.so.1
> > FF3F0000       8K r--s-  dev:32,12 ino:70306
> > FF3F4000       8K rwx--  /lib/ld.so.1
> > FF3F6000       8K rwx--  /lib/ld.so.1
> > FFBEC000      80K rwx--    [ stack ]
> >  total   1068448K
> >
> > So it does look to me as though around 1 GB of heap is allocated to the schedd.
> > Currently I have 889 jobs in total, 450 idle and 439 running, which seems
> > pretty modest.
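> >
> > For what it's worth, a quick way to total the heap segments from that
> > pmap output would be something like this (just a sketch, assuming the
> > Solaris pmap layout shown above, with sizes in the second column):
> >
> > # sum the sizes (in Kbytes) of all [ heap ] mappings for the schedd pid
> > pmap 17281 | awk '/\[ heap \]/ { sub(/K/, "", $2); kb += $2 } END { print kb " Kbytes of heap" }'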
> >
> > regards,
> >
> > -ian.
> >
> >
> >> Is that virtual, resident or private memory usage?
> >>
> >> Output of,
> >>
> >>   top -n1 -b -p $(pidof condor_schedd)
> >>   pmap -d $(pidof condor_schedd)
> >>
> >> ?
> >>
> >> FYI, Condor uses string interning to minimize the memory footprint of
> >> jobs (all classads actually), but, iirc, does not always garbage collect
> >> the string pool. If you have a lot of jobs passing through your Schedd,
> >> say with large unique Environments, you could certainly see memory usage
> >> increase. Then of course there could just be a memory leak.
> >>
> >> Best,
> >>
> >>
> >> matt
> >
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/