
Re: [HTCondor-users] Memory leak in python bindings?



Hi Brian,

Unfortunately schedd.xquery() exhibits the same behavior, and as it turns out, so does query() on htcondor.Collector().

-Scott
Date: Mon, 13 Jun 2016 21:29:26 -0500
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Memory leak in python bindings?
Message-ID: <8F7BB5AE-8E8D-46A8-AC00-4B4567FD576C@xxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

Hi Scott,

Definitely looks like a leak to me. Thanks for reporting it.

As a potential workaround: do you see the leak go away if you use schedd.xquery() instead?

Thanks,

Brian

> On Jun 12, 2016, at 9:53 PM, Scott Leishman <scott.leishman@xxxxxxxxx> wrote:
>
> Hi,
>
> I have a long-running Python script that queries the scheduler periodically, and I noticed the process continually growing in memory over time.
>
> The following snippet is enough to illustrate the issue on our systems (running condor v8.4.6):
>
> import resource
> import htcondor
> schedd = htcondor.Schedd()
> while True:
>     schedd.query()
>     print('mem use: %s (kb)' % resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
>
> With no jobs running, I typically see the memory use increase by 128kb once every 45 or so iterations of the loop.
>
> pympler didn't show any new Python objects getting created, but when I hooked up valgrind's leak checker, it reported the following lost blocks:
>
> ==2473759== 136,000 (92,000 direct, 44,000 indirect) bytes in 500 blocks are definitely lost in loss record 820 of 821
> ==2473759==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==2473759==    by 0x6B42B29: CondorQ::getFilterAndProcessAds(char const*, StringList&, int, bool (*)(void*, compat_classad::ClassAd*), void*, bool) (in /usr/lib/condor/libcondor_utils_8_4_6.so)
> ==2473759==    by 0x6B43EB6: CondorQ::fetchQueueFromHostAndProcess(char const*, StringList&, int, int, bool (*)(void*, compat_classad::ClassAd*), void*, int, CondorError*) (in /usr/lib/condor/libcondor_utils_8_4_6.so)
> ==2473759==    by 0x65750CA: Schedd::query(boost::python::api::object, boost::python::list, boost::python::api::object, int, CondorQ::QueryFetchOpts) (in /usr/lib/python2.7/dist-packages/htcondor.so)
> ==2473759==    by 0x6575C4E: query_overloads::non_void_return_type::gen<boost::mpl::vector7<boost::python::api::object, Schedd&, boost::python::api::object, boost::python::list, boost::python::api::object, int, CondorQ::QueryFetchOpts> >::func_0(Schedd&) (in /usr/lib/python2.7/dist-packages/htcondor.so)
> ==2473759==    by 0x656B0FA: boost::python::objects::caller_py_function_impl<boost::python::detail::caller<boost::python::api::object (*)(Schedd&), boost::python::default_call_policies, boost::mpl::vector2<boost::python::api::object, Schedd&> > >::operator()(_object*, _object*) (in /usr/lib/python2.7/dist-packages/htcondor.so)
> ==2473759==    by 0x67ED9E9: boost::python::objects::function::call(_object*, _object*) const (in /usr/lib/condor/libpyclassad2.7_8_4_6.so)
> ==2473759==    by 0x67EDD57: ??? (in /usr/lib/condor/libpyclassad2.7_8_4_6.so)
> ==2473759==    by 0x67E8852: boost::python::handle_exception_impl(boost::function0<void>) (in /usr/lib/condor/libpyclassad2.7_8_4_6.so)
> ==2473759==    by 0x67EC662: ??? (in /usr/lib/condor/libpyclassad2.7_8_4_6.so)
> ==2473759==    by 0x499BE4: PyEval_EvalFrameEx (in /usr/bin/python2.7)
> ==2473759==    by 0x4A1633: ??? (in /usr/bin/python2.7)
>
> I can work around this issue by restarting the process periodically, but it seems like there is an allocation in getFilterAndProcessAds that isn't later freed?
>
>
>
> -Scott
>
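
The restart workaround mentioned above could, in principle, be narrowed so that only the query itself runs in a short-lived child process: anything the bindings leak is then returned to the OS when the child exits, and the long-running parent stays flat. The following is only a rough sketch under several assumptions: Python 3 on a Unix system (it uses the "fork" start method), and the helper names run_in_child, _child and count_jobs are made up for illustration and are not part of htcondor. It returns a plain integer rather than the ads themselves, since whether ClassAd objects can be sent between processes is not established in this thread.

```python
import multiprocessing


def _child(conn, func, args):
    """Run func in the child and send back (ok, result-or-error-text)."""
    try:
        conn.send((True, func(*args)))
    except Exception as exc:
        conn.send((False, repr(exc)))
    finally:
        conn.close()


def run_in_child(func, *args):
    """Call func(*args) in a forked, short-lived process and return its result.

    Memory allocated (or leaked) during the call belongs to the child and is
    released when the child exits. The result must be picklable.
    """
    ctx = multiprocessing.get_context("fork")  # assumption: Unix only
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(child_conn, func, args))
    proc.start()
    child_conn.close()  # parent keeps only the receiving end
    try:
        ok, payload = parent_conn.recv()  # EOFError if the child died early
    finally:
        proc.join()
    if not ok:
        raise RuntimeError(payload)
    return payload


def count_jobs():
    """Hypothetical query: number of job ads currently in the local schedd."""
    import htcondor  # imported inside the child only
    return len(htcondor.Schedd().query())
```

In the polling loop, schedd.query() would then be replaced by something like run_in_child(count_jobs). The cost is one fork per poll, which is probably acceptable for periodic queries but not for tight loops like the reproducer above.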