
Re: [Condor-users] Memory leaks in Quill ?



I am also seeing all of my Condor 7.4.3 condor_quill daemons using between
1.5 and 2 GB of RAM apiece. I don't recall this happening in earlier versions of Quill.


Steve


On Fri, 31 Dec 2010, Marian Zvada wrote:

Hi,

On 27.12.2010 16:12, slebodnik wrote:
This problem still persists.

I attach a graph of the actual quill memory usage.
For the first 130 hours, the variable QUILL_POLLING_PERIOD was set to "1";
we then
changed the value to "10".

We would also like to know how much memory the quill daemon uses on your submit
machines (where the schedd daemon is running).

I remember there is a presentation somewhere from Condor Week 20?? that provides lots of useful info about memory footprint vs. quill usage.

Can someone, maybe from the Condor folks, recall this presentation and point us to it? Or maybe suggest another recommended quill setup we could use for further tests? The quill logs don't say much with QUILL_DEBUG=D_FULLDEBUG.

So far, we have done some valgrind and gdb work to find the source of the trouble. Our investigation shows it is most likely a typical memory leak, and the following simplified code helps to illustrate it:

FILE *fp;
while (true) {
  fp = fdopen(file_descriptor, "w");  // memory leak is here
  // work
  sleep(QUILL_POLLING_PERIOD);        // wait for the polling period
}
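
A hedged C sketch of one way the loop above could avoid the leak, assuming the leak comes from calling fdopen() on every pass without a matching fclose(). The helper name write_record is hypothetical and this is not Quill's actual code. Because fclose() on a stream created with fdopen() also closes the underlying descriptor, the sketch wraps a dup() of the descriptor so the caller's descriptor stays open between polling passes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not Quill code: write one record per polling pass
 * through a short-lived stream, leaving the caller's descriptor open. */
int write_record(int fd, const char *text)
{
    assert(fd >= 0 && text != NULL);

    int copy = dup(fd);              /* fclose() below closes only this copy */
    if (copy < 0)
        return -1;

    FILE *fp = fdopen(copy, "w");    /* allocates a FILE and its buffer */
    if (fp == NULL) {
        close(copy);
        return -1;
    }

    int rc = (fputs(text, fp) == EOF) ? -1 : 0;
    if (fclose(fp) == EOF)           /* flushes, frees the FILE, closes copy */
        rc = -1;
    return rc;
}
```

Calling write_record(file_descriptor, ...) once per iteration releases the stream on every pass. Creating the stream once before the loop and reusing it would be an alternative with the same effect on memory.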

I've talked to Lukas about what he found on our development environment as well; see the attached plot. What is strange is that the same code on CentOS takes all the memory and crashes the virtual host in our dev environment, whereas on Debian it takes 300 MB of resident memory and that's it... Maybe it is a different glibc on CentOS vs. Debian, but Lukas can comment on this in more detail if needed.

Well, we have the option to compile quill ourselves and see what is going on with memory in our test case, or we can live with the gdb hook Lukas patched into the code. However, we would be happy if one of the Condor experts could review our report and give us a hint here. Or should we open a condor-admin ticket to track the issue?

Thanks in advance!

Cheers,
Marian


PS: Happy New Year 2011 to all condor-users, and best wishes for successful new lines of code in the next Condor releases ;)



Thanks

Lukas Slebodnik

On Tue, 7 Dec 2010 09:41:11 +0100, Vladimir Motoska <motoska@xxxxxxxx>
wrote:
Hi,

We have some issues with the quill daemon on the submitter node. We use
condor 7.4.4, x86_64 rhel5 dynamic, running on CentOS with
postgresql 8.1.22-1.el5_5.1.

The problem is that quill accumulates memory. Here is the memory usage
log:
http://pastie.org/1354799
The data are divided into 3 columns:
1st: percentage of memory used by quill
2nd: memory usage in KB
3rd: memory usage in MB
Each line represents one timestamp; the lines are one minute apart.
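
For anyone who wants to collect comparable numbers, here is a rough C sketch that produces the same three columns (percent, KB, MB) for a given pid on Linux. It assumes the resident size is read from the VmRSS field of /proc/<pid>/status and the total from MemTotal in /proc/meminfo. The thread does not say how the log above was actually generated, and the function names here are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the kB value that follows `key` (e.g. "VmRSS:") in /proc-style
 * text, or -1 if the key is missing. */
long parse_kb_field(const char *text, const char *key)
{
    const char *p = strstr(text, key);
    if (p == NULL)
        return -1;
    return strtol(p + strlen(key), NULL, 10);  /* strtol skips leading blanks */
}

/* Format one sample as "percent KB MB"; returns -1 on invalid input. */
int format_sample(char *out, size_t size, long rss_kb, long total_kb)
{
    if (rss_kb < 0 || total_kb <= 0)
        return -1;
    double percent = 100.0 * (double)rss_kb / (double)total_kb;
    return snprintf(out, size, "%.1f %ld %ld", percent, rss_kb, rss_kb / 1024);
}

/* Read up to size-1 bytes of a small text file into buf; 0 on success. */
static int slurp(const char *path, char *buf, size_t size)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, size - 1, fp);
    buf[n] = '\0';
    fclose(fp);
    return 0;
}

/* Take one sample for `pid` (Linux /proc layout assumed). */
int sample_pid(long pid, char *out, size_t size)
{
    char path[64], status[8192], meminfo[8192];
    snprintf(path, sizeof path, "/proc/%ld/status", pid);
    if (slurp(path, status, sizeof status) != 0 ||
        slurp("/proc/meminfo", meminfo, sizeof meminfo) != 0)
        return -1;
    return format_sample(out, size,
                         parse_kb_field(status, "VmRSS:"),
                         parse_kb_field(meminfo, "MemTotal:"));
}
```

Calling sample_pid() with the quill daemon's pid once a minute and printing the result would give one log line per minute in the layout described above.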

Here is also our condor_config.
http://pastie.org/1354821

On all other nodes quill runs just fine. Can anybody give me a hint?

Thanks


_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/


--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Group Leader.
Lead of FermiCloud project.