
Re: [HTCondor-users] memory leak in condor_q?




Indeed, as Brian says, the patches going into the code address the condor_q -stream issue described below. Note that an often-overlooked downside of "condor_q -stream" is that the results are not sorted in any way: the schedd normally sends the job ads out in hash table order, and condor_q does the sorting.
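
If sorted output is still wanted from an unsorted stream, the ordering can be restored on the client side. The following is only a minimal sketch, not how condor_q itself sorts: it assumes each line of output starts with two whitespace-separated integer fields, ClusterId and ProcId, and the helper name sort_job_lines and the sample lines are made up for illustration.

```python
def sort_job_lines(lines):
    """Sort job lines numerically by their first two fields.

    Assumption: every non-blank line starts with "<ClusterId> <ProcId>",
    both integers. Anything after those two fields is kept untouched.
    """
    def key(line):
        cluster, proc = line.split()[:2]
        return int(cluster), int(proc)

    # Drop blank lines, then sort by (ClusterId, ProcId) as numbers,
    # so that cluster 2 comes before cluster 10.
    return sorted((line for line in lines if line.strip()), key=key)


# Made-up sample lines in arbitrary (hash table like) order.
unsorted_lines = ["10 1 R", "2 0 I", "10 0 R"]
for line in sort_job_lines(unsorted_lines):
    print(line)
```

Note that sorting this way needs all the lines in memory first, so it gives back the ordering but not the memory savings of a true stream.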

Another often-overlooked option is to use condor_status with either "-schedd" or "-submitter", instead of polling condor_q, to get "big picture" aggregate statistics such as total jobs running/idle per schedd or per user, respectively. There is no need to have condor_q slowly dump all 500,000 jobs and then count them to get this sort of aggregate information; condor_status is much faster and less resource-intensive. Yes, condor_status gives you "cached" information that may only be updated once or twice a minute, but in many situations (such as portals that want to update a web page) that is just fine.
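
As a rough illustration of the condor_status approach, here is a minimal sketch of what a portal might do. It assumes the schedd ads carry the attributes TotalRunningJobs, TotalIdleJobs and TotalHeldJobs (check with "condor_status -schedd -long" on your pool), that the HTCondor command-line tools are on the PATH, and that schedd names contain no whitespace. The function names are hypothetical.

```python
import subprocess

# Assumed attribute names; verify them against your own schedd ads.
SCHEDD_ATTRS = ["Name", "TotalRunningJobs", "TotalIdleJobs", "TotalHeldJobs"]


def parse_totals(text):
    """Parse one-ad-per-line output into {name: (running, idle, held)}."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(SCHEDD_ATTRS):
            continue  # skip blank or malformed lines
        name, counts = fields[0], fields[1:]
        try:
            totals[name] = tuple(int(c) for c in counts)
        except ValueError:
            continue  # e.g. "undefined" printed for a missing attribute
    return totals


def query_schedd_totals():
    """Ask the collector once for per-schedd totals (needs condor_status)."""
    cmd = ["condor_status", "-schedd", "-autoformat"] + SCHEDD_ATTRS
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_totals(result.stdout)
```

Calling query_schedd_totals() once a minute or so matches the update rate of the cached data; polling faster than that would not give fresher numbers.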

regards,
Todd

On 2/6/2014 10:11 AM, Pek Daniel wrote:
Yaaay, perfect! :)

2014-02-06 Brian Bockelman <bbockelm@xxxxxxxxxxx>:
Hi Daniel,

This is expected.  -stream does not actually stream (well, not until my patches land).  It still buffers the entire response in memory before parsing it.

What -stream does do is stop condor_q from sorting the results in memory.

The non-blocking condor_q patch set will take care of this and turn it into a real stream.


On Feb 6, 2014, at 9:46 AM, Pek Daniel <pekdaniel@xxxxxxxxx> wrote:

I've noticed that there is not much difference between the memory
consumption of condor_q -stream and plain condor_q:

[root@XXX thrash]# /usr/bin/time -v condor_q -stream >/dev/null
...
Maximum resident set size (kbytes): 186592
...

[root@XXX thrash]# /usr/bin/time -v condor_q >/dev/null
...
Maximum resident set size (kbytes): 306352
...

There were 100k jobs in the queue. I've dug into the source a bit, and
I suspect a leak somewhere here:
https://github.com/htcondor/htcondor/blob/b151357dcd13efe2703a2386e1d89bbacac79cd6/src/condor_schedd.V6/qmgmt_send_stubs.cpp#L862-L882

or here:
https://github.com/htcondor/htcondor/blob/0222c71b4a7cf5946ab9d5caf5ecca0ca8c75539/src/condor_utils/classad_oldnew.cpp#L57-L130

Maybe the ReliSock, or the ClassAd...

Cheers,
daniel
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/




--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685