
Re: [HTCondor-users] Fwd: Jobs using large memory but profiler says that jobs are fine?



Based on that htop report, it's allocating a bunch of virtual memory and likely not using it. Unfortunately, HTCondor is tracking that in the default configuration. The first paragraph of this section of the docs goes into more detail:
https://htcondor.readthedocs.io/en/latest/admin-manual/ep-policy-configuration.html#limiting-resource-usage-using-cgroups
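
To illustrate the difference between virtual and resident memory, here is a minimal sketch, assuming a Linux machine where /proc/<pid>/status is available. It pulls out VmSize (virtual address space) and VmRSS (pages actually resident in RAM), both reported in kB. The helper name parse_vm_fields is made up for this example and is not part of HTCondor or htop.

```python
import os


def parse_vm_fields(status_text):
    """Extract selected Vm* fields (in kB) from /proc/<pid>/status text."""
    wanted = ("VmPeak", "VmSize", "VmHWM", "VmRSS")
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            # Lines look like "VmRSS:\t  109568 kB"; keep the number only.
            fields[key] = int(rest.split()[0])
    return fields


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    # Report the numbers for the current Python process (Linux only).
    with open("/proc/self/status") as f:
        print(parse_vm_fields(f.read()))
```

A VmSize far above VmRSS would be consistent with address space that has been reserved but never touched.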

If you're setting OMP_NUM_THREADS=1 correctly, then there might be something else in your Python process allocating virtual pages. You'll have to find that, or keep requesting more memory from HTCondor.
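
One thing worth checking is where the variable gets set. Threading runtimes typically read these variables when the library is loaded, so setting them after numpy has been imported may have no effect. A minimal sketch, assuming the script uses numpy or another OpenMP/BLAS-backed library (the extra OPENBLAS and MKL variables only matter if your numpy build uses those backends):

```python
import os

# Set the thread limits before importing numpy or any other library that
# may start an OpenMP/BLAS thread pool; later changes may be ignored.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")
for name in THREAD_VARS:
    os.environ[name] = "1"

# import numpy as np  # import numerical libraries only after this point
```

The same variables can instead be set outside the script, for example through the environment command in the submit description file, which avoids depending on import order.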

Alternatively, if you control the HTCondor cluster, you may want to switch to cgroup memory tracking and only track physical memory.

Best,
David

On Tue, Mar 12, 2024 at 11:29 AM Angel Campoverde <angelfcampoverde@xxxxxxxxx> wrote:
I am putting this back on the mailing list, which was dropped at some point.

---------- Forwarded message ---------
From: Angel Campoverde <angelfcampoverde@xxxxxxxxx>
Date: Tue, Mar 12, 2024 at 5:05 PM
Subject: Re: [HTCondor-users] Jobs using large memory but profiler says that jobs are fine?
To: David Schultz <david.schultz@xxxxxxxxxxxxxxxx>


Dear David,

Thanks for your answer. I ran it again with the --native flag and I still see the same memory usage:

image.png
This time I ran it locally rather than as a job on the cluster. Using htop I see:

image.png
for pretty much the whole duration of the process. I am running it with time right now and will get back to you soon.

Cheers.

On Thu, Mar 7, 2024 at 5:26 PM David Schultz <david.schultz@xxxxxxxxxxxxxxxx> wrote:
This is more Python profiling and probably not relevant to this list, but I assume you've tried running memray with the --native argument, to track numpy and other C extensions? You could also run `/usr/bin/time <executable> <arguments>` manually, and see what it thinks the maximum memory usage is.

If these two differ by a significant amount, it might be that Python is hanging on to the extra memory, either due to not running garbage collection often enough, or via the memory allocation pool. You could try running gc.collect() in places, though it would probably take a more thorough analysis of the program to figure out exactly what is going on.
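
If you would rather get the peak figure from Python than from /usr/bin/time, the sketch below does roughly the same measurement with the standard resource module. The function name peak_child_rss and the example workload are made up for illustration. On Linux ru_maxrss is in kB (on macOS it is in bytes), and for RUSAGE_CHILDREN it reflects the largest child that has been waited for so far.

```python
import resource
import subprocess
import sys


def peak_child_rss(argv):
    """Run argv to completion and return the peak resident set size
    recorded for waited-for child processes (kB on Linux)."""
    subprocess.run(argv, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Hypothetical workload; replace with the real executable and arguments.
    peak = peak_child_rss([sys.executable, "-c", "data = list(range(1_000_000))"])
    print(f"peak RSS of largest child: {peak} kB")
```

This number is resident memory, so it is the one to compare against the memray total, not the virtual size shown by htop.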

Best,
David

On Thu, Mar 7, 2024 at 8:20 AM Angel Campoverde <angelfcampoverde@xxxxxxxxx> wrote:
Dear Experts,

I am still stuck on this issue and unable to send jobs to more than a handful of machines. Is there anything else that can be done?

Cheers.

On Wed, Mar 6, 2024 at 11:29 AM Angel Campoverde <angelfcampoverde@xxxxxxxxx> wrote:
Dear Ben,

I am running on CentOS 7 machines. I can see:

el7.x86_64

from inside the job. The program does write files, but only about 200-300 MB. It definitely does not look like I am using more than a few hundred megabytes of memory.

Cheers.

On Tue, Mar 5, 2024 at 5:22 PM Ben Jones <ben.dylan.jones@xxxxxxx> wrote:
Do you know if you're running on an el9 machine or similar? I think that on cgroup v2 machines the memory reporting currently reports file cache. Are you writing a large file?

On 5 Mar 2024, at 17:08, Angel Campoverde <angelfcampoverde@xxxxxxxxx> wrote:

Dear Condor experts,

I am running a job that seems to be going beyond 8 GB in memory usage. However, when I run it with memray (a memory profiler for Python projects) I see that my memory usage is only 107 MB, as you can see in the screenshot below:

<image.png>

I have contacted the Python mailing list:


and although I tried turning off multithreading, I still get this high memory usage. Why is this happening and how should I fix it? I cannot request 16 GB of memory for my jobs, since very few machines have that much memory.

Cheers.
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/