
Re: [HTCondor-users] Memory use



On 2/23/2015 4:10 PM, Dimitri Maziuk wrote:
On 02/23/2015 03:47 PM, Ricardo Oda wrote:
Hello,

I have a simple question: what happens if a running job utilizes more
memory than the size reserved in the matched slot? Will it be killed or it
can mess up with the memory of another slot?


With the default configuration, the job is simply allowed to use more memory than the size reserved, and thus it could eat into the memory available to another slot.

However, you can configure HTCondor to limit the memory usage of a job in several different ways (Linux containers, setrlimit, or just polling and placing the job on hold). See
  https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToLimitMemoryUsage

Regardless of the answer to that, the wiki lists a couple of known cases
where condor gets the memory used by a job wrong here:
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=WhyLargeImageSize


HTCondor maintains several job ClassAd attributes in addition to ImageSize, including ProportionalSetSizeKb, ResidentSetSize, and MemoryUsage. The manual describes in detail what each metric measures. In current releases of HTCondor, I'd recommend just referring to "MemoryUsage" in policy expressions instead of ImageSize - MemoryUsage attempts to be the most balanced overall metric of how much RAM a job is using, and it tends not to overestimate as much as ImageSize.
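
As a rough illustration of using MemoryUsage in a policy expression, here is a minimal submit-file sketch of the "poll and place on hold" approach. The 2048 MB figure and the hold reason text are placeholders, not recommendations, and I'm assuming both MemoryUsage and RequestMemory are reported in MB; check the wiki page above for the variants that suit your pool.

```
# Sketch only: hold the job once its measured memory exceeds its request.
# The value 2048 (MB) is a placeholder chosen for illustration.
request_memory       = 2048

# MemoryUsage is undefined until the first usage update arrives, so guard
# against that before comparing. The expression is evaluated periodically,
# so a job can briefly overshoot before it is put on hold.
periodic_hold        = (MemoryUsage =!= UNDEFINED) && (MemoryUsage > RequestMemory)
periodic_hold_reason = "Job exceeded its requested memory"
```

Note that this only polls: it does not stop the job from allocating the memory in the first place, which is what the container-based approach is for.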

So if you're running jobs that may run out of memory I recommend
limiting them to slots with enough RAM in the first place: either
request_memory=BIG-ENOUGH-NUM or select the "known good" machines some
other way and set request_memory=0 to override condor's own guesstimate.


I agree with the advice of setting request_memory=BIG-ENOUGH-NUM (or you can set the condor_config knob JOB_DEFAULT_REQUESTMEMORY to specify a default that condor_submit will use for request_memory).
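
For concreteness, a minimal sketch of both options. The executable name and the 4096 MB value are hypothetical placeholders; request_memory is interpreted in MB unless you give a unit.

```
# Submit description file (sketch): ask for a slot with at least 4096 MB.
executable     = my_job.sh
request_memory = 4096
queue
```

```
# condor_config on the submit machine (sketch): the default that
# condor_submit uses when a job does not set request_memory itself.
JOB_DEFAULT_REQUESTMEMORY = 4096
```

A request_memory line in the submit file takes precedence over the configured default, so the knob only affects jobs that leave it out.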

I don't think I'd go with request_memory=0 ...

regards,
Todd