RESERVED_MEMORY not considered by HTCondor


128 * 1024 = 131,072 MB, and 131,072 - 3,072 = 128,000 MB

So you're 723 MB over your reserved limit.
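For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The numbers come from this thread; the variable names are just for illustration and are not HTCondor settings.

```python
# Figures quoted in the thread; names are illustrative, not HTCondor knobs.
PHYSICAL_GB = 128        # physical memory per worker
RESERVED_MB = 3072       # value of RESERVED_MEMORY
ALLOCATED_MB = 128_723   # reported sum of the dynamic slot sizes

physical_mb = PHYSICAL_GB * 1024          # 131072
usable_mb = physical_mb - RESERVED_MB     # 128000
overage_mb = ALLOCATED_MB - usable_mb     # 723

print(f"usable: {usable_mb} MB, allocated: {ALLOCATED_MB} MB, over by: {overage_mb} MB")
```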

This may be a result of consumption policies - the default memory consumption policy rounds up the memory for the slot to the next 128-megabyte increment:

CONSUMPTION_MEMORY = quantize(target.RequestMemory,{128})

The extra 723 megabytes is 5.65 times 128, so if each of the 13 jobs requested, on average, about 56 megabytes short of the next multiple of 128 (723 / 13 is roughly 55.6), then the quantize would wind up with that number. I'm not certain, however, whether the consumption policy is applied before or after the match. If it's after the match, then that may be the root cause.
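To illustrate the rounding, here is a rough Python model of what quantize(..., {128}) does to a positive memory request. It is only a sketch of the round-up behaviour, not HTCondor's implementation, and the 9,800 MB request size is a made-up example, not a value from the original poster's jobs.

```python
def quantize_up(request_mb: int, step_mb: int = 128) -> int:
    """Round a positive request up to the next multiple of step_mb.

    Approximates quantize(RequestMemory, {128}) for positive values only.
    """
    return -(-request_mb // step_mb) * step_mb  # ceiling division, then scale


# Hypothetical workload: 13 jobs each asking for 9,800 MB (assumed, not from the thread).
requests_mb = [9800] * 13
slots_mb = [quantize_up(r) for r in requests_mb]

# Memory handed out beyond what the jobs actually asked for.
padding_mb = sum(slots_mb) - sum(requests_mb)

print(slots_mb[0])   # size of each dynamic slot after rounding
print(padding_mb)    # total padding added by the rounding
```

With these assumed requests each slot is padded by 56 MB, so 13 slots add up to 728 MB of padding, the same order as the 723 MB overage. A request that is already a multiple of 128 (for example 1,024 MB) comes back unchanged.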

Try submitting your jobs with even increments of 128 megabytes of memory, and see if that helps.

Hi HTCondor Experts,
Recently we have been experiencing machine crashes because of OOM. Each worker in our cluster has 128 GB of memory, and each has 3072 MB of reserved memory that cannot be used by HTCondor:

In addition, each worker has 1 partitionable slot defined as below:
SLOT_TYPE_1 = cpus=100%

However, if you add up the dynamic slot sizes shown below in the second-to-last column (MB), you get 128,723 MB. Condor apparently does not subtract the 3072 MB (RESERVED_MEMORY) from the machine's total physical memory.