
Re: [HTCondor-users] RESERVED_MEMORY not considered by HTCondor



Thank you for the inputs Michael. My previous email saying that worker1 has 128GB memory is not accurate. I used "free -m" to check the total memory on that machine:

worker1:1} free -m
              total        used        free      shared  buff/cache   available
Mem:         128723       39524       35613          47       53585       83923
Swap:          4095         170        3925

The total memory on worker1 is 128,723 MB, which equals the sum of all the dynamic slot sizes plus the partitionable slot size. So the 3072 MB of reserved memory is evidently not being subtracted.
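As a quick check of these figures, the short Python sketch below compares the slot total with what one would expect if RESERVED_MEMORY were subtracted. The variable names are illustrative only, and it assumes that the memory HTCondor detects equals the "total" reported by free -m, which the thread does not confirm.

```python
# Figures taken from the message above (all in MB).
total_mb = 128723       # "total" column of `free -m` on worker1
reserved_mb = 3072      # RESERVED_MEMORY in the HTCondor configuration
slots_sum_mb = 128723   # dynamic slots plus the partitionable slot

# Assumption: HTCondor's detected memory equals the `free -m` total.
expected_mb = total_mb - reserved_mb
excess_mb = slots_sum_mb - expected_mb

print(f"expected slot memory with reservation: {expected_mb} MB")
print(f"advertised beyond that: {excess_mb} MB")
```

Under that assumption, the excess equals the full reserved amount, which is consistent with the reservation not being applied at all.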

Any ideas?

Thanks,

Zhuo

On 10/23/2018 2:43 PM, Michael Pelletier wrote:
Jewel,

128 * 1024 = 131,072, and 131,072 - 3,072 = 128,000

So you're 723 MB over your reserved limit.

This may be a result of consumption policies - the default memory consumption policy rounds up the memory for the slot to the next 128-megabyte increment:

CONSUMPTION_MEMORY = quantize(target.RequestMemory,{128})

The extra 723 megabytes is 5.65 times 128, so if each of the 13 jobs requested 64 megabytes short of the next 128 then the quantize would wind up with that number. I'm not certain, however, whether the consumption policy is applied before or after the match. If it's after the match, then that may be the root cause.
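To illustrate the rounding described above, here is a minimal Python sketch. It models only the "round up to the next 128 MB" behaviour mentioned in this message, not the full semantics of the ClassAd quantize() function. The helper names quantize_up and rounding_overhead and the request values are hypothetical.

```python
def quantize_up(request_mb: int, step_mb: int = 128) -> int:
    """Round request_mb up to the next multiple of step_mb."""
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-request_mb // step_mb) * step_mb


def rounding_overhead(requests_mb, step_mb: int = 128) -> int:
    """Total extra memory (MB) handed out because of rounding up."""
    return sum(quantize_up(r, step_mb) - r for r in requests_mb)


# Hypothetical RequestMemory values in MB, not taken from the thread.
requests = [1024, 1000, 4032]

for r in requests:
    print(f"requested {r} MB -> slot gets {quantize_up(r)} MB")

print(f"total rounding overhead: {rounding_overhead(requests)} MB")
```

A request that is already a multiple of 128 MB incurs no overhead in this model, which is the reasoning behind the suggestion to submit jobs in even 128 MB increments.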

Try submitting your jobs with even increments of 128 megabytes of memory, and see if that helps.

Michael V. Pelletier
Information Technology
Digital Transformation & Innovation
Integrated Defense Systems
Raytheon Company

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Zhuo Zhang via HTCondor-users
Sent: Tuesday, October 23, 2018 11:17 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Zhuo Zhang <zhuo.zhang@xxxxxxxx>
Subject: [External] [HTCondor-users] RESERVED_MEMORY not considered by HTCondor

Hi HTCondor Experts,
Recently we have been experiencing machine crashes because of OOM. Each worker in our cluster has 128GB memory, and each has 3072MB reserved memory that cannot be used by HTCondor:
RESERVED_MEMORY = 3072

In addition, each worker has 1 partitionable slot defined as below:
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE

However, if you add up the dynamic slot sizes shown below in the second-to-last column (MB), you will get 128,723MB. Condor obviously does not subtract 3072MB (RESERVED_MEMORY) from the physical memory of the machine.


--
Zhuo Zhang, ASSISTT
I.M. Systems Group (IMSG), NOAA/NESDIS/STAR
5825 University Research Court, Suite 1500 (IMSG), Cube 1500-11
College Park, MD 20740
Tel: (240) 582-3585 (x23017)