Re: [HTCondor-users] totalmemory and totalcpus calculation



Hello Experts,

Any inputs on this?

Thanks & Regards,
Vikrant Aggarwal


On Tue, Jul 30, 2019 at 12:50 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
Hello Experts,

I am digging into memorypercore. I have seen past discussions on the topic, but none of them covers my questions, so I am starting this thread:

- The following output is from one of the pools in our environment. I cannot make sense of the totalcpus and totalmemory values. AFAIU, totalmemory is affected by totalcpus. A few days ago, when I captured the same output, the majority of nodes showed totalcpus as 28, and for those nodes totalmemory was 172032; now only a few of them do. When totalcpus is 27, totalmemory shows the right value (totalcpus * memorypercore), but when totalcpus is 28, totalmemory shows the wrong value. As far as I understand, totalcpus defines the resources available to run the condor jobs.

~~~
$ condor_status -compact -const 'detectedmemory > 250000' -af:h machine memorypercore detectedmemory totalmemory detectedcpus totalcpus totalpssize memory
machine                memorypercore detectedmemory totalmemory detectedcpus totalcpus totalpssize       memory
condornode894.test.com 7130          257923         192510      28           27.0      66757.2           128340
condornode895.test.com 7130          257923         192510      28           27.0      66713.89999999999 114080
condornode896.test.com 7130          257923         192510      28           27.0      66472.8           92690
condornode897.test.com 7130          257923         192510      28           27.0      66854.39999999999 99820
condornode898.test.com 7130          257923         192510      28           27.0      66315.8           163990
condornode899.test.com 7130          257923         192510      28           27.0      66772.39999999999 0
condornode913.test.com 6978          257923         172032      28           28.0      66881.89999999999 151098
condornode927.test.com 6978          257923         172032      28           28.0      67120.2           151098
condornode961.test.com 9253          257923         172032      28           28.0      undefined         144273
condornode963.test.com 9253          257923         172032      28           28.0      5566.57           162779
~~~
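
To make the pattern in the output above easier to see, here is a minimal Python sketch that checks whether totalmemory equals memorypercore * totalcpus for a few of the rows. The row values are copied from the table; the helper name `check` is only illustrative and is not part of HTCondor.

```python
# Rows copied from the condor_status output above:
# (machine, memorypercore, totalmemory, totalcpus)
rows = [
    ("condornode894", 7130, 192510, 27),
    ("condornode899", 7130, 192510, 27),
    ("condornode913", 6978, 172032, 28),
    ("condornode961", 9253, 172032, 28),
]


def check(rows):
    """Return {machine: (memorypercore * totalcpus, matches_totalmemory)}."""
    results = {}
    for machine, per_core, total_memory, total_cpus in rows:
        expected = per_core * total_cpus
        results[machine] = (expected, expected == total_memory)
    return results


for machine, (expected, ok) in check(rows).items():
    print(machine, expected, "matches" if ok else "differs")
```

The 27-CPU rows match, while the 28-CPU rows do not. In the 28-CPU rows, 172032 / 28 is exactly 6144, whatever memorypercore reports.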

- We are using the following expressions to calculate the memory per core and the memory.

~~~
minimum_memory_per_core = 5342
reserved_system_memory = quantize($(TotalPsSize)+1024, {2048})
reserved_system_cpus = 1

available_memory = int(quantize($(DETECTED_MEMORY)-$(reserved_system_memory),$(smallest_dimm)))
available_cpus = $(DETECTED_CPUS)-$(reserved_system_cpus)

# ensure that each core gets >= minimum_memory_per_core
NUM_CPUS = int(ifThenElse( $(available_memory)/$(minimum_memory_per_core) > $(available_cpus), $(available_cpus), $(available_memory)/$(minimum_memory_per_core)) )
MemoryPerCore = int($(available_memory) / $(NUM_CPUS))
MEMORY = $(MemoryPerCore) * $(NUM_CPUS)
~~~

Taking condornode913 as an example: it shows the right value for memorypercore, but how does it calculate totalmemory and totalcpus, and why do they keep changing?

~~~
reserved_system_memory = quantize(66882+1024, {2048}) == 69632
available_memory = int(quantize(257923-69632, 4096)) == 188416
available_cpus = 28-1 == 27
NUM_CPUS = int(ifThenElse(188416/5342 > 27, 27, 188416/5342)) == 27
MemoryPerCore = int(188416/27) == 6978
MEMORY = (6978 * 27) == 188406
~~~
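
The same calculation can be reproduced outside of condor as a rough Python sketch. It assumes that `quantize()` rounds up to the next multiple of its second argument, and that `smallest_dimm` is 4096 as in the worked example above (its definition is not shown in the config snippet). The function names are illustrative only.

```python
import math


def quantize(value, step):
    # Assumption: like ClassAd quantize(), round up to the next multiple of step.
    return math.ceil(value / step) * step


def slot_resources(detected_memory, detected_cpus, total_ps_size,
                   minimum_memory_per_core=5342,
                   reserved_system_cpus=1,
                   smallest_dimm=4096):  # 4096 is taken from the worked example
    reserved_system_memory = quantize(total_ps_size + 1024, 2048)
    available_memory = int(
        quantize(detected_memory - reserved_system_memory, smallest_dimm))
    available_cpus = detected_cpus - reserved_system_cpus

    # Ensure that each core gets >= minimum_memory_per_core.
    cpus_by_memory = available_memory / minimum_memory_per_core
    if cpus_by_memory > available_cpus:
        num_cpus = int(available_cpus)
    else:
        num_cpus = int(cpus_by_memory)

    memory_per_core = int(available_memory / num_cpus)
    return {
        "available_memory": available_memory,
        "NUM_CPUS": num_cpus,
        "MemoryPerCore": memory_per_core,
        "MEMORY": memory_per_core * num_cpus,
    }


# Values for condornode913 from the condor_status output.
print(slot_resources(257923, 28, 66881.89999999999))
```

Note that 6978 * 27 is 188406, slightly below available_memory (188416), because MemoryPerCore is truncated to an integer. Neither value matches the totalmemory of 172032 reported for this node.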

- Also, where does it get the totalpssize value from? I did not find any expression that calculates it, and I don't think it is environment-specific. Some references point to pssize; probably both are the same.


Regards,
Vikrant