
Re: [HTCondor-users] Condor not detecting all cores



What you are seeing is the new default slot configuration: a single partitionable slot.


All of the cores are in fact being detected.  This line

10/30/23 19:19:34   slot type 0: Cpus: 64.000000, Memory: 515706, Swap: 100.00%, Disk: 100.00%

shows that you have a single p-slot with 64 cores and about 515 GB of memory.

If all of your jobs want 1 core, then each time one of those 1-core jobs is started, a 1-core dynamic slot will be carved off of the p-slot to run the job.  As many as 64 1-core dynamic slots can be created, and they will exist as long as there are 1-core jobs to run.

See the manual for more information. 
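If you really want the old one-static-slot-per-core layout back, a configuration sketch like the following on the execute node should restore it (these are the long-standing slot-type knobs; double-check your version's manual and release notes, since newer releases also offer a `use FEATURE : StaticSlots` metaknob for the same purpose):

```
# Sketch: restore the classic layout of 64 single-core static slots,
# each taking an equal share of memory and disk, instead of one
# partitionable slot. Run condor_reconfig (or restart the startd)
# after changing this.
SLOT_TYPE_1              = cpus=1
SLOT_TYPE_1_PARTITIONABLE = False
NUM_SLOTS_TYPE_1         = 64
```

That said, the partitionable-slot default usually needs no fixing: it schedules the same 64 cores, just carving them out on demand.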




-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of gagan tiwari <gagan.tiwari@xxxxxxxxxxxxxxxxxx>
Sent: Monday, October 30, 2023 10:59 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Condor not detecting all cores
 
Hi Guys,
                   We have added a new AMD based server as an additional execute node to our condor cluster.  It has 64 cores and 500G memory. 

But it's being shown as having only slot1 with all 500 GB of memory in the output of condor_status (instead of dividing resources equally among all cores, which is the default, and we use the default on all condor execute nodes).

slot1@ms-s10 LINUX      X86_64 Unclaimed Idle      0.000 515706  0+01:49:45

Below is what I see in the StartLog:

10/30/23 19:19:34 VM universe will be tested to check if it is available
10/30/23 19:19:34 History file rotation is enabled.
10/30/23 19:19:34   Maximum history file size is: 20971520 bytes
10/30/23 19:19:34   Number of rotated history files is: 2
10/30/23 19:19:34 Startd will not enforce disk limits via logical volume management.
10/30/23 19:19:34 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
10/30/23 19:19:34   slot type 0: Cpus: 64.000000, Memory: 515706, Swap: 100.00%, Disk: 100.00%
10/30/23 19:19:34 slot1: New machine resource allocated

So it's not distributing resources equally.

On the other execute nodes (which are Intel-based), it's working fine, with the following in the StartLog:

10/30/23 20:54:14 VM universe will be tested to check if it is available
10/30/23 20:54:14 History file rotation is enabled.
10/30/23 20:54:14   Maximum history file size is: 20971520 bytes
10/30/23 20:54:14   Number of rotated history files is: 2
10/30/23 20:54:14 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%


Please let me know how to fix it on the AMD-based execute node.

Here are its CPU and memory details:

CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  64
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Advanced Micro Devices, Inc.
NUMA node0 CPU(s):   0-63

free -g
              total        used        free      shared  buff/cache   available
Mem:            503           1         502           0           0         499
Swap:             3           0           3

Thanks,
Gagan