
[HTCondor-users] Condor not detecting all cores



Hi Guys,

We have added a new AMD-based server as an additional execute node to our Condor cluster. It has 64 cores and 500G memory.

But it is being shown as having only slot1, with all 500G of memory, in the output of condor_status (instead of dividing resources equally among all cores, which is the default and what we use on all Condor execute nodes):

slot1@ms-s10 LINUX      X86_64 Unclaimed Idle      0.000 515706  0+01:49:45

Below is what I see in the StartLog:

10/30/23 19:19:34 VM universe will be tested to check if it is available
10/30/23 19:19:34 History file rotation is enabled.
10/30/23 19:19:34   Maximum history file size is: 20971520 bytes
10/30/23 19:19:34   Number of rotated history files is: 2
10/30/23 19:19:34 Startd will not enforce disk limits via logical volume management.
10/30/23 19:19:34 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
10/30/23 19:19:34   slot type 0: Cpus: 64.000000, Memory: 515706, Swap: 100.00%, Disk: 100.00%
10/30/23 19:19:34 slot1: New machine resource allocated

So it is not distributing resources equally across the cores.
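
For reference, the equal split expected on this node can be sketched with simple arithmetic. This is only an illustration using the totals reported in the StartLog above (64 CPUs, 515706 MB). The helper name `equal_split` is hypothetical, and rounding memory down is an assumption, not a statement of how the startd actually rounds.

```python
def equal_split(total_cpus: int, total_memory_mb: int) -> dict:
    """Divide a machine's resources evenly into one-core slots."""
    slots = total_cpus  # one slot per core, as on the other execute nodes
    return {
        "slots": slots,
        "cpus_per_slot": total_cpus // slots,
        # Floor division is an assumption made for this sketch.
        "memory_mb_per_slot": total_memory_mb // slots,
        # Share of swap and disk each slot would get, in percent.
        "share_percent": 100.0 / slots,
    }


# Totals taken from the "slot type 0" line in the AMD node's StartLog.
amd = equal_split(total_cpus=64, total_memory_mb=515706)
print(amd)
```

Under these assumptions each of the 64 slots would get 1 CPU and roughly 8 GB of memory, rather than one slot holding everything.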

On the other execute nodes (which are Intel-based), it works fine, with the following in the StartLog:

10/30/23 20:54:14 VM universe will be tested to check if it is available
10/30/23 20:54:14 History file rotation is enabled.
10/30/23 20:54:14   Maximum history file size is: 20971520 bytes
10/30/23 20:54:14   Number of rotated history files is: 2
10/30/23 20:54:14 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
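
The slot allocation lines in a StartLog excerpt can be tallied with a short script, which makes the difference between the two nodes easy to see. This is a minimal sketch, assuming the line format shown in the excerpts above. `summarize_slots` and the sample strings are illustrative names, not part of HTCondor, and the Intel sample is shortened to three lines.

```python
import re

# Matches only the per-slot allocation lines, not the "Cpus: auto" header line.
SLOT_LINE = re.compile(
    r"slot type (\d+): Cpus: ([\d.]+), Memory: (\d+), "
    r"Swap: ([\d.]+)%, Disk: ([\d.]+)%"
)


def summarize_slots(log_text: str) -> dict:
    """Count slot allocation lines and total the CPUs and memory they report."""
    rows = [m.groups() for m in SLOT_LINE.finditer(log_text)]
    return {
        "slot_lines": len(rows),
        "cpus": sum(float(r[1]) for r in rows),
        "memory_mb": sum(int(r[2]) for r in rows),
    }


# Sample lines copied from the excerpts in this post.
AMD_LOG = """\
10/30/23 19:19:34 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
10/30/23 19:19:34   slot type 0: Cpus: 64.000000, Memory: 515706, Swap: 100.00%, Disk: 100.00%
"""

INTEL_LOG_SAMPLE = """\
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
"""

print("AMD:  ", summarize_slots(AMD_LOG))
print("Intel:", summarize_slots(INTEL_LOG_SAMPLE))
```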


Please let me know how to fix this on the AMD-based execute node.

Here are its CPU and memory details:

CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  64
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Advanced Micro Devices, Inc.
NUMA node0 CPU(s):   0-63

free -g
              total        used        free      shared  buff/cache   available
Mem:            503           1         502           0           0         499
Swap:             3           0           3

Thanks,
Gagan