Re: [HTCondor-users] Condor not detecting all cores



Hello Gagan,

In HTCondor 23.0, the new default for execution points is to use a single partitionable slot for jobs. This is much more flexible: the execution point can then run multi-core jobs, or jobs that require more memory than a single slice of the machine would provide. If that is not preferred at your site, you can add

use FEATURE : StaticSlots

to the configuration of your execute points to restore the previous behavior.
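
As a rough sketch of where that line might go: the file name below is hypothetical (an assumption, not something from this thread), and it assumes the usual layout in which the execute point reads drop-in files from a config.d directory. Only the "use FEATURE" line itself comes from the advice above.

```
# Hypothetical drop-in file, e.g. /etc/condor/config.d/99-static-slots.conf
# (the path and file name are assumptions; use whatever your site uses).
#
# Restore the pre-23.0 layout on this execute point: static slots,
# with the machine's resources divided equally among them, instead of
# one partitionable slot that owns everything.
use FEATURE : StaticSlots
```

A change to the slot layout is generally not picked up until the startd on that machine is restarted. If the default is left in place instead, slot1 keeps all 64 cores and jobs carve dynamic slots (slot1_1, slot1_2, and so on) out of it according to their request_cpus and request_memory.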

...Tim

On 10/30/23 10:59, gagan tiwari wrote:
Hi Guys,

We have added a new AMD-based server as an additional execute node to our condor cluster. It has 64 cores and 500G of memory.

But it is being shown as having only slot1, with all 500G of memory, in the output of condor_status (instead of dividing resources equally among all cores, which is the default, and we use the default on all condor execute nodes):

slot1@ms-s10 LINUX      X86_64 Unclaimed Idle      0.000 515706  0+01:49:45

Below is what I see in the StartLog:

10/30/23 19:19:34 VM universe will be tested to check if it is available
10/30/23 19:19:34 History file rotation is enabled.
10/30/23 19:19:34   Maximum history file size is: 20971520 bytes
10/30/23 19:19:34   Number of rotated history files is: 2
10/30/23 19:19:34 Startd will not enforce disk limits via logical volume management.
10/30/23 19:19:34 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
10/30/23 19:19:34   slot type 0: Cpus: 64.000000, Memory: 515706, Swap: 100.00%, Disk: 100.00%
10/30/23 19:19:34 slot1: New machine resource allocated

So it's not distributing resources equally.

On the other execute nodes (which are Intel-based), it's working fine, with the following in the StartLog:

10/30/23 20:54:14 VM universe will be tested to check if it is available
10/30/23 20:54:14 History file rotation is enabled.
10/30/23 20:54:14   Maximum history file size is: 20971520 bytes
10/30/23 20:54:14   Number of rotated history files is: 2
10/30/23 20:54:14 Allocating auto shares for slot type 0: Cpus: auto, Memory: auto, Swap: auto, Disk: auto
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%
10/30/23 20:54:14   slot type 0: Cpus: 1.000000, Memory: 3996, Swap: 6.25%, Disk: 6.25%


Please let me know how to fix it on the AMD-based execute node.

Here are its CPU and memory details:

CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  64
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
BIOS Vendor ID:      Advanced Micro Devices, Inc.
NUMA node0 CPU(s):   0-63

free -g
              total        used        free      shared  buff/cache   available
Mem:            503           1         502           0           0         499
Swap:             3           0           3

Thanks,
Gagan



-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736