Re: [HTCondor-users] HT cores utilized to 100% although HT core count is false

Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

On Fri, Jan 15, 2021 at 11:11 AM Tom Downes <tpdownes@xxxxxxxxx> wrote:

Thomas:

CGroups allows you to set hard limits on CPU if you wish.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-cpu

There is a lot of movement in cgroups, so refer to your own OS kernel docs where you can. The approach in the RHEL 6 link above should work on most contemporary OSes. On modern versions of systemd (this excludes CentOS 7), you can set this with the systemd directive CPUQuota, but you'd have to use "condor.service" as the parent cgroup for jobs. Otherwise you have to manually script up a solution for your htcondor cgroup.
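As a rough sketch of the CPUQuota route: systemd expresses CPUQuota as a percentage of one CPU, so values above 100% allot more than one CPU's worth of time. The helper below only builds the text of a drop-in file. The function name and the idea of placing the result in a drop-in such as /etc/systemd/system/condor.service.d/cpuquota.conf are assumptions for illustration, and it only caps jobs if their cgroups actually live under condor.service.

```python
def cpu_quota_dropin(cores: int) -> str:
    """Return systemd drop-in text capping a service at `cores` CPUs' worth of time.

    Hypothetical helper: CPUQuota is a percentage of a single CPU,
    so N full CPUs are written as N * 100 percent.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return f"[Service]\nCPUQuota={cores * 100}%\n"
```

For a 48-core node, `cpu_quota_dropin(48)` yields a `[Service]` section with `CPUQuota=4800%`. After installing such a drop-in you would still need `systemctl daemon-reload` and a restart of the service for it to apply.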

Alternatively, and this might be best for your use case: set the cgroup's CPU set to the kernel-exposed cores that map to independent physical cores. This will completely preclude the kernel from considering the "fake" cores when scheduling your threads.
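One way to find such a CPU set, sketched under assumptions: on Linux, each logical CPU exposes its SMT siblings in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list (for example "0,48" or "0-1"). Picking the lowest-numbered sibling of each group gives one logical CPU per physical core. The function names below are hypothetical, and writing the result into the cgroup's cpuset.cpus is left out on purpose.

```python
import glob
import re


def parse_cpulist(text: str) -> list[int]:
    """Parse a kernel cpulist string such as '0,48' or '0-1,4' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_thread_per_core(siblings_by_cpu: dict[int, str]) -> list[int]:
    """Keep the lowest-numbered logical CPU from every sibling group."""
    return sorted({parse_cpulist(s)[0] for s in siblings_by_cpu.values()})


def read_sibling_lists() -> dict[int, str]:
    """Read the sibling lists from sysfs (Linux only; empty dict elsewhere)."""
    result = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"/cpu(\d+)/", path).group(1))
        with open(path) as fh:
            result[cpu] = fh.read().strip()
    return result
```

Calling `one_thread_per_core(read_sibling_lists())` on the node returns the candidate CPU IDs. How they are numbered depends on the hardware and kernel, so check the result before applying it.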

Whether any of this actually helps your applications is another matter, but these are the ways of accomplishing what you want.

Tom
On Fri, Jan 15, 2021 at 10:51 AM Greg Thain <gthain@xxxxxxxxxxx> wrote:
Hi Thomas:

When you set

COUNT_HYPERTHREAD_CPUS = false

HTCondor will only advertise as many cores as there are physical cores. Whether the kernel will choose to schedule processes only on the physical cores is kind of up to the Linux kernel. If you absolutely want to prohibit the kernel from ever running a process using hyperthreads, it might be best to disable hyperthreading in the BIOS, but I understand that's more work than merely setting a condor knob.

As you see from your cgroups, an HTCondor with root will set cpu.shares. Note that cpu.shares isn't a hard limit, but only comes into play when there is contention. That is, let's say on your machine you have 48 slots, all running jobs that have requested and been allocated one core each. If 47 of those jobs are idle (maybe waiting on I/O), but one job launched 96 cpu-bound threads, the Linux kernel scheduler may run all 96 of those threads concurrently. If the 47 idle jobs suddenly become cpu-bound again, the Linux scheduler will throttle the 96-thread job back to one core.
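To make the "only under contention" behaviour concrete, here is a deliberately simplified model: a weighted max-min split of a fixed CPU capacity among groups. It is not how the kernel scheduler is implemented (it ignores per-CPU run queues, load balancing and cgroup hierarchy). The function name, the weight of 100 per job and the capacity of 48 cores are assumptions chosen to mirror the scenario above.

```python
def share_cpu(capacity, weights, demands):
    """Split `capacity` cores among groups by weight, never exceeding demand.

    Simplified model: a group that wants less than its weighted fair share
    gets what it wants, and the leftover is re-split among the others.
    """
    alloc = [0.0] * len(demands)
    active = {i for i, d in enumerate(demands) if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_w = sum(weights[i] for i in active)
        fair = {i: remaining * weights[i] / total_w for i in active}
        satisfied = [i for i in active if demands[i] <= fair[i] + 1e-9]
        if not satisfied:
            # Everyone left wants more than their share: hand out the shares.
            for i in active:
                alloc[i] = fair[i]
            break
        for i in satisfied:
            alloc[i] = float(demands[i])
            remaining -= demands[i]
            active.remove(i)
    return alloc


weights = [100] * 48
# 47 idle jobs, one job with 96 runnable threads: no contention.
idle = share_cpu(48, weights, [0] * 47 + [96])
# All 47 other jobs become cpu-bound with one thread each: contention.
busy = share_cpu(48, weights, [1] * 47 + [96])
```

In this model the 96-thread job receives the whole capacity in the idle case (`idle[47]` is 48.0) and is pushed back to one core's worth once the other jobs are busy (`busy[47]` is 1.0), which is the effect described above.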

Now, whether to use or disable hyperthreads depends on your needs. Enabling hyperthreads, in general, increases throughput, at the cost of the performance and per-job memory of individual jobs. There is no free lunch.

-greg

On 1/15/21 8:30 AM, Thomas Hartmann wrote:
Hi all,

I am currently wondering about a few nodes that show utilization on all (HT) cores but should be using only 50%, i.e., just the physical core count.

The nodes have AMD Epycs with HT/SMT cores active, but since we have
COUNT_HYPERTHREAD_CPUS = false
set, Condor should be using only 50% of the (virtual) core count [1], right?

What worries me a bit is that the CPU time shares of the jobs look good [2], i.e., currently just <48 single-core jobs with a relative weight of '100'. However, I am no longer sure how the kernel distributes the CPU time slots here, if the parent's relative share is 100%(?) of the overall(??) time share.

Is the CPU time weighting maybe misleading here, if one tries to 'match' only for the physical core count?

Cheers and thanks for ideas,
Thomas

[1]
COUNT_HYPERTHREAD_CPUS = false
...
DETECTED_CORES = 96
DETECTED_CPUS = 48
DETECTED_MEMORY = 257656
DETECTED_PHYSICAL_CPUS = 48
..
NUM_CPUS = $(DETECTED_CPUS)

[2]
[root@batch1071 htcondor]# cat /sys/fs/cgroup/cpu,cpuacct/cpu.shares
1024
[root@batch1071 htcondor]# cat /sys/fs/cgroup/cpu,cpuacct/htcondor/cpu.shares
1024
[root@batch1071 htcondor]# cat /sys/fs/cgroup/cpu,cpuacct/htcondor/condor_var_lib_condor_execute_slot*/cpu.shares | sort | wc -l
45
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
