
Re: [HTCondor-users] Resource Matching



On 4/4/24 10:54, Matthew T West via HTCondor-users wrote:
Let me try again, at least regarding CPUs & chiplets.

When requesting multiple CPU cores for a single job, can I specify that all the cores come from a single NUMA node or single socket, when the EP is set up to dynamically allocate slots? Maybe that is the default, but I cannot find any info in the docs.


Hi Matt:

There's no good or easy way to do this today in HTCondor. The startd can affinity-lock a job to a cpu-core or a number of cores, but when a job requests more than one core, there is no automatic way in condor to affinity-lock it into some specific geometry. Part of the problem is one of naming. The affinity APIs in the kernel speak in terms of numeric core-ids, but there is no standard for how the number assigned to a core-id relates to its NUMA (or hyperthread) geometry.
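To illustrate the naming problem: on Linux, the only way to learn which core-ids share a NUMA node is to ask the running system, for example by reading /sys/devices/system/node/nodeN/cpulist. The sketch below is not part of HTCondor and assumes a Linux host with that sysfs layout; the function names (parse_cpulist, numa_nodes, pin_to_node) are made up for this example. It only shows the lookup and the kernel affinity call, not anything the startd does.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8-11" into a set of core-ids."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list (e.g. a memory-only node) yields no cores
        if "-" in part:
            lo, hi = part.split("-")
            cores.update(range(int(lo), int(hi) + 1))
        else:
            cores.add(int(part))
    return cores


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_number: set_of_core_ids} as reported by Linux sysfs."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "cpulist")):
        node_dir = os.path.basename(os.path.dirname(path))
        node = int(re.search(r"node(\d+)$", node_dir).group(1))
        with open(path) as f:
            nodes[node] = parse_cpulist(f.read())
    return nodes


def pin_to_node(node, pid=0):
    """Restrict pid (0 means the calling process) to the cores of one NUMA node."""
    os.sched_setaffinity(pid, numa_nodes()[node])


if __name__ == "__main__":
    # Print the node-to-core mapping of this machine; the numbering differs
    # from one system to the next, which is exactly the problem.
    for node, cores in sorted(numa_nodes().items()):
        print(node, sorted(cores))
```

Running this on two different machines will often show different numbering schemes (contiguous ranges on one, interleaved ids on another), so any fixed list of core-ids has to be worked out per host.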

Now, there are hack-arounds (there always are!): if you are willing to forbid jobs from ever running across zones, you can configure a startd, or a p-slot in a startd, to be dedicated to a particular subset of the machine, and use ASSIGN_CPU_AFFINITY to lock that startd or p-slot to that subset of the cpus on the system.

Personally, my strongly held but unfulfilled opinion is that this is all the responsibility of the OS kernel, and *it* should figure out which processes belong together, and schedule them appropriately. But perhaps that is naive.

-greg