
Re: [Condor-users] making condor aware of isolcpus/cpu subset



On 6/15/2012 10:27 AM, Vlad wrote:
Greetings,

I am using Condor 7.8 on RHEL 5 machines that use isolcpus boot option
to exclude a couple of physical cores from regular process scheduling
(these cores are dedicated to I/O processing). It appears that Condor is
having a hard time recognizing that these CPUs are not available for job
scheduling: it tries to run its benchmark on each core and only figures
out that a core is unavailable when it times out (takes hours to settle).

We have attempted to configure Condor with 1 slot per available core,
but there does not seem to be a way to bind slots to specific physical
core indices -- is that true or have we just not found the right
configuration options? I would appreciate any insight into how to make
Condor aware of a restricted cpuset available for scheduling.

Thank you in advance,
Vlad

Hi Vlad -

I think Condor can do what you want. Condor v7.8 can indeed bind slots to specific physical cores; below I have copied the config knobs of interest from the Manual. So I think/hope you can achieve what you want by setting in condor_config
   NUM_CPUS = X
(where X is the number of physical cores you want Condor to control), and then setting the CPU affinity knobs as documented below. I think you will have to do a condor_restart (I don't think CPU affinity edits take effect with just a reconfig, but I cannot recall for certain off the top of my head).

Hope the above helps,
Todd

ENFORCE_CPU_AFFINITY A boolean value that defaults to False. When False, the affinity of jobs and their descendants to a CPU is not enforced. When True, Condor jobs and their descendants maintain their affinity to a CPU. When True, more fine-grained affinities may be specified with SLOT<N>_CPU_AFFINITY.

SLOT<N>_CPU_AFFINITY A comma-separated list of cores to which a Condor job running on a specific slot, given by the value of <N>, shows affinity. Note that slots are numbered beginning with the value 1, while CPU cores are numbered beginning with the value 0. This affinity list only takes effect if ENFORCE_CPU_AFFINITY = True.
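
As a rough illustration of how these knobs might fit together, here is a sketch for a hypothetical 8-core machine booted with isolcpus=6,7, so that cores 0 through 5 remain available for jobs. The core count and core numbers are assumptions chosen for the example, not details from Vlad's setup; adjust them to match the cores actually excluded on your machines.

   # Assumed layout: 8 cores total, cores 6 and 7 isolated for I/O.
   # Advertise only the 6 cores Condor is allowed to use.
   NUM_CPUS = 6

   # Required for the per-slot affinity lists below to take effect.
   ENFORCE_CPU_AFFINITY = True

   # Slots are numbered from 1, cores from 0.
   SLOT1_CPU_AFFINITY = 0
   SLOT2_CPU_AFFINITY = 1
   SLOT3_CPU_AFFINITY = 2
   SLOT4_CPU_AFFINITY = 3
   SLOT5_CPU_AFFINITY = 4
   SLOT6_CPU_AFFINITY = 5

If the isolated cores were, say, 0 and 1 instead, the same pattern should apply with the affinity values shifted to 2 through 7.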