Re: [Condor-users] making condor aware of isolcpus/cpu subset



A belated follow-up from me: the AFFINITY-related settings do work and address the issue. This seems to be a useful new capability in Condor v7.8.

Thanks,
Vlad

On Jun 15, 2012, at 1:07 PM, Todd Tannenbaum wrote:

> On 6/15/2012 10:27 AM, Vlad wrote:
>> Greetings,
>> 
>> I am using Condor 7.8 on RHEL 5 machines that use isolcpus boot option
>> to exclude a couple of physical cores from regular process scheduling
>> (these cores are dedicated to I/O processing). It appears that Condor is
>> having a hard time recognizing that these CPUs are not available for job
>> scheduling: it tries to run its benchmark on each core and only figures
>> out that a core is unavailable when it times out (takes hours to settle).
>> 
>> We have attempted to configure Condor with 1 slot per available core,
>> but there does not seem to be a way to bind slots to specific physical
>> core indices -- is that true or have we just not found the right
>> configuration options? I would appreciate any insight into how to make
>> Condor aware of a restricted cpuset available for scheduling.
>> 
>> Thank you in advance,
>> Vlad
> 
> Hi Vlad -
> 
> I think Condor can do what you want.  Condor v7.8 can indeed bind slots to specific physical cores; below I have copied the config knobs of interest out of the Manual. So I think/hope you can easily achieve what you want by setting in condor_config
>   NUM_CPUS = X
> (where X is the number of physical cores you want Condor to control), and then setting the CPU affinity knobs as documented below.  I think you will have to do a condor_restart (I don't think CPU affinity edits take effect with just a reconfig, but I cannot recall for certain off the top of my head).
> 
> Hope the above helps,
> Todd
> 
> ENFORCE_CPU_AFFINITY A boolean value that defaults to False. When False, the affinity of jobs and their descendants to a CPU is not enforced. When True, Condor jobs and their descendants maintain their affinity to a CPU. When True, more fine-grained affinities may be specified with SLOT<N>_CPU_AFFINITY.
> 
> SLOT<N>_CPU_AFFINITY A comma-separated list of cores to which a Condor job running on a specific slot, given by the value of <N>, shows affinity. Note that slots are numbered beginning with the value 1, while CPU cores are numbered beginning with the value 0. This affinity list only takes effect if ENFORCE_CPU_AFFINITY = True.
>
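
For illustration, the knobs quoted above might be combined in condor_config roughly as follows. This is only a sketch: the machine layout (8 cores, with cores 0 and 1 isolated via isolcpus and cores 2-7 left for jobs) is a made-up example, not the setup from this thread, and the one-core-per-slot mapping is just one possible choice.

```
# Hypothetical host: 8 cores total; cores 0-1 are reserved via isolcpus,
# so Condor is told about the remaining 6 cores only.
NUM_CPUS = 6

# Required for the per-slot affinity lists below to take effect.
ENFORCE_CPU_AFFINITY = True

# Slots are numbered from 1, cores from 0; each slot is pinned to one
# non-isolated core (assumed mapping).
SLOT1_CPU_AFFINITY = 2
SLOT2_CPU_AFFINITY = 3
SLOT3_CPU_AFFINITY = 4
SLOT4_CPU_AFFINITY = 5
SLOT5_CPU_AFFINITY = 6
SLOT6_CPU_AFFINITY = 7
```

As Todd notes, a condor_restart is probably needed after changing these; a plain reconfig may not pick up affinity edits.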