
Re: [HTCondor-users] CPU Affinity in condor v8.9.1



Shawn,
	I think cgroups are the right solution for this. Try something
like the following to limit the total memory and CPU usage of condor
and all of the processes it spawns on your CentOS 7 systems:

/etc/systemd/system/condor.service.d/cgroup.conf

[Service]
# Turn on per-service memory accounting for condor.service
MemoryAccounting = true
# Cap the "htcondor" cgroup at 186G of RAM (and RAM+swap) and 20 CPU cores
ExecStartPost    = /bin/bash -c "cgcreate -g *:htcondor; cgset -r memory.limit_in_bytes=186G -r memory.memsw.limit_in_bytes=186G -r cpu.cfs_quota_us=2000000 /htcondor"
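
For the job processes themselves to be counted against this group,
condor's cgroup base has to match. If I remember correctly the default
is already "htcondor", but it is worth double-checking in your condor
configuration:

BASE_CGROUP = htcondor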

After restarting the condor service you should then see these limits
in /sys/fs/cgroup/*/htcondor and be able to dynamically change them.
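
For example (a rough sketch; the 15-core value below is only an
illustration of changing the cap on the fly):

systemctl daemon-reload
systemctl restart condor
cgget -r cpu.cfs_quota_us -r memory.limit_in_bytes /htcondor
cgset -r cpu.cfs_quota_us=1500000 /htcondor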

If you then run a Condor job like "/bin/stress -c 50" you should be
able to see with systemd-cgtop that CPU utilization is capped at
20 cpu-cores, and similarly run a large memory test job to confirm
that the expected memory limits are in place.
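
A throwaway submit description for that CPU test might look like the
following (file names are placeholders; the point is that total usage
stays under the 20-core quota no matter how many threads stress starts):

universe     = vanilla
executable   = /bin/stress
arguments    = -c 50
request_cpus = 1
output       = stress.out
error        = stress.err
log          = stress.log
queue

Submit it with condor_submit and watch systemd-cgtop on the execute node.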


Instead of a static quota on condor's CPU usage, you could also make
sure your priority non-condor services run in their own cgroup and
grant that cgroup a much higher cpu.shares value, so they are never
starved for CPU cycles regardless of what condor jobs want.
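
On CentOS 7 that can be another small drop-in for the service in
question, e.g. for the ceph OSDs (the path and share value below are
just an example; the kernel default cpu.shares is 1024):

/etc/systemd/system/ceph-osd@.service.d/cpu.conf

[Service]
CPUAccounting = true
CPUShares     = 8192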

Thanks.


> On Oct 18, 2019, at 6:48 AM, Shawn A Kwang <kwangs@xxxxxxx> wrote:
> 
> Greg,
> 
> Thanks for the response. Here is the issue with "NUM_CPUS".
> 
> I have attached the partitionable slot configuration that Tom put
> together for the cluster. I haven't touched this since he moved on. You
> can see at the top he put:
> 
> num_cpus = 2 * $(DETECTED_CPUS)
> 
> I have no clue as to why this was done, but I suspect it has to do with
> the partitionable slot configurations in the rest of this file, which
> looks to partition the cluster into two partitions: one seems to be
> dedicated to the 'online_cbc_gstlal_inspiral' analysis and the other for
> all other jobs.
> 
> Thus I don't know if I should be changing this setting, which is one
> reason I looked into cgroups and other CPU affinity settings.
> 
> Tom also set the RAM in this file, which is a reason I am
> investigating cgroups for memory-limiting condor as well as
> CPU-limiting it.
> 
> Sincerely,
> Shawn
> 
> On 10/17/19 3:39 PM, Greg Thain wrote:
>> On 10/17/19 11:39 AM, Shawn A Kwang wrote:
>>> In Condor (v8.9.1) how do I assign CPU affinity to jobs on the compute
>>> nodes with 24 cores? Let's say I want to limit condor to using 20 cores
>>> (0-19) for users' jobs. It should be noted that the cluster is using
>>> partitionable slots.
>>> 
>>> Bigger picture: I wish to limit condor's resources because the compute
>>> nodes run alongside the ceph-osd daemons, for which I want to 'reserve'
>>> a certain amount of RAM and CPU.
>> 
>> 
>> Shawn:
>> 
>> What I would do on this machine is set
>> 
>> 
>> NUM_CPUS = 20
>> 
>> in the htcondor config.
>> 
>> This will tell htcondor that it only has 20 cores to work with (but not
>> which physical ones), and condor will only dole out 20 cores worth of
>> work.  With cgroups, if there is contention for all the cores on the
>> system, the sum of the condor jobs shouldn't exceed 20 cores worth, but
>> the kernel is free to pick which physical cores to use, leaving the rest
>> for ceph or other system daemons.
>> 
>> 
>> -greg
>> 
> 
> 
> -- 
> Associate Scientist
> Center for Gravitation, Cosmology, and Astrophysics
> University of Wisconsin-Milwaukee
> office: +1 414 229 4960
> kwangs@xxxxxxx
> <50slot.txt>
> 
> 

--
Stuart Anderson
sba@xxxxxxxxxxx