
Re: [HTCondor-users] Disk I/O control with cgroup:blkio?



Thomas:

I believe you have two issues. (1) Work from man systemd.resource-control on your system, since the online page may describe a newer systemd than the one installed. On RHEL7, this parameter is (I believe) BlockIOReadBandwidth.

(2) The man page also makes clear that it limits IO to devices rather than to individual filesystems, and you're pointing to a filesystem UUID symlink. It does say that it will identify the device associated with the filesystem if you point to one. But... I'd just point to a block device: /dev/sda (not sda1, sda2, etc.) or the /dev/disk/by-id symlink to sda.
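
A minimal Python sketch of the device lookup described above, for checking which whole-disk device a by-uuid symlink or a partition belongs to before writing it into a unit file. The function name whole_disk_for and the sys_block parameter are illustrative, not part of systemd or HTCondor. The sketch assumes the usual Linux sysfs layout, in which a partition's directory contains a "partition" attribute and sits inside its parent disk's directory.

```python
import os


def whole_disk_for(path, sys_block="/sys/class/block"):
    """Return the /dev path of the whole disk behind a device node or symlink.

    `path` may be a symlink such as /dev/disk/by-uuid/<uuid> or a
    partition node such as /dev/sda1. `sys_block` is only a parameter
    so the lookup can be pointed at a different tree (an assumption
    made for this sketch).
    """
    dev = os.path.realpath(path)       # e.g. /dev/sda1
    name = os.path.basename(dev)       # e.g. sda1
    sys_dir = os.path.realpath(os.path.join(sys_block, name))
    # Assumed sysfs layout: partitions have a "partition" attribute and
    # live inside the parent disk's directory (.../block/sda/sda1).
    if os.path.exists(os.path.join(sys_dir, "partition")):
        name = os.path.basename(os.path.dirname(sys_dir))
    return os.path.join(os.path.dirname(dev), name)
```

Calling whole_disk_for("/dev/disk/by-uuid/<uuid>") on a node would then be expected to return something like /dev/sda, which is the form suggested above for the bandwidth settings.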

--
Tom Downes
Senior Scientist and Data Center Manager
Center for Gravitation, Cosmology and Astrophysics
University of Wisconsin-Milwaukee
414.229.2678

On Thu, Aug 31, 2017 at 3:26 AM, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
Hi Dimitri,

many thanks for the info! I will try your config on our SL6 machines.

I guess the cgconfig.d conf works only with 2.6 and for systemd one
would need a drop-in. And as for systemd, the syntax is somewhat
different (not sure if also better...)
From what I just learnt, the systemd-related options are probably now
 IO{Read,Write}{Bandwidth,IOPS}Max
with the device selected by its /dev path [1]. I *assume* that these get
translated by systemd into the standard cgroup parameters??
Anyway, I am just testing something like [2] but so far the limits seem
not to be propagated towards the parent condor cgroup or its slot
subgroups [3] :-/
Have to fiddle a bit more with systemd...

Cheers and thanks,
 Thomas


[1]
https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html
I stumbled over
https://www.certdepot.net/rhel7-get-started-cgroups/
but the options the article uses (CPUShares and BlockIOWeight) seem to
be legacy nowadays(??)

[2]
> /etc/systemd/system/condor.service.d/20-blkio.conf
[Service]
IOReadBandwidthMax=/dev/disk/by-uuid/abcdef-12345-6789 12345678
IOWriteBandwidthMax=/dev/disk/by-uuid/abcdef-12345-6789 12345678

[3]
> /sys/fs/cgroup/blkio/system.slice/condor.service/blkio.throttle.read_bps_device

> cat /sys/fs/cgroup/blkio/system.slice/condor.service/condor_var_lib_condor_execute_slot1_2@xxxxxxxxxxxxxxxxx/blkio.throttle.*
8:0 Read 0
8:0 Write 0
8:0 Sync 0
8:0 Async 0
8:0 Total 0
Total 0
8:0 Read 0
8:0 Write 0
8:0 Sync 0
8:0 Async 0
8:0 Total 0
Total 0
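
One way to check propagation across the service cgroup and all slot subgroups at once is sketched below in Python. It assumes the cgroup v1 blkio layout, where limits live in the blkio.throttle.*_device files (read_bps_device, write_bps_device, read_iops_device, write_iops_device) as "<major>:<minor> <limit>" lines; the Read/Write/Sync/Async lines shown above look like the blkio.throttle.io_service_bytes and io_serviced counters rather than limits. The function name read_throttle_limits is illustrative.

```python
import glob
import os


def read_throttle_limits(cgroup_dir):
    """Collect blkio throttle limits from cgroup_dir and its subgroups.

    Returns a dict mapping each limit file (path relative to
    cgroup_dir) to a list of (device, limit) tuples. An empty list
    means no limit is set in that file.
    """
    limits = {}
    # "**" with recursive=True also matches cgroup_dir itself.
    pattern = os.path.join(glob.escape(cgroup_dir), "**",
                           "blkio.throttle.*_device")
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path) as fh:
            # Assumed format: one "<major>:<minor> <limit>" per line.
            entries = [tuple(line.split()) for line in fh if line.strip()]
        limits[os.path.relpath(path, cgroup_dir)] = entries
    return limits
```

Run against /sys/fs/cgroup/blkio/system.slice/condor.service, an entry such as ("8:0", "12345678") for the service and for each slot would indicate that the limit reached them; empty lists everywhere would match the behaviour described above.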


On 2017-08-30 19:43, Dimitri Maziuk wrote:
> We had jobs fail because of too much unzip/untarring and I added
>
> /etc/cgconfig.d/condor.conf:
> group htcondor {
>    cpu {}
>    cpuacct {}
>    memory {}
>    freezer {}
>    blkio {
>       blkio.throttle.write_bps_device = "8:0 104857600
> 8:16 104857600";
>    }
> }
>
> The errors seem to have disappeared since.
>
> Note that you have to get the major:minor for each disk you want to
> throttle on each node, which could be a bit of a PITA. And the newline
> syntax is silly, but that's how you specify multiple disks.
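
The per-node major:minor lookup mentioned above could be scripted along these lines. This is a sketch in Python, not something from the thread: the helper names (rdev_to_major_minor, major_minor, throttle_value) and the lookup parameter are made up for illustration, and the output is only the newline-separated string that would go between the quotes of blkio.throttle.write_bps_device.

```python
import os
import stat


def rdev_to_major_minor(rdev):
    """Format a raw device number as "major:minor"."""
    return f"{os.major(rdev)}:{os.minor(rdev)}"


def major_minor(dev_path):
    """Return "major:minor" for a block device node such as /dev/sda."""
    st = os.stat(dev_path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{dev_path} is not a block device")
    return rdev_to_major_minor(st.st_rdev)


def throttle_value(dev_paths, bytes_per_sec, lookup=major_minor):
    """Build one "<major>:<minor> <limit>" line per disk, newline-separated.

    `lookup` is a hypothetical hook so the device lookup can be
    replaced, for example when generating configs for another node.
    """
    return "\n".join(f"{lookup(p)} {bytes_per_sec}" for p in dev_paths)
```

For example, throttle_value(["/dev/sda", "/dev/sdb"], 104857600) would produce the two-line value used in the config above on a node where those disks are 8:0 and 8:16.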

