Re: [HTCondor-users] Disk I/O control with cgroup:blkio?



Hi Tom,

many thanks, you are right!
It works with BlockIO{Read,Write}Bandwidth and the block device path. Thinking
about it, throttling the whole block device makes much more sense than my
attempt to r/w throttle a partition... :-[

Cheers and thanks,
  Thomas



For completeness, the conf that works for me:
 > /etc/systemd/system/condor.service.d/20-blkio.conf
[Service]
BlockIOReadBandwidth=/dev/disk/by-id/scsi-1234567893  12345678
BlockIOWriteBandwidth=/dev/disk/by-id/scsi-123456789  12345678

I guess that with kernel 4.* the option is
  IO{Read,Write}BandwidthMax
At least, that one seems to work on Fedora 25.
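
To make the two spellings above easier to compare, here is a minimal Python sketch that renders the drop-in text for either the BlockIO{Read,Write}Bandwidth names or the IO{Read,Write}BandwidthMax names. The function name `render_dropin` and its `legacy` flag are made up for illustration; which set of names a given host accepts should be checked against man systemd.resource-control on that host.

```python
def render_dropin(device: str, bytes_per_sec: int, legacy: bool = True) -> str:
    """Return the text of a systemd drop-in that caps read and write bandwidth.

    device        -- path to a whole block device, e.g. a /dev/disk/by-id symlink
    bytes_per_sec -- limit in bytes per second, applied to reads and writes alike
    legacy        -- hypothetical switch: True emits the BlockIO* names,
                     False emits the IO*BandwidthMax names
    """
    if bytes_per_sec <= 0:
        raise ValueError("bytes_per_sec must be positive")
    if legacy:
        keys = ("BlockIOReadBandwidth", "BlockIOWriteBandwidth")
    else:
        keys = ("IOReadBandwidthMax", "IOWriteBandwidthMax")
    lines = ["[Service]"]
    lines += [f"{key}={device} {bytes_per_sec}" for key in keys]
    return "\n".join(lines) + "\n"


# Example with the placeholder device and limit from the conf above:
# print(render_dropin("/dev/disk/by-id/scsi-123456789", 12345678))
```

The result would go into a file such as /etc/systemd/system/condor.service.d/20-blkio.conf; systemd normally needs a `systemctl daemon-reload` and a restart of the service before a changed drop-in takes effect.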

On 2017-08-31 15:36, Tom Downes wrote:
> Thomas:
> 
> I believe you have two issues. (1) Work from man
> systemd.resource-control on your system. On RHEL7, this parameter is (I
> believe) BlockIOReadBandwidth.
> 
> (2) The man page also makes clear that it limits the IO to devices rather
> than to individual filesystems, and you're pointing to a filesystem UUID
> symlink. It does say that it will identify the device associated with
> the filesystem if you point to one. But... I'd just point to a block
> device: /dev/sda (not sda1, sda2, etc.) or the /dev/disk/by-id
> symlink to sda.
> 
> --
> Tom Downes
> Senior Scientist and Data Center Manager
> Center for Gravitation, Cosmology and Astrophysics
> University of Wisconsin-Milwaukee
> 414.229.2678
> 
> On Thu, Aug 31, 2017 at 3:26 AM, Thomas Hartmann
> <thomas.hartmann@xxxxxxx> wrote:
> 
>     Hi Dimitri,
> 
>     many thanks for the info! I will try your config on our SL6 machines.
> 
>     I guess the cgconfig.d conf works only with 2.6 kernels, and for systemd
>     one would need a drop-in. The systemd syntax is also somewhat
>     different (not sure if also better...).
>     From what I just learnt, the systemd-related options are probably now
>       IO{Read,Write}{Bandwidth,IOPS}Max
>     with the device selected by its /dev path [1]. I *assume* that these get
>     translated by systemd into the standard cgroup parameters??
>     Anyway, I am just testing something like [2], but so far the limits do
>     not seem to be propagated to the parent condor cgroup or its slot
>     subgroups [3] :-/
>     Have to fiddle a bit more with systemd...
> 
>     Cheers and thanks,
>       Thomas
> 
> 
>     [1]
>     https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html
>     I stumbled over
>     https://www.certdepot.net/rhel7-get-started-cgroups/
>     but the options the article uses (CPUShares and BlockIOWeight) seem to
>     be legacy nowadays(??)
> 
>     [2]
>     > /etc/systemd/system/condor.service.d/20-blkio.conf
>     [Service]
>     IOReadBandwidthMax=/dev/disk/by-uuid/abcdef-12345-6789  12345678
>     IOWriteBandwidthMax=/dev/disk/by-uuid/abcdef-12345-6789  12345678
> 
>     [3]
>     > /sys/fs/cgroup/blkio/system.slice/condor.service/blkio.throttle.read_bps_device
> 
>     > cat
>     /sys/fs/cgroup/blkio/system.slice/condor.service/condor_var_lib_condor_execute_slot1_2@xxxxxxxxxxxxxxxxx/blkio.throttle.*
>     8:0 Read 0
>     8:0 Write 0
>     8:0 Sync 0
>     8:0 Async 0
>     8:0 Total 0
>     Total 0
>     8:0 Read 0
>     8:0 Write 0
>     8:0 Sync 0
>     8:0 Async 0
>     8:0 Total 0
>     Total 0
> 
> 
>     On 2017-08-30 19:43, Dimitri Maziuk wrote:
>     > We had jobs fail because of too much unzip/untarring and I added
>     >
>     > /etc/cgconfig.d/condor.conf:
>     > group htcondor {
>     >     cpu {}
>     >     cpuacct {}
>     >     memory {}
>     >     freezer {}
>     >     blkio {
>     >         blkio.throttle.write_bps_device = "8:0 104857600
>     > 8:16 104857600";
>     >     }
>     > }
>     >
>     > The errors seem to have disappeared since.
>     >
>     > Note that you have to get the major:minor for each disk you want to
>     > throttle on each node, which could be a bit of a PITA. And the newline
>     > syntax is silly, but that's how you specify multiple disks.
> 
> 
>     _______________________________________________
>     HTCondor-users mailing list
>     To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>     subject: Unsubscribe
>     You can also unsubscribe by visiting
>     https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
>     The archives can be found at:
>     https://lists.cs.wisc.edu/archive/htcondor-users/
> 
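
On Dimitri's point that the major:minor pair has to be looked up for each disk on each node: a minimal Python sketch of that lookup, using only the standard library. The helper names (`format_devnum`, `major_minor`, `throttle_value`) are invented for this example, and the newline-joined output simply follows the multi-disk syntax described in the quoted cgconfig snippet; it has not been checked against cgconfig itself.

```python
import os
import stat


def format_devnum(rdev: int) -> str:
    """Format a raw device number as the 'major:minor' string blkio expects."""
    return f"{os.major(rdev)}:{os.minor(rdev)}"


def major_minor(device_path: str) -> str:
    """Return 'major:minor' for a block device node.

    os.stat follows symlinks, so /dev/disk/by-id/... links resolve to the
    underlying device node.
    """
    st = os.stat(device_path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{device_path} is not a block device")
    return format_devnum(st.st_rdev)


def throttle_value(device_paths, bytes_per_sec: int) -> str:
    """Build a value for blkio.throttle.write_bps_device, one
    'major:minor limit' entry per line, as in the quoted cgconfig example."""
    return "\n".join(f"{major_minor(p)} {bytes_per_sec}" for p in device_paths)


# Example (device names are assumptions about the node's layout):
# print(throttle_value(["/dev/sda", "/dev/sdb"], 104857600))
```

Run on each node, this avoids hard-coding device numbers that may differ between machines.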
