
[HTCondor-users] Disk I/O control with cgroup:blkio?



Hi all,

Does anybody have experience using the cgroup blkio controller to limit
a job's disk I/O?

The background is that a user recently submitted a task whose jobs were
primarily doing merges, i.e., heavily churning on the local disk with
reads and writes. When nodes got 'too many' jobs of this type, they
became more or less stuck in I/O wait.
So I have been wondering whether the blkio controller of the condor
cgroups could be tuned to limit each job's I/O, so that we do not waste
too many cycles in I/O wait and other jobs are protected.

As far as I can see, none of the condor cgroups has any throttling
limits set, and every subgroup has the default weight.

Would it be feasible, as a first step, to set some upper limits on the
parent group .../condor.service/blkio.throttle.*, say by taking the
I/O rates from a small benchmark (bps and/or iops?) and adding some
safety margin?
Since all subgroups have the same weight, this might not be the
'fairest' solution. (Would scaling bps/iops by the number of cores
actually be a reasonable assumption, if cores are the basic commodity?)
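
To make the arithmetic concrete, here is a rough Python sketch of what
I have in mind. The benchmark rate, the 20% margin, the core counts,
the device ID 8:0 and all function names are made up for illustration.
Only the "major:minor value" line format is the one the cgroup v1
blkio.throttle.*_device files expect.

```python
# Sketch only: benchmark rate, margin, core counts and device ID are assumptions.

def throttle_limit(benchmark_rate: int, margin_percent: int = 20) -> int:
    """Upper limit for the parent group: benchmark rate plus a safety margin."""
    return benchmark_rate * (100 + margin_percent) // 100


def per_job_limit(node_limit: int, job_cores: int, node_cores: int) -> int:
    """Share of the node-wide limit, scaled by the cores a job has claimed."""
    return node_limit * job_cores // node_cores


def throttle_line(device: str, limit: int) -> str:
    """Line in the 'major:minor value' format of blkio.throttle.*_device."""
    return f"{device} {limit}"


if __name__ == "__main__":
    measured_bps = 200 * 1024 * 1024  # hypothetical benchmark result: 200 MiB/s
    node_bps = throttle_limit(measured_bps)
    job_bps = per_job_limit(node_bps, job_cores=4, node_cores=32)
    print(throttle_line("8:0", node_bps))  # candidate value for the parent group
    print(throttle_line("8:0", job_bps))   # candidate value for a 4-core job
```

The resulting lines would then be written to, e.g.,
blkio.throttle.read_bps_device of the respective group. The same
calculation applies to iops with the *_iops_device files.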

Maybe somebody has some suggestions or experiences in this direction?

Cheers and thanks,
  Thomas
