
[HTCondor-users] soft/hard limiting cpu.shares ?



Hi all,

a short question regarding job core-time scaling via cgroup cpu.shares:

Am I right that the relative share of a job's cgroup only limits it with respect to the total core-scaled CPU time?

For context, we run our nodes with 2x hyperthreading enabled for simplicity, since we use the same machines for production jobs as well as for user-job sub-clusters.

Since users occasionally have odd jobs (that tend to work better without overbooking), on the user nodes we broker only half of the HT cores for jobs.

Now, the condor parent cgroup is assigned
  htcondor/cpu.shares = 1024
with respect to the total system share of
  cpu.shares = 1024
so all condor child processes (without further sub-groups) could in principle use up to 100% of the total HT-core-scaled CPU time.
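For reference, this is roughly how I inspect the weights on a node (a minimal sketch assuming cgroup v1 with the cpu controller mounted at /sys/fs/cgroup/cpu and the HTCondor parent group named "htcondor"; under contention each top-level cgroup is weighted by its shares divided by the sum over its siblings):

  from pathlib import Path

  CPU_ROOT = Path("/sys/fs/cgroup/cpu")   # assumed cgroup v1 mount point

  def shares(cg):
      # cpu.shares of a single cgroup directory
      return int((cg / "cpu.shares").read_text())

  siblings = [d for d in CPU_ROOT.iterdir()
              if d.is_dir() and (d / "cpu.shares").exists()]
  total = sum(shares(d) for d in siblings)

  for d in sorted(siblings):
      w = shares(d) / total
      print(f"{d.name:40s} cpu.shares={shares(d):5d}  weight under contention={w:.1%}")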

A single-core job gets a relative share like
  htcondor/condor_var_lib_condor_execute_slot2_15@xxxxxxxxxxxxxxx/cpu.shares = 100
in a setup where we broker only 50% of the total HT-core-scaled time, as far as I can see.
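To make my reading of these numbers concrete, a back-of-the-envelope sketch (core counts and share values are illustrative assumptions, not values read from a node): under full contention a slot is guaranteed its shares divided by the sum of its siblings' shares, times whatever fraction the htcondor parent gets at its own level.

  logical_cores  = 32      # HT enabled, 2x
  brokered_slots = 16      # we only broker half of the HT cores
  slot_shares    = 100     # cpu.shares of one single-core slot cgroup

  sibling_total   = brokered_slots * slot_shares   # 16 equally weighted slots
  parent_fraction = 1.0    # assume the htcondor parent wins its whole level

  slot_fraction = parent_fraction * slot_shares / sibling_total
  print(f"guaranteed share of one slot: {slot_fraction:.1%} of the machine "
        f"(~{slot_fraction * logical_cores:.1f} HT threads)")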

However, user jobs can utilize more than their nominally assigned CPU share.
My understanding is that the kernel notices that the total CPU time is not fully utilized and therefore allows processes to exceed their nominal share as long as CPU time is still available.
Is this correct?

If we scale the condor parent cgroup to a reasonable fraction of the system cpu.shares (taking HT efficiency into account), we should be able to scale the CPU time per job to (roughly) core-equivalents without needing to bind jobs to specific cores, right?
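The arithmetic I have in mind looks roughly like the sketch below (every number is an illustrative assumption: the HT efficiency factor as well as the presence and shares of a competing sibling cgroup; since cpu.shares is only a relative weight among siblings under contention, the per-job core-equivalent follows from the parent's weight at its own level):

  physical_cores  = 16
  ht_efficiency   = 1.3    # assumed throughput of 2 HT threads vs. 1 core
  machine_core_eq = physical_cores * ht_efficiency   # ~20.8 core-equivalents

  parent_shares   = 512    # example: scaled-down htcondor parent
  sibling_shares  = 1024   # e.g. a production cgroup next to it
  parent_fraction = parent_shares / (parent_shares + sibling_shares)

  slots = 16               # equally weighted single-core slots
  per_job = parent_fraction * machine_core_eq / slots
  print(f"condor tree under contention: {parent_fraction:.0%} of the machine")
  print(f"each job: ~{per_job:.2f} core-equivalents")

  # parent share needed for ~1 core-equivalent per job next to that sibling
  needed_fraction = slots / machine_core_eq
  needed_shares   = round(sibling_shares * needed_fraction / (1 - needed_fraction))
  print(f"for ~1 core-equivalent per job: htcondor/cpu.shares ~= {needed_shares}")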

Cheers,
  Thomas
