
Re: [HTCondor-users] Help needed understanding cpu core usage with cgroups



On 04/10/2015 09:19 AM, Roderick Johnstone wrote:
Hi

I have a condor job with 5 threads (4 cpu bound) running with request_cpus = 2 in the submit file.

When I have 2 foreground (Owner) jobs running at 100% cpu, the condor job only gets the equivalent of 1 cpu between its threads.

I'm measuring this by looking at the aggregate nice cpu percentage, which is 25% in the output of the top program (the condor jobs are niced to 16 while the foreground jobs run at nice 0). This result is confirmed by the cpu percentages of the condor job threads adding up to approx 100%, indicating that only one core is being used.

From the wiki page above, I was expecting that the condor job would access 2 cpus rather than 1 under these circumstances. Did I misunderstand something here?

HTCondor with cgroups uses the "cpu shares" parameter to limit cpu usage. HTCondor will set the cpu shares of a cgroup to 100 * number_of_cores_assigned_to_the_slot. This works well if the only cpu-bound activity on the machine is from HTCondor jobs.
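As a rough illustration of the rule above, here is a minimal Python sketch. The function names and the slot names are made up for the example; the only thing taken from the text is the 100 * cores rule. The split under contention is an idealized model that assumes every slot's cgroup is cpu-bound and that nothing outside HTCondor is competing for cpu.

```python
def cpu_shares_for_slot(cores):
    """cpu shares for a slot's cgroup, using the 100 * cores rule described above."""
    return 100 * cores


def contended_cpu_fractions(slot_cores):
    """Idealized split of cpu time between slots when all of them are cpu-bound.

    slot_cores maps a (hypothetical) slot name to the number of cores
    assigned to it. Each slot gets time in proportion to its shares.
    """
    shares = {name: cpu_shares_for_slot(c) for name, c in slot_cores.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


if __name__ == "__main__":
    # A 2-core slot next to a 1-core slot: shares of 200 and 100.
    print(contended_cpu_fractions({"slot_a": 2, "slot_b": 1}))
```

Shares are relative weights, not hard caps, so they only matter when cgroups are actually competing for cpu.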

When you say "foreground (Owner)" jobs -- are these processes running under HTCondor, or not? If not, and they aren't in any cgroup, then I would expect the behavior that you see: their cpu shares are effectively unlimited, and the condor jobs just get the leftovers.

You could fix this by putting the foreground jobs into their own cgroup, or running them as a condor job proper.
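For the first option, a minimal sketch in Python follows. It assumes a cgroup v1 cpu controller mounted at /sys/fs/cgroup/cpu and root privileges. The group name "foreground", the share value, and the helper name are hypothetical, so adjust them for your system.

```python
import os


def place_in_cgroup(cpu_root, group, shares, pids):
    """Create a cpu cgroup, set its shares, and move processes into it.

    cpu_root: mount point of the cgroup v1 cpu controller
              (assumed here to be /sys/fs/cgroup/cpu).
    group:    name of the cgroup to create (hypothetical).
    shares:   value to write to cpu.shares.
    pids:     process ids to move into the group.
    """
    group_dir = os.path.join(cpu_root, group)
    os.makedirs(group_dir, exist_ok=True)

    with open(os.path.join(group_dir, "cpu.shares"), "w") as f:
        f.write(str(shares))

    # The kernel accepts one pid per write, so reopen the file for each pid.
    for pid in pids:
        with open(os.path.join(group_dir, "cgroup.procs"), "w") as f:
            f.write(str(pid))

    return group_dir


# Example (as root); the pids are placeholders for the foreground jobs:
# place_in_cgroup("/sys/fs/cgroup/cpu", "foreground", 200, [12345, 12346])
```

With the foreground jobs in a cgroup of their own, their weight relative to the condor job's cgroup is set explicitly instead of being left effectively unlimited.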

One point that I'm not sure about is the first paragraph in Option 2. HTCondor is started as root (from init scripts; condor is installed from the condor repository rpm) but runs as the condor user. Does that count as "condor daemons being started as root"?

If condor is started from init, that counts as "started as root".


-Greg