
Re: [HTCondor-users] Help needed understanding cpu core usage with cgroups



On 13/04/15 19:00, Greg Thain wrote:
On 04/10/2015 09:19 AM, Roderick Johnstone wrote:
Hi

I have a condor job with 5 threads (4 cpu bound) running with
request_cpus = 2 in the submit file.

When I have 2 foreground (Owner) jobs running at 100%cpu the condor
job is only getting the equivalent of 1 cpu between its threads.

I'm measuring this by looking at the aggregate nice cpu percentage,
which is 25% in the output of the top program (the condor jobs are
niced to 16 while the foreground jobs run at nice 0). This result
is confirmed by the cpu percentages of the condor job
threads adding up to approximately 100%, indicating that only one core
is being used.

From the wiki page above, I was expecting that the condor job would
access 2 cpus rather than 1 under these circumstances. Did I
misunderstand something here?

HTCondor with cgroups uses the "cpu shares" parameter to limit cpu
usage.  HTCondor will set the cpu shares of a cgroup to 100 *
number_of_cores_assigned_to_the_slot.  This works well if the only
cpu-bound activity on the machine is from HTCondor jobs.
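As a rough illustration (not HTCondor code), the cpu.shares arithmetic described above can be sketched as follows. Shares are relative weights, so when every slot is fully cpu-bound, each slot's cgroup receives cpu time in proportion to its shares. The slot names and request_cpus values below are hypothetical examples:

```python
# Sketch of the cgroup v1 cpu.shares arithmetic described above.
# Slot names and request_cpus values are illustrative, not real config.

def condor_cpu_shares(request_cpus):
    """HTCondor sets a slot cgroup's cpu.shares to 100 * request_cpus."""
    return 100 * request_cpus

def contended_allocation(slots, total_cpus):
    """Expected cpu allocation when every slot is fully cpu-bound:
    shares are relative weights, so each slot gets its proportional
    fraction of the machine's cores."""
    shares = {name: condor_cpu_shares(n) for name, n in slots.items()}
    total = sum(shares.values())
    return {name: total_cpus * s / total for name, s in shares.items()}

# Two condor slots competing on a fully loaded 3-core machine,
# with request_cpus = 2 and request_cpus = 1 respectively:
alloc = contended_allocation({"slot1": 2, "slot2": 1}, total_cpus=3)
print(alloc)  # slot1 gets the equivalent of 2 cpus, slot2 gets 1
```

Note this proportionality only holds between processes that are actually inside cgroups with these weights, which is exactly the limitation discussed below.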


When you say "foreground (Owner)" jobs -- are these processes running
under HTCondor, or not? If not, and they aren't in any cgroup, then I
would expect the behavior that you see: their cpu shares are
effectively unlimited, and the condor jobs just get the leftovers.

Greg

Thanks for your responses.

Yes, the "foreground (Owner)" jobs are non-condor jobs that are not using cgroups. We use condor to soak up spare cycles on people's desktops as well as on dedicated compute servers, and this was a test of the former situation: I was simulating a workstation owner running their own code outside of condor.

So, just to double check that I understand this: cgroups really works to allocate the relative share of cpu time between different condor jobs running in different slots, according to the request_cpus value in each job's submit file, regardless of the number of threads running in each job.

The actual number of cpus that a condor job might run on is not really constrained by cgroups: non-condor (non-cgroup) processes can squeeze out the condor jobs, and if there are no non-condor jobs, the condor jobs can take over all available cpus.

Is that right?


You could fix this by putting the foreground jobs into their own cgroup,
or running them as a condor job proper.
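For reference, one way to do the former on a cgroup-v1 system (the path, share value, and job name below are illustrative; this requires root):

```shell
# Illustrative only: confine a foreground process to its own cgroup so
# it competes with HTCondor slots via cpu.shares (cgroup v1 layout).
mkdir /sys/fs/cgroup/cpu/foreground                  # create the cgroup
echo 200 > /sys/fs/cgroup/cpu/foreground/cpu.shares  # weight ~ 2 cores
echo $$ > /sys/fs/cgroup/cpu/foreground/tasks        # move this shell in
./my_cpu_bound_job &                    # children inherit the cgroup
```

With the foreground work weighted like this, the kernel's CFS scheduler would divide cpu time between it and the condor slots in proportion to their shares, rather than letting the un-contained process win outright.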

That's not really an option for us in the general case.



One point that I'm not sure about is the first paragraph in Option 2.
HTCondor is started as root (from init scripts; condor is installed
from the condor repository rpm) but runs as the condor user. Does
that count as "condor daemons being started as root"?

If condor is started from init, that counts as "started as root".

Thanks for the clarification.

Roderick