
Re: [HTCondor-users] Help needed understanding cpu core usage with cgroups



> On Apr 13, 2015, at 2:23 PM, Greg Thain <gthain@xxxxxxxxxxx> wrote:
> 
> On 04/13/2015 01:37 PM, Roderick Johnstone wrote:
>> 
>> Yes, the "foreground (Owner)" jobs are non-condor jobs that are not using cgroups. Our situation is that we use condor to soak up spare cycles on people's desktops as well as on dedicated computer servers. This was a test of the former situation, where I was simulating the workstation owner running their own code outside of condor.
>> 
>> So, just to double check I understand this: cgroups is really working to allocate the relative number of cpus between different condor jobs running in different slots, according to the request_cpus value in the job submit file and regardless of the number of threads running in each job.
> 
> Right.  To put a finer point on it, the HTCondor cgroup code works to share cpu resources, but only among processes within cgroups that have agreed to share cpus, whether or not those cgroups are managed by HTCondor.

Hi Roderick,

My memory is getting a bit fuzzy on the topic - but isn't the CPU shares mechanism hierarchical?

That is, if you run with BASE_CGROUP=/condor, this is just redistributing the shares given to the /condor group.  Shares given to the other groups are not affected.
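
To make the hierarchy point concrete, here is a rough Python sketch of how I would expect the split to work. It is a simplified model, not HTCondor or kernel code: it assumes every group is fully busy and that sibling groups divide their parent's allocation in proportion to their shares. The group names, the share values, and the leaf_fractions helper are all made up for illustration.

```python
def leaf_fractions(tree, fraction=1.0, prefix=""):
    """Map each leaf group path to its fraction of total CPU.

    `tree` maps a group name to (shares, subtree). Simplifying assumption:
    all groups are busy, so siblings split the parent's fraction in
    proportion to their shares.
    """
    total = sum(shares for shares, _ in tree.values())
    result = {}
    for name, (shares, subtree) in tree.items():
        path = f"{prefix}/{name}"
        part = fraction * shares / total
        if subtree:
            # Child groups only redistribute what the parent was given.
            result.update(leaf_fractions(subtree, part, path))
        else:
            result[path] = part
    return result


# Hypothetical layout: /condor holds two slots, /other is an unrelated group.
tree = {
    "condor": (1024, {
        "slot1": (100, {}),  # illustrative shares for a 1-cpu request
        "slot2": (300, {}),  # illustrative shares for a 3-cpu request
    }),
    "other": (1024, {}),
}

fractions = leaf_fractions(tree)
for path, part in sorted(fractions.items()):
    print(f"{path}: {part:.3f}")
```

In this model, changing the slot shares only moves CPU between slot1 and slot2; the fraction given to /other stays the same.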

Brian

>> 
>> The actual number of cpus that a condor job might run on is not really constrained by cgroups, because non-condor (non-cgroups) jobs can squeeze out the condor jobs, and if there are no non-condor jobs, the condor jobs can take over all available cpus.
>> 
>> Is that right?
> 
> Yes.  The kernel gives us little control over other people's processes.
>> 
>>> 
>>> You could fix this by putting the foreground jobs into their own cgroup,
>>> or running them as a condor job proper.
>> 
>> That's not really an option for us in the general case.
> 
> You could also affinity-lock the foreground processes.
> 
> Also, I wonder if improving the job's niceness would help the situation.  The knob JOB_RENICE_INCREMENT defaults to 10 in some versions of condor; you may want to set it to 0 if it isn't already, and try launching the condor_master with a negative niceness.
> 
> -greg
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/