
Re: [HTCondor-users] cgroups applied at job level or process level



Hi Vikrant,

I think your test is not quite right.

  with ProcessPoolExecutor(max_workers=10) as ex:
    ex.submit(hog_for_seconds, 300)

That does create a pool of 10 worker processes, but you only submit one function to be run, so only one process ever gets busy. Could you try running

  with ProcessPoolExecutor(max_workers=10) as ex:
    ex.map(hog_for_seconds, [300] * 10)

which should submit hog_for_seconds to be run 10 times, one for each worker.
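
For reference, a complete version of the corrected test might look something like this (an untested sketch, reusing the loop_per_sec calibration from your script); it keeps all ten workers busy at once, so the job actually demands ~10 cores:

  import time
  from concurrent.futures import ProcessPoolExecutor

  def hog_for_seconds(n):
      # Calibrated busy-loop from the original test
      loop_per_sec = 16304387
      count = 0
      while count < (loop_per_sec * n):
          count += 1
      return count

  if __name__ == "__main__":
      start = time.time()
      with ProcessPoolExecutor(max_workers=10) as ex:
          # Run the hog 10 times, one task per worker, and wait for all of them
          list(ex.map(hog_for_seconds, [300] * 10))
      end = time.time()
      print("elapsed:", end - start)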


Josh Karpel


On Mon, Oct 21, 2019 at 7:04 AM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
Hello HTCondor Experts,

I have a query regarding the cgroup implementation in Condor. AFAIK, cgroups work at the job level, so if I set request_cpus to 1 then, irrespective of how many processes the job spawns, all of its processes combined can only take 100% of one CPU's time share, and if request_cpus is 2 then all processes belonging to the single job can together take 200% of CPU time share.
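
To check what the cgroup actually enforces for a given job, I understand something like the following can be run from inside the job (a rough sketch that assumes a cgroup v1 hierarchy mounted under /sys/fs/cgroup; the exact mount layout and knob names are assumptions and may differ on other nodes):

  from pathlib import Path

  def cpu_cgroup_path():
      # Each line of /proc/self/cgroup looks like "<id>:<controllers>:<path>"
      for line in Path("/proc/self/cgroup").read_text().splitlines():
          _, controllers, path = line.split(":", 2)
          if "cpu" in controllers.split(","):
              # Assumption: the cpu controller is mounted at
              # /sys/fs/cgroup/<controllers> (typical cgroup v1 layout)
              return Path("/sys/fs/cgroup") / controllers / path.lstrip("/")
      return None

  cg = cpu_cgroup_path()
  if cg is not None:
      for knob in ("cpu.shares", "cpu.cfs_quota_us", "cpu.cfs_period_us"):
          f = cg / knob
          if f.exists():
              print(knob, "=", f.read_text().strip())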

However, this theoretical understanding seems to be contradicted by a test we conducted on one of the nodes, which has 23 cores available to run jobs.

1) Started a batch of 22 jobs with the Python code below to hog a CPU core; each job was expected to complete in 300s. The loop_per_sec value was tuned beforehand to ensure that each job runs for roughly 300s.

import time

def hog_for_seconds(n):
    # Busy-loop calibrated so that counting for n "seconds" takes ~n wall-clock seconds
    loop_per_sec = 16304387
    count = 0
    while count < (loop_per_sec * n):
        count += 1
    return count

start = time.time()
hog_for_seconds(300)
end = time.time()

2) At the same time, submitted one more job which internally spawns 10 processes; it was expected to take at least twice as long to complete.

import time
from concurrent.futures import ProcessPoolExecutor

def hog_for_seconds(n):
    # Same calibrated busy-loop as above
    loop_per_sec = 16304387
    count = 0
    while count < (loop_per_sec * n):
        count += 1
    return count

start = time.time()
with ProcessPoolExecutor(max_workers=10) as ex:
    ex.submit(hog_for_seconds, 300)
end = time.time()

But to our surprise, both jobs completed in approximately the same time.

I am aware of the opportunistic behavior of cgroup CPU shares, hence we also tried the ASSIGN_CPU_AFFINITY setting, but the results remained the same.
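
To confirm whether the affinity setting actually reaches the job, a quick Linux-only check from inside the job (a minimal sketch) is:

  import os

  # On Linux, sched_getaffinity(0) returns the set of CPU cores this
  # process is allowed to run on; with request_cpus = 1 and affinity
  # enforced, I would expect a single core in this set.
  print("allowed CPUs:", sorted(os.sched_getaffinity(0)))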

Can anyone please help me understand how cgroups restrict a job's CPU share?

Condor version : 8.5.8 (Dynamic slots)

Thanks & Regards,
Vikrant Aggarwal