Subject: Re: [HTCondor-users] [FSL] Parallel processing: Condor or SGE with CentOS 7
From: FSL - FMRIB's Software Library [mailto:FSL@xxxxxxxxxxxxxx]
On Behalf Of d m
Sent: Friday, July 15, 2016 8:47 PM
> We are using a linux server with CentOS 7 installed.
>
> We want to take advantage of our multi-core system, and were wondering
> which is the best setup to use. In the past we used ubuntu, and condor
> was a pretty easy setup.
>
> Does Condor work with CentOS 7? It is a much easier setup than SGE.
> However, we installed Condor on our CentOS 7 system. But it still did
> not use multiple threads when running tbss_2_proc. Has anyone had
> success running condor with CentOS 7?
>
> If not, what about SGE and parallel processing with CentOS 7?
>
> Your help would be greatly appreciated.

I've found that partitionable slots work better than static slots for
multi-core processing. We typically have a variety of job types which
require different core counts, from "-j8" software builds to
single-thread data-formatting runs, so partitionable slots give us the
most flexibility to divide up the resources appropriately with the
highest possible utilization.
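
To make the idea concrete, here is a toy Python model of how a
partitionable slot hands out cores. It is only a sketch and not HTCondor
code: the PartitionableSlot class and its try_start() method are names
invented for illustration, and a real startd also tracks memory and disk.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionableSlot:
    """Toy model of one partitionable slot (hypothetical, not HTCondor code)."""
    total_cpus: int
    # Core counts of the dynamic slots carved off so far.
    claimed: list = field(default_factory=list)

    @property
    def free_cpus(self) -> int:
        return self.total_cpus - sum(self.claimed)

    def try_start(self, request_cpus: int) -> bool:
        """Carve off a dynamic slot if enough cores are still free."""
        if request_cpus <= 0:
            raise ValueError("request_cpus must be positive")
        if request_cpus > self.free_cpus:
            return False
        self.claimed.append(request_cpus)
        return True


slot = PartitionableSlot(total_cpus=16)
# A "-j8" build, two single-thread runs, a 4-core job, then another build.
started = [slot.try_start(n) for n in (8, 1, 1, 4, 8)]
print(started)         # [True, True, True, True, False]
print(slot.free_cpus)  # 2
```

The last 8-core request has to wait, but the two leftover cores can
still take one-core or two-core work instead of sitting idle.
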
If you set up cgroups, then the kernel will constrain
a job in a slot to only as many cores as it requested when there is
competition for CPU time, using the "cpu.shares" cgroup
attribute. It will still run in multiple threads, but the kernel scheduler will
only give it time slices adding up to that number of cores.
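
A rough way to picture this is the short Python sketch below. It assumes
each job's cgroup weight is proportional to the cores it requested and
that every job has enough runnable threads to use whatever it is given.
Both are simplifying assumptions, and cpu_split() is a made-up helper,
not part of HTCondor or the kernel.

```python
def cpu_split(requests, machine_cores):
    """Idealized proportional share: cores' worth of CPU time per job.

    requests maps a job name to the cores it requested (used as its weight).
    Assumes all jobs are CPU-hungry, so the whole machine is contended.
    """
    total_weight = sum(requests.values())
    return {job: machine_cores * weight / total_weight
            for job, weight in requests.items()}


# Fully subscribed 8-core machine: each job is held to what it requested.
full = cpu_split({"a": 4, "b": 2, "c": 2}, machine_cores=8)
print(full)  # {'a': 4.0, 'b': 2.0, 'c': 2.0}

# Two cores unclaimed: the limit is soft, so busy jobs soak up the slack.
partial = cpu_split({"a": 4, "b": 2}, machine_cores=8)
print({job: round(cores, 2) for job, cores in partial.items()})
# {'a': 5.33, 'b': 2.67}
```

The second case shows why this is a share and not a hard cap: with no
competition for the spare cores, a job can use more than it asked for.
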
Other than this, HTCondor doesn't govern how many threads a process can
spawn. This can pose a problem with things like MATLAB, which checks the
total number of cores on the entire system and spawns that many threads.
If every core on the machine is running such a job, you end up with the
square of the number of cores' worth of threads: 4,096 on a 64-core
machine, all fighting for their 1.56% of a CPU. The MATLAB
maxNumCompThreads() function comes in handy here.
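
The arithmetic behind those figures, as a small Python sketch. The
oversubscription() helper is hypothetical and assumes one single-core
job per core, with each job spawning the same number of threads.

```python
def oversubscription(cores, threads_per_job=None):
    """Return (total threads, fraction of one core available per thread).

    Assumes one job per core. By default each job spawns one thread per
    core on the machine, which is the problem case described above.
    """
    if threads_per_job is None:
        threads_per_job = cores
    jobs = cores
    threads = jobs * threads_per_job
    return threads, cores / threads


threads, share = oversubscription(64)
print(threads, f"{share:.2%}")  # 4096 1.56%

# With every job capped at a single thread, each thread gets a full core.
capped_threads, capped_share = oversubscription(64, threads_per_job=1)
print(capped_threads, f"{capped_share:.2%}")  # 64 100.00%
```
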
It's also possible to set up static slots with more
than one CPU, so if all your work uses two cores, you can define
your static slots with two cores each. Any one-core jobs would waste
a core while running, though.
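
A quick Python sketch of that waste, assuming every job occupies exactly
one static slot for its whole run. static_slot_utilization() is a
made-up helper for illustration only.

```python
def static_slot_utilization(slot_cpus, job_cpus):
    """Fraction of the allocated cores that the running jobs actually use.

    slot_cpus: cores in each static slot.
    job_cpus: cores needed by each running job, one slot per job.
    """
    if any(n > slot_cpus for n in job_cpus):
        raise ValueError("a job needs more cores than one static slot has")
    allocated = slot_cpus * len(job_cpus)
    return sum(job_cpus) / allocated


# Two-core static slots running two 2-core jobs and two 1-core jobs.
utilization = static_slot_utilization(2, [2, 2, 1, 1])
print(utilization)  # 0.75
```

Here two of the eight allocated cores sit idle for as long as the
one-core jobs run.
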
If by multiple threads you're talking about MPI jobs, you'll want to
read up on the parallel universe. There are a few tricks, set out in the
manual, for picking up multiple slots on one or more machines for a
single job. It's not quite as pretty with partitionable slots, I found,
particularly in v8.2.