
Re: [HTCondor-users] [FSL] Parallel processing: Condor or SGE with CentOS 7

From: FSL - FMRIB's Software Library [mailto:FSL@xxxxxxxxxxxxxx] On Behalf Of d m
Sent: Friday, July 15, 2016 8:47 PM

> We are using a linux server with CentOS 7 installed.

> We want to take advantage of our multi-core system, and were wondering
> which is the best setup to use. In the past we used ubuntu, and condor was
> a pretty easy setup.

> Does Condor work with CentOS 7? It is a much easier setup than SGE.
> We installed Condor on our CentOS 7 system, but it still did not
> use multiple threads when running tbss_2_proc. Has anyone had success
> running Condor with CentOS 7?

> If not, what about SGE and parallel processing with CentOS 7?
> Your help would be greatly appreciated.

I've found that partitionable slots work better than static slots for
multi-core processing. We typically have a variety of job types which
require different core counts, from "-j8" software builds to
single-thread data-formatting runs, so partitionable slots give us
the most flexibility to divide up the resources appropriately with the
highest possible utilization.
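To make the flexibility argument concrete, here is a toy Python model
of a partitionable slot. It assumes a single resource (cores) and
first-come matching; the class and function names are made up for
illustration, and the real negotiator also weighs memory, disk,
priorities and so on. Each job gets a dynamic slot of exactly the size
it asked for until the pool of cores runs dry.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionableSlot:
    """Toy model (hypothetical): one pool of cores carved into
    dynamic slots sized to each job's request_cpus."""
    total_cpus: int
    dynamic_slots: List[int] = field(default_factory=list)

    @property
    def free_cpus(self) -> int:
        # Cores not yet handed out to a dynamic slot.
        return self.total_cpus - sum(self.dynamic_slots)

    def claim(self, request_cpus: int) -> bool:
        # Carve a dynamic slot only if enough unclaimed cores remain.
        if request_cpus <= self.free_cpus:
            self.dynamic_slots.append(request_cpus)
            return True
        return False


def fill(total_cpus: int, requests: List[int]) -> Tuple[PartitionableSlot, List[int]]:
    """Match jobs in submission order; return the slot and the
    requests that stayed idle because too few cores were left."""
    slot = PartitionableSlot(total_cpus)
    idle = [r for r in requests if not slot.claim(r)]
    return slot, idle
```

With `fill(16, [8, 1, 1, 4, 2])` the 8-core build and the single-thread
jobs all fit side by side with no core left over, which is the kind of
mixed workload a set of fixed-size static slots handles poorly.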

If you set up cgroups, then the kernel will constrain a job in a slot
to only as many cores as it requested when there is competition for
CPU time, using the "cpu.shares" cgroup attribute. It will still run
in multiple threads, but the kernel scheduler will only give it time
slices adding up to that number of cores.
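A rough way to picture what "cpu.shares" does: under contention the
kernel's CFS scheduler divides CPU time in proportion to each cgroup's
shares. The sketch below assumes shares proportional to request_cpus
(the 100-shares-per-core figure is an assumption, and it cancels out
anyway) and that every job has enough runnable threads to keep the
whole machine busy. The function name is hypothetical.

```python
from typing import List


def effective_cores(total_cores: int, requests: List[int],
                    shares_per_core: int = 100) -> List[float]:
    """Approximate cores' worth of CPU time each job receives when all
    jobs are CPU-bound with many threads (assumption). Shares are
    relative, so shares_per_core does not change the result."""
    shares = [r * shares_per_core for r in requests]
    total_shares = sum(shares)
    # Each cgroup's slice of the machine is its fraction of all shares.
    return [total_cores * s / total_shares for s in shares]
```

With the machine fully claimed, `effective_cores(64, [32, 16, 16])`
gives each job exactly what it requested. A lone 8-core job,
`effective_cores(64, [8])`, gets all 64 cores, since the limit only
bites when other jobs are competing for CPU time.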

Other than this, HTCondor doesn't govern how many threads a process
can spawn. This can pose a problem with things like MATLAB, which
check the total number of cores on the entire system and spawn that
many threads. If every single-core slot on the machine runs such a
job, you end up with the square of the core count in threads: 4,096
on a 64-core machine, all fighting for their 1.56% of a CPU. The
MATLAB maxNumCompThreads() function comes in handy here.
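The arithmetic behind those numbers, as a small Python sketch. It
assumes one single-core job in every slot, each job spawning one thread
per core on the machine; the function name is hypothetical. Passing
threads_per_job=1 models capping each job with maxNumCompThreads(1).

```python
from typing import Optional, Tuple


def oversubscription(cores: int,
                     threads_per_job: Optional[int] = None) -> Tuple[int, float]:
    """Return (total threads, fraction of one CPU per thread) when
    every core runs one single-core job."""
    if threads_per_job is None:
        # Default behaviour described above: one thread per core on the box.
        threads_per_job = cores
    jobs = cores  # one single-core job per slot
    threads = jobs * threads_per_job
    # A thread can never get more than a whole CPU.
    return threads, min(1.0, cores / threads)
```

`oversubscription(64)` returns `(4096, 0.015625)`, i.e. about 1.56% of
a CPU per thread, while `oversubscription(64, 1)` returns `(64, 1.0)`.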

It's also possible to set up static slots with more than one CPU,
so if all your work uses two cores, you can define your static slots
with two cores each. Any one-core jobs would waste a core while
running, though.
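A quick tally of that trade-off, assuming 2-core static slots on a
64-core machine, jobs that never request more than a slot holds, and
each job occupying a whole slot whatever it asked for (the function
name is illustrative):

```python
from typing import List, Tuple


def static_slot_waste(total_cores: int, cores_per_slot: int,
                      job_cpus: List[int]) -> Tuple[int, int]:
    """Return (cores doing work, cores claimed but idle) when jobs
    fill fixed-size static slots in submission order."""
    n_slots = total_cores // cores_per_slot
    # Only jobs that fit a slot can match; extra jobs wait in the queue.
    running = [c for c in job_cpus if c <= cores_per_slot][:n_slots]
    used = sum(running)
    wasted = len(running) * cores_per_slot - used
    return used, wasted
```

Thirty-two one-core jobs, `static_slot_waste(64, 2, [1] * 32)`, keep
only half the machine busy: `(32, 32)`.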

If by multiple threads you're talking about MPI jobs, you'll want to
read up on the parallel universe. There are a few tricks to it, set
forth in the manual, for picking up multiple slots on one or more
machines for a single job. It's not quite as pretty with
partitionable slots, I found, particularly in v8.2.