[Condor-users] gang scheduling multiple CPUs on an SMP machine



Hello,

We have a user running a multi-threaded MPI application, i.e. each rank
itself is multi-threaded.  Our cluster consists of dual-Xeon SMP machines
and we set NUM_CPUS to 2 in Condor.

The problem is that the MPI application uses an Intel math library that
only allows a single process to use the library in a multi-threaded
manner.  However, Condor often assigns the two processors on one machine
to two different ranks.  When threads from both ranks attempt to access
the library, the application fails.

I found several references to "gang-matching" being a potential feature
that could be added to Condor.  For example, "Condor on Dedicated
Clusters" from Condor Week 2000 contains a slide titled "Future
Directions: Parallel Scheduling" that states the following:

----------------------------
> Co-scheduling of multiple hosts
  * ...
  * Other jobs might require co-scheduling
    * A multi-threaded application might want to claim multiple CPUs on a
      single SMP machine
  * Requires "gang-matching"
----------------------------

A "gang-matching" capability as described in this slide would solve our
problem, by allocating both processors on a machine to a single rank.
However, I cannot find any mention of it in the Condor documentation.  Has
gang-matching been added to Condor, or is there a way to obtain similar
behavior?  Thanks.
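
To make the placement we are after concrete, here is a small Python sketch.
It is only an illustration and not Condor code or configuration: the
function place_ranks and the machine names are made up, and it assumes
every machine has the same number of CPUs.

```python
def place_ranks(machines, cpus_per_machine, num_ranks):
    """Assign each rank every CPU on one machine (hypothetical helper).

    No two ranks ever share a machine, which is the behavior we would
    like from the scheduler.
    """
    if num_ranks > len(machines):
        raise ValueError("not enough whole machines for one rank per machine")
    # Pair rank i with machine i and hand it all CPUs on that machine.
    return {
        rank: [(machine, cpu) for cpu in range(cpus_per_machine)]
        for rank, machine in zip(range(num_ranks), machines)
    }


# Made-up example: three dual-CPU machines, two MPI ranks.
placement = place_ranks(["node01", "node02", "node03"], 2, 2)
for rank, slots in sorted(placement.items()):
    print("rank", rank, "->", slots)
```

What we see today is the opposite: rank 0 and rank 1 each receive one CPU
on the same machine.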

Hahn