
Re: [Condor-users] mpi and dedicated scheduler configuration

On Tue, Jun 22, 2004 at 08:18:53PM -0700, Mike Busch wrote:
> --- Erik Paulson <epaulson@xxxxxxxxxxx> wrote:
> > It's not likely to work. The current MPI implementation is pretty
> > heart-set on using MPICH 1.2.2, 1.2.3, or 1.2.4.
> > 
> I'm a little confused here, or perhaps I need to read the docs a little
> more.  My pool manager is a Debian Linux box and the pool machines are
> all Windows 2000 boxes.  On the pool I have NT-MPICH 1.2.0 running fine --
> the Linux box does not participate in computation. Now I want to use
> Condor to start and manage jobs on the pool.

Currently, you can't cross-submit MPI jobs - if you want to run MPI
jobs on Unix, they have to be submitted from Unix. (Similarly, NT MPI
jobs must come from NT). 

> Am I using the wrong version of MPICH?  Or can I start jobs in the
> vanilla universe and everything is happy?

With the vanilla universe, you won't be able to allocate multiple machines
as a group, so you run the risk of a single node disappearing while the
others keep running. With the MPI universe, the loss of a single node tells
Condor to shut down all of the other machines, since Condor assumes your
MPI implementation has no fault tolerance.
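
For reference, here is a rough sketch of what an MPI universe submit
description file looks like. The executable name, file names, and machine
count are made up for illustration, and it assumes the pool already has a
dedicated scheduler configured and that you submit from the same platform
the job runs on:

```
# Hypothetical example: names and counts below are placeholders.
universe      = MPI
executable    = my_mpi_job.exe
machine_count = 4

# $(NODE) expands to the rank of each node, giving one file per node.
output        = out.$(NODE)
error         = err.$(NODE)
log           = my_mpi_job.log

queue
```

The machine_count line is what asks Condor to claim the machines as a
group; a vanilla universe job has no equivalent, so each job is matched
and run on its own.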