Re: [Condor-users] mpi and dedicated scheduler configuration
- Date: Wed, 23 Jun 2004 11:56:12 -0500
- From: Erik Paulson <epaulson@xxxxxxxxxxx>
- Subject: Re: [Condor-users] mpi and dedicated scheduler configuration
On Tue, Jun 22, 2004 at 08:18:53PM -0700, Mike Busch wrote:
> --- Erik Paulson <epaulson@xxxxxxxxxxx> wrote:
> > It's not likely to work. The current MPI implementation is pretty
> > firmly tied to MPICH 1.2.2, 1.2.3, or 1.2.4.
> I'm a little confused here, or perhaps I need to read the docs a little
> more. My pool manager is a Debian Linux box and the pool machines are
> all Windows 2000 boxes. On the pool I have NT-MPICH 1.2.0 running fine;
> the Linux box does not participate in computation. Now I want to use
> Condor to manage jobs on the pool.
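Currently, you can't cross-submit MPI jobs: if you want to run MPI
jobs on Unix, they have to be submitted from Unix. Similarly, NT MPI
jobs must be submitted from NT. In your setup, that means submitting
from one of the Windows 2000 machines rather than from the Debian box.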
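To make the platform rule concrete, here is a minimal Python sketch that writes an MPI-universe submit description and refuses to do so when the submitting platform differs from the execute platform. The `universe = MPI`, `machine_count`, `$(NODE)`, and `queue` entries follow the Condor submit-file syntax for MPI jobs; the function name `mpi_submit_description`, the platform strings, and the executable name are hypothetical, and the check only encodes the restriction described above, not anything Condor itself exposes.

```python
def mpi_submit_description(executable: str, machine_count: int,
                           submit_os: str, execute_os: str) -> str:
    """Return the text of a minimal MPI-universe submit description.

    Hypothetical helper: the platform check mirrors the rule that Unix
    MPI jobs must be submitted from Unix and NT MPI jobs from NT.
    """
    if submit_os != execute_os:
        raise ValueError(
            f"cannot cross-submit MPI jobs: submit host is {submit_os}, "
            f"execute machines are {execute_os}")
    if machine_count < 1:
        raise ValueError("machine_count must be at least 1")
    lines = [
        "universe      = MPI",
        f"executable    = {executable}",
        # $(NODE) expands to each node's number, giving one file per node.
        "output        = out.$(NODE)",
        "error         = err.$(NODE)",
        "log           = mpi.log",
        f"machine_count = {machine_count}",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

For example, `mpi_submit_description("my_mpi_app.exe", 4, "NT", "NT")` returns a four-node description, while passing `"Linux"` as the submit platform with `"NT"` execute machines raises `ValueError`, matching the situation in the question.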
> Am I using the wrong version of MPICH? Or can I start jobs in the
> vanilla universe and everything is happy?
With the vanilla universe, you won't be able to allocate multiple machines
as a group, so you run the risk of a single node disappearing while the
rest of your job keeps running without it. With the MPI universe, the loss
of a single node tells Condor to shut down the job on all of the other
machines, since Condor assumes your MPI implementation has no fault
tolerance.
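The difference in failure handling can be illustrated with a toy model. This Python sketch is not Condor code; the function `running_after_loss` and the universe labels are hypothetical. It only captures the behavior described above: in the vanilla universe each machine runs an independent job, so losing one leaves the others running (for an MPI program, the survivors are then stranded without their peer), while in the MPI universe the loss of any node makes Condor shut down the whole job.

```python
def running_after_loss(universe: str, nodes: set[str], lost: str) -> set[str]:
    """Return the nodes still running after `lost` disappears (toy model)."""
    if lost not in nodes:
        # Nothing we were using went away.
        return set(nodes)
    if universe == "vanilla":
        # Independent jobs: only the job on the lost machine is affected;
        # the others keep running, unaware that a peer is gone.
        return nodes - {lost}
    if universe == "MPI":
        # No fault tolerance is assumed, so the job is shut down on
        # every remaining node as well.
        return set()
    raise ValueError(f"unknown universe: {universe}")
```

With `nodes = {"n1", "n2", "n3"}`, losing `"n2"` leaves `{"n1", "n3"}` running in the vanilla model and nothing running in the MPI model.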