I've put together a Condor-MPI tutorial for the particular code that we
use here, though it doesn't say much about the Condor administrative
side. I've set things up here so that MPI jobs behave the same way as
single-processor jobs, which means that they can be preempted, evicted,
etc.! The MPI "cluster" is actually a set of 18 computer-lab desktop
machines linked by a high-speed network. Setting this up was very easy:
I simply included the following lines in the Condor configuration:
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
MPI_CONDOR_RSH_PATH = /net/condor/sbin
Note that the last line is not mentioned in the manual, and that the
first line uses regular double quotes, not two single quotes as shown in
the manual. Also note that I didn't change any policies for these
resources: jobs in every universe abide by the same rules.
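To make the quoting point concrete, here is a small sketch in Python
that writes out the three lines above for a given submit host. The
function name and the host "submit.example.edu" are hypothetical
placeholders, and /net/condor/sbin is only the path from our install;
use whatever matches yours. It just generates text; it is not something
Condor provides.

```python
def dedicated_scheduler_config(submit_host: str,
                               rsh_path: str = "/net/condor/sbin") -> str:
    """Return the condor_config lines for a dedicated MPI scheduler.

    submit_host: full hostname of the machine running the dedicated
    scheduler (hypothetical example value below).
    rsh_path: site-specific directory; /net/condor/sbin is just ours.
    """
    # The hostname is interpolated into a quoted string, so reject
    # input that already carries quotes of its own.
    if '"' in submit_host or "'" in submit_host:
        raise ValueError("pass the bare hostname, without quotes")
    lines = [
        # Plain double quotes ("), not two single quotes ('').
        f'DedicatedScheduler = "DedicatedScheduler@{submit_host}"',
        "STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler",
        f"MPI_CONDOR_RSH_PATH = {rsh_path}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # "submit.example.edu" is a placeholder, not a real host.
    print(dedicated_scheduler_config("submit.example.edu"))
```

The output can be pasted into the config file on each execute machine.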
I then had to recompile my code using MPICH; the MPICH download page
actually has a Condor version.
That was it. I do have passwordless ssh logins configured, but I don't
think you even need that to run MPI in Condor.
University of Washington
Department of Astronomy
On Feb 2, 2006, at 3:02 PM, rnayar@xxxxxxxx wrote:
> Hello everyone, for those of you who actually got MPI jobs up and
> running, can you give us greater insight into how you accomplished
> this? E.g. installation of MPI, any type of configuration (such as
> passwordless ssh), etc. I've noticed that a lot of people seem to be
> running MPI apps; perhaps we should throw together some sort of
> "How to MPI" thread.
> Condor-users mailing list