Re: [Condor-users] MPI jobs in the vanilla universe
- Date: Thu, 28 Oct 2004 18:00:28 -0500
- From: Erik Paulson <epaulson@xxxxxxxxxxx>
- Subject: Re: [Condor-users] MPI jobs in the vanilla universe
On Wed, Oct 27, 2004 at 08:24:14AM -0700, David E. Konerding wrote:
> Erik Paulson wrote:
> >On Tue, Oct 26, 2004 at 03:30:02PM -0700, David E. Konerding wrote:
> >>I am interested in running an MPI job on my cluster (which is already
> >>running Condor 6.6.6), but within the vanilla
> >>universe (there are some restrictions to the MPI universe setup which we
> >>cannot abide by).
> >The vanilla universe has all of the same restrictions as the MPI universe
> >(they're nearly identical code-bases) - what is giving you trouble?
> From the manual:
> > Administratively, Condor must be configured such that resources
> > (machines) running MPI jobs are dedicated.
> Not sure what that means, but it sounds to me like we would have to
> statically configure nodes to run MPI jobs, which would be fully
> exclusive of vanilla jobs (following the docs from the user MPI
> section, 2.10, to the admin MPI section, 3.10.10, shows that you have
> to set up a dedicated scheduler that manages dedicated resources).
> We've always used the pool as a combination of MPI and single-process
> jobs, so this is undesirable.
No, that's not what we mean by dedicated - dedicated to us means "only run
Condor jobs - not desktops that will be interrupted by returning users".
MPI universe jobs are managed such that if any one processor is lost,
we abort the job on all processors - so it's a bad idea to have MPI
universe jobs run on machines that might be evicted (you could if you
_really_ wanted to, though).
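For what it's worth, the "dedicated" setup from section 3.10.10 of the manual
comes down to a few lines of condor_config on each execute node. This is only
a minimal sketch - the scheduler hostname is a placeholder, and your local
policy expressions may differ:

```
## Sketch of a dedicated-resource config (6.6 manual, section 3.10.10).
## The scheduler hostname below is a placeholder - use your own.
DedicatedScheduler = "DedicatedScheduler@central-manager.example.com"
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

## Prefer jobs from the dedicated scheduler, but still accept vanilla
## jobs when none are queued - the "combination" policy you want.
RANK  = Scheduler =?= $(DedicatedScheduler)
START = True
```

With a policy like this the same nodes serve both MPI and vanilla jobs;
nothing is statically partitioned.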
See the Wright "Cheap cycles from the desktop to the dedicated cluster"
paper at http://www.cs.wisc.edu/condor/publications.html#scheduling -
the whole idea of Condor and MPI is that we can run vanilla jobs on a
"dedicated" MPI cluster.
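For comparison, the MPI universe submit description itself is short. A sketch,
with a made-up executable name and node count:

```
# Hypothetical MPI universe submit file - names and counts are examples.
universe      = MPI
executable    = my_mpi_prog
machine_count = 4
log           = mpi.log
output        = out.$(NODE)
error         = err.$(NODE)
queue
```

The $(NODE) macro gives each rank its own output and error files.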
> > This leads to a further restriction that jobs submitted to execute
> > under the MPI universe (with dedicated machines) must be submitted
> > from the machine running as the dedicated scheduler.
> We would normally be starting these jobs from a laptop, far away from
> the pool.
OK - one bad thing is that the submit machine needs to stay connected during
the lifetime of the job, so submitting from a laptop isn't always a good
idea.
> That laptop is running Windows, the pool is running Linux. So
> this is a constraint we cannot satisfy; we don't want to have to ssh
> into the pool to start the job.
This is also a problem - you cannot currently cross-submit MPI jobs.
Windows MPI must be submitted from Windows, Unix MPI must be submitted from
Unix (you can cross-submit between Unix platforms - i.e. submit Linux jobs