Re: [Condor-users] MPI jobs in the vanilla universe
- Date: Wed, 27 Oct 2004 08:24:14 -0700
- From: "David E. Konerding" <dekonerding@xxxxxxx>
- Subject: Re: [Condor-users] MPI jobs in the vanilla universe
Erik Paulson wrote:
> On Tue, Oct 26, 2004 at 03:30:02PM -0700, David E. Konerding wrote:
>> I am interested in running an MPI job on my cluster (which is already
>> running Condor 6.6.6), but within the vanilla universe (there are some
>> restrictions to the MPI universe setup which we cannot abide by).
> The vanilla universe has all of the same restrictions as the MPI universe
> (they're nearly identical code-bases) - what is giving you trouble?
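For concreteness, a minimal MPI-universe submit description in the 6.6-era syntax would look roughly like the following; the executable name and machine count are made up, so check section 2.10 of the manual for the exact keywords:

```
universe      = MPI
executable    = my_mpi_job
machine_count = 8
log           = my_mpi_job.log
queue
```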
From the manual:

> Administratively, Condor must be configured such that resources
> (machines) running MPI jobs are
Not sure what that means, but it sounds to me like we would have to
statically configure nodes to run MPI jobs, which would be fully
exclusive of vanilla jobs (following the docs from the user MPI
section, 2.10, to the admin MPI section, 3.10.10, shows that you have
to set up a dedicated scheduler that manages dedicated resources).
We've always used the pool as a combination of MPI and single-process
jobs, so this is undesirable.
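For reference, the dedicated-resource setup that section 3.10.10 describes amounts to configuration roughly like the following on each execute node. This is a sketch from memory of the 6.6 manual, not a tested config, and the hostname is a placeholder:

```
DedicatedScheduler = "DedicatedScheduler@submit.node.hostname"
STARTD_EXPRS       = $(STARTD_EXPRS), DedicatedScheduler
RANK               = Scheduler =?= $(DedicatedScheduler)
START              = True
```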
> This leads to a further restriction that jobs submitted to execute
> under the MPI universe (with dedicated machines) must be submitted
> from the machine running as the dedicated
We would normally be starting these jobs from a laptop, far away from
the pool. That laptop is running Windows, the pool is running Linux. So
this is a constraint we cannot satisfy; we don't want to have to ssh
into the pool to start the job.
>> In the past, I've used Sun Grid Engine and PBS; I submitted a job asking
>> for "N nodes"; the batch queueing system would basically wait until N
>> nodes were free. When the job ran, the batch system would start my job
>> on the "first" machine of the N, and provide me with a file listing all
>> the nodes ($PBS_NODEFILE is an env var pointing to the file). At that
>> point, I could run mpirun with the machines file being the list of
>> nodes. The batch system would properly manage the nodes, in that they
>> would be marked as being used, rather than schedule more jobs there.
>
> The reason we don't do it this way is that there's no way for the batch
> system to clean up - mpirun just fires off ssh or rsh. No cleanup of
> the execute environment, no cleanup of errant processes, you have to
> set up ssh keys for all of the users beforehand... we went with a more

We've solved the ssh problem by running the MPICH mpd daemon instead of
the regular mpirun job startup mechanism. This deals with cleanup of
errant processes, and ssh keys are not required. Another approach is to
use Condor itself as the job launch mechanism; this is analogous to the
PBS mpiexec feature, which uses the PBS multi-node job launch mechanism
to start all the MPI processes on the nodes.
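For anyone curious, the mpd-based startup looks roughly like this. The commands are MPICH's mpd toolchain; the host file name and process counts are just examples, and newer MPICH releases use mpiexec rather than mpirun:

```
mpdboot -n 4 -f mpd.hosts    # start an mpd ring on the hosts listed
mpdtrace                     # sanity-check that all daemons joined
mpirun -np 16 ./my_mpi_app   # processes launch via the ring, not per-process ssh/rsh
mpdallexit                   # tear the ring down; stray processes get cleaned up
```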
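To make the $PBS_NODEFILE workflow discussed above concrete, here is a hedged sketch in Python of what a PBS job script effectively does. The node-file format (one hostname per line) and the mpirun flags are classic MPICH-style assumptions and may differ on other MPI implementations:

```python
def build_mpirun(nodefile_path, app="./my_mpi_app"):
    """Build an mpirun command line from a PBS-style node file
    (one allocated host per line), as described above."""
    with open(nodefile_path) as f:
        nodes = [line.strip() for line in f if line.strip()]
    # -np and -machinefile are the classic MPICH mpirun flags
    return ["mpirun", "-np", str(len(nodes)),
            "-machinefile", nodefile_path, app]
```

Inside a real PBS job you would call `build_mpirun(os.environ["PBS_NODEFILE"])` and exec the result; PBS sets that variable when the job starts.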
> You can, however, write a perl program that submits vanilla jobs to Condor
> and watches the userlog to see when they start running, and where they start
> running, and then runs mpirun on those machines. It can keep watching and
> see if any of the nodes gets evicted, and it can tear down the rest of the
> MPI job. We call a program like this a "coordinator". If you submit your
> coordinator program as a Condor job under the "scheduler" universe, it
> will start running right away on your submit node - DAGMan does exactly this.
> (In fact, this is why we call it the "scheduler" universe - it's meant to be
> for jobs that schedule other jobs)

That's an interesting approach. I'll give it some consideration.
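Erik's coordinator idea might be sketched like this (in Python rather than perl). A real coordinator would condor_submit N vanilla placeholder jobs and poll the userlog; this sketch only shows the log-parsing and mpirun-assembly steps, and the "001 ... Job executing on host" event line is my assumption of the userlog format, not the exact 6.6 grammar:

```python
import re

# Assumed shape of a userlog "job executing" event, e.g.:
#   001 (101.000.000) 10/27 08:01:00 Job executing on host: <10.0.0.5:32779>
EXEC_RE = re.compile(r"^001 .*Job executing on host: <(\d+(?:\.\d+){3}):\d+>")

def executing_hosts(userlog_text):
    """Return the IP of every host the userlog says has started a job."""
    hosts = []
    for line in userlog_text.splitlines():
        m = EXEC_RE.match(line)
        if m:
            hosts.append(m.group(1))
    return hosts

def mpirun_command(hosts, app, machinefile="machines"):
    """Once all N placeholders are running, write an MPICH-style
    machinefile and build the mpirun invocation."""
    with open(machinefile, "w") as f:
        f.write("\n".join(hosts) + "\n")
    return ["mpirun", "-np", str(len(hosts)),
            "-machinefile", machinefile, app]
```

The coordinator would loop on `executing_hosts` until it has seen N distinct hosts, run the command from `mpirun_command`, and keep watching the log for eviction events so it can tear the MPI job down.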