
Re: [Condor-users] MPI jobs in the vanilla universe



On Tue, Oct 26, 2004 at 03:30:02PM -0700, David E. Konerding wrote:
> Hi,
> 
> I am interested in running an MPI job on my cluster (which is already 
> running Condor 6.6.6), but within the vanilla
> universe (there are some restrictions to the MPI universe setup which we 
> cannot abide by).
> 

The vanilla universe has all of the same restrictions as the MPI universe
(they're built on nearly identical codebases) - what is giving you trouble?

> In the past, I've used Sun Grid Engine and PBS; I submitted a job asking 
> for "N nodes"; the batch queueing system would basically wait until N 
> nodes were free.  When the job ran, the batch system would start my job 
> on the "first" machine of the N, and provide me with a file listing all 
> the nodes ($PBS_NODEFILE is an env var pointing to the file).  At that 
> point, I could run mpirun with the machines file being the list of 
> nodes.  The batch system would properly manage the nodes, in that they 
> would be marked as being used, rather than schedule more jobs there.
> 
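
(For concreteness, the launch step in that PBS flow boils down to roughly
this - a sketch in perl, with MPICH-style mpirun flags and a placeholder
application path:)

#!/usr/bin/perl
# What the launch step of a PBS job script amounts to. $PBS_NODEFILE is
# set by PBS; the mpirun flags and application path are placeholders.
use strict;
use warnings;

my $nodefile = $ENV{PBS_NODEFILE}
    or die "PBS_NODEFILE not set - not running under PBS?\n";

open my $fh, '<', $nodefile or die "can't read $nodefile: $!\n";
my @nodes = grep { /\S/ } <$fh>;
close $fh;

# PBS has already marked these nodes busy, so mpirun is free to use them.
exec 'mpirun', '-machinefile', $nodefile, '-np', scalar(@nodes),
     '/path/to/my_mpi_app'
    or die "exec mpirun: $!\n";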

The reason we don't do it this way is that there's no way for the batch
system to clean up - mpirun just fires off ssh or rsh. There's no cleanup
of the execute environment, no cleanup of errant processes, and you have
to set up ssh keys for all of the users beforehand... we went with a more
managed solution.

> I've checked the manual, and there doesn't seem to be an equivalent in 
> condor that I can find.  The 'machine_count' directive in the condor job 
> file doesn't seem to apply to vanilla jobs, and there are no other ways 
> I can find to schedule a bunch of machines together.
> 
> Any suggestions?  Suggestions including obscure ClassAd hackery are 
> quite welcome.
> 

Condor won't do it for you without using the MPI universe. 

You can, however, write a perl program that submits vanilla jobs to Condor,
watches the userlog to see when and where they start running, and then runs
mpirun on those machines. It can keep watching the log to see if any of the
nodes gets evicted, and tear down the rest of the MPI job if one does. We
call a program like this a "coordinator". If you submit your coordinator
program as a Condor job under the "scheduler" universe, it will start running
right away on your submit node - DAGMan does exactly this. (In fact, this is
why we call it the "scheduler" universe - it's meant for jobs that schedule
other jobs.)
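
A minimal sketch of such a coordinator, assuming Condor's classic userlog
format ("001 ... Job executing on host: <ip:port>") and MPICH-style mpirun
arguments - the /bin/sleep "node holder", file names, and application path
are placeholders, and real code would also react to eviction events (004)
and shut the remaining nodes down:

#!/usr/bin/perl
# Coordinator sketch: submit N vanilla "node holder" jobs, watch their
# shared userlog for execute events, then mpirun across the claimed hosts.
use strict;
use warnings;
use Socket;

my $nnodes = shift || 4;
my $log    = "mpi_nodes.log";

# One vanilla job per node, all sharing one userlog. A real setup would
# run a wrapper that starts the MPI daemon or sshd, not a bare sleep.
open my $sub, '|-', 'condor_submit' or die "condor_submit: $!\n";
print $sub <<"END";
universe   = vanilla
executable = /bin/sleep
arguments  = 86400
log        = $log
queue $nnodes
END
close $sub;

# Poll the userlog until every job has reported an execute event.
my %hosts;
while (keys(%hosts) < $nnodes) {
    sleep 5;
    open my $fh, '<', $log or next;
    while (<$fh>) {
        next unless /^001 .* Job executing on host: <(\d+\.\d+\.\d+\.\d+):\d+>/;
        # The log records an IP address; mpirun wants a hostname.
        my $name = gethostbyaddr(inet_aton($1), AF_INET) || $1;
        $hosts{$name} = 1;
    }
    close $fh;
}

# Hand the claimed machines to mpirun.
open my $mf, '>', 'machines' or die "machines: $!\n";
print $mf "$_\n" for sort keys %hosts;
close $mf;

system('mpirun', '-machinefile', 'machines', '-np', $nnodes,
       '/path/to/my_mpi_app') == 0 or warn "mpirun failed\n";

Submitted with "universe = scheduler" in its own submit file, a script like
this starts immediately on the submit machine, just as DAGMan does.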

-Erik