Re: [Condor-users] MPICH2 question




Hello Antoni,

On Thu, May 13, 2010 at 03:20:43PM +0200, antoni artigues wrote:
> Hello
> 
> Sorry, but I have another question again.
> 
> Here is my problem:
> 
> I have two machines A and B. Machine A have 4 cpu's and machine B have 2
> cpu's.
> 
> I want to launch a MPI(MPICH2) job that needs 6 processes. But I can't
> do it with Condor.

If you launch it in the parallel universe, all free slots should be
assigned to this MPI job. The crucial step is to prepare a machine list for
MPI: if one node provides two slots, it should appear twice in this list.
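
To illustrate what such a list looks like, here is a minimal sketch in Python. The
host names "A" and "B" and the helper build_machine_list are made up for this
example; how the list is actually handed to your MPI launcher depends on your
setup and is not shown here.

```python
def build_machine_list(slots_per_host):
    """Return one entry per slot, so a host with N slots appears N times."""
    machines = []
    for host, slots in slots_per_host.items():
        machines.extend([host] * slots)
    return machines


# Hypothetical cluster from the question: A offers 4 slots, B offers 2.
slots = {"A": 4, "B": 2}
machine_list = build_machine_list(slots)

# One host name per line, six lines in total.
print("\n".join(machine_list))
```

The length of the list should match the machine_count requested in the job.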

We are using OpenMPI but I can't see a reason why it shouldn't work with MPICH.


> Finally a single slot is responsible for starting the MPI job.
> ------------CONFIGURATION 1----------------
> NUM_SLOTS = 1 and NUM_CPUS= 4 for A
> NUM_SLOTS = 1 and NUM_CPUS= 2 for B
> 
> in the job definition I put:
> machine_count = 2
> Because there are two machines on the cluster. But, how can I specify
> that I want 6 processes for the mpi? Is there any configuration
> parameter on the job definition?
> 
> -----------CONFIGURATION 2-----------------
> NUM_SLOTS = 4 and NUM_CPUS= 4 for A
> NUM_SLOTS = 2 and NUM_CPUS= 2 for B
> 
> in the job definition I put:
> machine_count = 6
> 
> But the mpi execution fails, because Condor tries to start more than one
> mpd on the same machine. Because the mp2script starts a mpd process for
> each node.

machine_count = 6 is correct. If only one mpd process can run per node, then MPICH2 is not the
right candidate. Can you run 6 processes on two nodes without using Condor?
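
If it turns out that only one daemon per host is allowed, one thing to try (I have
not tested this with MPICH2) is to collapse the per-slot list into one entry per
host with a process count. A rough sketch in Python; processes_per_host is a
made-up helper, and the "host:count" output is only an illustrative format, so
check the MPICH2 documentation for what its tools actually accept.

```python
from collections import Counter


def processes_per_host(machine_list):
    """Collapse a per-slot list into (host, count) pairs, first-seen order."""
    counts = Counter(machine_list)
    return [(host, counts[host]) for host in dict.fromkeys(machine_list)]


# Hypothetical per-slot list: one entry for each of the six slots.
per_host = processes_per_host(["A", "A", "A", "A", "B", "B"])

# Illustrative output only: one line per host, so one daemon per host.
for host, count in per_host:
    print(f"{host}:{count}")
```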

Cheers,
Henning