
Re: [Condor-users] MPICH2 question



Ok. Thank you very much for the answer.

That's what we need.

Regards

Antoni Artigues
On Thu, 13-05-2010 at 11:19 -0300, Gabriel A. von Winckler wrote:
> Hi Antoni,
> 
> I'm not a Condor expert, but I can tell you what we do here.
> 
> We use Intel MPI. It's not MPICH2, but I think the mpd interface is the
> same.
> 
> We have SMP machines (8 cores). Each core is a Condor slot (8 slots per
> machine).
> 
> As described in the Intel MPI manual, you can specify many mpd instances on
> one host if you use mpirun rather than mpiexec.
> 
> I've included a snippet of our mpi-wrapper at the end of this message.
> 
> With this approach, you have to set machine_count to the number of MPI
> processes.
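
[Editor's sketch] For the 6-process case discussed in this thread, a submit description could look roughly like the one written below. This is only an illustration under assumptions: the file name `mpi_job.sub`, the program name `my_mpi_program`, and passing the program as an argument to the wrapper are all hypothetical, since the thread does not show how the wrapper receives `$EXECUTABLE`.

```shell
#!/bin/sh
# Write a minimal parallel-universe submit description.
# SUBMIT_FILE, my_mpi_program and the log/output names are hypothetical.
SUBMIT_FILE="${SUBMIT_FILE:-mpi_job.sub}"

# The quoted 'EOF' keeps the shell from expanding $(NODE); Condor expands it.
cat > "$SUBMIT_FILE" <<'EOF'
universe      = parallel
executable    = mpi-wrapper
arguments     = my_mpi_program
machine_count = 6
output        = out.$(NODE)
error         = err.$(NODE)
log           = mpi_job.log
queue
EOF
```

Here `machine_count = 6` requests six slots, one per MPI process, which matches the "one slot per core" setup described in the reply.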
> 
> MPI will be smart enough to use shared-memory transfers between processes on
> the same machine and sockets (or RDMA) between different machines.
> 
> Let me know if you need more information.
> 
> Regards,
> 
> Gabriel
> 
> ---mpi-wrapper------
> 
> 
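> if [ "$_CONDOR_PROCNO" -eq 0 ]
> then
>         # Only node 0 launches MPI. Get the list of slots claimed for this job.
>         SLOTS=$("$(condor_config_val libexec)/condor_chirp" get_job_attr AllRemoteHosts)
>         MACHINE_FILE="${_CONDOR_SCRATCH_DIR}/hosts"
>
>         # Strip the surrounding quotes, put one entry per line, and drop the
>         # "slotN@" prefix so that only host names remain (one line per slot).
>         echo "$SLOTS" | sed -e 's/"\(.*\)".*/\1/' -e 's/,/\n/g' -e 's/slot[0-9]*@//g' > "$MACHINE_FILE"
>
>         # $EXECUTABLE is set elsewhere in the full wrapper (not shown in this snippet).
>         mpirun -r ssh -f "$MACHINE_FILE" -machinefile "$MACHINE_FILE" -np "$_CONDOR_NPROCS" "$EXECUTABLE" "$@"
>
>         rm "$MACHINE_FILE"
> else
>         # Other nodes do nothing here; node 0 starts all MPI processes.
>         sleep 10
> fi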
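
[Editor's sketch] The slot-list parsing in the wrapper above can be tried on its own, without Condor. The function name `slots_to_hosts` and the sample value are made up for illustration; the sample only mimics the quoted, comma-separated form that the wrapper's sed expression expects from `AllRemoteHosts`. It uses `tr` for the comma-to-newline step, which avoids relying on `\n` in a sed replacement.

```shell
#!/bin/sh
# Convert an AllRemoteHosts-style value into one host name per line:
# drop the double quotes, split on commas, strip a leading "slotN@".
slots_to_hosts() {
    printf '%s\n' "$1" | tr -d '"' | tr ',' '\n' | sed -e 's/^slot[0-9]*@//'
}

# Hypothetical sample value, not taken from a real pool.
SAMPLE='"slot1@nodeA,slot2@nodeA,slot1@nodeB"'
slots_to_hosts "$SAMPLE"
# prints:
# nodeA
# nodeA
# nodeB
```

Each claimed slot yields one line, so a host with several slots appears several times, which is what lets `-np $_CONDOR_NPROCS` place more than one process per machine.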
> 
> 
> On Thu, 2010-05-13 at 15:20 +0200, antoni artigues wrote:
> > Hello
> > 
> > Sorry, but I have another question again.
> > 
> > Here is my problem:
> > 
> > I have two machines, A and B. Machine A has 4 CPUs and machine B has 2
> > CPUs.
> > 
> > I want to launch an MPI (MPICH2) job that needs 6 processes, but I can't
> > do it with Condor.
> > 
> > I'm not sure, but I think that in Condor, with MPI, you can only have one
> > slot per machine, and the maximum value of machine_count is the number
> > of machines in the cluster. So in my case I can only launch the MPI job
> > with 2 processes. Is that true?
> > 
> > These are my experiments:
> > 
> > ------------CONFIGURATION 1----------------
> > NUM_SLOTS = 1 and NUM_CPUS= 4 for A
> > NUM_SLOTS = 1 and NUM_CPUS= 2 for B
> > 
> > In the job definition I put:
> > machine_count = 2
> > because there are two machines in the cluster. But how can I specify
> > that I want 6 MPI processes? Is there a configuration
> > parameter for this in the job definition?
> > 
> > -----------CONFIGURATION 2-----------------
> > NUM_SLOTS = 4 and NUM_CPUS= 4 for A
> > NUM_SLOTS = 2 and NUM_CPUS= 2 for B
> > 
> > in the job definition I put:
> > machine_count = 6
> > 
> > But the MPI execution fails because Condor tries to start more than one
> > mpd on the same machine: the mp2script starts an mpd process for
> > each node.
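
[Editor's sketch] One way to avoid starting a second mpd on a machine is to collapse the per-slot host list into one entry per host, with a process count. The function name `hosts_with_counts` is hypothetical, and the `host:count` output format is an assumption about what the MPI launcher's machine file accepts; check the MPICH2 documentation for your version before relying on it.

```shell
#!/bin/sh
# Read one host name per line (one line per slot) on stdin and
# print "host:count" once per unique host.
hosts_with_counts() {
    sort | uniq -c | awk '{ print $2 ":" $1 }'
}

# Sample input for the 4-CPU machine A and 2-CPU machine B from this thread.
printf 'nodeA\nnodeA\nnodeA\nnodeA\nnodeB\nnodeB\n' | hosts_with_counts
# prints:
# nodeA:4
# nodeB:2
```

With a list like this, an mpd would be started once per line (two in total), while the job still runs 6 processes.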
> > 
> > Thanks in advance
> > 
> > Regards
> > 
> > Antoni Artigues
> > 
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> > 
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/