
Re: [Condor-users] MPICH2 question



Hi Antoni,

I'm not a Condor expert, but I can tell you what we do here.

We use Intel MPI. It's not MPICH2, but I think the mpd interface is the
same.

We have SMP machines (8 cores). Each core is a Condor slot (8 slots per
machine).

As described in the Intel MPI manual, you can specify many mpd instances on
one host if you use mpirun rather than mpiexec.

I've included a snippet of our mpi-wrapper at the end of this message.

Using this approach, you set the number of MPI processes in
machine_count: each MPI process gets its own slot (core).
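For a cluster like Antoni's (4 slots on machine A, 2 on machine B), that means asking for 6 slots. Below is a minimal sketch that writes a parallel-universe submit description file from the shell. The file names mpi-wrapper.sh, my_mpi_program and mpi-job.sub are hypothetical placeholders, and the +WantIOProxy line is an assumption: it is included because the wrapper uses condor_chirp, which may need the I/O proxy enabled depending on your Condor version.

```shell
#!/bin/sh
# Write a parallel-universe submit file that requests one slot per MPI
# process: 4 slots on machine A + 2 slots on machine B = 6 processes.
# All file names are hypothetical placeholders.
# The quoted 'EOF' keeps the shell from expanding $(NODE).
cat > mpi-job.sub <<'EOF'
universe                = parallel
executable              = mpi-wrapper.sh
transfer_input_files    = my_mpi_program
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
# Assumption: condor_chirp (used by the wrapper) needs the I/O proxy
+WantIOProxy            = True
machine_count           = 6
output                  = out.$(NODE)
error                   = err.$(NODE)
log                     = mpi-job.log
queue
EOF

# Submit it afterwards with: condor_submit mpi-job.sub
```

The wrapper then sees _CONDOR_NPROCS=6 and passes it to mpirun as -np, so the only place the process count is stated is machine_count.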

MPI is smart enough to use shared memory between processes on the
same machine and sockets (or RDMA) between different machines.

Let me know if you need more information.

Regards,

Gabriel

---mpi-wrapper------


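# (Snippet: EXECUTABLE is set earlier in the full wrapper.)
if [ "$_CONDOR_PROCNO" -eq 0 ]
then
        # Get the list of slots assigned to this job, e.g.
        # "slot1@hostA,slot2@hostA,slot1@hostB"
        SLOTS=$($(condor_config_val libexec)/condor_chirp get_job_attr AllRemoteHosts)
        MACHINE_FILE="${_CONDOR_SCRATCH_DIR}/hosts"

        # Strip the quotes, put one entry per line and drop the "slotN@"
        # prefix: each host appears once per slot, i.e. once per MPI process
        # (GNU sed is needed for "\n" in the replacement)
        echo "$SLOTS" | sed -e 's/"\(.*\)".*/\1/' -e 's/,/\n/g' \
                -e 's/slot[0-9_]*@//g' > "$MACHINE_FILE"

        # Boot the mpd ring over ssh and start one MPI process per slot
        mpirun -r ssh -f "$MACHINE_FILE" -machinefile "$MACHINE_FILE" \
                -np "$_CONDOR_NPROCS" $EXECUTABLE "$@"

        rm -f "$MACHINE_FILE"
else
        # Nodes other than 0 do nothing: mpirun on node 0 starts all processes
        sleep 10
fi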
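To check what the sed pipeline produces, here is a small stand-alone sketch run on a hypothetical AllRemoteHosts value for a cluster like Antoni's (4 slots on machine A, 2 on machine B). The host names hostA and hostB are made up, and the sketch assumes GNU sed and that condor_chirp returns the attribute with its surrounding double quotes.

```shell
#!/bin/sh
# Hypothetical AllRemoteHosts value: machine A exposes 4 slots, machine B 2.
SLOTS='"slot1@hostA,slot2@hostA,slot3@hostA,slot4@hostA,slot1@hostB,slot2@hostB"'

# Same transformation as in the wrapper: strip the quotes, one entry per
# line, drop the "slotN@" prefix (GNU sed: "\n" in the replacement is a newline).
echo "$SLOTS" | sed -e 's/"\(.*\)".*/\1/' -e 's/,/\n/g' \
        -e 's/slot[0-9_]*@//g' > hosts

# One line per MPI process: hostA four times, then hostB twice
cat hosts
```

Six lines in the machine file correspond to machine_count = 6 and to the -np $_CONDOR_NPROCS value that mpirun receives.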


On Thu, 2010-05-13 at 15:20 +0200, antoni artigues wrote:
> Hello
> 
> Sorry, but I have another question.
> 
> Here is my problem:
> 
> I have two machines, A and B. Machine A has 4 CPUs and machine B has 2
> CPUs.
> 
> I want to launch an MPI (MPICH2) job that needs 6 processes, but I can't
> do it with Condor.
> 
> I'm not sure, but I think that in Condor, with MPI, you can only have one
> slot per machine, and the maximum value for machine_count is the number
> of machines in the cluster. So, in my case I can only launch the MPI job
> with 2 processes. Is that true?
> 
> These are my experiments:
> 
> ------------CONFIGURATION 1----------------
> NUM_SLOTS = 1 and NUM_CPUS= 4 for A
> NUM_SLOTS = 1 and NUM_CPUS= 2 for B
> 
> In the job definition I put:
> machine_count = 2
> because there are two machines in the cluster. But how can I specify
> that I want 6 processes for MPI? Is there a configuration
> parameter for that in the job definition?
> 
> -----------CONFIGURATION 2-----------------
> NUM_SLOTS = 4 and NUM_CPUS= 4 for A
> NUM_SLOTS = 2 and NUM_CPUS= 2 for B
> 
> In the job definition I put:
> machine_count = 6
> 
> But the MPI execution fails, because Condor tries to start more than one
> mpd on the same machine: the mp2script starts an mpd process for
> each node.
> 
> Thanks in advance
> 
> Regards
> 
> Antoni Artigues
> 