
Re: [HTCondor-users] htcondor + gpudirect + openmpi



Hello,

to be clear, I'm talking about openmpi, not openmp, but I assume the two can 
also be mixed; at least each mpi program could be multithreaded or even start 
multiple processes.

The file to change is openmpiscript, which can be found in 
$(RELEASE_DIR)/etc/examples/
(on Debian: /usr/share/doc/condor/etc/examples/).

It starts one sshd per machine_count, which in general means one per slot.
The resources themselves are reserved by requesting cpus and gpus.
The modified script just tells mpirun that more mpi processes should be 
started on each slot; in our case that is the number of gpus requested.
Therefore the -n argument has to be multiplied by RequestGpus, and each
line in the machines file has to be repeated that many times.
Alternatively, one could probably also use the -npernode argument of mpirun
(not tried yet); a sketch follows below.
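
As a rough sketch of that untested -npernode variant (using the same 
variables as in openmpiscript, so treat it as a guess):

    # keep one line per slot in the machines file, let mpirun place
    # $ngpus ranks on each host
    nmpinodes=$(( $ngpus * $_CONDOR_NPROCS ))
    mpirun -v --prefix $MPDIR --mca $mca_ssh_agent $CONDOR_SSH \
        -npernode $ngpus -n $nmpinodes -hostfile machines $EXECUTABLE $@ &

This probably only behaves the same as the duplicated machines file if 
every slot lands on a different host.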

For pure cpu jobs one has to distinguish between cpus (or mpi processes per 
node) and threads. Is there a standard solution? Or could one set 
OMP_NUM_THREADS and Request_cpus to different values?
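
What I have in mind is something like the following submit file (just a 
sketch, not tried; my_mpi_prog is only a placeholder):

    universe      = parallel
    executable    = openmpiscript
    arguments     = my_mpi_prog
    machine_count = 2
    request_cpus  = 8
    environment   = "OMP_NUM_THREADS=4"
    queue

i.e. 8 cpus per slot but only 4 threads per mpi process, so openmpiscript 
would have to start 2 mpi processes per slot, derived from these two numbers 
instead of from RequestGpus.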

Best regards
Harald

On Friday 15 September 2017 06:31:54 Malathi Deenadayalan wrote:
> Hello all,
> 
> Can you tell me how this works and which file we have to edit?
> 
> Regards,
> Malathi
> 
> ----- Original Message -----
> From: "Harald van Pee" <pee@xxxxxxxxxxxxxxxxx>
> To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
> Sent: Tuesday, September 12, 2017 8:31:03 PM
> Subject: Re: [HTCondor-users] htcondor + gpudirect + openmpi
> 
> Hello all,
> 
> I think I now have a working version for all cases:
> CONDOR_CHIRP=`condor_config_val libexec`
> CONDOR_CHIRP=$CONDOR_CHIRP/condor_chirp
> ncpus=`$CONDOR_CHIRP get_job_attr RequestCpus`
> ngpus=`$CONDOR_CHIRP get_job_attr RequestGpus`
> ...
> sort -n -k 1 < $CONDOR_CONTACT_FILE | awk '{print $2}' > machines
> #sort -n -k 1 < $CONDOR_CONTACT_FILE | awk '{print $1}' > machines
> for(( i=1 ; i <$ngpus ; i++)) ; do
>     echo i= $i
>     sort -n -k 1 < $CONDOR_CONTACT_FILE | awk '{print $2}' >> machines
> #    sort -n -k 1 < $CONDOR_CONTACT_FILE | awk '{print $1}' >> machines
> done;
> ...
> nmpinodes=$(( $ngpus * $_CONDOR_NPROCS))
> ...
> mpirun -v --prefix $MPDIR --mca $mca_ssh_agent $CONDOR_SSH -n $nmpinodes \
>     -hostfile machines $EXECUTABLE $@ &
> 
> but
> 
> I have to use the old condor_ssh version from htcondor 8.4 which uses
> hostnames not proc numbers (indeed I just changed back these parts).
> If I do not use hostnames, it can happen that a request_cpus=1/request_gpus=1
> job lands several times on one machine; then there is an sshd running there,
> and mpirun starts all processes on that machine and completely ignores all
> the others.
> Therefore I think we need hostnames in the machines file, because mpirun
> cannot handle proc numbers.
> 
> Why was it changed? Any other pitfalls?
> 
> Best
> Harald
> 
> On Monday 11 September 2017 23:10:09 Harald van Pee wrote:
> > On Monday 11 September 2017 22:10:18 Michael Pelletier wrote:
> > > I've been using the job ad file for non-dynamic values. For dynamic
> > > stuff you could use condor_chirp get_job_attr.
> > > 
> > >      condor_q -jobads $_CONDOR_JOB_AD -autoformat RequestCpus
> > 
> > Thanks!
> > 
> > > -Michael Pelletier.
> > > 
> > > > -----Original Message-----
> > > > From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On
> > > > Behalf Of Harald van Pee
> > > > Sent: Monday, September 11, 2017 3:45 PM
> > > > To: htcondor-users@xxxxxxxxxxx
> > > > Subject: Re: [HTCondor-users] htcondor + gpudirect + openmpi
> > > > 
> > > > Hi Jason,
> > > > 
> > > > I think I have done something wrong, or it's just working since I have
> > > > installed Mellanox OFED 4.1.
> > > > 
> > > > Up to now I have tested it only with openmpi-2.0.2a1, but at least for
> > > > this version it's working if I loop over the requested gpus just to get
> > > > more lines in the machines file, and for the -n argument I have to
> > > > multiply $_CONDOR_NPROCS by the number of requested gpus.
> > > > 
> > > > How do I get the number of requested gpus in the script?
> > > > At the moment I would parse
> > > > _CONDOR_AssignedGPUs and count its entries.
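> > > > Something like this is roughly what I mean (just a sketch, not tested,
> > > > assuming _CONDOR_AssignedGPUs is a comma separated list like CUDA0,CUDA1):
> > > > 
> > > >     ngpus=`echo $_CONDOR_AssignedGPUs | tr ',' ' ' | wc -w`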
> > > > 
> > > > OMP_NUM_THREADS can be used to get the request_cpus value, but in
> > > > general this is not the number of mpi processes per node.
> > > > 
> > > > Up to now I have only tested with cpus, but I hope I can start with
> > > > real gpu jobs soon.
> > >