
Re: [HTCondor-users] openmpi jobs with condor [on debian] (using infiniband and new options in htcondor 8.6.1)



On Thursday 16 March 2017 20:02:30 Jason Patton wrote:
> > export OMPI_MCA_btl_tcp_if_exclude="lo,eth0,eth1"

Up to now we have not used this variable. Tests are now running with
it; I will report if there is any difference.

Harald

> 
> Based on my reading of the Open MPI wiki, disabling network interfaces
> should be unnecessary if Open MPI detects that you're using InfiniBand or a
> similar HPC communication interface. I am curious: if you have the time
> and capacity to run a test, can you tell us what happens if you just
> comment out this line?
> 
> For reference, with this variable not defined on machines without
> InfiniBand (or similar), I found that the mpi proxy processes would stall
> at near 100% cpu usage. With the increased verbosity of the mpirun output,
> I used condor_ssh_to_job to connect to node 0 and saw in _condor_stdout
> that Open MPI was stuck looking for IPs on the wrong network interfaces.
> 
> Jason Patton
> 
> 
> On Thu, Mar 16, 2017 at 1:43 PM, Harald van Pee <pee@xxxxxxxxxxxxxxxxx> wrote:
> > Hello all,
> > 
> > I copied the general part to the end of this mail so that everything is
> > in one place. Here is what I found out to get Open MPI running:
> > 
> > Howto: Open MPI with HTCondor (using InfiniBand and the new options in
> > the openmpiscript of HTCondor 8.6.1)
> > 
> > For Debian 7 and Debian 8 we need one additional line in /etc/init.d/condor
> > to allow locking all available memory
> > (locked-in-memory address space unlimited):
> > 
> > ulimit -l unlimited
> > 
> > Without this line the values in /etc/security/limits.conf are ignored
> > and the default value of 64 kB is used.
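> > 
> > A quick way to check that the new limit actually reached the condor
> > daemons (these are standard Linux tools, not HTCondor commands):
> > 
> >         # on an execute node, as root, after restarting condor
> >         grep "Max locked memory" /proc/$(pgrep -o condor_master)/limits
> >         # should show "unlimited" instead of the 64 kB default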
> > 
> > Requirements:
> > - You have an InfiniBand-aware Open MPI installation.
> > - You allow users an unlimited locked-in-memory address space.
> > 
> >   For this we need to set in /etc/security/limits.conf:
> > 
> > *                soft    memlock         unlimited
> > *                hard    memlock         unlimited
> > - Set the MTT (memory translation table) large enough.
> > Most likely this is already handled if you install the Mellanox OFED
> > package; if you use the in-kernel drivers, see
> > 
> > https://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem
> > 
> > On Debian we can only set one parameter; we use, in /etc/default/grub,
> > GRUB_CMDLINE_LINUX_DEFAULT="mlx4_core.log_mtts_per_seg=7"
> > which should be good for 256 GB of memory.
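> > 
> > Note that after editing /etc/default/grub the grub configuration has to
> > be regenerated and the node rebooted; the active value can then be read
> > back via sysfs (the standard Linux module parameter location):
> > 
> >         update-grub
> >         # after a reboot:
> >         cat /sys/module/mlx4_core/parameters/log_mtts_per_seg   # expect 7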
> > 
> > ----
> > 
> > New in the openmpiscript of HTCondor 8.6.1:
> > Before mpirun is started, the following environment variables are set:
> > 
> >         # set MCA values for running on HTCondor
> >         export OMPI_MCA_plm_rsh_no_tree_spawn="true"   # disable ssh tree spawn
> >         export OMPI_MCA_btl_tcp_if_exclude="lo,$EXINT" # exclude network interfaces
> > 
> >         # optionally set MCA values for increasing mpirun verbosity
> >         #export OMPI_MCA_plm_base_verbose=30
> >         #export OMPI_MCA_btl_base_verbose=30
> > 
> > Because we are still using HTCondor 8.4.x, we currently use:
> > 
> >         export OMPI_MCA_plm_rsh_no_tree_spawn="true"        # disable ssh tree spawn
> >         export OMPI_MCA_btl_tcp_if_exclude="lo,eth0,eth1"   # exclude network interfaces
> > 
> >         export OMPI_MCA_plm_base_verbose=30
> >         export OMPI_MCA_btl_base_verbose=30
> > 
> > ----
> > 
> > Problems: We still sometimes see that some MPI programs keep running
> > after condor_rm, even with our additional signal handler; mpirun and
> > openmpiscript are always stopped, but not the running program.
> > 
> > 
> > -----
> > 
> > Howto: Open MPI with HTCondor (general part)
> > 
> > We use HTCondor 8.4.x with Debian 7 and Debian 8, and we use Open MPI
> > 1.6.5 mostly with Debian 7; with Debian 8 we have only tested a small
> > Open MPI example.
> > We use a common file system for all nodes. HTCondor claims it also works
> > without one (but is this really useful?).
> > Requirements:
> > - Set up your HTCondor environment for parallel jobs (see manual
> >   section 2.9); a minimal submit file sketch is shown below.
> > - A working Open MPI installation (test it on a single node or in the
> >   vanilla universe [section 2.9.4]).
> > - An ssh client and server on each node.
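> > 
> > A minimal submit file sketch for such a job (machine count, file names
> > and the program name are only illustrative):
> > 
> >         universe      = parallel
> >         executable    = openmpiscript
> >         arguments     = my_mpi_program
> >         machine_count = 4
> >         output        = out.$(NODE)
> >         error         = err.$(NODE)
> >         log           = mpi.log
> >         queue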
> > 
> > In my understanding, HTCondor just claims the needed slots, prepares and
> > starts the sshd on the running nodes, and then just starts mpirun. This
> > is done by the openmpiscript (see section 2.9.3) and other scripts.
> > From HTCondor 8.6.1 on, these scripts are improved and condor variables
> > can be set which are used by openmpiscript. In earlier versions one has
> > to change the openmpiscript directly.
> > 
> > What do I have to do to get Open MPI running?
> > Change the openmpiscript:
> > 1. The openmpiscript is a bash script, therefore make sure that bash,
> > not sh, is used to run it. For example, use
> > #!/bin/bash
> > not
> > #!/bin/sh
> > Debian often uses dash as the system shell, which is not fully bash
> > compatible.
> > My suggestion is that condor invokes bash explicitly for all its scripts,
> > at least for those that are not fully Bourne shell compatible and
> > therefore need bash, not sh.
> > Is there any system where bash could not be installed under /bin/bash?
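> > 
> > (On Debian you can check what /bin/sh points to with, for example:
> > 
> >         readlink -f /bin/sh    # typically prints /bin/dash
> > )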
> > 
> > 2. Change MPDIR to the prefix directory of your Open MPI installation,
> > for example as sketched below.
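> > 
> >         # in openmpiscript; the path is only an example, use the prefix
> >         # that contains your bin/mpirun
> >         MPDIR=/usr/lib/openmpi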
> > 
> > Take into account: the scripts will run into problems if you pass your
> > program with a path (in the arguments to openmpiscript). Therefore, put
> > the program into your working directory and submit from there.
> > 
> > 
> > Improvements:
> > We have often seen that after condor_rm the MPI processes were still
> > running although the parallel job had been removed from condor.
> > Following the philosophy of condor, that mpirun has to do the job, we
> > start mpirun in the background and wait for this process. This allows
> > us to install a signal handler with trap, which sends a TERM signal to
> > mpirun after openmpiscript receives the TERM signal.
> > With this signal handler we have never seen the problem above. I do not
> > know if and why the condor team thinks this is not necessary, but at
> > least it works for us (in most cases). A sketch of the change is shown
> > below.
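> > 
> > Roughly, the change inside openmpiscript looks like this (the mpirun
> > arguments are the same ones the script already uses; only the structure
> > around the call is shown):
> > 
> >         # start mpirun in the background instead of the foreground
> >         mpirun "$@" &                  # same arguments as before, plus "&"
> >         mpirun_pid=$!
> >         # forward the TERM signal that HTCondor sends on condor_rm
> >         trap 'kill -TERM "$mpirun_pid"' TERM
> >         # wait until mpirun has exited (or the trap has fired)
> >         wait "$mpirun_pid"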
> > 
> > Best regards
> > Harald
> > 
> > _______________________________________________
> > HTCondor-users mailing list
> > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> > with a subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> > 
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/htcondor-users/

-- 
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
Nussallee 14-16 - 53115 Bonn - Tel +49-228-732213 - Fax +49-228-732505
mail: pee@xxxxxxxxxxxxxxxxx