
[Condor-users] condor parallel universe RLIMIT problem



I have the parallel universe set up and can run MPI jobs, but when I run them through Condor I get errors reporting RLIMIT_MEMLOCK=32768, even though /etc/security/limits.conf is supposed to set memlock to unlimited.  When I run the same job with mpirun on the command line, everything works and InfiniBand comes up correctly; through Condor, however, I get the stderr output below.  I suspect that Condor is somehow altering the MEMLOCK limit; a small test program to check this is sketched after the log.


-- libibverbs: Warning: RLIMIT_MEMLOCK is 32768 bytes.
    This will severely limit memory registrations.
[the same libibverbs warning repeated three more times]
--------------------------------------------------------------------------
The OpenIB BTL failed to initialize while trying to allocate some
locked memory.  This typically can indicate that the memlock limits
are set too low.  For most HPC installations, the memlock limits
should be set to "unlimited".  The failure occured here:

    Host:          hal9001.cs.uiuc.edu
    OMPI source:   btl_openib.c:830
    Function:      ibv_create_cq()
    Device:        mlx4_0
    Memlock limit: 32768

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
[the OpenIB BTL and MPI_INIT failure messages above repeated verbatim twice more]
mpirun noticed that job rank 0 with PID 17464 on node hal9001.cs.uiuc.edu exited on signal 15 (Terminated).
/home/staff/danders5/mpitest/ompiscript: line 86: wait: pid 17461 is not a child of this shell
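
To test whether the limit really differs inside a Condor slot, here is a minimal diagnostic sketch (my own addition, not part of the failing job) that just prints the RLIMIT_MEMLOCK values the process actually sees:

/* memlock_check.c -- print the soft and hard RLIMIT_MEMLOCK values
 * visible to this process.  Compile with
 *     gcc -o memlock_check memlock_check.c
 * then run it once from a login shell and once as a Condor job and
 * compare the output.  If the Condor run reports 32768 bytes while
 * the shell reports unlimited, the job is inheriting the limit from
 * the Condor daemons' environment rather than from my login session. */
#include <stdio.h>
#include <sys/resource.h>

static void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu bytes\n", label, (unsigned long long)value);
}

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    print_limit("RLIMIT_MEMLOCK soft", rl.rlim_cur);
    print_limit("RLIMIT_MEMLOCK hard", rl.rlim_max);
    return 0;
}

One possible explanation (an assumption on my part, not confirmed from the logs above): /etc/security/limits.conf is applied by PAM at login, so a condor_master started from an init script at boot never passes through PAM, keeps the 32 KB boot-time default, and hands that limit down to the job.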

David Anderson
mr.anderson.neo@xxxxxxxxxxxxxxxx