
[Condor-users] Condor and MPI jobs



Hi, I'm trying to run an MPI job under Condor. My .submit file looks like this:

universe        = vanilla
requirements    = Activity == "Idle"
executable      = LIME-443-001.sh
output          = LIME-443-001.sh.out
error           = LIME-443-001.sh.err
log             = LIME-443-001.sh.log
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue

In this example, LIME-443-001.sh contains:

#!/bin/sh
export OMP_NUM_THREADS=1
export LD_LIBRARY_PATH=:/usr/lib64/mpi/gcc/openmpi/lib64
/usr/lib64/mpi/gcc/openmpi/bin/mpirun -np 2 /opt/espresso-mpi/bin/pw.x < /home/aryjr/SUPERFICIES/LIME/LIME-443-001.pw.inp > /home/aryjr/SUPERFICIES/LIME/LIME-443-001.pw.out
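A batch job may inherit a much smaller environment than a login shell. A few lines of logging at the top of a wrapper like this one can show what the job actually sees.

The sketch below rests on an unconfirmed assumption: that the relevant difference is PATH. Open MPI's rsh/ssh launcher looks for `ssh` or `rsh` on PATH. The function name `check_cmds` is hypothetical, and this is not a confirmed fix.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the top of LIME-443-001.sh.
# Assumption (not confirmed): under Condor, PATH may lack ssh/rsh,
# which Open MPI's rsh/ssh launcher looks for.

# Print "ok NAME" or "missing NAME" for each command; return 1 if any is missing.
check_cmds() {
    status=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok $cmd"
        else
            echo "missing $cmd"
            status=1
        fi
    done
    return "$status"
}

# Log to stderr, which the submit file sends to LIME-443-001.sh.err.
echo "PATH=$PATH" >&2
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH-}" >&2
check_cmds ssh rsh >&2 || echo "note: at least one listed command is not on PATH" >&2
```

A quicker way to test the same hypothesis is to add `getenv = True` to the submit file. This makes Condor copy the submitting shell's environment into the job.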

If I run the script directly, without Condor ("sh LIME-443-001.sh"), everything works fine. However, when I submit it with "condor_submit LIME-443-001.submit", I get the following error in LIME-443-001.sh.err:

[xeonquad01:22365] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_init_stage1.c at line 312
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_pls_base_select failed
  --> Returned value -1 instead of ORTE_SUCCESS

--------------------------------------------------------------------------
[xeonquad01:22365] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_system_init.c at line 42
[xeonquad01:22365] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 52
--------------------------------------------------------------------------
Open RTE was unable to initialize properly.  The error occured while
attempting to orte_init().  Returned value -1 instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
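The same script succeeds from a login shell but fails under Condor, so comparing the two environments is one way to narrow the problem down. This is a minimal sketch, assuming the difference lies in which environment variables are set (not confirmed in this thread). The function `missing_vars`, the script name `missing_vars.sh` and the `.env` file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the names of environment variables that are set
# in the first `env` dump but absent from the second.
# Assumed workflow (not from the original post):
#   from a login shell:       env > interactive.env
#   inside the Condor job:    env > condor.env
#   then:                     sh missing_vars.sh interactive.env condor.env
missing_vars() {
    mv_tmp=$(mktemp -d) || return 1
    # Keep only the NAME part of each NAME=value line, sorted for comm(1).
    cut -d= -f1 "$1" | sort -u > "$mv_tmp/a"
    cut -d= -f1 "$2" | sort -u > "$mv_tmp/b"
    # comm -23 drops lines unique to the second file and lines common to
    # both, leaving only names that appear in the first file alone.
    comm -23 "$mv_tmp/a" "$mv_tmp/b"
    rm -rf "$mv_tmp"
}

if [ "$#" -eq 2 ]; then
    missing_vars "$1" "$2"
fi
```

This compares variable names only. A variable such as PATH that is set in both dumps but with different values will not show up, so a plain `diff` of the two files is still worth a look.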

Can anybody help me?

Thanks very much!!!

Ary Juniort