[HTCondor-users] mpi job stuck as idle



Hi,
May I ask why a simple mpihello job is stuck in the idle state? The submit file (mpi.ht) and the command outputs are shown below:


[mahmood@rocks7 ~]$ cat mpi.ht
universe = parallel
executable = /opt/openmpi/bin/mpirun
arguments = ./hellompi
log = hellompi.log
output = hellompi.out
error = hellompi.err
machine_count = 2
queue
[mahmood@rocks7 ~]$ condor_q


-- Schedd: rocks7.vbtestcluster.com : <10.0.3.15:9618?... @ 01/17/18 02:45:50
OWNER   BATCH_NAME                      SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
mahmood CMD: /opt/openmpi/bin/mpirun   1/17 02:41      _      _      1      1 4.0

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
[mahmood@rocks7 ~]$ condor_q -analyze


-- Schedd: rocks7.vbtestcluster.com : <10.0.3.15:9618?...

004.000:  Job has not yet been considered by the matchmaker.


004.000:  Run analysis summary ignoring user priority.  Of 2 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      2 are available to run your job
[mahmood@rocks7 ~]$ ls -l mpihello.*
-rw-rw-r-- 1 mahmood mahmood 833 Jan 16 12:48 mpihello.c
[mahmood@rocks7 ~]$ ls -l hello*
-rw-rw-r-- 1 mahmood mahmood   0 Jan 17 02:41 hellompi.err
-rw-rw-r-- 1 mahmood mahmood 134 Jan 17 02:41 hellompi.log
-rw-rw-r-- 1 mahmood mahmood   0 Jan 17 02:41 hellompi.out
[mahmood@rocks7 ~]$ cat hellompi.log
000 (004.000.000) 01/17 02:41:30 Job submitted from host: <10.0.3.15:9618?addrs=10.0.3.15-9618+[--1]-9618&noUDP&sock=2329_79d6_3>
...
[mahmood@rocks7 ~]$ rocks list host
HOST         MEMBERSHIP CPUS RACK RANK RUNACTION INSTALLACTION
rocks7:      Frontend   2    0    0    os        install      
compute-0-0: Compute    2    0    0    os        install      
[mahmood@rocks7 ~]$

Regards,
Mahmood