
Re: [HTCondor-users] mpi job stuck as idle



Mahmood,

Is condor configured to use a DedicatedScheduler? See:

https://research.cs.wisc.edu/htcondor/manual/current/2_9Parallel_Applications.html#SECTION00392000000000000000

and

https://research.cs.wisc.edu/htcondor/manual/current/3_14Setting_Up.html#SECTION004148000000000000000
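
For reference, the setup described in those sections amounts to roughly the following in the local configuration of each execute node. This is only a sketch: it assumes the frontend rocks7.vbtestcluster.com (the schedd shown in your condor_q output) is the machine that should act as the dedicated scheduler, and the policy lines follow the "always run dedicated jobs" example from the manual, so adjust them to your own pool policy.

```
# Assumption: the schedd on rocks7.vbtestcluster.com is the dedicated scheduler.
DedicatedScheduler = "DedicatedScheduler@rocks7.vbtestcluster.com"

# Advertise the attribute in each machine ClassAd so the schedd can claim the node.
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler

# Example policy: always start jobs and never suspend or preempt them.
START        = True
SUSPEND      = False
CONTINUE     = True
PREEMPT      = False
KILL         = False
WANT_SUSPEND = False
WANT_VACATE  = False

# Prefer jobs that come from the dedicated scheduler.
RANK = Scheduler =?= $(DedicatedScheduler)
```

After changing the configuration, run condor_reconfig (or restart the startd) on the execute nodes. Something like "condor_status -af Name DedicatedScheduler" should then show the attribute for every slot; if it prints "undefined", the parallel universe job will stay idle.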

Jason Patton

On Wed, Jan 17, 2018 at 1:48 AM, Mahmood Naderan <nt_mahmood@xxxxxxxxx> wrote:
> Hi,
> May I ask why a simple mpihello job is stuck in the idle state? The ht script and
> the outputs are shown below:
>
>
> [mahmood@rocks7 ~]$ cat mpi.ht
> universe = parallel
> executable = /opt/openmpi/bin/mpirun
> arguments = ./hellompi
> log = hellompi.log
> output = hellompi.out
> error = hellompi.err
> machine_count = 2
> queue
> [mahmood@rocks7 ~]$ condor_q
>
>
> -- Schedd: rocks7.vbtestcluster.com : <10.0.3.15:9618?... @ 01/17/18 02:45:50
> OWNER   BATCH_NAME                      SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
> mahmood CMD: /opt/openmpi/bin/mpirun   1/17 02:41      _      _      1      1 4.0
>
> 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
> [mahmood@rocks7 ~]$ condor_q -analyze
>
>
> -- Schedd: rocks7.vbtestcluster.com : <10.0.3.15:9618?...
>
> 004.000:  Job has not yet been considered by the matchmaker.
>
>
> 004.000:  Run analysis summary ignoring user priority.  Of 2 machines,
>       0 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       0 match and are already running your jobs
>       0 match but are serving other users
>       2 are available to run your job
> [mahmood@rocks7 ~]$ ls -l mpihello.*
> -rw-rw-r-- 1 mahmood mahmood 833 Jan 16 12:48 mpihello.c
> [mahmood@rocks7 ~]$ ls -l hello*
> -rw-rw-r-- 1 mahmood mahmood   0 Jan 17 02:41 hellompi.err
> -rw-rw-r-- 1 mahmood mahmood 134 Jan 17 02:41 hellompi.log
> -rw-rw-r-- 1 mahmood mahmood   0 Jan 17 02:41 hellompi.out
> [mahmood@rocks7 ~]$ cat hellompi.log
> 000 (004.000.000) 01/17 02:41:30 Job submitted from host: <10.0.3.15:9618?addrs=10.0.3.15-9618+[--1]-9618&noUDP&sock=2329_79d6_3>
> ...
> [mahmood@rocks7 ~]$ rocks list host
> HOST         MEMBERSHIP CPUS RACK RANK RUNACTION INSTALLACTION
> rocks7:      Frontend   2    0    0    os        install
> compute-0-0: Compute    2    0    0    os        install
> [mahmood@rocks7 ~]$
>
>
>
>
>
> Regards,
> Mahmood