
Re: [HTCondor-users] mpi job stuck as idle



> 1- Is the dedicated scheduler OK? Which part of the output tells me that?

The links to the manual explain what the dedicated scheduler is, what
it does, and how to configure it.

To check whether a DedicatedScheduler is configured, query your
execute nodes with "condor_status":

condor_status -af:h Machine DedicatedScheduler

The value of DedicatedScheduler should be "DedicatedScheduler@<your
submit node hostname>" and it should be the same on all execute nodes.
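
As a rough sketch of that check, the snippet below parses the text
printed by the condor_status command above and lists every machine whose
value differs from the expected one. The execute node names
(compute-0-0.local, compute-0-1.local) are made up for illustration, and
the code assumes one whitespace-separated row per slot after a single
header line, with "undefined" printed when the attribute is not set.

```python
from typing import List


def mismatched_nodes(output: str, submit_host: str) -> List[str]:
    """Return machines whose DedicatedScheduler is not the expected value.

    `output` is the text of: condor_status -af:h Machine DedicatedScheduler
    """
    expected = "DedicatedScheduler@" + submit_host
    rows = [line for line in output.splitlines() if line.strip()]
    bad = []
    for line in rows[1:]:  # rows[0] is the header line added by ":h"
        fields = line.split()
        machine = fields[0]
        # Assumption: an unset attribute shows up as "undefined".
        value = fields[1] if len(fields) > 1 else "undefined"
        if value != expected and machine not in bad:
            bad.append(machine)
    return bad


# Hypothetical output for a two-node pool; only the submit host name
# comes from this thread.
sample = """\
Machine            DedicatedScheduler
compute-0-0.local  DedicatedScheduler@rocks7.vbtestcluster.com
compute-0-1.local  undefined
"""

print(mismatched_nodes(sample, "rocks7.vbtestcluster.com"))
```

An empty list means every execute node advertises the same, expected
DedicatedScheduler. To use it on a live pool, pass in the captured
output of the condor_status command instead of the sample string.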

> 2- Why does the first command report 2 idle jobs while the second reports 1?

Assuming you are running a recent version of HTCondor, "condor_q"
shows only your own jobs by default, whereas "condor_status -schedd"
shows totals across all users. Does the output of "condor_q -all" show
more jobs?
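
To illustrate how the two counts can differ, here is a small sketch that
tallies idle jobs per owner from the output of
"condor_q -all -af Owner JobStatus" (JobStatus 1 means idle). The second
owner, "otheruser", is hypothetical; whether the extra idle job really
belongs to another user is exactly what "condor_q -all" would tell you.

```python
from collections import Counter

IDLE = "1"  # JobStatus value for an idle job


def idle_jobs_by_owner(output):
    """Count idle jobs per owner from 'Owner JobStatus' rows."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        owner, status = fields
        if status == IDLE:
            counts[owner] += 1
    return dict(counts)


# Hypothetical rows: one idle job for mahmood, one for another user.
sample_q = """\
mahmood 1
otheruser 1
"""

by_owner = idle_jobs_by_owner(sample_q)
print(by_owner)                 # per-user view, like plain condor_q
print(sum(by_owner.values()))   # pool-wide view, like condor_status -schedd
```

In this made-up case each user sees one idle job in their own condor_q,
while the schedd total is two.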

Jason

On Wed, Jan 17, 2018 at 11:06 AM, Mahmood Naderan <nt_mahmood@xxxxxxxxx> wrote:
> OK. Before any modification (section 3.14.8 in document), I ran
> "condor_status -schedd" and saw
>
>
> [mahmood@rocks7 ~]$ condor_status -schedd
> Name                     Machine                  RunningJobs   IdleJobs   HeldJobs
>
> rocks7.vbtestcluster.com rocks7.vbtestcluster.com           0          2          0
>
>                       TotalRunningJobs      TotalIdleJobs      TotalHeldJobs
>
>                Total                 0                  2                  0
> [mahmood@rocks7 ~]$ condor_q
>
>
> -- Schedd: rocks7.vbtestcluster.com : <10.0.3.15:9618?... @ 01/17/18 11:54:39
> OWNER   BATCH_NAME                      SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
> mahmood CMD: /opt/openmpi/bin/mpirun   1/17 03:04      _      _      1      1 5.0
>
> 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
> [mahmood@rocks7 ~]$
>
>
>
>
> The questions are:
> 1- Is the dedicated scheduler OK? Which part of the output tells me that?
> 2- Why does the first command report 2 idle jobs while the second reports 1?
>
>
> Regards,
> Mahmood
>
> On Wednesday, January 17, 2018, 9:20:37 AM EST, Jason Patton
> <jpatton@xxxxxxxxxxx> wrote:
>
>
> Mahmood,
>
> Is condor configured to use a DedicatedScheduler? See:
>
> https://research.cs.wisc.edu/htcondor/manual/current/2_9Parallel_Applications.html#SECTION00392000000000000000
>
> and
>
> https://research.cs.wisc.edu/htcondor/manual/current/3_14Setting_Up.html#SECTION004148000000000000000
>
> Jason Patton
>