
Re: [HTCondor-users] Not running Parallel-universe jobs?



The -analyze and -better-analyze options still show machines which don't have the DedicatedScheduler attribute set as "available" to run a parallel-universe job:

008.000:  Run analysis summary.  Of 6 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      6 are available to run your job

This is what shows up when I submit a parallel job with machine_count = 3 to a static-slot pool which has only two slots with DedicatedScheduler set. If you add a job requirement of ( ! isUndefined(DedicatedScheduler) ), or some more sophisticated expression to match the dedicated scheduler to which the job was submitted, then the analysis shows a clearer picture:

009.000:  Run analysis summary.  Of 6 machines,
      4 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      2 are available to run your job
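In submit-file terms, that extra clause just gets appended to the job's requirements. A minimal sketch, with the executable, arguments, and machine_count as placeholders:

```
universe      = parallel
executable    = /bin/sleep              # placeholder
arguments     = 60
machine_count = 3
requirements  = ( ! isUndefined(DedicatedScheduler) )
queue
```

The schedd ANDs this clause into the auto-generated requirements, so -better-analyze will then count the non-dedicated slots as rejected rather than available.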

(Feature request for 8.2.10?)

Check section 2.9.2 in the 8.2.9 manual for more details about the DedicatedScheduler attribute. A parallel job will only run on a slot that advertises DedicatedScheduler; if you were expecting the job to run on all six machines, perhaps some of them lost that attribute in the wake of your recent disruption.
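To see which slots currently advertise the attribute, a condor_status query along these lines should work, and the startd side is configured roughly as below (the schedd hostname here is a placeholder; use your own):

```
# Which slots advertise DedicatedScheduler?
condor_status -constraint 'DedicatedScheduler =!= undefined'

# Startd configuration on each dedicated execute node:
DedicatedScheduler = "DedicatedScheduler@your-schedd.example.com"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
```

After changing the configuration, a condor_reconfig on the execute nodes should make the attribute show up in the slot ads.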

As for the current job you're waiting on: once those 41 busy slots free up, your job will be dispatched.


Michael V. Pelletier
IT Program Execution
Principal Engineer
978.858.9681 (5-9681) NOTE NEW NUMBER
339.293.9149 cell
339.645.8614 fax

michael.v.pelletier@xxxxxxxxxxxx