
Re: [HTCondor-users] Not running Parallel-universe jobs?



The -analyze and -better-analyze options still show machines which don't have the DedicatedScheduler attribute set as "available" to run a parallel-universe job:

008.000:  Run analysis summary.  Of 6 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      6 are available to run your job

This is what shows up when I submit a parallel job with machine_count = 3 to a static-slot pool which has only two slots with DedicatedScheduler set. If you add a job requirement of ( ! isUndefined(DedicatedScheduler) ), or some more sophisticated expression to match the dedicated scheduler to which the job was submitted, then the analysis shows a clearer picture:

009.000:  Run analysis summary.  Of 6 machines,
      4 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      2 are available to run your job
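In submit-file terms, that extra clause just gets appended to the job's requirements. A minimal sketch, with the executable, arguments, and machine_count as placeholders:

```
universe      = parallel
executable    = /bin/sleep              # placeholder
arguments     = 60
machine_count = 3
requirements  = ( ! isUndefined(DedicatedScheduler) )
queue
```

The schedd ANDs this clause into the auto-generated requirements, so -better-analyze will then count the non-dedicated slots as rejected rather than available.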

(Feature request for 8.2.10?)

Check section 2.9.2 in the 8.2.9 manual for more details about the DedicatedScheduler attribute. A parallel job will only run on a slot that advertises DedicatedScheduler; if you were expecting the job to run on all six machines, perhaps some of them lost that attribute in the wake of your recent disruption.
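To see which slots currently advertise the attribute, a condor_status query along these lines should work, and the startd side is configured roughly as below (the schedd hostname here is a placeholder; use your own):

```
# Which slots advertise DedicatedScheduler?
condor_status -constraint 'DedicatedScheduler =!= undefined'

# Startd configuration on each dedicated execute node:
DedicatedScheduler = "DedicatedScheduler@your-schedd.example.com"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
```

After changing the configuration, a condor_reconfig on the execute nodes should make the attribute show up in the slot ads.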

As for the current job you're waiting on: once those 41 busy slots free up, your job will be dispatched.


Michael V. Pelletier
IT Program Execution
Principal Engineer
978.858.9681 (5-9681) NOTE NEW NUMBER
339.293.9149 cell
339.645.8614 fax

michael.v.pelletier@xxxxxxxxxxxx