
Re: [HTCondor-users] Not running Parallel-universe jobs?



Thanks!

How would I go about finding machines that are in this state, though?
For example (we only have one dedicated scheduler):

"""
$ condor_status -constraint 'DedicatedScheduler =!= "DedicatedScheduler@<hostname>"'
$
"""

The command produces no output; I assume that means no machines are
found.  (If I change that to "==", or I change the string to something
that's not our scheduler, then it prints out all machines in the
cluster.)
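
One way to cross-check this, sketched in Python: dump each slot's Name
and DedicatedScheduler and flag the slots where the attribute is missing
or points somewhere else. This is only a rough sketch under some
assumptions: that "condor_status -af Name DedicatedScheduler" prints one
slot per line, and that an unset attribute is printed as the word
"undefined". The function names and the example hostname are made up for
illustration.

```python
import subprocess


def slots_missing_scheduler(af_output, expected):
    """Return names of slots whose DedicatedScheduler is unset or not `expected`.

    `af_output` is text with one slot per line: "<Name> <DedicatedScheduler>".
    """
    missing = []
    for line in af_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        # Assumption: an unset attribute shows up as "undefined" (or is absent).
        sched = fields[1] if len(fields) > 1 else "undefined"
        if sched != expected:
            missing.append(name)
    return missing


def query_pool():
    """Ask the collector for Name and DedicatedScheduler of every slot.

    Assumes condor_status is on PATH and supports -af (autoformat).
    """
    return subprocess.check_output(
        ["condor_status", "-af", "Name", "DedicatedScheduler"],
        universal_newlines=True,
    )
```

Calling slots_missing_scheduler(query_pool(),
"DedicatedScheduler@sched.example.com") (hypothetical hostname) should
then list any slot a parallel job cannot use. An empty list would agree
with the empty condor_status output above.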

Adam


On Tue, 2015-08-25 at 11:20 -0400, Michael V Pelletier wrote:
> The -analyze and -better-analyze options still count machines which
> don't have the DedicatedScheduler attribute set as "available" to run
> a parallel universe job: 
> 
> 008.000:  Run analysis summary.  Of 6 machines, 
>       0 are rejected by your job's requirements 
>       0 reject your job because of their own requirements 
>       0 match and are already running your jobs 
>       0 match but are serving other users 
>       6 are available to run your job 
> 
> This is what shows up when I submit a parallel job with
> machine_count = 3 to a static-slot pool which only has two slots with
> DedicatedScheduler set. If you set a job requirement of
> ( !isUndefined(DedicatedScheduler) ), or some more sophisticated
> expression to match the dedicated scheduler to which the job was
> submitted, then the analysis will show you a clearer picture: 
> 
> 009.000:  Run analysis summary.  Of 6 machines, 
>       4 are rejected by your job's requirements 
>       0 reject your job because of their own requirements 
>       0 match and are already running your jobs 
>       0 match but are serving other users 
>       2 are available to run your job 
> 
> (Feature request for 8.2.10?) 
> 
> Check section 2.9.2 in the 8.2.9 manual for more details about the
> DedicatedScheduler attribute. A parallel job will only run on a slot
> with the DedicatedScheduler attribute - maybe some of the other
> machines lost that in the wake of your recent disruption if you're
> expecting the job to run on the six available machines. 
> 
> As to the current job you're waiting on, once those 41 slots which are
> running open up, then your job will be dispatched. 
> 
> Michael V. Pelletier
> IT Program Execution
> Principal Engineer
> 978.858.9681 (5-9681) NOTE NEW NUMBER
> 339.293.9149 cell
> 339.645.8614 fax
> michael.v.pelletier@xxxxxxxxxxxx