
Re: [Condor-users] condor_q -better-analyze has a strange condition....



On 07/12/2010 06:59 PM, Rob wrote:
> 
> Hi,
> 
> I have a job, which does not start on the condor pool.
> When I do a 'condor_q -better-analyze', I see conditions that
> I can't trace where they come from:
> 
> ====================================================
> 109.000:  Run analysis summary.  Of 3 machines,
>       1 are rejected by your job's requirements
>       2 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 match but are currently offline
>       0 are available to run your job
>     No successful match recorded.
>     Last failed match: Tue Jul 13 10:50:19 2010
>     Reason for last match failure: no match found
> 
> The Requirements expression for your job is:
> 
> ( ( target.Arch == "INTEL" ) && ( target.OpSys == "WINNT51" ) &&
> ( target.Name != "skku-pc" ) ) && ( target.Disk >= DiskUsage ) &&
> ( ( ( target.Memory * 1024 ) >= ImageSize ) &&
> ( ( RequestMemory * 1024 ) >= ImageSize ) ) && ( target.HasFileTransfer )
> 
>     Condition                         Machines Matched    Suggestion
>     ---------                         ----------------    ----------
> 1   ( ( ( 1024 * target.Memory ) >= 27500 ) && ( ( 1024 * 
> ceiling(ifThenElse(JobVMMemory isnt 
> undefined,JobVMMemory,2.685546875000000E+01)) ) >= 27500 ) )
>                                       0                   REMOVE
> 2   ( target.Name != "skku-pc" )      2                    
> 3   ( target.Arch == "INTEL" )        3                    
> 4   ( target.OpSys == "WINNT51" )     3                    
> 5   ( target.Disk >= 27500 )          3                    
> 6   ( target.HasFileTransfer )        3                    
> 
> ====================================================
> 
> Conditions no 2 to 6 I understand; they are in my condor submit file.
> However, where does Condition no. 1 come from?
> I searched:
>    * condor_config and rcondor_config.local on the master, but to no avail.
>    * the output of 'condor_status -long', but also nothing there.
> 
> Is this condition some hard-coded rule in the software?
> 
> I'd like to understand this issue.
> 
> Thank you!
> 
> Rob.
> 

Condition 1 comes from the memory clause at the end of your job's Requirements, which condor_submit adds for you. There are two components to that expression:
 o (target.Memory * 1024) >= ImageSize
  . says, "Don't run on a slot that has less memory than my ImageSize"
 o (RequestMemory * 1024) >= ImageSize
  . useful with partitionable slots; says, "Don't request creation of a dynamic slot that is smaller than my ImageSize"

Both are useful because slots will often reject jobs whose ImageSize is larger than their Memory.
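
To make the arithmetic concrete, here is a minimal Python sketch of the two clauses, using the numbers from the analysis output above (ImageSize = 27500, in KB; Memory and RequestMemory in MB). The function names are made up for illustration. The default RequestMemory formula is inferred from the constant in Condition 1, since 27500 / 1024 = 26.85546875; treat it as an assumption, not as Condor's actual code.

```python
import math


def default_request_memory_mb(image_size_kb, job_vm_memory_mb=None):
    """Assumed default: ceiling(JobVMMemory if defined, else ImageSize / 1024)."""
    if job_vm_memory_mb is not None:
        return math.ceil(job_vm_memory_mb)
    return math.ceil(image_size_kb / 1024)


def memory_clauses(slot_memory_mb, image_size_kb, request_memory_mb):
    """Evaluate both halves of the memory requirement for one slot.

    Returns (slot_is_big_enough, request_is_big_enough).
    """
    slot_ok = slot_memory_mb * 1024 >= image_size_kb
    request_ok = request_memory_mb * 1024 >= image_size_kb
    return slot_ok, request_ok


if __name__ == "__main__":
    image_size_kb = 27500  # value shown in the analysis output
    request_mb = default_request_memory_mb(image_size_kb)
    # 512 and 16 are hypothetical slot sizes, not taken from the pool above.
    for slot_mb in (512, 16):
        print(slot_mb, memory_clauses(slot_mb, image_size_kb, request_mb))
```

With these inputs the RequestMemory half is satisfied (27 * 1024 = 27648 >= 27500), so under this reading it is the target.Memory half that decides whether a slot matches.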

ImageSize is the largest virtual memory size Condor has ever seen for the job.

A job often stops matching like this when its ImageSize blows up during a run and it no longer fits in any slot. You can use condor_qedit to set the ImageSize back down.
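
For reference, condor_qedit takes a job id, an attribute name, and a new value. A small Python sketch that builds (and optionally runs) that command is below. The helper names and the new value of 20000 are hypothetical; pick a value that reflects what your job really needs.

```python
import subprocess


def build_qedit_argv(job_id, attribute, value):
    """Build the argument list for: condor_qedit <job_id> <attribute> <value>."""
    return ["condor_qedit", str(job_id), attribute, str(value)]


def lower_image_size(job_id, new_size_kb):
    """Run condor_qedit to set ImageSize; requires condor_qedit on PATH."""
    argv = build_qedit_argv(job_id, "ImageSize", int(new_size_kb))
    return subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only prints the command; call lower_image_size() to actually apply it.
    print(" ".join(build_qedit_argv("109.0", "ImageSize", 20000)))
```

Afterwards, 'condor_q -better-analyze' should show Condition 1 with the new ImageSize value.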

Best,


matt