
Re: [Condor-users] -better-analyze doesn't tell me details (7.0.4)



On Thu, Nov 06, 2008 at 08:54:44AM -0600, Steven Timm wrote:
> At one time the Condor staff told me that you will
> only get the list of requirements that your job has
> if there is a non-zero number of machines that are rejected
> by your job's requirements.
> 
> The ways to get at the "reject the job for unknown reasons"
> are to do condor_q -ana -l 227322

So this means I have to ask for the reasons for the whole job cluster?

> That will tell you the last machine that rejected your match, and why.
> The NegotiatorLog can sometimes tell you something too, if you are running
> at a high enough debug level.

Actually, I set a Memory requirement that is only fulfilled by a couple of
machines (which are in single-slot configuration, as opposed to the two-core
boxes which offer two slots).
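A minimal sketch of why a Memory clause can split a pool like this, assuming the default static-slot setup in which a machine's physical memory is divided evenly among its slots. The machine sizes and the helper names `slot_memory_mb` and `slot_matches` are hypothetical, chosen only for illustration:

```python
# Toy model (not Condor code): with static slots, a machine's physical
# memory is assumed to be split evenly across its slots.

def slot_memory_mb(total_mb: int, n_slots: int) -> int:
    """Memory (MB) advertised by each slot, assuming an even split."""
    return total_mb // n_slots


def slot_matches(total_mb: int, n_slots: int, min_mb: int = 1500) -> bool:
    """Evaluate the job clause (target.Memory > min_mb) for one slot."""
    return slot_memory_mb(total_mb, n_slots) > min_mb


# Hypothetical machine sizes, for illustration only.
machines = {
    "two-core box, 2 slots": (2048, 2),
    "single-slot box": (2048, 1),
}

for name, (total_mb, n_slots) in machines.items():
    mem = slot_memory_mb(total_mb, n_slots)
    print(f"{name}: {mem} MB per slot, matches={slot_matches(total_mb, n_slots)}")
```

With these made-up numbers the two-slot box advertises 1024 MB per slot and fails `target.Memory > 1500`, while a single-slot box of the same size passes.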

I get:

227324.028:  Run analysis summary.  Of 1200 machines,
   1174 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
     26 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job

The Requirements expression for your job is:

( ( target.Memory > 1500 ) ) && ( target.Arch == "X86_64" ) &&
( target.OpSys == "LINUX" ) && ( target.Disk >= DiskUsage ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( target.Memory > 1500 ) )      26
2   ( target.Arch == "X86_64" )       1200
3   ( target.OpSys == "LINUX" )       1200
4   ( target.Disk >= 2500 )           1200
5   ( TARGET.FileSystemDomain == "$domain" ) 1200
slot2@node599 Failed request constraint

where node599 is one of the 2-slot ones :(
In what order are matching machines tried? (There are a few nodes numbered
beyond 600, in particular the single-slot ones, which are Unclaimed and Idle.)
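The per-condition table and the first line of the summary can be reproduced with a toy model, assuming each clause of the Requirements expression is evaluated on its own against every machine ad. This is a sketch, not Condor's code; the pool contents, the "example.org" stand-in for the elided "$domain", and all function names are hypothetical:

```python
# Toy re-creation (not Condor's implementation) of the per-condition
# "Machines Matched" column: each clause is evaluated independently
# against every machine ad in the pool.
from typing import Callable, Dict, List, Tuple

Ad = Dict[str, object]

DISK_USAGE = 2500               # value shown for DiskUsage in condition 4
MY_FS_DOMAIN = "example.org"    # hypothetical stand-in for "$domain"


def make_pool() -> List[Ad]:
    """Hypothetical pool: 26 slots with >1500 MB, 1174 slots with less."""
    base = {"Arch": "X86_64", "OpSys": "LINUX", "Disk": 10_000,
            "FileSystemDomain": MY_FS_DOMAIN}
    big = [dict(base, Memory=2048) for _ in range(26)]
    small = [dict(base, Memory=1024) for _ in range(1174)]
    return big + small


CONDITIONS: List[Tuple[str, Callable[[Ad], bool]]] = [
    ("( ( target.Memory > 1500 ) )", lambda t: t["Memory"] > 1500),
    ('( target.Arch == "X86_64" )', lambda t: t["Arch"] == "X86_64"),
    ('( target.OpSys == "LINUX" )', lambda t: t["OpSys"] == "LINUX"),
    ("( target.Disk >= 2500 )", lambda t: t["Disk"] >= DISK_USAGE),
    ("( TARGET.FileSystemDomain == MY.FileSystemDomain )",
     lambda t: t["FileSystemDomain"] == MY_FS_DOMAIN),
]


def per_condition_counts(pool: List[Ad]) -> List[Tuple[str, int]]:
    """Number of machines satisfying each clause taken on its own."""
    return [(text, sum(1 for ad in pool if pred(ad))) for text, pred in CONDITIONS]


def rejected_by_requirements(pool: List[Ad]) -> int:
    """Machines failing at least one clause of the full conjunction."""
    return sum(1 for ad in pool if not all(pred(ad) for _, pred in CONDITIONS))


pool = make_pool()
for i, (text, n) in enumerate(per_condition_counts(pool), start=1):
    print(f"{i}   {text:<50} {n}")
print("rejected by job requirements:", rejected_by_requirements(pool))
```

With this made-up pool only condition 1 narrows the field (26 of 1200), and 1174 slots fail the full conjunction, as in the first line of the summary. The six summary buckets partition the pool (1174 + 0 + 0 + 26 + 0 + 0 = 1200), so the 26 slots that pass the requirements are exactly the ones counted under "unknown reasons"; the per-condition table by itself cannot say why those 26 declined.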

> Two "unknown reasons" I've hit before are (a) the negotiation cycle
> just hasn't happened yet since this job was submitted and (b)
> the user in question has exceeded his group quota.

(a) I have waited for hours, and other jobs got scheduled in the meantime.
(b) Group quotas aren't in use.

Any other ideas?

Steffen

-- 
Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
* e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html