
RE: [Condor-users] Flocking - jobs matched but not started



On Fri, 2005-08-19 at 10:57 -0500, Michael Rusch wrote:
> For what it's worth, I have had what sounds like a similar problem for quite
> awhile, though it has been much harder for me to debug, since I don't have
> access to logs on the flocked-to pool.  Out of curiosity, when your jobs
> "match" but don't run, are they still listed as idle in the queue?
>
Yes, as a snippet of condor_q shows:

-- Submitter: ws-60-56.dhcp.plymouth.ac.uk : <141.163.60.56:44957> :
ws-60-56.dhcp.plymouth.ac.uk
 ID      OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
  48.0   john     8/18 18:05   0+00:17:26 I  0   1.6  loop.remote 200

>
> When you condor_q -analyze, are they shown as having machines that are available to
> run the job?  I'm trying to figure out if this is the same problem, in which
> case I may have 2 cents to put in...
> 
No, I don't see that. 'condor_q -analyze' shows:

==============================================================
[root@ws-60-56 log]# condor_q -analyze 48.0


-- Submitter: ws-60-56.dhcp.plymouth.ac.uk : <141.163.60.56:44957> :
ws-60-56.dhcp.plymouth.ac.uk
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
048.000:  Run analysis summary.  Of 0 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      0 match, match, but reject the job for unknown reasons
      0 match, but will not currently preempt their existing job
      0 are available to run your job
        Last successful match: Fri Aug 19 17:13:00 2005
        Last failed match: Fri Aug 19 17:15:00 2005
        Reason for last match failure: no match found

WARNING:  Be advised:
   No resources matched request's constraints
   Check the Requirements expression below:

Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && ((CkptArch ==
Arch) || (CkptArch =?= UNDEFINED)) && ((CkptOpSys == OpSys) ||
(CkptOpSys =?= UNDEFINED)) && (Disk >= DiskUsage) && ((Memory * 1024) >=
ImageSize)


WARNING:  Be advised:   Request 48.0 did not match any resource's
constraints
==============================================================

However, the job will simply be picked up by the remote server and
matched with a client in its pool. So I think 'condor_q -analyze' is a
bit misleading here, as it seems to show a job that is having a problem
running. Having said that, the condor_q command is only looking at the
job and seeing whether it can run locally (which it can't). In my case I
have stopped startd, so the job won't run locally and must be flocked.
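
For anyone scripting around this, here is a rough sketch (not anything
Condor provides) that reads the machine count out of the "Run analysis
summary" line. It assumes the summary line always has the form shown in
the output above; the function names are made up for illustration. If
the count is 0, the analysis had no machines to compare the job against,
so its "no match" verdict says nothing about a flocked-to pool.

```python
import re


def machines_considered(analyze_output):
    """Return the machine count from the 'Run analysis summary' line.

    Returns None if no such line is found. The line format is assumed
    to match the condor_q -analyze output quoted in this message.
    """
    m = re.search(r"Run analysis summary\.\s+Of (\d+) machines", analyze_output)
    return int(m.group(1)) if m else None


def analyzed_no_machines(analyze_output):
    """True when the analysis compared the job against zero machines."""
    return machines_considered(analyze_output) == 0


# Sample text copied from the output above (shortened).
sample = """048.000:  Run analysis summary.  Of 0 machines,
      0 are rejected by your job's requirements
      0 are available to run your job
"""

if analyzed_no_machines(sample):
    print("Analysis saw no machines; the result only reflects the local pool.")
```

This only interprets text that condor_q has already printed; it does not
query the remote pool in any way.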



John.

-- 
---------------------------------------------------------------
John Horne, University of Plymouth, UK  Tel: +44 (0)1752 233914
E-mail: John.Horne@xxxxxxxxxxxxxx       Fax: +44 (0)1752 233839