
RE: [Condor-users] Flocking - jobs matched but not started



On Fri, August 19, 2005 17:59, Michael Rusch said:
> To see what the job is doing on the flocked-to pool, you have to run
> condor_q from there, using the -name option to point to the original
> submitter.  It winds up looking something like this:
>
Thanks. The condor_q output for one job is:

==============================================================
condor_q -name ws-60-56.dhcp.plymouth.ac.uk -analyze 48.0 -l

-- Schedd: ws-60-56.dhcp.plymouth.ac.uk : <141.163.60.56:44957>
ws-60-7         Failed offer constraint
---
048.000:  Run analysis summary.  Of 1 machines,
      0 are rejected by your job's requirements
      1 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
        Last successful match: Fri Aug 19 21:33:35 2005
        Last failed match: Fri Aug 19 21:34:35 2005
        Reason for last match failure: no match found

WARNING:  Be advised:   Request 48.0 did not match any resource's constraints
==============================================================

This is interesting. To me it indicates that the job is not running
because the client (ws-60-7) is rejecting it, so I'll need to look more
closely at the client config. However, having just run the command
again, I now get:

  ws-60-7         Failed rank condition: MY.Rank > MY.CurrentRank
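
Since the result changes from one run to the next, it may help to pull the
summary counts out of the output each time and compare them. A minimal
sketch in Python, assuming the text layout of the analysis summary shown
above; the function name parse_analysis is made up for illustration and is
not part of Condor:

```python
import re

# Matches summary lines such as
# "      1 reject your job because of their own requirements".
COUNT_LINE = re.compile(r"^\s*(\d+)\s+(are|reject|match)\b(.*)$")


def parse_analysis(text):
    """Return (total_machines, {description: count}) from analysis text.

    Assumes the layout of the summary pasted earlier in this message;
    other Condor versions may print it differently.
    """
    total = None
    counts = {}
    for line in text.splitlines():
        # Header line: "048.000:  Run analysis summary.  Of 1 machines,"
        header = re.search(r"Of\s+(\d+)\s+machines", line)
        if header:
            total = int(header.group(1))
            continue
        row = COUNT_LINE.match(line)
        if row:
            description = (row.group(2) + row.group(3)).strip()
            counts[description] = int(row.group(1))
    return total, counts
```

Feeding it the saved output of two runs and comparing the returned
dictionaries shows which category the machine moved between.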


>
> Also, how long have you waited for the job to run?
>
Not long. From starting Condor up, I would say we wait between 5 and 10
minutes for the job to be 'noticed' by the remote server. However, in our
setup we reduced some of the timers and removed some of the wait
conditions, because our Linux clients are purely processing clients. All
keyboard, mouse, USB and other input devices have been disabled. They can
talk to the local network, but other than that they do nothing but run
Condor jobs. As such, we don't want jobs waiting in a queue.



John.

-- 
---------------------------------------------------------------
John Horne, University of Plymouth, UK  Tel: +44 (0)1752 233914
E-mail: John.Horne@xxxxxxxxxxxxxx       Fax: +44 (0)1752 233839