[Condor-users] negotiator error?



Hello,

One of the computers in our cluster does not seem to be accepting Condor jobs. Thanks in advance for any feedback.

In a small cluster we have 3 computers (with 1, 32 and 32 CPUs) running
    $CondorVersion: 7.2.4 Apr 11 2010 $
    $CondorPlatform: X86_64-LINUX_DEBIAN_UNKNOWN $
    Linux 2.6.32-41-server #94-Ubuntu SMP Fri Jul 6 18:15:07 UTC 2012 x86_64 GNU/Linux

and 1 computer (32 CPUs) running
    $CondorVersion: 7.6.7 Apr 28 2012 BuildID: 422155 $
    $CondorPlatform: x86_64_deb_6.0-updated $
    Linux 3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Condor jobs submitted on any of the first three computers do not run on the last one (let's call it the 4th computer).

The output of the "better_analyze" switch looks normal, but Condor "can see" only 65 of the 97 CPUs in total:
24283.000:  Run analysis summary.  Of 65 machines,
      0 are rejected by your job's requirements
     47 reject your job because of their own requirements
     18 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
        No successful match recorded.
        Last failed match: Tue Sep 18 16:11:57 2012
        Reason for last match failure: no match found
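
The slot counts can be cross-checked with a few lines of Python. This is only a sketch: the numbers are copied from the analysis summary and from the CPU counts listed earlier, and the variable names are made up for illustration (they are not part of Condor).

```python
# CPU counts per computer, as listed earlier (the last entry is the 4th computer).
cpus = [1, 32, 32, 32]

# Categories copied from the "Run analysis summary" output.
summary = {
    "rejected by the job's requirements": 0,
    "reject the job because of their own requirements": 47,
    "match but serve users with a better priority": 18,
    "match but reject the job for unknown reasons": 0,
    "match but will not preempt their existing job": 0,
    "available to run the job": 0,
}

total_cpus = sum(cpus)        # all CPUs in the cluster
seen = sum(summary.values())  # machines (slots) the analysis reports
missing = total_cpus - seen   # slots the analysis does not see

print(total_cpus, seen, missing)  # prints: 97 65 32
```

The 32 missing slots equal the CPU count of the 4th computer, which is consistent with none of its slots showing up in the analysis.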

On the 4th computer, the log/condor/NegotiatorLog file also looks normal (at least to me). As an example, here is the last negotiation cycle from that file. What seems odd is this: the "better_analyze" output on the 3rd computer reports the last failed match at 4:11 pm, and the 4th computer logged its last negotiation cycle 3 minutes later (at 4:14 pm), yet still no Condor job is running on the 4th computer.

09/18/12 16:14:38 ---------- Started Negotiation Cycle ----------
09/18/12 16:14:38 Phase 1:  Obtaining ads from collector ...
09/18/12 16:14:38   Getting all public ads ...
09/18/12 16:14:38   Sorting 36 ads ...
09/18/12 16:14:38   Getting startd private ads ...
09/18/12 16:14:38 Got ads: 36 public and 32 private
09/18/12 16:14:38 Public ads include 0 submitter, 32 startd
09/18/12 16:14:38 Phase 2:  Performing accounting ...
09/18/12 16:14:38 Phase 3:  Sorting submitter ads by priority ...
09/18/12 16:14:38 Phase 4.1:  Negotiating with schedds ...
09/18/12 16:14:38  negotiateWithGroup resources used scheddAds length 0 
09/18/12 16:14:38 ---------- Finished Negotiation Cycle ----------
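
To make the ad counts in this cycle easier to read off, here is a minimal Python sketch that pulls them out of the log text. The regular expressions are written against the two lines quoted in the sketch only, and the function name ad_counts is hypothetical; other Condor versions may word these lines differently.

```python
import re

# Two lines copied from the NegotiatorLog excerpt.
log = """\
09/18/12 16:14:38 Got ads: 36 public and 32 private
09/18/12 16:14:38 Public ads include 0 submitter, 32 startd
"""

def ad_counts(text):
    """Return the ad counts found in a negotiation-cycle excerpt."""
    counts = {}
    m = re.search(r"Got ads: (\d+) public and (\d+) private", text)
    if m:
        counts["public"] = int(m.group(1))
        counts["private"] = int(m.group(2))
    m = re.search(r"Public ads include (\d+) submitter, (\d+) startd", text)
    if m:
        counts["submitter"] = int(m.group(1))
        counts["startd"] = int(m.group(2))
    return counts

counts = ad_counts(log)
print(counts)
```

In this cycle the negotiator on the 4th computer received 32 startd ads and no submitter ads, which agrees with the "scheddAds length 0" line. That may mean it sees only 32 slots and no queued jobs from the other machines, but this is an unverified reading of the log.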

Thanks for any suggestions.

Best
Illes
http://hal.elte.hu/fij