
Re: [Condor-users] Failed to send REQUEST_CLAIM to startd



陈婷 wrote:
> Hi everyone,
>   There are five machines in the pool. The result of executing condor_status is as follows:
> ============================================================
> Name                   OpSys      Arch   State       Activity  LoadAv Mem   ActvtyTime
> 10.10.4.214          LINUX      INTEL  Unclaimed Idle       0.000    512    0+00:00:04
> slot1@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle       0.000    442    0+16:42:40
> slot2@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle       0.000    442    0+03:05:06
> slot3@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle       0.000    442    0+17:02:43
> slot4@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle       0.000    442    0+02:50:08
>                      Total Owner Claimed Unclaimed Matched Preempting Backfill
>          INTEL/LINUX     5     0       0         4       1          0        0
>                Total     5     0       0         4       1          0        0
> =============================================================
> 
> Machine "10.10.4.214" is a virtual machine with Condor installed. I submit a job from m2m.jsi.cn using test.cmd, whose content is:
> ====================================
> Universe = vanilla
> CMD = test.bat
> output = condor.output
> error = condor.error
> log = condor.log
> Requirements = Name == "10.10.4.214"
> WhenToTransferOutput = ON_EXIT_OR_EVICT
> queue
> ====================================
> 
> The job cannot be dispatched to "10.10.4.214".
> 
> Here is the result when I execute condor_q -analyze.
> ======================================================
> -- Submitter: m2m.jsi.cn : <10.10.3.11:35384> : m2m.jsi.cn
>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
> ---
> 066.000:  Run analysis summary.  Of 5 machines,
>       4 are rejected by your job's requirements
>       1 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the pool
>       0 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
>         Last successful match: Fri Jun 19 16:26:15 2009
> ======================================================
> 
> The SchedLog is:
> ===================================================================================
> 6/19 16:26:15 (pid:8919) Sent ad to central manager for agrid@xxxxxx
> 6/19 16:26:15 (pid:8919) Sent ad to 1 collectors for agrid@xxxxxx
> 6/19 16:26:15 (pid:8919) Called reschedule_negotiator()
> 6/19 16:26:15 (pid:8919) Activity on stashed negotiator socket
> 6/19 16:26:15 (pid:8919) Negotiating for owner: agrid@xxxxxx
> 6/19 16:26:15 (pid:8919) Checking consistency running and runnable jobs
> 6/19 16:26:15 (pid:8919) Tables are consistent
> 6/19 16:26:15 (pid:8919) Rebuilt prioritized runnable job list in 0.000s.
> 6/19 16:26:15 (pid:8919) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
> 6/19 16:26:15 (pid:8919) Response problem from startd when requesting claim 10.10.4.214 <10.10.4.214:50611> for agrid@xxxxxx 66.0.
> 6/19 16:26:15 (pid:8919) Failed to send REQUEST_CLAIM to startd 10.10.4.214 <10.10.4.214:50611> for agrid@xxxxxx:
> 6/19 16:26:15 (pid:8919) Match record (10.10.4.214 <10.10.4.214:50611> for agrid@xxxxxx, 66.0) deleted
> 6/19 16:26:20 (pid:8919) Sent ad to central manager for agrid@xxxxxx
> 6/19 16:26:20 (pid:8919) Sent ad to 1 collectors for agrid@xxxxxx
> =====================================================================================
> The NegotiatorLog is:
> =================================================================================
> 6/19 16:26:15 ---------- Started Negotiation Cycle ----------
> 6/19 16:26:15 Phase 1:  Obtaining ads from collector ...
> 6/19 16:26:15   Getting all public ads ...
> 6/19 16:26:15   Sorting 11 ads ...
> 6/19 16:26:15   Getting startd private ads ...
> 6/19 16:26:15 Got ads: 11 public and 5 private
> 6/19 16:26:15 Public ads include 1 submitter, 5 startd
> 6/19 16:26:15 Phase 2:  Performing accounting ...
> 6/19 16:26:15 Phase 3:  Sorting submitter ads by priority ...
> 6/19 16:26:15 Phase 4.1:  Negotiating with schedds ...
> 6/19 16:26:15   Negotiating with agrid@xxxxxx at <10.10.3.11:35384>
> 6/19 16:26:15 0 seconds so far
> 6/19 16:26:15     Request 00066.00000:
> 6/19 16:26:15       Matched 66.0 agrid@xxxxxx <10.10.3.11:35384> preempting none <10.10.4.214:50611> 10.10.4.214
> 6/19 16:26:15       Successfully matched with 10.10.4.214
> 6/19 16:26:15     Got NO_MORE_JOBS;  done negotiating
> 6/19 16:26:15 ---------- Finished Negotiation Cycle ----------
> ================================================================================
> 
> That is all the information I have. Can anybody help me, please? Thanks very much.
> 
> Jassy 

You need to take a look at the StartLog on 10.10.4.214 at the time
of the issue.
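
To narrow the StartLog down to the moment of the failed claim (6/19 16:26:15 in the SchedLog above), something like the following may help. It is only a rough sketch: it assumes the StartLog uses the same "M/D HH:MM:SS" line prefix as the SchedLog excerpt, and the helper name lines_near, the 60-second window, and the year default are illustrative choices, not anything Condor provides.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, when, window_seconds=60, year=2009):
    """Return the log lines stamped within window_seconds of `when`.

    `when` uses the 'M/D HH:MM:SS' form seen in the SchedLog excerpt.
    The log prefix carries no year, so one is supplied (assumption).
    """
    fmt = "%Y/%m/%d %H:%M:%S"
    target = datetime.strptime(f"{year}/{when}", fmt)
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in log_lines:
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # blank or too short to carry a timestamp
        try:
            stamp = datetime.strptime(f"{year}/{parts[0]} {parts[1]}", fmt)
        except ValueError:
            continue  # continuation line or an unexpected prefix
        if abs(stamp - target) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

On the execute machine, the log path can usually be obtained with `condor_config_val STARTD_LOG`; open that file, pass its lines to lines_near together with "6/19 16:26:15", and look for any message about the claim request being refused or failing authorization.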

Best,



matt