
Re: [Condor-users] Negotiation Problem



At 06:39 AM 4/19/2006, Luis Rodríguez Ruiz wrote:
Hello,

I've installed Condor on a cluster with two nodes (node1 and node2). The
installation (using the condor_install script) completed successfully
on both nodes (node1 is the central manager). I have added
START=TRUE to the condor_config.local of node2 in order to allow
jobs to execute on node2. Each node has 4 virtual CPUs. The
condor_status command shows:


                     Machines Owner Claimed Unclaimed Matched Preempting

         INTEL/LINUX        8     4       0         4       0          0

               Total        8     4       0         4       0          0


When I submit a job from node1, the job is not sent to node2 (node1 is
not allowed to run jobs). In fact, the NegotiatorLog file shows
something like:


---------- Started Negotiation Cycle ----------
9/27 16:36:46 Phase 1:  Obtaining ads from collector ...
9/27 16:36:46   Getting all public ads ...
9/27 16:36:46   Sorting 13 ads ...
9/27 16:36:46   Getting startd private ads ...
9/27 16:36:46 Got ads: 13 public and 8 private
9/27 16:36:46 Public ads include 1 submitter, 8 startd
9/27 16:36:46 Phase 2:  Performing accounting ...
9/27 16:36:46 Phase 3:  Sorting submitter ads by priority ...
9/27 16:36:46 Phase 4.1:  Negotiating with schedds ...
9/27 16:36:46   Negotiating with lrodrig@xxxxxxxxxxxxxx at <192.168.1.1:33444>
9/27 16:36:46     Request 00002.00000:
9/27 16:36:46       Rejected 2.0 lrodrig@xxxxxxxxxxxxxx <192.168.1.1:33444>: no match found
9/27 16:36:46     Got NO_MORE_JOBS;  done negotiating
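
When a NegotiatorLog grows long, rejections like the one above can be pulled out with a short script. The following is a minimal Python sketch, assuming each log entry sits on a single line and follows the layout seen in this excerpt; the regular expression and the function name find_rejections are illustrative and not part of Condor, and other Condor versions may format these lines differently.

```python
import re

# Pattern modeled on the "Rejected ..." entry in the excerpt above.
# Assumption: one log entry per line; the layout may differ between versions.
REJECT_RE = re.compile(
    r"Rejected\s+(?P<job>\d+\.\d+)\s+(?P<submitter>\S+)\s+"
    r"<(?P<addr>[^>]+)>:\s*(?P<reason>.+)$"
)


def find_rejections(log_text):
    """Return a (job_id, submitter, reason) tuple for each rejection line."""
    hits = []
    for line in log_text.splitlines():
        m = REJECT_RE.search(line)
        if m:
            hits.append(
                (m.group("job"), m.group("submitter"), m.group("reason").strip())
            )
    return hits


# Sample text copied from the excerpt, with each entry on one line.
SAMPLE = """\
9/27 16:36:46     Request 00002.00000:
9/27 16:36:46       Rejected 2.0 lrodrig@xxxxxxxxxxxxxx <192.168.1.1:33444>: no match found
9/27 16:36:46     Got NO_MORE_JOBS;  done negotiating
"""

if __name__ == "__main__":
    for job, submitter, reason in find_rejections(SAMPLE):
        print(job, submitter, reason)
```

The script only locates rejected requests; it does not explain why no machine matched.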


I can't find any information related to node2 in the negotiation
output, so no jobs are executed on this node. The idea is to build a
cluster where all jobs are submitted from the central manager
(I mean just one job queue shared by all the nodes) and are executed on
the rest of the nodes.

Could anyone help me?


On node1, what does "condor_q -analyze 2.0" have to say?

Also, the FAQ section of the manual has additional pointers and ideas; in particular, take a peek at the FAQ question "Why aren't any or all of my jobs running?", available online at
   http://www.cs.wisc.edu/condor/manual/v6.7/7_3Running_Condor.html#SECTION00835000000000000000
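
As a rough illustration of what the "condor_q -analyze" check mentioned above looks for: a match needs both sides to agree, meaning the job's requirements must accept the machine and the machine's START policy must accept the job. The Python sketch below is a toy model of that two-sided test, not Condor's ClassAd evaluation. Every attribute name and value in it (slot names, memory_mb, the 512 MB demand) is hypothetical and is not a diagnosis of the problem reported in this thread.

```python
# Toy two-sided matchmaking check. This is NOT Condor's ClassAd engine;
# every attribute name and value below is hypothetical.

def analyze(job, machines):
    """Count machines per outcome, loosely like an analysis summary."""
    summary = {"match": 0, "rejected_by_job": 0, "rejected_by_machine": 0}
    for machine in machines:
        if not job["requirements"](machine):
            summary["rejected_by_job"] += 1      # job's requirements are false
        elif not machine["start"](job):
            summary["rejected_by_machine"] += 1  # machine's START policy is false
        else:
            summary["match"] += 1
    return summary


# Four slots that never start jobs and four that always do (hypothetical).
machines = (
    [{"name": f"vm{i}@node1", "arch": "INTEL", "memory_mb": 256,
      "start": lambda job: False} for i in range(1, 5)]
    + [{"name": f"vm{i}@node2", "arch": "INTEL", "memory_mb": 256,
        "start": lambda job: True} for i in range(1, 5)]
)

# A job whose requirements no slot satisfies (hypothetical memory demand).
strict_job = {
    "id": "2.0",
    "requirements": lambda m: m["arch"] == "INTEL" and m["memory_mb"] >= 512,
}

# The same job with the memory demand dropped.
relaxed_job = {"id": "2.0", "requirements": lambda m: m["arch"] == "INTEL"}

if __name__ == "__main__":
    print("strict: ", analyze(strict_job, machines))
    print("relaxed:", analyze(relaxed_job, machines))
```

In this toy setup the strict job is rejected by its own requirements on all eight slots, while the relaxed job matches the four node2 slots and is turned away only by the node1 slots' START policy. The real analysis output is the authoritative source for which side is failing.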


-Todd


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Todd Tannenbaum                       University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
http://www.cs.wisc.edu/~tannenba      Madison, WI 53706-1685
Phone: (608) 263-7132  FAX: (608) 262-9777