Re: [Condor-users] One node doesn't execute jobs

Don’t search the central manager’s MasterLog. Search the MasterLog on the worker itself and see what is getting logged there.
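
Reading the end of the worker's MasterLog can be sketched in Python as below. This is a minimal sketch, not an HTCondor tool. It assumes `condor_config_val` is on the worker's PATH (`condor_config_val LOG` prints the daemon log directory) and that the file there is named `MasterLog`. The helper names `condor_log_dir` and `tail_lines` are hypothetical, made up for this example.

```python
import subprocess
from collections import deque
from pathlib import Path

# Hypothetical helper names, for illustration only.


def condor_log_dir() -> Path:
    """Return the daemon log directory from the local Condor configuration.

    Assumes condor_config_val is on PATH; `condor_config_val LOG` prints
    the value of the LOG configuration macro.
    """
    result = subprocess.run(
        ["condor_config_val", "LOG"],
        capture_output=True,
        text=True,
        check=True,
    )
    return Path(result.stdout.strip())


def tail_lines(path, n=50):
    """Return the last n lines of a text file as a list of strings.

    A deque with maxlen=n streams through the file and keeps only the
    most recent n lines, so a large MasterLog is never held in memory.
    """
    with open(path, "r", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

Run on the worker (not the central manager), `tail_lines(condor_log_dir() / "MasterLog")` would return the last 50 lines of that machine's MasterLog.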

Alex

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Smith, Herb
Sent: Friday, May 11, 2012 3:41 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] One node doesn't execute jobs

The slacker node did not appear at all in the SchedLog, the MasterLog, or the ShadowLog. It does appear in the NegotiatorLog, as follows:

   Negotiating with m219237@A4005223  at <134...>
0 seconds so far
     Request 00037.00000:
       Matched 37.0 m219237@A4005223  <134...> preempting none <134...> slot1@A4111261
       Successfully matched with slot1@A4111261
     Request 00037.00001:
       Matched 37.1 m219237@A4005223  <134...> preempting none <134...> slot2@A4111261
       Successfully matched with slot2@A4111261
     Request 00037.00002:
       Matched 37.2 m219237@A4005223  <134...> preempting none <134...> slot1@A3927960
       Successfully matched with slot1@A3927960
     Request 00037.00003:
       Matched 37.3 m219237@A4005223  <134...> preempting none <134...> slot2@A3927960
       Successfully matched with slot2@A3927960
     Request 00037.00004:
       Rejected 37.4 m219237@A4005223  <134...>: no match found
     Got NO_MORE_JOBS;  done negotiating
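
A "Matched" line only means the negotiator paired a job with a slot; the schedd still has to claim that slot before the job can run. To see at a glance which machines received matches in an excerpt like this one, the "Matched" and "Rejected" lines could be tallied with a short Python sketch. It assumes the line layout shown in this particular excerpt (other Condor versions may format these lines differently), and `summarize_negotiation` is a hypothetical name.

```python
import re
from collections import defaultdict

# Hypothetical helper for illustration. It assumes the line layout shown in
# this NegotiatorLog excerpt; the addresses there are elided ("<134...>"),
# so they are matched loosely as anything between angle brackets.
MATCHED = re.compile(
    r"Matched\s+(?P<job>\d+\.\d+)\s+\S+\s+<[^>]*>\s+preempting\s+\S+\s+"
    r"<[^>]*>\s+(?P<slot>[^@\s]+)@(?P<machine>\S+)"
)
REJECTED = re.compile(r"Rejected\s+(?P<job>\d+\.\d+)\s.*no match found")


def summarize_negotiation(text):
    """Group matched jobs by execute machine and collect rejected job ids.

    Returns (by_machine, rejected), where by_machine maps a machine name to
    a list of (job id, slot name) pairs in log order.
    """
    by_machine = defaultdict(list)
    rejected = []
    for line in text.splitlines():
        matched = MATCHED.search(line)
        if matched:
            by_machine[matched["machine"]].append((matched["job"], matched["slot"]))
            continue
        rej = REJECTED.search(line)
        if rej:
            rejected.append(rej["job"])
    return dict(by_machine), rejected
```

Applied to the full excerpt, this would show A4111261 and A3927960 each receiving two matches (jobs 37.0 and 37.1, and 37.2 and 37.3), with job 37.4 rejected.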

So the node does seem to be getting matched, but it never actually accepts any jobs. Keep in mind that I’m a total newbie at this, OK?

Any other thoughts?

TIA,

Herb

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: Friday, May 11, 2012 2:15 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] One node doesn't execute jobs

On Friday, 11 May, 2012 at 3:04 PM, Smith, Herb wrote:

Both pool machines have the same operating system setup, since all the machines in the company receive the same software load. Is there some way to determine why this machine is not picking up any of the workload?

Start with the SchedLog. A job that is Matched but stays Idle usually means the scheduler is having trouble completing the claim process with the node, so it can't send the job over. If the SchedLog says the claim was acknowledged and a shadow was spawned successfully for the job, go to the ShadowLog and look for information about the shadow that was spawned for that job.
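
The order suggested here (SchedLog first, then ShadowLog) could be sketched as a small Python helper that reports which logs mention a given job id such as 37.4. This is only a sketch under stated assumptions: you pass in the log paths yourself (for example from the directory that `condor_config_val LOG` reports on the submit machine), and it does plain text matching on the cluster.proc id rather than relying on any particular log message wording. `job_pattern` and `trace_job` are hypothetical names.

```python
import re
from collections import deque
from pathlib import Path

# Hypothetical helpers, for illustration only.


def job_pattern(job_id):
    """Compile a regex for a cluster.proc id such as "37.4".

    The lookarounds stop "37.4" from also matching inside "137.4" or "37.45".
    """
    return re.compile(r"(?<![\d.])" + re.escape(job_id) + r"(?!\d)")


def trace_job(job_id, log_paths, keep=5):
    """Check each log in the order given and collect lines mentioning job_id.

    Returns a dict keyed by file name (insertion order = check order). Each
    value is the last `keep` matching lines, [] if the job is never mentioned,
    or None if the file does not exist.
    """
    pattern = job_pattern(job_id)
    report = {}
    for path in map(Path, log_paths):
        if not path.exists():
            report[path.name] = None
            continue
        hits = deque(maxlen=keep)
        with path.open(errors="replace") as fh:
            for line in fh:
                if pattern.search(line):
                    hits.append(line.rstrip("\n"))
        report[path.name] = list(hits)
    return report
```

Reading the result follows the same logic as above: first look at what the SchedLog lines say about the claim, and only move on to the ShadowLog entries if a shadow was actually spawned.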

Regards,

- Ian

---

Ian Chesal

Cycle Computing, LLC

Leader in Open Compute Solutions for Clouds, Servers, and Desktops

Enterprise Condor Support and Management Tools