
Re: [Condor-users] One node doesn't execute jobs



In the config file for that node, you can set
    NETWORK_INTERFACE = ip_addr
where ip_addr is the IP address of the network interface that you want Condor to use.
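
A minimal sketch, in Python, of one way to choose the ip_addr value when a machine reports more than one adapter. It only filters out addresses that cannot work (all zeros, loopback, link-local); the function name and the sample addresses are made up for illustration and are not taken from the machine in this thread.

```python
import ipaddress


def usable_addresses(candidates):
    """Return the candidate addresses that are not unspecified (all zeros),
    loopback, or link-local, preserving their original order."""
    usable = []
    for text in candidates:
        addr = ipaddress.ip_address(text)
        if addr.is_unspecified or addr.is_loopback or addr.is_link_local:
            continue
        usable.append(str(addr))
    return usable


# Hypothetical addresses, typed in by hand from ipconfig output.
print(usable_addresses(["0.0.0.0", "192.0.2.17"]))
```

If more than one address survives the filter, you still have to pick the one on the network that the central manager can reach.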

On 5/11/2012 3:19 PM, Smith, Herb wrote:

It appears to.  When I run ipconfig, it shows the regular adapter, but there is another one that has no DNS suffix and an IP address of all zeros.

 

Is there a way to deal with this short of removing the thing?

 

Herb

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]
Sent: Friday, May 11, 2012 3:09 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] One node doesn't execute jobs

 

You have a conflict in your TCP/IP configuration. Does this computer have multiple network cards?

 

Alex

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Smith, Herb
Sent: Friday, May 11, 2012 4:05 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] One node doesn't execute jobs

 

Sorry!  I get a huge list of errors as follows:

 

Failed to start non-blocking update to <134....>.

attempt to connect to <134....> failed: connect errno = 10061 connection refused.

ERROR: SECMAN:2004:Failed to create security session to <134....> with TCP.

|SECMAN:2003:TCP connection to <134....> failed.

 

Failed to start non-blocking update to <134....>.

attempt to connect to <134....> failed: connect errno = 10061 connection refused.

ERROR: SECMAN:2004:Failed to create security session to <134....> with TCP.

|SECMAN:2003:TCP connection to <134....> failed.

 

Thoughts?

 

Herb

 

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]
Sent: Friday, May 11, 2012 2:44 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] One node doesn't execute jobs

 

Don’t search in the central manager’s MasterLog. Search in the worker’s MasterLog and see what is getting logged there.

 

Alex

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Smith, Herb
Sent: Friday, May 11, 2012 3:41 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] One node doesn't execute jobs

 

The slacker node did not appear at all in the SchedLog, the MasterLog, or the ShadowLog.  It does appear in the NegotiatorLog as follows:

 

   Negotiating with m219237@A4005223  at <134...>

0 seconds so far

     Request 00037.00000:

       Matched 37.0 m219237@A4005223  <134...> preempting none <134...> slot1@A4111261

       Successfully matched with slot1@A4111261

     Request 00037.00001:

       Matched 37.1 m219237@A4005223  <134...> preempting none <134...> slot2@A4111261

       Successfully matched with slot2@A4111261

     Request 00037.00002:

       Matched 37.2 m219237@A4005223  <134...> preempting none <134...> slot1@A3927960

       Successfully matched with slot1@A3927960

     Request 00037.00003:

       Matched 37.3 m219237@A4005223  <134...> preempting none <134...> slot2@A3927960

       Successfully matched with slot2@A3927960

     Request 00037.00004:

       Rejected 37.4 m219237@A4005223  <134...>: no match found

     Got NO_MORE_JOBS;  done negotiating

 

So it seems to be matching up, but it doesn’t actually accept any jobs, or so it would seem.  Keep in mind that I’m a total newbie at this, ok.

 

Any other thoughts?

 

TIA,

 

Herb

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: Friday, May 11, 2012 2:15 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] One node doesn't execute jobs

 

On Friday, 11 May, 2012 at 3:04 PM, Smith, Herb wrote:

Both of the pool machines have the same operating system setup as all the machines in the company receive the same software load. Is there some way to determine why this machine is not picking up any of the work load?

 

Start with the SchedLog -- Matched + Idle usually indicates the scheduler is having issues completing the claim process with the node so it can't send the job over. If the SchedLog says the claim was acknowledged and a shadow was spawned successfully for the job, go to the ShadowLog file and see if you can find information about the shadow that was spawned for the job.
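
A rough sketch, in Python, of how one might pull out the log lines that mention a particular node when working through the SchedLog and ShadowLog as described above. The helper name is made up for illustration, and the sample lines are abridged from the NegotiatorLog excerpt in this thread; in practice you would pass in the lines read from the log file on the submit machine.

```python
def lines_mentioning(log_lines, needle):
    """Return (line_number, text) pairs for every line that contains
    needle, compared case-insensitively. Line numbers start at 1."""
    needle = needle.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if needle in line.lower()
    ]


# Sample lines abridged from the NegotiatorLog excerpt in this thread.
sample = [
    "Negotiating with m219237@A4005223\n",
    "Successfully matched with slot1@A4111261\n",
    "Successfully matched with slot2@A3927960\n",
]
for number, text in lines_mentioning(sample, "A4111261"):
    print(number, text)
```

If the node's name never shows up in the SchedLog, that by itself is a hint that the claim is not getting as far as the schedd.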

 

Regards,

- Ian

 

---

Ian Chesal

 

Cycle Computing, LLC

Leader in Open Compute Solutions for Clouds, Servers, and Desktops

Enterprise Condor Support and Management Tools

 

 

 


