
Re: [Condor-users] 7.8.2 / running out of ports for UDP



There is a persistent error in MasterLog –

09/06/12 11:59:31 Using config source: /etc/condor/condor_config

09/06/12 11:59:31 Using local config sources:

09/06/12 11:59:31    /etc/condor/condor_config.local

09/06/12 11:59:31 lock_file returning ERROR, errno=11 (Resource temporarily unavailable)

09/06/12 11:59:31 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)

09/06/12 11:59:31 ERROR "Can't get lock on "/var/lock/condor/InstanceLock"" at line 919 in file /slots/05/dir_12000/userdir/src/condor_master.V6/master.cpp

 

 

I did not configure the collector for any specific port.

[root@condor ~]# lsof -i udp:1980

COMMAND     PID   USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME

condor_ma 11227 condor    3u  IPv4 15806394      0t0  UDP 10.178.6.5:49664->10.178.6.5:pearldoc-xact

condor_ma 11227 condor    4u  IPv4 15806396      0t0  UDP 10.178.6.5:40075->10.178.6.5:pearldoc-xact

condor_ma 11227 condor    5u  IPv4 15806397      0t0  UDP 10.178.6.5:58964->10.178.6.5:pearldoc-xact

condor_ma 11227 condor    6u  IPv4 15806398      0t0  UDP 10.178.6.5:53669->10.178.6.5:pearldoc-xact

condor_ma 11227 condor    7u  IPv4 15806399      0t0  UDP 10.178.6.5:56974->10.178.6.5:pearldoc-xact

condor_ma 11227 condor    8u  IPv4 15806400      0t0  UDP 10.178.6.5:45113->10.178.6.5:pearldoc-xact

 

 

I’ll run the strace now and see what I get…

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: Thursday, September 06, 2012 12:09 PM
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] 7.8.2 / running out of ports for UDP

 


Is there anything interesting in MasterLog?

It may be helpful to strace condor_master.

strace -p <insert-pid-of-master> -o master.strace

You can kill that after it has run for long enough to observe lots of sockets being opened.
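One way to read the resulting trace, sketched here as an assumption rather than a prescribed procedure: pair each successful UDP `socket()` call with a later `close()` on the same descriptor and count what is left over. The function name `leaked_udp_fds` is made up for illustration, and the sketch assumes strace was run without `-f`, so lines carry no PID prefix.

```shell
# Report how many UDP sockets were opened but never closed in an strace log.
# Reads the log on stdin. Assumes single-process output (no -f / PID prefix).
leaked_udp_fds() {
    awk '
        # Successful datagram socket(): the last field is the new fd number.
        /^socket\(.*SOCK_DGRAM/ && $NF ~ /^[0-9]+$/ { open[$NF] = 1 }
        # close(N): forget fd N if we were tracking it.
        /^close\([0-9]+\)/ {
            fd = $0
            sub(/^close\(/, "", fd)
            sub(/\).*/, "", fd)
            delete open[fd]
        }
        END { n = 0; for (fd in open) n++; print n }
    '
}

# Usage: leaked_udp_fds < master.strace
```

A count that keeps growing between two traces would suggest the master is opening datagram sockets without releasing them. Descriptors that were already open before strace attached are not visible to this check.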

What is using port 1980?  Have you configured your collector to use that port?

--Dan

On 9/6/12 11:02 AM, Shrum, Donald C wrote:

Hi Dan,

 

The problem persists so flocking was not related.  The condor master is back up to 28,000+ open UDP ports.

Right now I’m just working around it with a periodic restart.

 

[root@condor ~]# condor_status -master -format "%d\n" MonitorSelfRegisteredSocketCount

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

connect: Resource temporarily unavailable

 

[root@condor ~]# service condor restart

Shutting down Condor (fast-shutdown mode)...  done.

Starting up Condor...    done.

 

[root@condor ~]# condor_status -master -format "%d\n" MonitorSelfRegisteredSocketCount

1

 

 

Thanks for the help –

 

Don

FSU HPC

 

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: Thursday, September 06, 2012 10:53 AM
To: condor-users@xxxxxxxxxxx
Subject: Re: [Condor-users] 7.8.2 / running out of ports for UDP

 

Donald,

If you observe this problem again, see what the daemon is reporting in its ClassAd:

condor_status -master -format "%d\n" MonitorSelfRegisteredSocketCount <insert-hostname-here>

What's using port 1980?  The collector?

--Dan

On 9/6/12 9:28 AM, Shrum, Donald C wrote:

As always, thanks Ian.

 

We had flocking set up with another university. Using the ‘this was one of the last things I touched’ troubleshooting method, I just disabled flocking and Condor Connection Brokering (CCB_ADDRESS).

 

That may have resolved the problem… we’ll see.

 

--Don

FSU HPC

 

 

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: Thursday, September 06, 2012 9:34 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] 7.8.2 / running out of ports for UDP

 

Donald,

 

You could switch to TCP for collector updates:

 

UPDATE_COLLECTOR_WITH_TCP = True

 

 

Or even better: switch to using the shared port daemon. This should help reduce the number of connections needed on any one machine. See: http://research.cs.wisc.edu/condor/manual/v7.6/3_7Networking_includes.html#32152

 

Regards,

- Ian

 

-- 

Ian Chesal

 

Cycle Computing, LLC

Leader in Open Compute Solutions for Clouds, Servers, and Desktops

Enterprise Condor Support and Management Tools

888.292.5320

 

 

On Thursday, 6 September, 2012 at 9:28 AM, Shrum, Donald C wrote:

Looks like the collector -

 

udp 0 0 10.178.6.5:41796 10.178.6.5:1980 ESTABLISHED 580/condor_collecto

udp 0 0 10.178.6.5:43588 10.178.6.5:1980 ESTABLISHED 580/condor_collecto

udp 0 0 10.178.6.5:48964 10.178.6.5:1980 ESTABLISHED 580/condor_collecto

udp 0 0 10.178.6.5:40004 10.178.6.5:1980 ESTABLISHED 580/condor_collecto

udp 0 0 10.178.6.5:47684 10.178.6.5:1980 ESTABLISHED 580/condor_collecto

 

This was on the central manager. Next time I see it happen on a processing node I'll check there as well.

 

-----Original Message-----

Sent: Thursday, September 06, 2012 8:36 AM

To: Condor-Users Mail List

Subject: Re: [Condor-users] 7.8.2 / running out of ports for UDP

 

On Thu, Sep 06, 2012 at 12:27:46PM +0000, Shrum, Donald C wrote:

I'm running Red Hat 6.3 with Condor 7.8.2.

On a number of my servers, both the processing nodes and the central manager,

I find Condor holding open a massive number of UDP ports, so many that

it blocks any new connections and DNS lookups fail.

Is this happening for anyone else?

 

Can you say which particular condor process is holding open the ports?

 

netstat -naup

 

(as root) should show you the process name and pid for each socket.
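With tens of thousands of sockets the raw listing is hard to read, so a per-process tally may help. The following is a minimal sketch; the function name `count_udp_by_process` is hypothetical, and it assumes the usual net-tools layout in which the PID/Program column looks like `580/condor_collecto`. Program names containing spaces would be cut at the first space.

```shell
# Count UDP sockets per owning process from `netstat -naup` output on stdin.
count_udp_by_process() {
    awk '
        $1 ~ /^udp/ {
            # The State column may be empty for UDP, so scan from the right
            # for the first field shaped like PID/Program.
            for (i = NF; i >= 1; i--)
                if ($i ~ /^[0-9]+\//) { n[$i]++; break }
        }
        END { for (p in n) print n[p], p }
    ' | sort -rn
}

# Usage (as root): netstat -naup | count_udp_by_process | head
```

The process at the top of the list is the one holding the most UDP sockets.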

_______________________________________________

Condor-users mailing list

To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a

subject: Unsubscribe

You can also unsubscribe by visiting

 

The archives can be found at:

 

 






_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
 
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

 



