Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab

[Condor-users] "create_tcp_port(): bind() failed" for standard universe jobs

Date: Fri, 04 May 2012 11:40:42 +0100
From: Mark Calleja <mc321@xxxxxxxxx>
Subject: [Condor-users] "create_tcp_port(): bind() failed" for standard universe jobs

Hi,

One of our users is seeing some of his migrating standard universe jobs (Linux, Condor v7.6.6) fail to restart with:

001 (12814.129.000) 04/29 14:59:01 Job executing on host:<xxx.xxx.xxx.xxx:9210>

...
007 (12814.129.000) 04/29 14:59:01 Shadow exception!
        create_tcp_port(): bind() failed: 98(Address already in use)
        125  -  Run Bytes Sent By Job
        6501894  -  Run Bytes Received By Job

The execute hosts we see this failing on are a mixture of distros, including Ubuntu 10.04, Debian 6.0, and SLES 10. I've come across one related thread in the Condor-users mailing list (beginning at https://lists.cs.wisc.edu/archive/condor-users/2011-January/msg00037.shtml), but since most of the Condor installations on these execute hosts were done from tarballs, I don't think what's in that thread is relevant.

Can anyone shed light on what this bind failure is alluding to? Is it a case of the machine having run out of ephemeral ports for the job (unlikely, as many machines don't define a port range), or is the standard universe functionality really trying to bind to a specific port that's already in use? (I thought the latter couldn't be the case, as the standard universe abstracts away specific port usage.)
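For reference, here is a minimal sketch of plain BSD-socket behaviour on Linux. It is not Condor's create_tcp_port() code and assumes only that the shadow's message wraps an ordinary bind(2) failure. The helper names try_bind and demo are hypothetical. It contrasts the two hypotheses above: binding to a specific port that another socket already holds fails with EADDRINUSE (errno 98 on Linux), while binding to port 0 asks the kernel for a free ephemeral port. Note that, per the Linux bind(2) man page, EADDRINUSE is also returned when port 0 was requested but every port in the ephemeral range is in use, so errno 98 on its own does not tell the two cases apart.

```python
import errno
import os
import socket


def try_bind(port, host="127.0.0.1"):
    """Bind a fresh TCP socket to (host, port), then close it.

    Returns (assigned_port, 0) on success or (0, errno_value) on failure.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return s.getsockname()[1], 0
    except OSError as e:
        return 0, e.errno
    finally:
        s.close()


def demo():
    # Occupy a port: port 0 lets the kernel choose a free ephemeral port,
    # and listen() keeps it firmly in use while the two cases run.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))
    holder.listen(1)
    busy_port = holder.getsockname()[1]
    try:
        # Case 1: request that exact port again (no SO_REUSEADDR set).
        _, specific_err = try_bind(busy_port)
        # Case 2: request port 0 and let the kernel pick a different port.
        assigned, port0_err = try_bind(0)
    finally:
        holder.close()
    return {
        "busy_port": busy_port,
        "specific_port_errno": specific_err,
        "port0_assigned": assigned,
        "port0_errno": port0_err,
    }


if __name__ == "__main__":
    r = demo()
    err = r["specific_port_errno"]
    print("bind to busy port %d: errno %d (%s, %s)"
          % (r["busy_port"], err, errno.errorcode.get(err, "?"), os.strerror(err)))
    print("bind to port 0: kernel assigned port %d" % r["port0_assigned"])
```

On a Linux host the first line should report errno 98 (EADDRINUSE, "Address already in use"), the same text the shadow exception shows, while the port-0 request succeeds with some other port.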

Any hints as to the underlying cause of this issue would be gratefully received.


Ta,
Mark
