
Re: [Condor-users] bind failed: WSAError = 10055 and 10048

Thanks Todd!

I was examining some machines that are losing their connection to the CM, which leaves jobs stuck in the queue. When the schedd node loses its connection to the CM, the following messages also appear:

11/22 08:42:24 (pid:4224) attempt to connect to <> failed: timed out after 20 seconds.
11/22 08:42:24 (pid:4224) ERROR: SECMAN:2004:Failed to create security session to <> with TCP.|SECMAN:2003:TCP connection to <> failed.
11/22 08:42:24 (pid:4224) Failed to start non-blocking update to <>.
11/22 08:42:25 (pid:4224) attempt to connect to <> failed: connect errno = 10060.  Will keep trying for 45 total seconds (24 to go).

When I restart the Condor service, communication is re-established and jobs start again.
It seems both messages are related.

I have never seen this behavior or these messages before in any of our department pools. This particular pool is located at another site but in the same domain. I will check other department pools to see whether these messages occur there too.

I will also look into the references you have listed.

I will come back soon with more information.


From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Sent by: condor-users-bounces@xxxxxxxxxxx
Date: 22/11/2011 14:40
Please respond to: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Subject: Re: [Condor-users] bind failed: WSAError = 10055 and 10048

kschwarz@xxxxxxxxxxxxxx wrote:
> Environment: Condor 7.2.4, Windows XP
> I am getting the following message in the MasterLog, SchedLog, and
> StartdLog files on a Windows desktop:
> 11/22 04:21:28 bind failed: WSAError = 10055
> 11/22 04:21:28 SafeSock::connect bind() failed: _state = 1

Winsock Error 10055 = WSAENOBUFS: the system does not have enough memory or
other system resources to open a new TCP/IP socket or to handle socket data.
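
For reading the logs in this thread, here is a minimal Python sketch that pulls
the Winsock codes out of log lines and names them. The code-to-name table uses
the standard Winsock values; the function name and the regular expression are
illustrative assumptions based only on the log lines quoted here, not anything
Condor itself provides.

```python
import re

# Standard Winsock error codes that come up in this thread.
WINSOCK_ERRORS = {
    10048: "WSAEADDRINUSE",  # address/port already in use
    10055: "WSAENOBUFS",     # no buffer space or other socket resources
    10060: "WSAETIMEDOUT",   # connection attempt timed out
}

# Matches "WSAError = 10055" and "errno = 10060" as they appear in the logs.
_CODE_RE = re.compile(r"(?:WSAError|errno)\s*=\s*(\d+)")


def winsock_codes(log_lines):
    """Return (code, name) pairs for every Winsock code found in the lines."""
    found = []
    for line in log_lines:
        for match in _CODE_RE.finditer(line):
            code = int(match.group(1))
            found.append((code, WINSOCK_ERRORS.get(code, "unknown")))
    return found
```

Feeding it the lines of a MasterLog, SchedLog or StartdLog gives a quick view
of which of the three errors a machine is actually hitting.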

Does the problem happen consistently?  If you reboot this XP machine,
does the problem still persist?  Is this happening on many XP machines
in your shop?  If the answer is yes to any of the questions, it may be
possible we have a bug in the code to hunt down.  If the answer is
generally no, then there may not be much we can do code-wise as the
problem is environmental --- some common internet wisdom on
environmental causes of the WSAENOBUFS error:

You receive an error message when you run a custom Winsock network
program on a Microsoft Windows XP Service Pack 2-based computer.

When you try to connect from TCP ports greater than 5000 you receive the
error 'WSAENOBUFS (10055)'
(thought: perhaps you could specify a port range in the Condor config
file to get around this one? If this turns out to help, let us know;
perhaps we should change the default configuration on Windows to limit the port
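
A sketch of what such a port range might look like in the local Condor config
file, using the LOWPORT and HIGHPORT settings. The values are only an
illustrative assumption (a range kept below 5000); whether this actually avoids
the 10055 error is untested here.

```
# Hypothetical values: keep the ports Condor binds to below 5000.
LOWPORT  = 4000
HIGHPORT = 4999
```

The range needs to be wide enough for all the sockets the Condor daemons on the
machine open, and the Condor service has to be restarted to pick it up.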

You can also use TCPView to monitor how many TCP connections your
process has open when this issue happens:
TCPView for Windows v2.51
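
If TCPView is not at hand, a rough equivalent is to count TCP rows per PID in
the output of "netstat -ano". A minimal Python sketch, assuming the usual
five-column TCP rows (Proto, Local Address, Foreign Address, State, PID); the
function name is illustrative:

```python
from collections import Counter


def tcp_connections_per_pid(netstat_output):
    """Count TCP rows per owning PID in the text output of `netstat -ano`."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have five fields; the header and UDP rows do not match.
        if len(fields) == 5 and fields[0] == "TCP" and fields[4].isdigit():
            counts[int(fields[4])] += 1
    return counts
```

Capture the netstat output while the error is occurring and compare the count
for the PID shown in the Condor log (for example pid:4224 above) against the
other processes on the machine.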

hope this helps,

