Mailing List Archives Public Access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Do the following settings remove all UDP communication from a Condor pool?

Date: Mon, 15 Aug 2011 10:43:38 -0500
From: Dan Bradley <dan@xxxxxxxxxxxx>
Subject: Re: [Condor-users] Do the following settings remove all UDP communication from a Condor pool?

On 8/15/11 10:32 AM, Ian Chesal wrote:

On Monday, 15 August, 2011 at 10:59 AM, Dan Bradley wrote:

Ian,

I believe the settings you mentioned will achieve what you are trying to do. In 7.6, it should also be sufficient to do this:

WANT_UDP_COMMAND_SOCKET = false
UPDATE_COLLECTOR_WITH_TCP = True

COLLECTOR_MAX_FILE_DESCRIPTORS = 3000

In 7.6, daemons that do not have a UDP port advertise this fact in their address information. Therefore, it is not necessary to fiddle with protocol knobs such as SCHEDD_SEND_VACATE_VIA_TCP, because the client automatically switches to TCP when it sees that the server lacks a UDP port.

Thanks, Dan! That's sufficient incentive for me to make sure everything in my pool is 7.6.x. Right now I'm running a 7.4.3 scheduler and central manager (CM), but 7.6.1 on the execute node.

For what it's worth, this is all to debug an issue I'm seeing on a Windows Server 2008 machine with a high core count. It has 40 physical cores, but Condor only seems able to use 12 slots on the box before it starts failing to accept claims from the shadows with:

08/10/11 18:29:50 Received TCP command 444 (ACTIVATE_CLAIM) from unauthenticated@unmapped <10.78.194.211:40724>, access level DAEMON
08/10/11 18:29:50 Calling HandleReq <command_activate_claim> (0)
08/10/11 18:29:50 slot25: Got activate_claim request from shadow (<10.78.194.211:40724>)
08/10/11 18:30:06 condor_write(): Socket closed when trying to write 13 bytes to <10.78.194.211:40724>, fd is 1356
08/10/11 18:30:06 Buf::write(): condor_write() failed
08/10/11 18:30:06 slot25: Can't send eom to shadow.
08/10/11 18:30:06 Return from HandleReq <command_activate_claim> (handler: 15.615s, sec: 0.016s)
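
As a rough aid for reading the excerpt above, the stall between the activate_claim request and the failed write can be measured from the timestamps. This is only a sketch: the function names are made up for illustration, and it assumes every line starts with a 17-character "MM/DD/YY HH:MM:SS" timestamp, as the lines quoted here do.

```python
from datetime import datetime

# Two lines copied from the log excerpt quoted above.
LOG_LINES = [
    "08/10/11 18:29:50 slot25: Got activate_claim request from shadow (<10.78.194.211:40724>)",
    "08/10/11 18:30:06 slot25: Can't send eom to shadow.",
]

# Assumption: each line begins with "MM/DD/YY HH:MM:SS" (17 characters).
TIMESTAMP_FORMAT = "%m/%d/%y %H:%M:%S"
TIMESTAMP_LENGTH = 17


def parse_timestamp(line):
    """Parse the leading timestamp of a log line (hypothetical helper)."""
    return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)


def seconds_between(lines, start_text, end_text):
    """Seconds from the first line containing start_text to the first containing end_text."""
    start = next(parse_timestamp(l) for l in lines if start_text in l)
    end = next(parse_timestamp(l) for l in lines if end_text in l)
    return (end - start).total_seconds()


stall = seconds_between(LOG_LINES, "Got activate_claim request", "Can't send eom")
print(f"gap between request and failed write: {stall:.0f} s")
```

The result lines up with the handler time reported in the last line of the excerpt (15.615s), so comparing the same window in the ShadowLog is a reasonable next step.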

What appears in ShadowLog during the time between 18:29:50 and 18:30:06?

I realize that's TCP traffic, but I was wondering whether UDP communication from the shadows trying to reach the startd on the box was causing problems in general for the Windows networking stack.

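The collector is a special case. Its address is hard-coded in the configuration file rather than learned from advertised address information, so clients cannot discover on their own that it lacks a UDP port. There are two options: one is to use UPDATE_COLLECTOR_WITH_TCP, as in the above example. The other is to add the "noUDP" flag to the collector address. Example: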

COLLECTOR_HOST = mycollector.host.name:9618?noUDP
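
To illustrate the idea (not Condor's actual implementation), here is a minimal sketch of how a client could pick a protocol from an address string. The function names are hypothetical, and the parsing assumes the flag appears after a "?" as in the example above, possibly alongside other "&"-separated parameters.

```python
def has_no_udp_flag(address):
    """Return True if the address carries a noUDP flag after '?' (hypothetical helper)."""
    # Tolerate the <...> wrapping seen in log lines such as <10.78.194.211:40724>.
    stripped = address.strip().strip("<>")
    _, separator, params = stripped.partition("?")
    if not separator:
        return False
    # Assumption: parameters are '&'-separated, each either "name" or "name=value".
    names = [param.split("=", 1)[0] for param in params.split("&")]
    return "noUDP" in names


def choose_protocol(address):
    """Use TCP when the server says it has no UDP port; otherwise UDP stays available."""
    return "TCP" if has_no_udp_flag(address) else "UDP"


print(choose_protocol("mycollector.host.name:9618?noUDP"))
print(choose_protocol("mycollector.host.name:9618"))
```

The point of the sketch is only that the decision lives in the address itself, so no per-command protocol knob is needed on the client side.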

I've still got 7.4.3 execute nodes in the pool -- will they recognize this flag?

7.4 daemons will ignore this flag. Daemons from before 7.3 will likely not accept this as a valid address.

--Dan
