
Re: [HTCondor-users] execute hosts advertise loopback address



Hi Greg,


It looks like that has worked, but it's difficult to tell whether this was down to changing NETWORK_INTERFACE or just restarting the Condor service. I'll be more certain after I've pushed the new config files out to all the PCs and we'll see if the loopback addresses still appear. Will report back then.


thanks,


-ian.




From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Hitchen, Greg (IM&T, Kensington WA) <Greg.Hitchen@xxxxxxxx>
Sent: 29 May 2020 01:27
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] execute hosts advertise loopback address
 

Hi Ian

 

Have you tried putting IP subnet information in NETWORK_INTERFACE, rather than just *?

 

e.g. NETWORK_INTERFACE = 138.253.*

 

I think in the dim dark past we had a similar intermittent issue, but we have never had problems since adding our network subnets, at least on our Windows machines.
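To see why an explicit subnet might help with the start-up race discussed in this thread, here is a deliberately simplified Python model of wildcard matching. It uses `fnmatch` as a stand-in for HTCondor's own NETWORK_INTERFACE matching and assumes non-loopback matches are preferred; it is not HTCondor's actual selection code, and what HTCondor really does when no interface matches is not modelled here.

```python
import ipaddress
from fnmatch import fnmatchcase


def select_address(pattern, candidates):
    """Pick an address matching a NETWORK_INTERFACE-style wildcard.

    Simplified model (not HTCondor's real logic): non-loopback matches are
    preferred; loopback is used only if it is the sole match. Returns None
    when nothing matches the pattern at all.
    """
    matches = [a for a in candidates if fnmatchcase(a, pattern)]
    routable = [a for a in matches if not ipaddress.ip_address(a).is_loopback]
    if routable:
        return routable[0]
    return matches[0] if matches else None


# Before DHCP finishes only loopback exists; afterwards the real address appears.
early = ["127.0.0.1"]
late = ["127.0.0.1", "138.253.107.4"]

print(select_address("*", early))          # 127.0.0.1  (the symptom in this thread)
print(select_address("138.253.*", early))  # None       (nothing matches)
print(select_address("138.253.*", late))   # 138.253.107.4
```

In this model, `*` happily accepts loopback when it is the only interface up, whereas a subnet pattern simply has nothing to pick until the real address exists.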

 

Linux VMs (VMware, vSphere, ESX servers) still require a cron job to check the condor network binding, as they occasionally come up bound to the loopback address after outages/rebooting.

 

Cheers

 

Greg

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Smith, Ian
Sent: Thursday, 28 May 2020 4:48 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] execute hosts advertise loopback address

 

Hello Again,

 

I've now had the chance to log in remotely to a few of the Windows execute hosts and found pretty much the same as below.

Running 

 

condor_config_val IP_ADDRESS

 

always returns the correct IP address, even when the loopback address is advertised. On restarting the HTCondor service, the correct address then gets advertised (this seems to be repeatable).

 

The service is set as Automatic (delayed start) with a dependency on DHCP. If anyone knows a way of delaying this further (or restarting it automatically), I'd be grateful to hear it.
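One possible way to delay the start further, sketched in Python under stated assumptions: poll until the host has a usable (non-loopback, non-link-local) IPv4 address and only then start the service, for example from a start-up scheduled task with the service itself set to manual start. The UDP `connect` trick used to find the primary address, the 300 s timeout and the 5 s poll interval are illustrative choices, not HTCondor behaviour.

```python
import ipaddress
import socket
import time


def primary_ipv4():
    """Best-effort guess at this host's primary IPv4 address.

    Connecting a UDP socket sends no packets but makes the OS choose a source
    address for the route. 192.0.2.1 is a documentation (TEST-NET-1) address,
    used only to trigger route selection. Returns None if there is no route yet.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("192.0.2.1", 9))
        return s.getsockname()[0]
    except OSError:
        return None
    finally:
        s.close()


def usable(addr):
    """True for an address that is not loopback, unspecified or link-local (APIPA)."""
    if not addr:
        return False
    ip = ipaddress.ip_address(addr)
    return not (ip.is_loopback or ip.is_unspecified or ip.is_link_local)


def wait_for_network(get_addr=primary_ipv4, timeout=300.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_addr() until it returns a usable address or timeout expires.

    Returns the address, or None on timeout.
    """
    deadline = clock() + timeout
    while True:
        addr = get_addr()
        if usable(addr):
            return addr
        if clock() >= deadline:
            return None
        sleep(interval)
```

A wrapper script could call `wait_for_network()` and, once it returns an address, start the service (e.g. with `net start condor`; the service name `condor` is an assumption, so check it in the Services console).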

 

As a workaround, I'm going to set things up so that I can remotely restart the HTCondor processes on execute hosts that advertise the loopback address. Not ideal, but hopefully an improvement.

 

regards,

 

-ian.

 


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Craig Parker <craig.parker@xxxxxxxxx>
Sent: 26 May 2020 04:37
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] execute hosts advertise loopback address

 

Just chiming in quickly to say I have a similar sounding issue on my Win10 Condor clients, running Condor 8.8.3.  I have a quick audit script running across my machines at present, and I had around 12 percent of them advertising 127.0.0.1 today.
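An audit along these lines could be sketched in Python by parsing the output of `condor_status -af Machine MyAddress`, which prints one line per slot ad. The host names in the example are hypothetical, and the sinful-string parsing below only handles the common `<host:port?params>` and `<[ipv6]:port?params>` forms.

```python
import ipaddress


def sinful_host(sinful):
    """Host part of a sinful string such as '<1.2.3.4:9618?addrs=...>'."""
    s = sinful.strip().lstrip("<").rstrip(">")
    s = s.split("?", 1)[0]            # drop '?addrs=...&sock=...' parameters
    if s.startswith("["):             # IPv6 form: [::1]:9618
        return s[1:s.index("]")]
    return s.rsplit(":", 1)[0]


def is_loopback_sinful(sinful):
    try:
        return ipaddress.ip_address(sinful_host(sinful)).is_loopback
    except ValueError:                # a hostname rather than an IP literal
        return False


def audit(af_output):
    """Return (sorted machines advertising loopback, total distinct machines)."""
    machines, bad = set(), set()
    for line in af_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        machine, address = parts
        machines.add(machine)         # every slot repeats the machine name
        if is_loopback_sinful(address):
            bad.add(machine)
    return sorted(bad), len(machines)


sample = """pc001.example.ac.uk <138.253.107.4:9618?addrs=138.253.107.4-9618>
pc001.example.ac.uk <138.253.107.4:9618?addrs=138.253.107.4-9618>
pc002.example.ac.uk <127.0.0.1:9618?addrs=127.0.0.1-9618>
"""
bad, total = audit(sample)
print(f"{len(bad)}/{total} machines advertise loopback: {bad}")
# prints: 1/2 machines advertise loopback: ['pc002.example.ac.uk']
```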

 

Running 'condor_config_val IP_ADDRESS' on an affected machine always returns the correct IP address.

 

It seems to be related to machines coming out of sleep.  A service restart or PC restart always fixes it, and honestly all I’ve done with it so far is to automate a restart of the Condor service if the client's 'shared_port_ad' file has the loopback address in it.  
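Craig's check could be sketched in Python as below. It assumes the ad file is plain text containing sinful strings such as `<127.0.0.1:9618?...>` and treats any `127.x.x.x` literal as a sign of the problem; the restart commands are supplied by the caller rather than hard-coded.

```python
import re
import subprocess
from pathlib import Path

# Any 127.x.x.x literal, e.g. inside "<127.0.0.1:9618?addrs=127.0.0.1-9618>".
LOOPBACK_RE = re.compile(r"\b127\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")


def ad_mentions_loopback(ad_path):
    """True if the ad file exists and contains a 127.x.x.x address."""
    try:
        text = Path(ad_path).read_text(errors="replace")
    except FileNotFoundError:
        return False
    return LOOPBACK_RE.search(text) is not None


def restart_if_loopback(ad_path, restart_cmds, run=subprocess.run):
    """Run each command in restart_cmds, in order, if the ad advertises loopback.

    Returns True if a restart was attempted.
    """
    if not ad_mentions_loopback(ad_path):
        return False
    for cmd in restart_cmds:
        run(cmd, check=False)
    return True
```

For instance, a scheduled task might call `restart_if_loopback(path, [["net", "stop", "condor"], ["net", "start", "condor"]])`. Both the service name `condor` and `path` are assumptions; the ad file's location is presumably whatever `condor_config_val SHARED_PORT_DAEMON_AD_FILE` reports on the host.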

 

We’re back on campus now though, with a little time on our hands, so I hope to investigate this properly in the near future.  I’ll report any findings here.

 

Cheers, Craig



On 22/05/2020, at 11:23 PM, Smith, Ian <I.C.Smith@xxxxxxxxxxxxxxx> wrote:

 

Hi again,

 

I've not been able to log in directly to the execute hosts yet, although I hope to try remote login next week. I have, though, tried getting the config values remotely using something like this:

 

$ condor_config_val -address "<138.253.107.4:9612>" IP_ADDRESS

 

I'm assuming that this does actually contact the startd on the execute host to retrieve the info?

 

On the machines that advertise the loopback address I do see a value of 127.0.0.1 most of the time (although sometimes it is UNDEFINED). Also I get:

 

$ condor_config_val -address "<138.253.107.4:9612>" -dump NETWORK
# Configuration from master on (null) <138.253.107.4:9612>

# Parameters with names that match NETWORK:
NETWORK_HOSTNAME =
NETWORK_INTERFACE = *
NETWORK_MAX_PENDING_CONNECTS = 0
PRIVATE_NETWORK_INTERFACE =
PRIVATE_NETWORK_NAME = $(FULL_HOSTNAME)
VM_NETWORKING = false
VM_NETWORKING_DEFAULT_TYPE = nat
VM_NETWORKING_MAC_PREFIX =
VM_NETWORKING_TYPE = nat
VMWARE_BRIDGE_NETWORKING_TYPE = bridged
VMWARE_NAT_NETWORKING_TYPE = nat
VMWARE_NETWORKING_TYPE = nat
# Contributing configuration file(s):
#       <Default>
#       C:\Condor\condor_config

 

for all the hosts (working properly or not). 
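Since the dump is the same on working and failing hosts, one way to confirm that across many machines is to parse each dump and diff it; an empty diff points at runtime behaviour rather than configuration. A minimal Python sketch, assuming the `NAME = value` layout shown above; it does not expand macros such as `$(FULL_HOSTNAME)`.

```python
def parse_config_dump(text):
    """Turn `condor_config_val -dump` output into a {name: value} dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and '# ...' comments
        name, sep, value = line.partition("=")
        if sep:
            params[name.strip()] = value.strip()
    return params


def diff_params(a, b):
    """{name: (value_in_a, value_in_b)} for parameters that differ."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}
```

For example, `diff_params(parse_config_dump(good_dump), parse_config_dump(bad_dump))` would return `{}` for the two hosts compared above, where `good_dump` and `bad_dump` are hypothetical strings holding each host's output.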

 

Interestingly, some hosts on the same subnet advertise the loopback address whereas others advertise the correct address, and *this is not consistent*. On subsequent forced wake-ups I see different machines advertising the correct/incorrect address. To me this strongly suggests a race condition at service start-up, and I'll see if I can check this by starting the HTCondor service remotely.

 

One other thing: is it possible to get the daemons to bind to a specific interface in Windows, similar to eth0 in Linux?

 

thanks again,

 

-ian.

  


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Todd L Miller <tlmiller@xxxxxxxxxxx>
Sent: 21 May 2020 17:59
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] execute hosts advertise loopback address

 

> The comment about the service startup order is interesting. If this 
> isn't explicitly set then I could imagine a race condition between 
> htcondor and the network service which would explain why some machines 
> get the correct interface address and some get the loopback. I'll get 
> back to you when I have some more information.

         On Linux, this was the cause of a lot of problems with HTCondor 
advertising loopback addresses, particularly because some distributions 
considered the network to be up when the loopback interface was ready, not 
when DHCP (or whatever) had finished.  I don't know what the case is on 
Windows.

- ToddM
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
