
Re: [HTCondor-users] execute hosts advertise loopback address



Hello Again,


I've now had the chance to log in remotely to a few of the Windows execute hosts and find pretty much the same as below.

Running 


condor_config_val IP_ADDRESS


always returns the correct IP address even if the loopback address is being advertised. After restarting the HTCondor service, the correct address gets advertised (this seems to be repeatable).


The service is set as Automatic (delayed start) with a dependency on DHCP. If anyone knows a way of delaying this further (or restarting it automatically), I'd be grateful to hear it.


As a workaround, I'm going to set things up so that I can remotely restart the HTCondor processes on any execute hosts that advertise the loopback address. Not ideal - but hopefully an improvement.
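
For anyone wanting to script the "which machines are affected" part of this workaround, here is a minimal sketch in Python. It assumes the list of machines and addresses comes from something like `condor_status -af Machine MyAddress` run on a pool machine, and that the `MyAddress` attribute reflects what the host is advertising. The function name `loopback_hosts` and the sample host names are hypothetical, not anything taken from this thread.

```python
import ipaddress
import re

# First IPv4 address inside an HTCondor "sinful string" such as
# <138.253.107.4:9612?...>. Assumption: addresses appear in this form.
ADDR_RE = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):\d+")


def loopback_hosts(status_output):
    """Return machine names whose advertised address is a loopback address.

    `status_output` is expected to hold one "machine address" pair per line.
    Lines that do not parse are skipped rather than treated as errors.
    """
    bad = []
    for line in status_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        machine, my_address = parts
        match = ADDR_RE.search(my_address)
        if not match:
            continue
        try:
            ip = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # not a valid IPv4 address, ignore the line
        # Multi-slot machines report one line per slot, so de-duplicate.
        if ip.is_loopback and machine not in bad:
            bad.append(machine)
    return bad
```

The returned list could then be fed to whatever remote restart mechanism is in use; that step is deliberately left out because it depends on the site.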


regards,


-ian.




From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Craig Parker <craig.parker@xxxxxxxxx>
Sent: 26 May 2020 04:37
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] execute hosts advertise loopback address
 
Just chiming in quickly to say I have a similar sounding issue on my Win10 Condor clients, running Condor 8.8.3.  I have a quick audit script running across my machines at present, and I had around 12 percent of them advertising 127.0.0.1 today.

Running 'condor_config_val IP_ADDRESS' on an affected machine always returns the correct IP address.

It seems to be related to machines coming out of sleep.  A service restart or PC restart always fixes it, and honestly all I’ve done with it so far is to automate a restart of the Condor service if the client's 'shared_port_ad' file has the loopback address in it.  
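
The automated restart described above might look roughly like the following Python sketch. It is not the actual script; the path to the `shared_port_ad` file is left as a parameter because its location depends on the local configuration, and the service name `condor` and the use of `net stop` / `net start` are assumptions about a typical Windows install.

```python
import re
import subprocess
from pathlib import Path

# Any IPv4 address in 127.0.0.0/8 that is not part of a longer
# dotted number (so 138.253.127.4 does not count as loopback).
LOOPBACK_RE = re.compile(r"(?<![\d.])127\.\d{1,3}\.\d{1,3}\.\d{1,3}(?![\d.])")


def ad_has_loopback(ad_text):
    """True if the ad text mentions a loopback IPv4 address anywhere."""
    return LOOPBACK_RE.search(ad_text) is not None


def restart_if_loopback(ad_path, service="condor", run=subprocess.run):
    """Restart the service if the ad file contains a loopback address.

    `ad_path` is the site-specific location of the shared_port_ad file.
    `service` is an assumed Windows service name. `run` is injectable so
    the logic can be exercised without touching a real service.
    Returns True if a restart was attempted, False otherwise.
    """
    text = Path(ad_path).read_text(errors="replace")
    if not ad_has_loopback(text):
        return False
    run(["net", "stop", service], check=False)  # may already be stopped
    run(["net", "start", service], check=True)
    return True
```

Run from a scheduled task with sufficient privileges, this only touches the service when the loopback address is actually present in the file.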

We’re back on campus now though, with a little time on our hands, so I hope to investigate this properly in the near future.  I’ll report any findings here.

Cheers, Craig

On 22/05/2020, at 11:23 PM, Smith, Ian <I.C.Smith@xxxxxxxxxxxxxxx> wrote:

Hi again,

I've not been able to log in directly to the execute hosts yet, although I hope to try remote login next week. I have, though, tried getting the
config values remotely using something like this:

$ condor_config_val -address "<138.253.107.4:9612>" IP_ADDRESS

I'm assuming that this does actually contact the startd on the execute host to retrieve the info?

On the machines that advertise the loopback address I do see a value of 127.0.0.1 most of the time (although sometimes it is UNDEFINED).  Also
I get:

$ condor_config_val -address "<138.253.107.4:9612>" -dump NETWORK
# Configuration from master on (null) <138.253.107.4:9612>

# Parameters with names that match NETWORK:
NETWORK_HOSTNAME =
NETWORK_INTERFACE = *
NETWORK_MAX_PENDING_CONNECTS = 0
PRIVATE_NETWORK_INTERFACE =
PRIVATE_NETWORK_NAME = $(FULL_HOSTNAME)
VM_NETWORKING = false
VM_NETWORKING_DEFAULT_TYPE = nat
VM_NETWORKING_MAC_PREFIX =
VM_NETWORKING_TYPE = nat
VMWARE_BRIDGE_NETWORKING_TYPE = bridged
VMWARE_NAT_NETWORKING_TYPE = nat
VMWARE_NETWORKING_TYPE = nat
# Contributing configuration file(s):
#       <Default>
#       C:\Condor\condor_config

for all the hosts (working properly or not). 


Interestingly, some hosts on the same subnet advertise the loopback address whereas others advertise the correct address, and *this is not
consistent*. On subsequent forced wake-ups I see different machines advertising the correct/incorrect address. To me, this strongly
suggests a race condition at service startup, and I'll see if I can check this by remotely starting the htcondor service.
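
If the race-condition theory holds, one way to test it would be to hold off starting the service until the host can see a non-loopback address. The Python sketch below only illustrates that idea under assumptions: whether `socket.gethostbyname_ex` sees the same thing HTCondor sees at startup is unverified, and the function names and default timings are hypothetical.

```python
import ipaddress
import socket
import time


def non_loopback_ipv4():
    """IPv4 addresses the local hostname resolves to, minus loopback ones."""
    try:
        _, _, addrs = socket.gethostbyname_ex(socket.gethostname())
    except OSError:
        return []  # name resolution not ready yet
    return [a for a in addrs if not ipaddress.ip_address(a).is_loopback]


def wait_for_network(get_addrs=non_loopback_ipv4, timeout=120.0,
                     interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until a non-loopback address appears or `timeout` seconds pass.

    Returns the first usable address, or None if the deadline is reached.
    `get_addrs`, `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        addrs = get_addrs()
        if addrs:
            return addrs[0]
        if clock() >= deadline:
            return None
        sleep(interval)
```

A wrapper could call `wait_for_network()` and start the service only when it returns an address. If hosts still advertise 127.0.0.1 after that, the race is probably somewhere else.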

One other thing - is it possible to get the daemons to bind to a specific interface in Windows, similar to eth0 in Linux?

thanks again,

-ian.

  


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Todd L Miller <tlmiller@xxxxxxxxxxx>
Sent: 21 May 2020 17:59
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] execute hosts advertise loopback address
 
> The comment about the service startup order is interesting. If this 
> isn't explicitly set then I could imagine a race condition between 
> htcondor and the network service which would explain why some machines 
> get the correct interface address and some get the loopback. I'll get 
> back to you when I have some more information.

         On Linux, this was the cause of a lot of problems with HTCondor 
advertising loopback addresses, particularly because some distributions 
considered the network to be up when the loopback interface was ready, not 
when DHCP (or whatever) had finished.  I don't know what the case is on 
Windows.

- ToddM
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/