
Re: [HTCondor-users] hostname empty on node restart



Hi Tom and Peter,

Thanks for your suggestions!

First, CentOS: apparently I was remembering the behaviour from before our recent upgrade from a RHEL6-based system. CentOS 7 uses systemd, which seems to work as expected.

As for the Ubuntu machines in question, they are all running 14.04 LTS. From what I have understood after spending more time investigating, they use a lovely mix of Upstart and standard SysV init (?), and I am having a hard time understanding where one ends and the other begins. The Condor startup script in /etc/init.d is symlinked into the different runlevels as K20 and S20. I figured that S20 is probably much too early (as Peter also suggested), so I changed it to S98 with the commands:

# update-rc.d -f condor remove
# update-rc.d -f condor defaults 98 20

From what I understood about runlevels, a service that needs the network should not be started in runlevel 2, only in 3-5. That does not seem to be the case on these machines, where runlevel 2 is the default after boot (on Debian and Ubuntu, runlevels 2-5 are set up identically by default, so runlevel 2 already includes networking).

The change only half works: sometimes the machines get the correct network name, sometimes they show up as 'localhost', and sometimes with an empty hostname.
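One possible stopgap is to have the init script wait until the machine reports a real name before it starts condor. The sketch below assumes the 'localhost'/empty symptom is visible in the output of `hostname -f`; the function names are hypothetical and not part of the HTCondor packages.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with HTCondor): wait until the machine
# has a usable hostname before the HTCondor daemons are started.

# Succeed if $1 looks like a usable hostname: non-empty and not a
# loopback name such as "localhost" or "localhost.localdomain".
is_usable_hostname() {
    case "$1" in
        ""|localhost|localhost.*) return 1 ;;
        *) return 0 ;;
    esac
}

# Poll `hostname -f` once per second for up to $1 seconds (default 60).
# Returns 0 as soon as a usable name appears, 1 on timeout.
wait_for_hostname() {
    remaining=${1:-60}
    while [ "$remaining" -gt 0 ]; do
        name=$(hostname -f 2>/dev/null)
        if is_usable_hostname "$name"; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}
```

For example, the start branch of /etc/init.d/condor could source these functions and run `wait_for_hostname 120 || echo "condor: hostname still not set" >&2` before launching the daemons. The 120-second limit is an arbitrary choice so that a machine without network still finishes booting.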

I first tried modifying the LSB header, but I could not find a way to make the system pick up the changes. For example, from the documentation it seemed that "Required-Start" should perhaps include $named?
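For reference, an LSB header that declares a dependency on networking and name resolution could look like the sketch below. The fields are the standard LSB init-script ones, but the values are illustrative and the header in the packaged /etc/init.d/condor may differ. As far as I know, these dependencies only influence boot order on systems that use dependency-based boot (insserv on Debian); where scripts are ordered purely by their S/K numbers, the sequence number passed to update-rc.d is what counts.

```shell
### BEGIN INIT INFO
# Provides:          condor
# Required-Start:    $remote_fs $syslog $network $named
# Required-Stop:     $remote_fs $syslog $network $named
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: HTCondor distributed batch system
### END INIT INFO
```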

Regarding 'allow-hotplug': it does not seem to be in use on our machines. The only content of our /etc/network/interfaces is:
$ cat /etc/network/interfaces
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback

Using a static IP address is unfortunately not an (easy) option, as we are on a managed work network with a reasonably complex system governing the distribution of IP addresses and hostnames (as far as I know).

Cheers,
Yngve


On 29/05/15 14:52, Tom Downes wrote:
Yngve:

If you are using DHCP to configure your network, I may have seen the same issue on Debian. I found it necessary to remove "allow-hotplug eth0" from /etc/network/interfaces and use the following instead, to ensure that HTCondor starts only after the DHCP client gets a response. It should be the same on Ubuntu:

auto lo eth0
iface lo inet loopback
iface eth0 inet dhcp

The issue is that "allow-hotplug eth0" allows the DHCP client to go into the background and HTCondor will attempt startup even if networking setup is incomplete. Using static IP addresses would also be a solution.

I consider this a bug in HTCondor, as it does not handle the out-of-the-box OS configuration properly. There could be a similar issue on CentOS 7 with systemd.
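If it does turn out to be an issue under systemd, a minimal sketch is a drop-in that orders the service after network-online.target. This assumes the unit is named condor.service and that the packaged unit does not already declare this ordering; the drop-in file name wait-online.conf is hypothetical. Note that network-online.target only delays startup when a wait-online service (such as NetworkManager-wait-online.service) is enabled.

```shell
# Run as root: add a drop-in so condor.service starts after the network is online.
mkdir -p /etc/systemd/system/condor.service.d
cat > /etc/systemd/system/condor.service.d/wait-online.conf <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
# Make systemd re-read unit files so the drop-in takes effect at next boot.
systemctl daemon-reload
```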

--
Tom Downes
Associate Scientist and Data Center Manager
Center for Gravitation, Cosmology and Astrophysics
University of Wisconsin-Milwaukee
414.229.2678

On Fri, May 29, 2015 at 5:07 AM, Peter Ellevseth <Peter.Ellevseth@xxxxxxxxxx> wrote:
Hi Yngve

You can control when in the boot sequence each service is started. How this is done differs between distros. On CentOS it is determined by the name of the start script's link in the rc0.d to rc6.d folders. Perhaps this could solve your issues? We used CentOS 5/6, and I start condor very late in the sequence (file name K90 or K99).

Peter

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Yngve Inntjore Levinsen
Sent: 27. mai 2015 10:15
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] hostname empty on node restart

Dear all,

We have a few machines in our pool now, running either Ubuntu (various recent-ish versions) or CentOS 7. HTCondor is installed from the repositories on all machines. Version 8.2.8 is installed on all nodes and on the master (which runs CentOS 7).

When the machines restart (at least the Ubuntu ones; I forget whether the CentOS machines still do this), the condor service starts on the nodes, but probably too early. As a result, condor_status gives a list like the attached one, where the restarted node shows up as slotN@ instead of slotN@hostname.

This also seems to prevent the slots from being used. Our current workaround is to manually run 'sudo service condor restart'; after that the hostname is shown correctly and everything works fine. This is somewhat troublesome, as the Ubuntu boxes are office computers to which we 'condor admins' do not necessarily have sudo access, so we have to ask the owners to run the command.

Does anyone know why this happens?

Cheers,
Yngve

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/



