
Re: [HTCondor-users] hostname empty on node restart



Yngve:

If you are using DHCP to configure your network, I may have seen the same issue on Debian. I found it necessary to remove "allow-hotplug eth0" from /etc/network/interfaces to ensure that HTCondor starts only after the DHCP client gets a response. It should be the same on Ubuntu. The relevant lines then read:

auto lo eth0
iface eth0 inet dhcp

The issue is that "allow-hotplug eth0" allows the DHCP client to go into the background and HTCondor will attempt startup even if networking setup is incomplete. Using static IP addresses would also be a solution.
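To illustrate the race, here is a minimal Python sketch of a guard that polls until the machine reports a usable hostname before anything starts HTCondor. It is not part of HTCondor. The names `hostname_is_usable` and `wait_for_hostname`, the timeout values, and the rule that "usable" means non-empty and not "localhost" are all assumptions made for the example.

```python
import socket
import time


def hostname_is_usable(name):
    """Return True if name is non-empty and its first label is not 'localhost'.

    This definition of "usable" is an assumption for illustration only.
    """
    name = (name or "").strip()
    return bool(name) and name.split(".")[0] != "localhost"


def wait_for_hostname(get_name=socket.getfqdn, timeout=60.0, interval=2.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_name() until it yields a usable hostname or timeout expires.

    Returns the hostname on success, or None if the deadline passes first.
    get_name, sleep and clock are injectable so the loop can be tested.
    """
    deadline = clock() + timeout
    while True:
        name = get_name()
        if hostname_is_usable(name):
            return name
        if clock() >= deadline:
            return None
        sleep(interval)
```

A wrapper script could call wait_for_hostname() and start the condor service only when it returns a name. Fixing the interface configuration as described above remains the simpler option.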

I consider this a bug in HTCondor as it is not handling the out-of-the-box OS configuration properly. Could be a similar issue on CentOS 7 / systemd.

--
Tom Downes
Associate Scientist and Data Center Manager
Center for Gravitation, Cosmology and Astrophysics
University of Wisconsin-Milwaukee
414.229.2678

On Fri, May 29, 2015 at 5:07 AM, Peter Ellevseth <Peter.Ellevseth@xxxxxxxxxx> wrote:
Hi Yngve

You can control when in the boot sequence each service is started. How this is done differs between distros. On CentOS it is determined by the name of the start script in the rc0 etc. folders. Perhaps this could solve your issue? We use CentOS 5/6 and I start condor very late in the sequence (file name K90 or K99).

Peter

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Yngve Inntjore Levinsen
Sent: 27. mai 2015 10:15
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] hostname empty on node restart

Dear all,

We have a few machines in our pool now, running either Ubuntu (various recent-ish versions) or CentOS 7. HTCondor is installed from repositories on all machines. Version 8.2.8 is installed on all nodes and the master (which runs CentOS 7).

When restarting (at least the Ubuntu machines, I forget if the CentOS machines still do this), the condor service is started on the nodes, but it seems that this happens too early in the boot sequence. Hence if I run condor_status I get a list as attached, where for the restarted node I see slotN@ instead of slotN@hostname.

It seems that this also prevents the slots from being used. The workaround we currently use is to manually run a simple 'sudo service condor restart'. After that the hostname is shown correctly and everything works fine. It is a bit troublesome because the Ubuntu boxes are office computers on which we 'condor admins' do not necessarily have sudo access, so we have to ask the owners to run the command.
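As a rough aid for spotting affected nodes without reading the full condor_status listing, the following Python sketch flags slot names that end in "@" with nothing after it. The function names are hypothetical, and the example assumes that `condor_status -af Name` prints one slot name per line and that condor_status is on the PATH.

```python
import subprocess


def slots_missing_hostname(slot_names):
    """Return the slot names whose part after '@' is empty, e.g. 'slot1@'."""
    broken = []
    for raw in slot_names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines
        _, sep, host = name.partition("@")
        if sep and not host:
            broken.append(name)
    return broken


def query_slot_names():
    """Return slot names from condor_status (assumed to be on the PATH)."""
    result = subprocess.run(["condor_status", "-af", "Name"],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

Running slots_missing_hostname(query_slot_names()) on the master would list the slots that still need a condor restart on their node.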

Does anyone know why this happens?

Cheers,
Yngve

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/