Re: [HTCondor-users] hostname empty on node restart

Hi Rich,

while [ `hostname` = localhost ]; do sleep 5; done
Simple and excellent idea, thanks!

In the meantime I created a new init script, which seems to work well so far. I removed condor from all run levels and added this new script in its place, with a very simple start() function:

start() {
        echo "Delayed Condor server startup"
        sleep 30
        service condor start
}

This delay is a bit longer than you suggest, but it doesn't matter to us if we lose 30 seconds of simulation time after a startup. It also avoids a while loop that might never exit (e.g. if the network cable is disconnected). Anyone who manually runs 'service condor start' is not affected by this delay.
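For reference, the two approaches could be combined into a bounded wait: poll the hostname, but give up after a fixed time so a disconnected cable cannot block startup forever. The following is only a sketch that I have not run on our nodes. The function name wait_for_hostname and its two arguments are made up for illustration. It also quotes the hostname output, since the unquoted test in the original loop can fail with an error when hostname prints nothing, which is one of the cases we saw.

```shell
#!/bin/sh
# Sketch: wait until the host has a real name, but never longer than max_wait.
# wait_for_hostname and its arguments are illustrative names, not part of
# Condor or of any distribution's init scripts.

wait_for_hostname() {
    max_wait=${1:-60}   # total seconds to wait before giving up
    interval=${2:-5}    # seconds between checks
    waited=0
    while :; do
        name=$(hostname 2>/dev/null)
        # Quoting "$name" keeps the test valid when hostname prints nothing.
        if [ -n "$name" ] && [ "$name" != "localhost" ]; then
            return 0    # a usable hostname is set
        fi
        if [ "$waited" -ge "$max_wait" ]; then
            return 1    # timed out; the caller decides what to do
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Possible use inside start(), before 'service condor start':
#   wait_for_hostname 60 5 || echo "hostname still not set, starting anyway"
```

Note that this only checks for an empty name or the literal 'localhost'; a name such as 'localhost.localdomain' would need an extra condition.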

Not sure if I will run into other surprising issues as a result of this change, but so far so good.


On 04/06/15 15:53, Rich Pieri wrote:
On 6/4/15 4:28 AM, Yngve Inntjore Levinsen wrote:
From what I understand about run levels, I thought that if network is
needed then it shouldn't go in run level 2, only 3-5. This seems to not
be the case for these machines though, where level 2 is the default
after startup.
Only run levels 0, 1 (or S) and 6 are set in stone. LSB suggests that
level 2 is for full multi-user without networking and level 3 is level 2
plus networking. You should still look at your distribution target
because many long-standing distributions like Debian and Slackware
predate LSB.

The change seems to work half way. Sometimes the machines get the
correct network name, sometimes they are displayed as 'localhost', and
sometimes with empty hostname.
This may be due to latency in the DHCP requests: dhcpcd is running in
the background waiting for a lease when the Condor daemon start happens.
If this is the case then you can't fix it by changing start priorities.
You need to make Condor's start process dependent on DHCP lease
acquisition. A simple hack is a little semi-infinite loop in the init script:

while [ `hostname` = localhost ]; do sleep 5; done