
Re: [HTCondor-users] Condor_kbdd.exe crashes/doesn't start if no network is available



Hi all,

Thanks for your ideas and comments! I just found out why nothing works without the network.
It turned out to be quite a stupid reason: our installation refers to a global config file sitting on a network share.
Naturally, all the daemons crash without access to that file. I hope I didn't miss anything, but may I suggest logging that exit error in the future? I only found out because I managed to build and debug my own condor version :)

Anyhow, is there some workaround for that problem?
Is there a possibility to cache the global config file locally? How can I guarantee that it updates on all clients when I update the global version? Do I need to write my own update service?
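
One possible shape for such a local cache, as a rough sketch only: copy the shared file to local disk whenever the share is reachable and its contents differ, and fall back to the cached copy otherwise. The function name and both paths are hypothetical, and the script would have to be run by something outside HTCondor (for example a scheduled task or a logon script); nothing here is an HTCondor feature.

```python
import filecmp
import shutil
from pathlib import Path


def refresh_cached_config(network_path, cache_path):
    """Return a usable local config path, refreshing it from the share if possible.

    Both paths are placeholders chosen by the caller (hypothetical names).
    """
    network_path = Path(network_path)
    cache_path = Path(cache_path)
    try:
        if network_path.is_file():
            # Copy only when there is no cache yet or the contents differ.
            if not cache_path.exists() or not filecmp.cmp(
                network_path, cache_path, shallow=False
            ):
                cache_path.parent.mkdir(parents=True, exist_ok=True)
                tmp_path = cache_path.with_suffix(cache_path.suffix + ".tmp")
                shutil.copyfile(network_path, tmp_path)
                # Swap the new copy into place in a single rename.
                tmp_path.replace(cache_path)
            return cache_path
    except OSError:
        # Share unreachable or copy failed: fall through to the cached copy.
        pass
    if cache_path.exists():
        return cache_path
    raise FileNotFoundError("no network config and no cached copy available")
```

With this approach the HTCondor configuration would point at the local copy instead of the share, so clients pick up a changed global file the next time the refresh runs while the network is up.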

Alternatively, we may think about a wrapper which checks for the availability of the file before starting kbdd.
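
A minimal sketch of such a wrapper, assuming it is acceptable to poll for the file and give up after a timeout. The function names, the default timeout and interval, and the idea of passing the command as a list are all illustrative assumptions, not anything HTCondor provides.

```python
import subprocess
import time
from pathlib import Path


def wait_for_file(path, timeout=120.0, interval=5.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until `path` is a readable file or `timeout` seconds have passed.

    Returns True if the file became available, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        try:
            if Path(path).is_file():
                return True
        except OSError:
            # An unreachable share can raise instead of returning False.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)


def run_when_available(config_path, command, timeout=120.0):
    """Start `command` (a list of arguments) once `config_path` exists.

    Returns the command's exit code, or None if the file never appeared.
    """
    if not wait_for_file(config_path, timeout=timeout):
        return None
    return subprocess.call(command)
```

The wrapper would be called with the path of the shared config file and the command line that normally starts the daemon, both of which depend on the local installation.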

Thanks again,
Best,
Michael

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx
Sent: Friday, May 19, 2017 3:09 AM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] Condor_kbdd.exe crashes/doesn't start if no network is available

We had some issues along these lines years ago (7.* series?) related to whether the network was up or not.

Our auto-install scripts that create the condor service set it to: start = auto (delayed) depend = dhcp as a workaround. No idea if we still need it with the latest versions.
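
For readers who want to reproduce that service setting, here is a hedged sketch using the Windows `sc config` command from Python. It assumes the service is registered under the name `condor` and that the DHCP client service is named `Dhcp`; both should be checked on the target machine, and the helper names are made up for illustration. The command needs administrator rights.

```python
import subprocess


def build_service_config_command(service_name="condor", dependency="Dhcp"):
    """Build the `sc config` argument list for delayed auto-start plus a dependency.

    `sc` expects each option name (ending in '=') and its value as separate
    arguments, which is why they are separate list items here.
    """
    return [
        "sc", "config", service_name,
        "start=", "delayed-auto",
        "depend=", dependency,
    ]


def apply_service_config(service_name="condor", dependency="Dhcp"):
    """Run the command; raises CalledProcessError if `sc` reports a failure."""
    subprocess.run(build_service_config_command(service_name, dependency), check=True)
```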

Cheers

Greg

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Friday, 19 May 2017 12:03 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Condor_kbdd.exe crashes/doesn't start if no network is available

On 5/18/2017 2:17 AM, Michael Schwarzfischer wrote:
> Dear all,
> We are running condor 8.4.9 on Windows 7 clients. Somehow our network 
> seems to have some stability issues from time to time...
>
> Especially, after login it seems to happen that the network connection 
> is not yet fully established leading to a crash in the condor_kbdd.exe.
>
> This can easily be simulated by disabling the network adapter. Without 
> the connection condor_kbdd.exe just doesn't start. Furthermore, there 
> is no logging at all in that case (even with debug logging).
>
> We wonder if there are some workarounds or tricks in order to assure 
> that the condor_kbdd is started and to assure that the process is 
> still running.
>
> Thanks!
>
> Best,
>
> Michael
>

Strange, my daily driver is a Windows laptop that always runs the condor_kbdd.exe - I often log in when no network connectivity is available, and have not observed a kbdd problem in years.  Admittedly I tend to run the latest release (currently running v8.7.1).  There were some kbdd crash issues fixed back in v8.2.x, but a quick scan of the tickets at wiki.htcondor.org does not reveal any known problems in v8.4.x.

Just brainstorming, but perhaps you could try telling the condor_kbdd to communicate over the loopback network instead of a "real" IP address (which perhaps you don't have).  You could append the following to your HTCondor config to give this a try:

# Tell the condor_kbdd to only use 127.0.0.1 for any/all communication
# to the startd, and tell the startd to listen on all network interfaces
# (to be certain the startd listens on both ethernet and loopback).
KBDD.NETWORK_INTERFACE = 127.0.0.1
KBDD.BIND_ALL_INTERFACES = False
STARTD.BIND_ALL_INTERFACES = True

Hope the above helps,
Todd

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/