
Re: [HTCondor-users] Condor_kbdd.exe crashes/doesn't start if no network is available



On 5/18/2017 2:17 AM, Michael Schwarzfischer wrote:
Dear all,
We are running condor 8.4.9 on Windows 7 clients. Somehow our network
seems to have some stability issues from time to time…

Especially, after login it seems to happen that the network connection
is not yet fully established leading to a crash in the condor_kbdd.exe.

This can easily be simulated by disabling the network adapter. Without
the connection condor_kbdd.exe just doesn’t start. Furthermore, there is
no logging at all in that case (even with debug logging).

We wonder if there are some workarounds or tricks in order to assure
that the condor_kbdd is started and to assure that the process is still
running.

Thanks!

Best,

Michael


Strange. My daily driver is a Windows laptop that always runs condor_kbdd.exe; I often log in when no network connectivity is available, and I have not observed a kbdd problem in years. Admittedly, I tend to run the latest release (currently v8.7.1). Some kbdd crash issues were fixed back in v8.2.x, but a quick scan of the tickets at wiki.htcondor.org does not reveal any known problems in v8.4.x.

Just brainstorming, but perhaps you could try telling the condor_kbdd to communicate over the loopback interface instead of a "real" IP address (which you may not have while the network is down). You could append the following to your HTCondor config to give this a try:

# Tell the condor_kbdd to only use 127.0.0.1 for any/all communication
# to the startd, and tell the startd to listen on all network interfaces
# (to be certain the startd listens on both ethernet and loopback).
KBDD.NETWORK_INTERFACE = 127.0.0.1
KBDD.BIND_ALL_INTERFACES = False
STARTD.BIND_ALL_INTERFACES = True
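
As for checking that the process is still running, a small external check might be enough to at least notice when the kbdd is missing. Below is a minimal sketch in Python, not something that ships with HTCondor: it assumes Python is installed on the client and relies on the standard Windows tasklist command; the function names are made up for illustration.

```python
import csv
import io
import subprocess

# Image name of the daemon as it appears in the Windows process list.
KBDD_IMAGE = "condor_kbdd.exe"


def image_in_tasklist_csv(csv_text, image_name=KBDD_IMAGE):
    """Return True if image_name is in the first column of tasklist CSV output.

    (Hypothetical helper.) When nothing matches the filter, tasklist prints
    an "INFO: ..." message instead of CSV rows, which simply fails to match.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].strip().lower() == image_name.lower():
            return True
    return False


def kbdd_is_running():
    """Ask Windows for processes named condor_kbdd.exe (hypothetical helper)."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {KBDD_IMAGE}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=False,
    )
    return image_in_tasklist_csv(result.stdout)


if __name__ == "__main__":
    print("condor_kbdd running:", kbdd_is_running())
```

Something like this could be run from a scheduled task shortly after login; what to do when it reports False (alert someone, restart HTCondor, etc.) is up to you.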

Hope the above helps,
Todd