
[HTCondor-users] Failing to start: Debian 8 (assertion error)



I upgraded some machines to 8.5.1 under Ubuntu 14.04 with no problems. This is using the packages from
http://research.cs.wisc.edu/htcondor/ubuntu/development/

But after upgrading some Debian 8 machines to the packages under
http://research.cs.wisc.edu/htcondor/debian/development/
they won't start. Condor MasterLog says:

01/04/16 21:22:33 Using config source: /etc/condor/condor_config
01/04/16 21:22:33 Using local config sources:
01/04/16 21:22:33    /etc/condor/condor_config.local
01/04/16 21:22:33 config Macros = 75, Sorted = 75, StringBytes = 2085, TablesBytes = 2748
01/04/16 21:22:33 CLASSAD_CACHING is OFF
01/04/16 21:22:33 Daemon Log is logging: D_ALWAYS D_ERROR
01/04/16 21:22:33 SharedPortEndpoint: waiting for connections to named socket 7709_a017
01/04/16 21:22:33 ERROR "Assertion ERROR on (s.hasAddrs())" at line 1147 in file /slots/03/dir_15104/userdir/src/condor_daemon_core.V6/daemon_core.cpp
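
For anyone comparing against their own logs, here is a minimal Python sketch that pulls the assertion details out of a line like the last one above. The function name and the regular expression are my own assumptions, inferred from this single log line; they are not part of HTCondor, and other versions may format the message differently.

```python
import re

# Pattern inferred from the one MasterLog line quoted above (an assumption,
# not an official HTCondor log format specification).
ASSERT_RE = re.compile(
    r'ERROR "Assertion ERROR on \((?P<expr>.+)\)" '
    r'at line (?P<line>\d+) in file (?P<file>\S+)'
)


def parse_assertion(log_line):
    """Return (expression, line_number, source_file), or None if no match."""
    m = ASSERT_RE.search(log_line)
    if m is None:
        return None
    return m.group("expr"), int(m.group("line")), m.group("file")


if __name__ == "__main__":
    sample = ('01/04/16 21:22:33 ERROR "Assertion ERROR on (s.hasAddrs())" '
              'at line 1147 in file /slots/03/dir_15104/userdir/src/'
              'condor_daemon_core.V6/daemon_core.cpp')
    print(parse_assertion(sample))
```

Running it over each line of a MasterLog and keeping the non-None results gives the failed expression together with the source file and line to look up in a checkout.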

In a git checkout I can find no tag V8_5_1 (only V8_5_0), but there is a branch origin/V8_5_1-branch. If I check that out, the offending assertion appears to be here:

char const *
DaemonCore::InfoCommandSinfulStringMyself(bool usePrivateAddress)
{
    static char * sinful_public = NULL;
    static char * sinful_private = NULL;
    static bool initialized_sinful_private = false;

    if( m_shared_port_endpoint ) {
        // We do not advertise (or probably even have) our own network
        // port. Instead, we advertise SharedPortServer's port along
        // with our local id so connections can be forwarded to us.
        char const *addr = m_shared_port_endpoint->GetMyRemoteAddress();

        if( addr ) {
            // Remote addresses can be accessed from other machines, so
            // they must have addrs.
            Sinful s( addr );
            ASSERT( s.hasAddrs() );
        }

Any ideas what's causing this?

I don't believe there is any particular configuration which is asking for shared ports. I have now reset /etc/condor/condor_config to the distribution one, and pushed out the same /etc/condor/condo as is running happily on the Ubuntu boxes (using ansible), and the problem remains.

However, the Debian boxes *did* previously have a compiled-from-source version of htcondor, so maybe something wasn't cleaned up properly from removing that before installing the package version.
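
If leftovers from the source build are involved, more than one condor_master on the search path would be a hint. Below is a minimal Python sketch for checking that. It assumes the daemons are found via PATH, which may not hold if the init script uses an absolute path, and find_on_path is a hypothetical helper of my own, not an HTCondor tool.

```python
import os


def find_on_path(name, path_dirs):
    """Return every executable file called `name` in path_dirs, in order."""
    hits = []
    for d in path_dirs:
        candidate = os.path.join(d, name)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Skip empty PATH entries so we never look in the current directory.
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    for hit in find_on_path("condor_master", dirs):
        print(hit)
```

Run as root, or with the sbin directories on PATH, since condor_master usually lives outside a normal user's search path. More than one line of output would suggest a stale binary from the earlier install.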

Any ideas?

Thanks,

Brian.