
[HTCondor-users] HTCondor 9.0.0 master getting SIGABRT during token request on RHEL/CentOS 8



Dear all,

After upgrading one of our RHEL 8 systems to HTCondor 9.0.0
(previously it ran 8.9.11 without any problems), we found that the
condor_master terminates after receiving a SIGABRT. Based on the
logs [1], this seems to be related to token authentication, specifically
to HTCondor trying to request a token from the host running the collector.
The machine where we saw this behavior is a worker node running a STARTD
that has access to a token with ADVERTISE_STARTD permissions.

We were able to reproduce this behavior on a test machine (there we
could only use CentOS 8, not RHEL 8) and traced it to the backtrace
in [2], which points to [3] as the place in HTCondor where the abort
is triggered.

Since this does not seem to be related to the specific setup of our
hosts, has anyone encountered a similar issue?

Thanks,
Rene


[1]
04/26/21 12:15:00 (pid:1293) (D_SECURITY) Trying token request to remote host cloud-htcondor.gridka.de for user (default).
Caught signal 6: si_code=4294967290, si_pid=1293, si_uid=232883, si_addr=0x50D
Stack dump for process 1293 at timestamp 1619432100 (13 frames)
/hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(dprintf_dump_stack+0x28)[0x147ceb85baf8]
/hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_Z17unix_sig_coredumpiP9siginfo_tPv+0x6d)[0x147ceba8686d]
/lib64/libpthread.so.0(+0x12dd0)[0x147ce9928dd0]
/lib64/libc.so.6(gsignal+0x10f)[0x147ce958b70f]
/lib64/libc.so.6(abort+0x127)[0x147ce9575b25]
/hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_ZN8htcondor18generate_client_idB5cxx11Ev+0x87)[0x147ceb998157]
/hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(+0x3d10b8)[0x147ceba8d0b8]
/hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(+0x3d18af)[0x147ceba8d8af]
/hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_ZN12TimerManager7TimeoutEPiPd+0x3a3)[0x147cebaa1f13]
/hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_ZN10DaemonCore6DriverEv+0x788)[0x147ceba72ed8]
/hkfs/home/project/hk-project-test-hep/scc-sdm-hep-0001/software/condor/condor-9.0.0-1-x86_64_CentOS8-stripped/usr/sbin/../lib64/libcondor_utils_9_0_0.so(_Z7dc_mainiPPc+0x1890)[0x147ceba8b4d0]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x147ce95776a3]
condor_master(_start+0x2e)[0x558f85cedb4e]

[2]
#0  0x00007ffff557499f in raise () from /usr/lib64/libc.so.6
#1  0x00007ffff555ecf5 in abort () from /usr/lib64/libc.so.6
#2  0x00007ffff7979157 in std::__replacement_assert (__condition=0x7ffff7a965a8 "__builtin_expect(__n < this->size(), true)", __function=<synthetic pointer>, __line=932,
    __file=0x7ffff7a965d8 "/usr/include/c++/8/bits/stl_vector.h") at /usr/include/c++/8/x86_64-redhat-linux/bits/c++config.h:2391
#3  std::vector<char, std::allocator<char> >::operator[] (__n=0, this=<synthetic pointer>) at /usr/include/c++/8/bits/stl_vector.h:932
#4  htcondor::generate_client_id[abi:cxx11]() () at /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_utils/token_utils.cpp:102
#5  0x00007ffff7a6e0b8 in (anonymous namespace)::TokenRequest::tryTokenRequest (req=...) at /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/daemon_core_main.cpp:462
#6  0x00007ffff7a6e8af in (anonymous namespace)::TokenRequest::tryTokenRequests () at /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/daemon_core_main.cpp:422
#7  0x00007ffff7a82f13 in TimerManager::Timeout (this=0x55555579f290, pNumFired=pNumFired@entry=0x7fffffffdbf4, pruntime=pruntime@entry=0x7fffffffdbf8)
    at /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/timer_manager.cpp:473
#8  0x00007ffff7a53ed8 in DaemonCore::Driver (this=0x5555557a03b0) at /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/daemon_core.cpp:3513
#9  0x00007ffff7a6c4d0 in dc_main (argc=1, argv=<optimized out>) at /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_daemon_core.V6/daemon_core_main.cpp:4386
#10 0x00007ffff5560873 in __libc_start_main () from /usr/lib64/libc.so.6
#11 0x0000555555560b4e in _start () at /usr/src/debug/condor-9.0.0-1.el8.x86_64/src/condor_utils/dc_service.h:70

[3]
https://github.com/htcondor/htcondor/blob/V9_0_0/src/condor_utils/token_utils.cpp#L102

-- 
Karlsruher Institut für Technologie (KIT)
Steinbuch Centre for Computing (SCC)

Dr. René Caspart

Hermann-von-Helmholtz-Platz 1 
76344 Eggenstein-Leopoldshafen, Germany
Phone: +49 721 608-25631
E-mail: Rene.Caspart@xxxxxxx


Registered office:
Kaiserstraße 12, 76131 Karlsruhe


KIT - The Research University in the Helmholtz Association

