
[HTCondor-users] HTCondor 9.0.0 condor_starter segfaulting during x509 proxy update



Dear all,

After updating to HTCondor 9.0.0 we are experiencing problems on our
worker nodes (running SL7). The condor_starter encounters a segmentation
fault after ~1h of runtime. In the StarterLog.slot1_X nothing is logged
around that time until the next condor_starter starts. In the StartLog
I only see the report about the segmentation fault [1]. In addition, a
core dump is created; the backtrace obtained from that core dump is
shown in [2].
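
To check other worker nodes for the same symptom, something like the
following could be used to scan a StartLog for starters that died on a
signal. This is only a minimal sketch: the pattern is an assumption based
on the single log line shown in [1], and find_starter_crashes is a
hypothetical helper, not part of HTCondor.

```python
import re

# Assumed message format, derived only from the StartLog line quoted in [1].
DIED_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) .*?"
    r"Starter pid (?P<pid>\d+) died on signal (?P<signal>\d+)"
)


def find_starter_crashes(lines):
    """Return (date, time, pid, signal) for each starter killed by a signal."""
    crashes = []
    for line in lines:
        match = DIED_RE.search(line)
        if match:
            crashes.append(
                (match["date"], match["time"],
                 int(match["pid"]), int(match["signal"]))
            )
    return crashes


# The line from [1], rejoined into the single line it occupies in the log.
sample = [
    "04/29/21 13:01:35 (pid:4025) (D_ALWAYS|D_FAILURE) Starter pid 3337908 "
    "died on signal 11 (signal 11 (Segmentation fault))",
]
print(find_starter_crashes(sample))
```

In practice the lines would come from the StartLog file on each node,
for example by passing an open file object instead of the sample list.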

To me this seems to be related to the update of the X509 proxy for the
job. I tried submitting a job without a userproxy, which so far has not
triggered the problem. Other than the update of the HTCondor version,
nothing has changed in our setup or in the jobs we are submitting.

Has anyone experienced similar issues? Please let me know if any
additional information would be useful for debugging this issue.

Thanks,
Rene

[1]
StartLog
04/29/21 13:01:35 (pid:4025) (D_ALWAYS|D_FAILURE) Starter pid 3337908
died on signal 11 (signal 11 (Segmentation fault))

[2]
#0  0x00007fa4d9d7f657 in kill () from /usr/lib64/libc.so.6
#1  0x00007fa4dc520e64 in unix_sig_coredump (signum=11,
s_info=<optimized out>) at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_core_main.cpp:1355
#2  <signal handler called>
#3  0x00007fa4db0a9802 in EVP_DigestFinal_ex () from
/usr/lib64/libcrypto.so.10
#4  0x00007fa4dc4aee1d in ReliSock::SndMsg::snd_packet
(this=this@entry=0x55caf89bd470, peer_description=0x55caf89bd35c
"<[2a00:139c:5:1dc:0:43:1:8c]:18245>", _sock=_sock@entry=19,
end=end@entry=1,
    _timeout=_timeout@entry=10) at
/usr/src/debug/condor-9.0.0/src/condor_io/reli_sock.cpp:1199
#5  0x00007fa4dc4af538 in ReliSock::end_of_message_internal
(this=this@entry=0x55caf89bd170) at
/usr/src/debug/condor-9.0.0/src/condor_io/reli_sock.cpp:564
#6  0x00007fa4dc4af5fc in ReliSock::end_of_message (this=0x55caf89bd170)
at /usr/src/debug/condor-9.0.0/src/condor_io/reli_sock.cpp:546
#7  0x00007fa4dc4753de in relisock_gsi_put (arg=0x55caf89bd170,
buf=0x55caf89f99a0, size=667) at
/usr/src/debug/condor-9.0.0/src/condor_io/cedar_no_ckpt.cpp:943
#8  0x00007fa4dc3c4fc3 in x509_receive_delegation
(destination_file=destination_file@entry=0x55caf89bdc40 "proxy.tmp",
    recv_data_func=recv_data_func@entry=0x7fa4dc4752c0
<relisock_gsi_get(void*, void**, unsigned long*)>,
recv_data_ptr=recv_data_ptr@entry=0x55caf89bd170,
    send_data_func=send_data_func@entry=0x7fa4dc4753b0
<relisock_gsi_put(void*, void*, unsigned long)>,
send_data_ptr=send_data_ptr@entry=0x55caf89bd170,
state_ptr=state_ptr@entry=0x7ffc6dbd1738)
    at /usr/src/debug/condor-9.0.0/src/condor_utils/globus_utils.cpp:1676
#9  0x00007fa4dc47673e in ReliSock::get_x509_delegation
(this=this@entry=0x55caf89bd170, destination=0x55caf89bdc40 "proxy.tmp",
flush_buffers=flush_buffers@entry=false, state_ptr=state_ptr@entry=0x0)
    at /usr/src/debug/condor-9.0.0/src/condor_io/cedar_no_ckpt.cpp:757
#10 0x000055caf678fe84 in updateX509Proxy (path=0x55caf8993ac5 "proxy",
rsock=0x55caf89bd170, cmd=500) at
/usr/src/debug/condor-9.0.0/src/condor_starter.V6.1/jic_shadow.cpp:1791
#11 JICShadow::updateX509Proxy (this=0x55caf898c4f0, cmd=500,
s=0x55caf89bd170) at
/usr/src/debug/condor-9.0.0/src/condor_starter.V6.1/jic_shadow.cpp:1896
#12 0x000055caf676e65b in Starter::updateX509Proxy (this=<optimized
out>, cmd=<optimized out>, s=<optimized out>) at
/usr/src/debug/condor-9.0.0/src/condor_starter.V6.1/starter.cpp:3723
#13 0x00007fa4dc50b46a in DaemonCore::CallCommandHandler
(this=0x55caf897d0e0, req=500, stream=0x55caf89bd170,
delete_stream=delete_stream@entry=false,
check_payload=check_payload@entry=true,
    time_spent_on_sec=0.000277000014,
time_spent_waiting_for_payload=time_spent_waiting_for_payload@entry=0)
at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_core.cpp:4468
#14 0x00007fa4dc4fa19a in DaemonCommandProtocol::ExecCommand
(this=0x55caf89b35f0) at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_command.cpp:1810
#15 0x00007fa4dc4fd385 in DaemonCommandProtocol::doProtocol
(this=this@entry=0x55caf89b35f0) at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_command.cpp:176
#16 0x00007fa4dc4fd485 in DaemonCommandProtocol::SocketCallback
(this=this@entry=0x55caf89b35f0, stream=0x55caf89bd170) at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_command.cpp:239
#17 0x00007fa4dc50c850 in DaemonCore::CallSocketHandler_worker
(this=0x55caf897d0e0, i=3, default_to_HandleCommand=<optimized out>,
asock=<optimized out>)
    at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_core.cpp:4235
#18 0x00007fa4dc50c8ed in
DaemonCore::CallSocketHandler_worker_demarshall (arg=0x55caf89b2d00) at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_core.cpp:4194
#19 0x00007fa4dc345dd5 in CondorThreads::pool_add
(routine=routine@entry=0x7fa4dc50c8d0
<DaemonCore::CallSocketHandler_worker_demarshall(void*)>,
arg=arg@entry=0x55caf89b2d00, tid=<optimized out>,
    descrip=<optimized out>) at
/usr/src/debug/condor-9.0.0/src/condor_utils/condor_threads.cpp:1109
#20 0x00007fa4dc508617 in DaemonCore::CallSocketHandler
(this=this@entry=0x55caf897d0e0, i=@0x7ffc6dbd1c60: 3,
default_to_HandleCommand=default_to_HandleCommand@entry=true)
    at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_core.cpp:4182
#21 0x00007fa4dc51130e in DaemonCore::Driver (this=0x55caf897d0e0) at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_core.cpp:4019
#22 0x00007fa4dc525d12 in dc_main (argc=2, argv=0x7ffc6dbd2550) at
/usr/src/debug/condor-9.0.0/src/condor_daemon_core.V6/daemon_core_main.cpp:4386
#23 0x00007fa4d9d6b555 in __libc_start_main () from /usr/lib64/libc.so.6
#24 0x000055caf676d961 in _start ()

-- 
Karlsruher Institut für Technologie (KIT)
Steinbuch Centre for Computing (SCC)

Dr. René Caspart

Hermann-von-Helmholtz-Platz 1 
76344 Eggenstein-Leopoldshafen, Germany
Phone: +49 721 608-25631
E-mail: Rene.Caspart@xxxxxxx


Registered office:
Kaiserstraße 12, 76131 Karlsruhe



KIT - The Research University in the Helmholtz Association

