Bug #6063
ttcn3-cbc-test: osmo-cbc is crashing (closed)
100%
Description
Recently, we've been seeing several regressions in ttcn3-cbc-test (master):
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-cbc-test/851/ (+8 failing testcases)
There is a core file in the artifacts:
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-cbc-test/851/artifact/logs/cbc/
#0 0x00007f2fcd7f7774 in __GI__IO_default_xsputn (f=0x7ffdc0da55e0, data=0x7f2fcd9164a0 <zeroes>, n=1) at genops.c:374
#1 0x00007f2fcd7eaa63 in __GI__IO_padn (fp=fp@entry=0x7ffdc0da55e0, pad=pad@entry=48, count=count@entry=1) at libioP.h:948
#2 0x00007f2fcd7e14e5 in pad_func (done=4, width=1, padchar=48 '0', s=0x7ffdc0da55e0) at vfprintf-internal.c:196
#3 __vfprintf_internal (s=s@entry=0x7ffdc0da55e0, format=format@entry=0x7f2fcda0b87a "%04d%02d%02d%02d%02d%02d%03d ", ap=ap@entry=0x7ffdc0da5760, mode_flags=mode_flags@entry=2) at vfprintf-internal.c:1646
#4 0x00007f2fcd7f2606 in __vsnprintf_internal (string=0x7f2fcc99d1df "2023", maxlen=<optimized out>, format=0x7f2fcda0b87a "%04d%02d%02d%02d%02d%02d%03d ", args=args@entry=0x7ffdc0da5760, mode_flags=2) at vsnprintf.c:114
#5 0x00007f2fcd882b31 in ___snprintf_chk (s=<optimized out>, maxlen=<optimized out>, flag=<optimized out>, slen=<optimized out>, format=<optimized out>) at snprintf_chk.c:38
#6 0x00007f2fcd9ee967 in ?? () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#7 0x00007f2fcd9eed4e in ?? () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#8 0x00007f2fcd9eeefe in osmo_vlogp () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#9 0x00007f2fcd9ef197 in logp2 () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#10 0x000055fa5ed08f0a in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:105
#11 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#12 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#13 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#14 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#15 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#16 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#17 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#18 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#19 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#20 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#21 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#22 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#23 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#24 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#25 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#26 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#27 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#28 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#29 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#30 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#31 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#32 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#33 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
... (thousands more)
Looks like a bug/regression in libosmo-netif?
Updated by daniel 11 months ago
Looking at the code, there is one obvious issue and some not so obvious ones. Unfortunately I can't tell from the backtrace where we enter the recursive loop, but something calls osmo_stream_cli_close(), which calls the disconnect_cb(), which calls osmo_stream_cli_reconnect(), which first calls osmo_stream_cli_close() again, and so on.
One obvious change is to set cli->state to CLOSED in osmo_stream_cli_close() before calling disconnect_cb(). The recursion then ends because the next call to osmo_stream_cli_close() returns early.
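To illustrate the idea, here is a minimal, self-contained sketch of that guard. It is not the libosmo-netif code: struct cli, cli_close(), cli_reconnect() and the call counters are hypothetical stand-ins that only mimic the close -> disconnect_cb -> reconnect -> close chain seen in the backtrace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the stream client; NOT the libosmo-netif struct. */
enum cli_state { CLI_STATE_CLOSED, CLI_STATE_CONNECTED };

struct cli {
	enum cli_state state;
	void (*disconnect_cb)(struct cli *cli);
	unsigned int close_calls;	/* how often cli_close() was entered */
	unsigned int cb_calls;		/* how often disconnect_cb was invoked */
};

void cli_close(struct cli *cli);

/* Mimics the pattern in the backtrace: reconnect first closes the link. */
void cli_reconnect(struct cli *cli)
{
	cli_close(cli);
	/* a real client would schedule its reconnect timer here */
}

void cli_close(struct cli *cli)
{
	assert(cli != NULL);
	cli->close_calls++;

	if (cli->state == CLI_STATE_CLOSED)
		return;	/* already closed: this early return breaks the recursion */

	/* Mark the client closed BEFORE notifying the user, so that a
	 * re-entrant cli_close() from inside the callback is a no-op. */
	cli->state = CLI_STATE_CLOSED;

	if (cli->disconnect_cb) {
		cli->cb_calls++;
		cli->disconnect_cb(cli);
	}
}

/* User callback that, like cbsp_link.c:107 in the trace, asks for a reconnect. */
void disconnect_cb(struct cli *cli)
{
	cli_reconnect(cli);
}

/* Runs the scenario once and returns how many times the callback fired. */
unsigned int run_scenario(void)
{
	struct cli cli = {
		.state = CLI_STATE_CONNECTED,
		.disconnect_cb = disconnect_cb,
	};

	cli_close(&cli);
	return cli.cb_calls;
}
```

With the state assignment placed before the callback, cli_close() is entered twice and the callback fires once. If the assignment came after the callback instead, the same scenario would recurse until the stack overflows, which matches the thousands of repeated frames in the core file.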
Another issue I have with the code is why the disconnect_cb even calls reconnect. osmo_stream_cli handles reconnecting for you.
Furthermore, osmo_stream_cli_destroy() also calls osmo_stream_cli_close(), but then goes on to free osmo_stream_cli. This means that we can't rely on the reconnect in the disconnect_cb() and would even leak the (osmo_)fd.
Updated by daniel 11 months ago
FYI this is the beginning of the backtrace:
#261832 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#261833 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#261834 0x00007f2fcd9b7bf2 in osmo_stream_cli_destroy () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#261835 0x000055fa5ed0a560 in cbc_cbsp_link_close (link=0x55fa5fcc3a40) at cbsp_link.c:365
#261836 0x000055fa5ed090d8 in cbc_cbsp_link_cli_read_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:133
#261837 0x00007f2fcd9b7144 in ?? () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#261838 0x00007f2fcd9f8a58 in ?? () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#261839 0x00007f2fcd9f8b37 in osmo_select_main () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#261840 0x000055fa5ed049fb in main (argc=3, argv=0x7ffdc15a35d8) at cbc_main.c:314
Updated by daniel 11 months ago
- % Done changed from 0 to 30
Never mind. The osmo_stream_cli_reconnect() just schedules the timer which will be deleted in osmo_stream_cli_destroy() later.
So the one remaining question for me is what the reconnect behaviour of cbc_cbsp_link_cli should be. If it should reconnect, we shouldn't call osmo_stream_cli_destroy(); if it shouldn't, we shouldn't call osmo_stream_cli_reconnect() in disconnect_cb().
Updated by daniel 11 months ago
https://gerrit.osmocom.org/c/libosmo-netif/+/33344
It would be great if someone else could test with this patch.
I didn't encounter the crash during my tests, but still had the +8 test failures.