Bug #6063

closed

ttcn3-cbc-test: osmo-cbc is crashing

Added by fixeria 11 months ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 06/17/2023
Due date:
% Done: 100%
Spec Reference:

Description

Recently, we have started seeing several regressions in ttcn3-cbc-test (master):

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-cbc-test/851/ (+8 failing testcases)

There is a core file in the artifacts:

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-cbc-test/851/artifact/logs/cbc/

#0  0x00007f2fcd7f7774 in __GI__IO_default_xsputn (f=0x7ffdc0da55e0, data=0x7f2fcd9164a0 <zeroes>, n=1) at genops.c:374
#1  0x00007f2fcd7eaa63 in __GI__IO_padn (fp=fp@entry=0x7ffdc0da55e0, pad=pad@entry=48, count=count@entry=1) at libioP.h:948
#2  0x00007f2fcd7e14e5 in pad_func (done=4, width=1, padchar=48 '0', s=0x7ffdc0da55e0) at vfprintf-internal.c:196
#3  __vfprintf_internal (s=s@entry=0x7ffdc0da55e0, format=format@entry=0x7f2fcda0b87a "%04d%02d%02d%02d%02d%02d%03d ", ap=ap@entry=0x7ffdc0da5760, 
    mode_flags=mode_flags@entry=2) at vfprintf-internal.c:1646
#4  0x00007f2fcd7f2606 in __vsnprintf_internal (string=0x7f2fcc99d1df "2023", maxlen=<optimized out>, format=0x7f2fcda0b87a "%04d%02d%02d%02d%02d%02d%03d ", 
    args=args@entry=0x7ffdc0da5760, mode_flags=2) at vsnprintf.c:114
#5  0x00007f2fcd882b31 in ___snprintf_chk (s=<optimized out>, maxlen=<optimized out>, flag=<optimized out>, slen=<optimized out>, format=<optimized out>)
    at snprintf_chk.c:38
#6  0x00007f2fcd9ee967 in ?? () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#7  0x00007f2fcd9eed4e in ?? () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#8  0x00007f2fcd9eeefe in osmo_vlogp () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#9  0x00007f2fcd9ef197 in logp2 () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#10 0x000055fa5ed08f0a in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:105
#11 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#12 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#13 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#14 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#15 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#16 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#17 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#18 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#19 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#20 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#21 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#22 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#23 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#24 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#25 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#26 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#27 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#28 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#29 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#30 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#31 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#32 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#33 0x00007f2fcd9b6a1e in osmo_stream_cli_reconnect () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
... (thousands more)

Looks like a bug/regression in libosmo-netif?

Actions #1

Updated by daniel 11 months ago

  • Status changed from New to In Progress
Actions #2

Updated by daniel 11 months ago

Looking at the code, there is one obvious issue and some not so obvious ones. Unfortunately, I can't tell from the backtrace where we enter the recursive loop, but something calls osmo_stream_cli_close(), which calls the disconnect_cb(), which calls osmo_stream_cli_reconnect(), which in turn first calls osmo_stream_cli_close(), and so on.

One obvious change is to set cli->state to CLOSED in osmo_stream_cli_close() before calling disconnect_cb(). Then the recursion breaks when the next call to osmo_stream_cli_close() returns early.
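
To illustrate the idea of marking the client as closed before running the callback, here is a minimal standalone sketch. It does not use the real libosmo-netif structures: struct cli, cli_close(), cli_reconnect() and app_disconnect_cb() are hypothetical stand-ins, and the two counters exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the stream client; NOT the libosmo-netif struct. */
enum cli_state { CLI_STATE_CLOSED, CLI_STATE_CONNECTED };

struct cli {
	enum cli_state state;
	void (*disconnect_cb)(struct cli *cli);
	unsigned int close_calls; /* how often cli_close() was entered */
	unsigned int cb_calls;    /* how often the disconnect callback ran */
};

static void cli_close(struct cli *cli);

/* Models a reconnect that closes the connection first. */
static void cli_reconnect(struct cli *cli)
{
	cli_close(cli);
	/* a real client would schedule the reconnect attempt here */
}

static void cli_close(struct cli *cli)
{
	assert(cli != NULL);
	cli->close_calls++;

	/* Already closed: return early, which is what stops the recursion. */
	if (cli->state == CLI_STATE_CLOSED)
		return;

	/* Update the state BEFORE invoking the callback, so that a nested
	 * cli_close() triggered from inside the callback hits the check above. */
	cli->state = CLI_STATE_CLOSED;
	if (cli->disconnect_cb)
		cli->disconnect_cb(cli);
}

/* Models an application callback that requests a reconnect. */
static void app_disconnect_cb(struct cli *cli)
{
	cli->cb_calls++;
	cli_reconnect(cli);
}
```

In this model, closing a connected client runs the callback exactly once. The nested cli_close() issued through cli_reconnect() returns early instead of calling the callback again.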

Another issue I have with the code is why the disconnect_cb even calls reconnect. osmo_stream_cli handles reconnecting for you.

Furthermore, osmo_stream_cli_destroy() also calls osmo_stream_cli_close(), but then goes on to free osmo_stream_cli. This means that we can't rely on the reconnect in the disconnect_cb() and would even leak the (osmo_)fd.

Actions #3

Updated by daniel 11 months ago

FYI this is the beginning of the backtrace:

#261832 0x000055fa5ed08fae in cbc_cbsp_link_cli_disconnect_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:107
#261833 0x00007f2fcd9b6939 in osmo_stream_cli_close () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#261834 0x00007f2fcd9b7bf2 in osmo_stream_cli_destroy () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#261835 0x000055fa5ed0a560 in cbc_cbsp_link_close (link=0x55fa5fcc3a40) at cbsp_link.c:365
#261836 0x000055fa5ed090d8 in cbc_cbsp_link_cli_read_cb (conn=0x55fa5fcc3de0) at cbsp_link.c:133
#261837 0x00007f2fcd9b7144 in ?? () from /usr/lib/x86_64-linux-gnu/libosmonetif.so.11
#261838 0x00007f2fcd9f8a58 in ?? () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#261839 0x00007f2fcd9f8b37 in osmo_select_main () from /usr/lib/x86_64-linux-gnu/libosmocore.so.20
#261840 0x000055fa5ed049fb in main (argc=3, argv=0x7ffdc15a35d8) at cbc_main.c:314

Actions #4

Updated by daniel 11 months ago

  • % Done changed from 0 to 30

Never mind. The osmo_stream_cli_reconnect() just schedules the timer which will be deleted in osmo_stream_cli_destroy() later.

So the one remaining question for me is what the reconnect behaviour of cbc_cbsp_link_cli should be. If it should reconnect, we shouldn't call osmo_stream_cli_destroy(), and if it shouldn't, we shouldn't call osmo_stream_cli_reconnect() in disconnect_cb().
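
The either/or nature of that decision can be sketched with a small model. This is only an illustration under assumed names: struct link_model, its fields and link_on_disconnect() are hypothetical and do not exist in osmo-cbc or libosmo-netif. The flags merely stand in for a scheduled reconnect timer and a freed stream client.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a link; names are illustrative only. */
struct link_model {
	bool want_reconnect;    /* policy chosen by the application */
	bool reconnect_pending; /* stands in for a scheduled reconnect timer */
	bool destroyed;         /* stands in for freeing the stream client */
};

/* Called when the connection is lost: takes exactly one of the two paths. */
static void link_on_disconnect(struct link_model *link)
{
	assert(link != NULL);
	if (link->want_reconnect) {
		/* keep the client around and retry later */
		link->reconnect_pending = true;
	} else {
		/* cancel any pending retry, then tear the client down */
		link->reconnect_pending = false;
		link->destroyed = true;
	}
}
```

The point of the model is that the two outcomes are mutually exclusive: a link never ends up both destroyed and waiting for a reconnect.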

Actions #5

Updated by daniel 11 months ago

https://gerrit.osmocom.org/c/libosmo-netif/+/33344

It would be great if someone else could test with this patch.
I didn't encounter the crash during my tests, but still had the +8 test failures.

Actions #6

Updated by daniel 11 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 30 to 100

fixeria reported that it worked fine on his setup. After merging, the tests went back to normal:
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-cbc-test/
