Bug #5337
closed
ttcn3-bsc-test: leaked struct bsc_subscr in BSC_Tests.TC_no_msc
100%
Description
After running ttcn3-bsc-test (actually a few hours later), I see some ghost subscribers:
OsmoBSC# show subscriber all
IMSI            TMSI     Use
001019876543210 ffffffff 3 (3*paging-start)
001010000100001 ffffffff 1 (conn)
Here are the relevant talloc chunks:
$ osmo_interact_vty.py -H 127.0.0.1 -p 4242 -c "en; show talloc-context application full filter subscr"
full talloc report on 'osmo-bsc' (total 1068914 bytes in 898 blocks)
SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000100001)[0x562b2895a4a0] contains 86 bytes in 1 blocks (ref 0) 0x562b2897f090
msc4294967295-conn4294967295_subscr-IMSI-001010000100001 contains 57 bytes in 1 blocks (ref 0) 0x562b28978a00
struct gsm_subscriber_connection contains 6776 bytes in 1 blocks (ref 0) 0x562b289656a0
struct bsc_subscr contains 152 bytes in 3 blocks (ref 0) 0x562b28970aa0
struct osmo_use_count_entry contains 40 bytes in 1 blocks (ref 0) 0x562b289712f0
struct osmo_use_count_entry contains 40 bytes in 1 blocks (ref 0) 0x562b2893c3b0
struct bsc_subscr contains 232 bytes in 5 blocks (ref 0) 0x562b28973ba0
struct osmo_use_count_entry contains 40 bytes in 1 blocks (ref 0) 0x562b28975690
struct osmo_use_count_entry contains 40 bytes in 1 blocks (ref 0) 0x562b28981fb0
struct osmo_use_count_entry contains 40 bytes in 1 blocks (ref 0) 0x562b2897d850
struct osmo_use_count_entry contains 40 bytes in 1 blocks (ref 0) 0x562b2897f3d0
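For anyone who wants to tally such a report automatically, here is a minimal Python sketch. It assumes only the "struct NAME contains N bytes in M blocks" entry format visible in the report above; the function name count_talloc_chunks() is hypothetical and not part of any Osmocom tool.

```python
import re
from collections import Counter

# Matches entries such as "struct bsc_subscr contains 152 bytes in 3 blocks".
# The pattern is an assumption derived from the report text shown in this ticket.
CHUNK_RE = re.compile(r"(struct \w+) contains \d+ bytes in \d+ blocks")


def count_talloc_chunks(report: str) -> Counter:
    """Count how often each 'struct ...' chunk name appears in a talloc report."""
    return Counter(match.group(1) for match in CHUNK_RE.finditer(report))
```

Because the pattern is searched over the whole text rather than line by line, it also works when the report has been pasted as a single line.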
Updated by fixeria over 2 years ago
- Status changed from New to In Progress
- Priority changed from Normal to Low
- % Done changed from 0 to 10
The following leak:
IMSI            TMSI     Use
001010000100001 ffffffff 1 (conn)
can be reproduced by running BSC_Tests.TC_no_msc.
Updated by fixeria over 2 years ago
- Related to Bug #4832: osmo-bsc hard-releases lchan if no MSC is found added
Updated by fixeria over 2 years ago
fixeria wrote in #note-2:
The following leak:
[...]
can be reproduced by running BSC_Tests.TC_no_msc.
So in gsm_08_08.c/bsc_compl_l3() we allocate:
- a 'struct bsc_subscr' with IMSI=001010000100001, and
- a 'struct gsm_subscriber_connection' for the allocated subscriber.
I was interested to see if the new connection can be listed using the 'show conns' command, and boom!
bsc_vty.c:725:2: runtime error: member access within null pointer of type 'struct bsc_msc_data'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==711471==ERROR: AddressSanitizer: SEGV on unknown address 0x00000000003c (pc 0x55c82025b5b6 bp 0x7ffc86eacc50 sp 0x7ffc86eacc20 T0)
==711471==The signal is caused by a READ memory access.
==711471==Hint: address points to the zero page.
    #0 0x55c82025b5b6 in dump_one_subscr_conn /home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/bsc_vty.c:725
    #1 0x55c82025bf31 in show_subscr_conn /home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/bsc_vty.c:757
    #2 0x7fca696030d2 in cmd_execute_command_real ../../../../src/libosmocore/src/vty/command.c:2604
    #3 0x7fca69606448 in vty_command ../../../../src/libosmocore/src/vty/vty.c:464
    #4 0x7fca69606448 in vty_execute ../../../../src/libosmocore/src/vty/vty.c:729
    #5 0x7fca69606448 in vty_read ../../../../src/libosmocore/src/vty/vty.c:1471
    #6 0x7fca69608e6d in client_data ../../../../src/libosmocore/src/vty/telnet_interface.c:154
    #7 0x7fca695c9907 in poll_disp_fds ../../../src/libosmocore/src/select.c:361
    #8 0x7fca695c9907 in _osmo_select_main ../../../src/libosmocore/src/select.c:393
    #9 0x7fca695c9a0e in osmo_select_main_ctx ../../../src/libosmocore/src/select.c:449
    #10 0x55c8201250e3 in main /home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/osmo_bsc_main.c:1087
    #11 0x7fca68998b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
    #12 0x55c82011b12d in _start (/home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/osmo-bsc+0x74512d)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/wmn/wmn/osmocom/osmo-bsc/src/osmo-bsc/bsc_vty.c:725 in dump_one_subscr_conn
==711471==ABORTING
Updated by fixeria over 2 years ago
- % Done changed from 10 to 20
The following leak:
IMSI            TMSI     Use
001019876543210 ffffffff 3 (3*paging-start)
can be reproduced by running:
- BSC_Tests.TC_lcs_loc_req_for_active_ms_ta_req,
- BSC_Tests.TC_lcs_loc_req_for_active_ms_le_timeout2,
- BSC_Tests.TC_cm_service_during_lcs_loc_req.
In order to identify them, I hacked BSC_Tests.f_gen_test_hdlr_pars() to randomize the IMSI for each test case, because all test cases use the same IMSI '001019876543210'H by default. It might be a good idea to get this patch merged:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/26506 BSC_Tests: ramdomize IMSI in f_gen_test_hdlr_pars() [NEW]
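The idea behind per-test IMSI randomization can be illustrated with a short Python sketch. The real change is TTCN-3 code in f_gen_test_hdlr_pars() and may work differently; the function name gen_random_imsi(), the fixed "00101" prefix and the 15-digit length are assumptions taken from the IMSIs seen in this ticket.

```python
import random
from typing import Optional


def gen_random_imsi(prefix: str = "00101", total_len: int = 15,
                    rng: Optional[random.Random] = None) -> str:
    """Return an IMSI that starts with `prefix` and is padded with random digits."""
    if rng is None:
        rng = random.Random()
    suffix_len = total_len - len(prefix)
    if suffix_len < 0:
        raise ValueError("prefix is longer than the requested IMSI length")
    # Each test case gets its own digits, so a leaked subscriber can be
    # traced back to the test case that created it.
    suffix = "".join(rng.choice("0123456789") for _ in range(suffix_len))
    return prefix + suffix
```

Passing a seeded random.Random instance makes the generated IMSIs reproducible between runs, which may help when re-running a single failing test case.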
Updated by fixeria over 2 years ago
- Status changed from In Progress to New
- Assignee changed from fixeria to neels
I don't feel competent enough to fix this myself, so handing this ticket over to neels.
Updated by fixeria over 2 years ago
Looks like we have even more leaks.
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/26506 BSC_Tests: ramdomize IMSI in f_gen_test_hdlr_pars() [NEW]
With this patch applied I am getting the following output:
OsmoBSC# show subscriber all
IMSI            TMSI     Use
001018839845904 ffffffff 1 (paging-start) # TC_lcs_loc_req_for_active_ms_ta_req
001015247946574 ffffffff 1 (paging-start) # TC_lcs_loc_req_for_active_ms_le_timeout2
001019330051280 ffffffff 1 (paging-start) # TC_lcs_loc_req_for_active_ms_ta_req
001019060050196 ffffffff 1 (paging-start) # TC_lcs_loc_req_for_active_ms_le_timeout2
001017749471063 ffffffff 1 (paging-start) # TC_cm_service_during_lcs_loc_req
001010000100001 ffffffff 1 (conn) # TC_no_msc
So both TC_lcs_loc_req_for_active_ms_{ta_req,le_timeout2} trigger two subscriber leaks each.
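Grouping the leftover subscribers by their use token can be scripted. Below is a minimal Python sketch that assumes the row layout seen in the 'show subscriber all' output in this ticket (15-digit IMSI, 8 hex digit TMSI, use count, tokens in parentheses); parse_subscribers() and group_by_use_token() are hypothetical helper names.

```python
import re
from typing import Dict, List, NamedTuple


class SubscrRow(NamedTuple):
    imsi: str
    tmsi: str
    use_count: int
    use_tokens: str


# One row looks like: "001010000100001 ffffffff 1 (conn)"
ROW_RE = re.compile(r"\b(\d{15})\s+([0-9a-fA-F]{8})\s+(\d+)\s+\(([^)]*)\)")


def parse_subscribers(vty_output: str) -> List[SubscrRow]:
    """Extract all subscriber rows from 'show subscriber all' output."""
    return [SubscrRow(m.group(1), m.group(2), int(m.group(3)), m.group(4))
            for m in ROW_RE.finditer(vty_output)]


def group_by_use_token(rows: List[SubscrRow]) -> Dict[str, List[str]]:
    """Map each use-token string (e.g. 'conn') to the IMSIs that hold it."""
    groups: Dict[str, List[str]] = {}
    for row in rows:
        groups.setdefault(row.use_tokens, []).append(row.imsi)
    return groups
```

Any trailing "# TC_..." annotation on a row is simply ignored by the pattern, so annotated and raw VTY output can be parsed the same way.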
Updated by neels over 2 years ago
- Subject changed from ttcn3-bsc-test: leaked struct bsc_subscr to ttcn3-bsc-test: leaked struct bsc_subscr in BSC_Tests.TC_no_msc
splitting up reported leaks into separate issues
Updated by fixeria about 2 years ago
- Status changed from New to Feedback
- Priority changed from Low to Normal
Adding a quick status update here:
- We started to check the IUT's talloc report in ttcn3-bsc-test (see f_verify_talloc_count()):
  - https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/26619 bsc: detect subscr and conn leaks during f_shutdown_helper()
- osmo-bsc master has been fixed by Neels, so everything is green now
This is nice, but (as expected) we started to see regressions in several Jenkins jobs:
- https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-latest/ (+25 since build 1210)
- https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-sccplite-latest/ (+16 since build 1103)
- https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-vamos/ (+4 since build 245)
- The respective CentOS jobs are affected too, as well as 2021q1 and 2021q4
As a quick solution, we can disable memleak checking for everything except '-master'. I prepared a change for osmo-ttcn3-hacks:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/27073 BSC_Tests: add module parameter 'mp_verify_talloc_count' [NEW]
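To illustrate what such a switch means in practice, here is a minimal Python sketch of a leak check that can be turned off. It assumes the check boils down to counting chunk names in the talloc report and comparing against an expected number; the real f_verify_talloc_count() is TTCN-3 and may differ. verify_talloc_count() and its `enabled` flag (standing in for 'mp_verify_talloc_count') are hypothetical.

```python
def verify_talloc_count(report: str, chunk_name: str,
                        expected: int = 0, enabled: bool = True) -> bool:
    """Return True if the leak check passes, or if checking is disabled."""
    if not enabled:
        # Older binaries without the memleak fixes would fail here,
        # so the check is skipped for them.
        return True
    # Each chunk appears in the report as "<name> contains N bytes in M blocks".
    found = report.count(chunk_name + " contains ")
    return found == expected
```

With `enabled=False` the function always passes, which is the behaviour one would want for '-latest' until a release with the fixes is tagged.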
However I think a proper solution would be back-porting the patches from master. laforge, neels what do you think?
As this problem was brought up several times during the weekly review, I am setting the priority to Normal.
Updated by fixeria about 2 years ago
- Related to Bug #5355: ttcn3-bsc-test: leaked struct bsc_subscr in LCS tests added
Updated by laforge about 2 years ago
On Sat, Feb 05, 2022 at 04:08:50PM +0000, redmine@osmocom.org wrote:
As a quick solution, we can disable memleak checking for everything except '-master'. I prepared a change for osmo-ttcn3-hacks:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/27073 BSC_Tests: add module parameter 'mp_verify_talloc_count' [NEW]
However I think a proper solution would be back-porting the patches from master. laforge, neels what do you think?
The question is how many those are and how much risk we think those patches pose.
Updated by fixeria about 2 years ago
- Related to Bug #5444: ttcn3-bsc-test-vamos: leaked 'struct bsc_subscr' added
Updated by neels about 2 years ago
- Related to Feature #5446: correlate git version to ttcn3 tests added
Updated by neels about 2 years ago
However I think a proper solution would be back-porting the patches from master. laforge, neels what do you think?
My opinion on this is found here: https://osmocom.org/issues/5446
pasting:
This happens a lot: we improve ttcn3 testing and enhance osmo-foo master, and then
obviously the latest binaries cannot possibly pass the tests. We invent and manage
shims in jenkins.sh and ttcn3 config to not do something or other on latest.
A way to not have this burden would be that the ttcn3 test suite for a program
is correlated to the git version being tested, for example if the ttcn3 is kept
in the same git tree as the program. We should use the 'latest' version of
ttcn3 tests for a latest binary. With an implicit correlation, we can always
use exactly the tests that match the specific git revision that was built, no
matter if it was released or we're just rebasing a local branch.
I think in the long run it would save us a lot of grunt work, and it would
definitely save a lot of code cruft to make specific parts of the tests
optional.
Updated by fixeria about 2 years ago
- Related to Feature #2781: Extend OsmBSC TTCN-3 test coverage regarding resource leaks added
Updated by neels over 1 year ago
- Assignee changed from neels to fixeria
re-reading this issue that is assigned to me, i find that i'm not sure of the status.
Vadim, do you recall more about this issue?
Updated by fixeria over 1 year ago
- Status changed from Feedback to Resolved
- % Done changed from 20 to 100
fixeria wrote in #note-9:
This is nice, but (as expected) we started to see regressions in several Jenkins jobs:
- https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-latest/ (+25 since build 1210)
- https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-sccplite-latest/ (+16 since build 1103)
- https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-vamos/ (+4 since build 245)
osmo-bsc v1.9.0 was tagged 8 weeks ago, so -latest already contains the memleak fixes. No regressions seen anymore.
- The respective CentOS jobs are affected too, as well as 2021q1 and 2021q4
We stopped running TTCN-3 tests for 2021q1 and 2021q4; 2022q2 is not affected.