Bug #5354
ttcn3-bts-test: memleaks after running the test suite (closed)
% Done: 100%
Description
After running ttcn3-bts-test, I see the following chunks in the talloc report:
Chunk 'sched_lchan_xcch.c:82'

    $ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter sched_lchan_xcch.c" | wc -l
    427
    $ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter sched_lchan_xcch.c"
    full talloc report on 'OsmoBTS context' (total 4139819 bytes in 958 blocks)
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x616000023ae0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x616000025be0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x6160004a3ae0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x6160004babe0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x6160004a88e0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x6160000276e0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x61600002a6e0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x6160004f50e0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x61600047b2e0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x61600047c4e0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x6160004806e0
      sched_lchan_xcch.c:82 contains 464 bytes in 1 blocks (ref 0) 0x61600047eee0
      ...
Chunk 'cbch.c:201'

    $ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter cbch.c:201" | wc -l
    37
    $ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter cbch.c:201"
    full talloc report on 'OsmoBTS context' (total 4139819 bytes in 958 blocks)
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x611000014960
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x611000014820
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x6110000146e0
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x6110000145a0
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x611000014460
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x611000014320
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x6110000141e0
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x6110000140a0
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x611000013ba0
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x611000013920
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x6110000136a0
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x611000013560
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x611000013420
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x6110000131a0
      cbch.c:201 contains 112 bytes in 1 blocks (ref 0) 0x611000012f20
      ...
Chunk 'struct tlv_parsed'

    $ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter tlv_parsed" | wc -l
    40
    $ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter tlv_parsed"
      struct tlv_parsed contains 4106 bytes in 3 blocks (ref 0) 0x55d5028a6320
      struct tlv_parsed contains 4117 bytes in 7 blocks (ref 0) 0x55d5028a52b0
      struct tlv_parsed contains 4116 bytes in 4 blocks (ref 0) 0x55d5028a4240
      struct tlv_parsed contains 4098 bytes in 3 blocks (ref 0) 0x55d5028a31d0
      struct tlv_parsed contains 4098 bytes in 3 blocks (ref 0) 0x55d5028a2160
      struct tlv_parsed contains 4098 bytes in 3 blocks (ref 0) 0x55d5028a10f0
      struct tlv_parsed contains 4098 bytes in 3 blocks (ref 0) 0x55d5028a0080
      struct tlv_parsed contains 4098 bytes in 3 blocks (ref 0) 0x55d50289f010
      struct tlv_parsed contains 4098 bytes in 3 blocks (ref 0) 0x55d50289dfa0
      struct tlv_parsed contains 4098 bytes in 3 blocks (ref 0) 0x55d50289cf30
      ...
Updated by fixeria over 2 years ago
- Checklist item Chunk 'sched_lchan_xcch.c:82' added
- Checklist item Chunk 'cbch.c:201' added
- Checklist item Chunk 'struct tlv_parsed' added
Updated by fixeria over 2 years ago
- Checklist item Chunk 'struct tlv_parsed' set to Done
- Status changed from New to In Progress
- % Done changed from 0 to 30
This is most likely not a memory leak. Every time we receive an OML Set BTS/TRX/TS Attributes message, we merge the received TLVs into the stored ones using osmo_tlvp_copy() and osmo_tlvp_merge(). Each MO has its own 'struct tlv_parsed' chunk, and each attribute is a chunk of its own as well. I submitted several patches improving the readability of talloc reports:
https://gerrit.osmocom.org/c/osmo-bts/+/26525 oml: use proper talloc context in oml_rx_set_radio_attr()
https://gerrit.osmocom.org/c/osmo-bts/+/26526 oml: use ts->trx as talloc-context in oml_rx_set_chan_attr()
https://gerrit.osmocom.org/c/osmo-bts/+/26527 oml: fix copy-pasted comments in oml_rx_set_*_attr()
https://gerrit.osmocom.org/c/osmo-bts/+/26528 oml: assign unique names to 'struct tlv_parsed' chunks
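To illustrate why this is retained state rather than a leak: if each managed object owns one attribute table and every merge replaces an attribute's previous copy, the number of chunks is bounded by the set of attributes ever received, however often Set Attributes arrives. The following is a self-contained model of that idea using plain malloc() instead of talloc; the types and function names are hypothetical and are not libosmocore's struct tlv_parsed or osmo_tlvp_merge():

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a TLV store: one (len, value) slot per tag. */
struct attr_entry {
	uint16_t len;
	uint8_t *val;   /* heap copy owned by the store, NULL if absent */
};

struct attr_set {
	struct attr_entry lv[256];
};

static long live_allocs;  /* attribute values currently allocated */

static uint8_t *dup_value(const uint8_t *src, uint16_t len)
{
	uint8_t *p = malloc(len);
	assert(p != NULL);
	memcpy(p, src, len);
	live_allocs++;
	return p;
}

static void free_value(struct attr_entry *e)
{
	if (e->val != NULL) {
		free(e->val);
		live_allocs--;
	}
	e->val = NULL;
	e->len = 0;
}

/* Merge 'src' into 'dst': each attribute present in 'src' replaces the one
 * in 'dst', releasing the old copy first, so re-sending the same attributes
 * does not grow memory usage. */
static void attr_set_merge(struct attr_set *dst, const struct attr_set *src)
{
	for (size_t i = 0; i < 256; i++) {
		if (src->lv[i].len == 0)
			continue;
		free_value(&dst->lv[i]);
		dst->lv[i].len = src->lv[i].len;
		dst->lv[i].val = dup_value(src->lv[i].val, src->lv[i].len);
	}
}

/* Release all values, e.g. when the managed object goes away. */
static void attr_set_clear(struct attr_set *set)
{
	for (size_t i = 0; i < 256; i++)
		free_value(&set->lv[i]);
}
```

Assuming, as in libosmocore, that the table holds 256 entries of a 16-bit length plus a pointer, it occupies 256 × 16 = 4096 bytes on a typical 64-bit ABI. That would be consistent with the roughly 4.1 KB 'struct tlv_parsed' chunks in the report, with the remaining few bytes living in the per-attribute child chunks.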
Updated by fixeria over 2 years ago
- Checklist item Chunk 'sched_lchan_xcch.c:82' set to Done
- % Done changed from 30 to 60
This memleak was introduced quite a while ago:

    commit 7c87612b4219bb236c5d74ca2988443bfb1929c6
    Author: Philipp Maier <pmaier@sysmocom.de>
    Date:   Sat Nov 14 22:32:29 2020 +0100

        l1sap: add repeated uplink SACCH
Should be fixed by these patches:
https://gerrit.osmocom.org/c/osmo-bts/+/26531 osmo-bts-trx: use l1ts as talloc context for burst buffers
https://gerrit.osmocom.org/c/osmo-bts/+/26532 osmo-bts-trx: fix a memleak in trx_sched_set_lchan()
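The 464-byte chunk size matches 4 bursts × 116 soft bits (one byte each), i.e. presumably one xCCH uplink block buffer. The sketch below models the class of bug addressed here: a lazily allocated per-lchan burst buffer that deactivation forgets to release. It is a plain-C model with hypothetical field names (ul_bursts, ul_bursts_prev), not the osmo-bts-trx code. With talloc, allocating such buffers under the timeslot context, as in 26531, would additionally tie their lifetime to the timeslot.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* 4 bursts x 116 soft bits (one int8_t each) = 464 bytes */
#define XCCH_BURSTS      4
#define BURST_SOFT_BITS  116
#define XCCH_BUF_LEN     (XCCH_BURSTS * BURST_SOFT_BITS)

static long live_bufs;  /* burst buffers currently allocated */

/* Hypothetical per-lchan scheduler state; field names are illustrative. */
struct lchan_state {
	int active;
	int8_t *ul_bursts;      /* soft bits of the block being received */
	int8_t *ul_bursts_prev; /* previous SACCH block, kept for repetition */
};

/* Lazily allocate a buffer on first use (e.g. on the first received burst). */
static int8_t *get_buf(int8_t **buf)
{
	if (*buf == NULL) {
		*buf = calloc(XCCH_BUF_LEN, 1);
		assert(*buf != NULL);
		live_bufs++;
	}
	return *buf;
}

static void put_buf(int8_t **buf)
{
	if (*buf != NULL) {
		free(*buf);
		*buf = NULL;
		live_bufs--;
	}
}

/* Deactivation must release *every* lazily allocated buffer; forgetting the
 * one added for repeated SACCH would leave a 464-byte chunk behind after
 * each activation in which it was used. */
static void lchan_set_active(struct lchan_state *st, int active)
{
	if (!active) {
		put_buf(&st->ul_bursts);
		put_buf(&st->ul_bursts_prev);
	}
	st->active = active;
}
```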
Updated by fixeria over 2 years ago
- Status changed from In Progress to Stalled
- Assignee changed from fixeria to laforge
laforge: AFAIR, you recently fixed some CBCH-related memleaks already. It would be good if you could (when you have time) take a look at the 'cbch.c:201' chunk. I checked bts_process_smscb_cmd() myself and could not find anything suspicious. Perhaps the messages somehow remain in the queue even when the A-bis connection is lost?
This leak can be reproduced by running BTS_Tests_SMSCB.control.
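If the hypothesis above holds, the usual remedy is to drain the queue when the link is lost. A minimal sketch under that assumption; the types and function names are hypothetical and do not claim to reflect what the eventual fix changed in cbch.c:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified SMSCB message queue (singly linked FIFO). */
struct smscb_msg {
	struct smscb_msg *next;
	unsigned char data[88];  /* one CBS page is 88 octets */
};

struct smscb_queue {
	struct smscb_msg *head, *tail;
	unsigned int len;
};

static void smscb_enqueue(struct smscb_queue *q, struct smscb_msg *msg)
{
	assert(q != NULL && msg != NULL);
	msg->next = NULL;
	if (q->tail)
		q->tail->next = msg;
	else
		q->head = msg;
	q->tail = msg;
	q->len++;
}

/* Free every queued message, e.g. when the A-bis link goes down, so that
 * nothing survives into the next connection or test case. */
static void smscb_queue_flush(struct smscb_queue *q)
{
	struct smscb_msg *msg = q->head;

	while (msg) {
		struct smscb_msg *next = msg->next;
		free(msg);
		msg = next;
	}
	q->head = q->tail = NULL;
	q->len = 0;
}
```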
Updated by laforge over 2 years ago
I can confirm the problem exists; I can reproduce it locally now.
However, a brief code review hasn't shown me how or where we leak it.
Will have to revisit this again at a later point.
Updated by laforge over 1 year ago
- Assignee changed from laforge to arehbein
If this can still be reproduced, this might be something for @arehbein to add to his backlog.
So the first course of action is to see if the osmo-bsc still has all those "cbch.c" allocations after the ttcn3-bsc-tests have completed. If yes, we're still leaking memory and the code in osmo-bsc needs some more investigation.
Updated by fixeria over 1 year ago
laforge wrote in #note-6:
If this can still be reproduced, this might be something for @arehbein to add to his backlog.
So the first course of action is to see if the osmo-bsc still has all those "cbch.c" allocations after the ttcn3-bsc-tests have completed. If yes, we're still leaking memory and the code in osmo-bsc needs some more investigation.
FYI: not sure if it was implemented back when I reported this issue, but as of now the testsuite generates a talloc report for each testcase. These reports can be found in the "Build Artifacts" in Jenkins (last build: https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/lastBuild/artifact/). Currently BTS_Tests_SMSCB.TC_etws_pcu is executed last, so looking at https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/1831/artifact/logs/bts-tester-generic/BTS_Tests_SMSCB.TC_etws_pcu.talloc/*view*/ (the last build at the time of writing), I still see the above-mentioned memleaks. However, they now show up as "struct smscb_msg" chunks, which still correspond to https://cgit.osmocom.org/osmo-bts/tree/src/common/cbch.c#n201. The problem is still present.
Updated by arehbein about 1 year ago
- Checklist item Chunk 'cbch.c:201' set to Done
Updated by arehbein about 1 year ago
- Status changed from Stalled to In Progress
- % Done changed from 60 to 90
The last memleak should also be fixed now, see https://gerrit.osmocom.org/c/osmo-bts/+/31155
The build for patchsets 1 to 4 failed on some of the jobs due to a compiler error:

    cbch.c: In function ‘get_smscb_block’:
    cbch.c:143:5: error: suggest explicit braces to avoid ambiguous ‘else’ [-Werror=dangling-else]
        if (block_type->lb)
        ^
I added the braces to fix this.
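For reference, GCC's -Wdangling-else (enabled by -Wparentheses, which -Wall turns on) fires when an else follows nested if statements without braces, because the else always binds to the innermost if, whatever the indentation suggests. The functions below are a hypothetical illustration of the two readings, not the actual get_smscb_block() code; the unbraced form is kept in a comment so the block itself compiles cleanly with -Werror:

```c
#include <assert.h>

/*
 * Ambiguous form (triggers -Wdangling-else):
 *
 *     if (have_block)
 *         if (is_last)
 *             ret = 2;
 *     else                  <- pairs with 'if (is_last)', not 'if (have_block)'
 *         ret = 0;
 */

/* What the unbraced code actually means to the compiler. */
static int as_compiled(int have_block, int is_last)
{
	int ret = -1;

	if (have_block) {
		if (is_last)
			ret = 2;
		else
			ret = 0;
	}
	return ret;
}

/* What the indentation suggests was intended: braces make it explicit. */
static int intended(int have_block, int is_last)
{
	int ret = -1;

	if (have_block) {
		if (is_last)
			ret = 2;
	} else {
		ret = 0;
	}
	return ret;
}
```

The two readings differ whenever exactly one of the conditions is false, which is why the warning is worth treating as an error.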
Should (and can) we adapt the compilation flags so that all builds behave the same w.r.t. this warning?
The patch should compile, but there appears to be a dependency issue on the rpi4-raspbian11-build image (was it added recently, laforge? I'm not sure who usually works on those build images or who worked on this one):
configure: error: DAHDI input driver enabled but DAHDI not found
Updated by laforge about 1 year ago
arehbein wrote in #note-9:
The patch should compile, but it appears there is some dependency issue on the rpi4-raspbian11-build image (was it recently added laforge? Not sure who usually works on those build images or who worked on this one):
[...]
I've added osmith as watcher to this issue. He is the de-facto maintainer of all of our CI infrastructure. libosmo-abis should always have been built with DAHDI support, so I'm surprised to see this build failure pop up now.
Updated by arehbein about 1 year ago
- Status changed from In Progress to Feedback
Updated by fixeria about 1 year ago
- Related to Bug #5893: debian-buster-jenkins-arm image runs out of date jenkins_bts_model.sh added
Updated by fixeria about 1 year ago
Rebasing https://gerrit.osmocom.org/c/osmo-bts/+/31155 on top of the current master made it pass.
TL;DR (see #5893): your patch was based on a commit older than https://gerrit.osmocom.org/c/osmo-bts/+/31012.
Updated by arehbein about 1 year ago
- Status changed from Feedback to Resolved
- % Done changed from 90 to 100