Bug #5354

closed

ttcn3-bts-test: memleaks after running the test suite

Added by fixeria over 2 years ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee: arehbein
Category: -
Target version: -
Start date: 12/12/2021
Due date:
% Done: 100%
Spec Reference:

Description

After running ttcn3-bts-test, I see the following chunks in the talloc report:

Chunk 'sched_lchan_xcch.c:82'

$ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter sched_lchan_xcch.c" | wc -l
427
$ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter sched_lchan_xcch.c" 
full talloc report on 'OsmoBTS context' (total 4139819 bytes in 958 blocks)
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x616000023ae0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x616000025be0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x6160004a3ae0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x6160004babe0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x6160004a88e0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x6160000276e0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x61600002a6e0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x6160004f50e0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x61600047b2e0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x61600047c4e0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x6160004806e0
  sched_lchan_xcch.c:82          contains    464 bytes in   1 blocks (ref 0) 0x61600047eee0
...

Chunk 'cbch.c:201'

$ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter cbch.c:201" | wc -l
37
$ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter cbch.c:201"
full talloc report on 'OsmoBTS context' (total 4139819 bytes in 958 blocks)
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000014960
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000014820
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x6110000146e0
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x6110000145a0
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000014460
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000014320
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x6110000141e0
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x6110000140a0
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000013ba0
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000013920
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x6110000136a0
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000013560
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000013420
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x6110000131a0
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000012f20
...

Chunk 'struct tlv_parsed'

$ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter tlv_parsed" | wc -l
40
$ osmo_interact_vty.py -H 127.0.0.1 -p 4241 -c "en; show talloc-context application full filter tlv_parsed" 
    struct tlv_parsed              contains   4106 bytes in   3 blocks (ref 0) 0x55d5028a6320
    struct tlv_parsed              contains   4117 bytes in   7 blocks (ref 0) 0x55d5028a52b0
    struct tlv_parsed              contains   4116 bytes in   4 blocks (ref 0) 0x55d5028a4240
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x55d5028a31d0
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x55d5028a2160
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x55d5028a10f0
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x55d5028a0080
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x55d50289f010
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x55d50289dfa0
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x55d50289cf30
...
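
The `| wc -l` counts above can also be obtained per chunk name from a single unfiltered report. The following is a minimal sketch, assuming the report has been saved as text and that every chunk line follows the `<name> contains <N> bytes in <M> blocks` layout shown above; the function name `summarize_talloc_report` is made up for illustration and is not part of any Osmocom tool.

```python
import re
from collections import Counter

# Matches chunk lines such as:
#   cbch.c:201    contains    112 bytes in   1 blocks (ref 0) 0x611000014960
# The report header contains no " contains " and is therefore skipped.
CHUNK_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+contains\s+(?P<bytes>\d+) bytes in\s+(?P<blocks>\d+) blocks"
)


def summarize_talloc_report(text):
    """Return {chunk name: (number of chunks, total bytes)} for a talloc report."""
    counts = Counter()
    sizes = Counter()
    for line in text.splitlines():
        m = CHUNK_RE.match(line)
        if m is None:
            continue  # header, "..." or blank line
        counts[m["name"]] += 1
        sizes[m["name"]] += int(m["bytes"])
    return {name: (counts[name], sizes[name]) for name in counts}


# A few lines copied from the reports above.
sample = """\
full talloc report on 'OsmoBTS context' (total 4139819 bytes in 958 blocks)
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000014960
    cbch.c:201                     contains    112 bytes in   1 blocks (ref 0) 0x611000014820
    struct tlv_parsed              contains   4106 bytes in   3 blocks (ref 0) 0x55d5028a6320
"""

for name, (count, total) in summarize_talloc_report(sample).items():
    print(f"{name}: {count} chunks, {total} bytes")
```

Sorting the result by chunk count makes the names that dominate the report stand out quickly.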

Checklist

  • Chunk 'sched_lchan_xcch.c:82'
  • Chunk 'cbch.c:201'
  • Chunk 'struct tlv_parsed'

Related issues

Related to Core testing infrastructure - Bug #5893: debian-buster-jenkins-arm image runs out of date jenkins_bts_model.sh (Rejected, fixeria, 02/06/2023)

Actions #1

Updated by fixeria over 2 years ago

  • Checklist item Chunk 'sched_lchan_xcch.c:82' added
  • Checklist item Chunk 'cbch.c:201' added
  • Checklist item Chunk 'struct tlv_parsed' added
Actions #2

Updated by fixeria over 2 years ago

  • Checklist item Chunk 'struct tlv_parsed' set to Done
  • Status changed from New to In Progress
  • % Done changed from 0 to 30

This is most likely not a memory leak. Every time we get OML Set BTS/TRX/TS Attributes, we merge TLVs using osmo_tlvp_copy() and osmo_tlvp_merge(). Each MO has its own 'struct tlv_parsed' chunk, and furthermore each attribute is a chunk too. I submitted several patches improving readability of talloc reports:

https://gerrit.osmocom.org/c/osmo-bts/+/26525 oml: use proper talloc context in oml_rx_set_radio_attr()
https://gerrit.osmocom.org/c/osmo-bts/+/26526 oml: use ts->trx as talloc-context in oml_rx_set_chan_attr()
https://gerrit.osmocom.org/c/osmo-bts/+/26527 oml: fix copy-pasted comments in oml_rx_set_*_attr()
https://gerrit.osmocom.org/c/osmo-bts/+/26528 oml: assign unique names to 'struct tlv_parsed' chunks
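
To illustrate why merged TLVs do not grow without bound, here is a toy model in Python. It is only a sketch of the idea described in this comment, not libosmocore's implementation: the class `ParsedTlvs`, its methods, and the tag values are hypothetical, and it assumes that a merge replaces the value stored for a tag that is already present.

```python
class ParsedTlvs:
    """Toy stand-in for one MO's 'struct tlv_parsed' chunk (hypothetical)."""

    def __init__(self):
        # tag -> value; each stored value stands for one child chunk
        self.attrs = {}

    def merge(self, new_attrs):
        # Assumption: values for known tags are replaced, new tags are added.
        self.attrs.update(new_attrs)

    def chunk_count(self):
        # One chunk for the container plus one per stored attribute.
        return 1 + len(self.attrs)


mo = ParsedTlvs()
for _ in range(100):
    # The same two (arbitrary) attributes arrive with every Set Attributes.
    mo.merge({0x01: b"\x00", 0x02: b"\x01\x02"})

print(mo.chunk_count())
```

Under these assumptions the number of chunks depends on the number of MOs and distinct attributes, not on how many times the attributes are set, which is why such chunks remain in the report without being a leak.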

Actions #3

Updated by fixeria over 2 years ago

  • Checklist item Chunk 'sched_lchan_xcch.c:82' set to Done
  • % Done changed from 30 to 60

This memleak was introduced quite a while ago:

commit 7c87612b4219bb236c5d74ca2988443bfb1929c6
Author: Philipp Maier <pmaier@sysmocom.de>
Date:   Sat Nov 14 22:32:29 2020 +0100

    l1sap: add repeated uplink SACCH

Should be fixed by these patches:

https://gerrit.osmocom.org/c/osmo-bts/+/26531 osmo-bts-trx: use l1ts as talloc context for burst buffers
https://gerrit.osmocom.org/c/osmo-bts/+/26532 osmo-bts-trx: fix a memleak in trx_sched_set_lchan()

Actions #4

Updated by fixeria over 2 years ago

  • Status changed from In Progress to Stalled
  • Assignee changed from fixeria to laforge

laforge AFAIR, you recently fixed some CBCH-related memleaks. It would be good if you could (when you have time) take a look at the 'cbch.c:201' chunk. I checked bts_process_smscb_cmd() myself, and could not find anything suspicious. Perhaps the messages somehow remain in the queue, even when the A-bis connection is lost?

This leak can be reproduced by running BTS_Tests_SMSCB.control.

Actions #5

Updated by laforge over 2 years ago

I can confirm the problem exists; I can reproduce it locally now.

However, a brief code review doesn't really make me understand how/where we leak it.

Will have to revisit this again at a later point.

Actions #6

Updated by laforge over 1 year ago

  • Assignee changed from laforge to arehbein

If this can still be reproduced, this might be something for @arehbein to add to his backlog.

So the first course of action is to see if the osmo-bsc still has all those "cbch.c" allocations after the ttcn3-bsc-tests have completed. If yes, we're still leaking memory and the code in osmo-bsc needs some more investigation.

Actions #7

Updated by fixeria over 1 year ago

laforge wrote in #note-6:

If this can still be reproduced, this might be something for @arehbein to add to his backlog.

So the first course of action is to see if the osmo-bsc still has all those "cbch.c" allocations after the ttcn3-bsc-tests have completed. If yes, we're still leaking memory and the code in osmo-bsc needs some more investigation.

FYI: not sure if it was implemented back when I reported this issue, but as of now the testsuite generates a talloc report for each testcase. These reports can be found in the "Build Artifacts" (last build https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/lastBuild/artifact/) in Jenkins. Currently BTS_Tests_SMSCB.TC_etws_pcu is executed last, so looking at https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/1831/artifact/logs/bts-tester-generic/BTS_Tests_SMSCB.TC_etws_pcu.talloc/*view*/ (last build ATM), I still see the above-mentioned memleaks. However, now it's "struct smscb_msg" chunks, which still correspond to https://cgit.osmocom.org/osmo-bts/tree/src/common/cbch.c#n201. The problem is still present.
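
Since a talloc report exists for each testcase, two of them can be compared to see which chunk names keep accumulating. The following is a minimal sketch, assuming both reports use the `<name> contains <N> bytes in <M> blocks` line layout from the description; the helper names are made up, and the sizes and addresses in the sample reports are invented for illustration, not taken from a real build.

```python
import re
from collections import Counter

# Same chunk-line layout as in the reports in the description.
CHUNK_RE = re.compile(r"^\s*(?P<name>.+?)\s+contains\s+\d+ bytes in\s+\d+ blocks")


def chunk_counts(report):
    """Count how many chunks of each name appear in one talloc report."""
    counts = Counter()
    for line in report.splitlines():
        m = CHUNK_RE.match(line)
        if m is not None:
            counts[m["name"]] += 1
    return counts


def growing_chunks(before, after):
    """Return {name: increase} for chunk names more numerous in `after`."""
    old, new = chunk_counts(before), chunk_counts(after)
    return {name: new[name] - old[name] for name in new if new[name] > old[name]}


# Invented sample data: sizes and addresses are placeholders.
report_early = """\
    struct smscb_msg               contains    112 bytes in   1 blocks (ref 0) 0x1000
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x2000
"""
report_late = """\
    struct smscb_msg               contains    112 bytes in   1 blocks (ref 0) 0x1000
    struct smscb_msg               contains    112 bytes in   1 blocks (ref 0) 0x3000
    struct smscb_msg               contains    112 bytes in   1 blocks (ref 0) 0x4000
    struct tlv_parsed              contains   4098 bytes in   3 blocks (ref 0) 0x2000
"""

print(growing_chunks(report_early, report_late))
```

A name whose count rises from testcase to testcase is a leak candidate, while a name with a constant count is more likely long-lived state.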

Actions #8

Updated by arehbein about 1 year ago

  • Checklist item Chunk 'cbch.c:201' set to Done
Actions #9

Updated by arehbein about 1 year ago

  • Status changed from Stalled to In Progress
  • % Done changed from 60 to 90

The last memleak should also be fixed now, see https://gerrit.osmocom.org/c/osmo-bts/+/31155
The build for patchsets 1 to 4 failed for some of the jobs, because of the compiler:

cbch.c: In function ‘get_smscb_block’:
cbch.c:143:5: error: suggest explicit braces to avoid ambiguous ‘else’ [-Werror=dangling-else]
  if (block_type->lb)
     ^

I added the braces to fix this.
Should we/can we adapt compilation for the builds to behave the same w.r.t. this warning?

The patch should compile, but it appears there is some dependency issue on the rpi4-raspbian11-build image (was it recently added laforge? Not sure who usually works on those build images or who worked on this one):

configure: error: DAHDI input driver enabled but DAHDI not found

https://jenkins.osmocom.org/jenkins/job/gerrit-osmo-bts-build/BTS_MODEL=trx,FIRMWARE_VERSION=master,WITH_MANUALS=0,a4=default,label=rpi4-raspbian11/60/consoleFull

Actions #10

Updated by laforge about 1 year ago

arehbein wrote in #note-9:

The patch should compile, but it appears there is some dependency issue on the rpi4-raspbian11-build image (was it recently added laforge? Not sure who usually works on those build images or who worked on this one):
[...]

https://jenkins.osmocom.org/jenkins/job/gerrit-osmo-bts-build/BTS_MODEL=trx,FIRMWARE_VERSION=master,WITH_MANUALS=0,a4=default,label=rpi4-raspbian11/60/consoleFull

I've added osmith as watcher to this issue. He is the de-facto maintainer of all of our CI infrastructure. libosmo-abis should always have been built with DAHDI support, so I'm surprised to see this build failure pop up now.

Actions #11

Updated by arehbein about 1 year ago

  • Status changed from In Progress to Feedback
Actions #12

Updated by fixeria about 1 year ago

  • Related to Bug #5893: debian-buster-jenkins-arm image runs out of date jenkins_bts_model.sh added
Actions #13

Updated by fixeria about 1 year ago

Rebasing https://gerrit.osmocom.org/c/osmo-bts/+/31155 on top of the current master made it pass.
TL;DR #5893: your patch was behind https://gerrit.osmocom.org/c/osmo-bts/+/31012.

Actions #14

Updated by arehbein about 1 year ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100