Project

General

Profile

Actions

Bug #4586

closed

osmo-bts-trx leaks memory

Added by ipse almost 4 years ago. Updated almost 4 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
06/06/2020
Due date:
% Done:

100%

Spec Reference:

Description

After some operations, osmo-bts-trx has an ever-growing list of records like this when you do show talloc-context application brief:

struct trx_ctrl_msg            contains    168 bytes in   1 blocks (ref 0) 0x55b7420379a0

I'm not sure of the source of this at the moment but it might be related to the use of dynamic TCH/F_PDCH channels.


Files


Related issues

Related to OsmoBTS - Bug #4592: osmo-bts-trx: make sure that handover detection worksStalledfixeria06/07/2020

Actions
Actions #1

Updated by ipse almost 4 years ago

  • Description updated (diff)
Actions #2

Updated by fixeria almost 4 years ago

Would be good to investigate how to reproduce this memleak, but I think I found a potential problem:

https://gerrit.osmocom.org/c/osmo-bts/+/18704 osmo-bts-trx/trx_if: fix memleak in trx_ctrl_cmd_cb()

Unfortunately, ttcn3-bts-test is unable to catch such memleaks because osmo-bts-trx is restarted between the test case executions :/

Actions #3

Updated by ipse almost 4 years ago

fixeria All items I checked manually where related to NOHANDOVER TRXC message:

(gdb) p *((struct trx_ctrl_msg *) 0x55b742039090)
$1 = {list = {next = 0x0, prev = 0x0}, cmd = "NOHANDOVER", '\000' <repeats 17 times>, params = "0 1", '\000' <repeats 96 times>, cmd_len = 10, params_len = 3, critical = 1, cb = 0x0}

Actions #4

Updated by fixeria almost 4 years ago

fixeria All items I checked manually where related to @NOHANDOVER TRXC message:

Interesting.

Can you please attach a TRXC packet capture, so we can see if there are response delays and retransmissions?

Actions #5

Updated by ipse almost 4 years ago

fixeria See attached a trace of a few hours of the capture. This has accidentally captured a restart but otherwise, memory has been leaking constantly (see the attached screenshot).

Actions #6

Updated by ipse almost 4 years ago

The patch indeed fixes the memory leak.

It's not clear to my though why do we keep sending duplicated NOHANDOVER commands at the first place, though.

Actions #7

Updated by fixeria almost 4 years ago

  • Related to Bug #4592: osmo-bts-trx: make sure that handover detection works added
Actions #8

Updated by fixeria almost 4 years ago

It's not clear to my though why do we keep sending duplicated NOHANDOVER commands at the first place, though.

Such a nice bug! I checked your packet capture and immediately noticed that TRXC HANDOVER is never sent to the transceiver, and TRXC NOHANDOVER is sent twice. Most likely, in some rare cases an active channel is deactivated (so we send CMD NOHANDOVER due to a bug instead of CMD HANDOVER) and then activated again, even faster than we get RSP NOHANDOVER from transceiver. In this case we try to enqueue CMD NOHANDOVER again, and here it's leaked because the last command was the same, and we still did not get the response.

Actions #9

Updated by ipse almost 4 years ago

I'm glad that you like the bug :)

I'm attaching the screenshot of the used memory here - notice that it doesn't grow as fast as before after the restart last night when we applied the patch. But it looks like there is another memory leak since it's still growing even if slower.

Here is the current talloc-context output. The scheduler_trx.c items look suspicious:

OsmoBTS# show talloc-context all 2
talloc report on 'null_context' (total 2641062 bytes in 7723 blocks)
  lapd context                   contains   1153 bytes in  37 blocks (ref 0) 0x55e5198d3580
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198e8700
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5199193a0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e519908190
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198d22c0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e519917450
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198e7b20
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198ff700
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198e8a80
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e51993c440
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198f13a0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e51991d600
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198ea280
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e519872210
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5199083a0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198d4660
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e519872af0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198fd0a0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198e2220
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198d6bf0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e519908430
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e519925ca0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e51992f7f0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198d3380
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198fdf00
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198e1aa0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198ceee0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198f8f90
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198ecbb0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e51992ea20
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e519913880
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198e5170
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e519904830
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e519900a40
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198cd2e0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198e20d0
    struct lapd_history            contains     32 bytes in   1 blocks (ref 0) 0x55e5198e7bb0
  struct signal_handler          contains     40 bytes in   1 blocks (ref 0) 0x55e519872b80
  struct pcu_sock_state          contains    120 bytes in   1 blocks (ref 0) 0x55e5197df210
  struct lookup_helper           contains     24 bytes in   1 blocks (ref 0) 0x55e519871200
  struct signal_handler          contains     40 bytes in   1 blocks (ref 0) 0x55e519870d70
  struct signal_handler          contains     40 bytes in   1 blocks (ref 0) 0x55e519870c00
  struct signal_handler          contains     40 bytes in   1 blocks (ref 0) 0x55e5198705e0
  struct signal_handler          contains     40 bytes in   1 blocks (ref 0) 0x55e519860320
  vty                            contains 138180 bytes in 7114 blocks (ref 0) 0x55e519785d40
    struct vty                     contains   9104 bytes in   6 blocks (ref 0) 0x55e519870e00
    save_cwd                       contains      2 bytes in   1 blocks (ref 0) 0x55e5197992f0
    vty_command                    contains  79538 bytes in 4463 blocks (ref 0) 0x55e519785e20
    vty_vector                     contains  49536 bytes in 2643 blocks (ref 0) 0x55e519785db0
  OsmoBTS context                contains 2505505 bytes in 565 blocks (ref 0) 0x55e51976c4b0
    struct trx_ctrl_msg            contains    168 bytes in   1 blocks (ref 0) 0x55e5198bb630
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198dda00
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e5198e0010
    scheduler_trx.c:570            contains    928 bytes in   1 blocks (ref 0) 0x55e51991d1f0
    scheduler_trx.c:1155           contains    928 bytes in   1 blocks (ref 0) 0x55e519918d40
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198f1160
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5199196f0
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e51990ad70
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e5198e9c50
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198d63c0
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e519915500
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d0500
    struct trx_ctrl_msg            contains    168 bytes in   1 blocks (ref 0) 0x55e5198bb740
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198ccfc0
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e5198db270
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198e0390
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e5198e8440
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198cc9e0
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198c9bc0
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e519924d60
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198e7820
    scheduler_trx.c:570            contains    928 bytes in   1 blocks (ref 0) 0x55e519932300
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e51992ba50
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e5199194b0
    scheduler_trx.c:1155           contains    928 bytes in   1 blocks (ref 0) 0x55e519931ef0
    scheduler_trx.c:570            contains    928 bytes in   1 blocks (ref 0) 0x55e5198dc720
    scheduler_trx.c:1155           contains    928 bytes in   1 blocks (ref 0) 0x55e5198d6660
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198ead70
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198ca1a0
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e519922370
    scheduler_trx.c:570            contains    928 bytes in   1 blocks (ref 0) 0x55e519919950
    scheduler_trx.c:1155           contains    928 bytes in   1 blocks (ref 0) 0x55e519916ed0
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198cc7a0
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e5198db4b0
    scheduler_trx.c:570            contains    928 bytes in   1 blocks (ref 0) 0x55e51992d6f0
    scheduler_trx.c:1155           contains    928 bytes in   1 blocks (ref 0) 0x55e5198e0910
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e519917830
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198ce900
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e51992e360
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e51992eae0
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198df8c0
    scheduler_trx.c:570            contains    928 bytes in   1 blocks (ref 0) 0x55e519930460
    scheduler_trx.c:929            contains    464 bytes in   1 blocks (ref 0) 0x55e51992c510
    scheduler_trx.c:1155           contains    928 bytes in   1 blocks (ref 0) 0x55e5198ecec0
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198e5350
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198efba0
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198dbc40
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198c79e0
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198edb50
    scheduler_trx.c:1036           contains   1392 bytes in   1 blocks (ref 0) 0x55e5198da8f0
    scheduler_trx.c:1036           contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d89f0
    scheduler_trx.c:1036           contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d0ae0
    scheduler_trx.c:1036           contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d8090
    scheduler_trx.c:1036           contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d7ab0
    scheduler_trx.c:1036           contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d74d0
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d6ef0
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d5de0
    scheduler_trx.c:276            contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d5800
    scheduler_trx.c:1036           contains   1392 bytes in   1 blocks (ref 0) 0x55e5198d2a60
    scheduler_trx.c:1036           contains   1392 bytes in   1 blocks (ref 0) 0x55e5198c7fc0
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198bbf80
    scheduler_trx.c:208            contains    464 bytes in   1 blocks (ref 0) 0x55e5198bbd40
    telnet_connection              contains     89 bytes in   2 blocks (ref 0) 0x55e5198715a0
    struct trx_l1h                 contains  53064 bytes in   1 blocks (ref 0) 0x55e519899d40
    struct trx_l1h                 contains  53064 bytes in   1 blocks (ref 0) 0x55e51987fe00
    struct phy_link                contains 105748 bytes in   5 blocks (ref 0) 0x55e5197dfa00
    struct osmo_stats_reporter     contains    207 bytes in   3 blocks (ref 0) 0x55e5197fcc70
    abis                           contains  49481 bytes in  17 blocks (ref 0) 0x55e519870ab0
    UNNAMED                        contains 102499 bytes in 196 blocks (ref 0) 0x7fe9ace7a090
    struct gsm_bts                 contains 2068795 bytes in 238 blocks (ref 0) 0x55e5197a4490
    logging                        contains   5566 bytes in  11 blocks (ref 0) 0x55e5197855b0
    msgb                           contains  10975 bytes in  28 blocks (ref 0) 0x55e51976c540

Btw, notice that we don't have handover enabled which might be the reason for not sending HANDOVER command.

Here is our BTS config:

phy 0
 instance 0
  osmotrx rx-gain 8
  osmotrx tx-attenuation oml
  osmotrx maxdly 20
 instance 1
  osmotrx rx-gain 8
  osmotrx tx-attenuation oml
  osmotrx maxdly 20
 osmotrx ip local 127.0.0.1
 osmotrx ip remote 127.0.0.1
bts 0
 band ${BAND}
 ipa unit-id 10453 0
 oml remote-ip 10.7.12.108
 gsmtap-sapi ccch
 gsmtap-sapi pdtch
 trx 0
  phy 0 instance 0
 trx 1
  phy 0 instance 1

and the relevant BSC config:

network
...
 neci 1
 paging any use tch 0
 handover 0
 handover algorithm 1
 handover1 window rxlev averaging 10
 handover1 window rxqual averaging 1
 handover1 window rxlev neighbor averaging 10
 handover1 power budget interval 6
 handover1 power budget hysteresis 3
 handover1 maximum distance 9999
 periodic location update 30
...
 bts 6
  type sysmobts
  band DCS1800
  cell_identity 10453
  location_area_code 39
  base_station_id_code 63
  ms max power 33
  cell reselection hysteresis 4
  neighbor-list mode manual
  rxlev access min 0
  radio-link-timeout 32
  channel allocator ascending
  rach tx integer 9
  rach max transmission 7
  channel-description attach 1
  channel-description bs-pa-mfrms 5
  channel-description bs-ag-blks-res 1
  early-classmark-sending forbidden
  ipa unit-id 10453 0
  oml ipa stream-id 255 line 0
  codec-support fr efr
  gprs mode egprs
  gprs routing area 39
  gprs cell bvci 10453
  gprs nsei 10453
  gprs nsvc 0 nsvci 10453
  gprs nsvc 0 local udp port 23000
  gprs nsvc 0 remote udp port 24000
  gprs nsvc 0 remote ip 10.7.12.108
  trx 0
   rf_locked 0
   arfcn 773
   nominal power 37
   max_power_red 0
   rsl e1 tei 0
   timeslot 0
    phys_chan_config CCCH+SDCCH4
    hopping enabled 0
   timeslot 1
    phys_chan_config SDCCH8
    hopping enabled 0
   timeslot 2
    phys_chan_config TCH/F
    hopping enabled 0
   timeslot 3
    phys_chan_config TCH/F
    hopping enabled 0
   timeslot 4
    phys_chan_config TCH/F
    hopping enabled 0
   timeslot 5
    phys_chan_config TCH/F
    hopping enabled 0
   timeslot 6
    phys_chan_config TCH/F
    hopping enabled 0
   timeslot 7
    phys_chan_config TCH/F
    hopping enabled 0
  trx 1
   rf_locked 0
   arfcn 779
   nominal power 37
   max_power_red 0
   rsl e1 tei 0
   timeslot 0
    phys_chan_config TCH/F_PDCH
    hopping enabled 0
   timeslot 1
    phys_chan_config TCH/F_PDCH
    hopping enabled 0
   timeslot 2
    phys_chan_config TCH/F_PDCH
    hopping enabled 0
   timeslot 3
    phys_chan_config TCH/F_PDCH
    hopping enabled 0
   timeslot 4
    phys_chan_config TCH/F_PDCH
    hopping enabled 0
   timeslot 5
    phys_chan_config TCH/F_PDCH
    hopping enabled 0
   timeslot 6
    phys_chan_config TCH/F_PDCH
    hopping enabled 0
   timeslot 7
    phys_chan_config TCH/F_PDCH
    hopping enabled 0

Actions #10

Updated by fixeria almost 4 years ago

But it looks like there is another memory leak since it's still growing even if slower.
Here is the current talloc-context output.

It would be much more informative if you attached two full reports (better as files):

  • one created just after the process is started,
  • another created a few hours/days after,

so we could do differential analysis.

scheduler_trx.c:208 contains 464 bytes in 1 blocks (ref 0) 0x55e5198dda00
scheduler_trx.c:929 contains 464 bytes in 1 blocks (ref 0) 0x55e5198db4b0

Those are buffers for 4 GMSK bursts, allocated by rx_data_fn(), tx_data_fn() for each logical channel (including BCCH).

scheduler_trx.c:570 contains 928 bytes in 1 blocks (ref 0) 0x55e51991d1f0
scheduler_trx.c:1155 contains 928 bytes in 1 blocks (ref 0) 0x55e519918d40

Those are buffers for 8 GMSK bursts, allocated for TCH/F logical channels.

scheduler_trx.c:276 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198d0500
scheduler_trx.c:1036 contains 1392 bytes in 1 blocks (ref 0) 0x55e5198da8f0

Those are buffers for 4 8-PSK bursts, allocated for PDTCH logical channels.

The scheduler_trx.c items look suspicious:

... so I don't see anything suspicious.

Btw, notice that we don't have handover enabled which might be the reason for not sending HANDOVER command.

As I figured out in #4592, osmo-bts-trx sends NOHANDOVER regardless of the previous state, i.e. no matter was it enabled or not.

Actions #11

Updated by ipse almost 4 years ago

Actually, observing it for a couple of days, it looks like there is no memory leak. The memory grows slowly for the first few hours but then stabilizes at ~7.0MiB RSS.

Sorry for the false alarm. I believe we can close this ticket. Thank you for the quick fix!

Actions #13

Updated by fixeria almost 4 years ago

  • Status changed from New to Resolved
  • Assignee set to fixeria
  • % Done changed from 0 to 100

Actually, observing it for a couple of days, it looks like there is no memory leak.

I think I fixed another memory leak even before this one was reported ;)

https://git.osmocom.org/osmo-bts/commit/?id=b8200203d8c650123c7e19cef195ad287f53812f

Please reopen if you see anything suspicious again.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)