Bug #5925
opendtx downlink causing ABORT in osmo-bts-sysmo
20%
Description
With param dtx downlink and making a call, causes
==24631== Process terminating with default action of signal 6 (SIGABRT) ==24631== at 0x6612E87: raise (raise.c:51) ==24631== by 0x66147F0: abort (abort.c:79) ==24631== by 0x5D648AF: osmo_panic (in /usr/lib/x86_64-linux-gnu/libosmocore.so.20.0.0) ==24631== by 0x114E8E: msgb_pull (msgb.h:407) ==24631== by 0x116751: ph_tch_req (l1_if.c:508) ==24631== by 0x116A08: ph_tch_req (l1_if.c:564) ==24631== by 0x116CE9: bts_model_l1sap_down (l1_if.c:632) ==24631== by 0x164A1F: l1sap_down (l1sap.c:1849) ==24631== by 0x16285D: l1sap_tch_rts_ind (l1sap.c:1324) ==24631== by 0x164821: l1sap_up (l1sap.c:1809) ==24631== by 0x117637: handle_ph_readytosend_ind (l1_if.c:886) ==24631== by 0x118438: l1if_handle_ind (l1_if.c:1119)
ph_tch_req() calls itself passing l1sap->oph.msg as msg
then
msgb_pull(msg, sizeof(*l1sap));
is called.
Updated by laforge 3 days ago
- Category set to osmo-bts-sysmo
can you elaborate a bit more in detail? what does "make a call" mean in this context? MO call? MT call? both legs of a call? Which codec was used?
The code path is the downlink transmit path, when the PH-RTS.ind for TCH arrives, and a TCH.req is sent down from l1sap. l1sap_tch_rts_ind
is dequeuing a RTP message from the RTP socket and passing it down. So what this looks like is we're passing a short/truncated message from RTP down into the BTS model?
l1sap_rtp_rx_cb()
. That function
- allocates using
l1sap_msgb_alloc
- copies the rtp payload into the msgb
- pulls the osmo_phsap_prim from the start of the msgb (which l1sap_msgb_alloc allocated there)
- then enqueues it into the dl_tch_queue
l1sap_tch_rts_ind
dequeues the mesasge and pushes a osmo_phsap_prim to its startph_tch_req
then pulls that very osmo_phsap_prim - and thats's when it ABORTs.
This is rather odd, as I currently don't see how even a zero-length RTP message would affect what we do to the header of the msgb.
Updated by fixeria 3 days ago
- Status changed from New to In Progress
laforge wrote in #note-1:
So what this looks like is we're passing a short/truncated message from RTP down into the BTS model? [...]
This is rather odd, as I currently don't see how even a zero-length RTP message would affect what we do to the header of the msgb.
ph_tch_req()
is a recursive function and conditionally calls itself at the very bottom:
if (dtx_recursion(lchan)) /* DTX: send voice after ONSET was sent */
return ph_tch_req(trx, l1sap->oph.msg, l1sap, true, false);
This can be seen in the backtrace:
==24631== by 0x116751: ph_tch_req (l1_if.c:508) ==24631== by 0x116A08: ph_tch_req (l1_if.c:564)
The problem it that it does msgb_pull(msg, sizeof(*l1sap))
twice:
/* create new message and fill data */
if (msg) {
msgb_pull(msg, sizeof(*l1sap));
so first time it pulls sizeof(*l1sap) == 56
bytes, then it attempts to pull 56 bytes again, but fails. I am not familiar with this code, and a bit surprised that queuing more than one DATA.req in response to a single RTS is legal. Looking at this function more closely, I think pull()ing is not really necessary and a potential fix would be to avoid doing so. I will prepare a patch.
Updated by fixeria 2 days ago
- Status changed from In Progress to Feedback
- Assignee changed from fixeria to keith
- % Done changed from 0 to 20
fixeria wrote in #note-4:
Looking at this function more closely, I think pull()ing is not really necessary and a potential fix would be to avoid doing so. I will prepare a patch.
keith please give this patch a try and let us know if it solves the problem:
https://gerrit.osmocom.org/c/osmo-bts/+/31991 osmo-bts-{sysmo,lc15,oc2g}: fix segfault in ph_tch_req() [NEW]
I have no osmo-bts-sysmo at hand (only remote access, which will take time to setup), so I cannot test this patch myself.