Bug #5123

closed

coredump nightly mgw on 3g voicecall startup

Added by roh almost 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
04/20/2021
Due date:
% Done:

100%

Spec Reference:

Description

osmo-mgw nightly dumped core on me while trying to start a voice call:

Starting program: /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: the debug information found in "/usr/lib/.debug/libhogweed.so.4.3" does not match "/usr/lib/libhogweed.so.4" (CRC mismatch).

range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:CB4F498E Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:CB4F498E CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:CB4F498E In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb7f2e49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7f21633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb7f216a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51    /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1  0x4334f5cf in __GI_abort () at /usr/src/debug/glibc/2.25-r0/git/stdlib/abort.c:89
#2  0xb7f2e4a2 in osmo_panic_default (args=0xbffffad4 "\344\325\005\bx\307\005\b\370\004", fmt=0x805c4c2 "Assert failed %s %s:%d\n")
    at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:49
#3  osmo_panic (fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:84
#4  0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x810b090) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
#5  0x0804ed44 in rx_rtp (msg=0x810b090) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1514
#6  rtp_data_net (fd=0x810aa80, what=1) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1477
#7  0xb7f21633 in poll_disp_fds (n_fd=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:350
#8  _osmo_select_main (polling=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:378
#9  0xb7f216a3 in osmo_select_main (polling=0) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:417
#10 0x0804acc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/osmo-mgw/mgw_main.c:406
(gdb) quit

/etc/osmocom/osmo-mgw.cfg

mgcp
  bind ip 10.23.24.1
  rtp port-range 4002 16000
  rtp bind-ip 10.23.24.1
  rtp ip-probing
  rtp ip-tos 184
  bind port 2427
  sdp audio payload number 98
  sdp audio payload name GSM
  number endpoints 31
  loop 0
  force-realloc 1
  rtcp-omit
  rtp-patch ssrc
  rtp-patch timestamp

osmo-mgw 1.8.1+gitr0+9ffaba7c1b-r2.18.0.24


Files

mgw.log (6.59 KB) roh, 04/20/2021 01:29 PM
mgw2.pcap (52.6 KB) roh, 04/20/2021 01:29 PM
mgw.log (6.24 KB) roh, 04/20/2021 01:57 PM
mgw3.pcap (26.9 KB) roh, 04/20/2021 01:57 PM
my_pcap.pcapng.gz (42.6 KB) pespin, 04/20/2021 04:09 PM

Related issues

Related to OsmoMGW - Bug #5119: mgcp_client.c should not assert on unexpected codec name in the input data (Resolved, dexter, 04/18/2021)

Actions #1

Updated by laforge almost 3 years ago

  • Related to Bug #5119: mgcp_client.c should not assert on unexpected codec name in the input data added
Actions #2

Updated by laforge almost 3 years ago

  • Assignee set to dexter
  • Priority changed from Normal to High

In general, no matter what happens at a remote implementation that sends packets to us, we must never OSMO_ASSERT(). This is a serious problem. OSMO_ASSERT() is to guard against conditions entirely under control of our implementation (mgw in this case).

Any remote user, even a malicious one, must always be able to send us anything without us running into OSMO_ASSERT(). If a remote user can trigger this, it's a denial of service vulnerability.
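
The distinction between the two kinds of conditions can be illustrated with a minimal sketch. The struct rtp_endpoint, the function handle_rtp_from_peer() and its return codes are hypothetical and not osmo-mgw API; plain assert() stands in for OSMO_ASSERT().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical, simplified endpoint state; not the osmo-mgw structure. */
struct rtp_endpoint {
	sa_family_t remote_family; /* AF_UNSPEC until the call agent sets it */
};

/* Returns 0 if the packet may be processed, -1 if it must be dropped. */
int handle_rtp_from_peer(const struct rtp_endpoint *ep, sa_family_t from_family)
{
	/* Internal invariant: our own code always passes a valid endpoint,
	 * so an assert is appropriate here. */
	assert(ep != NULL);

	/* Condition influenced by the remote side: never assert,
	 * reject the packet and keep running. */
	if (ep->remote_family != AF_UNSPEC && ep->remote_family != from_family)
		return -1;

	return 0;
}
```

A caller would log the rejected packet and return to the select loop, so a misbehaving peer costs one dropped packet rather than the whole process.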

Actions #3

Updated by laforge almost 3 years ago

The pcap file shows UDP packets from 10.23.24.192 to the MGW at 10.23.24.1 port 4002. Those are definitely IPv4 packets, so AF_INET.

Can you go to "frame 4" and then print the two values that trigger the assert at libosmo-mgcp/mgcp_network.c:1272?

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51    /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
(gdb) frame 4
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
(gdb) p from_addr->u.sa.sa_family

Actions #4

Updated by roh almost 3 years ago

Starting program: /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
warning: the debug information found in "/usr/lib/.debug/libhogweed.so.4.3" does not match "/usr/lib/libhogweed.so.4" (CRC mismatch).

range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:933CE96A Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:933CE96A CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:933CE96A In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb7f2e49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7f21633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb7f216a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]

Program received signal SIGABRT, Aborted.
__GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
51    /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=6) at /usr/src/debug/glibc/2.25-r0/git/sysdeps/unix/sysv/linux/raise.c:51
#1  0x4334f5cf in __GI_abort () at /usr/src/debug/glibc/2.25-r0/git/stdlib/abort.c:89
#2  0xb7f2e4a2 in osmo_panic_default (args=0xbffffad4 "\344\325\005\bx\307\005\b\370\004", fmt=0x805c4c2 "Assert failed %s %s:%d\n")
    at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:49
#3  osmo_panic (fmt=0x805c4c2 "Assert failed %s %s:%d\n") at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/panic.c:84
#4  0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
#5  0x0804ed44 in rx_rtp (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1514
#6  rtp_data_net (fd=0x81274e0, what=1) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1477
#7  0xb7f21633 in poll_disp_fds (n_fd=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:350
#8  _osmo_select_main (polling=<optimized out>) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:378
#9  0xb7f216a3 in osmo_select_main (polling=0) at /usr/src/debug/libosmocore/1.5.1+gitrAUTOINC+49766ab1b6-r2.18.0/git/src/select.c:417
#10 0x0804acc7 in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/osmo-mgw/mgw_main.c:406
(gdb) frame 4
#4  0x08051271 in mgcp_dispatch_rtp_bridge_cb (msg=0x8127af0) at /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c:1272
1272    /usr/src/debug/osmo-mgw/1.8.1+gitrAUTOINC+9ffaba7c1b-r2.18.0/git/src/libosmo-mgcp/mgcp_network.c: No such file or directory.
(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0
(gdb) p from_addr->u.sa.sa_family
value has been optimized out
(gdb) 

Actions #5

Updated by laforge almost 3 years ago

MGCP traffic is not in the pcap file.

Actions #6

Updated by roh almost 3 years ago

tcpdump -s0 -w mgw3.pcap port not 22 -i any

Actions #7

Updated by laforge almost 3 years ago

So the

(gdb) p conn->u.rtp.end.addr.u.sa.sa_family
$1 = 0

already tells us that it's neither AF_INET (2) nor AF_INET6 (10 on Linux), but either uninitialized or AF_UNSPEC (0), while the received packet is of course AF_INET...
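
As a small illustration of why a family of 0 means "not set": an address that has only been zeroed reports AF_UNSPEC, which is 0 on Linux. The helpers addr_clear() and family_name() below are hypothetical names for this sketch, not part of libosmocore or osmo-mgw.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Reset an address to the "not set" state: all bytes zero.
 * Assumes AF_UNSPEC == 0, which holds on Linux. */
void addr_clear(struct sockaddr_storage *ss)
{
	assert(ss != NULL); /* caller bug, not remote input */
	memset(ss, 0, sizeof(*ss));
}

/* Map the address family of ss to a printable name. */
const char *family_name(const struct sockaddr_storage *ss)
{
	assert(ss != NULL); /* caller bug, not remote input */

	switch (ss->ss_family) {
	case AF_UNSPEC:
		return "AF_UNSPEC";
	case AF_INET:
		return "AF_INET";
	case AF_INET6:
		return "AF_INET6";
	default:
		return "other";
	}
}
```

Using the symbolic constants rather than their numeric values avoids mistakes, since the numbers differ between operating systems.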

Actions #8

Updated by laforge almost 3 years ago

tentative fix in https://gerrit.osmocom.org/c/osmo-mgw/+/23812 but I don't understand enough of osmo-mgw to know if it's the correct way to solve or not. It seems more reasonable that after CRCX the conn->u.rtp.end.addr.u.sa.sa_family is properly initialized?

Actions #9

Updated by pespin almost 3 years ago

Indeed, the problem is similar to that of "A]" in SYS#5435. That is, the nano3g starts sending data to us really quickly, immediately after receiving the RAB-Assignment Request and before answering with the RAB-Assignment Response (I actually see none of those in the pcap trace I took myself...)

So, the problem is that mgw receives RTP traffic on the endpoint when it has only gone through CRCX + CRCX ACK, which sets up the local address, but has never received an MDCX from osmo-msc (due to no Assignment Response?) to set the remote address, so the remote address family is still unset (AF_UNSPEC).

Actions #10

Updated by pespin almost 3 years ago

I have also attached a pcap I took myself while seeing the issue in roh's setup.

# /usr/bin/osmo-mgw -s -c /etc/osmocom/osmo-mgw.cfg
range must end at an odd port number, autocorrecting port (16000) to: 16001
<0002> ../../../git/src/vty/telnet_interface.c:104 Available via telnet 127.0.0.1 4243
<0009> ../../../git/src/ctrl/control_if.c:916 CTRL at 127.0.0.1 4267
<0012> ../../../git/src/osmo-mgw/mgw_main.c:391 Configured for MGCP, listen on 10.23.24.1:2427
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:751 endpoint:rtpbridge/1@mgw CRCX: creating new connection ...
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:83 endpoint:rtpbridge/1@mgw RTP-setup: Endpoint is in loopback mode, stopping here!
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:237 endpoint:rtpbridge/1@mgw CI:B520FAE4 Failed to send dummy RTP packet.
<0012> ../../../git/src/libosmo-mgcp/mgcp_protocol.c:998 endpoint:rtpbridge/1@mgw CI:B520FAE4 CRCX: connection successfully created
<0000> ../../../git/src/libosmo-mgcp/mgcp_network.c:1056 endpoint:rtpbridge/1@mgw CI:B520FAE4 In loopback mode and remote address not set: allowing data from address: 10.23.24.192
Assert failed conn->u.rtp.end.addr.u.sa.sa_family == from_addr->u.sa.sa_family ../../../git/src/libosmo-mgcp/mgcp_network.c:1272
backtrace() returned 9 addresses
/usr/lib/libosmocore.so.17(osmo_panic+0x4a) [0xb763f49d]
/usr/bin/osmo-mgw() [0x8051271]
/usr/bin/osmo-mgw() [0x804ed44]
/usr/lib/libosmocore.so.17(+0xb633) [0xb7632633]
/usr/lib/libosmocore.so.17(osmo_select_main+0xc) [0xb76326a3]
/usr/bin/osmo-mgw() [0x804acc7]
/lib/libc.so.6(__libc_start_main+0xf9) [0x4333c290]
/usr/bin/osmo-mgw() [0x804adc6]
Aborted (core dumped)
# cat /etc/osmocom/osmo-mgw.cfg
!
! MGCP configuration example
!
log file /home/root/mgw.log
  logging filter all 1
  logging color 1
  logging print category-hex 1
  logging print category 0
  logging timestamp 1
  logging print file 1
  logging level set-all debug
mgcp
  bind ip 10.23.24.1
  rtp port-range 4002 16000
  rtp bind-ip 10.23.24.1
  rtp ip-probing
  rtp ip-tos 184
  bind port 2427
  sdp audio payload number 98
  sdp audio payload name GSM
  number endpoints 512
  loop 0
  force-realloc 1
  rtcp-omit
  rtp-patch ssrc
  rtp-patch timestamp
Actions #11

Updated by pespin almost 3 years ago

The address field (addr) that is checked by the failing assert is set in this code path:

mgcp_parse_sdp_data:
    case 'c':
        if (audio_ip_from_sdp(&rtp->addr, line) < 0)

That is, when osmo-msc/bsc sends CRCX or MDCX with SDP and "c" option set.
In the pcap trace causing the crash, it can be seen that only 1 CRCX is sent before receiving the RTP packet which triggers the assert, and this CRCX contains no "c" option.

I would simply drop that ASSERT since it's not useful at all and only causes problems.

It should be fairly simple to create a TTCN3 MGCP_Tests that triggers the crash by sending a CRCX without "c=" option to MGW, receive the CRCX ACK with the mgw-side rtp socket and send an RTP packet there. Then, with current osmo-mgw master it should crash. Then correct behavior can be checked by sending an MDCX with "c=" after sending the first RTP pkt and receiving a MDCX ACK (it wouldn't send us an ACK if it crashed beforehand). Leaving that to dexter if he feels like adding that test.
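
For the "send an RTP packet there" step, any syntactically valid RTP packet should do. The following is a minimal sketch of building the fixed 12-byte RTP header from RFC 3550 in C; the function name rtp_build_header() is hypothetical, and sending the buffer over UDP to the port returned in the CRCX ACK is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RTP_HDR_LEN 12

/* Write a minimal RTP header (RFC 3550: version 2, no padding, no
 * extension, no CSRC entries, marker bit clear) into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t rtp_build_header(uint8_t *buf, size_t buf_len, uint8_t payload_type,
			uint16_t seq, uint32_t timestamp, uint32_t ssrc)
{
	assert(buf != NULL); /* caller bug, not remote input */

	if (buf_len < RTP_HDR_LEN)
		return 0;

	buf[0] = 0x80;                      /* V=2, P=0, X=0, CC=0 */
	buf[1] = payload_type & 0x7f;       /* M=0, 7-bit payload type */
	buf[2] = (uint8_t)(seq >> 8);       /* sequence number, big endian */
	buf[3] = (uint8_t)(seq & 0xff);
	buf[4] = (uint8_t)(timestamp >> 24); /* timestamp, big endian */
	buf[5] = (uint8_t)(timestamp >> 16);
	buf[6] = (uint8_t)(timestamp >> 8);
	buf[7] = (uint8_t)(timestamp & 0xff);
	buf[8] = (uint8_t)(ssrc >> 24);     /* SSRC, big endian */
	buf[9] = (uint8_t)(ssrc >> 16);
	buf[10] = (uint8_t)(ssrc >> 8);
	buf[11] = (uint8_t)(ssrc & 0xff);

	return RTP_HDR_LEN;
}
```

With the configuration quoted in this ticket, payload type 98 would match the "sdp audio payload number" setting.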

Actions #12

Updated by dexter almost 3 years ago

  • Status changed from New to In Progress
Actions #13

Updated by dexter almost 3 years ago

  • % Done changed from 0 to 90

I think I have fixed the problem now. The following TTCN3 test triggers the problem:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24173 MGCP_Test: test LOOPBACK with implicit destination addr

I have now dropped the OSMO_ASSERT(), but I do not understand why the OSMO_ASSERT() was even there, since it defeats the purpose of the code. When the call agent does not specify the destination address in loopback mode, the sa_family is of course not initialized and differs from that of the from-address. So it's indeed correct to remove the OSMO_ASSERT().

I also noticed that there is a problem with writing the sa_family. I do not understand this fully, but I think it is better to copy the address as a whole anyway. Since the event happens only once and is a bit unusual, I think it's a good idea to add a log statement.
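
The idea of copying the address as a whole can be sketched as follows. This is not the code from the patch: union sock_addr is a hypothetical stand-in for the socket address type used in osmo-mgw, and adopt_remote_addr() is an invented name for illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a socket address union; not the libosmocore type. */
union sock_addr {
	struct sockaddr sa;
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
};

/* Adopt the sender's address as the remote address if none is set yet.
 * Returns 1 if the address was adopted, 0 if one was already set. */
int adopt_remote_addr(union sock_addr *remote, const union sock_addr *from)
{
	assert(remote != NULL && from != NULL); /* caller bug, not remote input */

	if (remote->sa.sa_family != AF_UNSPEC)
		return 0;

	/* Copy family, IP address and port together so that they can
	 * never disagree with each other. */
	memcpy(remote, from, sizeof(*remote));
	return 1;
}
```

The return value gives the caller a natural place for the one-time log statement: it is 1 only on the first packet that sets the address.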

See also:
https://gerrit.osmocom.org/c/osmo-mgw/+/24174 mgcp_network: fix implicit address loopback

Actions #14

Updated by dexter almost 3 years ago

The patch for osmo-mgw is merged but TC_one_crcx_loopback_rtp_implicit is still failing. This needs to be checked.

Actions #15

Updated by dexter almost 3 years ago

It turned out that the problem with TC_one_crcx_loopback_rtp_implicit was IPv6 related. The MGW returns an IPv6 address when no local address is sent with the first CRCX. I have changed TC_one_crcx_loopback_rtp_implicit so that it now expects IPv6 instead of IPv4.

See also: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/24250

Actions #16

Updated by dexter almost 3 years ago

  • Status changed from In Progress to Resolved
  • Assignee changed from dexter to roh
  • % Done changed from 90 to 100

The problems with the OSMO_ASSERT are resolved and the TTCN3 tests pass, so I think this can be closed.

(assigning this back to roh, so he can have a look himself and retest if he thinks this is necessary)
