Project

General

Profile

Bug #2885

OsmoMSC crashes on MNCC disconnect

Added by laforge 7 months ago. Updated 5 days ago.

Status:
In Progress
Priority:
Urgent
Assignee:
Category:
MGCP towards MGW
Target version:
-
Start date:
01/27/2018
Due date:
% Done:

0%

Estimated time:
Resolution:

Description

<0004> gsm_04_08.c:1359 transmit message MNCC_CALL_CONF_IND
<0007> msc_mgcp.c:280 MGW(MGW_0)[0xa9bb0c0]{ST_CRCX_CN}: CRCX/RAN: response yields error: 542 FORCED_FAIL
<0007> msc_mgcp.c:281 MGW(MGW_0)[0xa9bb0c0]{ST_CRCX_CN}: operation failed on MGW -- graceful shutdown...
<0007> msc_mgcp.c:730 MGW(MGW_0)[0xa9bb0c0]{ST_HALT}: DLCX: response yields error: 250 OK
<0007> msc_mgcp.c:731 MGW(MGW_0)[0xa9bb0c0]{ST_HALT}: operation failed on MGW -- graceful shutdown...
<0007> msc_mgcp.c:157 MGW(MGW_0)[0xa9bb0c0]{ST_HALT}: transition to state ST_CALL not permitted!
<0004> gsm_04_08.c:1359 transmit message MNCC_DISC_IND
<0012> input/ipa.c:67 connection closed with server
<0004> mncc_sock.c:85 MNCC Socket has LOST connection
<0001> gsm_04_08.c:191 Clearing all currently active transactions!!!
==17608== Invalid read of size 8
==17608==    at 0x128B6A: msc_mgcp_call_release (msc_mgcp.c:1052)
==17608==    by 0x11ED50: _gsm48_cc_trans_free (gsm_04_08.c:1419)
==17608==    by 0x12BF94: trans_free (transaction.c:123)
==17608==    by 0x11CFEA: gsm0408_clear_all_trans (gsm_04_08.c:196)
==17608==    by 0x125A07: mncc_sock_close (mncc_sock.c:95)
==17608==    by 0x125B1E: mncc_sock_read (mncc_sock.c:140)
==17608==    by 0x125B1E: mncc_sock_cb (mncc_sock.c:198)
==17608==    by 0x56D0950: osmo_fd_disp_fds (select.c:216)
==17608==    by 0x56D0950: osmo_select_main (select.c:256)
==17608==    by 0x11371B: main (msc_main.c:546)
==17608==  Address 0xaaa3810 is 96 bytes inside a block of size 200 free'd
==17608==    at 0x4C2DDBB: free (vg_replace_malloc.c:530)
==17608==    by 0x505BE82: _talloc_free (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.10)
==17608==    by 0x56D3C8E: _osmo_fsm_inst_dispatch (fsm.c:450)
==17608==    by 0x12830B: fsm_timeout_cb (msc_mgcp.c:204)
==17608==    by 0x56D4458: fsm_tmr_cb (fsm.c:185)
==17608==    by 0x56D0305: osmo_timers_update (timer.c:257)
==17608==    by 0x56D0904: osmo_select_main (select.c:253)
==17608==    by 0x11371B: main (msc_main.c:546)
==17608==  Block was alloc'd at
==17608==    at 0x4C2CB8F: malloc (vg_replace_malloc.c:299)
==17608==    by 0x505E150: _talloc_zero (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.10)
==17608==    by 0x128448: msc_mgcp_call_assignment (msc_mgcp.c:902)
==17608==    by 0x11C578: gsm48_cc_rx_call_conf (gsm_04_08.c:1847)
==17608==    by 0x11FE8C: gsm0408_rcv_cc (gsm_04_08.c:3269)
==17608==    by 0x11FE8C: gsm0408_dispatch (gsm_04_08.c:3380)
==17608==    by 0x12D05C: msc_dtap (osmo_msc.c:108)
==17608==    by 0x116BB2: rx_dtap (a_iface_bssap.c:683)
==17608==    by 0x116BB2: a_sccp_rx_dt (a_iface_bssap.c:710)
==17608==    by 0x114367: sccp_sap_up (a_iface.c:529)
==17608==    by 0x56D3C8E: _osmo_fsm_inst_dispatch (fsm.c:450)
==17608==    by 0x5D5D9D4: sccp_scoc_rx_from_scrc (sccp_scoc.c:1581)
==17608==    by 0x5D5B6CA: scrc_rx_mtp_xfer_ind_xua (sccp_scrc.c:449)
==17608==    by 0x5D5E5A4: mtp_user_prim_cb (sccp_user.c:176)
==17608== 

So it seems that upon MNCC disconnect, it tries to free some MGCP state again, which was already free'd due to an earlier MGCP failure.

20180127-osmomsc-mtcall-ran-crcx-disc-crash.pcap 20180127-osmomsc-mtcall-ran-crcx-disc-crash.pcap 5.25 KB pcap file showing the problem laforge, 01/27/2018 08:54 AM

History

#1 Updated by laforge 6 months ago

  • Assignee changed from dexter to daniel

#2 Updated by laforge 5 months ago

  • Assignee changed from daniel to sysmocom

#3 Updated by laforge 4 months ago

  • Assignee changed from sysmocom to stsp

#4 Updated by neels 4 months ago

(if this touches MSC's subscriber connection allocation/cleanup scheme at all, rather wait for the pending refactoring patch to be merged; if in doubt, ask me)

#5 Updated by laforge 8 days ago

  • Priority changed from High to Urgent

reclassifying as urgent. If we have a reproducible crash in osmo-mcs for 6 months, it's a really serious issue that needs to be investigate now.

#6 Updated by stsp 7 days ago

Unfortunately, there is not enough context provided in this issue to allow anyone to easily reproduce the problem.
At least not enough context for someone who has never even used this part of the software stack before.

A simple test where osmo-sip-connector is started and stopped on the mncc socket does not trigger the crash.
The pcap file suggests that a phone call is required. What is the best way to simulate this locally?
Will I need a full-blown osmo-sip-connector + asterisk setup and make an actual call?

Do we already know something about what needs to be done to reproduce the related "earlier MGCP failure" mentioned in the issue description?

#7 Updated by stsp 7 days ago

  • Status changed from New to In Progress

#8 Updated by fixeria 7 days ago

What is the best way to simulate this locally?

I think it makes sense to try to disconnect MNCC from OsmoMSC during an active call,
so it will trigger cleaning up process of the active transactions...

#9 Updated by fixeria 7 days ago

Will I need a full-blown osmo-sip-connector + asterisk setup and make an actual call?

Not exclusively Asterisk, but yes, you need some PBX. There are also Freeswitch
(I can help with configuration), and LCR (Linux Call Router)...

#10 Updated by laforge 7 days ago

On Tue, Aug 07, 2018 at 10:47:36AM +0000, fixeria [REDMINE] wrote:

Will I need a full-blown osmo-sip-connector + asterisk setup and make an actual call?

Not exclusively Asterisk, but yes, you need some PBX. There are also Freeswitch
(I can help with configuration), and LCR (Linux Call Router)...

we also have mncc-python examples which will establish a variety of calls from the MNCC
side without any PBX:

https://git.osmocom.org/mncc-python/

#11 Updated by stsp 5 days ago

laforge wrote:

we also have mncc-python examples which will establish a variety of calls from the MNCC
side without any PBX:

https://git.osmocom.org/mncc-python/

That is neat. I am trying to get this to work, however the call setup seems to fail because the subscribers aren't known to the VLR:

DCC <0001> gsm_04_08_cc.c:1875 (bts - trx - ts - ti -- sub 1111) Received 'MNCC_SETUP_REQ' from MNCC with unknown subscriber 1111

The subscribers are present in the HLR database. Perhaps some steps are missing in this simulation (e.g. I don't see where a location update would be sent)?

In any case, this isn't the only bug assigned to me which is stalling because I don't have a working physical setup to test with. Perhaps I should finally invest some time into setting that up...

#12 Updated by laforge 5 days ago

On Thu, Aug 09, 2018 at 11:41:16AM +0000, stsp [REDMINE] wrote:

That is neat. I am trying to get this to work, however the call setup seems to fail because the subscribers aren't known to the VLR:

the assumption when writing the code is that you use this with real
phones which of course perform a LU before you then use mncc-python to
create a number of voice calls between the phones.

#13 Updated by stsp 5 days ago

OK I will look into getting a more elaborate setup that can reproduce this problem.

Do you by chance remember in which environment the original error was found?
Would knowing more about that environment help me in some way?

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)