Bug #3859

SGs FSM doesn't consider disconnected HLR

Added by laforge 7 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: SGs Interface
Target version: -
Start date: 03/25/2019
Due date:
% Done: 100%
Resolution:

Description

When an SGs LU REQ arrives from the MME while no HLR is connected, we get:

<0011> sgs_server.c:185 SGs socket bound to r=NULL<->l=0.0.0.0:29118
Mon Mar 25 17:20:23 2019 DLSS7 <001e> osmo_ss7.c:1283 0: ASP Restart for server not implemented yet!
Mon Mar 25 17:20:23 2019 DMNCC <0004> msc_main.c:604 Using internal MNCC handler.
Mon Mar 25 17:20:23 2019 DLGLOBAL <0012> telnet_interface.c:104 Available via telnet 0.0.0.0 4254
Mon Mar 25 17:20:23 2019 DSMPP <000c> smpp_smsc.c:1012 SMPP at 0.0.0.0 2775
Mon Mar 25 17:20:23 2019 DLCTRL <0019> control_if.c:911 CTRL at 0.0.0.0 4255
Mon Mar 25 17:20:23 2019 DLSMS <0018> sms_queue.c:250 Attempting to send 20 SMS
Mon Mar 25 17:20:23 2019 DLSMS <0018> sms_queue.c:234 SMS queue: no SMS to be sent
Mon Mar 25 17:20:23 2019 DLSMS <0018> sms_queue.c:261 Sending SMS done (0 attempted)
Mon Mar 25 17:20:23 2019 DLSMS <0018> sms_queue.c:317 SMSqueue added 0 messages in 0 rounds
Mon Mar 25 17:20:23 2019 DLMGCP <0022> mgcp_client.c:716 MGCP client: using endpoint domain '@mgw'
Mon Mar 25 17:20:23 2019 DLMGCP <0022> mgcp_client.c:791 MGCP GW connection: r=127.0.0.1:2427<->l=127.0.0.1:2727
Mon Mar 25 17:20:23 2019 DMSC <0006> msc_main.c:372 CS7 Instance identifiers: A = Iu = 0
Mon Mar 25 17:20:23 2019 DLSCCP <001f> sccp_user.c:397 OsmoMSC-A-Iu: Using SS7 instance 0, pc:0.23.1
Mon Mar 25 17:20:23 2019 DLSCCP <001f> sccp_user.c:415 OsmoMSC-A-Iu: Using AS instance as-clnt-OsmoMSC-A
Mon Mar 25 17:20:23 2019 DLSCCP <001f> sccp_user.c:420 OsmoMSC-A-Iu: Creating default route
Mon Mar 25 17:20:23 2019 DLSCCP <001f> sccp_user.c:476 OsmoMSC-A-Iu: Using ASP instance asp-clnt-OsmoMSC-A
Mon Mar 25 17:20:23 2019 DLSS7 <001e> osmo_ss7.c:471 0: Creating SCCP instance
Mon Mar 25 17:20:23 2019 DBSSAP <0010> a_iface.c:674 Initalizing SCCP connection to stp...
Mon Mar 25 17:20:27 2019 DSGS <0011> sgs_server.c:123 r=192.168.122.186:37270<->l=192.168.122.1:29118: Accepted new SGs connection
Mon Mar 25 17:24:41 2019 DSGS <0011> fsm.c:320 SGs-VLR-RESET(262-42-8001-01)[0x55fda7c789d0]{unknown 0}: Allocated
Mon Mar 25 17:24:41 2019 DSGS <0011> fsm.c:320 SGs-UE(num:0)[0x55fda7c760f0]{SGs-NULL}: Allocated
Mon Mar 25 17:24:41 2019 DSGS <0011> vlr_sgs_fsm.c:359 SGs-UE(num:0)[0x55fda7c760f0]{SGs-NULL}: state_chg to SGs-NULL
Mon Mar 25 17:24:41 2019 DVLR <000e> vlr.c:438 set IMSI on subscriber; IMSI=262423203001508 id=262423203001508
Mon Mar 25 17:24:41 2019 DVLR <000e> vlr.c:391 New subscr, IMSI: 262423203001508
Mon Mar 25 17:24:41 2019 DVLR <000e> vlr.c:438 set IMSI on subscriber; IMSI=262423203001508 id=262423203001508
Mon Mar 25 17:24:41 2019 DSGS <0011> vlr_sgs.c:96 SGs-UE(num:0)[0x55fda7c760f0]{SGs-NULL}: Received Event RX_LU_FROM_MME
Mon Mar 25 17:24:41 2019 DSGS <0011> vlr_sgs_fsm.c:55 SGs-UE(num:0)[0x55fda7c760f0]{SGs-NULL}: state_chg to SGs-LA-UPDATE-PRESENT
Mon Mar 25 17:24:41 2019 DVLR <000e> gsm_04_08.c:1767 SUBSCR(IMSI-262423203001508:TMSInew-0x8611AEA5) VLR: update for IMSI=262423203001508 (MSISDN=, used=1)
Mon Mar 25 17:24:41 2019 DVLR <000e> vlr.c:192 GSUP tx: 04010862423202031005f8280102
Mon Mar 25 17:24:41 2019 DLGSUP <001c> gsup_client.c:353 GSUP not connected, unable to send 04 01 08 62 42 32 02 03 10 05 f8 28 01 02 
Mon Mar 25 17:24:41 2019 DSGS <0011> vlr_sgs_fsm.c:65 SGs-UE(num:0)[0x55fda7c760f0]{SGs-LA-UPDATE-PRESENT}: (sub IMSI-262423203001508:TMSInew-0x8611AEA5) HLR LU request failed
Mon Mar 25 17:24:55 2019 DVLR <000e> vlr.c:438 set IMSI on subscriber; IMSI=262423203001508 id=262423203001508
Mon Mar 25 17:24:55 2019 DSGS <0011> vlr_sgs.c:96 SGs-UE(num:0)[0x55fda7c760f0]{SGs-LA-UPDATE-PRESENT}: Received Event RX_LU_FROM_MME
Mon Mar 25 17:24:55 2019 DSGS <0011> vlr_sgs.c:96 SGs-UE(num:0)[0x55fda7c760f0]{SGs-LA-UPDATE-PRESENT}: Event RX_LU_FROM_MME not permitted

Even after many minutes, there is no timeout or any other visible recovery. We have to handle such cases, since the HLR can become unreachable at any time, at least temporarily. What does the spec say? Shouldn't we return something to the MME in this case?

History

#1 Updated by laforge 7 months ago

What makes the problem even worse: if the HLR later recovers and the MME sends another LU REQ, we get:

Mon Mar 25 17:31:03 2019 DSGS <0011> vlr_sgs.c:96 SGs-UE(num:0)[0x55fda7c760f0]{SGs-LA-UPDATE-PRESENT}: Received Event RX_LU_FROM_MME
Mon Mar 25 17:31:03 2019 DSGS <0011> vlr_sgs.c:96 SGs-UE(num:0)[0x55fda7c760f0]{SGs-LA-UPDATE-PRESENT}: Event RX_LU_FROM_MME not permitted

so a temporary HLR outage will break CSFB for an apparently indefinite time :/
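
To make the stuck state easier to reason about, here is a minimal Python model of the behaviour seen in the logs above. It is only a sketch, not the osmo-msc C code: the state names and the RX_LU_FROM_MME event are taken from the log, while the class, the transition table, and the LU_ACCEPTED / LU_REJECTED event names are assumptions made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified transition table: state -> events accepted there.
# LU_ACCEPTED / LU_REJECTED are invented names for "the HLR answered".
ALLOWED_EVENTS = {
    "SGs-NULL": {"RX_LU_FROM_MME"},
    "SGs-LA-UPDATE-PRESENT": {"LU_ACCEPTED", "LU_REJECTED"},
}


@dataclass
class SgsUeFsm:
    state: str = "SGs-NULL"
    log: list = field(default_factory=list)

    def dispatch(self, event: str, hlr_connected: bool) -> bool:
        """Return True if the event was accepted in the current state."""
        if event not in ALLOWED_EVENTS.get(self.state, set()):
            self.log.append(f"{{{self.state}}}: Event {event} not permitted")
            return False
        if event == "RX_LU_FROM_MME":
            self.state = "SGs-LA-UPDATE-PRESENT"
            if not hlr_connected:
                # The failure is only logged: no state change and no timer,
                # so nothing will ever move the FSM out of this state.
                self.log.append(f"{{{self.state}}}: HLR LU request failed")
        return True


fsm = SgsUeFsm()
fsm.dispatch("RX_LU_FROM_MME", hlr_connected=False)  # HLR down: LU fails
fsm.dispatch("RX_LU_FROM_MME", hlr_connected=True)   # HLR back: still rejected
print("\n".join(fsm.log))
```

In this model the second LU REQ is rejected even though the HLR is reachable again, because only a result from the HLR (which never arrives) could leave SGs-LA-UPDATE-PRESENT.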

#2 Updated by dexter 7 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 30

I found a way to reproduce the problem above using TTCN3. I also investigated the spec and found out that we are supposed to send an SGsAP-RESET-INDICATION to the MME in those cases. (see also: 3GPP TS 29.118 5.7 VLR failure procedure).

I have some code ready that triggers the sending of an SGsAP-RESET-INDICATION when the HLR (VLR) fails. This works so far. We now need a TTCN3 test that responds to the SGsAP-RESET-INDICATION properly.

#3 Updated by dexter 7 months ago

  • % Done changed from 30 to 90

There is now a TTCN3 test that provokes the problem. See the following patches:

https://gerrit.osmocom.org/#/c/osmo-ttcn3-hacks/+/13556 SGsAP_Templates: Remove invalid template.
https://gerrit.osmocom.org/#/c/osmo-ttcn3-hacks/+/13557 MSC_Tests: allow disabeling GSUP
https://gerrit.osmocom.org/#/c/osmo-ttcn3-hacks/+/13558 MSC_Tests: Add testcase to simulate VLR/HLR failure (SGsAP)

There are several problems in the MSC. For one, the code in the VLR did not report the failure back to the SGs-related code in the MSC. I have added a flag so that the MSC code becomes aware of the failure. When the flag is set, the reset procedure is carried out. This works well with the TTCN3 test so far.

https://gerrit.osmocom.org/#/c/osmo-msc/+/13559 sgs_iface: detect and react to VLR/HLR failure
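
The flag-based approach described in this comment can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code from the patch: the names `lu_failed`, `vlr_start_lu` and `sgs_handle_lu_req` are hypothetical, and returning the UE to SGs-NULL as part of the reset is my reading of the VLR failure procedure rather than something stated in this ticket.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    imsi: str
    sgs_state: str = "SGs-NULL"
    lu_failed: bool = False  # hypothetical name for the new failure flag


def vlr_start_lu(sub: Subscriber, hlr_connected: bool) -> None:
    """VLR side: start the location update towards the HLR."""
    sub.sgs_state = "SGs-LA-UPDATE-PRESENT"
    if not hlr_connected:
        # Instead of only logging, record the failure for the SGs code.
        sub.lu_failed = True


def sgs_handle_lu_req(sub: Subscriber, hlr_connected: bool, tx: list) -> bool:
    """SGs side: handle an LU REQ from the MME; tx collects sent messages."""
    vlr_start_lu(sub, hlr_connected)
    if sub.lu_failed:
        # Flag set: carry out the reset procedure towards the MME.
        tx.append("SGsAP-RESET-INDICATION")
        sub.sgs_state = "SGs-NULL"  # assumption: association is dropped
        sub.lu_failed = False
        return False
    return True


sub = Subscriber(imsi="262423203001508")
sent = []
sgs_handle_lu_req(sub, hlr_connected=False, tx=sent)  # outage: reset is sent
sgs_handle_lu_req(sub, hlr_connected=True, tx=sent)   # later LU REQ proceeds
```

The point of the sketch is that the UE no longer stays in SGs-LA-UPDATE-PRESENT after a failed HLR request, so a later LU REQ from the MME is accepted again.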

#4 Updated by dexter 6 months ago

The patches that add the TTCN3 tests are all merged. The MSC part still needs some review:

https://gerrit.osmocom.org/#/c/osmo-msc/+/13559 sgs_iface: detect and react to VLR/HLR failure

#5 Updated by laforge 5 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100

patch now reviewed/rebased/merged
