Bug #4832

open

osmo-bsc hard-releases lchan if no MSC is found

Added by laforge over 3 years ago. Updated over 1 year ago.

Status: Stalled
Priority: Low
Assignee:
Category: -
Target version: -
Start date: 10/25/2020
Due date:
% Done: 20%
Spec Reference:

Description

As described in #4829:

So.... the problem is something else altogether, or at least a large part of the problem.

For some reason, osmo-bsc these days immediately hard-releases the lchan if there is no MSC. There is no signaling sent back to the MS about this:
  • No spoofing of MM/CM (LU REJECT, CM SERV REJECT, ...), which is still understandable, as it's a layering violation and those messages normally originate at the MSC.
  • No RR CHANNEL RELEASE is sent to the MS. That is a big fat bug.

Instead, we simply immediately close the lchan on the BTS side. This of course means that from that point onwards there are only bit errors in downlink at the UE.

DRSL INFO <0003> ../../../git/src/osmo-bsc/abis_rsl.c:1453 (bts=0) CHAN RQD: reason: Location updating (ra=0x0f, neci=0x01, chreq_reason=0x03)
DCHAN INFO <000f> ../../../git/src/osmo-bsc/lchan_select.c:247 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) Selected
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1657 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) MS: Channel Request: reason=LOCATION_UPDATE ra=0x0f ta=0
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:329 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: Received Event LCHAN_EV_ACTIVATE
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:548 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: state_chg to WAIT_TS_READY
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/gsm_data.c:861 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) MS Power level update requested: 15 dBm
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/gsm_data.c:893 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) MS Power level update (power class 0): 0 -> 7
DCHAN INFO <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:626 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) Activation requested: FOR_MS_CHANNEL_REQUEST voice=no MGW-ci=none type=SDCCH tch-mode=SIGNALLING encr-alg=A5/0 ck=none
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/timeslot_fsm.c:106 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: Received Event LCHAN_EV_TS_READY
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:644 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: state_chg to WAIT_ACTIV_ACK
DRSL DEBUG <0003> ../../../git/src/osmo-bsc/abis_rsl.c:476 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4,state=IN_USE) Tx RSL Channel Activate with act_type=INITIAL
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1152 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: (type=SDCCH) Rx CHAN_ACTIV_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1164 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: Received Event LCHAN_EV_RSL_CHAN_ACTIV_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:772 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: (type=SDCCH) Tx RR Immediate Assignment
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:822 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: state_chg to WAIT_RLL_RTP_ESTABLISH
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1899 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: (type=SDCCH) SAPI=0 ESTABLISH INDICATION
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1932 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: Received Event LCHAN_EV_RLL_ESTABLISH_IND
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:850 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: state_chg to ESTABLISHED
DRSL ERROR <0003> ../../../git/src/osmo-bsc/gsm_08_08.c:483 MM GSM48_MT_MM_LOC_UPD_REQUEST: IMSI-001010000000001: No suitable MSC for this Complete Layer 3 request found
DRSL DEBUG <0003> ../../../git/src/osmo-bsc/abis_rsl.c:644 (bts=0,trx=0,ts=0,ss=0) DEACTivate SACCH CMD
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1595 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{ESTABLISHED}: state_chg to WAIT_RF_RELEASE_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/bsc_subscr_conn_fsm.c:778 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: conn SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0] detaches lchan (primary lchan)
DMSC ERROR <0007> ../../../git/src/osmo-bsc/bsc_subscr_conn_fsm.c:153 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0]{INIT}: Unable to deliver BSSMAP Clear Request message, no MSC for this conn
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1644 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: lchan detaches from conn SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0]
DLINP ERROR <0017> ../../git/src/input/ipaccess.c:412 Bad signalling message, sign_link returned error: No such file or directory.
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1152 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: (type=SDCCH) Rx RF_CHAN_REL_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1184 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: Received Event LCHAN_EV_RSL_RF_CHAN_REL_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1206 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: state_chg to WAIT_AFTER_ERROR
DLSS7 ERROR <0021> ../../git/src/m3ua.c:507 XUA_AS(as-clnt-A-0-m3ua)[0x253d70]{AS_DOWN}: Event AS-TRANSFER.req not permitted
DCHAN DEBUG <000f> ../../git/src/fsm.c:322 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_AFTER_ERROR}: Timeout of X3111
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1550 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_AFTER_ERROR}: state_chg to UNUSED
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:382 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) Clearing lchan state
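
The lchan life cycle in the log above can be pulled out mechanically. A minimal Python sketch, assuming log lines in the format shown here; the helper name `lchan_states` is made up for illustration, and the sample lines have their source paths shortened:

```python
import re

# Matches the target state in lines such as
# "...{UNUSED}: state_chg to WAIT_TS_READY"
STATE_CHG = re.compile(r"state_chg to (\w+)")

def lchan_states(log_text):
    """Return the FSM states entered, in the order they appear in the log."""
    return STATE_CHG.findall(log_text)

# Excerpt of the log above (paths shortened for readability).
sample = """\
DCHAN DEBUG <000f> lchan_fsm.c:548 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: state_chg to WAIT_TS_READY
DCHAN DEBUG <000f> lchan_fsm.c:850 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: state_chg to ESTABLISHED
DRSL ERROR <0003> gsm_08_08.c:483 MM GSM48_MT_MM_LOC_UPD_REQUEST: IMSI-001010000000001: No suitable MSC for this Complete Layer 3 request found
DCHAN DEBUG <000f> lchan_fsm.c:1595 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{ESTABLISHED}: state_chg to WAIT_RF_RELEASE_ACK
"""

print(lchan_states(sample))
```

On the full log this makes the jump from ESTABLISHED straight to WAIT_RF_RELEASE_ACK easy to spot, right after the "No suitable MSC" error.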

We should at the very minimum send a proper RR CHANNEL RELEASE to the MS, so it knows the channel is closed. That's the least we can do. However, it also means that the MS will likely start re-trying again and again. After all, it did not receive any reject with a cause value that would tell it to back off.

So I think we should actually either not hard-fast-close at all and let things run into some timeout (send an SCCP connect request, but then no response is received?), or make the effort of spoofing a LU REJECT / CM SERVICE REJECT with a suitable Reject Cause IE ("Network Failure" or "Service option temporarily out of order" look good to me).
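
For reference, the two spoofed rejects discussed here are very small messages. A Python sketch of their raw layer 3 bytes, assuming the 3GPP TS 24.008 encodings (MM protocol discriminator 0x05, message types 0x04 and 0x21, reject causes #17 and #34). The function names are made up for illustration, and this is not how osmo-bsc builds its messages:

```python
MM_PD = 0x05                            # protocol discriminator: mobility management
MT_LOC_UPD_REJECT = 0x04                # Location Updating Reject
MT_CM_SERV_REJECT = 0x21                # CM Service Reject
CAUSE_NETWORK_FAILURE = 0x11            # reject cause #17
CAUSE_SRV_OPT_TMP_OUT_OF_ORDER = 0x22   # reject cause #34

def lu_reject(cause=CAUSE_NETWORK_FAILURE):
    """PD + message type + one-octet Reject Cause."""
    return bytes([MM_PD, MT_LOC_UPD_REJECT, cause])

def cm_service_reject(cause=CAUSE_SRV_OPT_TMP_OUT_OF_ORDER):
    """PD + message type + one-octet Reject Cause."""
    return bytes([MM_PD, MT_CM_SERV_REJECT, cause])

print(lu_reject().hex())          # 050411
print(cm_service_reject().hex())  # 052122
```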

Another oddity about this problem is that it only shows if the MSC is absent when the BSC starts up. If the BSC has once seen the MSC, then it considers it 'eligible'. Even if the MSC then is gone at a later point, it will not go back to this hard-fast-clear behavior (at least not quickly?).


Related issues

Related to OsmocomBB - Bug #4829: OsmocomBB Rx bit errors in dedicated mode (Stalled, laforge, 10/24/2020)
Related to Cellular Network Infrastructure - Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnected (Resolved, neels, 08/11/2020)
Related to OsmoBSC - Bug #5337: ttcn3-bsc-test: leaked struct bsc_subscr in BSC_Tests.TC_no_msc (Resolved, fixeria, 12/05/2021)

Actions #1

Updated by laforge over 3 years ago

  • Related to Bug #4829: OsmocomBB Rx bit errors in dedicated mode added
Actions #2

Updated by laforge over 3 years ago

  • Priority changed from Normal to High
Actions #3

Updated by neels over 3 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 90
Actions #4

Updated by neels over 3 years ago

  • % Done changed from 90 to 50

above patches add an RR Release on the lchan, still need to address the remaining aspects

Actions #5

Updated by neels over 3 years ago

Another oddity about this problem is that it only shows if the MSC is absent when the BSC starts up. If the BSC has once seen the MSC, then it considers it 'eligible'. Even if the MSC then is gone at a later point, it will not go back to this hard-fast-clear behavior (at least not quickly?).

This is not exactly related to this specific case, rather refer to #4701

Actions #6

Updated by neels over 3 years ago

  • Related to Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnected added
Actions #7

Updated by neels over 3 years ago

tested just now the behavior of a Samsung Galaxy S4m:

  • when letting the LU Request time out:
    • The lchan stays open for close to 20 seconds.
    • The MS launches the next LU attempt about 15 seconds later.
  • when spoofing a Location Updating Reject:
    • The lchan stays open for 0.33 seconds.
    • The MS launches the next LU attempt about 15 seconds later.

No matter whether we spoof a reject, plain release or let the LU time out, the Galaxy S4m always retries after roughly 15 seconds.
Letting the LU time out is bad because it occupies the lchan for a long time.
Spoofing a reject doesn't affect RF load by the MS retrying (at least not for the Galaxy S4m).

Conclusion from these tests would be to just properly release the lchan and not care about spoofing a reject.
So above patch to simply add an RR Release should do it.

For later reference, attaching the patch that I used to test the spoofing behavior, and a pcap showing it in operation.

Actions #8

Updated by neels over 3 years ago

out of curiosity, also tried just sending a LU Reject without releasing the lchan:
  • lchan stays open for about 10 seconds.
  • MS launches next LU attempt about 15 seconds later.
Actions #9

Updated by fixeria over 3 years ago

While looking at the capture you attached, I stumbled upon packet number 13 "(RR) Channel Release" with cause "Unknown (32)". I checked in 3GPP TS 44.018, section 10.5.2.31 "RR Cause", and yes, Wireshark is right - there is no such cause value. Where is this value coming from?

Actions #10

Updated by laforge over 3 years ago

For a LU REJECT with cause "network failure" (which likely is the right cause here), T3211 applies and that is a 15s timer, indeed. We could send a "permanent" cause, but that would be wrong.

No matter whether we spoof a reject, plain release or let the LU time out, the Galaxy S4m always retries after roughly 15 seconds.

Spoofing a reject doesn't affect RF load by the MS retrying (at least not for the Galaxy S4m).

If we wanted to make the MS back off for a longer time, the LU reject would need to include the optional T3246 ("Extended wait time") IE.

Actions #11

Updated by laforge about 3 years ago

  • Status changed from Feedback to Stalled
  • Assignee changed from laforge to neels
  • Priority changed from High to Low

I think in order to avoid signaling overload (every single MS retrying every 15s), we should spoof a LU REJECT with the T3246 "extended wait time" IE. The value we send probably should scale with either the amount of time the MSC is already unreachable, or the rate of LU REQ we get, so we have some kind of back-off.

This could also be done in two steps:
  1. start by spoofing a LU REJECT with a fixed back-off
  2. then adjust that back-off automatically.
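
To make the two steps concrete, here is a rough Python sketch of how such a back-off value could be picked. Everything in it is an assumption for illustration: the function name, the 60 s fixed value, the scaling factors and the 1800 s ceiling are invented, and taking the larger of the two scaled values is just one possible reading of "either ... or". Encoding the result into the T3246 IE is not covered.

```python
FIXED_BACKOFF_S = 60  # hypothetical step-1 value: one fixed back-off for everyone

def backoff_s(msc_down_s, lu_req_per_s, floor_s=FIXED_BACKOFF_S, ceil_s=1800):
    """Step 2: scale the back-off with the outage duration and the LU REQ rate.

    All factors and limits are made up; they only show the shape of the idea.
    """
    by_outage = msc_down_s / 2     # the longer the MSC is gone, the longer the wait
    by_load = lu_req_per_s * 30    # the more LU REQ arrive, the longer the wait
    return int(min(max(floor_s, by_outage, by_load), ceil_s))

print(backoff_s(0, 0))        # MSC just vanished, no load: falls back to the fixed value
print(backoff_s(600, 1))      # 10 minutes of outage dominates
print(backoff_s(100, 5))      # LU REQ rate dominates
print(backoff_s(100000, 0))   # clamped to the ceiling
```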

re-prioritizing as low.

Actions #12

Updated by fixeria almost 3 years ago

  • % Done changed from 90 to 20

Today I also faced this problem while testing some TRXDv2 related changes (fortunately, I immediately remembered this ticket). I thought we're at least sending the RR Channel Release to the MS if the MSC is not available, but in fact we do not. The problem is that osmo-bsc sends RSL RF Channel Release immediately after sending RSL DATA REQuest with the RR Channel Release, so osmo-bts simply has no time to transmit this message. I guess we should wait until T3109 expires before deactivating the channel?
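
The ordering problem can be shown with a toy model in Python. This is only a sketch: the event names are shorthand, both timing constants are invented, and `WAIT_S` merely stands in for whatever timer ends up guarding the release (T3109 is the suggestion above; its value is not assumed here).

```python
def ms_received(events):
    """Toy model: replay (time_s, event) pairs in time order and return the
    downlink messages that went out while the lchan was still active."""
    active = True
    received = []
    for _time_s, event in sorted(events):
        if event == "RSL_RF_CHAN_REL":
            active = False          # BTS stops transmitting on this lchan
        elif active:
            received.append(event)  # message made it onto the air interface
    return received

# All timings are invented for illustration only.
TX_DELAY_S = 0.25   # hypothetical time until the BTS has sent the L3 message
WAIT_S = 2.0        # hypothetical wait before deactivating the channel

# Current behavior: RF Channel Release right after the DATA REQuest.
immediate = [(TX_DELAY_S, "RR_CHAN_REL"), (0.0, "RSL_RF_CHAN_REL")]
# Proposed behavior: hold the RF Channel Release back for a while.
delayed = [(TX_DELAY_S, "RR_CHAN_REL"), (WAIT_S, "RSL_RF_CHAN_REL")]

print(ms_received(immediate))  # []
print(ms_received(delayed))    # ['RR_CHAN_REL']
```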

Actions #13

Updated by fixeria over 2 years ago

  • Related to Bug #5337: ttcn3-bsc-test: leaked struct bsc_subscr in BSC_Tests.TC_no_msc added
Actions #14

Updated by Hoernchen over 1 year ago

I just stumbled upon this because there was - allegedly - no msc available, which led to a lot of weird downlink errors on the ms side due to a disappearing channel...
