Project

General

Profile

Bug #4832

osmo-bsc hard-releases lchan if no MSC is found

Added by laforge about 1 month ago. Updated about 1 month ago.

Status:
Feedback
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
10/25/2020
Due date:
% Done:

90%

Spec Reference:

Description

As described in #4829:

So.... the problem is something else altogether, or at least a large part of the problem.

For some reason, osmo-bsc these days immediately hard-releases the lchan if there is no MSC. There is no signaling sent back to the MS about this:
  • no spoofing of MM/CM (LU REJECT, CM SERV REJECT, ...) which is still understandable, as it's a layering violation and those messages normally originat e at the MSC
  • No RR CHANNEL RELEASE is sent to the MS. That is a big fat bug.

Instead, we simply immedaitely close the lchan on the BTS side. This of course means that from that point onwards there re only bit-errors in downlink in the UE.

DRSL INFO <0003> ../../../git/src/osmo-bsc/abis_rsl.c:1453 (bts=0) CHAN RQD: reason: Location updating (ra=0x0f, neci=0x01, chreq_reason=0x03)
DCHAN INFO <000f> ../../../git/src/osmo-bsc/lchan_select.c:247 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) Selected
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1657 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) MS: Channel Request: reason=LOCATION_UPDATE ra=0x0f ta=0
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:329 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: Received Event LCHAN_EV_ACTIVATE
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:548 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: state_chg to WAIT_TS_READY
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/gsm_data.c:861 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) MS Power level update requested: 15 dBm
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/gsm_data.c:893 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) MS Power level update (power class 0): 0 -> 7
DCHAN INFO <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:626 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) Activation requested: FOR_MS_CHANNEL_REQUEST voice=no MGW-ci=none type=SDCCH tch-mode=SIGNALLING encr-alg=A5/0 ck=none
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/timeslot_fsm.c:106 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: Received Event LCHAN_EV_TS_READY
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:644 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: state_chg to WAIT_ACTIV_ACK
DRSL DEBUG <0003> ../../../git/src/osmo-bsc/abis_rsl.c:476 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4,state=IN_USE) Tx RSL Channel Activate with act_type=INITIAL
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1152 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: (type=SDCCH) Rx CHAN_ACTIV_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1164 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: Received Event LCHAN_EV_RSL_CHAN_ACTIV_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:772 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: (type=SDCCH) Tx RR Immediate Assignment
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:822 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: state_chg to WAIT_RLL_RTP_ESTABLISH
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1899 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: (type=SDCCH) SAPI=0 ESTABLISH INDICATION
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1932 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: Received Event LCHAN_EV_RLL_ESTABLISH_IND
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:850 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: state_chg to ESTABLISHED
DRSL ERROR <0003> ../../../git/src/osmo-bsc/gsm_08_08.c:483 MM GSM48_MT_MM_LOC_UPD_REQUEST: IMSI-001010000000001: No suitable MSC for this Complete Layer 3 request found
DRSL DEBUG <0003> ../../../git/src/osmo-bsc/abis_rsl.c:644 (bts=0,trx=0,ts=0,ss=0) DEACTivate SACCH CMD
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1595 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{ESTABLISHED}: state_chg to WAIT_RF_RELEASE_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/bsc_subscr_conn_fsm.c:778 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: conn SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0] detaches lchan (primary lchan)
DMSC ERROR <0007> ../../../git/src/osmo-bsc/bsc_subscr_conn_fsm.c:153 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0]{INIT}: Unable to deliver BSSMAP Clear Request message, no MSC for this conn
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1644 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: lchan detaches from conn SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0]
DLINP ERROR <0017> ../../git/src/input/ipaccess.c:412 Bad signalling message, sign_link returned error: No such file or directory.
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1152 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: (type=SDCCH) Rx RF_CHAN_REL_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1184 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: Received Event LCHAN_EV_RSL_RF_CHAN_REL_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1206 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: state_chg to WAIT_AFTER_ERROR
DLSS7 ERROR <0021> ../../git/src/m3ua.c:507 XUA_AS(as-clnt-A-0-m3ua)[0x253d70]{AS_DOWN}: Event AS-TRANSFER.req not permitted
DCHAN DEBUG <000f> ../../git/src/fsm.c:322 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_AFTER_ERROR}: Timeout of X3111
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1550 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_AFTER_ERROR}: state_chg to UNUSED
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:382 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) Clearing lchan state

We should at the very minimum send a proper RR CHANNEL RELEASE to the MS, so it knows the channel is closed. That's the least we can do. However, it also means that the MS will likely start re-trying again and again. After all, it did not receive any reject with a cause value that would tell it to back off.

So I think we actually should either simply not hard-fast-close but let things run into some timeout (send SCCP connect request but then no response received?), or do the effort of spoofing a LU REJECT / CM SERVICE REJECt with a suitable Reject Cause IE ("Network Failure" or "Service option temporarily out of order" look good to me).

Another oddity about this problem is that it only shows if the MSC is absent when the BSC starts up. If the BSC has once seen the MSC, then it considers it 'eligible'. Even if the MSC then is gone at a later point, it will not go back to this hard-fast-clear behavior (at least not quickly?).


Related issues

Related to OsmocomBB - Bug #4829: OsmocomBB Rx bit errors in dedicated modeIn Progress10/24/2020

Related to Cellular Network Infrastructure - Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnectedNew08/11/2020

History

#1 Updated by laforge about 1 month ago

  • Related to Bug #4829: OsmocomBB Rx bit errors in dedicated mode added

#2 Updated by laforge about 1 month ago

  • Priority changed from Normal to High

#3 Updated by neels about 1 month ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 90

#4 Updated by neels about 1 month ago

  • % Done changed from 90 to 50

above patches add an RR Release on the lchan, still need to address the remaining aspects

#5 Updated by neels about 1 month ago

Another oddity about this problem is that it only shows if the MSC is absent when the BSC starts up. If the BSC has once seen the MSC, then it considers it 'eligible'. Even if the MSC then is gone at a later point, it will not go back to this hard-fast-clear behavior (at least not quickly?).

This is not exactly related to this specific case, rather refer to #4701

#6 Updated by neels about 1 month ago

  • Related to Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnected added

#7 Updated by neels about 1 month ago

tested just now the behavior of a Samsung Galaxy S4m:

  • letting the LU Request time out:
    • The lchan stays open for close to 20 seconds.
    • The MS launches the next LU attempt about 15 seconds later.
  • when spoofing a Location Updating Reject
    • lchan stays open for 0.33 seconds
    • The MS launches the next LU attempt about 15 seconds later.

No matter whether we spoof a reject, plain release or let the LU time out, the Galaxy S4m always retries after roughly 15 seconds.
Letting the LU time out is bad because it occupies the lchan for a long time.
Spoofing a reject doesn't affect RF load by the MS retrying (at least not for the Galaxy S4m).

Conclusion from these tests would be to just properly release the lchan and not care about spoofing a reject.
So above patch to simply add an RR Release should do it.

For later reference, attaching the patch that I used to test the spoofing behavior, and a pcap showing it in operation.

#8 Updated by neels about 1 month ago

out of curiosity, also tried just sending a LU Reject without releasing the lchan:
  • lchan stays open for about 10 seconds.
  • MS launches next LU attempt about 15 seconds later.

#9 Updated by fixeria about 1 month ago

While looking at the capture you attached, I stumbled upon packet number 13 "(RR) Channel Release" with cause "Unknown (32)". I checked in 3GPP TS 44.018, section 10.5.2.31 "RR Cause", and yes, Wireshark is right - there is no such cause value. Where this value is coming from?

#10 Updated by laforge about 1 month ago

For a LU REJECT with cause "network failure" (which likely is the right
cause here), T3211 applies and that is a 15s timer, indeed. We could
send a "permanent" cause, but that would be wrong.

No matter whether we spoof a reject, plain release or let the LU time out, the Galaxy S4m always retries after roughly 15 seconds.

Spoofing a reject doesn't affect RF load by the MS retrying (at least not for the Galaxy S4m).

If we wanted to make the MS back off for a longer time, the LU reject would need to
include the optional T3246 ("Extended wait time") IE.

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)