Bug #4832
osmo-bsc hard-releases lchan if no MSC is found
20%
Description
As described in #4829:
So.... the problem is something else altogether, or at least a large part of the problem.
For some reason, osmo-bsc these days immediately hard-releases the lchan if there is no MSC. There is no signaling sent back to the MS about this:
- No spoofing of MM/CM (LU REJECT, CM SERV REJECT, ...), which is still understandable, as it's a layering violation and those messages normally originate at the MSC.
- No RR CHANNEL RELEASE is sent to the MS. That is a big fat bug.
Instead, we simply close the lchan on the BTS side immediately. This of course means that from that point onwards there are only bit errors in the downlink at the UE.
DRSL INFO <0003> ../../../git/src/osmo-bsc/abis_rsl.c:1453 (bts=0) CHAN RQD: reason: Location updating (ra=0x0f, neci=0x01, chreq_reason=0x03)
DCHAN INFO <000f> ../../../git/src/osmo-bsc/lchan_select.c:247 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) Selected
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1657 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) MS: Channel Request: reason=LOCATION_UPDATE ra=0x0f ta=0
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:329 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: Received Event LCHAN_EV_ACTIVATE
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:548 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: state_chg to WAIT_TS_READY
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/gsm_data.c:861 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) MS Power level update requested: 15 dBm
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/gsm_data.c:893 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) MS Power level update (power class 0): 0 -> 7
DCHAN INFO <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:626 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: (type=SDCCH) Activation requested: FOR_MS_CHANNEL_REQUEST voice=no MGW-ci=none type=SDCCH tch-mode=SIGNALLING encr-alg=A5/0 ck=none
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/timeslot_fsm.c:106 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: Received Event LCHAN_EV_TS_READY
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:644 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_TS_READY}: state_chg to WAIT_ACTIV_ACK
DRSL DEBUG <0003> ../../../git/src/osmo-bsc/abis_rsl.c:476 (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4,state=IN_USE) Tx RSL Channel Activate with act_type=INITIAL
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1152 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: (type=SDCCH) Rx CHAN_ACTIV_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1164 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: Received Event LCHAN_EV_RSL_CHAN_ACTIV_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:772 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: (type=SDCCH) Tx RR Immediate Assignment
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:822 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_ACTIV_ACK}: state_chg to WAIT_RLL_RTP_ESTABLISH
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1899 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: (type=SDCCH) SAPI=0 ESTABLISH INDICATION
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1932 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: Received Event LCHAN_EV_RLL_ESTABLISH_IND
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:850 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RLL_RTP_ESTABLISH}: state_chg to ESTABLISHED
DRSL ERROR <0003> ../../../git/src/osmo-bsc/gsm_08_08.c:483 MM GSM48_MT_MM_LOC_UPD_REQUEST: IMSI-001010000000001: No suitable MSC for this Complete Layer 3 request found
DRSL DEBUG <0003> ../../../git/src/osmo-bsc/abis_rsl.c:644 (bts=0,trx=0,ts=0,ss=0) DEACTivate SACCH CMD
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1595 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{ESTABLISHED}: state_chg to WAIT_RF_RELEASE_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/bsc_subscr_conn_fsm.c:778 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: conn SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0] detaches lchan (primary lchan)
DMSC ERROR <0007> ../../../git/src/osmo-bsc/bsc_subscr_conn_fsm.c:153 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0]{INIT}: Unable to deliver BSSMAP Clear Request message, no MSC for this conn
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1644 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: lchan detaches from conn SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-001010000000001)[0x28adc0]
DLINP ERROR <0017> ../../git/src/input/ipaccess.c:412 Bad signalling message, sign_link returned error: No such file or directory.
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1152 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: (type=SDCCH) Rx RF_CHAN_REL_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/abis_rsl.c:1184 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: Received Event LCHAN_EV_RSL_RF_CHAN_REL_ACK
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1206 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_RF_RELEASE_ACK}: state_chg to WAIT_AFTER_ERROR
DLSS7 ERROR <0021> ../../git/src/m3ua.c:507 XUA_AS(as-clnt-A-0-m3ua)[0x253d70]{AS_DOWN}: Event AS-TRANSFER.req not permitted
DCHAN DEBUG <000f> ../../git/src/fsm.c:322 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_AFTER_ERROR}: Timeout of X3111
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:1550 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{WAIT_AFTER_ERROR}: state_chg to UNUSED
DCHAN DEBUG <000f> ../../../git/src/osmo-bsc/lchan_fsm.c:382 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]{UNUSED}: (type=SDCCH) Clearing lchan state
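The lchan state transitions in the log above can be pulled out mechanically, which makes the abrupt release easier to spot. This is a minimal Python sketch, assuming the log text is available as a string; the helper name `lchan_transitions` and the shortened `sample` are illustrative and not part of osmo-bsc.

```python
import re

# Matches e.g. "{ESTABLISHED}: state_chg to WAIT_RF_RELEASE_ACK"
STATE_CHG = re.compile(r"\{(\w+)\}: state_chg to (\w+)")

def lchan_transitions(log_text):
    """Return (from_state, to_state) pairs in the order they appear."""
    return STATE_CHG.findall(log_text)

# Shortened excerpt of the log above (source paths trimmed for brevity).
sample = (
    "lchan_fsm.c:850 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]"
    "{WAIT_RLL_RTP_ESTABLISH}: state_chg to ESTABLISHED "
    "gsm_08_08.c:483 MM GSM48_MT_MM_LOC_UPD_REQUEST: IMSI-001010000000001: "
    "No suitable MSC for this Complete Layer 3 request found "
    "lchan_fsm.c:1595 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]"
    "{ESTABLISHED}: state_chg to WAIT_RF_RELEASE_ACK "
    "lchan_fsm.c:1206 lchan(0-0-0-CCCH_SDCCH4-0)[0x28a0d0]"
    "{WAIT_RF_RELEASE_ACK}: state_chg to WAIT_AFTER_ERROR"
)

for old, new in lchan_transitions(sample):
    print(f"{old} -> {new}")
```

In this excerpt the lchan goes straight from ESTABLISHED to WAIT_RF_RELEASE_ACK right after the "No suitable MSC" error, and the log shows no RR Channel Release being sent in between.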
We should at the very minimum send a proper RR CHANNEL RELEASE to the MS, so it knows the channel is closed. That's the least we can do. However, it also means that the MS will likely start re-trying again and again. After all, it did not receive any reject with a cause value that would tell it to back off.
So I think we should either not hard-fast-close but let things run into some timeout (send an SCCP connection request, but then no response is received?), or make the effort of spoofing a LU REJECT / CM SERVICE REJECT with a suitable Reject Cause IE ("Network Failure" or "Service option temporarily out of order" look good to me).
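For illustration, the spoofed reject messages mentioned above are tiny on the wire: a protocol discriminator octet, a message type octet and a one-octet reject cause. The Python sketch below assembles them from the 3GPP TS 24.008 code points (the same values libosmocore exposes as constants); the helper `mm_reject` is hypothetical and is not how osmo-bsc or the attached patch builds the message.

```python
# Code points from 3GPP TS 24.008; the Python names here are illustrative.
PDISC_MM = 0x05                 # Mobility Management protocol discriminator
MT_MM_LOC_UPD_REJECT = 0x04     # LOCATION UPDATING REJECT
MT_MM_CM_SERV_REJ = 0x22        # CM SERVICE REJECT
CAUSE_NETWORK_FAILURE = 0x11            # reject cause #17
CAUSE_SRV_OPT_TMP_OUT_OF_ORDER = 0x22   # reject cause #34

def mm_reject(msg_type, cause):
    """Encode a network-to-MS MM reject: PD/skip indicator, type, cause."""
    return bytes([PDISC_MM, msg_type, cause])

lu_rej = mm_reject(MT_MM_LOC_UPD_REJECT, CAUSE_NETWORK_FAILURE)
cm_rej = mm_reject(MT_MM_CM_SERV_REJ, CAUSE_SRV_OPT_TMP_OUT_OF_ORDER)
print(lu_rej.hex(), cm_rej.hex())
```

Which message to send depends on what the MS asked for in its Complete Layer 3 message, so a BSC doing this would have to peek at the MM message type first, which is exactly the layering violation discussed above.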
Another oddity about this problem is that it only shows if the MSC is absent when the BSC starts up. If the BSC has once seen the MSC, then it considers it 'eligible'. Even if the MSC then is gone at a later point, it will not go back to this hard-fast-clear behavior (at least not quickly?).
Files
Related issues
Updated by laforge over 3 years ago
- Related to Bug #4829: OsmocomBB Rx bit errors in dedicated mode added
Updated by neels over 3 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 90
Updated by neels over 3 years ago
- % Done changed from 90 to 50
The above patches add an RR Release on the lchan; the remaining aspects still need to be addressed.
Updated by neels over 3 years ago
Another oddity about this problem is that it only shows if the MSC is absent when the BSC starts up. If the BSC has once seen the MSC, then it considers it 'eligible'. Even if the MSC then is gone at a later point, it will not go back to this hard-fast-clear behavior (at least not quickly?).
This is not specific to this case; for that aspect, refer to #4701.
Updated by neels over 3 years ago
- Related to Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnected added
Updated by neels over 3 years ago
- File 0001-spoof-LU-reject-when-there-is-no-MSC.patch added
- File spoof-LU-reject-when-there-is-no-MSC.pcapng added
- Status changed from In Progress to Feedback
- Assignee changed from neels to laforge
- % Done changed from 50 to 90
I just tested the behavior of a Samsung Galaxy S4m:
- letting the LU Request time out:
  - The lchan stays open for close to 20 seconds.
  - The MS launches the next LU attempt about 15 seconds later.
- releasing immediately (including RR Release with above fix https://gerrit.osmocom.org/c/osmo-bsc/+/20949 ):
  - The lchan stays open for 0.33 seconds.
  - The MS launches the next LU attempt about 15 seconds later.
- when spoofing a Location Updating Reject:
  - The lchan stays open for 0.33 seconds.
  - The MS launches the next LU attempt about 15 seconds later.
No matter whether we spoof a reject, plain release or let the LU time out, the Galaxy S4m always retries after roughly 15 seconds.
Letting the LU time out is bad because it occupies the lchan for a long time.
Spoofing a reject doesn't affect RF load by the MS retrying (at least not for the Galaxy S4m).
Conclusion from these tests would be to just properly release the lchan and not care about spoofing a reject.
So above patch to simply add an RR Release should do it.
For later reference, attaching the patch that I used to test the spoofing behavior, and a pcap showing it in operation.
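To put rough numbers on the lchan occupancy argument above, here is a small Python calculation. It assumes the "about 15 seconds later" retry gap is measured from the moment the lchan is released, which the measurements do not state explicitly; the function name is illustrative.

```python
def lchan_occupancy(hold_s, retry_gap_s):
    """Fraction of time a single retrying MS keeps one lchan busy."""
    return hold_s / (hold_s + retry_gap_s)

timeout = lchan_occupancy(20.0, 15.0)   # letting the LU Request time out
release = lchan_occupancy(0.33, 15.0)   # immediate release or spoofed reject

print(f"LU timeout:        {timeout:.1%}")
print(f"immediate release: {release:.1%}")
```

Under that assumption, a single MS stuck in the timeout case keeps an SDCCH busy for more than half of the time, while the immediate release keeps it busy for only a few percent.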
Updated by neels over 3 years ago
- lchan stays open for about 10 seconds.
- MS launches next LU attempt about 15 seconds later.
Updated by fixeria over 3 years ago
While looking at the capture you attached, I stumbled upon packet number 13 "(RR) Channel Release" with cause "Unknown (32)". I checked in 3GPP TS 44.018, section 10.5.2.31 "RR Cause", and yes, Wireshark is right - there is no such cause value. Where is this value coming from?
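A quick way to reproduce that check is a lookup table of the defined RR cause values. The Python sketch below lists the values as I recall them from 3GPP TS 44.018 section 10.5.2.31 (they also appear as `GSM48_RR_CAUSE_*` constants in libosmocore); treat the table as a convenience and verify it against the spec before relying on it.

```python
# RR Cause values, recalled from 3GPP TS 44.018, 10.5.2.31 (verify before use).
RR_CAUSES = {
    0x00: "Normal event",
    0x01: "Abnormal release, unspecified",
    0x02: "Abnormal release, channel unacceptable",
    0x03: "Abnormal release, timer expired",
    0x04: "Abnormal release, no activity on the radio path",
    0x05: "Preemptive release",
    0x06: "UTRAN configuration unknown",
    0x08: "Handover impossible, timing advance out of range",
    0x09: "Channel mode unacceptable",
    0x0A: "Frequency not implemented",
    0x0B: "Originator or talker leaving group call area",
    0x0C: "Lower layer failure",
    0x41: "Call already cleared",
    0x5F: "Semantically incorrect message",
    0x60: "Invalid mandatory information",
    0x61: "Message type non-existent or not implemented",
    0x62: "Message type not compatible with protocol state",
    0x64: "Conditional IE error",
    0x65: "No cell allocation available",
    0x6F: "Protocol error unspecified",
}

def rr_cause_name(value):
    """Name a cause value the way a dissector would, flagging unknown ones."""
    return RR_CAUSES.get(value, f"Unknown ({value})")

print(rr_cause_name(32))
print(rr_cause_name(0))
```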
Updated by laforge over 3 years ago
For a LU REJECT with cause "network failure" (which likely is the right
cause here), T3211 applies and that is a 15s timer, indeed. We could
send a "permanent" cause, but that would be wrong.
No matter whether we spoof a reject, plain release or let the LU time out, the Galaxy S4m always retries after roughly 15 seconds.
Spoofing a reject doesn't affect RF load by the MS retrying (at least not for the Galaxy S4m).
If we wanted to make the MS back off for a longer time, the LU reject would need to
include the optional T3246 ("Extended wait time") IE.
Updated by laforge about 3 years ago
- Status changed from Feedback to Stalled
- Assignee changed from laforge to neels
- Priority changed from High to Low
I think in order to avoid signaling overload (every single MS retrying every 15s), we should spoof a LU REJECT with the T3246 "extended wait time" IE. The value we send probably should scale with either the amount of time the MSC is already unreachable, or the rate of LU REQ we get, so we have some kind of back-off.
This could also be done in two steps:
- start with spoofing LU with a fixed back-off
- adjusting that back-off automatically.
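One possible shape for the second step is sketched below in Python. The policy (double the wait for every full minute the MSC has been unreachable, up to a cap) and all parameter values are purely hypothetical; the ticket does not decide on a formula, and encoding the result into the T3246 value IE is not covered here.

```python
def extended_wait_s(outage_s, base_s=15, cap_s=1800, step_s=60):
    """Hypothetical back-off: double the wait per full step_s of MSC outage.

    outage_s: how long the MSC has been unreachable, in seconds
    base_s:   wait handed out right after the outage starts (step one
              of the plan would simply always return this fixed value)
    cap_s:    upper bound on the wait
    """
    doublings = min(int(outage_s // step_s), 16)  # bound the exponent
    return min(cap_s, base_s * 2 ** doublings)

for outage in (0, 60, 150, 3600):
    print(outage, extended_wait_s(outage))
```

Scaling on the rate of incoming LU REQ instead would need a counter per time window, but the capped doubling could stay the same.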
re-prioritizing as low.
Updated by fixeria almost 3 years ago
- % Done changed from 90 to 20
Today I also faced this problem while testing some TRXDv2 related changes (fortunately, I immediately remembered this ticket). I thought we're at least sending the RR Channel Release to the MS if the MSC is not available, but in fact we do not. The problem is that osmo-bsc sends RSL RF Channel Release immediately after sending RSL DATA REQuest with the RR Channel Release, so osmo-bts simply has no time to transmit this message. I guess we should wait until T3109 expires before deactivating the channel?
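The ordering problem described above can be made concrete with a small timeline model in Python. It follows the general release procedure of 3GPP TS 44.018 (send RR Channel Release and deactivate the SACCH, then wait for the layer 2 release or T3109 before releasing the RF channel). The function, the event labels and the timer values are illustrative assumptions, not osmo-bsc code or its configured defaults.

```python
def release_schedule(t3109_s, rll_release_after_s=None, t3111_s=2.0):
    """Return [(time_s, action)] for a release that gives the MS time to react.

    t3109_s:             how long to wait for the MS to release layer 2
    rll_release_after_s: when the RLL release indication arrives (None = never)
    t3111_s:             guard time after the layer 2 release
    """
    events = [
        (0.0, "RSL DATA REQ: RR Channel Release"),
        (0.0, "RSL DEACTIVATE SACCH"),
    ]
    if rll_release_after_s is not None and rll_release_after_s < t3109_s:
        # The MS saw the release: wait T3111, then free the RF channel.
        events.append((rll_release_after_s, "RLL RELEASE IND"))
        rf_release_at = rll_release_after_s + t3111_s
    else:
        # No reaction from the MS: give up once T3109 expires.
        rf_release_at = t3109_s
    events.append((rf_release_at, "RSL RF CHANNEL RELEASE"))
    return events

for t, action in release_schedule(t3109_s=5.0, rll_release_after_s=0.5):
    print(f"{t:4.1f}s  {action}")
```

The behaviour reported in this comment corresponds to the RF Channel Release being issued at time 0, right behind the DATA REQ, which leaves the BTS no time to transmit the RR Channel Release.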
Updated by fixeria over 2 years ago
- Related to Bug #5337: ttcn3-bsc-test: leaked struct bsc_subscr in BSC_Tests.TC_no_msc added
Updated by Hoernchen over 1 year ago
I just stumbled upon this because there was - allegedly - no msc available, which led to a lot of weird downlink errors on the ms side due to a disappearing channel...