Project

General

Profile

Actions

Bug #6426

open

ttcn3-msc-test: TC_mm_id_resp_no_identity fails sporadically

Added by fixeria 13 days ago. Updated 12 days ago.

Status:
Feedback
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
04/02/2024
Due date:
% Done:

0%

Resolution:
Spec Reference:
Tags:

Description

This testcase was contributed back in 2019 (during 36C3), as part of #4340. And ever since it's failing sporadically:

Timeout waiting for LU reject
      MSC_Tests.ttcn:7453 MSC_Tests control part
      MSC_Tests.ttcn:6653 TC_mm_id_resp_no_identity testcase

Files


Related issues

Related to OsmoMSC - Bug #4340: Malformed MM Identity Response crashes OsmoMSCResolvedfixeria12/28/2019

Actions
Related to OsmoMSC - Bug #6427: ttcn3-msc-test: TC_mo_ussd_for_unknown_trans fails sporadicallyNew04/02/2024

Actions
Actions #1

Updated by fixeria 13 days ago

  • Related to Bug #4340: Malformed MM Identity Response crashes OsmoMSC added
Actions #2

Updated by fixeria 13 days ago

Looking at the PCAP of a failing scenario (attached), I do see the (MM) Location Updating Request. However, it takes osmo-msc ~5 seconds to react:

  210   1.706794          193 → 185          BSSAP 124 UDT (BSSMAP) Reset 
  271   1.707316          185 → 193          BSSAP 128 SACK (Ack=2, Arwnd=106496) UDT (BSSMAP) Reset Acknowledge 
  273   1.710793          193 → 185          BSSAP 168 SACK (Ack=4, Arwnd=106496) CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) Location Updating Request 
  394   1.711241          185 → 193          BSSAP 108 DT1 (DTAP) (MM) Identity Request 
  397   1.714030          193 → 185          BSSAP 116 DT1 (DTAP) (MM) Identity Response 
  482   6.714118          185 → 193          BSSAP 108 DT1 (DTAP) (MM) Location Updating Reject 
  484   6.714229          185 → 193          BSSAP 108 DT1 (BSSMAP) Clear Command 

The delay is related to a timeout expiry in osmo-msc:

  429   6.712440 172.18.13.10 → 172.18.13.103 GSMTAP 229 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: Timeout of X1 
  430   6.712500 172.18.13.10 → 172.18.13.103 GSMTAP 248 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: Received Event MSC_A_EV_CN_CLOSE 
  431   6.712523 172.18.13.10 → 172.18.13.103 GSMTAP 239 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: - lu: now used by 0 (-) 
  432   6.712543 172.18.13.10 → 172.18.13.103 GSMTAP 246 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: Received Event MSC_A_EV_UNUSED 
  433   6.712574 172.18.13.10 → 172.18.13.103 GSMTAP 260 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: State change to MSC_A_ST_RELEASING (X2, 30s) 
  434   6.712590 172.18.13.10 → 172.18.13.103 GSMTAP 245 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_RELEASING}: Releasing: msc_a use is 0 (-) 
  435   6.712607 172.18.13.10 → 172.18.13.103 GSMTAP 254 VLR subscr unknown + msc_a_fsm_releasing_onenter: now used by 2 (active-conn,msc_a_fsm_releasing_onenter) 
  436   6.712624 172.18.13.10 → 172.18.13.103 GSMTAP 284 VLR subscr unknown + vlr_subscr_cancel_attach_fsm: now used by 3 (active-conn,msc_a_fsm_releasing_onenter,vlr_subscr_cancel_attach_fsm) 
  437   6.712649 172.18.13.10 → 172.18.13.103 GSMTAP 240 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_RELEASING}: LOCATION UPDATING REJECT 
  438   6.712669 172.18.13.10 → 172.18.13.103 GSMTAP 259 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_RELEASING}: Sending DTAP: MM GSM48_MT_MM_LOC_UPD_REJECT 
  439   6.712684 172.18.13.10 → 172.18.13.103 GSMTAP 240 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_RELEASING}: RAN encode: BSSMAP: DTAP 

Let's now look into ttcn3-msc-test, specifically f_expect_lu_reject():

 919 function f_expect_lu_reject(template OCT1 cause := ?) runs on BSC_ConnHdlr {
 920         var PDU_DTAP_MT dtap_mt;
 921         timer T := 5.0;
 922 
 923         T.start;
 924         alt {
 925         [] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_LU_Rej(cause))) {
 926                 setverdict(pass);
 927                 }
 928         [] BSSAP.receive(tr_PDU_DTAP_MT(?)) -> value dtap_mt {
 929                 setverdict(fail, "Expected LU reject BSSAP message, got: ", dtap_mt);
 930                 }
 931         [] T.timeout {
 932                 setverdict(fail, "Timeout waiting for LU reject");
 933                 }
 934         }
 935 }

Bingo! It's a race condition:

  • it takes osmo-msc exactly 5 seconds to send LU REJECT,
  • the testsuite is expecting the LU REJECT to arrive within 5 seconds.
Actions #3

Updated by fixeria 13 days ago

  • Status changed from New to Feedback

We should ideally fix osmo-msc to react on malformed MM Identity Response immediately. Here is some draft code:

diff --git a/src/libmsc/gsm_04_08.c b/src/libmsc/gsm_04_08.c
index 17350faad..4f249ce56 100644
--- a/src/libmsc/gsm_04_08.c
+++ b/src/libmsc/gsm_04_08.c
@@ -1230,6 +1230,22 @@ int gsm0408_rcv_mm(struct msc_a *msc_a, struct msgb *msg)
                break;
        case GSM48_MT_MM_ID_RESP:
                rc = mm_rx_id_resp(msc_a, msg);
+               if (rc == 0)
+                       break;
+               switch (msc_a->complete_layer3_type) {
+               case COMPLETE_LAYER3_LU:
+                       gsm0408_loc_upd_rej(msc_a, GSM48_REJECT_INCORRECT_MESSAGE);
+                       break;
+               case COMPLETE_LAYER3_CM_SERVICE_REQ:
+               case COMPLETE_LAYER3_CM_RE_ESTABLISH_REQ:
+                       msc_gsm48_tx_mm_serv_rej(msc_a, GSM48_REJECT_INCORRECT_MESSAGE);
+                       break;
+               case COMPLETE_LAYER3_PAGING_RESP:
+               case COMPLETE_LAYER3_NONE:
+               default:
+                       msc_a_release_cn(msc_a, GSM_CAUSE_NET_FAIL);
+                       break;
+               }
                break;
        case GSM48_MT_MM_CM_SERV_REQ:
                rc = gsm48_rx_mm_serv_req(msc_a, msg);

For now let's enlarge the timeout:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36502 msc: fix race condition in TC_mm_id_resp_no_identity

Actions #4

Updated by fixeria 13 days ago

  • Related to Bug #6427: ttcn3-msc-test: TC_mo_ussd_for_unknown_trans fails sporadically added
Actions #5

Updated by neels 12 days ago

fixeria wrote in #note-3:

We should ideally fix osmo-msc to react on malformed MM Identity Response immediately. Here is some draft code:

ack, it shouldn't wait for the timeout on incorrect data.

Your draft patch pivots on the wrong point though: it just sends the reject message without updating the internal state.
So instead, you should make msc_a fail to authenticate, to properly tear down the conn immediately.
Otherwise we respond with a reject, but still keep the SCCP conn open for 5 seconds.
The API intended for this is msc_a_release_cn() = release the conn because the CN wants to.
The idea was that this should also trigger all pending rejection messages, but am not sure that it actually does.

All in all this not very critical.
One might even argue that it's a feature: if someone is sending malformed MI, then they deserve to wait 5 seconds =)

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)