Bug #6426
openttcn3-msc-test: TC_mm_id_resp_no_identity fails sporadically
0%
Description
This testcase was contributed back in 2019 (during 36C3), as part of #4340. And ever since it's failing sporadically:
Timeout waiting for LU reject MSC_Tests.ttcn:7453 MSC_Tests control part MSC_Tests.ttcn:6653 TC_mm_id_resp_no_identity testcase
Files
Related issues
Updated by fixeria 27 days ago
Looking at the PCAP of a failing scenario (attached), I do see the (MM) Location Updating Request
. However, it takes osmo-msc ~5 seconds to react:
210 1.706794 193 → 185 BSSAP 124 UDT (BSSMAP) Reset 271 1.707316 185 → 193 BSSAP 128 SACK (Ack=2, Arwnd=106496) UDT (BSSMAP) Reset Acknowledge 273 1.710793 193 → 185 BSSAP 168 SACK (Ack=4, Arwnd=106496) CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) Location Updating Request 394 1.711241 185 → 193 BSSAP 108 DT1 (DTAP) (MM) Identity Request 397 1.714030 193 → 185 BSSAP 116 DT1 (DTAP) (MM) Identity Response 482 6.714118 185 → 193 BSSAP 108 DT1 (DTAP) (MM) Location Updating Reject 484 6.714229 185 → 193 BSSAP 108 DT1 (BSSMAP) Clear Command
The delay is related to a timeout expiry in osmo-msc:
429 6.712440 172.18.13.10 → 172.18.13.103 GSMTAP 229 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: Timeout of X1 430 6.712500 172.18.13.10 → 172.18.13.103 GSMTAP 248 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: Received Event MSC_A_EV_CN_CLOSE 431 6.712523 172.18.13.10 → 172.18.13.103 GSMTAP 239 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: - lu: now used by 0 (-) 432 6.712543 172.18.13.10 → 172.18.13.103 GSMTAP 246 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: Received Event MSC_A_EV_UNUSED 433 6.712574 172.18.13.10 → 172.18.13.103 GSMTAP 260 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_AUTH_CIPH}: State change to MSC_A_ST_RELEASING (X2, 30s) 434 6.712590 172.18.13.10 → 172.18.13.103 GSMTAP 245 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_RELEASING}: Releasing: msc_a use is 0 (-) 435 6.712607 172.18.13.10 → 172.18.13.103 GSMTAP 254 VLR subscr unknown + msc_a_fsm_releasing_onenter: now used by 2 (active-conn,msc_a_fsm_releasing_onenter) 436 6.712624 172.18.13.10 → 172.18.13.103 GSMTAP 284 VLR subscr unknown + vlr_subscr_cancel_attach_fsm: now used by 3 (active-conn,msc_a_fsm_releasing_onenter,vlr_subscr_cancel_attach_fsm) 437 6.712649 172.18.13.10 → 172.18.13.103 GSMTAP 240 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_RELEASING}: LOCATION UPDATING REJECT 438 6.712669 172.18.13.10 → 172.18.13.103 GSMTAP 259 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_RELEASING}: Sending DTAP: MM GSM48_MT_MM_LOC_UPD_REJECT 439 6.712684 172.18.13.10 → 172.18.13.103 GSMTAP 240 msc_a(unknown:GERAN-A-290:LU)[0x55f81ea00720]{MSC_A_ST_RELEASING}: RAN encode: BSSMAP: DTAP
Let's now look into ttcn3-msc-test, specifically f_expect_lu_reject()
:
919 function f_expect_lu_reject(template OCT1 cause := ?) runs on BSC_ConnHdlr {
920 var PDU_DTAP_MT dtap_mt;
921 timer T := 5.0;
922
923 T.start;
924 alt {
925 [] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_LU_Rej(cause))) {
926 setverdict(pass);
927 }
928 [] BSSAP.receive(tr_PDU_DTAP_MT(?)) -> value dtap_mt {
929 setverdict(fail, "Expected LU reject BSSAP message, got: ", dtap_mt);
930 }
931 [] T.timeout {
932 setverdict(fail, "Timeout waiting for LU reject");
933 }
934 }
935 }
Bingo! It's a race condition:
- it takes osmo-msc exactly 5 seconds to send LU REJECT,
- the testsuite is expecting the LU REJECT to arrive within 5 seconds.
Updated by fixeria 27 days ago
- Status changed from New to Feedback
We should ideally fix osmo-msc to react on malformed MM Identity Response
immediately. Here is some draft code:
diff --git a/src/libmsc/gsm_04_08.c b/src/libmsc/gsm_04_08.c
index 17350faad..4f249ce56 100644
--- a/src/libmsc/gsm_04_08.c
+++ b/src/libmsc/gsm_04_08.c
@@ -1230,6 +1230,22 @@ int gsm0408_rcv_mm(struct msc_a *msc_a, struct msgb *msg)
break;
case GSM48_MT_MM_ID_RESP:
rc = mm_rx_id_resp(msc_a, msg);
+ if (rc == 0)
+ break;
+ switch (msc_a->complete_layer3_type) {
+ case COMPLETE_LAYER3_LU:
+ gsm0408_loc_upd_rej(msc_a, GSM48_REJECT_INCORRECT_MESSAGE);
+ break;
+ case COMPLETE_LAYER3_CM_SERVICE_REQ:
+ case COMPLETE_LAYER3_CM_RE_ESTABLISH_REQ:
+ msc_gsm48_tx_mm_serv_rej(msc_a, GSM48_REJECT_INCORRECT_MESSAGE);
+ break;
+ case COMPLETE_LAYER3_PAGING_RESP:
+ case COMPLETE_LAYER3_NONE:
+ default:
+ msc_a_release_cn(msc_a, GSM_CAUSE_NET_FAIL);
+ break;
+ }
break;
case GSM48_MT_MM_CM_SERV_REQ:
rc = gsm48_rx_mm_serv_req(msc_a, msg);
For now let's enlarge the timeout:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36502 msc: fix race condition in TC_mm_id_resp_no_identity
Updated by neels 26 days ago
fixeria wrote in #note-3:
We should ideally fix osmo-msc to react on malformed
MM Identity Response
immediately. Here is some draft code:
ack, it shouldn't wait for the timeout on incorrect data.
Your draft patch pivots on the wrong point though: it just sends the reject message without updating the internal state.
So instead, you should make msc_a fail to authenticate, to properly tear down the conn immediately.
Otherwise we respond with a reject, but still keep the SCCP conn open for 5 seconds.
The API intended for this is msc_a_release_cn() = release the conn because the CN wants to.
The idea was that this should also trigger all pending rejection messages, but am not sure that it actually does.
All in all this not very critical.
One might even argue that it's a feature: if someone is sending malformed MI, then they deserve to wait 5 seconds =)