Project

General

Profile

Actions

Bug #6414

closed

ttcn3-msc-test: TC_ho_inter_bsc_a5_[134] are failing with io_uring

Added by fixeria 25 days ago. Updated 13 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
03/22/2024
Due date:
% Done:

100%

Resolution:
Spec Reference:
Tags:

Description

Since recently, we started executing ttcn3-msc-test (among with some other testsuites) with LIBOSMO_IO_BACKEND=IO_URING in a separate job. Compared to the "normal" job, the IO_URING one exhibits a regression since build 5 (Mar 15, 2024):

  • TC_ho_inter_bsc_a5_3 is failing "reliably",
  • TC_ho_inter_bsc_a5_14 are alternating between pass and fail.

I was unable to reproduce the problem, neither locally on my machine nor on the dedicated build server in Docker env.


Files

Actions #1

Updated by fixeria 25 days ago

I am attaching two PCAP files:

Comparing them side-by-side, I found an interesting diff:

  • MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap:
    • frame 2216 BSSMAP Handover Detect
    • frame 2283 BSSMAP Handover Succeeded
    • frame 2303 BSSMAP Handover Complete
  • MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap:
    • frame 2216 BSSMAP Handover Detect + BSSMAP Handover Complete
    • frame 2347 BSSMAP Handover Succeeded

so in the passing scenario, frame 2216 carries two SCTP chunks.

Actions #2

Updated by fixeria 25 days ago

fixeria wrote in #note-1:

Comparing them side-by-side, I found an interesting diff:

  • MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap:
    • frame 2216 BSSMAP Handover Detect
    • frame 2283 BSSMAP Handover Succeeded
    • frame 2303 BSSMAP Handover Complete

This is also what I am seeing when running the testcase locally, and it's passing. So it's unlikely the problem here.

There is another nitpick though:

  • MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap:
    • frame 2673 BSSMAP Clear Command
    • frame 2686 DTAP (CC) Release Complete
    • frame 2719 BSSMAP Clear Complete (!)
  • MSC_Tests.TC_ho_inter_bsc_a5_3_fail.pcap:
    • frame 2675 BSSMAP Clear Command
    • frame 2678 DTAP (CC) Release Complete

Note that in the failing scenario, there is no BSSMAP Clear Complete.

Actions #3

Updated by fixeria 25 days ago

According to the logging, the testcase gets stuck waiting for some message to be received on a port. Most likely, it's somewhere in f_call_hangup(), because the last line logged is "f_call_hangup 2: rx MNCC REL ind" before the Tguard timeout happens.

1822         log("f_call_hangup 1: rx DTAP CC DISC");
1823 
1824         if (release_by_ms) {
1825                 var BIT1 tid_remote := '1'B;
1826                 if (cpars.mo_call) {
1827                         tid_remote := '0'B;
1828                 }
1829                 /* B-side (MS) Release of call */
1830                 BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_RELEASE(cpars.transaction_id, tid_remote, '0000000'B)));
1831                 MNCC.receive(tr_MNCC_REL_ind(cpars.mncc_callref));
1832                 log("f_call_hangup 2: rx MNCC REL ind");
1833                 BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id))); // <-- (!) here?
1834                 log("f_call_hangup 3: rx DTAP CC REL COMPL");
1835         } else {
Actions #4

Updated by fixeria 24 days ago

fixeria wrote in #note-3:

According to the logging, the testcase gets stuck waiting for some message to be received on a port. Most likely, it's somewhere in f_call_hangup(), because the last line logged is "f_call_hangup 2: rx MNCC REL ind" before the Tguard timeout happens.

I managed to reproduce the problem on the build server by running the whole testsuite (not the problematic testcases directly). With some modifications, I confirmed my assumption about getting stuck in f_call_hangup().
As it turns out, even though the CC Release Complete message is present in both PCAPs, in the failing scenario it never reaches the testsuite. Somehow we receive something else on the BSSAP port, not even matching PDU_DTAP_MT:?.

Actions #5

Updated by fixeria 23 days ago

  • % Done changed from 0 to 20

fixeria wrote in #note-4:

As it turns out, even though the CC Release Complete message is present in both PCAPs, in the failing scenario it never reaches the testsuite. Somehow we receive something else on the BSSAP port, not even matching PDU_DTAP_MT:?.

After hours of debugging and trying to understand why do we get stuck in f_call_hangup(), I finally made a progress and some interesting discoveries.

Running one of the affected testcases, e.g. MSC_Tests.TC_ho_inter_bsc_a5_3 with this patch (purely logging, no logic changes) applied:

diff --git a/msc/BSC_ConnectionHandler.ttcn b/msc/BSC_ConnectionHandler.ttcn
index 92ec3b15..f03cfcec 100644
--- a/msc/BSC_ConnectionHandler.ttcn
+++ b/msc/BSC_ConnectionHandler.ttcn
@@ -1829,9 +1829,15 @@ runs on BSC_ConnHdlr {
                /* B-side (MS) Release of call */
                BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_RELEASE(cpars.transaction_id, tid_remote, '0000000'B)));
                MNCC.receive(tr_MNCC_REL_ind(cpars.mncc_callref));
-               log("f_call_hangup 2: rx MNCC REL ind");
-               BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id)));
-               log("f_call_hangup 3: rx DTAP CC REL COMPL");
+               log("======================== f_call_hangup 2: rx MNCC REL ind");
+               var PDU_DTAP_MT dtap_mt;
+               BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id))) -> value dtap_mt;
+               if (isbound(dtap_mt)) {
+                       log("======================== f_call_hangup 3: got CC REL COMPL: ", dtap_mt);
+               } else {
+                       log("======================== f_call_hangup 3: got NOTHING");
+               }
+               log("======================== f_call_hangup 3: continue");
        } else {
                /* A-side (PLMN) Release of call */
                MNCC.send(ts_MNCC_REL_req(cpars.mncc_callref, valueof(ts_MNCC_cause(42))));
diff --git a/msc/MSC_Tests.ttcn b/msc/MSC_Tests.ttcn
index 7f993c4f..c38145a4 100644
--- a/msc/MSC_Tests.ttcn
+++ b/msc/MSC_Tests.ttcn
@@ -179,13 +179,16 @@ private altstep as_Tguard_direct() runs on MTC_CT {

 private altstep as_optional_cc_rel(CallParameters cpars, boolean respond := false) runs on BSC_ConnHdlr {
        [] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_RELEASE(cpars.transaction_id))) {
-                       if (respond) {
-                               var BIT1 tid_remote := '1'B;
-                               if (cpars.mo_call) {
-                                       tid_remote := '0'B;
-                               }
-                               BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_REL_COMPL(cpars.transaction_id, tid_remote)));
+               log("================================ as_optional_cc_rel(): got CC RELEASE");
+               if (respond) {
+                       var BIT1 tid_remote := '1'B;
+                       if (cpars.mo_call) {
+                               tid_remote := '0'B;
                        }
+                       log("================================ as_optional_cc_rel(): sending CC REL COMPL");
+                       BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_REL_COMPL(cpars.transaction_id, tid_remote)));
+               }
+               log("================================ as_optional_cc_rel(): not doing repeat!");
                }
 }

I am seeing the following output when the testcase is passing:

TC_ho_inter_bsc_a5_30(15)@LEGION: ======================== f_call_hangup 2: rx MNCC REL ind
TC_ho_inter_bsc_a5_30(15)@LEGION: ================================ as_optional_cc_rel(): got CC RELEASE
TC_ho_inter_bsc_a5_30(15)@LEGION: ================================ as_optional_cc_rel(): sending CC REL COMPL
TC_ho_inter_bsc_a5_30(15)@LEGION: ================================ as_optional_cc_rel(): not doing repeat!
TC_ho_inter_bsc_a5_30(15)@LEGION: ======================== f_call_hangup 3: got NOTHING
TC_ho_inter_bsc_a5_30(15)@LEGION: ======================== f_call_hangup 3: continue

This output means that in the successful scenario we never receive CC RELEASE COMPLETE. The standalone receive statement gets bypassed by altstep as_optional_cc_rel() running in background.
This definitely sounds confusing, and took me some time to realize, but the important thing here is that altstep as_optional_cc_rel() does not repeat.

The standalone BSSAP.receive statement we have in f_call_hangup():

log("f_call_hangup 2: rx MNCC REL ind");
BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id)));
log("f_call_hangup 3: rx DTAP CC REL COMPL");

actually evaluates to the following code, given that the altstep as_optional_cc_rel() is activated in background:

log("f_call_hangup 2: rx MNCC REL ind");
alt {
[] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id)));
[] as_optional_cc_rel(cpars, true);
}
log("f_call_hangup 3: rx DTAP CC REL COMPL");

which expands to:


log("f_call_hangup 2: rx MNCC REL ind");
alt {
[] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id)));
[] BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_RELEASE(cpars.transaction_id))) {
    BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_REL_COMPL(cpars.transaction_id, tid_remote)));
    // no repeat statement here!
    }
}
log("f_call_hangup 3: rx DTAP CC REL COMPL");

With this code, we continue if either of the two DTAP messages is received! Now let's see what happens with io_uring...

Actions #6

Updated by fixeria 23 days ago

As was discussed with neels in Jabber, the idea is that only one of the two scenarios should happen:

  • we receive MT DTAP with CC RELEASE and respond with MO DTAP CC RELEASE COMPLETE,
  • we send MO DTAP with CC RELEASE and expect to receive MT DTAP CC RELEASE COMPLETE.

I find this a bit weird, specifically this permissive uncertainty on who initiates the call release. Anyways.

Below is my speculative (not confirmed yet) understanding of the failing scenario:

1804 function f_call_hangup(inout CallParameters cpars, boolean release_by_ms, boolean is_csfb := false)
1805 runs on BSC_ConnHdlr {
1806                         
1807         var MobileIdentityLV mi;
1808         var MNCC_PDU mncc;
1809         var MgcpCommand mgcp_cmd;
1810         var boolean respond_to_dlcx;
1811         var boolean dlcx_contained_ci := false;
1812         var template PDU_BSSAP t_clear := tr_BSSMAP_ClearCommand;
1813 
1814         if (is_csfb) {
1815                 t_clear := tr_BSSMAP_ClearCommandCSFB;
1816         }
1817 
1818         log("f_call_hangup 0: tx MNCC_DISC_REQ");
1819         MNCC.send(ts_MNCC_DISC_req(cpars.mncc_callref, valueof(ts_MNCC_cause(23))));
1820         BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_DISC(cpars.transaction_id)));
1821         // ^^^^^^^^^^ this receive() operation gets skipped if as_optional_cc_rel() triggers here
1822         //            may happen if MT DTAP with CC RELEASE is received earlier than this message
1823         //            as a result, MT DTAP with CC DISONNECT remains in the queue of port BSSAP
1824                                 
1825         log("f_call_hangup 1: rx DTAP CC DISC");
1826                                         
1827         if (release_by_ms) {    
1828                 var BIT1 tid_remote := '1'B;
1829                 if (cpars.mo_call) {
1830                         tid_remote := '0'B;
1831                 }
1832                 /* B-side (MS) Release of call */
1833                 BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_RELEASE(cpars.transaction_id, tid_remote, '0000000'B)));
1834                 MNCC.receive(tr_MNCC_REL_ind(cpars.mncc_callref));
1835                 // ^^^^^^^^^ this receive() may also be skipped due to as_optional_cc_rel() triggering here
1836                 //           but this is unlikely to affect the outcome, because we do not receive() on port MNCC anymore
1837                 log("f_call_hangup 2: rx MNCC REL ind");
1838                 BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_REL_COMPL(cpars.transaction_id)));
1839                 // ^^^^^^^^^ this receive() blocks the testcase execution until Tguard expires
1840                 //           no mater if we receive MT DTAP with CC RELEASE COMPLETE or not,
1841                 //           the port queue is jammed due to CC DISONNECT was not dequeued above
1842                 log("f_call_hangup 3: rx DTAP CC REL COMPL");
1843         } else {        
1844                 /* A-side (PLMN) Release of call */
1845                 MNCC.send(ts_MNCC_REL_req(cpars.mncc_callref, valueof(ts_MNCC_cause(42))));
1846                 BSSAP.receive(tr_PDU_DTAP_MT(tr_ML3_MT_CC_RELEASE(cpars.transaction_id)));
1847                 log("f_call_hangup 2.a: rx DTAP CC RELEASE");
1848                 BSSAP.send(ts_PDU_DTAP_MO(ts_ML3_MO_CC_REL_COMPL(cpars.transaction_id)));
1849                 log("f_call_hangup 3.a: rx MNCC REL cnf");
1850                 MNCC.receive(tr_MNCC_REL_cnf(cpars.mncc_callref, cause := *));
1851         }
Actions #7

Updated by fixeria 22 days ago

fixeria wrote in #note-6:

Below is my speculative (not confirmed yet) understanding of the failing scenario: [...]

Interestingly enough, below I assumed that as_optional_cc_rel() triggering at line 1834 (near MNCC.receive) would unlikely affect the testcase outcome, but it does!

I reproduced the problem with some additional logging, and below is a brief sequence of events (see attached log file):

* [1572] "17:12:59.867039 1384 MSC_Tests.ttcn:5859 Altstep as_optional_cc_rel was activated as default, id 4" 
* [1594] "17:12:59.871612 1384 BSC_ConnectionHandler.ttcn:1818 f_call_hangup 0: tx MNCC_DISC_REQ" 
* [1611] "17:12:59.876589 1384 BSC_ConnectionHandler.ttcn:1820 Message enqueued on BSSAP from MSC_Test_0-RAN(1374) @RAN_Emulation.PDU_DTAP_MT ..." -- Rx CC DISCONNECT, id 21
* [1614] "17:12:59.877007 1384 BSC_ConnectionHandler.ttcn:1820 Message with id 21 was extracted from the queue of BSSAP." 
* [1615] "17:12:59.877042 1384 BSC_ConnectionHandler.ttcn:1822 f_call_hangup 1: rx DTAP CC DISC" 

So far so good. While I assumed that as_optional_cc_rel() triggers here, it does not, and CC DISCONNECT gets dequeued as expected.

* [1616] "17:12:59.877642 1384 BSC_ConnectionHandler.ttcn:1830 Sent on BSSAP to MSC_Test_0-RAN(1374) @RAN_Emulation.PDU_DTAP_MO ..." -- Tx CC RELEASE (cause=0)
* [1656] "17:13:19.878345 1384 BSC_ConnectionHandler.ttcn:1831 Message enqueued on BSSAP from MSC_Test_0-RAN(1374) @RAN_Emulation.PDU_DTAP_MT ..." -- Rx CC RELEASE (cause=47), id 22
* [1659] "17:13:19.878505 1379 MNCC_Emulation.ttcn:308 Sent on MNCC_CLIENT to TC_ho_inter_bsc_a5_30(1384) @MNCC_Types.MNCC_PDU ..." -- Rx MNCC_REL_IND (not enqueued yet)
* [1662] "17:13:19.878657 1384 MSC_Tests.ttcn:181 Receive operation on port BSSAP succeeded, message from MSC_Test_0-RAN(1374): @RAN_Emulation.PDU_DTAP_MT ..." -- CC RELEASE (cause=47), id 22
* [1664] "17:13:19.878697 1384 MSC_Tests.ttcn:181 Message with id 22 was extracted from the queue of BSSAP." 
* [1665] "17:13:19.878732 1384 MSC_Tests.ttcn:182 ================================ as_optional_cc_rel(): got CC RELEASE" 
* [1672] "17:13:19.878944 1384 MSC_Tests.ttcn:188 ================================ as_optional_cc_rel(): sending CC REL COMPL" 
* [1673] "17:13:19.879275 1384 MSC_Tests.ttcn:189 Sent on BSSAP to MSC_Test_0-RAN(1374) @RAN_Emulation.PDU_DTAP_MO ..." -- Tx CC RELEASE COMPLETE
* [1674] "17:13:19.879382 1384 MSC_Tests.ttcn:191 ================================ as_optional_cc_rel(): not doing repeat!" 
* [1675] "17:13:19.879415 1384 BSC_ConnectionHandler.ttcn:1831 Default with id 4 (altstep as_optional_cc_rel) finished. Skipping current alt statement or receiving operation." -- (!) skipping MNCC.receive
* [1676] "17:13:19.879445 1384 BSC_ConnectionHandler.ttcn:1832 ======================== f_call_hangup 2: rx MNCC REL ind" -- this is false, we have not received anything on MNCC port yet

Here we see that as_optional_cc_rel() triggers while we're waiting for MNCC_REL_IND on port MNCC (see BSC_ConnectionHandler.ttcn:1831), and this receive operation gets skipped.

* [1678] "17:13:19.879620 1384 BSC_ConnectionHandler.ttcn:1834 Message enqueued on BSSAP from MSC_Test_0-RAN(1374) @BSSAP_Types.PDU_BSSAP" -- Rx BSSAP CLEAR COMMAND
* [1680] "17:13:19.879760 1384 BSC_ConnectionHandler.ttcn:1834 Message enqueued on MNCC from MSC_Test-MNCC(1379) @MNCC_Types.MNCC_PDU" -- Rx MNCC_REL_IND

Here we get a BSSAP CLEAR COMMAND from osmo-msc, which is likely triggered by CC RELEASE COMPLETE, which was sent from as_optional_cc_rel()?
Then the MNCC_REL_IND is received on port MNCC, which we will never dequeue because the receive operation was skipped by as_optional_cc_rel() above.

* [1693] "17:13:49.878418 1374 RAN_Emulation.ttcnpp:1246 Message enqueued on BSSAP from MSC_Test_0-SCCP(1373) @SCCPasp_Types.ASP_SCCP_N_DISCONNECT_ind" 
* [1703] "17:13:49.878981 1384 BSC_ConnectionHandler.ttcn:1834 Message enqueued on BSSAP from MSC_Test_0-RAN(1374) @RAN_Emulation.RAN_Conn_Prim : MSC_CONN_PRIM_DISC_IND (0) id 24" 
* [1706] "17:13:49.958098 1384 BSC_ConnectionHandler.ttcn:203 Timeout g_Tguard: 60 s" 
* [1707] "17:13:49.958277 1384 BSC_ConnectionHandler.ttcn:204 setverdict(fail): pass -> fail reason: "Tguard timeout", new component reason: "Tguard timeout"" 

Looks like the MSC terminates the SCCP connection, and finally the Tguard expires.

As can be seen, we never receive CC RELEASE COMPLETE, neither in successful nor in the failing scenario. It's just a lucky coincidence that in the successful scenario the as_optional_cc_rel() "unblocks" the flow by skipping BSSAP.receive operation. In the failing scenario (with io_uring) it simply happens earlier, so the execution flow gets blocked until the Tguard expires.

neels I believe we need to rework the testcase, since we cannot rely on pure luck and hope that the altstep triggers in the right time. If you have ideas how to do this properly, please let me know! And feel free to take this ticket over if you wish.

Actions #8

Updated by neels 21 days ago

On Sun, Mar 24, 2024 at 10:31:49AM +0000, wrote:

neels I believe we need to rework the testcase, since we cannot rely on pure luck and hope that the altstep triggers in the right time. If you have ideas how to do this properly, please let me know! And feel free to take this ticket over if you wish.

That code feels distant to me now, and while I might remember a thing or two, I think it's much more of an advantage that you already looked at tall the details. So if you have any ideas to solve it, they are probably good, and please go ahead! I would have to first read everything again and wrap my head around the symptoms -- or if you have a specific idea we can chat about it?

Actions #9

Updated by fixeria 20 days ago

  • Status changed from Feedback to In Progress

neels wrote in #note-8:

That code feels distant to me now, and while I might remember a thing or two, I think it's much more of an advantage that you already looked at tall the details.

I know this feeling. I just hoped that you would remember your own code quicker than me understanding and finding problems in it :P

So if you have any ideas to solve it, they are probably good, and please go ahead! I would have to first read everything again and wrap my head around the symptoms [...]

I didn't have any ideas at the moment of writing this, and all I had to do is dig into the code and try to understand what goes wrong. And I think I did.

Actions #10

Updated by fixeria 20 days ago

Below is logging that piqued my curiosity:

msc_a(IMSI-262420000000053:MSISDN-491230000053:TMSI-0x80C2D1DC:GERAN-A-2347:CM_SERVICE_REQ)[0x55f012dee8e0]{MSC_A_ST_COMMUNICATING}: Duplicate DTAP: bin=0, expected n_sd == 0, got 3 (ran_msg.c:159)
msc_a(IMSI-262420000000053:MSISDN-491230000053:TMSI-0x80C2D1DC:GERAN-A-2347:CM_SERVICE_REQ)[0x55f012dee8e0]{MSC_A_ST_COMMUNICATING}: Dropping duplicate message (3GPP TS 24.007 11.2.3.2 Message Type Octet / Duplicate Detection) (msc_a.c:1363)

This can be seen in both successful and failing scenarios when running TC_ho_inter_bsc_a5_*, but not when running other non-A5 testcases:

  • TC_ho_inter_bsc
  • TC_ho_inter_bsc_ipv6
  • TC_ho_inter_bsc_csd

What I also noticed [later] is that the execution time is different: ~12s for non-A5 testcases vs ~32s for A5 testcases. And the BSSAP message flow looks slightly different!

This brings us to the following code snippet in MSC_Tests.ttcn:

5857         var default ccrel := activate(as_optional_cc_rel(cpars, true));
5858         
5859         /* blatant cheating */
5860         var N_Sd_Array last_n_sd := f_bssmap_last_n_sd();
5861         last_n_sd[0] := 3;
5862         f_bssmap_continue_after_n_sd(last_n_sd);
5863         
5864         f_call_hangup(cpars, true);
5865         f_sleep(1.0);
5866         deactivate(ccrel);

This code orders the RAN emulation component to patch sequence number in DTAP messages starting from 3.
If I remove this "blatant cheating" block, then I am observing the following situation:

  • non-A5 testcases take longer to execute and make osmo-msc complain about unexpected n_sd,
  • the A5 testcases run quicker and trigger no such warnings anymore.

So what's so special about the A5 testcases? Answer: the authentication! What?!?

Let's take a look at another PCAP from passing normal job, build 2335 MSC_Tests.TC_ho_inter_bsc and compare it against MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap:

$ tshark -T fields -e frame.number -e gsm_a.dtap.seq_no -e _ws.col.Info -Y gsm_a.dtap.seq_no -r MSC_Tests.TC_ho_inter_bsc.pcap.gz
386     0       SACK (Ack=5, Arwnd=106496) CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) Location Updating Request 
612     0       SACK (Ack=4, Arwnd=106496) DT1 (DTAP) (MM) Location Updating Accept 
677     1       SACK (Ack=8, Arwnd=106496) DT1 (DTAP) (MM) TMSI Reallocation Complete 
859     0       CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) CM Service Request 
1006    0       DT1 (DTAP) (MM) CM Service Accept 
1013    1       SACK (Ack=13, Arwnd=106496) DT1 (DTAP) (CC) Setup 
1235    0       SACK (Ack=10, Arwnd=106496) DT1 (DTAP) (CC) Call Proceeding 
1275    0       DT1 (DTAP) (CC) Alerting 
1322    0       DT1 (DTAP) (CC) Connect 
1342    2       SACK (Ack=17, Arwnd=106496) DT1 (DTAP) (CC) Connect Acknowledge 
2208    0       DT1 (DTAP) (CC) Disconnect 
2210    3       SACK (Ack=24, Arwnd=106496) DT1 (DTAP) (CC) Release 
2337    0       SACK (Ack=19, Arwnd=106496) DT1 (DTAP) (CC) Release Complete

$ tshark -T fields -e frame.number -e gsm_a.dtap.seq_no -e _ws.col.Info -Y gsm_a.dtap.seq_no -r MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap 
375     0       SACK (Ack=4, Arwnd=106496) CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) Location Updating Request 
551     0       SACK (Ack=4, Arwnd=106496) DT1 (DTAP) (MM) Authentication Request 
553     1       SACK (Ack=6, Arwnd=106496) DT1 (DTAP) (MM) Authentication Response 
802     0       DT1 (DTAP) (MM) Location Updating Accept 
805     2       DT1 (DTAP) (MM) TMSI Reallocation Complete 
987     0       CR (BSSMAP) Complete Layer 3 Information (DTAP) (MM) CM Service Request 
1126    0       DT1 (DTAP) (MM) Authentication Request 
1129    1       DT1 (DTAP) (MM) Authentication Response 
1259    2       SACK (Ack=15, Arwnd=106496) DT1 (DTAP) (CC) Setup 
1495    0       SACK (Ack=14, Arwnd=106496) DT1 (DTAP) (CC) Call Proceeding 
1521    0       DT1 (DTAP) (CC) Alerting 
1568    0       DT1 (DTAP) (CC) Connect 
1588    3       SACK (Ack=19, Arwnd=106496) DT1 (DTAP) (CC) Connect Acknowledge 
2480    0       DT1 (DTAP) (CC) Disconnect 
2482    3       SACK (Ack=26, Arwnd=106496) DT1 (DTAP) (CC) Release 
2671    0       DT1 (DTAP) (CC) Release 
2686    0       DT1 (DTAP) (CC) Release Complete

Note presence of the Authentication Request/Response, and how this affects sequence numbers (second column) in the messages.

  • in MSC_Tests.TC_ho_inter_bsc.pcap.gz no authentication is performed
    • [frame 2210] CC RELEASE is sent with the correct sequence number (3), so that
    • [frame 2337] the network immediately responds with CC RELEASE COMPLETE and clears connection as expected.
  • in MSC_Tests.TC_ho_inter_bsc_a5_3_pass.pcap the authentication is performed
    • [frame 1129] MM Authentication Response does shift sequence number of all pending messages (basically +1),
    • [frame 1588] CC Connect Acknowledge is the last message with correct sequence number,
    • [frame 2482] CC Release gets discarded by osmo-msc because it should have sequence number 0: (3 + 1) % 4 == 0
    • [frame 2671] osmo-msc releases the call itself due to timeout of timeout of T306
    • [frame 2686] as_optional_cc_rel() triggers and responds with CC Release Complete

So now that I have the full picture, my proposal is to bring those A5 and non-A5 testcases into consistence by having authentication enabled or disabled for them all.

Actions #11

Updated by fixeria 20 days ago

  • Status changed from In Progress to Feedback
  • % Done changed from 60 to 100

fixeria wrote in #note-10:

So now that I have the full picture, my proposal is to bring those A5 and non-A5 testcases into consistence by having authentication enabled or disabled for them all.

I found a better solution, fixing the "blatant cheating":

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36467 msc: fix f_tc_ho_inter_bsc0(): properly patch n_sd [NEW]
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36468 msc: cosmetic: fix formatting in altsteps [NEW]
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/36469 msc: fix missing 'repeat' in as_optional_cc_rel() [NEW]

This fixes the problems with sequence counter, while keeping ciphering & authentication enabled only for the A5-testcases.

Actions #12

Updated by fixeria 17 days ago

  • Status changed from Feedback to Resolved

All patches have been merged, the problem is resolved \o/.

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-msc-test-io_uring/19/

Actions #13

Updated by neels 13 days ago

All patches have been merged, the problem is resolved \o/.

\o/

suggestion: OsmoDevCon shirt with just the word 'repeat' on it.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)