Bug #6351
closedrlcmac: Bug calculating/encoding 2 LLC frames fitting in 1 CS4 UL Data block
100%
Description
During initial GMM ATTACH + SM PDP ACT procedure, if using CS4 in UL, we end up in a situation where basically the MS, upon receiving GMM Attach Accept, triggers creation of a new UL TBF and wants to send 2 LLC frames one after the other (GMM AttachCompl + SM PDP Act Context Req):
1242 16:09:52.495676 Feb 2, 2024 17:09:52.495676000 CET 127.0.0.1 45126 127.0.1.1 4729 GSMTAP 199 ✓ GRE(91223344) Update TLLI 0x91223344 -> 0xe0083f9b 1243 16:09:52.495750 Feb 2, 2024 17:09:52.495750000 CET 127.0.0.1 45126 127.0.1.1 4729 GSMTAP 224 ✓ GMME(IMSI-001010000000101:PTMSI-e0083f9b:TLLI-e0083f9b) Tx GMM ATTACH COMPL 1245 16:09:52.495901 Feb 2, 2024 17:09:52.495901000 CET 127.0.0.1 45126 127.0.1.1 4729 GSMTAP 228 ✓ modem_llc_prim_down_cb(): Rx GRR-UNITDATA.request ll=[01 c0 0d 08 03 55 1c ea ] 1247 16:09:52.496061 Feb 2, 2024 17:09:52.496061000 CET 127.0.0.1 45126 127.0.1.1 4729 GSMTAP 209 ✓ GRE(e0083f9b) Enqueueing LLC-PDU len=8 SAPI=GMM radio_prio=1 1260 16:09:52.497107 Feb 2, 2024 17:09:52.497107000 CET 127.0.0.1 45126 127.0.1.1 4729 GSMTAP 201 ✓ PDP(ID-0:NSAPI-6) Tx SM Activate PDP Context Request 1263 16:09:52.497349 Feb 2, 2024 17:09:52.497349000 CET 127.0.0.1 45126 127.0.1.1 4729 GSMTAP 321 ✓ modem_llc_prim_down_cb(): Rx GRR-UNITDATA.request ll=[01 c0 11 8a 41 06 03 0e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 01 21 28 09 08 69 6e 74 65 72 6e 65 74 9e 49 7a ] 1265 16:09:52.497473 Feb 2, 2024 17:09:52.497473000 CET 127.0.0.1 45126 127.0.1.1 4729 GSMTAP 210 ✓ GRE(e0083f9b) Enqueueing LLC-PDU len=39 SAPI=GMM radio_prio=1
So it's 1 LLC frame of 8 bytes, and 1 LLC frame of 39 bytes, that's 47 bytes of data in total.
Then, in the pcap we can see that only the Attach Compl is transmitted with UL DATA block BSN=0 and CV=0.
However, the Act PDP Context Req is not included in the encoded BSN=0 UL data block, and ofc no extra block is expected because the MS sent the first one with CV=0.
This happens because our gprs_rlcmac_ul_tbf_calculate_cv() function calculates it can fit both in 1 CS4 UL data block:
20240202172854510 DRLCMAC NOTICE tbf_ul.c:507 TBF(UL:NR-3:TLLI-e172f149) calculate_cv blk_state_init: tx_cs=4 bs_cv_max=15 blk_data_len=50 │ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 20240202172854510 DRLCMAC NOTICE tbf_ul.c:608 TBF(UL:NR-3:TLLI-e172f149) calculate_cv after pq[0][0]: blk_count=0 offset=49 │52: modem4: <POINTOPOINT,MULTICAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 500 20240202172854510 DRLCMAC NOTICE tbf_ul.c:615 TBF(UL:NR-3:TLLI-e172f149) calculate_cv after done: blk_count=0 offset=49 │ link/none 20240202172854510 DRLCMAC NOTICE tbf_ul.c:627 TBF(UL:NR-3:TLLI-e172f149) calculate_cv after adapting: : blk_count=1 offset=0 │root@mssdr-ms:/home/ubuntu# ip addr 20240202172854510 DRLCMAC NOTICE tbf_ul.c:629 TBF(UL:NR-3:TLLI-e172f149) blk_count after adapting: blk_count=1 offset=0 -> x=0 bs_cv_max=15
Taking into account we are in contention resolution, it's 54-4 bytes which can fit. This whole thing needs to be checked for wrong calculations. Starting by the fact that GPRS may not allow to send multiple LLC frames in one RLC/MAC block? Maybe only in EGPRS it's allowed? I need to check.
So what needs to be done:
1- Check if several LLC frames can be put inside 1 RLC/MAC UL data block.
2- If answer is no, fix gprs_rlcmac_ul_tbf_calculate_cv() to return CV=1 in this case.
3- If answer is yes, then check if the size calculations are correct in gprs_rlcmac_ul_tbf_calculate_cv()
4- if everything is fine, then we may have a bug in the block encoder.
The quick hack for now is to avoid using UL CS4, that should prevent this incident to happen since it's the only block size which gprs_rlcmac_ul_tbf_calculate_cv() considers can fit into 1 block.
Updated by pespin 3 months ago
A GPRS RLC Data block can contain several LLC PDUs, as per TS 44.060, 10.0a.1 "The RLC data unit contains octets from one or more upper layer PDUs.". See also for instance B.1 Example 1.
I can reproduce the issue with a unit test (here triggering through normal UL TBF assignment hence with content resolution ON, and hence why it says blk_data_len=46 instead of 50 like in the physical setup, because in there the UL TBF is created through a DL TBF and hence no content resolution is needed (aka no need to stuff TLLI 4 bytes in it).
What I do is when submitting the SM PDP Context Act Req from the physical setup, I submit with len - 4 to end up filling the message the same, which indeed triggers the issue:
+DLGLOBAL DEBUG GRE(00002342) Enqueueing LLC-PDU len=8 SAPI=GMM radio_prio=1 +DLGLOBAL INFO UL_TBF{NEW}: Allocated +DLGLOBAL INFO UL_TBF_ASS{IDLE}: Allocated +DLGLOBAL INFO UL_TBF_ASS{IDLE}: Received Event START +DLGLOBAL INFO UL_TBF{NEW}: Received Event UL_ASS_START +DLGLOBAL INFO UL_TBF{NEW}: state_chg to ASSIGN +DLGLOBAL INFO UL_TBF_ASS{IDLE}: Requesting one-phase packet access using CCCH +DLGLOBAL DEBUG Tx to lower layers: L1CTL-RACH.request +DLGLOBAL INFO UL_TBF_ASS{IDLE}: state_chg to WAIT_CCCH_IMM_ASS +DLGLOBAL INFO Rx from upper layers: GRR-UNITDATA.request +DLGLOBAL DEBUG GRE(00002342) Enqueueing LLC-PDU len=35 SAPI=GMM radio_prio=1 +DLGLOBAL DEBUG Rx from lower layers: L1CTL-CCCH_DATA.indication +DLGLOBAL DEBUG Tx to lower layers: L1CTL-PDCH_ESTABLISH.request +DLGLOBAL INFO UL_TBF_ASS{WAIT_CCCH_IMM_ASS}: Received Event RX_CCCH_IMM_ASS +DLGLOBAL INFO UL_TBF_ASS{WAIT_CCCH_IMM_ASS}: ImmAss TFI=0 initCS=CS-4 cur_tn=7 cur_fn=0 start_fn=0 +DLGLOBAL INFO UL_TBF_ASS{WAIT_CCCH_IMM_ASS}: ImmAss DynamicAlloc (1phase access) ts_nr=7 usf=0 +DLGLOBAL INFO UL_TBF_ASS{WAIT_CCCH_IMM_ASS}: state_chg to COMPLETED +DLGLOBAL INFO UL_TBF{ASSIGN}: Received Event UL_ASS_COMPL +DLGLOBAL INFO TBF(UL:NR-0:TLLI-00002342) Send L1CTL-CFG_UL_TBF.req ul_tbf_nr=0 ul_slotmask=0x80 tbf_starting_time(present=0 fn=0) +DLGLOBAL DEBUG Tx to lower layers: L1CTL-CFG_UL_TBF.request +DLGLOBAL INFO UL_TBF{ASSIGN}: state_chg to FLOW +DLGLOBAL INFO UL_TBF_ASS{COMPLETED}: state_chg to IDLE +DLGLOBAL DEBUG Rx from lower layers: L1CTL-PDCH_RTS.indication +DLGLOBAL DEBUG Rx RTS.ind (fn=4, ts=7, usf=0) +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) Sending new block at BSN 0, CS=CS-4 +DLGLOBAL NOTICE TBF(UL:NR-0:TLLI-00002342) calculate_cv blk_state_init: tx_cs=4 bs_cv_max=15 blk_data_len=46 +DLGLOBAL NOTICE TBF(UL:NR-0:TLLI-00002342) calculate_cv after pq[0][0]: blk_count=0 offset=45 +DLGLOBAL NOTICE TBF(UL:NR-0:TLLI-00002342) calculate_cv after done: blk_count=0 offset=45 +DLGLOBAL NOTICE TBF(UL:NR-0:TLLI-00002342) calculate_cv after adapting: : blk_count=1 offset=0 +DLGLOBAL NOTICE TBF(UL:NR-0:TLLI-00002342) blk_count after adapting: blk_count=1 offset=0 -> x=0 bs_cv_max=15 +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) Entering Countdown procedure CV=0 +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) Dequeue next LLC (len=8) +DLGLOBAL DEBUG -- Chunk with length 8 is less than remaining space (46): add length header to delimit LLC frame +DLGLOBAL DEBUG -- Final block, so we done. +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) Complete UL frame, len=0 +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) Dequeue next LLC (len=35) +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) data block (BSN=0, CS-4, CV=0): 21 00 00 23 42 01 c0 0d 08 03 55 1c ea 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) need_padding 0 spb_status 0 spb 0 (BSN1 0 BSN2 -1) +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) Copying 1 RLC blocks, 1 BSNs +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) Copying data unit 0 (BSN=0 CV=0) +DLGLOBAL DEBUG TBF(UL:NR-0:TLLI-00002342) msg block (BSN 0, CS-4): 00 01 00 21 00 00 23 42 01 c0 0d 08 03 55 1c ea 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 2b 00 +DLGLOBAL INFO UL_TBF{FLOW}: Received Event FIRST_UL_DATA_SENT
See how the block encoder picks only 1 RLC block instead of 2.
Now we need to find whether the CV calculation is wrong, or whether the block encoder is wrong.
Updated by pespin 3 months ago
- Status changed from New to Feedback
- % Done changed from 0 to 90
Unit test reproducing the bug plus proposed fix can be found here:
emote: https://gerrit.osmocom.org/c/libosmo-gprs/+/35854 tests: rlcmac: Reproduce bug encoding 2 llc frames in 1 rlc block [NEW]
remote: https://gerrit.osmocom.org/c/libosmo-gprs/+/35855 rlcmac: Fix Last RLC block (CV=0) note encoding >1 LLC frames [NEW]
I'll now continue testing in the physical setup.