Bug #6197

open

"Cannot handle SM for unknown MM CTX"

Added by fixeria 5 months ago. Updated 5 months ago.

Status: Feedback
Priority: Low
Assignee:
Category: -
Target version: -
Start date: 09/28/2023
Due date:
% Done: 0%
Spec Reference:

Description

I am observing relatively long PDP Context activation with Sony Ericsson K800i and recent osmo-sgsn:

osmo-sgsn 1.11.0
osmo-pcu 1.3.1.1-c1b0

I don't remember if this was the case before, most likely not.

As can be seen from the attached PCAP, the MS orders a PDP Context activation right after completing the Attach (frame 259):

  130 16.160906108    127.0.0.1 → 127.0.0.1    GPRS-LLC 107 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 0(DTAP) (GMM) Attach Request 
  156 16.161190149    127.0.0.1 → 127.0.0.1    GPRS-LLC 86 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 0(DTAP) (GMM) Identity Request 
  157 16.797408334    127.0.0.1 → 127.0.0.1    GPRS-LLC 86 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 1(DTAP) (GMM) Identity Response 
  173 16.797533278    127.0.0.1 → 127.0.0.1    GPRS-LLC 86 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 1(DTAP) (GMM) Identity Request 
  181 17.200282825    127.0.0.1 → 127.0.0.1    GPRS-LLC 86 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 2(DTAP) (GMM) Identity Response 
  226 17.225222456    127.0.0.1 → 127.0.0.1    GPRS-LLC 113 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 2(DTAP) (GMM) Attach Accept 
  239 17.697504097    127.0.0.1 → 127.0.0.1    GPRS-LLC 77 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 3(DTAP) (GMM) Attach Complete 
  259 17.739056217    127.0.0.1 → 127.0.0.1    GPRS-LLC 136 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 4(DTAP) (SM) Activate PDP Context Request  <-- (!)
  274 17.739270297    127.0.0.1 → 127.0.0.1    GPRS-LLC 76 SAPI: LLGMM, U, XID
  275 17.739280476    127.0.0.1 → 127.0.0.1    GPRS-LLC 75 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 0(DTAP) (GMM) Detach Request  <-- (!)

The SGSN is responding with GMM Detach Request (frame 275), here is the related logging:

  259 17.739056217    127.0.0.1 → 127.0.0.1    GPRS-LLC 136 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 4(DTAP) (SM) Activate PDP Context Request 
  260 17.739109847    127.0.0.1 → 127.0.5.1    GSMTAP 180 NSE(00101)-NSVC(00101) Rx NS-UNITDATA 
  261 17.739128332    127.0.0.1 → 127.0.5.1    GSMTAP 263 GPRS-NS2-VC(UDP-NSE00101-NSVC00101-0_0_0_0:23000-127_0_0_1:23023)[0x55e69ba253d0]{UNBLOCKED}: Received Event RX-UNITDATA 
  262 17.739139873    127.0.0.1 → 127.0.5.1    GSMTAP 180 NSE(00101)-NSVC(00101) Rx NS-UNITDATA 
  263 17.739148439    127.0.0.1 → 127.0.5.1    GSMTAP 183 BSSGP TLLI=0x85c79efb Rx UPLINK-UNITDATA 
  264 17.739181411    127.0.0.1 → 127.0.5.1    GSMTAP 236 LLME(ffffffff/85c79efb){UNASSIGNED} LLC RX: unknown TLLI 0x85c79efb, creating LLME on the fly 
  265 17.739188394    127.0.0.1 → 127.0.5.1    GSMTAP 193 LLC SAPI=1 C   U GEA0 IOV-UI=0x000000 FCS=0x3ef88c 
  266 17.739193033    127.0.0.1 → 127.0.5.1    GSMTAP 149 CMD=UI 
  267 17.739196720    127.0.0.1 → 127.0.5.1    GSMTAP 147 DATA 
  268 17.739200627    127.0.0.1 → 127.0.5.1    GSMTAP 143  
  269 17.739212048    127.0.0.1 → 127.0.5.1    GSMTAP 214 LLME(ffffffff/85c79efb){UNASSIGNED} Cannot handle SM for unknown MM CTX 
  270 17.739224792    127.0.0.1 → 127.0.5.1    GSMTAP 198 LLME(ffffffff/85c79efb){UNASSIGNED} LLGM Reset (SAPI=1) 
  271 17.739243207    127.0.0.1 → 127.0.5.1    GSMTAP 180 NSE(00101)-NSVC(00101) Tx NS-UNITDATA 
  272 17.739252294    127.0.0.1 → 127.0.5.1    GSMTAP 215 <- GMM DETACH REQ (type: re-attach required, cause: Implicitly detached) 
  273 17.739257293    127.0.0.1 → 127.0.5.1    GSMTAP 180 NSE(00101)-NSVC(00101) Tx NS-UNITDATA 
  274 17.739270297    127.0.0.1 → 127.0.0.1    GPRS-LLC 76 SAPI: LLGMM, U, XID
  275 17.739280476    127.0.0.1 → 127.0.0.1    GPRS-LLC 75 SAPI: LLGMM, UI, protected, non-ciphered information, N(U) = 0(DTAP) (GMM) Detach Request 

The MS repeats the request (frame 517) 30 seconds after the first attempt, and finally gets a PDP Context activated.
The key difference between frames 259 (first attempt) and 517 (second attempt) is the TLLI indicated in the BSSGP header.


Files

Actions #1

Updated by pespin 5 months ago

  • Status changed from New to Feedback
  • Assignee set to fixeria

fixeria, the main problem to me seems to be that the MS keeps using the TLLI it was using before the GMM Attach even after the GMM Attach procedure has finished. The SM Activate PDP Ctx Req is transmitted in BSSGP with the old TLLI=0x85c79efb.

See how MS signals the up-to-then-current TLLI=0x85c79efb and PTMSI=0xc5c79efb during GMM Attach Req in frame_nr 130.
Then, SGSN assigns a new PTMSI=0xe7ba3193 in GMM Attach Accept in frame_nr 226.
When the MS confirms the Attach with GMM Attach Complete, you can see the SGSN applying the TLLI derived from the PTMSI that was assigned in the GMM Attach Accept:

253    14:15:03.476231029    Sep 28, 2023 16:15:03.476231029 CEST    127.0.0.1    42590    127.0.5.1    4729    GSMTAP    215    LLME(85c79efb/e7ba3193){ASSIGNED} LLGM Assign pre (ffffffff => e7ba3193) 
254    14:15:03.476234726    Sep 28, 2023 16:15:03.476234726 CEST    127.0.0.1    42590    127.0.5.1    4729    GSMTAP    216    LLME(ffffffff/e7ba3193){ASSIGNED} LLGM Assign post (ffffffff => e7ba3193) 

So from that point on, osmo-sgsn no longer expects TLLI=85c79efb, which was used at the start of the Attach procedure.
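
The TLLI and PTMSI values quoted above are related by the TLLI structure in 3GPP TS 23.003 (section 2.6, as far as I recall): bits 29..0 are copied from the P-TMSI, and bits 31..30 are set to 11 for a local TLLI or 10 for a foreign TLLI. A minimal sketch of that derivation follows; the function names are made up for illustration and are not taken from osmo-sgsn or libosmocore.

```cpp
#include <cassert>
#include <cstdint>

// Local TLLI: bits 31..30 = 11, bits 29..0 taken from the P-TMSI.
constexpr std::uint32_t tlli_local(std::uint32_t ptmsi)
{
    return ptmsi | 0xc0000000u;
}

// Foreign TLLI: bits 31..30 = 10, bits 29..0 taken from the P-TMSI.
constexpr std::uint32_t tlli_foreign(std::uint32_t ptmsi)
{
    return (ptmsi | 0x80000000u) & ~0x40000000u;
}

// Values from this ticket: the old P-TMSI yields the TLLI seen in frame 130,
// and the newly assigned P-TMSI yields the TLLI the SGSN expects afterwards.
static_assert(tlli_foreign(0xc5c79efbu) == 0x85c79efbu, "old (foreign) TLLI");
static_assert(tlli_local(0xe7ba3193u) == 0xe7ba3193u, "new (local) TLLI");
```

So 0x85c79efb is consistent with a foreign TLLI built from the old PTMSI=0xc5c79efb, and 0xe7ba3193 with a local TLLI built from the new PTMSI.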

However, in the followup SM Activate PDP Context Req, the MS still uses that one instead of the new one, and that makes it fail:

264    14:15:03.517812062    Sep 28, 2023 16:15:03.517812062 CEST    127.0.0.1    42590    127.0.5.1    4729    GSMTAP    236    LLME(ffffffff/85c79efb){UNASSIGNED} LLC RX: unknown TLLI 0x85c79efb, creating LLME on the fly 
269    14:15:03.517842699    Sep 28, 2023 16:15:03.517842699 CEST    127.0.0.1    42590    127.0.5.1    4729    GSMTAP    214    LLME(ffffffff/85c79efb){UNASSIGNED} Cannot handle SM for unknown MM CTX 
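
The "unknown TLLI" lookup failure above can be modelled with a simplified LLME that holds only an old and a new TLLI slot, in the same old/new order as the `LLME(old/new)` log prefix. This is only a sketch of the TLLI assignment rules as I understand them from TS 44.064 section 8.3; `Llme`, `llgmm_assign` and `llme_accepts` are hypothetical names, not the osmo-sgsn implementation.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t TLLI_UNASSIGNED = 0xffffffffu;

// Simplified LLME: just the two TLLI slots (hypothetical type).
struct Llme {
    std::uint32_t old_tlli = TLLI_UNASSIGNED;
    std::uint32_t tlli = TLLI_UNASSIGNED;
};

// TLLI change (old and new both set): either TLLI is accepted on receive.
// TLLI assignment (old == TLLI_UNASSIGNED): only the new TLLI is accepted.
inline void llgmm_assign(Llme& l, std::uint32_t old_tlli, std::uint32_t new_tlli)
{
    l.old_tlli = old_tlli;
    l.tlli = new_tlli;
}

// Would an uplink frame carrying rx_tlli be matched to this LLME?
inline bool llme_accepts(const Llme& l, std::uint32_t rx_tlli)
{
    if (rx_tlli == TLLI_UNASSIGNED)
        return false;
    return rx_tlli == l.tlli || rx_tlli == l.old_tlli;
}
```

With the values from this ticket: after `llgmm_assign(l, 0x85c79efb, 0xe7ba3193)` both TLLIs match, but after `llgmm_assign(l, TLLI_UNASSIGNED, 0xe7ba3193)` (the "Assign post" log line) a frame with TLLI 0x85c79efb no longer matches, so a fresh LLME without an MM context is created.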

We'd need to check a couple of things here:
- The TLLI is in the BSSGP header, which means it's filled in by osmo-pcu. So the problem may be osmo-pcu not using the new TLLI when needed.
- We need to check in the specs the exact moment from which the MS/PCU can no longer use the old TLLI.

For the 1st one, we probably need a new pcap which also includes osmo-pcu RLC/MAC. I personally use the following:

pcu
 gsmtap-remote-host 192.168.30.1
 gsmtap-category dl-unknown
 gsmtap-category dl-ctrl
 gsmtap-category dl-data-gprs
 gsmtap-category dl-data-egprs
 gsmtap-category dl-agch
 gsmtap-category dl-pch
 gsmtap-category ul-unknown
 gsmtap-category ul-ctrl
 gsmtap-category ul-data-gprs
 gsmtap-category ul-data-egprs
 gsmtap-category ul-rach

in osmo-bts:

 gsmtap-remote-host 127.0.0.1
 gsmtap-sapi enable-all
 no gsmtap-sapi pdtch
 no gsmtap-sapi ptcch
 no gsmtap-sapi pacch

Actions #2

Updated by pespin 5 months ago

You can see that in the SM Activate Req retransmitted a few seconds later (frame_nr=517), it now uses the newly assigned TLLI=0xe7ba3193 and everything works as expected.

Actions #3

Updated by pespin 5 months ago

3GPP TS 24.008 4.7.3.1.3 GPRS attach accepted by the network:

The P-TMSI reallocation may be part of the GPRS attach procedure. When the ATTACH REQUEST includes the IMSI
or IMEI, the SGSN shall allocate the P-TMSI. The P-TMSI that shall be allocated is then included in the ATTACH
ACCEPT message together with the routing area identifier. The network shall, in this case, change to state GMM-
COMMON-PROCEDURE-INITIATED and shall start timer T3350 as described in subclause 4.7.6. Furthermore, the
network may assign a P-TMSI signature for the GMM context which is then also included in the ATTACH ACCEPT
message.
[...]
If the message contains a P-TMSI, the MS shall use this P-TMSI as the new temporary identity for GPRS services. In
this case, an ATTACH COMPLETE message is returned to the network. The MS shall delete its old P-TMSI and shall
store the new one. If no P-TMSI has been included by the network in the ATTACH ACCEPT message, the old P-TMSI,
if any available, shall be kept.
If the message contains a P-TMSI signature, the MS shall use this P-TMSI signature as the new temporary signature for
the GMM context. The MS shall delete its old P-TMSI signature, if any is available, and shall store the new one. If the
message contains no P-TMSI signature, the old P-TMSI signature, if available, shall be deleted.

So from that spec fragment, I think it's clear the PCU/MS is misbehaving.

We need a pcap with RLC/MAC to find out where the problem is.

Actions #4

Updated by pespin 5 months ago

4.7.3.1.6 Abnormal cases on the network side
The following abnormal cases can be identified:
a) Lower layer failure
If a low layer failure occurs before the message ATTACH COMPLETE has been received from the MS and a
new P-TMSI (or a new P-TMSI and a new P-TMSI signature) has been assigned, the network shall consider both
the old and new P-TMSI each with its corresponding P-TMSI-signature as valid until the old P-TMSI can be
considered as invalid by the network (see subclause 4.7.1.5) or the GMM context which has been marked as
detached in the network is released, and shall not resent the message ATTACH ACCEPT. During this period the
network may:
- use the identification procedure followed by a P-TMSI reallocation procedure if the old P-TMSI is used by
the MS in a subsequent message.

Actions #5

Updated by fixeria 5 months ago

For the record, the firmware version is R1GP001 (prgCXC1250210_GENERIC_WI).

I found a switch in the settings menu ("Connectivity" -> "Data communication" -> "Preferred service"), which controls whether the MS stays GMM-attached all the time ("PS and CS") or does the GMM attach/detach every time a PDP context is activated/deactivated ("CS only"). The problem was observed with "CS only", and is gone after I switched to "PS and CS". The phone is now always GMM-attached and PDP Context activation works fine. Thus lowering the priority.

pespin please find a PCAP with RLC/MAC traces attached.

Actions #6

Updated by fixeria 5 months ago

  • Assignee changed from fixeria to pespin

The RLC/MAC traces reveal several interesting things:

  • When sending GMM Attach Complete, the phone is using CS-4 on the Uplink;
  • Frame 493 (RLC/MAC UL DATA) carries not one, but two LLC segments:
    • the first segment (8 bytes, 01c00d0803551cea) containing the Attach Complete message;
    • the second segment (41 bytes) is a part of the PDP Context Activation request;
  • Frame 499 (RLC/MAC UL DATA) carries another LLC segment, containing the remaining 26 bytes.

So what the phone is doing is actually re-using the same TBF to initiate the SM procedure. And the PCU is just including TLLI of that same TBF.
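
For reference, the 8-byte segment 01c00d0803551cea can be decoded by hand: one LLC address octet, a two-octet UI control field, the GMM header, and a 3-byte FCS. Below is a minimal sketch of that decoding, assuming the UI control field layout of TS 44.064 and the GMM message types of TS 24.008; `LlcUiInfo` and `decode_llc_ui` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fields of interest from an LLC UI frame carrying a GMM/SM message.
struct LlcUiInfo {
    unsigned sapi;        // LLC SAPI (1 = LLGMM)
    unsigned nu;          // N(U) sequence number
    bool ciphered;        // E bit
    bool protected_mode;  // PM bit
    unsigned pd;          // L3 protocol discriminator (8 = GMM)
    unsigned msg_type;    // L3 message type
};

// Returns false if the buffer is too short or is not a UI frame.
inline bool decode_llc_ui(const std::uint8_t* buf, std::size_t len, LlcUiInfo& out)
{
    if (len < 5)
        return false;
    if ((buf[1] & 0xe0) != 0xc0)  // UI format starts with bits 110
        return false;
    out.sapi = buf[0] & 0x0f;
    out.nu = ((buf[1] & 0x07u) << 6) | (buf[2] >> 2);
    out.ciphered = (buf[2] & 0x02) != 0;
    out.protected_mode = (buf[2] & 0x01) != 0;
    out.pd = buf[3] & 0x0f;
    out.msg_type = buf[4];
    return true;
}
```

Applied to the segment above this gives SAPI=1, N(U)=3, protected, non-ciphered, protocol discriminator 8 (GMM) and message type 0x03, which is Attach Complete. That matches frame 239 of the first PCAP (N(U) = 3, Attach Complete).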

Actions #7

Updated by pespin 5 months ago

fixeria, I agree with your analysis.
There's no way that the PCU can get to know the new TLLI in a reasonable way in that scenario, because it is not going through contention resolution.
I'd say that's actually a bug in the MS stack, though it's difficult to say. It should be requesting a new TBF if the TLLI changes, or at least provide the new TLLI when it keeps using the existing TBF. We may want to check TS 44.060 on the matter.
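
To illustrate why the old TLLI ends up in BSSGP: the sketch below models an uplink TBF that learns its TLLI once and stamps it on every LLC frame it reassembles. It is a toy model under that assumption, not osmo-pcu code; `LlcSegment`, `UlUnitdata` and `UlTbf` are hypothetical types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One LLC segment inside an uplink RLC data block.
struct LlcSegment {
    std::size_t len;  // LLC bytes carried by this segment
    bool ends_frame;  // true if this segment completes an LLC frame
};

// A reassembled LLC frame as handed over for BSSGP UL-UNITDATA.
struct UlUnitdata {
    std::uint32_t tlli;  // TLLI placed in the BSSGP header
    std::size_t len;     // total LLC frame length
};

// Uplink TBF: the TLLI is fixed when the TBF is established.
struct UlTbf {
    std::uint32_t tlli;
    std::size_t pending;  // bytes of a not yet complete LLC frame

    explicit UlTbf(std::uint32_t t) : tlli(t), pending(0) {}

    // Feed the segments of one RLC data block, return completed frames.
    std::vector<UlUnitdata> rx_block(const std::vector<LlcSegment>& segs)
    {
        std::vector<UlUnitdata> out;
        for (const LlcSegment& s : segs) {
            pending += s.len;
            if (s.ends_frame) {
                out.push_back({tlli, pending});
                pending = 0;
            }
        }
        return out;
    }
};
```

Feeding it the segments from frames 493 and 499 (8 + 41 bytes, then 26 bytes) on a TBF established with the old TLLI yields an 8-byte frame and a 67-byte frame, both tagged with that old TLLI, which is what the SGSN then sees for the Activate PDP Context Request.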

Actions #8

Updated by pespin 5 months ago

TS 44.060 5.5.1.8 TLLI management:

After contention resolution the mobile station shall apply new TLLI in RLC/MAC control block if the mobile has
received a new P-TMSI.

The GMM Attach Accept is sent in gsm fn=348625, pcap frame_nr=404.
MS sends UL CTRL blocks in:

428    15:37:49.623898459    Sep 28, 2023 17:37:49.623898459 CEST    127.0.0.1    34344    127.0.1.3    4729    GSM RLC/MAC    81    GPRS UL CTRL: PACKET_DOWNLINK_ACK_NACK
437    15:37:49.665406374    Sep 28, 2023 17:37:49.665406374 CEST    127.0.0.1    34344    127.0.1.3    4729    GSM RLC/MAC    81    GPRS UL CTRL: PACKET_CONTROL_ACKNOWLEDGEMENT
442    15:37:49.684257074    Sep 28, 2023 17:37:49.684257074 CEST    127.0.0.1    34344    127.0.1.3    4729    GSM RLC/MAC    81    GPRS UL CTRL: PACKET_DOWNLINK_ACK_NACK

The 2 DL_ACK_NACK blocks contain no TLLI (not even sure if they can contain a TLLI field, need to check TS 44.060).
The PKT CONTROL ACK in frame_nr 437 contains a TLLI=0xb9518db1, which IIUC is the old one?

So according to the specs above the MS should have sent the new TLLI instead. (Another topic is whether we'd update it correctly in osmo-pcu if it had sent the new one, needs to be checked).

Actions #9

Updated by pespin 5 months ago

  • Assignee changed from pespin to fixeria

I just checked TS 44.060 and DL ACK/NACK cannot have a TLLI field. So my conclusion here is that MS should have sent the new TLLI in PKT CTRL ACK in frame_nr 437, but it sent the old one instead => MS stack bug.

Actions #10

Updated by pespin 5 months ago

We do update the TLLI in that scenario in osmo-pcu already, so if the MS had sent the new TLLI, we'd have updated it properly AFAICT:

void gprs_rlcmac_pdch::rcv_control_ack(Packet_Control_Acknowledgement_t *packet, uint32_t fn)
{
    ...
    uint32_t tlli = packet->TLLI;
    ...
    ms_update_announced_tlli(tbf->ms(), tlli);
    /* Gather MS from TBF again, since it may be NULL or may have been merged during ms_update_announced_tlli */
    ms = tbf->ms();

Actions #11

Updated by fixeria 5 months ago

I would still like to test it against a commercial network, or at least against a nanoBTS.
It needs to be checked though if the TEMS firmware exhibits the same behavior when operating in the "CS only" mode.

Actions #12

Updated by fixeria 5 months ago

fixeria wrote in #note-11:

It needs to be checked though if the TEMS firmware exhibits the same behavior when operating in the "CS only" mode.

TEMS firmware (CXC1722434_TEMS R2B) for K800i does exhibit the same behavior in "CS only".
This is good news, because we can get the MS side packet traces.
