Project

General

Profile

Actions

Bug #4879

open

endless pdch.cpp:809 Got CS-N RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264

Added by fixeria over 3 years ago. Updated 8 months ago.

Status:
Stalled
Priority:
High
Assignee:
Target version:
-
Start date:
11/30/2020
Due date:
% Done:

0%

Spec Reference:

Description

I just upgraded all osmo-{ran,cni} components to the recent master, and now quite often run into a situation when the MS (at least Sony Ericsson K800i, TEMS) keeps sending the same Uplink block again and again. I am not sure what exactly causes it, but I can reproduce it more or less reliably by starting Opera Mini (http://people.osmocom.org/fixeria/j2me/opera_mini.jar). When started for the first time, Opera initiates the installation process, and this is where the problem usually shows up.

DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
DRLCMACMEAS INFO gprs_rlcmac_meas.cpp:108 MS(TLLI=0xc5849e78, IMSI=901990000000021, TA=0, 10/0, UL DL) UL RSSI: -29 dBm

Please see the attached capture file. Some highlights:

43585 RACH!
43591 IMM ASS (single block)
43731 UL Packet Resource Request
43799 DL Packet Uplink Assignment (TS=6 USF=0)
...
43910 UL DATA (BSN=0 CV=15)
...
43915 UL DATA | TCP FIN,ACK (Opera Mini closes connection to the server)
...
43918 UL DATA (BSN=3 CV=15)
43955 UL DATA (BSN=5 CV=14)  <-- 14 RLC blocks left
...
44070 UL DATA (BSN=14 CV=5)
...
44146 UL DATA (BSN=19 CV=0)  <-- 0 RLC blocks left
...
44149 UL DATA (BSN=19 CV=0)  <-- re-transmission
44152 UL DATA (BSN=19 CV=0)  <-- re-transmission

starting from frame 44149, the MS keeps transmitting the same RLC/MAC block ('35bdc794cd2b631285b2d43513'O). Interestingly enough, after each re-transmission the PCU logs "GPRS DL CTRL: PACKET_UPLINK_ACK_NACK", but does not actually send it (dummy RLC/MAC frames are not recorded). And this goes like that unless I turn off the phone. At the same time, Downlink blocks are received and accepted by the MS on the same timeslot.

OsmoPCU 58cd1d2f8a0474de45112e8d6e460051494eba79
OsmoBTS def24f0d9af2463a5ef557d35f23abd5b4d07120


Files

pcu_loop.pcap.gz pcu_loop.pcap.gz 1.28 MB fixeria, 11/29/2020 11:11 PM
Actions #1

Updated by pespin over 3 years ago

Summary of discussion in IRC:

<pespin_> Dl assignment in 43908 and Pkt Control Ack in 43953
<pespin_> Poll is expected in TS6 and MS sends the Ctrl Ack in TS7
<pespin_> The DL assignment is assigning TS6 and TS7
<pespin_> Frame 43840 tbf.cpp:363 TBF(TFI=0 TLLI=0xc5849e78 DIR=DL STATE=NULL) Setting Control TS 6\n
<pespin_> so I think it's clear we expect control slot to be in TS 6 but MS expects control slot to be in TS 7
<pespin_> now we need to find how/why :)
...
19<fixeria> pespin_: lol, the other phone (SE W595) loses the network signal when I start osmo-pcu :D
20<pespin_>30 fixeria, looks like the problem probably comes from first UL TBF bein assigned on TS7 (where the 2nd phase happens, PKT REsource Request) and then the MS is assigned TS6
20<pespin_>30 but the MS continues trying to use TS7?
18<fixeria18> all re-transmissions occur on TS6 AFAICS
20<pespin_>30 sorry the other way around, we expect to poll on TS7 but we assigned TS6
20<pespin_>30 fixeria, the issue is much before, during PKT Resource Request
20<pespin_>30 see gsmtap log before 43799
20<pespin_>30 in 43799 (PKt resource Request), we assign TS6 but we still expect to keep polling on TS7
20<pespin_>30 that's wrong I'd say

Actions #2

Updated by pespin over 3 years ago

<pespin_> fixeria, I have the feeling there's some kind of timing/scheduling issue. Pkt resource request fame 43731: as a response, Ul Assignment assigns UL TBF on TS6 (and later Dl TBF on TS6 and 7). Then MS sends UL data BSN=0..3 in TS6, and BSN=4 in TS7, exactly 4 FNs later
<pespin_> but it was never assigned TS7 on uplink...
<pespin_> so it looks as if at some point MS starts transmitting on TS7 instead of TS6..
pespin_>30 fixeria, I think either the MS is buggy or we have some issue in the scheduler when receiving
Actions #3

Updated by pespin over 3 years ago

The stalled UL window scenario probably happens for 2 reasons:
  • First, since the MS unexpectedly sent some data on TS7 instead of TS6, we didn't get all packets and hence we cannot ACK everything, that's why there's holes in the ACK/NACK bitmap.
  • Probably the MS is somehow expecting to receive the ACK/NACK packets on PACCH at TS7 instead of TS6 (don't know why, but it probably does, same reason why it sends stuff on TS7 sometimes).
Actions #4

Updated by pespin over 3 years ago

I tried to reproduce this issue for a while with a Sony Ericsson K800i (no TEMS) here and I couldn't. I didn't notice the stall nor saw anything related in osmo-pcu.log nor on wireshark looking a rlcmac+gprs-ns+gtp, everything is working smoothly here.

Info about my phone "Current software version":
CXC1250210 R8BF003 080130 Sony Ericsson

Actions #5

Updated by fixeria about 3 years ago

I just faced this problem again (version 0.8.0.374-55ac). This UL TBF remains 'alive' forever:

OsmoPCU# show tbf all 
UL TBFs
TBF: TFI=0 TLLI=0xdb01841c (valid) TA=0 DIR=UL IMSI=901703894347965
 created=1612561855 state=00000086 [CCCH:0, PACCH:2] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=10/0
 TS_alloc=6!  TRX_ID=0 CS=CS-4 WS=64 V(Q)=9 V(R)=19
 TBF Statistics:
 Data Blocks:
  CS1        :        0 (0/s 0/m 0/h 0/d)
  CS2        :    29046 (50/s 4337/m 27665/h 27665/d)
  CS3        :        0 (0/s 0/m 0/h 0/d)
  CS4        :        0 (0/s 0/m 0/h 0/d)

and the number of CS2 blocks keeps growing even despite the phone was switched off.

OsmoPCU# show ms all 
MS TLLI=db01841c, IMSI=901703894347965
  Timing advance (TA):    0
  Coding scheme uplink:   CS-4
  Coding scheme downlink: CS-4
  Mode:                   GPRS
  MS class:               10
  EGPRS MS class:         0
  PACCH:                  6 
  LLC queue length:       0
  LLC queue octets:       0
  RSSI:                   -60 dBm
  Bit error rate:         22 %
  Link quality:           8 dB
  Burst timing offset:    0/4 bit
  Downlink NACK rate:     0 %
  MS RX quality:          18 %
  MS C value:             63 dB
  MS SIGN variance:       0 dB
  MS I level (slot 0):    0 dB
  MS I level (slot 1):    0 dB
  MS I level (slot 2):    0 dB
  MS I level (slot 3):    0 dB
  MS I level (slot 4):    0 dB
  MS I level (slot 5):    0 dB
  MS I level (slot 6):    0 dB
  MS I level (slot 7):    0 dB
  Uplink TBF:             TFI=0, state=FLOW
  MS Statistics:
   Amount of DL CTRL messages scheduled:    24503 (35/s 3307/m 23229/h 19143/d)

This happened [again] during the installation of Opera Mini (first run, license agreement, etc).

Actions #6

Updated by fixeria about 3 years ago

I just faced this problem again (version 0.8.0.374-55ac). This UL TBF remains 'alive' forever:

I was planning to attach with gdb and examine the TBF structure closer, but it has gone :/

and the number of CS2 blocks keeps growing even despite the phone was switched off.

Apparently, switching off took longer than usually. Will keep it on next time I face it.

Info about my phone "Current software version":
CXC1250210 R8BF003 080130 Sony Ericsson

This time I am facing the problem with a 'normal' (not TEMS) K800i:
CXC1250210 R8BF003 080130 - exactly the same fw version.

Actions #7

Updated by pespin almost 3 years ago

  • Status changed from New to Feedback
  • Assignee set to fixeria

Hi fixeria , assigning this ticket back to you to see if you can reproduce it again using osmo-pcu master (I was unable to reproduce it at the time). Since a lot of scheduler/polling/lifecycle code has been improved, it could well be that the issue is fixed or at least mitigated.

Actions #8

Updated by fixeria about 2 years ago

  • Status changed from Feedback to Stalled

I will try to debug this further with the help of my spare time project:

https://github.com/axilirator/sedbgmux

and come back with more information. Working on more urgent projects ATM, so setting to Stalled.

Actions #9

Updated by pespin 8 months ago

Some new feedback using osmo-pcu+osmo-bts master here would be welcome, more than 1 year passed since last comment.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)