Bug #4879
openendless pdch.cpp:809 Got CS-N RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264
0%
Description
I just upgraded all osmo-{ran,cni} components to the recent master, and now quite often run into a situation when the MS (at least Sony Ericsson K800i, TEMS) keeps sending the same Uplink block again and again. I am not sure what exactly causes it, but I can reproduce it more or less reliably by starting Opera Mini (http://people.osmocom.org/fixeria/j2me/opera_mini.jar). When started for the first time, Opera initiates the installation process, and this is where the problem usually shows up.
DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACUL INFO pdch.cpp:809 Got CS-2 RLC block: R=0, SI=0, TFI=0, CPS=0, RSB=0, rc=264 DRLCMACMEAS INFO gprs_rlcmac_meas.cpp:108 MS(TLLI=0xc5849e78, IMSI=901990000000021, TA=0, 10/0, UL DL) UL RSSI: -29 dBm
Please see the attached capture file. Some highlights:
43585 RACH! 43591 IMM ASS (single block) 43731 UL Packet Resource Request 43799 DL Packet Uplink Assignment (TS=6 USF=0) ... 43910 UL DATA (BSN=0 CV=15) ... 43915 UL DATA | TCP FIN,ACK (Opera Mini closes connection to the server) ... 43918 UL DATA (BSN=3 CV=15) 43955 UL DATA (BSN=5 CV=14) <-- 14 RLC blocks left ... 44070 UL DATA (BSN=14 CV=5) ... 44146 UL DATA (BSN=19 CV=0) <-- 0 RLC blocks left ... 44149 UL DATA (BSN=19 CV=0) <-- re-transmission 44152 UL DATA (BSN=19 CV=0) <-- re-transmission
starting from frame 44149, the MS keeps transmitting the same RLC/MAC block ('35bdc794cd2b631285b2d43513'O). Interestingly enough, after each re-transmission the PCU logs "GPRS DL CTRL: PACKET_UPLINK_ACK_NACK", but does not actually send it (dummy RLC/MAC frames are not recorded). And this goes like that unless I turn off the phone. At the same time, Downlink blocks are received and accepted by the MS on the same timeslot.
OsmoPCU 58cd1d2f8a0474de45112e8d6e460051494eba79
OsmoBTS def24f0d9af2463a5ef557d35f23abd5b4d07120
Files
Updated by pespin over 3 years ago
Summary of discussion in IRC:
<pespin_> Dl assignment in 43908 and Pkt Control Ack in 43953 <pespin_> Poll is expected in TS6 and MS sends the Ctrl Ack in TS7 <pespin_> The DL assignment is assigning TS6 and TS7 <pespin_> Frame 43840 tbf.cpp:363 TBF(TFI=0 TLLI=0xc5849e78 DIR=DL STATE=NULL) Setting Control TS 6\n <pespin_> so I think it's clear we expect control slot to be in TS 6 but MS expects control slot to be in TS 7 <pespin_> now we need to find how/why :) ... 19<fixeria> pespin_: lol, the other phone (SE W595) loses the network signal when I start osmo-pcu :D 20<pespin_>30 fixeria, looks like the problem probably comes from first UL TBF bein assigned on TS7 (where the 2nd phase happens, PKT REsource Request) and then the MS is assigned TS6 20<pespin_>30 but the MS continues trying to use TS7? 18<fixeria18> all re-transmissions occur on TS6 AFAICS 20<pespin_>30 sorry the other way around, we expect to poll on TS7 but we assigned TS6 20<pespin_>30 fixeria, the issue is much before, during PKT Resource Request 20<pespin_>30 see gsmtap log before 43799 20<pespin_>30 in 43799 (PKt resource Request), we assign TS6 but we still expect to keep polling on TS7 20<pespin_>30 that's wrong I'd say
Updated by pespin over 3 years ago
<pespin_> fixeria, I have the feeling there's some kind of timing/scheduling issue. Pkt resource request fame 43731: as a response, Ul Assignment assigns UL TBF on TS6 (and later Dl TBF on TS6 and 7). Then MS sends UL data BSN=0..3 in TS6, and BSN=4 in TS7, exactly 4 FNs later <pespin_> but it was never assigned TS7 on uplink... <pespin_> so it looks as if at some point MS starts transmitting on TS7 instead of TS6.. pespin_>30 fixeria, I think either the MS is buggy or we have some issue in the scheduler when receiving
Updated by pespin over 3 years ago
- First, since the MS unexpectedly sent some data on TS7 instead of TS6, we didn't get all packets and hence we cannot ACK everything, that's why there's holes in the ACK/NACK bitmap.
- Probably the MS is somehow expecting to receive the ACK/NACK packets on PACCH at TS7 instead of TS6 (don't know why, but it probably does, same reason why it sends stuff on TS7 sometimes).
Updated by pespin over 3 years ago
I tried to reproduce this issue for a while with a Sony Ericsson K800i (no TEMS) here and I couldn't. I didn't notice the stall nor saw anything related in osmo-pcu.log nor on wireshark looking a rlcmac+gprs-ns+gtp, everything is working smoothly here.
Info about my phone "Current software version":
CXC1250210 R8BF003 080130 Sony Ericsson
Updated by fixeria over 3 years ago
I just faced this problem again (version 0.8.0.374-55ac). This UL TBF remains 'alive' forever:
OsmoPCU# show tbf all UL TBFs TBF: TFI=0 TLLI=0xdb01841c (valid) TA=0 DIR=UL IMSI=901703894347965 created=1612561855 state=00000086 [CCCH:0, PACCH:2] 1st_TS=6 1st_cTS=6 ctrl_TS=6 MS_CLASS=10/0 TS_alloc=6! TRX_ID=0 CS=CS-4 WS=64 V(Q)=9 V(R)=19 TBF Statistics: Data Blocks: CS1 : 0 (0/s 0/m 0/h 0/d) CS2 : 29046 (50/s 4337/m 27665/h 27665/d) CS3 : 0 (0/s 0/m 0/h 0/d) CS4 : 0 (0/s 0/m 0/h 0/d)
and the number of CS2 blocks keeps growing even despite the phone was switched off.
OsmoPCU# show ms all MS TLLI=db01841c, IMSI=901703894347965 Timing advance (TA): 0 Coding scheme uplink: CS-4 Coding scheme downlink: CS-4 Mode: GPRS MS class: 10 EGPRS MS class: 0 PACCH: 6 LLC queue length: 0 LLC queue octets: 0 RSSI: -60 dBm Bit error rate: 22 % Link quality: 8 dB Burst timing offset: 0/4 bit Downlink NACK rate: 0 % MS RX quality: 18 % MS C value: 63 dB MS SIGN variance: 0 dB MS I level (slot 0): 0 dB MS I level (slot 1): 0 dB MS I level (slot 2): 0 dB MS I level (slot 3): 0 dB MS I level (slot 4): 0 dB MS I level (slot 5): 0 dB MS I level (slot 6): 0 dB MS I level (slot 7): 0 dB Uplink TBF: TFI=0, state=FLOW MS Statistics: Amount of DL CTRL messages scheduled: 24503 (35/s 3307/m 23229/h 19143/d)
This happened [again] during the installation of Opera Mini (first run, license agreement, etc).
Updated by fixeria over 3 years ago
I just faced this problem again (version 0.8.0.374-55ac). This UL TBF remains 'alive' forever:
I was planning to attach with gdb and examine the TBF structure closer, but it has gone :/
and the number of CS2 blocks keeps growing even despite the phone was switched off.
Apparently, switching off took longer than usually. Will keep it on next time I face it.
Info about my phone "Current software version":
CXC1250210 R8BF003 080130 Sony Ericsson
This time I am facing the problem with a 'normal' (not TEMS) K800i:
CXC1250210 R8BF003 080130 - exactly the same fw version.
Updated by pespin about 3 years ago
- Status changed from New to Feedback
- Assignee set to fixeria
Hi fixeria , assigning this ticket back to you to see if you can reproduce it again using osmo-pcu master (I was unable to reproduce it at the time). Since a lot of scheduler/polling/lifecycle code has been improved, it could well be that the issue is fixed or at least mitigated.
Updated by fixeria over 2 years ago
- Status changed from Feedback to Stalled
I will try to debug this further with the help of my spare time project:
https://github.com/axilirator/sedbgmux
and come back with more information. Working on more urgent projects ATM, so setting to Stalled.