Bug #1524

open

PACCH on the wrong timeslot

Added by zecke about 8 years ago. Updated over 2 years ago.

Status: Feedback
Priority: Normal
Assignee:
Target version: -
Start date: 02/22/2016
Due date:
% Done: 0%
Spec Reference:

Description

The PACCH is probably on the wrong timeslot. Neither the E71 nor the iPhone5c cares about it, but the Acer Z200 with a Mediatek chipset runs into frequent timeouts on GPRS/EGPRS downloads. This phone is frequently used by a user of osmo-pcu.

Jacob has prepared but not finished some work to have the PACCH on the "right" timeslot (it can change due to new uplink assignments). The Mediatek based system is likely to reproduce the issue easily.

The symptom is frequently seeing POLL timeouts. This means that the PCU expected an answer from the phone but it never arrived. The first step should be to reproduce the issue.


Files

osmopcu-1.log 8.22 KB msuraev, 02/24/2016 03:44 PM
osmopcu-2.log 101 KB msuraev, 02/24/2016 05:00 PM
pcu.pcapng.gz 1.27 MB msuraev, 02/25/2016 10:32 AM
debug.log 2.35 MB msuraev, 02/25/2016 10:32 AM
osmopcu-3.log 8.94 KB Log from Mediatek MT6580M based MS, matt9j, 07/10/2018 05:02 AM

Related issues

Related to OsmoBTS - Bug #1795: osmo-bts-trx: fails to assign second lchan on TCH/H TS (Closed, fixeria, 08/09/2016)
Related to OsmoBTS - Feature #1648: Verify Multi-TRX support for osmo-bts-trx (Closed, msuraev, 03/11/2016)
Related to OsmoPCU - Feature #1526: Acquire/update timing advance (TA) (Stalled, fixeria, 02/22/2016)
Related to OsmoPCU - Feature #2709: use osmo_fsm for TBF (Resolved, pespin, 12/05/2017)
Related to OsmoPCU - Bug #1532: Increased number of poll timeouts on shared PDCHs (New, pespin, 02/22/2016)
Related to OsmoPCU - Bug #1759: Wrong calculation of DL window size for DL assignment (Stalled, 06/28/2016)
Related to OsmoPCU - Feature #3014: fix re-apply patches reverted by #3013, related: UL and DL packet assignment, and Timing Advance (Resolved, msuraev, 02/27/2018)

Actions #1

Updated by zecke about 8 years ago

  • Assignee set to msuraev
Actions #2

Updated by msuraev about 8 years ago

Testing results of jerlbeck/wip/pacch-handling branch:

  • iPhone: a couple of errors seen after surfing for a few minutes; surfing itself is not affected
  • Acer: a lot of errors seen; surfing fails until the phone is pinged from the host, after which surfing is possible, although there are still plenty of errors

Errors look as follows:
Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1539512, TS=4 (curr FN 1539525)
Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:557 - Timeout for polling PACKET DOWNLINK ACK.
Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:902 - Assignment was on PACCH
Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:908 - Downlink ACK was received
Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1540010, TS=4 (curr FN 1540023)
Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:557 - Timeout for polling PACKET DOWNLINK ACK.
Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:902 - Assignment was on PACCH
Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:908 - Downlink ACK was received
Mon Feb 22 15:41:34 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1540534, TS=4 (curr FN 1540547)
Mon Feb 22 15:41:34 2016 <0002> tbf.cpp:557 - Timeout for polling PACKET DOWNLINK ACK.
Mon Feb 22 15:41:34 2016 <0002> tbf.cpp:902 - Assignment was on PACCH
Mon Feb 22 15:41:34 2016 <0002> tbf.cpp:908 - Downlink ACK was received

The most common is "DOWNLINK ACK" but sometimes others appear as well: "CONTROL ACK for PACKET UPLINK ASSIGNMENT" for example.
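
To see how late each timeout is detected relative to the polled frame, the log lines above can be parsed mechanically. The following is a minimal sketch, not part of osmo-pcu: the regular expression is derived only from the line format quoted in this comment, and the names `parse_poll_timeouts`, `poll_fn`, `curr_fn` and `fn_delta` are made up for this example.

```python
import re

# Regex for the "TBF poll timeout" lines quoted above; the group names
# (poll_fn, ts, curr_fn) are labels chosen for this sketch.
POLL_TIMEOUT_RE = re.compile(
    r"TBF poll timeout for FN=(?P<poll_fn>\d+), TS=(?P<ts>\d+) "
    r"\(curr FN (?P<curr_fn>\d+)\)"
)

# GSM TDMA frame numbers wrap at the hyperframe length (26 * 51 * 2048).
GSM_HYPERFRAME = 26 * 51 * 2048


def parse_poll_timeouts(lines):
    """Return a list of (poll_fn, ts, fn_delta) for each poll-timeout line."""
    results = []
    for line in lines:
        match = POLL_TIMEOUT_RE.search(line)
        if match is None:
            continue  # not a poll-timeout line
        poll_fn = int(match.group("poll_fn"))
        curr_fn = int(match.group("curr_fn"))
        # Modulo keeps the delta correct across a frame-number wrap.
        fn_delta = (curr_fn - poll_fn) % GSM_HYPERFRAME
        results.append((poll_fn, int(match.group("ts")), fn_delta))
    return results


# Lines copied from the log excerpt above.
sample = [
    "Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1539512, TS=4 (curr FN 1539525)",
    "Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:557 - Timeout for polling PACKET DOWNLINK ACK.",
    "Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1540010, TS=4 (curr FN 1540023)",
    "Mon Feb 22 15:41:34 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1540534, TS=4 (curr FN 1540547)",
]

for poll_fn, ts, fn_delta in parse_poll_timeouts(sample):
    print(f"poll FN={poll_fn} TS={ts}: timeout detected {fn_delta} frames later")
```

For the three timeouts quoted here, the poll is on TS 4 each time and the timeout is detected 13 frames after the polled FN.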

Actions #3

Updated by zecke about 8 years ago

  • Status changed from New to In Progress
Actions #4

Updated by msuraev about 8 years ago

Testing master branch gave the same results.

While testing with telenor modem I was unable to reproduce issue with either branch.

Actions #5

Updated by zecke about 8 years ago

On 23 Feb 2016, at 15:42, ms [REDMINE] <> wrote:

Testing master branch gave the same results.

While testing with gxdm/telenor modem I was unable to reproduce issue with either branch.

We know that Qualcomm firmware is quite forgiving. Let's stick with the Mediatek/Acer phone for the tests. Mattias mentioned a Sony Ericsson W995 as picky as well.

Actions #6

Updated by msuraev about 8 years ago

Attached is the log with extra debug info obtained from osmopcu with acer phone trying to connect.

Actions #7

Updated by msuraev about 8 years ago

Bigger and cleaner log for reference.

Actions #8

Updated by msuraev about 8 years ago

Attaching latest logs.

Actions #9

Updated by msuraev about 8 years ago

Curiously, when changing timeslot configuration in osmo-nitb.cfg from:
timeslot 0
 phys_chan_config CCCH+SDCCH4
 hopping enabled 0
timeslot 1
 phys_chan_config SDCCH8
 hopping enabled 0
timeslot 2
 phys_chan_config PDCH
 hopping enabled 0
timeslot 3
 phys_chan_config PDCH
 hopping enabled 0
timeslot 4
 phys_chan_config PDCH
 hopping enabled 0
timeslot 5
 phys_chan_config PDCH
 hopping enabled 0
timeslot 6
 phys_chan_config PDCH
 hopping enabled 0
timeslot 7
 phys_chan_config TCH/H
 hopping enabled 0
to:
timeslot 0
 phys_chan_config CCCH+SDCCH4
 hopping enabled 0
timeslot 1
 phys_chan_config SDCCH8
 hopping enabled 0
timeslot 2
 phys_chan_config PDCH
 hopping enabled 0
timeslot 3
 phys_chan_config TCH/F
 hopping enabled 0
timeslot 4
 phys_chan_config PDCH
 hopping enabled 0
timeslot 5
 phys_chan_config TCH/F
 hopping enabled 0
timeslot 6
 phys_chan_config PDCH
 hopping enabled 0
timeslot 7
 phys_chan_config TCH/H
 hopping enabled 0

I have not done any extensive testing over different timeslot configurations, so this could be just a single outlier.
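
Because the two configurations are long, the actual difference is easy to miss. The sketch below transcribes the phys_chan_config values from both configurations in this comment and lists the timeslots that differ; `changed_timeslots` is a hypothetical helper written for this illustration and is not part of any Osmocom tool.

```python
# phys_chan_config per timeslot (index = timeslot number), transcribed
# from the two osmo-nitb.cfg excerpts in this comment.
before = ["CCCH+SDCCH4", "SDCCH8", "PDCH", "PDCH", "PDCH", "PDCH", "PDCH", "TCH/H"]
after = ["CCCH+SDCCH4", "SDCCH8", "PDCH", "TCH/F", "PDCH", "TCH/F", "PDCH", "TCH/H"]


def changed_timeslots(old, new):
    """Map timeslot number -> (old config, new config) where they differ."""
    return {
        ts: (old_cfg, new_cfg)
        for ts, (old_cfg, new_cfg) in enumerate(zip(old, new))
        if old_cfg != new_cfg
    }


for ts, (old_cfg, new_cfg) in changed_timeslots(before, after).items():
    print(f"timeslot {ts}: {old_cfg} -> {new_cfg}")
```

Only timeslots 3 and 5 change (PDCH to TCH/F), which leaves PDCH on timeslots 2, 4 and 6, so no two PDCH timeslots are adjacent any more.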

Actions #10

Updated by msuraev about 8 years ago

The config change above makes the Acer phone work, albeit slowly.

Actions #11

Updated by laforge almost 8 years ago

  • Status changed from In Progress to New
Actions #12

Updated by msuraev about 7 years ago

  • Related to Bug #1795: osmo-bts-trx: fails to assign second lchan on TCH/H TS added
Actions #13

Updated by laforge about 7 years ago

  • Priority changed from Normal to High
Actions #14

Updated by msuraev about 7 years ago

  • Status changed from New to Stalled
Actions #15

Updated by laforge about 7 years ago

Actions #16

Updated by msuraev almost 7 years ago

Tested with dynamic TS - same results as with static config.

Actions #17

Updated by msuraev almost 7 years ago

  • Related to Feature #1648: Verify Multi-TRX support for osmo-bts-trx added
Actions #18

Updated by msuraev almost 7 years ago

  • Related to Feature #1526: Acquire/update timing advance (TA) added
Actions #19

Updated by msuraev almost 7 years ago

Gerrit 2859, 2654, 2673 are merged. This does not fix the issue but should make debugging it easier.

Actions #20

Updated by msuraev over 6 years ago

Actions #21

Updated by msuraev about 6 years ago

  • Related to Bug #1532: Increased number of poll timeouts on shared PDCHs added
Actions #22

Updated by msuraev about 6 years ago

  • Related to Bug #1759: Wrong calculation of DL window size for DL assignment added
Actions #23

Updated by laforge about 6 years ago

  • Assignee changed from msuraev to 4368
Actions #24

Updated by laforge almost 6 years ago

  • Assignee changed from 4368 to lynxis
Actions #25

Updated by matt9j over 5 years ago

I am able to reproduce this issue reliably with two MSs in my possession, a Lenovo A319 and a BLU Dash 3.5 (Mediatek MT6572M). I see similar behavior between my SDR setup, osmo-pcu (0.5.0.9.3df1), osmo-bts-trx (0.8.1.20.0257), and osmo-trx-uhd (4.9.2; Boost_105500; UHD_003.009.005), and an embedded setup based on a Nuran Litecell 1.5, osmo-pcu (0.2.0.936-85cf) and osmo-bts-lc15 (0.4.0.566-eb5b7). lynxis, please let me know if there is any prototype testing or debugging I can do to aid your development.

Actions #26

Updated by matt9j over 5 years ago

I was also able to test with a newer Mediatek phone, and the issue still persisted. Log attached.

Phone: Gionee P5-Mini
Released: April 2016
Chipset: Mediatek MT6580M

Actions #27

Updated by keith over 5 years ago

I appear to be seeing this with Motorola KRZR K3

https://osmocom.org/issues/3472

Actions #28

Updated by laforge over 5 years ago

On Sat, Aug 18, 2018 at 02:12:30PM +0000, keith [REDMINE] wrote:

I appear to be seeing this with Motorola KRZR K3

Side note: It seems those phones are unfortunately not available on eBay or anywhere,
at least not that I could find. This is sad, as I'd like to add it to our collection
of "phones with known issues" :/

Actions #29

Updated by ipse over 5 years ago

I wonder if some of these issues could instead be related to the broken Uplink ACK/NACK bitmaps which we discovered recently. We'll push a workaround (not a real fix yet, unfortunately) and I would appreciate it if people with "broken" phones retest.

Actions #30

Updated by keith over 5 years ago

On phones. I have two K3s, so can and will test anything, although that is not the same as having one on the desk to probe.. I could leave one at sysmocom sometime.

There are quite a number of models of phones mentioned in this ticket. As zecke originally says, I've noticed the iPhone5c very stable on this, always sends (doesn't exhibit #2455), always receives. I have a Huawei U8350 about which I can say the same.

Not so with Asus Zenfone2 and HTC desire 628 - Both give very frustrating UX, with both phone and network initiated data transfer when idle unstable and difficult to reproduce. They keep resetting the pdp-context too.

Yesterday I briefly searched internetz for a database of what phones have what basebands and other relevant things but other than some sparse info on sites like XDA-developers I did not find such a thing. I wonder does such a thing exist? If we knew, then laforge maybe we can find another phone with same/similar characteristics. There were really a LOT of Motorola RAZR types in the series. The KRZR K3 is hard to find for some reason, but the K1 is plentiful. I wonder how different they are under the skin?

I seem to remember a mention someplace sometime of making our own database of GPRS status with phone models..

On broken ACK/NACK bitmaps and patches: ipse, I searched all tickets for "bitmaps" and only found #1624. Can you make a ticket about what you discovered? Please also ping me when the workaround is there to test.

Actions #31

Updated by matt9j over 5 years ago

ipse I would be happy to test when code is available. Please update this ticket with the desired tests and workaround branch or direct message me and I can run the tests you need and upload results here.

laforge and keith we have a ton of low-end phones that exhibit this problem here. I can try and send you some if it would be helpful to add to the Sysmocom testbench. The shipping cost from the US would probably be significantly greater than the new market value of the phones themselves though. If you pick up any cheap handset with a Mediatek chipset from our experiences so far it will likely exhibit this problem. https://www.gsmarena.com/ doesn't have a searchable database by chipset, but does have chipset information available for most recently released handsets and searchable by phone model. I just tried to look up the K3 (from 2007) and it unfortunately doesn't seem to have any chipset info.

Actions #32

Updated by keith over 5 years ago

I was thinking about the title of this ticket..

Not saying that this is not an issue, but a couple of things occurred to me:

1) I was experiencing what looks like this problem (exactly as described in comment https://osmocom.org/issues/1524#note-2) when I only have ONE PDCH configured, so how could the problem be the wrong TS? [ EDIT: I'm not 100% sure that I was seeing "Downlink ACK was received" ]

2) I discovered yesterday that since http://git.osmocom.org/osmo-pcu/commit/?id=9bbe1600cc02e1b538380393edb1dcdabe9247a2 osmo-pcu is sending an invalid Timing Advance of 220 when we page to setup a TBF DL. This is because we don't know TA if the previous TBF UL is gone. The correct procedure to ascertain TA in this case needs to be implemented. This totally makes sense and explains why I always saw bursts on the spectrum - the phone responding to the paging, but the BTS/PCU not receiving because of the erroneous TA.

I would suppose then that some phones see the Invalid TA of 220 and react with:
"oh that's invalid, I'm going to use 0" and given that we are probably testing within 550m it works.

and others say "oh that's invalid I'm going to use 63" and as we are not 35km away it doesn't work. :-/
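
The distances mentioned above follow from the GSM timing advance definition: TA ranges from 0 to 63, and each step corresponds to one bit period (48/13 µs) of round-trip delay. The sketch below is an illustration only; `ta_to_distance_m` is a hypothetical helper, and how a given phone actually reacts to a TA of 220 remains the guess stated above.

```python
SPEED_OF_LIGHT_M_S = 299_792_458
GSM_BIT_PERIOD_S = 48 / 13 * 1e-6  # one GSM bit period, about 3.69 microseconds
TA_MAX = 63  # largest valid timing advance value


def ta_to_distance_m(ta):
    """Approximate one-way distance in metres for a timing advance value.

    TA compensates the round-trip delay, so one step covers half the
    distance light travels in one bit period.
    """
    if not 0 <= ta <= TA_MAX:
        raise ValueError(f"TA {ta} is outside the valid range 0..{TA_MAX}")
    return ta * SPEED_OF_LIGHT_M_S * GSM_BIT_PERIOD_S / 2


print(f"TA 1  ~ {ta_to_distance_m(1):.0f} m")
print(f"TA 63 ~ {ta_to_distance_m(63) / 1000:.1f} km")

try:
    ta_to_distance_m(220)
except ValueError as err:
    print(err)  # 220 does not fit the 0..63 range
```

One TA step is roughly 550 m, so a phone falling back to TA 0 works on a lab bench, while a phone falling back to TA 63 transmits as if it were about 35 km away.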

BTW, I read this document which I found highly useful. It's a critical analysis of GPRS design rather than a description of how it works, but one can glean a lot of info from reading it, much faster than reading specs. (The intro is in German + English, the rest of the doc is English, so don't look at the first page and shy away, non-German speakers)
https://publik.tuwien.ac.at/files/PubDat_112092.pdf

[ 2nd EDIT: I think this comment should be on #1526 and I have not actually seen anything I can be sure relates to this ticket. ]

Actions #33

Updated by keith over 5 years ago

Actions #34

Updated by keith over 5 years ago

Sorry to be so messy with my comments.. I'm overloading my brain a bit with a personal crash-course on the PCU. :-/
I appear to be seeing something like this now if I use
egprs only
in the pcu config.

Actions #35

Updated by laforge over 5 years ago

I purchased a KRZR 3 for the sysmocom lab. Not sure when somebody will be able to try to reproduce, but at least we should be able to do so now.

Actions #36

Updated by laforge over 5 years ago

Actions #37

Updated by laforge over 5 years ago

  • Assignee changed from lynxis to msuraev
Actions #38

Updated by msuraev almost 5 years ago

  • Related to Feature #3014: fix re-apply patches reverted by #3013, related: UL and DL packet assignment, and Timing Advance added
Actions #39

Updated by laforge almost 5 years ago

  • Assignee changed from msuraev to lynxis
Actions #40

Updated by laforge over 4 years ago

  • Priority changed from High to Normal
Actions #41

Updated by laforge about 4 years ago

  • Assignee deleted (lynxis)
Actions #42

Updated by laforge about 4 years ago

  • Assignee set to daniel
Actions #43

Updated by daniel about 4 years ago

  • Status changed from Stalled to In Progress
Actions #44

Updated by laforge almost 4 years ago

  • Status changed from In Progress to Stalled
  • Assignee changed from daniel to pespin

moving PCU related tickets to pespin

Actions #45

Updated by pespin over 2 years ago

  • Status changed from Stalled to Feedback
  • Assignee changed from pespin to laforge

A lot of stuff has happened in osmo-pcu since the comments 3-4 years ago. IMHO it's nowadays a lot easier to debug stuff with the current FSMs.

laforge, is there still a KRZR 3 in the sysmocom office today? Might it be worth posting it to me to give it a try? Or better, someone at the office could test it with osmocom master, see if there are still visible issues, and send me a pcap with gsmtap + gsmtap_log.

Actions #46

Updated by laforge over 2 years ago

  • Assignee changed from laforge to roh

assigning to roh to provide the related test / captures / logs.
