Bug #1524

PACCH on the wrong timeslot

Added by zecke over 2 years ago. Updated 14 days ago.

Status: Stalled
Priority: High
Assignee:
Target version: -
Start date: 02/22/2016
Due date:
% Done: 0%
Spec Reference:

Description

The PACCH is probably on the wrong timeslot. Neither the E71 nor the iPhone5c cares about it, but the Acer Z200 with a Mediatek chipset runs into frequent timeouts on GPRS/EGPRS downloads. This phone is frequently used by a user of osmo-pcu.

Jacob has prepared but not finished some work to have the PACCH on the "right" timeslot (it can change due to new uplink assignments). The Mediatek based system is likely to easily reproduce the issue.

The symptom is frequently seeing POLL timeouts. This means that the PCU expected an answer from the phone but it never occurred. The first step should be to reproduce the issue.

osmopcu-1.log (8.22 KB) msuraev, 02/24/2016 03:44 PM
osmopcu-2.log (101 KB) msuraev, 02/24/2016 05:00 PM
pcu.pcapng.gz (1.27 MB) msuraev, 02/25/2016 10:32 AM
debug.log (2.35 MB) msuraev, 02/25/2016 10:32 AM
osmopcu-3.log (8.94 KB) Log from Mediatek MT6580M based MS. matt9j, 07/10/2018 05:02 AM

Related issues

Related to OsmoBTS - Bug #1795: osmo-bts-trx: fails to assign second lchan on TCH/H TS (Closed, 2016-08-09)

Related to OsmoBTS - Feature #1648: Verify Multi-TRX support for osmo-bts-trx (Closed, 2016-03-11)

Related to OsmoPCU - Feature #1526: Acquire/update timing advance (TA) (Stalled, 2016-02-22)

Related to OsmoPCU - Feature #2709: use osmo_fsm for TBF (New, 2017-12-05)

Related to OsmoPCU - Bug #1532: Increased number of poll timeouts on shared PDCHs (New, 2016-02-22)

Related to OsmoPCU - Bug #1759: Wrong calculation of DL window size for DL assignment (Stalled, 2016-06-28)

History

#1 Updated by zecke over 2 years ago

  • Assignee set to msuraev

#2 Updated by msuraev over 2 years ago

Testing results of jerlbeck/wip/pacch-handling branch:

  • iPhone: a couple of errors seen after surfing for a few minutes; surfing itself is not affected
  • Acer: a lot of errors seen; surfing fails until the phone is pinged from the host, after which surfing is possible, although there are still plenty of errors

Errors look as follows:
Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1539512, TS=4 (curr FN 1539525)
Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:557 - Timeout for polling PACKET DOWNLINK ACK.
Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:902 - Assignment was on PACCH
Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:908 - Downlink ACK was received
Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1540010, TS=4 (curr FN 1540023)
Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:557 - Timeout for polling PACKET DOWNLINK ACK.
Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:902 - Assignment was on PACCH
Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:908 - Downlink ACK was received
Mon Feb 22 15:41:34 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1540534, TS=4 (curr FN 1540547)
Mon Feb 22 15:41:34 2016 <0002> tbf.cpp:557 - Timeout for polling PACKET DOWNLINK ACK.
Mon Feb 22 15:41:34 2016 <0002> tbf.cpp:902 - Assignment was on PACCH
Mon Feb 22 15:41:34 2016 <0002> tbf.cpp:908 - Downlink ACK was received

The most common is "DOWNLINK ACK" but sometimes others appear as well: "CONTROL ACK for PACKET UPLINK ASSIGNMENT" for example.
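
To quantify which poll types time out most often, log lines like the ones above can be tallied with a short script. The following is a minimal Python sketch, not part of osmo-pcu: the function name and the regular expressions are assumptions derived only from the log format quoted in this comment. It also computes how many frames passed between the polled FN and the current FN, assuming the GSM frame number wraps at the hyperframe length of 2715648.

```python
import re
from collections import Counter

# GSM frame numbers wrap at the hyperframe length (2048 * 26 * 51 frames).
GSM_HYPERFRAME = 2048 * 26 * 51

# Patterns derived from the log excerpt in this ticket (an assumption, not an
# official osmo-pcu log grammar).
POLL_RE = re.compile(r"TBF poll timeout for FN=(\d+), TS=(\d+) \(curr FN (\d+)\)")
KIND_RE = re.compile(r"Timeout for polling (.+?)\.?$")


def summarize_poll_timeouts(lines):
    """Return (Counter of timed-out poll kinds, list of (TS, FN delta))."""
    kinds = Counter()
    polls = []
    for line in lines:
        line = line.strip()
        match = POLL_RE.search(line)
        if match:
            fn, ts, curr_fn = (int(g) for g in match.groups())
            # Modulo handles the case where the FN wrapped between poll and timeout.
            polls.append((ts, (curr_fn - fn) % GSM_HYPERFRAME))
            continue
        match = KIND_RE.search(line)
        if match:
            kinds[match.group(1)] += 1
    return kinds, polls


sample = [
    "Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1539512, TS=4 (curr FN 1539525)",
    "Mon Feb 22 15:41:29 2016 <0002> tbf.cpp:557 - Timeout for polling PACKET DOWNLINK ACK.",
    "Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:485 TBF poll timeout for FN=1540010, TS=4 (curr FN 1540023)",
    "Mon Feb 22 15:41:31 2016 <0002> tbf.cpp:557 - Timeout for polling PACKET DOWNLINK ACK.",
]

kinds, polls = summarize_poll_timeouts(sample)
print(kinds)
print(polls)
```

In the excerpt above, every timeout is reported 13 frames after the polled FN, and all of them are on TS=4.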

#3 Updated by zecke over 2 years ago

  • Status changed from New to In Progress

#4 Updated by msuraev over 2 years ago

Testing master branch gave the same results.

While testing with a Telenor modem, I was unable to reproduce the issue with either branch.

#5 Updated by zecke over 2 years ago

On 23 Feb 2016, at 15:42, ms [REDMINE] <> wrote:

Testing master branch gave the same results.

While testing with gxdm/telenor modem I was unable to reproduce issue with either branch.

We know that Qualcomm firmware is quite forgiving. Let's stick with the Mediatek/Acer phone for the tests. Mattias mentioned a Sony Ericsson W995 as being picky as well.

#6 Updated by msuraev over 2 years ago

Attached is the log with extra debug info obtained from osmopcu with acer phone trying to connect.

#7 Updated by msuraev over 2 years ago

Bigger and cleaner log for reference.

#8 Updated by msuraev over 2 years ago

Attaching latest logs.

#9 Updated by msuraev over 2 years ago

Curiously, when changing timeslot configuration in osmo-nitb.cfg from:
timeslot 0
phys_chan_config CCCH+SDCCH4
hopping enabled 0
timeslot 1
phys_chan_config SDCCH8
hopping enabled 0
timeslot 2
phys_chan_config PDCH
hopping enabled 0
timeslot 3
phys_chan_config PDCH
hopping enabled 0
timeslot 4
phys_chan_config PDCH
hopping enabled 0
timeslot 5
phys_chan_config PDCH
hopping enabled 0
timeslot 6
phys_chan_config PDCH
hopping enabled 0
timeslot 7
phys_chan_config TCH/H
hopping enabled 0
to:
timeslot 0
phys_chan_config CCCH+SDCCH4
hopping enabled 0
timeslot 1
phys_chan_config SDCCH8
hopping enabled 0
timeslot 2
phys_chan_config PDCH
hopping enabled 0
timeslot 3
phys_chan_config TCH/F
hopping enabled 0
timeslot 4
phys_chan_config PDCH
hopping enabled 0
timeslot 5
phys_chan_config TCH/F
hopping enabled 0
timeslot 6
phys_chan_config PDCH
hopping enabled 0
timeslot 7
phys_chan_config TCH/H
hopping enabled 0

I have not done any extensive testing over different timeslot configurations, so this could be just a single outlier.
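
The difference between the two configurations is easy to miss when reading them line by line. As an illustration, here is a minimal Python sketch that extracts the `phys_chan_config` of each timeslot and reports which ones changed. The helper names (`render`, `parse_timeslots`, `changed_timeslots`) are hypothetical, and the parser only understands the three-line-per-timeslot layout quoted in this comment, not the full osmo-nitb.cfg syntax.

```python
def render(channels):
    """Rebuild the per-timeslot snippet quoted in this comment from a list of channel types."""
    return "\n".join(
        f"timeslot {ts}\nphys_chan_config {chan}\nhopping enabled 0"
        for ts, chan in enumerate(channels)
    )


def parse_timeslots(cfg_text):
    """Map timeslot number -> phys_chan_config value."""
    result = {}
    current = None
    for raw in cfg_text.splitlines():
        words = raw.split()
        if len(words) == 2 and words[0] == "timeslot":
            current = int(words[1])
        elif len(words) == 2 and words[0] == "phys_chan_config" and current is not None:
            result[current] = words[1]
    return result


def changed_timeslots(before, after):
    """Return {timeslot: (old, new)} for every timeslot whose channel type differs."""
    old, new = parse_timeslots(before), parse_timeslots(after)
    return {
        ts: (old.get(ts), new.get(ts))
        for ts in sorted(set(old) | set(new))
        if old.get(ts) != new.get(ts)
    }


BEFORE = render(["CCCH+SDCCH4", "SDCCH8", "PDCH", "PDCH", "PDCH", "PDCH", "PDCH", "TCH/H"])
AFTER = render(["CCCH+SDCCH4", "SDCCH8", "PDCH", "TCH/F", "PDCH", "TCH/F", "PDCH", "TCH/H"])

changes = changed_timeslots(BEFORE, AFTER)
pdch_after = [ts for ts, chan in parse_timeslots(AFTER).items() if chan == "PDCH"]
print(changes)
print(pdch_after)
```

Only timeslots 3 and 5 change (PDCH to TCH/F), which leaves PDCHs on timeslots 2, 4 and 6, so no two PDCHs are adjacent in the second configuration.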

#10 Updated by msuraev over 2 years ago

The config change above makes the Acer phone work, albeit slowly.

#11 Updated by laforge over 2 years ago

  • Status changed from In Progress to New

#12 Updated by msuraev almost 2 years ago

  • Related to Bug #1795: osmo-bts-trx: fails to assign second lchan on TCH/H TS added

#13 Updated by laforge almost 2 years ago

  • Priority changed from Normal to High

#14 Updated by msuraev over 1 year ago

  • Status changed from New to Stalled

#15 Updated by laforge over 1 year ago

#16 Updated by msuraev over 1 year ago

Tested with dynamic TS - same results as with static config.

#17 Updated by msuraev over 1 year ago

  • Related to Feature #1648: Verify Multi-TRX support for osmo-bts-trx added

#18 Updated by msuraev over 1 year ago

  • Related to Feature #1526: Acquire/update timing advance (TA) added

#19 Updated by msuraev over 1 year ago

Gerrit 2859, 2654, 2673 are merged. This does not fix the issue but should make debugging it easier.

#20 Updated by msuraev 11 months ago

#21 Updated by msuraev 9 months ago

  • Related to Bug #1532: Increased number of poll timeouts on shared PDCHs added

#22 Updated by msuraev 8 months ago

  • Related to Bug #1759: Wrong calculation of DL window size for DL assignment added

#23 Updated by laforge 8 months ago

  • Assignee changed from msuraev to sysmocom

#24 Updated by laforge 6 months ago

  • Assignee changed from sysmocom to lynxis

#25 Updated by matt9j 4 months ago

I am able to reproduce this issue reliably with two MSs in my possession, a Lenovo A319 and a BLU Dash 3.5 (Mediatek MT6572M). I see similar behavior on both my SDR setup, with osmo-pcu (0.5.0.9.3df1), osmo-bts-trx (0.8.1.20.0257) and osmo-trx-uhd (4.9.2; Boost_105500; UHD_003.009.005), and an embedded setup based on a Nuran Litecell 1.5, with osmo-pcu (0.2.0.936-85cf) and osmo-bts-lc15 (0.4.0.566-eb5b7). lynxis, please let me know if there is any prototype testing or debugging I can do to aid your development.

#26 Updated by matt9j 3 months ago

I was also able to test with a newer Mediatek phone, and the issue still persisted. Log attached.

Phone: Gionee P5-Mini
Released: April 2016
Chipset: Mediatek MT6580M

#27 Updated by keith about 2 months ago

I appear to be seeing this with Motorola KRZR K3

https://osmocom.org/issues/3472

#28 Updated by laforge about 2 months ago

On Sat, Aug 18, 2018 at 02:12:30PM +0000, keith [REDMINE] wrote:

I appear to be seeing this with Motorola KRZR K3

Side note: It seems those phones are unfortunately not available on eBay or anywhere,
at least not that I could find. This is sad, as I'd like to add it to our
"phones with known issues" collection :/

#29 Updated by ipse about 2 months ago

I wonder if some of these issues could instead be related to the broken Uplink ACK/NACK bitmaps which we discovered recently. We'll push a workaround (not a real fix yet, unfortunately) and I would appreciate if people with "broken" phones retest.

#30 Updated by keith about 2 months ago

On phones. I have two K3s, so can and will test anything, although that is not the same as having one on the desk to probe.. I could leave one at sysmocom sometime.

There are quite a number of models of phones mentioned in this ticket. As zecke originally says, I've noticed the iPhone5c very stable on this, always sends (doesn't exhibit #2455), always receives. I have a Huawei U8350 about which I can say the same.

Not so with the Asus Zenfone2 and HTC Desire 628. Both give a very frustrating UX: phone-initiated and network-initiated data transfer when idle are both unstable, and the failures are difficult to reproduce. They keep resetting the PDP context too.

Yesterday I briefly searched internetz for a database of what phones have what basebands and other relevant things but other than some sparse info on sites like XDA-developers I did not find such a thing. I wonder does such a thing exist? If we knew, then laforge maybe we can find another phone with same/similar characteristics. There were really a LOT of Motorola RAZR types in the series. The KRZR K3 is hard to find for some reason, but the K1 is plentiful. I wonder how different they are under the skin?

I seem to remember a mention someplace sometime of making our own database of GPRS status with phone models..

On broken ACK/NACK bitmaps and patches: ipse, I searched all tickets for "bitmaps" and only found #1624. Can you make a ticket about what you discovered? Please also ping me when the workaround is there to test.

#31 Updated by matt9j about 2 months ago

ipse I would be happy to test when code is available. Please update this ticket with the desired tests and workaround branch or direct message me and I can run the tests you need and upload results here.

laforge and keith, we have a ton of low-end phones that exhibit this problem here. I can try to send you some if it would be helpful to add them to the sysmocom testbench. The shipping cost from the US would probably be significantly greater than the new market value of the phones themselves, though. From our experience so far, any cheap handset with a Mediatek chipset will likely exhibit this problem. https://www.gsmarena.com/ doesn't have a database searchable by chipset, but it is searchable by phone model and has chipset information for most recently released handsets. I just tried to look up the K3 (from 2007) and it unfortunately doesn't seem to have any chipset info.

#32 Updated by keith about 2 months ago

I was thinking about the title of this ticket..

Not saying that this is not an issue, but a couple of things occurred to me:

1) I was experiencing what looks like this problem (exactly as described in comment https://osmocom.org/issues/1524#note-2) when I only have ONE PDCH configured, so how could the problem be the wrong TS? [ EDIT: I'm not 100% sure that I was seeing "Downlink ACK was received" ]

2) I discovered yesterday that since http://git.osmocom.org/osmo-pcu/commit/?id=9bbe1600cc02e1b538380393edb1dcdabe9247a2 osmo-pcu is sending an invalid Timing Advance of 220 when we page to set up a DL TBF. This is because we don't know the TA if the previous UL TBF is gone. The correct procedure to ascertain the TA in this case needs to be implemented. This totally makes sense and explains why I always saw bursts on the spectrum: the phone was responding to the paging, but the BTS/PCU was not receiving because of the erroneous TA.

I would suppose then that some phones see the Invalid TA of 220 and react with:
"oh that's invalid, I'm going to use 0" and given that we are probably testing within 550m it works.

and others say "oh that's invalid I'm going to use 63" and as we are not 35km away it doesn't work. :-/
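
The 550m and 35km figures follow from the GSM bit period. As a worked illustration, the minimal Python sketch below converts a TA value to an approximate distance, assuming the normal TA range of 0 to 63 and one TA step per bit period (48/13 µs) of round-trip delay. The function name is hypothetical and nothing here is taken from osmo-pcu.

```python
SPEED_OF_LIGHT_M_S = 299_792_458
GSM_BIT_PERIOD_S = 48 / 13 * 1e-6  # one GSM bit period, about 3.69 microseconds
TA_MAX = 63                        # assumed normal (non-extended) TA range is 0..63


def ta_to_distance_m(ta):
    """Approximate one-way distance in metres for a given Timing Advance value."""
    if not 0 <= ta <= TA_MAX:
        raise ValueError(f"TA {ta} is outside the assumed valid range 0..{TA_MAX}")
    # TA compensates the round-trip delay, so the one-way distance is half of it.
    return ta * SPEED_OF_LIGHT_M_S * GSM_BIT_PERIOD_S / 2


step_m = ta_to_distance_m(1)        # roughly 553 m per TA step
max_m = ta_to_distance_m(TA_MAX)    # roughly 34.9 km at TA=63
print(round(step_m), round(max_m))

try:
    ta_to_distance_m(220)
except ValueError as err:
    print(err)
```

Under these assumptions a phone that falls back to TA=0 stays within one step (about 553 m) of the correct value on a lab setup, while a phone that falls back to TA=63 transmits as if it were almost 35 km away.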

BTW, I read this document, which I found highly useful. It's more a critical analysis of GPRS design than a description of how it works, but one can glean a lot of info from reading it, much faster than by reading specs. (The intro is in German + English, the rest of the doc is in English, so don't look at the first page and shy away, non-German speakers.)
https://publik.tuwien.ac.at/files/PubDat_112092.pdf

[ 2nd EDIT: I think this comment should be on #1526 and I have not actually seen anything I can be sure relates to this ticket. ]

#33 Updated by keith about 2 months ago

#34 Updated by keith about 2 months ago

Sorry to be so messy with my comments.. I'm overloading my brain a bit with a personal crash-course on the PCU. :-/
I appear to be seeing something like this now if I use
egprs only
in the pcu config.

#35 Updated by laforge about 2 months ago

I purchased a KRZR K3 for the sysmocom lab. Not sure when somebody will be able to try to reproduce, but at least we should be able to do so now.

#36 Updated by laforge 14 days ago
