Project

General

Profile

Actions

Bug #5944

open

DTXu + AMR on TCH/H appears to be not usable on osmo-bts-sysmo

Added by keith about 1 year ago. Updated about 1 year ago.

Status:
In Progress
Priority:
Low
Assignee:
Category:
osmo-bts-sysmo
Target version:
-
Start date:
03/11/2023
Due date:
% Done:

0%

Spec Reference:

Description

The problem is that conversation is uncomfortable bordering on impossible due to random lost words and cut off of initial speech after silence periods.

I've been experimenting sending the AMR RTP stream into decoders or SIP endpoints, (all using libopencore-amr) and also, without using any external MNCC, thereby only having the the audio stream pass through osmo-mgw before going to a B-leg MS. I observe the problem in all cases.

I've put quite some hours into this so far, without being able to achieve any improvements. Rather than leave it all fade from memory, I'll just make a quick note of the main points of what was observed:

It all seems fine as long as you have sequences of speech followed by SID_FIRST_P1, SID_FIRST_P2, then eventually an ONSET, followed by speech frames... rinse and repeat. This is fine.

When it goes wrong is if you have a situation where (because of the timing of the VAD) you get either a SID_FIRST_INH or a SID_UPDATE_INH, in these cases, what I've observed is that after the SID_UPDATE_INH, the L1 does not seem to be sending us any Speech frames, (just TCH/H with no payload) even though I am speaking. Eventually, there will be an ONSET followed by speech frames.

I may be wrong, but I don't think there's anything that osmo-bts is doing or can do about this, so I'm pretty convinced at this point that is is an L1 problem, and therefore will not (cannot?) be fixed. I'm aware that there is code in libosmocore to deal with all this weird interleaving and such that goes on in AMR DTX, but we don't use any of it in osmo-bts-sysmo.

The pcap comes from an osmo-bts with some modifications to logging and RTP marking, (ignore the BAD AMR frames following ONSET - that's a hack but not relevant) the main point would be to observe what is happening around and after packet no 1482 - Note all that PH-DATA.ind TCH/H with no payload. I am speaking then.


Files

filtered-see_SID_FIRST_P2_1482.pcapng filtered-see_SID_FIRST_P2_1482.pcapng 748 KB keith, 03/11/2023 05:09 AM
amr-sid.png View amr-sid.png 484 KB laforge, 03/27/2023 11:03 AM
sysmobts-amr-hr-dtx-working.pcap sysmobts-amr-hr-dtx-working.pcap 18.7 MB laforge, 03/27/2023 11:57 PM
sysmobts-amr-hr-dtx-some_dropouts_counting.pcap sysmobts-amr-hr-dtx-some_dropouts_counting.pcap 40.1 MB laforge, 03/27/2023 11:58 PM
iphone5.pcapng iphone5.pcapng 3.92 MB keith, 03/28/2023 01:51 AM
Actions #1

Updated by laforge about 1 year ago

Some Random Ideas (not saying Keith should do all of that):
  • look at the dsp trace log (Play with the flags) to see if there's anything useful in it
  • capture what's actually happening on the air interface using osmocomBB fixeria/burst_ind for detailed analysis.
Actions #2

Updated by fixeria about 1 year ago

  • Category set to osmo-bts-sysmo
Actions #3

Updated by falconia about 1 year ago

Question: was this problem seen at CCC? I seem to recall hearing that the GSM network at CCC was run with sysmoBTS units, and I definitely remember reading that they used the AMR codec exclusively, not supporting FR1 or EFR. Did they perhaps disable DTXu, or did the combo of sysmoBTS+AMR+DTXu actually work for other people?

Actions #4

Updated by keith about 1 year ago

I just checked the congress config.

DTXu, unsurprisingly, was not enabled.

Actions #5

Updated by keith about 1 year ago

laforge wrote in #note-1:

Some Random Ideas (not saying Keith should do all of that):
  • look at the dsp trace log (Play with the flags) to see if there's anything useful in it
  • capture what's actually happening on the air interface using osmocomBB fixeria/burst_ind for detailed analysis.

I did play around with the dsp trace flags, including piping that output into pcaps to correlate with gsmtap/osmo-bts-sysmo logging.
So far that did not reveal anything extra about what was happening.

So capturing the uplink off the air would reveal more. I'm not sure I have the hardware to do that. I haven't looked at it in a long time.

Assuming, for the moment, (giving everything I tried with hacking on osmo-bts) the problem is in the L1 and therefore might be a cannot_fix issue, sniffing the UL might be a nice investigation project for somebody with a lot of free time.

Actions #6

Updated by laforge about 1 year ago

AFAIR, there are several modes in which the L1 can be operated. We're using a mode where we get more or less the RTP payload, as that's rather convenient. So I wouldn't give up on not being able to fis this at all.

Getting air interface traces via osmocom-bb tnt/burst_ind is rather easy, I've just done it very recently for getting CSD call traces. Still not saying that you (Keith) should do that.

Actions #7

Updated by laforge about 1 year ago

Really a generic AMR or only a TCH/H only problem?

First of all, can you please clarify whether the problem only appers on TCH/H?

So far, this bug subject or title doesn't constrain it to any channel type, so it clams all AMR is affected when DTXu is in place.

However, the SID_*_INH types are specific to TCH/H. So if there's a problem related to those, it should not exist on TCH/F. Please clarify.

Suspicious treatment of AMR seq during working DTX with no inhibit

Looking at a snippet of your pcap file:

  • we can see from packet 514 onwards, that every PH-DATA.ind generates one AMR/RTP frame.
    • FN is +4 / +5, as expected, equalling 18 .. 22.5ms interval
    • sequence number monotonically increasing
    • timestamps incrementing 160 ticks (= 20ms)
  • packet 526..528 (FN=256407) show the SID_FIRST_P1 frame
    • AMR: seq + timestamp to previous RTP frame as expected (+1 / +160)
  • packet 529..530 (FN=256412, +5) show the SID_FIRST_P2 frame
    • no AMR is generated
  • packet 531 (FN=256416, +4)
    • empty payload reported despite excellent RSSI and quality
    • no AMR is generated
  • packet 532..534 (FN=256420, +4) shows a SID frame
    • AMR seq +1 / timestamp + 480 (= 60ms). Matches the ~59 ms for FN difference of 13
  • packet 535 (FN=256416, +4)
    • empty payload reported despite excellent RSSI and quality
  • packet 536..540
    • empty payload reported, RSSI -110 dBm, so the MS really didn't transmit
  • packet 541
    • empty payload reported despite excellent RSSI and quality
  • packet 542..544 (FN=256455) shows a SID frame
    • AMR seq + 4, timestamp + 1280 (= 160ms). Matches the ~158ms for FN difference of 35
    • why is AMR seq +4? There were 7 blocks in between (FN 256425, 256429, 256433, 256438, 256442, 256446, 256451) which we didn't generate any AMR for, i.e. which were dropped.
Actions #8

Updated by laforge about 1 year ago

I really cannot explain some parts of the trace. I cannot reconcile how our code would behave that way. Do you have some local modifications to the AMR DTX logic in place?

Fore example, packet 828..830 seem to indicate that an ONSET received from the PHY would generate a RTP packet. Based on the osmo-bts code in master, I fail to see how that could ever happen. The ONSET would not set any rmsg, and hence add_l1sap_header() should never be called.

Actions #10

Updated by laforge about 1 year ago

Another oddity:

1481    0    22:52:02.595433279    GSMTAP        258366/194/04/00/10 Rx PH-DATA.ind TCH/H (hL2 000200bb): 09 04 00 50 44 00 00 00 00 0a , Meas: RSSI -43.97 dBm, Qual 20.41 dB, BER 0.00, Timing -3 
1482    0    22:52:02.595491257    GSMTAP        258366/194/04/00/10 DTX: received SID_FIRST_P1 from L1 (9 bytes) 
1483    0    22:52:02.595612741    AMR        PT=AMR, SSRC=0x4178B092, Seq=32959, Time=2092022532
So what we see here is that
  • SID_FIRST_P1 is signaled as Amr_SidFirstP1 (instead of expected Amr)
  • 9 bytes of payload 09 04 00 50 44 00 00 00 00 0a
    • 09: Amr_SidFirstP1
    • 04: CMI
    • 00: CMR
    • 50 44 00 00 00 00 0a is actually the RTP payload the L1 sends us
  • our code ignores this and instead calls osmo_amr_rtp_enc(sid_first, 0, AMR_SID, AMR_GOOD) which appears to render @50 44 00 00 00 without the 00 0a suffix. This seems an artefact of first merging c3fb0dcc8cd01a84942d06267003478b972feadb and then c77b574d1fbe43ca19db0e5e041b3b5e2a71b856 by msuraev in 2016. While P2 indeed has no payload, P1 does!
    • Why is the RTP payload from L1 different than our osmo_amr_rtp_enc?
  • our transmitted RTP payload is (after RTP header) is 50 44 00 00 00
    • this is wrong. There is no CMR and no TOC bytes preceding. The total packet size should be 7 bytes: CMR (1) + TOC (1) + 39bits-SID+padbits (5).
Actions #11

Updated by keith about 1 year ago

laforge wrote in #note-8:

I really cannot explain some parts of the trace. I cannot reconcile how our code would behave that way. Do you have some local modifications to the AMR DTX logic in place?

Yes!, As I wrote in the description: "The pcap comes from an osmo-bts with some modifications to logging and RTP marking, (ignore the BAD AMR frames following ONSET"

I should have been more explicit, or include a link to the actual source used. I've since been hacking on that without making any commits so I don't actualy have whatever was used at the time, but as I said in the description also, I didn't think that this RTP packet you are seeing generated as a result of the ONSET is relevant to the issue.

Fore example, packet 828..830 seem to indicate that an ONSET received from the PHY would generate a RTP packet. Based on the osmo-bts code in master, I fail to see how that could ever happen. The ONSET would not set any rmsg, and hence add_l1sap_header() should never be called.

Correct, I was playing around with many things at the time, trying to understand and eliminate the cause of so much "RTP clock out of sync with lower layer"

What I was doing was counting from 0 to 10, along with the "LSB" of wall clock seconds. I try to maintain silence between the numbers and I'm listening to the B-leg and I note which numbers I did not hear. This helps to look at the trace and identify where I need to look. This one worked quite well, but I happened to be running code at the time that was sending this RTP packet you identify.

Now you might say, oh well this is invalid if you are sending RTP packets that osmo-bts master would never send. You'll have to believe me, it makes NO difference whatsoever.

At the time I made this pcap, I had already spent about 3 days!! (granted, of my rather slow process of analysis and understanding) looking at the osmo-bts code related to RX from L1 and generation of RTP. So at this point I am 99.9999% convinced that it has nothing to do with that, what I wanted to demonstrate here is that I am speaking, but the L1 is not sending any speech data. This is why I wrote "that's a hack but not relevant".

I'm sorry if this is confusing or has caused you to waste valuable debugging time, but I think I am quite explicit in the description that this is not osmo-bts master

Actions #12

Updated by laforge about 1 year ago

laforge wrote in #note-10:

  • this is wrong. There is no CMR and no TOC bytes preceding. The total packet size should be 7 bytes: CMR (1) + TOC (1) + 39bits-SID+padbits (5).

Untested fix in https://gerrit.osmocom.org/c/osmo-bts/+/32088

Actions #13

Updated by keith about 1 year ago

laforge wrote in #note-7:

Really a generic AMR or only a TCH/H only problem?

First of all, can you please clarify whether the problem only appers on TCH/H?

I have clarified, that the problem only exists on TCH/H.

Using TCH/F, I did various (listening) tests of MS -> MS calls and also MS -> Sip Endpoint (with libopencore-amrnb)

Audio is clear and complete.

Actions #14

Updated by keith about 1 year ago

  • Subject changed from DTXu + AMR appears to be not usable on osmo-bts-sysmo to DTXu + AMR on TCH/H appears to be not usable on osmo-bts-sysmo
Actions #15

Updated by laforge about 1 year ago

I've also done some tests here in a local setup:
  • osmo-bts-sysmo + osmo-pcu from current 201705-nightly on sysmoBTS 1002 @ 192.168.100.233
  • osmo-bsc (current mater, built from source) + osmo-mgw on my laptop (192.168.100.149)
  • osmo-msc,hlr,stp,... on 192.168.11.4
  • 2x MS (Motorola C1xx with Calypos chipset, stock firmware)

I'm attaching two pcap files containing GSMTAP from BTS+BSC, Abis, OML, RSL. My results were not very conclusive. The first attempt (sysmobts-amr-hr-dtx-working.pcap) sounded fine to me. The second attempt, I was counting up from 1 to 32 (or so), with long breaks between the words. About 5 of those words spoken on the caller side didn't make it to the called party. See sysmobts-amr-hr-dtx-some_dropouts_counting.pcap

One thing I noticed looking at the dsp trace:

TchAhs_DecodeSpeech() - MS CMI (2) is not in ACS [3,0,0,0], using init values (InitCodecMode = 0, rxCodec = 3)
TchAhs_DecodeSpeech() - MS CMI (2) is not in ACS [3,0,0,0], using init values (InitCodecMode = 0, rxCodec = 3)

occurred virtually at every word after a period of silence. So the MS is sending CMI=2 (5.9kbps) but the active codec set doesn't contain that? This looks rather odd and deserves further analysis.

Actions #16

Updated by laforge about 1 year ago

laforge wrote in #note-15:

occurred virtually at every word after a period of silence. So the MS is sending CMI=2 (5.9kbps) but the active codec set doesn't contain that? This looks rather odd and deserves further analysis.

The MultiRateConfig IE in the RSL CHAN ACT looks fine, as per wireshark:
  • ICMI: implicit
  • Start Mode: 0
  • 5.90k is the only mode part of the subset

The same is also sent to the MS in the RR ASSIGNMENT COMMAND. So if there's any miscommunication/misinterpretation, it appears to be between osmo-bts and the DSP.

Actions #17

Updated by keith about 1 year ago

Here's a pcap of an iPhone 5c, in general, this phone is working so much better than anything else I have tried, i very rarely have any lost or clipped words.

(quick hint: "!gsmtap_log.string == "\n" && !gsmtap_log.string contains "[RACH]____[ACCESS TYPE" is a good base filter to remove that noise.)

This is call (the b-leg is a sip endpoint), answer, then mute the MS microphone and then unmute, hangup.

It is interesting to note (gsmtap_log.string contains "DTX:") that there is not one single ONSET. The iPhone seems to send pure SID_FIRST_INH all the time. (actually I later saw other behaviour, but only after the mic was muted for quite some time)

The CMI not in ActiveCodecSet issue (gsmtap_log.string contains "not in ACS") gets even more curious here, as I also see:

MS CMI (3) is not in ACS [3,6,0,0], using init values (InitCodecMode = 0, rxCodec = 3) 

3 not in 3? wtf?

also, seems the L1 counts codecs from 1, we count from 0 (my CHAN ACT with Multirate config is in the pcap.

AND, to be CLEAR: This is what I was running: https://cgit.osmocom.org/osmo-bts/?h=keith%2Fdtx-hacking

+ two lines of extra silly logging from libosmo-abis when I was trying to get a handle on the code flow
+ I'm using l1fwd-proxy and osmo-bts-sysmo-remote on x86

Actions #18

Updated by keith about 1 year ago

laforge wrote in #note-15:

I was counting up from 1 to 32 (or so), with long breaks between the words. About 5 of those words spoken on the caller side didn't make it to the called party.

5 out of ~30, sounds similar to my experience.

Actions #19

Updated by laforge about 1 year ago

regarding the CMI not in ACS messages, I'm now dumping how we configure the DSP/PHY during MPH-ACTIVATE.req:

  • amrCmiPhase=1 (GsmL1_AmrCmiPhase_Odd)
  • amrInitCodec=0 (GsmL1_AmrCodecMode_1; first mode of the modeset)
  • amrActriveCodecSet={3,0,0,0} (GsmL1_AmrCodec_5_9, GsmL1_AmrCodec_Unset, GsmL1_AmrCodec_Unset, GsmL1_AmrCodec_Unset, GsmL1_AmrCodec_Unset)

This looks all good, and as expected. Not sure why we get those strange log messages from the DSP.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)