Bug #5694

open

BERT testing

Added by laforge about 2 months ago. Updated about 1 month ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
10/01/2022
Due date:
% Done:

20%


Description

Some initial tests with a PrTel93i doing BERT testing on a single B-channel indicate that we have some problems to investigate.

This ticket serves as a log of the various tests/attempts so far.

  • PrTel93i against same PrTel93i doing internal call through Auerswald COMmander2 PBX
    • 0 errors in several 1min and 15min calls. So the PrTel and the wiring are good.
  • PrTel93i against same PrTel93i doing external call through same PBX but going out via icE1usb to OCTOI hub and calling back through the same path:
    • first 1min call was with 0 errors
    • first 15min call was with 6.62 E-2 BER
    • second 15min call was with 5.92 E-2 BER
    • no lost / reordered OCTOI frames observed during that time
  • PrTel93i against bchan_loop from capi-hacks (hfc-usb, mISDN, CAPI20) doing internal call through Auerswald COMmander2 PBX
    • 1min call with 4.25 E-1 BER
    • so clearly there are some problems in the misdn/capi integration
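
To put the BER figures above into perspective, here is a rough back-of-the-envelope sketch. It assumes the BERT ran on a plain 64 kbit/s B-channel for the full call duration; the helper names are made up for illustration and the PrTel may count slightly differently (e.g. excluding sync time).

```python
# One ISDN B-channel carries 64 kbit/s.
B_CHANNEL_BPS = 64_000


def bits_in_call(seconds):
    """Number of bits transferred in one direction during the call."""
    return B_CHANNEL_BPS * seconds


def implied_bit_errors(ber, seconds):
    """Approximate number of bit errors behind a reported BER (hypothetical helper)."""
    return round(ber * bits_in_call(seconds))


# BER values as reported in the description above.
reported = [
    ("first 15min call via OCTOI", 6.62e-2, 15 * 60),
    ("second 15min call via OCTOI", 5.92e-2, 15 * 60),
    ("1min call against bchan_loop", 4.25e-1, 60),
]

for label, ber, seconds in reported:
    print(f"{label}: ~{implied_bit_errors(ber, seconds)} errors "
          f"in {bits_in_call(seconds)} bits")
```

So a 15min call at 6.62 E-2 means millions of wrong bits, not a handful of glitches. A BER of 4.25 E-1 is also not far from the 0.5 one would expect when comparing against unrelated data, which might hint at lost pattern sync rather than sporadic bit errors (that part is a guess).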

Files

hbsldfqjixg.jpg (1.76 MB), manawyrm, 10/26/2022 12:33 PM
qlcdrxgnmvj.jpg (2.28 MB), manawyrm, 10/26/2022 12:33 PM
Actions #1

Updated by laforge about 2 months ago

Some testing of PrTel93i via Auerswald PBX to DAHDI/freeswitch (retronetworking Event setup) and back has shown 15min with zero BER.

Actions #2

Updated by laforge about 2 months ago

  • % Done changed from 0 to 20

laforge wrote in #note-1:

Some testing of PrTel93i via Auerswald PBX to DAHDI/freeswitch (retronetworking Event setup) and back has shown 15min with zero BER.

This was confirmed several times yesterday and today. It still worked after upgrading DAHDI from dahdi 3.1.0 to the dahdi-linux laforge/trunkdev branch. The only time I saw a bit error was when enabling/disabling RXMIRROR/TXMIRROR after the call had started. Starting captures before (and terminating them after) the PrTel call gives 0 BER.

traces of the BER pattern are at https://people.osmocom.org/laforge/retronetworking/prbs/ - we can see that there's a 0x5D pattern for synchronization before the PRBS sequence starts.

According to manawyrm the PRBS sequence matches the prbs11 we have in osmo-prbs/libosmocore.
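
For anyone who wants to compare against the traces, here is a minimal PRBS11 generator sketch. It assumes the polynomial usually associated with PRBS11 (x^11 + x^9 + 1) and an all-ones seed. The seed, bit order within a byte and possible inversion are assumptions and may not match what the PrTel or osmo-prbs actually emit, so treat it as a starting point only.

```python
from itertools import islice

MASK = 0x7FF  # 11-bit shift register


def prbs11_bits(seed=MASK):
    """Yield PRBS11 bits from an LFSR with taps at stages 11 and 9.

    The seed is an assumption; any non-zero 11-bit value yields the same
    sequence at a different phase.
    """
    state = seed & MASK
    if state == 0:
        raise ValueError("an all-zero LFSR state never leaves zero")
    while True:
        # Feedback is stage 11 (bit 10) XOR stage 9 (bit 8).
        bit = ((state >> 10) ^ (state >> 8)) & 1
        state = ((state << 1) | bit) & MASK
        yield bit


# Two full periods: the sequence should repeat after 2**11 - 1 = 2047 bits.
two_periods = list(islice(prbs11_bits(), 2 * 2047))
print("repeats after 2047 bits:", two_periods[:2047] == two_periods[2047:])
print("ones per period:", sum(two_periods[:2047]))
```

Packing these bits into bytes (and skipping the leading 0x5D sync bytes in a capture) is left out on purpose, since the bit order on the wire is exactly the kind of detail that needs checking against the traces.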

Some ideas
  • implement a raw loop type channel in yate so we can do PrTel->yate and back to avoid testing the path twice
  • implement a BER test channel in yate so we can do the same BER testing without a PrTel

I've started to read up, hopefully understand and hack my first yate module implementing source/consumer/channel/driver classes. Put it aside for now as other bits were more important.

Actions #3

Updated by manawyrm about 2 months ago

I offer a PRBS11 BERT at 030 6502 8003.
It's just a 128M .alaw file with pre-generated PRBS sequences.
The file content was validated using an ARGUS 4 tester (which showed no bit errors over the first 15 min of the call).

Playback is done using Yate's internal wave player.

Then I've done some additional testing, such as removing the icE1usb completely from the signal path.
My setup looks a lot like the Nuremberg hub now. Uplink to OCTOI is now being done using trunkdev.
Clocking is still provided via a port on the TE820 from the icE1usb, but no data is passing through the icE1usb.

validated, 0 BER setups:
ARGUS 4 -> Auerswald -> TE820 -> Yate PBX -> TE820 -> Auerswald -> ARGUS 4 (different B-channel)
ARGUS 4 -> Auerswald -> TE820 -> Yate PRBS11 (also when using both B-channels)

non working, high BER setups:
ARGUS 4 -> Auerswald -> TE820 -> Yate PBX -> icE1usb -> OCTOI DIVF -> icE1usb -> Yate PRBS11
ARGUS 4 -> Auerswald -> TE820 -> Yate PBX -> trunkdev -> OCTOI DIVF -> trunkdev -> Yate PRBS11

untested setups:
ARGUS 4 -> Auerswald -> TE820 -> Yate PBX -> icE1usb -> OCTOI DIVF -> icE1usb -> Auerswald -> ARGUS 4 (something about the signalling gets messed up along the way, ARGUS tester doesn't recognize that the incoming call is itself and just rings like a phone)

what have we learned:
icE1usb seems to be innocent, a pure trunkdev setup also exhibits the bit errors.
I believe this testing restricts the possible problem points to just OCTOI/osmo-e1d itself.
This is a rather brave assumption, but we've already seen this behaviour before the move to Nuremberg, so I don't think the new HW or the added VM layer are problematic.
My 0 BER setup also runs inside a VM, so that should hopefully be fine.

I have added some rudimentary debugging (just dumping the raw data into a file) on both the input and output of the RIFO code.
No errors or differences were noticeable (no reordering occurred while this test was running).

My ISDN tester also beeps whenever an error is detected, which is useful for correlating bit errors to osmo-e1d logs.
It doesn't beep when reordering occurs. It does beep when packet loss occurs. This is exactly as expected and a good sign :)

I've also timed the delay between errors (which always seem to come in bundles) and couldn't find any noticeable systematic timing. It seems to be somewhat random.

laforge As you're the author of the e1d/OCTOI code: Can you think of any other places in the e1d code where something like this could occur (which would affect both icE1 and trunkdev)?

Actions #4

Updated by laforge about 2 months ago

Thanks for your testing.

Please note that we (tnt and I) did quite a bit of PRBS testing on all timeslots, without yate/ISDN on top. The code for that is in osmo-e1d/e1-prbs-test. As far as I remember, no PRBS issues were discovered either on real E1 (between 2x icE1usb, or between TE802 and icE1usb) or over OCTOI. At least not at the time. I guess the best way to start is from the bottom up, by re-doing those tests in a variety of configurations. If those tests still work while your ISDN-based single-TS tests fail, it must be something about the yate/dahdi integration.

Actions #5

Updated by laforge about 2 months ago

Maybe it would be worthwhile to

  • add stats exporter code to e1-prbs-test (so we get PRBS-related info into grafana)
  • set up a "prbs" user on divf
  • continuously run e1-prbs-test on the associated trunkdev

This way anyone could connect as the "prbs" user and get PRBS results for all TS on the same timeline as the packet re-order/loss statistics.

Actions #6

Updated by laforge about 1 month ago

manawyrm what your testing didn't rule out is some other problem with DIVF specifically, such as the interrupt misses likely caused by virtualization. All your tests involving osmo-e1d/octoi-protocol also involved the DIVF instance.

Actions #7

Updated by manawyrm about 1 month ago

laforge wrote in #note-6:

manawyrm what your testing didn't rule out is some other problem with DIVF specifically, such as the interrupt misses likely caused by virtualization. All your tests involving osmo-e1d/octoi-protocol also involved the DIVF instance.

Indeed. I'm aware and I want to build a simpler (local) setup next and try to build a DIVF clone and see if the trouble still occurs.
I do have a similar AMD Epyc machine available, I do have the TE820 (well, only 1), but I think I might be able to get something to happen.

Currently away from my setup for another week again...

Actions #8

Updated by laforge about 1 month ago

  • Status changed from New to In Progress

I'd really like to have some kind of device, program or setup, consisting of any combination of software and hardware, with which we can do automatic (scripted) ISDN B-channel BERT. Doing this manually on a PrTel or Argus just doesn't scale.
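
The checking half of such a scripted BERT could look roughly like the sketch below. It is not tied to any real B-channel API: it only takes a list of received bits, seeds an LFSR from the first 11 of them and counts how many of the following bits deviate from the predicted PRBS11 sequence. The polynomial (x^11 + x^9 + 1) and all function names are assumptions for illustration; getting the bits out of CAPI/DAHDI/yate in the right order is the part that is still missing.

```python
from itertools import islice

MASK = 0x7FF  # 11-bit shift register


def next_bit(state):
    """Predicted next PRBS11 bit for a given LFSR state (taps 11 and 9)."""
    return ((state >> 10) ^ (state >> 8)) & 1


def prbs11_bits(seed=MASK):
    """Reference generator, only used here to produce test input."""
    state = seed & MASK
    while True:
        bit = next_bit(state)
        state = ((state << 1) | bit) & MASK
        yield bit


def measure_ber(received_bits):
    """Return (errors, checked_bits, ber) for a received PRBS11 bit stream.

    The first 11 bits are trusted and used to synchronize; an error inside
    them would show up as a very high BER instead of a single error.
    """
    bits = list(received_bits)
    if len(bits) <= 11:
        raise ValueError("need more than 11 bits to synchronize and check")
    state = 0
    for bit in bits[:11]:
        state = ((state << 1) | bit) & MASK
    if state == 0:
        raise ValueError("11 zero bits in a row cannot be PRBS11 data")
    errors = 0
    for bit in bits[11:]:
        expected = next_bit(state)
        if bit != expected:
            errors += 1
        # Advance with the predicted bit so one bad bit counts only once.
        state = ((state << 1) | expected) & MASK
    checked = len(bits) - 11
    return errors, checked, errors / checked


clean = list(islice(prbs11_bits(), 8000))
corrupted = list(clean)
corrupted[4000] ^= 1  # inject a single bit error

print("clean:", measure_ber(clean))
print("corrupted:", measure_ber(corrupted))
```

A real tool would also need to re-synchronize after slips or packet loss, otherwise a single lost byte turns into a BER near 0.5 for the rest of the call.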

Some months ago, I was trying to write some "simple" testing code using libcapi20 (see https://gitea.osmocom.org/retronetworking/capi-hacks). It contains a simple "B channel loop" mode where all data received on a B-channel is echoed back to it. I was trying to use this for BERT testing, but unfortunately it produced a significant BER (E-2/E-3) via an hfc-usb / mISDN / capi20 path.

Today I tried the same using the rcapi protocol of a Bintec_X1200. Unfortunately with the same results: a very high BER (E-2/E-3) is reported. I initially thought this might be due to the lack of TCP_NODELAY in the rcapi module of libcapi20, but even after adding that setsockopt call, I still had the high BER.
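
For reference, this is what the TCP_NODELAY option amounts to at the socket level. The sketch uses Python's socket module purely as an illustration; it is not the libcapi20 rcapi code (which is C), and the helper name is made up.

```python
import socket


def make_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # With TCP_NODELAY set, small writes go out immediately instead of
    # being coalesced while waiting for outstanding ACKs.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_nodelay_socket()
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print("TCP_NODELAY enabled:", nodelay_enabled)
```

This only affects latency and packetization of the TCP stream. It does not change the payload, which fits the observation that the BER stayed high with the option set.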

When making a voice call and speaking, the BER is not audible.

Actions #9

Updated by manawyrm about 1 month ago

OK, I did some more BERT testing.

A slightly different setup, this time with the HFC-S PCI OCTOI setup (https://osmocom.org/projects/octoi/wiki/Trunkdev-S0-Adapter).
HFC-S -> Asterisk -> osmo-e1d (trunkdev) -> Internet (Alfeld, Telekom -> Kiel, Versatel) -> osmo-e1d (trunkdev) -> Yate

Yate is replying with the .alaw PRBS11 file again. Connection was absolutely spotless (see attached 15:00min BERT photos).

My yate is running on a Debian 11 VM, clocked by an icE1usb on Port 1 of a TE820 (which is passed through to the VM).
Host is Debian 11 as well, Intel i5-10400 CPU, qemu-kvm & libvirt for virtualization.
Default kernel parameters on the host.
The VM is using "pcie_aspm=off pcie_port_pm=off" (which was leftover from all the PCIe->PCI bridge testing)

As far as I can tell, the main differences are:
- Intel instead of AMD CPU
- 1x TE820 card instead of 2
- PCIe PM disabled on the guest

(I'm also not seeing any of those lost interrupt errors. I've correlated the timing of the messages in dmesg on DIVF with the errors in the B-channel and they do not match!)

My host machine is using 5.16.0-0.bpo.4-amd64 (from backports, due to the very recent hardware), the guest VM is using 5.10.0-11-amd64.

Some ideas:
- disable the second TE820
- update the host kernel to Debian backports
- disable PCIe ASPM (maybe on both host and guest)

I have access to some AMD machines of the same generation, unfortunately without TE820. Maybe worth a try to replicate the problem there?

Actions #10

Updated by manawyrm about 1 month ago

Just switched the Kiel and Alfeld setups back to the Berlin node at LaF0rge.
Interestingly enough: Same problem.

HFC-S -> Asterisk -> osmo-e1d (trunkdev) -> Internet (Alfeld, Telekom -> Berlin, Vodafone) -> osmo-e1d (trunkdev) -> Yate -> osmo-e1d (physical icE1usb) -> osmo-e1d (trunkdev) -> Yate
shows a high bit error rate again. So the problem even occurs on the Berlin machine. Similar CPU, similar Yate config, only 1 TE820.

Interesting...
