Support #4365

Create testsuite to test osmo-bts-trx+osmo-trx under high channel load

Added by pespin 9 months ago. Updated about 2 months ago.

Status: Stalled
Priority: Normal
Assignee:
Category: LimeSDR
Target version: -
Start date: 01/15/2020
Due date:
% Done: 50%
Spec Reference:

Description

The final aim of this task is to check whether we run into CPU limitations on the Raspberry Pi CM3 of the LimeNet-micro when maximizing the channel load of even a single TRX.

For example, 14 concurrent TCH/H channels carrying AMR on the 7 timeslots (two TCH/H sub-slots per timeslot). It would be very interesting to see whether that works, and whether any CPU margin is left.

Quick way to test manually: use osmo-bsc connected to osmo-bts-trx and the VTY command "bts <0-255> trx <0-255> timeslot <0-7> sub-slot <0-7> (activate|deactivate) (hr|fr|efr|amr) [<0-7>]".
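As a sketch of the manual approach, the following Python snippet prints the 14 activation commands for the layout above. It assumes BTS 0, TRX 0 and timeslots 1..7 configured as TCH/H (TS0 left for BCCH/CCCH), and it omits the optional trailing [<0-7>] argument. The helper name vty_activate_cmds is hypothetical; the output is meant to be pasted into the osmo-bsc VTY.

```python
# Hypothetical helper: build osmo-bsc VTY commands for both TCH/H sub-slots
# of timeslots 1..7 (assumption: TS0 carries BCCH/CCCH, bts 0 / trx 0).
def vty_activate_cmds(bts=0, trx=0, timeslots=range(1, 8), mode="amr",
                      action="activate"):
    cmds = []
    for ts in timeslots:
        for ss in (0, 1):  # a TCH/H timeslot carries two sub-slots
            cmds.append(f"bts {bts} trx {trx} timeslot {ts} sub-slot {ss} "
                        f"{action} {mode}")
    return cmds


if __name__ == "__main__":
    for cmd in vty_activate_cmds():
        print(cmd)
```

Calling vty_activate_cmds(action="deactivate") produces the matching teardown commands.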

"Automatic" testing: Use TTCN3, add test to BTS_Tests.ttcn:
Since we already have the RTP source and sink, you could even send some RTP messages with random payload if you want; they just need to match in size and in RTP payload type. We don't care about the RTP content at all here.
Make sure the config/VTY is set up for 7 timeslots as TCH/H with an unlimited radio link timeout, then send the 14 CHAN ACT messages with the right channel mode (plus the IPA CRCX/MDCX if you want to add RTP).
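A minimal sketch of such a random-payload RTP packet, assuming only the RFC 3550 fixed header (12 bytes, no CSRCs, no header extension). The helper name is hypothetical, and payload type and length are parameters because they must match whatever the BTS was told in the CRCX/MDCX exchange.

```python
import os
import struct


def build_rtp_packet(seq, timestamp, ssrc, payload_type, payload_len):
    """Build an RTP packet (RFC 3550 fixed header) with a random payload."""
    if not 0 <= payload_type <= 127:
        raise ValueError("RTP payload type is a 7-bit field")
    byte0 = 2 << 6        # version 2, no padding, no extension, CSRC count 0
    byte1 = payload_type  # marker bit cleared
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    # Content is irrelevant for the load test; only size and type matter.
    return header + os.urandom(payload_len)
```

For example, build_rtp_packet(seq=1, timestamp=160, ssrc=0x1234, payload_type=98, payload_len=15), where payload type 98 is only a placeholder for the dynamic type actually negotiated. For 20 ms speech frames at the 8 kHz RTP clock, the timestamp advances by 160 per packet.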


Related issues

Related to OsmoTRX - Bug #4366: Create testsuite to test osmo-bts-trx+osmo-trx under high channel load (w/ RACH correlation) (New, 01/15/2020)

History

#1 Updated by pespin 9 months ago

  • Related to Bug #4366: Create testsuite to test osmo-bts-trx+osmo-trx under high channel load (w/ RACH correlation) added

#2 Updated by pespin 9 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 50

I started some work in the "pespin/bts-perf" branch of osmo-ttcn3-hacks.git and docker-playground.git.

https://git.osmocom.org/osmo-ttcn3-hacks/commit/?h=pespin/bts-perf
https://git.osmocom.org/docker-playground/commit/?h=pespin/bts-perf

There I have a passing test that uses an osmo-bsc.cfg with timeslots 1..7 set to TCH/H; the TC_pespin test activates AMR speech on all of them in the BTS, waits 10 seconds, then deactivates them.

The next step is to run the test against osmo-bts-trx+osmo-trx-lms on my own laptop to see if there are any issues. Later, I'll run it against the same setup running on the LimeNet-micro.

#3 Updated by ipse 9 months ago

Note that in normal operation, osmo-bts-trx also consumes a considerable amount of CPU: not as much as osmo-trx, but still quite noticeable. However, this only happens when osmo-trx produces bursts for osmo-bts-trx to process. If you just enable channels, osmo-trx will produce idle bursts (or no bursts at all, depending on the version and configuration), which means osmo-bts-trx won't have anything to process.

Also note that osmo-bts-trx CPU usage might depend on the uplink signal quality, because it involves Viterbi decoding, which iterates over the received signal. I personally haven't tested this, but that's the theory.
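To illustrate where the decoding cost comes from, here is a textbook hard-decision Viterbi decoder for the GSM rate-1/2, constraint-length-5 convolutional code (G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4, per GSM 05.03). It is a simplified sketch, not the optimized decoder osmo-bts-trx uses (which operates on soft bits); it only shows that every input bit requires updating all 16 trellis states, and it does not settle whether the cost varies with uplink quality.

```python
# Simplified hard-decision sketch of the GSM 05.03 rate-1/2, K = 5 code:
#   c(2k)   = u(k) + u(k-3) + u(k-4)          (G0 = 1 + D^3 + D^4)
#   c(2k+1) = u(k) + u(k-1) + u(k-3) + u(k-4) (G1 = 1 + D + D^3 + D^4)
G0 = (1, 0, 0, 1, 1)  # tap i multiplies u(k-i)
G1 = (1, 1, 0, 1, 1)
K = 5
N_STATES = 1 << (K - 1)  # 16 trellis states


def _outputs(bit, state):
    """Coded bit pair for input `bit`; bit j of `state` holds u(k-1-j)."""
    window = [bit] + [(state >> j) & 1 for j in range(K - 1)]
    c0 = sum(w & g for w, g in zip(window, G0)) & 1
    c1 = sum(w & g for w, g in zip(window, G1)) & 1
    return c0, c1


def _next_state(bit, state):
    """Shift the new input bit into the 4-bit encoder memory."""
    return ((state << 1) | bit) & (N_STATES - 1)


def encode(bits):
    """Encode `bits`; append K-1 zero tail bits first to terminate the trellis."""
    state, out = 0, []
    for b in bits:
        out.extend(_outputs(b, state))
        state = _next_state(b, state)
    return out


def viterbi_decode(coded):
    """Per input bit, all 16 states x 2 branches are evaluated."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)  # the encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for i in range(0, len(coded) - 1, 2):
        r0, r1 = coded[i], coded[i + 1]
        new_metrics = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue  # state not reachable yet
            for bit in (0, 1):
                c0, c1 = _outputs(bit, state)
                metric = metrics[state] + (c0 != r0) + (c1 != r1)  # Hamming
                nxt = _next_state(bit, state)
                if metric < new_metrics[nxt]:
                    new_metrics[nxt] = metric
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths
    return paths[0]  # zero tail bits drive the encoder back to state 0
```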

#4 Updated by pespin 9 months ago

I have a TTCN3 test opening 14 TCH/H channels from the BSC side towards osmo-bts-trx -> LimeNet-micro, all orchestrated with osmo-gsm-tester, and it seems to be working fine.
On the LimeNet-micro, top shows a load average of 0.77, 0.78, 0.51.
That's with osmo-bts-trx running outside of the LimeNet-micro and without RTP processing, though, but it looks like there's enough headroom to keep it going anyway.
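Since the CM3's BCM2837 has four cores, a 1-minute load average of 0.77 corresponds to roughly 0.19 runnable tasks per core. A small sketch for reading it the same way on the target; the helper names are hypothetical, and load average only approximates CPU utilization (on Linux it also counts tasks in uninterruptible sleep).

```python
import os


def per_core_load(load_avg, cores):
    """Normalize a load average by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def current_per_core_load():
    """Return the 1/5/15-minute load averages divided by the core count.

    os.getloadavg() is Unix-only and returns the same three values top shows."""
    cores = os.cpu_count() or 1
    return tuple(per_core_load(avg, cores) for avg in os.getloadavg())
```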

On the osmo-gsm-tester prod main unit, as jenkins in ~/test/:

./run_locally.sh -s ttcn3_bts_tests:trx-lms-limenet-micro -l dbg -T -t highchanload_tchh

It worked fine for a while (20 mins) with no overruns or similar, while I monitored it through VTY, top, etc.
I then tried running stress-ng on the RPi CM3 while osmo-trx was running, and osmo-bts-trx quickly dropped the connection, probably due to CLOCK IND arriving too late, despite osmo-trx-lms running with rt-prio 18. I'd say that's expected, though, because I put the whole system under extreme stress that shouldn't happen in normal operation. I'll try to run with more realistic stress next time. This is what I used so far:

stress-ng --vm 4 --hdd 4 --cpu 4 --open 4 --sem 4 --sock 4 --io 2 --timer 2

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/17098 bts: Introduce new module for performance tests

Next steps:
  • Support running osmo-bts-trx remotely on the RPi CM3 module from osmo-gsm-tester (same as we do for osmo-trx-lms: run the packaged binary directly).
  • Add an RTP source/sink to the TTCN3 code to make sure some data is being sent, so that osmo-bts-trx does some work (first in the docker setup on branch pespin/bts-perf, then on real HW with osmo-gsm-tester).

The problem is that, AFAIU, we don't have an easy way to feed the uplink in this case, only the downlink.

#5 Updated by dexter 8 months ago

I now have the testcase BTS_Tests_perf.TC_highchanload_tchh running on my side. I am currently testing with faketrx and trxcon. I have implemented sending of the IPA CRCX, and now I get the IP/port back from the BTS. I wonder whether sending RTP packets is possible immediately or whether I have to go through the complete procedure.

#6 Updated by dexter 8 months ago

I have now added the RTP emulation and I can send RTP packets to the BTS. It still fails somewhere, but I can already see the packets in Wireshark.

#7 Updated by dexter 8 months ago

Sending RTP packets to all channels of the BTS works now. Next I will try to receive RTP packets. My first test sending an MDCX to the BTS looked promising: I do get packets from the BTS, but it still needs some more integration.

#8 Updated by dexter 8 months ago

  • % Done changed from 50 to 60

The RTP side of the BTS test is now done. We now have RTP streams in both directions; the next step is to also handle the traffic on the Um side.

See also: https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/17285 bts: Add RTP payload testing to performance tests

#9 Updated by dexter 8 months ago

I am currently working on the Um side. Unfortunately, I am having problems with fake_trx.py. I can clearly see that frames from the BTS are arriving inside faketrx, but they get stuck in burst_fwd.py. From what I can see, the reason is that in the method forward_msg(), trx == src_trx is true. I do not understand what this is about. There are two TRX, one for the MS and one for the BTS. Shouldn't there be only one TRX for the BTS, or is the TRX for the MS the interface to trxcon?

#10 Updated by fixeria 8 months ago

There are two TRX, one for the MS and one for the BTS. Shouldn't there be only one TRX for the BTS, or is the TRX for the MS the interface to trxcon?

Yes, one Transceiver is for the MS (trxcon), another is for the BTS (osmo-bts-trx). Each Transceiver has its own TRX interface (TRXC and TRXD sockets).

From what I can see, the reason is that in the method forward_msg(), trx == src_trx is true. I do not understand what this is about.

This is rather a cosmetic check. We're not supposed to deliver a burst back to the Transceiver that originated it, right? You could even drop the check and it would still work as expected, because (given that trx == src_trx) trx.rx_freq != trx.tx_freq.
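A simplified model of the forwarding decision described above, assuming each Transceiver only exposes rx_freq/tx_freq. The class, field names and frequency values are hypothetical and not taken from burst_fwd.py. It shows why the identity check is cosmetic: a Transceiver never receives on its own TX frequency anyway.

```python
from dataclasses import dataclass


@dataclass
class Transceiver:
    name: str
    rx_freq: int  # illustrative values in kHz
    tx_freq: int


def forward_targets(src, transceivers):
    """Return the transceivers that should receive a burst sent by `src`."""
    targets = []
    for trx in transceivers:
        if trx is src:
            continue  # cosmetic: never echo a burst back to its origin
        if trx.rx_freq != src.tx_freq:
            continue  # not listening on the frequency the burst was sent on
        targets.append(trx)
    return targets
```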

#11 Updated by fixeria 8 months ago

You basically need to make sure that trxcon expects to receive bursts on TCH timeslots. If no logical channels are activated, then no SETSLOT command is sent to fake_trx.py, so the corresponding timeslot(s) are not enabled and the bursts are dropped. Your test case should send L1CTL_EST_REQ for each logical channel (i.e. 14 times). I guess that's the problem.

P.S. There may be some troubles with activating multiple logical channels in trxcon. In particular, it deactivates all other logical channels on receipt of L1CTL_EST_REQ. This little hack should make it work: https://git.osmocom.org/osmocom-bb/commit/?h=fixeria/gprs&id=3d2bd1fdcbfb4864e30f35705321ecd57c8fbb0b.
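A hypothetical model of the MS-side behaviour described in this comment, assuming only that a timeslot becomes eligible once SETSLOT has been sent for it (the real SETSLOT also carries a channel combination, which is ignored here). Bursts arriving on any other timeslot are simply dropped; the class and method names are made up for illustration.

```python
class SlotFilter:
    """Hypothetical per-timeslot burst acceptance on the MS side."""

    def __init__(self):
        self.enabled = set()  # timeslots configured via SETSLOT

    def setslot(self, tn, enable=True):
        if not 0 <= tn <= 7:
            raise ValueError("timeslot number must be 0..7")
        if enable:
            self.enabled.add(tn)
        else:
            self.enabled.discard(tn)

    def accept(self, tn):
        """Bursts on timeslots that were never enabled are dropped."""
        return tn in self.enabled
```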

#12 Updated by pespin 8 months ago

  • Assignee changed from pespin to dexter

Assigning to dexter since he's taking care of the RTP part. Please re-assign it to me once you have the docker setup working. I can then make sure it works fine in the osmo-gsm-tester setup running against the LimeNET-micro.

#13 Updated by dexter 7 months ago

I have done some experiments today. It looks like the problems with TCH data not being forwarded correctly inside faketrx are gone now. I can now follow the TCH data from the BTS up to l1ctl_link.c. I can even see frames arriving at the TTCN3 level, but with limitations: it looks like I only receive FACCH so far, but no TCH data. The reason is that the TCH data gets stuck in trxcon because the decoding (CRC8) of the frames fails. I also had trouble with a wrong TSC, but the errors do not vanish after correcting that either. I need to investigate what's wrong here.

A possible cause might be that I currently only activate the channel but do not do any assignment. Maybe that causes the BTS to send dummy or invalid frames?

#14 Updated by dexter 7 months ago

I have now found the cause of the RTP payload not decoding. The problem was that the RTP emulation sends a 4-byte RTP payload by default. This is of course a made-up payload and has nothing to do with a GSM V1 voice RTP packet. The BTS accepts those packets, but it discards the payload deeper in the stack, in scheduler_trx.c:tx_tch_common(), where the voice payload is expected to be 15 bytes. I have now reconfigured the RTP emulation to send a 15-byte payload, and trxcon decodes the voice payload it gets from the BTS just fine (CRC matches).
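The failure mode boils down to a length check. A minimal model of it, assuming only the 15-byte expectation reported in this update; the function and constant names are hypothetical and this is not the scheduler_trx.c code.

```python
# Assumption: expected voice payload length as reported for tx_tch_common().
EXPECTED_TCH_PAYLOAD_LEN = 15


def accept_tch_payload(payload, expected_len=EXPECTED_TCH_PAYLOAD_LEN):
    """Payloads of the wrong length are accepted over RTP but never reach
    channel coding, so nothing decodable goes out on the air."""
    return len(payload) == expected_len
```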

Next step is to make sure the payload is received on the TTCN3 side and then we will loop it back.

#15 Updated by dexter 7 months ago

  • % Done changed from 60 to 70

On the TTCN3 level we now receive the traffic payload. An altstep receives the incoming traffic indications and sends them back as traffic requests to trxcon. This now works for a single TCH/H channel. When I increase the channel count to the maximum possible of 14 channels, I get problems on the RTP side:

18:41:04.621682 16 RTP_Emulation.ttcn:531 setverdict(fail): none -> fail reason: "Connection refused (unexpected)", new component reason: "Connection refused (unexpected)" 

I wonder what that could be, as I did not see this before the TRX side was handled.
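The loopback done by the altstep in this update can be sketched in Python with two queues standing in for the L1CTL traffic indication and request paths. The function name and the (channel, payload) tuple layout are assumptions for illustration only.

```python
import queue


def loopback_once(traffic_ind, traffic_req, timeout=0.1):
    """Take one traffic indication and send it back unchanged as a request.

    Returns the looped-back (channel, payload) tuple, or None on timeout."""
    try:
        chan, payload = traffic_ind.get(timeout=timeout)
    except queue.Empty:
        return None
    traffic_req.put((chan, payload))
    return chan, payload
```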

#16 Updated by dexter 6 months ago

I have implemented some checks that verify that the actual payload we pass around remains intact. For the first channel this is indeed the case, but for all subsequent channels the payload that comes back is all zeros. It looks as if none of the subsequent channels receive data from trxcon. All payload passed on the L1CTL interface looks good; however, I still need to make sure that I receive payload for the subsequent channels on L1CTL at all, since the chain may be interrupted even earlier. I also made sure that https://git.osmocom.org/osmocom-bb/commit/?h=fixeria/gprs&id=3d2bd1fdcbfb4864e30f35705321ecd57c8fbb0b is in place, so that cannot be the problem.
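A sketch of such a per-channel integrity check, assuming payloads are collected per channel in send order without loss, so that the received list should be a prefix of the sent list. The function name and the (timeslot, sub-slot) channel keys are hypothetical.

```python
def check_channel_payloads(sent, received):
    """Compare what was sent per channel with what came back.

    `sent` and `received` map a channel key to a list of payloads (bytes).
    Returns {channel: problem} for every channel that fails a check."""
    problems = {}
    for chan, tx in sent.items():
        rx = received.get(chan, [])
        if not rx:
            problems[chan] = "no payload received"
        elif all(not any(p) for p in rx):
            problems[chan] = "only all-zero payload received"
        elif rx != tx[:len(rx)]:
            problems[chan] = "payload mismatch"
    return problems
```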

#17 Updated by dexter 6 months ago

I tried to locate the problem. Unfortunately, I was not able to narrow it down to something specific yet. The SETSLOT command seems to be applied correctly and the timeslots are indeed available, but I get weird behavior: when I work with only one TS and one SS, everything works fine, but once I use multiple TS, it only works for the second SS, which is confusing.

#18 Updated by dexter 6 months ago

I have now traced the problem all the way up to trxcon. It seems that when the L1CTL_DM_EST_REQ is sent from TTCN3, it resets the TS, which means it deactivates all lchans on that slot. Since each TS is used twice, the last L1CTL_DM_EST_REQ wins. That's why we see only TCH/H1 working but not TCH/H0. So we need to work around the reset: we must not reset the entire TS, but only the SS we really mean.
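A toy model of the behaviour described in this update, comparing a per-TS reset on each establish request with touching only the requested sub-slot. The function names are hypothetical, and the model ignores everything except which sub-slots end up active.

```python
from collections import defaultdict


def activate_resetting_ts(active, tn, ss):
    """Reported behaviour: each establish request resets the whole TS,
    dropping the lchan already active on the other sub-slot."""
    active[tn] = {ss}


def activate_single_ss(active, tn, ss):
    """Work-around: only touch the requested sub-slot."""
    active[tn].add(ss)


def run(activate):
    active = defaultdict(set)
    for tn in range(1, 8):
        for ss in (0, 1):  # TCH/H0, then TCH/H1
            activate(active, tn, ss)
    return dict(active)
```

With activate_resetting_ts only sub-slot 1 survives on each timeslot (7 lchans), while activate_single_ss keeps all 14.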

#19 Updated by dexter 6 months ago

The test now runs fine with multiple channels. I have also added checks to make sure that the traffic is really passed around as expected and not replaced by dummy frames or even silence. On my machine I can simulate up to 8 channels before the load becomes a problem. However, this does not mean anything; it's just because I use a very old laptop and it's all still on faketrx/trxcon.

I think it's time now to try it out with real hardware. I think I will use the USRP B200 on a separate laptop as the BTS, and the Motorola phone + TTCN3 on my laptop.

See also:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/17285 bts: Add RTP payload testing to performance tests

#20 Updated by dexter 6 months ago

  • % Done changed from 70 to 80

#21 Updated by dexter 5 months ago

  • % Done changed from 80 to 60

There are major problems with the way I control faketrx/trxcon. Apparently I try to allocate the lchans as a single MS, but I should rather use multiple instances of trxcon/faketrx, one for each lchan. However, I still want to see if it works with at least one lchan on real hardware. If I understand things correctly, we would probably need 14 Motorola phones?

#22 Updated by dexter 5 months ago

  • % Done changed from 60 to 50

At the moment I am not able to make the test work with a single instance. I expected that at least that would work, but it does not. At the moment I do not know where exactly the problem is. I tried various settings and different physical distances between MS and BTS, both with an antenna and with a dummy load. It does not seem to make any difference. At least the MS can sync to the BTS. I need to investigate this further to get it working.

Because of these problems, we decided to get it running without a real TRX/MS first. I have the osmo-bts instance running on a separate host, while faketrx/trxcon run on the host that also runs the TTCN3 tests. This works so far.

Also, I think it's a good idea to migrate from faketrx/trxcon to virtphy. It natively allows multiple MS instances, and it would help to add support for multiple instances on the TTCN3 side. However, I could not make it work so far. I think it makes most sense to continue here and, once it works, go back to trying it with osmocom-bb.

#23 Updated by fixeria 5 months ago

Also, I think it's a good idea to migrate from faketrx/trxcon to virtphy.

Not sure it's a good idea, given that this ticket is about osmo-bts-trx and osmo-trx. With virtphy and osmo-bts-virtual you're not dealing with bursts at all, while GSM 05.03 channel decoding (the convolutional decoder) is one of the heaviest tasks on the BTS side.

At the moment I am not able to make the test work with a single instance. I expected that at least that would work, but it does not. At the moment I do not know where exactly the problem is.

If you need any help with trxcon / fake_trx.py / Calypso PHY, we can have a call later on today or some other day this week.

#24 Updated by laforge 5 months ago

On Mon, Jun 08, 2020 at 09:26:41AM +0000, dexter [REDMINE] wrote:

At the moment I am not able to make the test work with a single instance. I expected that at least that would work, but it does not. At the moment I do not know where exactly the problem is. I tried various settings and different physical distances between MS and BTS, both with an antenna and with a dummy load. It does not seem to make any difference. At least the MS can sync to the BTS. I need to investigate this further to get it working.

Can you please describe in more detail (possibly with log file excerpts, pcap files, ...) where it fails.

#25 Updated by dexter 4 months ago

I am currently working on a way to support multiple L1CTL instances in the TTCN3 testsuite. I currently have the setup running with virtphy and osmo-bts-virtual, which allows me to test the support for multiple L1CTL instances. I am aware that virtphy bypasses a lot of the code that we need to run in order to do the measurement, but for now it allows me to test my TTCN3 code. Once this is working, I will try again with osmocon/osmocom-bb and get back to you with logs/traces.

#26 Updated by dexter about 2 months ago

  • Status changed from In Progress to Stalled
