revisit fn-advance / rts-advance default settings
We currently use a fn-advance default of of 20 frames, and a rts-advance of 5, resulting in a total of 25 frames (equalling 115ms) of downlink frame nubmer advance.This will cause
- significantly increased RTT for GPRS user plane data
- increase latency of RLC/MAC signaling, specifically
- tbf establishment
- potentially cause window stalls if we don't poll for ACK/NACK a lot sooner than our window filling up.
- probably mess with LAPDm timing
I would guess that on modern hardware, particularly with SCHED_RR on TRX + BTS, we can reduce the fn_advance drastically. The rts_advance likely needs to remain in place without too many changes, as this is the amount of time the PCU has to prepare downlink data (i.e. schedule DL).
As a second step, we could possibly even think of something like a dynamically sized fn-advance, similar to dynamic jitter buffers work in RTP.
- Status changed from In Progress to Feedback
- % Done changed from 0 to 80
So in summary:
- I tested with B200 + osmo-trx-uhd + multi-arfcn with 2 TRX
- I tested with LimeSDR-USB + osmo-trx-lms + 1 TRX
- I had to run osmo-pcu also with SCHED_RR (-r 1) to avoid having issues with PDTCH Dl blocks not enqueued quickly enough in BTS (related to rts-advance value)
- I also noticed that using a more conservative logging levels (I was using a quite verbose and compute intensive one for RLCMAC category) also helps in getting more stable.
- "fn-advance" can be decreased to 2 by default, it worked fine. "rts-advance is on the edge already, so I wouldn't touch that one.
I also submitted patches improving some related scheduler code to provide more information. I also added rate counters in order to display issues related to fn-advance and rts-advance ("show rate-counters" in osmo-bts).
I did some testing with a LimeNET-micro and so far it looks good from osmo-bts-trx side, but it's not working properly on osmo-trx-lms side due to Tx downlink bursts arriving too late when using fn-advance 2 or 3, I get lots of messages like this from time to time:
DTRXDDL <0003> Transceiver.cpp:430 [tid=140424023869184][chan=0] dumping STALE burst in TRX->SDR interface (0:2005343 vs 1:2005343), retrans=0
I'm running all through systemd services and they have realtime scheduling set in the service files.
I added some rate counters to monitor that kind of issue in osmo-trx, and provide also some VTY command to establish a threshold at which osmo-trx will exit to flag the BTS that something's wrong, like we do for other counters (overruns, underruns, dropped packets, etc.):
remote: https://gerrit.osmocom.org/c/osmo-trx/+/19050 Rename device specific rate counter multi-thread helpers
remote: https://gerrit.osmocom.org/c/osmo-trx/+/19051 Introduce rate counter tx_stale_bursts
While at it, I also fixed some bug in the rate counter thresholds I observed.