jibuf: Flush or re-adapt buffered packets at resync time
Looking at some traces with high jitter over an emulated SAT link, with osmo-bsc-nat master processing OSMUX->RTP->JIBUF, I still see some packet reordering when some packets with the M bit appear, which should not happen due to Osmux since  and2 were merged in libosmo-netif.
Most probably the issues are in the jitter buffer implementation.
First, if Osmux detects a frame was lost (non consecutive SEQs received), it will mark the first RTP packet with the Marker bit. As it is known bad things don't come alone, during this bad periods it can be that 2-3 frames (like N, N+2, N+5) are dropped and we received 2-3 M bits in a short period, and probably really almost at the same time.
To explain this issue, let's put an easy example with osmux frame Fn and Fn+2 arriving almost at the same time, with a batch factor of 4. For the first one, 4 RTP packets will be created and queued:
[T0] RTPn [T0+20] RTPn+1 [T0+40] RTPn+2 [T0+60] RTPn+3
Then the 2nd Osmux frame Fn+2 (seq gap!) at let's say T1=T0+40. Osmux will flush the queue and schedule the packets (send them to jibuf):
[T0+40] RTPn+2 -> dequeued (flushed) [T0+60] RTPn+3 -> dequeued (flushed) [T1] RTPn+4 (M) -> dequeued [T1+20] RTPn+5 [T1+40] RTPn+6 [T1+60] RTPn+7
When jibuf receives RTPn+2, RTPn+3 and RTPn+4 at the same time, it will try to schedule them in corresponding time according to the buffer. Let's say it still now contains the 2 previous packets to those 3 that were now queued. However, as jibuf detects a new syncpoint (packet with M bit or non-synced seq<->timestamp), it will schedule that M packet within "threshold" ms and use that packet seq and timestamp as reference for next, which means new packets queued will take that one as reference for the buffer. In this scenarios, if the previous syncpoint and the new syncpoint are different enough, we can end up inserting new packets in the queue in an out-of-seq fashion, since we only take seq into account in there.
All possibly failing scenarios include syncpoint packets and the state of the waiting queue. In this cases we probably want to flush the queue or re-schedule the packets with seq < syncpoint seq.
Some possible improvements:
1- First, write a unit test to show the scenario described.
2- Code needs to be added which when adding a new syncpoint packet in position X, takes all queued packets X+1 to Xfinal and re-schedules its
>cb dequeue trigger timestamp to be scaled between now() and the delay stored in ->cb of the syncppint packet. Then syncpoint packet is added the latest in the list. Then, the timer needs to be updated to the first value of ->cb in the list. A simpler option for 2 would be to immediately flush all packets when a syncpoint is added.
- Status changed from New to Feedback
- % Done changed from 0 to 90
Bug is showcased and fixed in:
remote: https://gerrit.osmocom.org/#/c/libosmo-netif/+/9171 tests: jibuf_test: Add scenario to show out-of-order bug remote: https://gerrit.osmocom.org/#/c/libosmo-netif/+/9172 jibuf: Fix out-of-order seq queue around syncpoints