Project

General

Profile

Bug #3262

jibuf: Flush or re-adapt buffered packets at resync time

Added by pespin 11 days ago. Updated 9 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
Start date:
05/14/2018
Due date:
% Done:

100%

Spec Reference:

Description

Looking at some traces with high jitter over an emulated SAT link, with osmo-bsc-nat master processing OSMUX->RTP->JIBUF, I still see some packet reordering when some packets with the M bit appear, which should not happen due to Osmux since [1] and2 were merged in libosmo-netif.

[1] https://git.osmocom.org/libosmo-netif/commit/?id=e259c8ab18ce53bd58d022b2c6a193b4a2ec5b1b
[2] https://git.osmocom.org/libosmo-netif/commit/?id=dbf8e531927a39cb044c5ccdde54173d0f213a2e

Most probably the issues are in the jitter buffer implementation.

First, if Osmux detects a frame was lost (non consecutive SEQs received), it will mark the first RTP packet with the Marker bit. As it is known bad things don't come alone, during this bad periods it can be that 2-3 frames (like N, N+2, N+5) are dropped and we received 2-3 M bits in a short period, and probably really almost at the same time.

To explain this issue, let's put an easy example with osmux frame Fn and Fn+2 arriving almost at the same time, with a batch factor of 4. For the first one, 4 RTP packets will be created and queued:

[T0] RTPn
[T0+20] RTPn+1
[T0+40] RTPn+2
[T0+60] RTPn+3

Then the 2nd Osmux frame Fn+2 (seq gap!) at let's say T1=T0+40. Osmux will flush the queue and schedule the packets (send them to jibuf):

[T0+40] RTPn+2 -> dequeued (flushed)
[T0+60] RTPn+3 -> dequeued (flushed)
[T1] RTPn+4 (M)  -> dequeued
[T1+20] RTPn+5
[T1+40] RTPn+6
[T1+60] RTPn+7

When jibuf receives RTPn+2, RTPn+3 and RTPn+4 at the same time, it will try to schedule them in corresponding time according to the buffer. Let's say it still now contains the 2 previous packets to those 3 that were now queued. However, as jibuf detects a new syncpoint (packet with M bit or non-synced seq<->timestamp), it will schedule that M packet within "threshold" ms and use that packet seq and timestamp as reference for next, which means new packets queued will take that one as reference for the buffer. In this scenarios, if the previous syncpoint and the new syncpoint are different enough, we can end up inserting new packets in the queue in an out-of-seq fashion, since we only take seq into account in there.

All possibly failing scenarios include syncpoint packets and the state of the waiting queue. In this cases we probably want to flush the queue or re-schedule the packets with seq < syncpoint seq.

Some possible improvements:
1- First, write a unit test to show the scenario described.
2- Code needs to be added which when adding a new syncpoint packet in position X, takes all queued packets X+1 to Xfinal and re-schedules its >cb dequeue trigger timestamp to be scaled between now() and the delay stored in ->cb of the syncppint packet. Then syncpoint packet is added the latest in the list. Then, the timer needs to be updated to the first value of ->cb in the list.
3
A simpler option for 2 would be to immediately flush all packets when a syncpoint is added.

out-of-order-rtp-packets.png (162 KB) pespin, 05/14/2018 05:45 PM

3129

History

#1 Updated by pespin 10 days ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 90

Bug is showcased and fixed in:

remote:   https://gerrit.osmocom.org/#/c/libosmo-netif/+/9171 tests: jibuf_test: Add scenario to show out-of-order bug
remote:   https://gerrit.osmocom.org/#/c/libosmo-netif/+/9172 jibuf: Fix out-of-order seq queue around syncpoints

#2 Updated by pespin 9 days ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100

Merged, closing

Also available in: Atom PDF