Bug #2975

OsmoBTS doesn't generate measurement indications in absence of uplink bursts

Added by laforge 9 months ago. Updated 10 days ago.

Status: Stalled
Priority: Normal
Assignee:
Category: osmo-bts-trx
Target version: -
Start date: 02/21/2018
Due date:
% Done: 90%
Spec Reference:

Description

If no uplink bursts are received from OsmoTRX (but timestamp indications are still sent), osmo-bts-trx doesn't appear to generate RSL MEAS REP messages at every SACCH multiframe (104 frames), as expected. Odd.


Related issues

Related to OsmoBTS - Bug #2965: No measurement reports sent for channels other than TCH (Resolved, 2018-02-19)

Related to OsmoBTS - Bug #2987: OsmoBTS RxQual/RxLev averaging broken if bursts are missign (Stalled, 2018-02-23)

Related to OsmoBTS - Bug #2700: Odd RTP behavior in case of bad / missing speech frames (Closed, 2017-12-02)

Related to OsmoBTS - Feature #2977: OsmoBTS measurment processing at L1SAP too complex / pass measurements along with data (New, 2018-02-21)

Related to OsmoBTS - Bug #3665: TTCN3 BTS_Tests last SACCH burst received too late -> wrong fake uplink measurement report (Stalled, 2018-10-23)

Related to OsmoBTS - Bug #3428: Too many contiguous elapsed fn, dropping... (Stalled, 2018-07-28)

History

#1 Updated by laforge 9 months ago

  • Related to Bug #2965: No measurement reports sent for channels other than TCH added

#2 Updated by laforge 9 months ago

The entire measurement computation + reporting process is driven by lchan_meas_check_compute(), which is only called from the l1sap whenever a PRIM_INFO_MEAS is reported up. In the absence of bursts/blocks, this primitive is not reported and consequently no measurement reports are generated.

What we should do instead is track the frame number and trigger an RSL MEAS REP whenever the SACCH multiframe ends. The missing uplink bursts all have to count as erroneous, i.e. 100% bit errors.

The entire dualism of PH-DATA.ind / PH-TCH.ind containing (unused) measurement data, but then having a separate PRIM_INFO_MEAS, is odd to begin with. The measurements should always accompany the PH-DATA.ind / PH-TCH.ind, and PRIM_INFO_MEAS should be abandoned.
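
For illustration, the frame-number-driven approach proposed above could look roughly like the following. This is a minimal Python sketch, not osmo-bts code: FrameDrivenReporter and close_period() are hypothetical names, the 104-frame period and the 25 expected measurements per period are assumptions taken from the TCH/F case, and the per-timeslot SACCH offset and frame number wrap-around are ignored.

```python
SACCH_PERIOD = 104   # assumed SACCH multiframe length in TDMA frames (TCH case)
EXPECTED_MEAS = 25   # assumed number of UL measurements per period (TCH/F)


def close_period(ber_samples, expected=EXPECTED_MEAS):
    """Average BER over one period; every missing sample counts as 100% errors."""
    received = list(ber_samples)[:expected]
    missing = expected - len(received)
    padded = received + [1.0] * missing
    return sum(padded) / expected, missing


class FrameDrivenReporter:
    """Hypothetical helper: a report is triggered by frame numbers, not by bursts."""

    def __init__(self):
        self.samples = []
        self.reports = []
        self.last_period = None

    def on_time_ind(self, fn):
        # Called for every time indication, even if no uplink burst arrived.
        period = fn // SACCH_PERIOD
        if self.last_period is not None and period != self.last_period:
            self.reports.append(close_period(self.samples))
            self.samples = []
        self.last_period = period

    def on_measurement(self, ber):
        # Called only when an uplink measurement is actually available.
        self.samples.append(ber)
```

With this structure a period without any uplink bursts still yields a report, and that report carries a BER of 1.0 (100%).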

#3 Updated by laforge 9 months ago

  • Status changed from New to In Progress

#4 Updated by laforge 9 months ago

#5 Updated by laforge 9 months ago

#6 Updated by laforge 9 months ago

  • Related to Bug #2987: OsmoBTS RxQual/RxLev averaging broken if bursts are missign added

#7 Updated by laforge 3 months ago

  • Assignee set to dexter

#8 Updated by dexter 3 months ago

  • % Done changed from 0 to 50

One of the most sensitive cases is when the SACCH block drops out, because the measurement computation is then not triggered. As we receive measurement indications, we need to compare the frame number of the current one against the frame number of the previous one to check whether we have already crossed the boundary of a SACCH interval. I have now added a patch that does exactly that. A dropout at the end of the SACCH interval will no longer suppress the measurement computation.

See also: https://gerrit.osmocom.org/#/c/osmo-bts/+/10492
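
The frame number comparison described in this comment can be sketched as follows. This is only an illustration in Python and not the code from the Gerrit change: crossed_interval_ends() is a hypothetical name, the interval is assumed to be 104 frames aligned to fn % 104 == 0 (the real per-channel offsets are ignored), and measurements are assumed to arrive in frame number order.

```python
GSM_MAX_FN = 2715648   # frame numbers wrap at 26 * 51 * 2048
SACCH_PERIOD = 104     # assumed interval length in TDMA frames


def crossed_interval_ends(prev_fn, cur_fn, period=SACCH_PERIOD):
    """Count the interval boundaries between two consecutive measurements."""
    # Modular distance, so the hyperframe wrap-around is handled as well.
    elapsed = (cur_fn - prev_fn) % GSM_MAX_FN
    return (prev_fn % period + elapsed) // period
```

A result of 0 means both measurements belong to the same interval; any larger value means at least one interval end was passed without being seen.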

However, we are not done yet. When we get a complete dropout with no measurements at all (battery died, tunnel, etc.), we have a problem. For this I would propose using the time indication to implement a timeout. When, let's say, a quarter of a SACCH interval has passed without the computation/measurement report being executed, we could forcefully trigger the computation to generate a report. Unfortunately we are still not good at handling intervals with no measurements, so I think it's better to wait until that is fixed. See also #2987.
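
One possible reading of the proposed timeout, as a rough Python sketch: a report counts as overdue once a full interval plus a quarter of an interval has elapsed since the last report. The function name report_overdue(), the 104-frame interval and this interpretation of "a quarter of a SACCH interval" are all assumptions made for the example.

```python
GSM_MAX_FN = 2715648          # frame numbers wrap at 26 * 51 * 2048
SACCH_PERIOD = 104            # assumed interval length in TDMA frames
GRACE = SACCH_PERIOD // 4     # assumed timeout: a quarter of an interval


def report_overdue(last_report_fn, now_fn, period=SACCH_PERIOD, grace=GRACE):
    """True if the time indication shows the last report is too long ago."""
    elapsed = (now_fn - last_report_fn) % GSM_MAX_FN
    return elapsed >= period + grace
```

The check would run on every time indication, so it still fires when no uplink measurement arrives at all.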

#9 Updated by dexter 3 months ago

The patch mentioned above is still in review. I have fixed the review issues now.

I also found out that we are not really resetting the measurement states. Since the lchans are statically allocated (I think so, correct me if I am wrong), the states are not reset when the channel is re-opened by another subscriber. I have now added a centralized function that resets everything and is called from rsl.c when the channel is acknowledged.

See also: https://gerrit.osmocom.org/#/c/osmo-bts/+/10554/

#10 Updated by dexter 3 months ago

Unfortunately, Gerrit change 10554 causes problems with the TTCN3 tests TC_meas_res_sign_sdcch4 and TC_meas_res_sign_sdcch8. The test complains ("No MEAS RES received at all") that no measurement reports were received, but the pcap files show that there are indeed measurement reports. Presumably there is (also) a problem with the test expectation.

While trying to fix the problems with the TTCN3 tests I still found some remaining problems that need to be fixed, see also:
https://gerrit.osmocom.org/10564

#11 Updated by dexter 3 months ago

  • % Done changed from 50 to 90

All related patches are merged. Unfortunately, there is now a problem with the following two TTCN3 tests:

TC_meas_res_sign_sdcch4
TC_meas_res_sign_sdcch8

This is presumably a problem with the test expectation. Experiments show that even though the test is supposed to generate correct intervals the code always detects lost interval ends. Also TTCN3 complains that it would not see any measurement reports, but the pcap files show plenty of them. I also checked the numbering, it starts at 0 and looks good so far.

#12 Updated by daniel 3 months ago

dexter wrote:

TC_meas_res_sign_sdcch4
TC_meas_res_sign_sdcch8

This is presumably a problem with the test expectation. Experiments show that even though the test is supposed to generate correct intervals the code always detects lost interval ends. Also TTCN3 complains that it would not see any measurement reports, but the pcap files show plenty of them. I also checked the numbering, it starts at 0 and looks good so far.

The pcap shows plenty of measurement reports, but the TTCN3 log also shows quite a few being processed/received. After a while, it seems, the Measurement Report from LAPDm no longer generates a new Measurement Report on RSL.

See https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/221/artifact/logs/bts-tester/BTS_Tests.TC_meas_res_sign_sdcch4.pcap (also attached)
Packet #281 is the last RSL MEAS Rep on RSL while more are coming in from the "MS".

It's easy to filter for measurement reports in wireshark like this:
(gsm_a.dtap.msg_rr_type == 0x15)

If you append && gsm_abis_rsl you can see that 16 measurement reports are received for SDCCH/4, subchan 0, and then only one for subchan 1 (packet #281). After that, any further measurement reports seem to be ignored by the BTS.

Looking at the MS side there are 15 MEAS reports for subchan 0 as well as 15 for subchan 1. After timing out on subchan 1 the test aborts, so neither subchan 2 nor 3 is attempted.

It's interesting that the RSL reports number 16 (Measurement result number 0 - 15) while the MS only sends 15.

#13 Updated by dexter 3 months ago

I have found the problem now. I have confused Subslots and Timeslots for SDCCH/4 and SDCCH/8. This is now fixed and unit tests are added. The TTCN3 tests should be fine again when this is merged.

https://gerrit.osmocom.org/#/c/osmo-bts/+/10654 measurement: fix is_meas_overdue() and increase testcoverage

#14 Updated by dexter 2 months ago

See also Ticket #3502 as the problem is closely linked to this one.

#15 Updated by dexter 2 months ago

We have now discussed the timing problem and came to the conclusion that one cannot really rely on the ordering between SACCH and TCH voice, since those are different channels and the queues through which the blocks are sent may be very vendor-specific. So at least a slight timing deviation must be accepted here. Unfortunately this renders my approach to detecting the SACCH interval end useless.

The only way to fix this seems to be to use two buckets for collecting measurements. From the frame number we can see whether a measurement has to go into the bucket for the current interval or into the bucket for the next interval. We would then notice the missed interval end by a timeout: if for some time we only get measurements for the next-interval bucket, we can flush the current-interval bucket. This is of course a bit complex, so we first need to see if there are other ways around it.
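
To make the two-bucket idea more concrete, here is a minimal Python sketch. It is not an implementation proposal for osmo-bts: TwoBucketCollector and its flush_after threshold are hypothetical, the "timeout" is modelled as a count of consecutive next-interval measurements, the interval is assumed to be 104 frames aligned to fn % 104 == 0, and frame number wrap-around is ignored.

```python
class TwoBucketCollector:
    """Hypothetical collector that tolerates slightly late measurements."""

    def __init__(self, period=104, flush_after=3):
        self.period = period            # assumed interval length in frames
        self.flush_after = flush_after  # assumed "timeout" in measurements
        self.current_idx = None
        self.current = []               # bucket for the current interval
        self.next = []                  # bucket for the following interval
        self.next_streak = 0
        self.flushed = []               # completed (interval, values) pairs

    def add(self, fn, value):
        idx = fn // self.period
        if self.current_idx is None:
            self.current_idx = idx
        if idx == self.current_idx:
            # Possibly a late arrival: it still lands in the right bucket.
            self.current.append(value)
            self.next_streak = 0
            return
        # Simplification: everything else is treated as "next interval".
        self.next.append(value)
        self.next_streak += 1
        if self.next_streak >= self.flush_after:
            # Only next-interval data for a while: the interval end was missed.
            self.flushed.append((self.current_idx, self.current))
            self.current_idx, self.current, self.next = idx, self.next, []
            self.next_streak = 0
```

A late SACCH measurement that arrives after the first TCH measurement of the following interval is still counted for the interval it belongs to, which is the case the single-boundary detection could not handle.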

Concerning osmo-bts-sysmo, there is good news. The PHY has the option to space out unreadable bursts, but we intentionally disabled this functionality, so in theory osmo-bts-sysmo should never lose a block. Even when no block is received, it will still hand over a measurement and data of length zero. To verify that, I ran an experiment: I set up a call and took the battery out of the phone. This is a measurement period from the time frame where the battery was already out:


<0004> measurement.c:442 025072/18/08/31/32 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=0
<0007> l1sap.c:1130 025072/18/08/31/32 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025077/18/13/36/37 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=1
<0007> l1sap.c:1130 025077/18/13/36/37 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025081/18/17/40/41 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=2
<0007> l1sap.c:1130 025081/18/17/40/41 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025085/18/21/44/45 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=3
<0007> l1sap.c:1130 025085/18/21/44/45 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025090/18/00/49/02 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=4
<0007> l1sap.c:1130 025090/18/00/49/02 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025094/18/04/02/06 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=5
<0007> l1sap.c:1130 025094/18/04/02/06 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025098/18/08/06/10 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=6
<0007> l1sap.c:1130 025098/18/08/06/10 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025103/18/13/11/15 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=7
<0007> l1sap.c:1130 025103/18/13/11/15 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025107/18/17/15/19 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=8
<0007> l1sap.c:1130 025107/18/17/15/19 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025111/18/21/19/23 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=9
<0007> l1sap.c:1130 025111/18/21/19/23 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025116/18/00/24/28 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=10
<0007> l1sap.c:1130 025116/18/00/24/28 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025120/18/04/28/32 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=11
<0007> l1sap.c:1130 025120/18/04/28/32 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025124/18/08/32/36 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=12
<0007> l1sap.c:1130 025124/18/08/32/36 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025129/18/13/37/41 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=13
<0007> l1sap.c:1130 025129/18/13/37/41 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025133/18/17/41/45 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=14
<0007> l1sap.c:1130 025133/18/17/41/45 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025137/18/21/45/49 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=15
<0007> l1sap.c:1130 025137/18/21/45/49 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025142/18/00/50/02 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=16
<0007> l1sap.c:1130 025142/18/00/50/02 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025146/18/04/03/06 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=17
<0007> l1sap.c:1130 025146/18/04/03/06 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025150/18/08/07/10 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=18
<0007> l1sap.c:1130 025150/18/08/07/10 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025155/18/13/12/15 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=19
<0007> l1sap.c:1130 025155/18/13/12/15 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025159/18/17/16/19 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=20
<0007> l1sap.c:1130 025159/18/17/16/19 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025163/18/21/20/23 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=21
<0007> l1sap.c:1130 025163/18/21/20/23 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025168/18/00/25/28 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=22
<0007> l1sap.c:1130 025168/18/00/25/28 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025172/18/04/29/32 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=0), num_ul_meas=23
<0007> l1sap.c:1130 025172/18/04/29/32 Rx TCH.ind chan_nr=0x0a
<0004> measurement.c:442 025102/18/12/10/14 (bts=0,trx=0,ts=2,ss=0) adding measurement (is_sub=1), num_ul_meas=24
<0004> measurement.c:319 (bts=0,trx=0,ts=2,ss=0) meas period end fn:25102, fn_mod:12, status:1, pchan:TCH/F
<0004> measurement.c:658 (bts=0,trx=0,ts=2,ss=0) Calculating measurement results for physical channel:TCH/F
<0004> measurement.c:680 (bts=0,trx=0,ts=2,ss=0) received 25 UL measurements, expected 25
<0004> measurement.c:732 (bts=0,trx=0,ts=2,ss=0) received UL measurements contain 3 SUB measurements, expected 3
<0004> measurement.c:734 (bts=0,trx=0,ts=2,ss=0) replaced 0 measurements with dummy values, from which 0 were SUB measurements
<0004> measurement.c:773 (bts=0,trx=0,ts=2,ss=0) Computed TA256( 171798681) BER-FULL(10.16%), RSSI-FULL(-113dBm), BER-SUB(14.54%), RSSI-SUB(-114dBm)
<0004> measurement.c:786 (bts=0,trx=0,ts=2,ss=0) UL MEAS RXLEV_FULL(0), RXLEV_SUB(0),RXQUAL_FULL(6), RXQUAL_SUB(7), num_meas_sub(3), num_ul_meas(25) 

From what I can see this looks very good. All measurements are there and the period end is detected properly after the 25th measurement. I cannot say too much about the computation result, but shouldn't BER-FULL be somewhere near 100%? Maybe this needs to be checked. I don't know.
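
As a cross-check of the last two log lines, the reported RXQUAL values can be compared against the BER figures. The sketch below assumes the RXQUAL bands from 3GPP TS 45.008 (0.2%, 0.4%, 0.8%, 1.6%, 3.2%, 6.4%, 12.8%); ber_to_rxqual() is a hypothetical helper written for this example and not the function osmo-bts uses.

```python
# Upper BER bounds (as fractions) for RXQUAL 0..6; anything above is RXQUAL 7.
RXQUAL_UPPER = [0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128]


def ber_to_rxqual(ber):
    """Map a bit error ratio (0.0 .. 1.0) to an RXQUAL value (0 .. 7)."""
    for rxqual, upper in enumerate(RXQUAL_UPPER):
        if ber < upper:
            return rxqual
    return 7


print(ber_to_rxqual(0.1016))  # BER-FULL(10.16%) from the log
print(ber_to_rxqual(0.1454))  # BER-SUB(14.54%) from the log
```

Under these assumptions the two calls give 6 and 7, which matches RXQUAL_FULL(6) and RXQUAL_SUB(7) in the log, so the BER to RXQUAL conversion itself looks consistent; whether the BER values are plausible is a separate question.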

Note: What is valid for osmo-bts-sysmo is also valid for osmo-bts-litecell15.

For osmo-bts-trx the behavior is completely different. When I take the RX antenna off the USRP B200 and put the phone approx. 1 m away, I can already see dropouts, also on the SACCH, with all the consequences of missed measurement intervals.

Our idea is now to realize something similar with osmo-bts-trx. We first need to pinpoint where the bursts/frames/blocks get spaced out. It could be that they are already spaced out in osmo-trx. One idea is to look at the mechanism that receives the UDP packets from the TRX and check for lost packets there. If a packet is missing, we could substitute it with a dummy. We think it is a good idea to do the substitution in osmo-bts-trx, since there are already several TRX variants around (e.g. fake-trx), and checking and patching them all might not be such a good idea.
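
The substitution idea could be sketched like this in Python. Everything here is illustrative: missing_frames() and dummy_bursts() are hypothetical names, the dictionary layout of a dummy burst is made up for the example, one burst per frame number is assumed for the timeslot in question, and the max_gap limit is an arbitrary choice.

```python
GSM_MAX_FN = 2715648   # frame numbers wrap at 26 * 51 * 2048


def missing_frames(prev_fn, cur_fn, max_gap=100):
    """Frame numbers between two received bursts for which nothing arrived."""
    elapsed = (cur_fn - prev_fn) % GSM_MAX_FN
    if elapsed == 0 or elapsed > max_gap:
        # Duplicate, reordered, or a gap too large to paper over.
        return []
    return [(prev_fn + i) % GSM_MAX_FN for i in range(1, elapsed)]


def dummy_bursts(prev_fn, cur_fn):
    """Build placeholder bursts (no payload) for every missing frame number."""
    return [{"fn": fn, "dummy": True, "bits": None}
            for fn in missing_frames(prev_fn, cur_fn)]
```

Feeding such placeholders into the normal processing path would let the upper layers see a continuous stream of frame numbers, so each interval end is reached even when the radio link is gone.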

We will now take out the existing interval end detection logic and approach the problem as described above.

#16 Updated by dexter 2 months ago

In order to have functioning measurement reporting again, I have now removed the missed interval end detection (is_meas_overdue()).

https://gerrit.osmocom.org/#/c/osmo-bts/+/10814 measurement: remove missed interval end detection
https://gerrit.osmocom.org/#/c/osmo-bts/+/10815 measurement: fix unit-test test_lchan_meas_process_measurement

During our discussions we realized that a lot of the confusion we experience here comes from the way measurement reports are handled in osmo-bts. The data and the measurement reports are handled on separate paths, but it would actually be more natural to have both in one unit, handled on the same path. There is now an issue about that. See: #3530

#17 Updated by pespin about 2 months ago

  • Related to Bug #2700: Odd RTP behavior in case of bad / missing speech frames added

#18 Updated by fixeria about 2 months ago

  • Related to Feature #2977: OsmoBTS measurment processing at L1SAP too complex / pass measurements along with data added

#19 Updated by pespin 22 days ago

  • Related to Bug #3665: TTCN3 BTS_Tests last SACCH burst received too late -> wrong fake uplink measurement report added

#20 Updated by pespin 22 days ago

  • Related to Bug #3428: Too many contiguous elapsed fn, dropping... added

#21 Updated by dexter 10 days ago

  • Status changed from In Progress to Stalled
