Project

General

Profile

Bug #2329

frequent "no space for uplink measurement" message

Added by laforge 6 months ago. Updated 10 days ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
osmo-bts-sysmo
Target version:
-
Start date:
06/18/2017
Due date:
% Done:

90%

Spec Reference:

Description

on current osmo-bts master (f4544573f84f2fbfd2f11b4c35274c304c05df4b) with osmo-bts-sysmo backend and TCH/H, I'm getting plenty of the below messages:

<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement
<0004> measurement.c:155 (bts=0,trx=0,ts=1,ss=0) no space for uplink measurement

so somehow, the measurement processing is still broken, which makes me quite uneasy. We should not be able to introduce such an easy-to-spot regression in the code.

openbsc.cfg (1.96 KB) ipse, 08/18/2017 12:04 PM

osmo-bts.log Magnifier (305 KB) ipse, 08/18/2017 12:04 PM

meas_bug1.pcap (27.1 KB) ipse, 08/18/2017 04:06 PM

osmo-bts.log Magnifier (561 KB) ipse, 08/18/2017 04:06 PM

History

#1 Updated by laforge 6 months ago

  • % Done changed from 0 to 40

On a TCH/H on TS1 SS0, I'm getting L1SAP MEAS IND at the following fn_mod104:
3 11 20 29 37 46 55 63 72 81 89 98

So there might be an off-by-four, i.e.
  • we compute something like 107 % 104 = 3
  • we actually should compute something like (107-4)%104 = 103

In that latter case we end up receiving a L1SAP MEAS IND at the correct fn104 that tchh0_meas_rep_fn104 expects

Not sure where that off-by-four might be coming from. We're generating a L1SAP MEAS IND every time we receive a PH-DATA.ind from the PHY, and we use exacty the frame number we receive from there.

Maybe the frame number in the PH-DATA.ind is the frame number of the last of the 4 bursts? But then, the first burst would be "-3", not "-4".

#2 Updated by laforge 6 months ago

what we could do is store the last frame number at which we had received a L1SAP MEAS IND for this lchan, and check if the spec-mentioned modulo value is anywhere in between the last number and the current number. This way it should be portable, no matter what kind of weird offsets there might be applied? Not sure if it's worth the extra uint32_t per each lchan,though

#3 Updated by laforge 6 months ago

ok, so the L1 of sysmobts (and lc15) resports the frame number of the first burst of the block

If we look into Clause 7 Table 1 of 9: Mapping of logical channels onto physical channels of TS 05.02 we will see the burst periods for the various blocks at the different channel types. For my example of having a TCH/H on TS1 SS0,
  • SACCH is on B(12, 38, 64, 90) MOD 104
  • VOICE is on B0, B1, B2 MOD 13, (i.e. repeating 8 times witin the MOD104 above)

so we should get voice on 0, 4, 8, 13+0=13, 13+4=17, 13+8=21, 26+0=26, 26+4=30, 26+8=32, ...
which luckily corresponds to the above-3, i.e. 3-3=0, 7=3=4, ... up to 15-3=12 for the SACCH

#4 Updated by laforge 6 months ago

Ok, so the problem is that even if we correct for the offset of 3 (and then get correct fn104 values for e.g. the SACCH/H first burst of 12), when the FACCH is stealing the bursts, there simply is no PH-DATA.ind on fn104=103.

The first burst number for TCH/H (TS1/SS0) are different than those of FACCH/H (TS1/SS0:
  • SACCH/H: 0, 4, 8, 13+0=13, 13+4=17, 13+8=21, 26+0=26, 26+4=30, 26+8=32, 39+0=39, 39+4=43,39+8=47, 52, 52+4=56, 52+8=60, 65, ...
  • FACCH/H: 0, 8, 17, 26+0=26, 26+8=34, 26+17=43, 52+0=52, 52+8=60, 52+17=69, 78+0=78, 78+8=86, 78+17=95

So we definitely need to check for the end of the SACCH measurement period at any frame number, not just those where we by coincidence we actually receive a block whose start framenumber matches that fn104 value from ts 05.08

#5 Updated by laforge 6 months ago

actually, a hard one to tackle.

  • the measurement period ends at a certain fn104 value
  • this fn104 value is not guaranteed to be either the first or the last burst in a block
    • our code only gets executed for th first FN in each block and not for intermediate ones
    • our code currently has no knowledge about the specific fn104 values of all the bursts for a given block it receives

So actually we would need to determine if any of the bursts of the currently-processed block in the PH-DATA.ind crossed the fn104 boundary of the reporting interval, and then compute + store our measurement results until the next SACCH occurs, when we report the computed resultsto the BSC

Even then' it's not entirely correct, as we don't have the rssi/ber per individual burst, but only per block. So our reporting will inevitably always be granular to the block boundaries.

Now we "only" have to figure out how to compute + use a related table :/

#6 Updated by laforge 6 months ago

ok, I see no other way to do this in a half-decent way than to have a look-up table that allows us to determine the frame-number of the last burst in the given block based on the following input:
  • frame number of the first burst in the block as reported by PHY
  • channel type (facch/f, facch/s, sacch/f, sacch/h, tch/f, tch/h, sdcch, ...)
  • timeslot number
  • sub-slot number
    (i.e. basically lchan + fn)

We should use as much as possible pre-computed tables based on the information in Clause 7 of TS 05.02.

Then, once we know the fn of the last burst, we can do the modulo 104 and compare if that last frame number is >= the mod104 for the end of the measurement period. And that's the trigger for computing the measurement result.

#7 Updated by dexter 6 months ago

The problem should now be fixed on osmo-bts-sysmo, the following patches are up for review:

https://gerrit.osmocom.org/3050 Revert "sysmobts: normalize frame number in measurement indication"
https://gerrit.osmocom.org/3051 Revert "osmo-bts-sysmo: Include frame number in MEAS IND"
https://gerrit.osmocom.org/3052 osmo-bts-sysmo: Include frame number in MEAS IND
https://gerrit.osmocom.org/3053 measurement: fix measurment report

I have tested it carefully on osmo-bts-sysmo. Half rate and full rate on all timeslots (except 0 and 1, where the control channels are).

#8 Updated by dexter 6 months ago

I have checked the current state with octphy now. Octphy seems to send the measurement results in the correct SACCH block. I did a full fu check on all timeslots and it works all the time. The question is now how do we resolve this best. I would like to keep translate_tch_meas_rep_fn104() in masurement.c. Maybe we can have some flag somewhere that says which frame number style is used. Maybe we can talk about this tomorrow.

#9 Updated by laforge 6 months ago

What is the "correct SACCH block" here?

As per convention, in osmo-bts every block is reported using the frame number of the first burst. If OCTPHY does it different, I think the correct implementation would be to translate the block number in osmo-bts-octphy to map from FN of last to FN of first block, and then for the common part above l1sap everything is identical.

Might also be an idea to
  • document the requirement for reporting the FN of the first burst in comments
  • ask octasic if they'd want to change in later versions.

#10 Updated by dexter 6 months ago

I have fixed two of the broken patches, that where still in the review process. For osmo-bts-trx I could test and it did look ok to me. For litecell15 I did not test, but since the patch is almost the same for osmo-bts-sysmo, I think its now ok here as well.

https://gerrit.osmocom.org/2813 osmo-bts-trx: fix missing frame number in MEAS IND
https://gerrit.osmocom.org/2861 osmo-bts-litecell15: Fix missing frame number in MEAS IND

#11 Updated by dexter 6 months ago

  • % Done changed from 40 to 70

#12 Updated by dexter 5 months ago

The l1_if.c now translates the frame number. For the upper layers things should look normal again. I have tested it with TCH/F TS2-7 and TCH/H TS2-7.

The patch is up for review:
https://gerrit.osmocom.org/3118 octphy: remap frame number in MEAS_IND

#13 Updated by laforge 5 months ago

this urgent ticket is still in state "new". Why? Meanwhile the patches are merged.

#14 Updated by dexter 5 months ago

  • Status changed from New to Resolved
  • % Done changed from 70 to 100

#15 Updated by laforge 4 months ago

  • Status changed from Resolved to Closed

#16 Updated by ipse 4 months ago

I've just compiled osmo-bts-trx from master and I see this message for pretty much every measurement report on TCH:

2017-08-18_01:19:58.26168 <0004> ../../../../osmo-bts/src/common/measurement.c:199 (bts=0,trx=0,ts=2,ss=0) no space for uplink measurement, num_ul_meas=104

I don't see them on SDDCH though. Any idea what could be wrong?

#17 Updated by dexter 4 months ago

  • Status changed from Closed to In Progress

#18 Updated by dexter 4 months ago

Hello ipse,

I have compiled the current master and tested it on my sysmobts this morning, I do not get the error message. It would be great if you could give me some more information on the problem. Having a trace, your current nitb config and the log output would be very helpful.

best regards.
Philipp

#19 Updated by ipse 4 months ago

I've attached tcpdump traces, osmo-nitb config and osmo-bts log as you requested.

Note though that I'm working with osmo-bts-trx, not with sysmoBTS.

#20 Updated by dexter 4 months ago

Hello ipse,

It looks like the capture file does not contain any abis packets. It would be
good to see if there are any measurement reports are comming through or if it
the measurement has gone completely silent.

Meanwhile I have checked it on my side. I reproduced your transceiver
configuration on the USRP I have here. To me it looks still fine, I was not
able to reproduce the problem. First I thought that probably the two
transceivers are messing it up but this is not the issue. Also your log shows
that the call is active on TS2 on TRX0. I also tried it with the exact ARFCNs
on GSM900, results are the same.

However, I have a suspicion though. Before the "no space..." messages are
showing up I see the line: "Received bad TCH frame ending at fn=1372426 for TCH/F"
You also mentioned some problems with the SDCCH.

In the current implementation. The ending of the measurement period is locked
on some specific frame numbers on the SDCCH. So if the frame that marks the
ending of the measurement period is missing, the buffer where the measurement
vaules are collected overflows

Also have a look at ./common/measurement.c and ./osmo-bts-trx/l1_if.c

Bottom line: It looks just like you missing the trigger frame (SDCCH) that
sets of the handling of the measurement report end
(RSL message, bufer reset etc...)

best regards,
Philipp

#21 Updated by ipse 4 months ago

Sorry, I took a dump on a wrong interface list time. Here is a correct dump with a related osmo-bts log. There are several calls in this dump and in some of them the error appears after a second of operation, while in others it appears almost immediately. Let me know if you need more logs/captures.

I've never seen this error in the version we're using (fairwaves/package branch), so it's a relatively new regression.

#22 Updated by ipse 4 months ago

  • File deleted (meas_bug_clean.pcap)

#23 Updated by laforge 4 months ago

On Fri, Aug 18, 2017 at 03:49:36PM +0000, dexter [REDMINE] wrote:

Bottom line: It looks just like you missing the trigger frame (SDCCH) that
sets of the handling of the measurement report end
(RSL message, bufer reset etc...)

I guess you're referring to the SACCH/F associated with that TCH/F, and not to a SDCCH?

#24 Updated by dexter 4 months ago

Yes of course, ts SACCH, not SDCCH. Sorry. (see also: 3GPP TS 05.02 Clause 7 Table 1 of 9)

#25 Updated by ipse 4 months ago

So, any idea what could be the reason for this issue to appear in the recent master?

#26 Updated by dexter 4 months ago

Hello ipse,

I had a look at the trace, there are indeed some measurement report messages in it, but by fare not as many as expected. When I look at the log I see sometimes "Handing RLL msg UNIT_DATA_IND from LAPDm to MEAS REP". It looks like SACCH timing is shomehow messed up.

My main problem with this is that I can not reproduce the problem on my side. I tried it on the USRP with osmo-trx as well, so my setup should be very similar to yours. Which kind of phones did you use, did you try with different ones?

The code that handles the measurement result computation is fairly new, but we have tested it carefully with all our BTSs on any timeslot/rate.

best regards.
Philipp

#27 Updated by ipse 4 months ago

dexter wrote:

My main problem with this is that I can not reproduce the problem on my side. I tried it on the USRP with osmo-trx as well, so my setup should be very similar to yours. Which kind of phones did you use, did you try with different ones?

Testing with a cheap Lenovo smartphone (K10A40). No, I haven't tried different phones as I was quite sure this should not be phone specific?

I was testing with attenuators on the BTS to emulate edge of the coverage (I was testing power control). So there might have been some burst loss.

#28 Updated by dexter 3 months ago

Hello ipse,

I was testing with attenuators on the BTS to emulate edge of the coverage (I was testing power control). So there might have been some burst loss.

Thats interesting. Can you give it a quick try with lower the attenuation? I wonder if the problem vanishes then.

best regards,
Philipp

#29 Updated by ipse 3 months ago

I'm unfortunately traveling for at least the next two weeks. Not sure when I can test this again :( Do you think you can add attenuation to your setup?

#30 Updated by laforge 2 months ago

any updates here? Would be great if we could establish if the bug is resolved or not.

#31 Updated by ipse 2 months ago

Hi Harald,

Is there any reason to believe it was resolved? I can ask the team to test, but it'll take some time.

#32 Updated by laforge 2 months ago

Hi Alexander,

On Wed, Oct 11, 2017 at 02:58:05PM +0000, ipse [REDMINE] wrote:

Is there any reason to believe it was resolved? I can ask the team to test, but it'll take some time.

after careful re-reading of the ticket, I interpreted the status wrong.
The ball is on our side, dexter should be trying with an attenuator
(possibly a step attenuator so you can manually increase the attenuation
to a point just before it becomes unreliable?) to reproduce your setup
first.

@dexter: this has been "urgent" for a month, please do a quick test, I think
this shouldn't take more than an hour to verify, right?

As a side note:Please note that all our osmo-gsm-tester runs run with >=
80dB path loss between BTS and MS, and I don't think anyone has reported
seeing related errors in the logs. We do test not only with
osmo-bts-sysmo but also with osmo-bts-trx and USRP B2xx.

#33 Updated by dexter about 2 months ago

  • Priority changed from Urgent to Normal

Hi Alexander,

I tried it out once more with very high attenuation in the RX-Path. I turned up the attenuation until the majority of frames were reported as bad by the log. I could not reproduce the problem. I also did a full check without attenuation for each timeslot (2-7) in TCH/H and TCH/F to see if something broke meanwhile. All checks went ok.

Unfortunately I have no Idea what or how else to check. Maybe we should try it out with a K10A40 in our lab to see what happens then.

best regards,
Philipp

#34 Updated by ipse about 2 months ago

Thanks a lot for the update. We'll try to reproduce the issue here, but it'll take some time as our testing stand is disassembled right now.

#35 Updated by laforge 12 days ago

  • Priority changed from Normal to Immediate
  • % Done changed from 100 to 90

I have found one more bug causing those messages to appear, specifically on osmo-bts-trx.

For details see https://gerrit.osmocom.org/#/c/5136/

It would be great if somebody with an osmo-bts-trx setup could test without that patch and with that patch in order to see if the problem really existed all the time and if it is finally resolved with the above-mentioned patch.

#36 Updated by dexter 10 days ago

I have checked it and the problem was immediately reproducible. Since I checked it carefully last time I suspect that this is some regression. With the patch it works fine again on all timeslots (3-7) in both modes (TCH/F, TCH/H).

#37 Updated by laforge 10 days ago

On Mon, Dec 04, 2017 at 12:47:03PM +0000, dexter [REDMINE] wrote:

I have checked it and the problem was immediately reproducible. Since
I checked it carefully last time I suspect that this is some
regression. With the patch it works fine again on all timeslots (3-7)
in both modes (TCH/F, TCH/H).

Please discuss with Pau if we can somehow have e.g. a regular expression
that checks the logs after an osmo-gsm-tester run. Something like a
condition "We expect no line to match this regexp". This should be
reasonably simple to add to osmo-gsm-tester, and we could make sure if
we ever see this message in the log output again, we would be notified
about it.

#38 Updated by pespin 10 days ago

I created this task in osmo-gsm-tester to support this kind of operations: https://osmocom.org/issues/2707

#39 Updated by dexter 10 days ago

  • Priority changed from Immediate to Normal

Since the problem is now solved again I set the priority back to normal. The problem we had now was probably the problem that Alexander spotted also.

Also available in: Atom PDF