Project

General

Profile

Bug #3560

nanoBTS multiTRX tests in osmo-gsm-tester Prod setup failing

Added by laforge 3 months ago. Updated about 1 month ago.

Status:
In Progress
Priority:
High
Assignee:
Target version:
-
Start date:
09/17/2018
Due date:
% Done:

0%

Spec Reference:

Description

A sysmocom customer has reported that operationg with two TRX used to work some weeks/months ago, but is failing with current master of osmo-trx/bts/bsc.

It's not clear in which component exactly the bug was introduced, but my guess would be probably the many BSC FSM related changes that were introduced.

This may also be related to the issue discovered in context with osmo-gsm-tester, where the 2nd TRX of a two-trx nanoBTS setup wasn't operating as expected

trial-157-run.tgz trial-157-run.tgz 239 KB pespin, 10/03/2018 03:17 PM
trial-161-run.tgz trial-161-run.tgz 291 KB pespin, 10/04/2018 09:38 AM
trial-165-run.tgz trial-165-run.tgz 202 KB pespin, 10/04/2018 02:40 PM

Related issues

Related to OsmoGSMTester - Feature #2760: osmo-gsm-tester: Add support for several (osmo-)trx to OsmoBtsTrxIn Progress2017-12-15

Related to OsmoGSMTester - Bug #2761: osmo-gsm-tester: add test case: Test 2nd trx is correctly usedResolved2017-12-15

Related to Cellular Network Infrastructure - Feature #3628: document nanoBTS calibration procedure using ipaccess-config toolNew2018-10-04

Is duplicate of OsmoBSC - Bug #3475: neels/inter_bsc_ho branch for osmo-bsc 2TRX configured but OSMO-BSC only uses the first TRX configurationResolved2018-08-20

History

#1 Updated by laforge 3 months ago

  • Related to Feature #2760: osmo-gsm-tester: Add support for several (osmo-)trx to OsmoBtsTrx added

#2 Updated by laforge 3 months ago

  • Related to Bug #2761: osmo-gsm-tester: add test case: Test 2nd trx is correctly used added

#3 Updated by laforge 3 months ago

#4 Updated by pespin 3 months ago

#5 Updated by pespin 3 months ago

I recently patches to enable testing multiTRX support in osmo-gsm-tester with nanoBTS with osmo-bsc.git.

I think only the first TRX is checked for channels, despite configuring the BTS with 2 TRX and using "channel allocator descending".

2 RSL links seems to be set up fine, one on each TRX IP addr:

20180823165020902 DLINP <0015> input/ipa.c:265 accept()ed new link from 10.42.42.121 to port 3002
20180823165020906 DLINP <0015> bts_ipaccess_nanobts.c:481 Identified BTS 10/0/0
20180823165037242 DNM <0004> abis_nm.c:2653 OC=BASEBAND-TRANSCEIVER(04) INST=(00,00,ff): IPACCESS(0xe1): RSL CONNECT ACK IP=10.42.42.7 PORT=3003 STREAM=0x00
20180823165037730 DLINP <0015> input/ipa.c:265 accept()ed new link from 10.42.42.121 to port 3003
20180823165037734 DLINP <0015> bts_ipaccess_nanobts.c:481 Identified BTS 10/0/0
...
BTS 0: Failure Event Report: Type=communication failure, Severity=failure ceased, Probable cause=Manufacturer specific values: unknown 0x306, Additional Text=Network Failure.
20180823165224354 DNM <0004> abis_nm.c:218 OC=BASEBAND-TRANSCEIVER(04) INST=(00,01,ff): STATE CHG: OP_STATE=Disabled AVAIL=Not installed(07) ADM=Locked
...
20180823165240748 DNM <0004> abis_nm.c:218 OC=BASEBAND-TRANSCEIVER(04) INST=(00,01,ff): STATE CHG: OP_STATE=Disabled AVAIL=Dependency(05) ADM=Unlocked
20180823165240924 DNM <0004> abis_nm.c:702 OC=BASEBAND-TRANSCEIVER(04) INST=(00,01,ff): bts=0 trx=0 Opstart ACK
20180823165240944 DNM <0004> abis_nm.c:2653 OC=BASEBAND-TRANSCEIVER(04) INST=(00,01,ff): IPACCESS(0xe1): RSL CONNECT ACK IP=10.42.42.7 PORT=3003 STREAM=0x00
20180823165240962 DNM <0004> abis_nm.c:837 OC=RADIO-CARRIER(02) INST=(00,01,ff): Set Radio Carrier Attributes ACK
20180823165241172 DNM <0004> abis_nm.c:1992 OC=RADIO-CARRIER(02) INST=(00,01,ff): Sending OPSTART
20180823165241445 DNM <0004> abis_nm.c:702 OC=RADIO-CARRIER(02) INST=(00,01,ff): bts=0 trx=0 Opstart ACK
20180823165241576 DLINP <0015> input/ipa.c:265 accept()ed new link from 10.42.42.122 to port 3003
20180823165241579 DLINP <0015> bts_ipaccess_nanobts.c:481 Identified BTS 10/0/1
20180823165241579 DRSL <0003> osmo_bsc_main.c:282 bootstrapping RSL for BTS/TRX (0/1) on ARFCN 52 using MCC-MNC 901-70 LAC=58 CID=58 BSIC=63

I still need to check what's going on with following message:

BTS 0: Failure Event Report: Type=communication failure, Severity=failure ceased, Probable cause=Manufacturer specific values: unknown 0x306, Additional Text=Network Failure.

It may be just the message I think we use to send the BTS version.

And then it seems only channels from 1st TRX are available:

20180823165336688 DMSC <0007> osmo_bsc_bssap.c:726 Found matching audio type: half rate SPEECH_AMR for channel_type = { ch_indctr=0x1, ch_rate_type=0xa, perm_s
pch=[ 42 21 11 01 25 05 ] }
20180823165336688 DMSC <0007> osmo_bsc_bssap.c:757 SUBSCR_CONN(conn4)[0x612000006820]{ACTIVE}: Received Event ASSIGNMENT_START
20180823165336689 DMSC <0007> bsc_subscr_conn_fsm.c:342 SUBSCR_CONN(conn4)[0x612000006820]{ACTIVE}: state_chg to ASSIGNMENT
20180823165336689 DMSC <0007> assignment_fsm.c:304 Assignment: incrementing rate counter: assignment:attempted Assignment attempts.
20180823165336689 DAS <0012> fsm.c:299 assignment(conn4)[0x6120000063a0]{WAIT_LCHAN_ACTIVE}: Allocated
20180823165336689 DAS <0012> fsm.c:329 assignment(conn4)[0x6120000063a0]{WAIT_LCHAN_ACTIVE}: is child of SUBSCR_CONN(conn4)[0x612000006820]
20180823165336689 DRLL <0000> lchan_select.c:159 (bts=0) lchan_select_by_type(TCH_H)
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/H: (bts=0,trx=0,ts=7,pchan=PDCH,state=UNUSED) is != TCH/H
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/H: (bts=0,trx=0,ts=6,pchan=PDCH,state=UNUSED) is != TCH/H
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/H: (bts=0,trx=0,ts=5,pchan=TCH/F,state=UNUSED) is != TCH/H
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/H: (bts=0,trx=0,ts=4,pchan=TCH/F,state=UNUSED) is != TCH/H
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/H: (bts=0,trx=0,ts=3,pchan=TCH/F,state=UNUSED) is != TCH/H
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/H: (bts=0,trx=0,ts=2,pchan=TCH/F,state=UNUSED) is != TCH/H
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/H: (bts=0,trx=0,ts=1,pchan=SDCCH8,state=IN_USE) is != TCH/H
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/H: (bts=0,trx=0,ts=0,pchan=CCCH+SDCCH4,state=UNUSED) is != TCH/H
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/F: (bts=0,trx=0,ts=7,pchan=PDCH,state=UNUSED) is != TCH/F
20180823165336689 DRLL <0000> lchan_select.c:71 looking for lchan TCH/F: (bts=0,trx=0,ts=6,pchan=PDCH,state=UNUSED) is != TCH/F
20180823165336689 DRLL <0000> lchan_select.c:86 looking for lchan TCH/F: (bts=0,trx=0,ts=5,pchan=TCH/F,state=UNUSED) ss=0 is available
20180823165336689 DCHAN <0010> lchan_select.c:253 lchan(0-0-5-TCH_F-0)[0x612000007720]{UNUSED}: (type=TCH_F) Selected

#6 Updated by laforge 3 months ago

#7 Updated by laforge 3 months ago

I would have loved to have a look at a pcap file and share my thoughts, but unfortunately there is none attached to this issue (nor to the sysmocom internal one where this first came up) :(

pespin wrote:

I still need to check what's going on with following message:
BTS 0: Failure Event Report: Type=communication failure, Severity=failure ceased, Probable cause=Manufacturer specific values: unknown 0x306, Additional Text=Network Failure.

Well, the event report indicates "failure ceased". So basically the BTS is informing you that at some previous time, it was unable to connect to the BSC, but that this failure has now been ceased and the communication has been established again. So nothing surprising. OsmoBSC should not interpret such a message or handle it in any other way than to log it in the log file. It shouldn't change the behavior of OsmoBSC.

#8 Updated by laforge 3 months ago

  • Assignee changed from sysmocom to pespin

When I look at this snippet:

20180823165240748 DNM <0004> abis_nm.c:218 OC=BASEBAND-TRANSCEIVER(04) INST=(00,01,ff): STATE CHG: OP_STATE=Disabled AVAIL=Dependency(05) ADM=Unlocked
20180823165240924 DNM <0004> abis_nm.c:702 OC=BASEBAND-TRANSCEIVER(04) INST=(00,01,ff): bts=0 trx=0 Opstart ACK
20180823165240944 DNM <0004> abis_nm.c:2653 OC=BASEBAND-TRANSCEIVER(04) INST=(00,01,ff): IPACCESS(0xe1): RSL CONNECT ACK IP=10.42.42.7 PORT=3003 STREAM=0x00
20180823165240962 DNM <0004> abis_nm.c:837 OC=RADIO-CARRIER(02) INST=(00,01,ff): Set Radio Carrier Attributes ACK
20180823165241172 DNM <0004> abis_nm.c:1992 OC=RADIO-CARRIER(02) INST=(00,01,ff): Sending OPSTART
20180823165241445 DNM <0004> abis_nm.c:702 OC=RADIO-CARRIER(02) INST=(00,01,ff): bts=0 trx=0 Opstart ACK
20180823165241576 DLINP <0015> input/ipa.c:265 accept()ed new link from 10.42.42.122 to port 3003
20180823165241579 DLINP <0015> bts_ipaccess_nanobts.c:481 Identified BTS 10/0/1
20180823165241579 DRSL <0003> osmo_bsc_main.c:282 bootstrapping RSL for BTS/TRX (0/1) on ARFCN 52 using MCC-MNC 901-70 LAC=58 CID=58 BSIC=63

What stands out is:

a) I don't see that the individual timeslots are initialized at all. we see the radio carrier is opstart'ed, but its state is still "Dependency", as the individual timeslots haven't gone through the OML state machine dance of setting attributes (mainly the channel combination), starting them individually, ...

b) We see OML initialization for TRX 1, but then a few lines later we see that a new OML link for exactly that transceiver 10/0/1 is established. So either the link is dropped in between (which might hint us to where/when the bug happens), or we have multiple TRXs that identify themselves using the same unit-id to us?

In any case, a pcap file should contain answers to all those questions... Please share!

#9 Updated by laforge 3 months ago

  • Is duplicate of Bug #3475: neels/inter_bsc_ho branch for osmo-bsc 2TRX configured but OSMO-BSC only uses the first TRX configuration added

#10 Updated by pespin 3 months ago

Attaching trial-157-run.tgz containing all recorded information while running test "gprs:nanobts+band-900+mod-bts0-numtrx2+mod-bts0-chanallocdescend" with multiTRX (2) nanobts.

pcap file with Abis is in directory "run.2018-10-03_16-16-09/gprs:nanobts+band-900+mod-bts0-numtrx2+mod-bts0-chanallocdescend/ping.py/osmo-bsc_10.42.42.9/pcap/".
I picked this one instead of voicecall due to test being more simple (only 1 MS being used). It can be seen that it searches in descending order but only checks TRX0 while trying to allocate chan to do location update.

See "run.2018-10-03_16-16-09/gprs:nanobts+band-900+mod-bts0-numtrx2+mod-bts0-chanallocdescend/ping.py/osmo-bsc_10.42.42.9/stderr":

20181003162423278 DRSL <0003> abis_rsl.c:1359 (bts=0) CHAN RQD: reason: Location updating (ra=0x04, neci=0x01, chreq_reason=0x03)
20181003162423278 DRLL <0000> lchan_select.c:159 (bts=0) lchan_select_by_type(SDCCH)
20181003162423278 DRLL <0000> lchan_select.c:71 looking for lchan SDCCH8: (bts=0,trx=0,ts=7,pchan=PDCH,state=UNUSED) is != SDCCH8
20181003162423278 DRLL <0000> lchan_select.c:71 looking for lchan SDCCH8: (bts=0,trx=0,ts=6,pchan=PDCH,state=UNUSED) is != SDCCH8
20181003162423278 DRLL <0000> lchan_select.c:71 looking for lchan SDCCH8: (bts=0,trx=0,ts=5,pchan=TCH/F,state=UNUSED) is != SDCCH8
20181003162423278 DRLL <0000> lchan_select.c:71 looking for lchan SDCCH8: (bts=0,trx=0,ts=4,pchan=TCH/F,state=UNUSED) is != SDCCH8
20181003162423278 DRLL <0000> lchan_select.c:71 looking for lchan SDCCH8: (bts=0,trx=0,ts=3,pchan=TCH/F,state=UNUSED) is != SDCCH8
20181003162423278 DRLL <0000> lchan_select.c:71 looking for lchan SDCCH8: (bts=0,trx=0,ts=2,pchan=TCH/F,state=UNUSED) is != SDCCH8
20181003162423278 DRLL <0000> lchan_select.c:86 looking for lchan SDCCH8: (bts=0,trx=0,ts=1,pchan=SDCCH8,state=UNUSED) ss=0 is available
20181003162423278 DCHAN <0010> lchan_select.c:253 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{UNUSED}: (type=SDCCH) Selected
20181003162423278 DCHAN <0010> abis_rsl.c:1423 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{UNUSED}: (type=SDCCH) MS: Channel Request: reason=LOCATION_UPDATE ra=0x04 ta=0
20181003162423278 DCHAN <0010> lchan_fsm.c:316 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{UNUSED}: Received Event LCHAN_EV_ACTIVATE
20181003162423278 DCHAN <0010> lchan_fsm.c:495 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{UNUSED}: state_chg to WAIT_TS_READY
20181003162423279 DCHAN <0010> lchan_fsm.c:521 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{WAIT_TS_READY}: (type=SDCCH) Activation requested: FOR_MS_CHANNEL_REQUEST voice=no MGW-ci=none type=SDCCH tch-mode=SIGNALLING
20181003162423279 DTS <0011> lchan_fsm.c:525 timeslot(0-0-1-SDCCH8)[0x612000008f20]{UNUSED}: Received Event TS_EV_LCHAN_REQUESTED
20181003162423279 DTS <0011> timeslot_fsm.c:328 timeslot(0-0-1-SDCCH8)[0x612000008f20]{UNUSED}: state_chg to IN_USE
20181003162423279 DCHAN <0010> timeslot_fsm.c:104 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{WAIT_TS_READY}: Received Event LCHAN_EV_TS_READY
20181003162423279 DCHAN <0010> lchan_fsm.c:539 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{WAIT_TS_READY}: state_chg to WAIT_ACTIV_ACK
20181003162423279 DRSL <0003> abis_rsl.c:475 (bts=0,trx=0,ts=1,pchan=SDCCH8,state=IN_USE) Tx RSL Channel Activate with act_type=INITIAL
20181003162423279 DLMI <0017> input/ipaccess.c:343 TX 2: 08 21 01 41 03 00 06 04 00 03 01 00 05 06 64 41 e0 32 72 00 04 00 0d 05 18 00
20181003162423284 DLMI <0017> input/ipaccess.c:243 RX 2: 08 22 01 41 08 93 82
20181003162423284 DCHAN <0010> abis_rsl.c:1126 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{WAIT_ACTIV_ACK}: (type=SDCCH) Rx CHAN_ACTIV_ACK
20181003162423284 DCHAN <0010> abis_rsl.c:1138 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{WAIT_ACTIV_ACK}: Received Event LCHAN_EV_RSL_CHAN_ACTIV_ACK
20181003162423284 DCHAN <0010> lchan_fsm.c:663 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{WAIT_ACTIV_ACK}: (type=SDCCH) Tx RR Immediate Assignment
20181003162423284 DCHAN <0010> lchan_fsm.c:712 lchan(0-0-1-SDCCH8-0)[0x6120000066a0]{WAIT_ACTIV_ACK}: state_chg to WAIT_RLL_RTP_ESTABLISH

#12 Updated by pespin 3 months ago

Looking at the pcap trace (OML+RSL) it seems it takes several rounds of turning on the 2nd TRX, but it finally is turned on and at around frame 895 we can see RF Resource INDication coming from each TRX (trx0=10.42.42.121:35893, trx1=10.42.42.122:31662), and that's a few frames before the channel allocation is processed.

So most probably indeed something is messed up during the several activation retries of the 2nd TRX.

#13 Updated by pespin 3 months ago

Submitted:
https://gerrit.osmocom.org/#/c/osmo-bsc/+/11221 abis_nm_get_ts: Return TS of correct TRX, not always TRX0

Hopefully it will fix it, testing now.

#14 Updated by pespin 3 months ago

  • File trial-158-run.tgz added

After that patch, channel allocation for Location Update looks first for channels in TRX1, so that part is fixed. However test is failing now later when trying to send a USSD code (used "-s ussd:trx-b200+mod-bts0-numtrx2+mod-bts0-chanallocdescend" now). I'll continue looking at this tomorrow.
[totally wrong test run initially, don't look at crossed text].

#15 Updated by pespin 3 months ago

  • File deleted (trial-158-run.tgz)

#16 Updated by pespin 3 months ago

Test "gprs:nanobts+band-900+mod-bts0-numtrx2+mod-bts0-chanallocdescend" passes fine now.
However, "voice:nanobts+band-900+mod-bts0-numtrx2+mod-bts0-chanallocdescend" is still failing. I attach a test run with all the information.

It seems most of the call setup is working fine, but it fails finally because we receive an "Assignment Failure" from nanobts RSL TRX0 in frame 1117, as response from Assignment Command in frame 1103.

#17 Updated by pespin 3 months ago

I submitted following patch for osmo-gsm-tester to add SDCCH8 in TRX1, this way we also test if SDCCH chans are correctly allocated in TRX1:
https://gerrit.osmocom.org/#/c/osmo-gsm-tester/+/11225

I actually had to fix the example configuration for nanoBTS because it used to set it in TS0, but nanoBTS seems to allow it in TS1:
https://gerrit.osmocom.org/#/c/openbsc/+/11234/

After that, I run a ussd test and now it fails before time with similar lower layer issues. It fails now when trying to assign the SDCCH8 in TRX1 to do the location update. "Immediate Assignnment" is sent to the BTS, but the MS/BTS never answers to it and the BSC times out after 3 seconds and sends an "Immediate Assignment Reject" message.

So indeed it looks like some kind of time sync issue between TRX?

Related: Harald created this ticket to document/implement nanoBTS calibration procedure: #3628

#18 Updated by pespin 3 months ago

  • Related to Feature #3628: document nanoBTS calibration procedure using ipaccess-config tool added

#19 Updated by pespin 3 months ago

  • Subject changed from 2nd TRX of 2-TRX osmo-bts-trx setup not working anymore to 2nd TRX of 2-TRX osmo-bsc setup not working anymore

#20 Updated by pespin about 2 months ago

  • Project changed from OsmoBSC to OsmoGSMTester
  • Subject changed from 2nd TRX of 2-TRX osmo-bsc setup not working anymore to nanoBTS multiTRX tests in osmo-gsm-tester Prod setup failing

General issue with multiTRX is fixed now, see last comments in #3475.

However, multiTRX tests using nanoBTS in osmo-gsm-tester are still failing, so I rename this task to look at that specific issue.

#21 Updated by pespin about 1 month ago

  • Status changed from New to Resolved

Closing after more than 1 week of stating issue is fixed.

#22 Updated by pespin about 1 month ago

  • Status changed from Resolved to In Progress

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)