Bug #4681
closed
>= 100 BTS_Tests.ttcn failures / regressions since July 23rd
100%
Description
Since July 23rd, almost all of our tests have been failing due to a regression (100 new failures from July 22nd to 23rd): https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/963/#showFailuresLink
Regressions of this scale are not acceptable at all, particularly not if they are not resolved for several days in a row.
Updated by laforge over 3 years ago
- Assignee changed from 4368 to pespin
- "BTS_Tests.ttcn:2388 : No MEAS RES received at all"
- "BTS_Tests.ttcn:233 : Timeout waiting for RSL bring up"
- Timeout waiting for { rsp := { verb := "FAKE_TOA", status := ?, params := * } } on port BTS_TRXC
I currently only see patches from pespin merged around this timeframe. Please investigate ASAP and revert changes if the issues cannot be resolved quickly.
Updated by fixeria over 3 years ago
Hi all,
it could potentially be related to [1], but after looking at [2]:
Traceback (most recent call last):
  File "/tmp/osmocom-bb/src/target/trx_toolkit/fake_trx.py", line 543, in <module>
    app.run()
  File "/tmp/osmocom-bb/src/target/trx_toolkit/fake_trx.py", line 461, in run
    self.burst_fwd.forward_msg(trx, msg)
  File "/tmp/osmocom-bb/src/target/trx_toolkit/burst_fwd.py", line 70, in forward_msg
    trx.handle_data_msg(src_trx, rx_msg, tx_msg)
  File "/tmp/osmocom-bb/src/target/trx_toolkit/fake_trx.py", line 255, in handle_data_msg
    Transceiver.handle_data_msg(self, msg)
  File "/tmp/osmocom-bb/src/target/trx_toolkit/transceiver.py", line 281, in handle_data_msg
    self.data_if.send_msg(msg, legacy = True)
  File "/tmp/osmocom-bb/src/target/trx_toolkit/data_if.py", line 109, in send_msg
    msg.validate()
  File "/tmp/osmocom-bb/src/target/trx_toolkit/data_msg.py", line 597, in validate
    raise ValueError("RSSI %d is out of range" % self.rssi)
ValueError: RSSI -122 is out of range
I would not think so. Moreover, I've tested [1] locally before submitting.
ValueError: RSSI -122 is out of range
So fake_trx.py crashes due to an out-of-range RSSI value. RSSI has nothing to do with my recent refactoring changes, so I am pretty sure it would have crashed before them as well. I'll prepare a patch to handle such errors properly. It would still be good to investigate where this RSSI value is coming from, though.
[1] https://git.osmocom.org/osmocom-bb/commit/?id=d4ed09df57b3461470af501e9687ddd80eb78838
[2] https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/966/artifact/logs/fake_trx/
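For illustration, the failure mode in the traceback above can be sketched roughly as follows. This is not the trx_toolkit code: the class name `RxMsg`, the function `forward_all`, and the bounds `RSSI_MIN` / `RSSI_MAX` are hypothetical, chosen only so that -122 falls outside the accepted range, as the log reports. The point is that an exception raised by `validate()` inside the forwarding loop, with nothing catching it, aborts the loop and the process with it.

```python
# Hypothetical bounds for this sketch; the real limits are defined in
# trx_toolkit/data_msg.py and may differ.
RSSI_MIN = -120
RSSI_MAX = -50


class RxMsg:
    """Simplified stand-in for a TRXD message carrying an RSSI value."""

    def __init__(self, rssi):
        self.rssi = rssi

    def validate(self):
        # Same error text as in the traceback from the Jenkins log
        if not RSSI_MIN <= self.rssi <= RSSI_MAX:
            raise ValueError("RSSI %d is out of range" % self.rssi)


def forward_all(msgs):
    """Forward messages without any error handling.

    The first invalid message raises ValueError, which propagates out of
    the loop, so the remaining messages are never forwarded.
    """
    sent = []
    for msg in msgs:
        msg.validate()
        sent.append(msg.rssi)
    return sent
```

With this sketch, `forward_all([RxMsg(-60), RxMsg(-122), RxMsg(-70)])` raises `ValueError` on the second message and the third one is never reached, which mirrors a single bad value taking down the whole fake_trx.py process.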
Updated by fixeria over 3 years ago
- Status changed from New to In Progress
- Assignee changed from pespin to fixeria
- % Done changed from 0 to 10
I've managed to reproduce the crash on my machine, and it seems to be related to the power ramping. The container with fake_trx.py dies after BTS_Tests.TC_tx_power_ramp_adm_state_change is finished and BTS_Tests.TC_rsl_bs_pwr_static_ass is started. Ramping is a relatively new feature, and there were some new changes merged to osmo-bts recently, so I assume that's why we did not hit this problem before.
Updated by fixeria over 3 years ago
- Status changed from In Progress to Feedback
- % Done changed from 10 to 80
The crash should be fixed now, waiting for code review:
https://gerrit.osmocom.org/c/osmocom-bb/+/19400 trx_toolkit/data_if.py: do not validate TRXD message twice
https://gerrit.osmocom.org/c/osmocom-bb/+/19401 trx_toolkit/data_if.py: fix: handle encoding exceptions
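The general idea behind the two patch titles can be sketched as below. This is only an assumption about the approach based on the titles, not the content of the actual changes: `Msg`, `send_msg`, the list used as a `sink`, the RSSI bounds, and the one-byte encoding are all hypothetical simplifications. The message is validated exactly once (inside `encode()`), and an encoding failure is logged and the message dropped instead of letting the exception terminate the process.

```python
import logging

log = logging.getLogger("data_if_sketch")

# Hypothetical bounds for this sketch, not taken from trx_toolkit
RSSI_MIN = -120
RSSI_MAX = -50


class Msg:
    """Simplified stand-in for a TRXD message (hypothetical)."""

    def __init__(self, rssi):
        self.rssi = rssi

    def validate(self):
        if not RSSI_MIN <= self.rssi <= RSSI_MAX:
            raise ValueError("RSSI %d is out of range" % self.rssi)

    def encode(self):
        # Validation happens here, once; callers do not call validate() again
        self.validate()
        # Simplified encoding: the RSSI magnitude as a single byte
        return bytes([-self.rssi])


def send_msg(msg, sink):
    """Encode and 'send' a message; return False if it had to be dropped."""
    try:
        payload = msg.encode()
    except ValueError as e:
        # Log and drop the message instead of crashing the whole process
        log.error("Failed to encode message, dropping: %s", e)
        return False
    sink.append(payload)
    return True
```

Used this way, a single out-of-range value costs one dropped message and a log line, and the loop calling `send_msg()` keeps running.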
Updated by fixeria over 3 years ago
- % Done changed from 80 to 90
The crash should be fixed now, waiting for code review: [...]
Tested the fix on my machine (in Docker), fake_trx.py survives during power ramping now (yay!).
I've just merged it to the upstream, let's wait for a new build on Jenkins (next morning).
Updated by fixeria over 3 years ago
- File junit-xml-19.log added
Updated by fixeria over 3 years ago
- Status changed from Feedback to Resolved
- % Done changed from 90 to 100