Bug #6422
openlatest: BSC_Tests.TC_ctrl_location started to fail on March 15th
Description
This test used to pass, started failing on March 15, and has failed consistently ever since. Interestingly, master is not affected.
I don't see any commits to osmo-ttcn3-hacks.git touching the BSC code which were merged on March 14th.
Does anyone have any ideas?
Updated by fixeria about 1 month ago
- File BSC_Tests.TC_ctrl_location.pcap.gz added
- File BSC_Tests.TC_ctrl_location.merged added
- Status changed from New to In Progress
Attaching a PCAP from build 2167. Here is what I see:
- [frame 391, 42341 -> 4249] Tx "CTRL data: SET 189111396 bts.0.location 1234567,fix3d,0.340000,0.560000,0.780000"
- [frame 392] GSMTAP logging: osmo-bsc logs the received CTRL command
- [frame 395, 34985 -> 5000] Rx "CTRL data: TRAP 0 bts.0.location-state 1234567,fix3d,0.340000,0.560000,0.780000,operational,unlocked,on,001,01"
- [frame 397, 4249 -> 42341] Rx "CTRL data: TRAP 0 bts.0.location-state 1234567,fix3d,0.340000,0.560000,0.780000,operational,unlocked,on,001,01"
- [frame 399, 4249 -> 42341] Rx "CTRL data: SET_REPLY 189111396 bts.0.location 1234567,fix3d,0.340000,0.560000,0.780000"
- [frame 404, 5000 -> 34985] Tx "CTRL data: SET 60166139 rf_locked 1"
- this TCP packet is ACKed, but never answered
- who is listening at port 34985?
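The observation above (a SET that is never answered) can be checked mechanically by pairing each CTRL request with the reply carrying the same transaction ID. Below is a minimal Python sketch of that idea. The payload strings are copied from the frame list above; the helper name `unanswered_requests` is hypothetical, and the sketch deliberately ignores which TCP connection a message belongs to, so it assumes the IDs do not collide across connections.

```python
import re

# (frame number, CTRL payload) pairs copied from the PCAP notes above.
FRAMES = [
    (391, "SET 189111396 bts.0.location 1234567,fix3d,0.340000,0.560000,0.780000"),
    (395, "TRAP 0 bts.0.location-state 1234567,fix3d,0.340000,0.560000,0.780000,operational,unlocked,on,001,01"),
    (397, "TRAP 0 bts.0.location-state 1234567,fix3d,0.340000,0.560000,0.780000,operational,unlocked,on,001,01"),
    (399, "SET_REPLY 189111396 bts.0.location 1234567,fix3d,0.340000,0.560000,0.780000"),
    (404, "SET 60166139 rf_locked 1"),
]

# "<VERB> <id> <rest>": the general shape of the CTRL messages seen above.
CTRL_RE = re.compile(r"^(?P<verb>[A-Z_]+) (?P<id>\d+) (?P<rest>.*)$")


def unanswered_requests(frames):
    """Return (frame, payload) for every GET/SET without a matching reply."""
    pending = {}
    for frame_no, payload in frames:
        m = CTRL_RE.match(payload)
        if m is None:
            continue  # not a CTRL message in the expected shape
        verb, msg_id = m["verb"], m["id"]
        if verb in ("GET", "SET"):
            pending[msg_id] = (frame_no, payload)
        elif verb in ("GET_REPLY", "SET_REPLY", "ERROR"):
            pending.pop(msg_id, None)
        # TRAP messages are unsolicited, so they never resolve a request.
    return sorted(pending.values())


print(unanswered_requests(FRAMES))
```

For the frames listed here, the only request left pending is the `SET 60166139 rf_locked 1` from frame 404.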
Updated by fixeria about 1 month ago
- Status changed from In Progress to Feedback
- Assignee changed from fixeria to pespin
Unfortunately, my knowledge about sccplite is very limited.
pespin, git-blame tells me you implemented the testcase. Could you please take a look?
Updated by pespin about 1 month ago
- Assignee changed from pespin to fixeria
tcp/ipa port 5000 is the emulated SCCPLite MSC.
Nowadays, and for a while already, osmo-bsc-sccplite uses libosmo-sccp as its SCCPLite stack, iirc.
AFAICT osmo-bsc is not processing the TCP/IPA/CTRL messages received from the SCCPLite MSC containing "SET rf_locked 1". I think it's not even seeing them, so it's probably a polling/fd bug somewhere.
Around the failure date I see this libosmo-sccp commit, which is probably related:
commit 9257cd896e255403822bee6f87f5487a92fd3c11
Author: Harald Welte <laforge@osmocom.org>
Date: Mon Mar 4 13:10:10 2024 +0100

    xua + ipa: Add support for I/O in OSMO_IO mode

    This switches osmo_stream_{cli,srv} over to using the OSMO_IO mode instead of the classic OSMO_FD mode. The difference is that we no longer read/write directly to a file descriptor, but we pass message buffers to/from the library. This in turn allows the library to use more efficient I/O mechanisms as osmo_io backend, for example the Linux kernel io_uring.

    This re-introduces Change-Id: I7d02037990f4af405839309510dc6c04e36c3369 which was previously reverted due to regressions caused by a missing change in libosmo-netif.

    Depends: libosmo-netif.git I6cf5bad5f618e71c80017960c38009b089dbd6a1
    Depends: libosmocore.git I89eb519b22d21011d61a7855b2364bc3c295df82
    Closes: OS#5752
    Change-Id: Ia1910f3b99d918ec2a34d5304c3f40ba015c25c9

According to gitk here: "Committer: Harald Welte <laforge@osmocom.org> 2024-03-13 22:18:36"
Last known successful run: #2155 (Mar 14, 2024, 5:48 AM)
First known failing run: #2156 (Mar 15, 2024, 5:48 AM)
There are also several libosmocore osmo-io related commits merged on the 14th which are probably causing the regression: 5fcfbe0c699dbe2f9f800ea90452c525988e51ce..9c0004ad0da4af2365be5c6734ba9b8c1c4eec33
There seems to be no relevant change in libosmo-netif during that day.
So probably some regression from laforge / jolly
Updated by fixeria 29 days ago
pespin wrote in #note-3:
tcp/ipa port 5000 is the emulated SCCPLite MSC.
Yes, this is clear. This port number can be git-grep'ed in the repository. But there seems to be more than one CTRL connection.
Nowadays, and for a while already, osmo-bsc-sccplite uses libosmo-sccp as its SCCPLite stack, iirc.
So basically SCCPLite is all about using TCP/IPA as the transport protocol. The implementation is provided by libosmo-sccp, ack.
AFAICT osmo-bsc is not processing the TCP/IPA/CTRL messages received from the SCCPLite MSC containing "SET rf_locked 1". I think it's not even seeing them, so it's probably a polling/fd bug somewhere.
Yes, this is also my observation. The TCP packet is ACKed, but no logging or anything else is seen in the PCAPs.
Around the failure date I see this libosmo-sccp commit, which is probably related:
The important detail here is that the testcase is failing for -latest, but passing for -master! So I don't think those commits are relevant here.
Most of the -latest releases were made months ago, except libosmo-abis v1.5.2, a patch release that was tagged around that time.
I will try downgrading the libosmo-abis version to see if this could be related somehow.
Updated by fixeria 27 days ago
fixeria wrote in #note-4:
The important detail here is that the testcase is failing for -latest, but passing for -master! So I don't think those commits are relevant here.
Oh, there is some confusion here. The ticket description states that it's about '-latest' and that "master is not affected". But actually it is '-master' that is failing, while '-latest' is passing just fine!
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-sccplite/test_results_analyzer/
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-sccplite-latest/test_results_analyzer/
Updated by pespin 25 days ago
fixeria wrote in #note-4:
pespin wrote in #note-3:
tcp/ipa port 5000 is the emulated SCCPLite MSC.
Yes, this is clear. This port number can be git-grep'ed in the repository. But there seems to be more than one CTRL connection.
The other one is the usual CTRL conn that a tool uses to interact with the program more or less locally, similar to a VTY telnet client.
The test uses that one to change the rf_lock status and check whether commands/traps coming from the MSC over the IPA/CTRL multiplex of the SCCPLite connection keep working as expected.
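To illustrate what "IPA/CTRL multiplex" means on the wire, here is a rough Python sketch of IPA framing as I understand it: a 3-byte header (16-bit big-endian payload length, 1-byte stream ID), with CTRL carried on the Osmocom extension stream behind one extra sub-protocol byte. The constant values reflect my reading of libosmocore's IPA definitions and should be double-checked against the headers; the function names `ipa_wrap_ctrl` and `ipa_split` are made up for this example and are not library API.

```python
import struct

# Assumed values, mirroring libosmocore's IPA definitions as I recall them.
IPAC_PROTO_OSMO = 0xEE      # Osmocom extension stream
IPAC_PROTO_EXT_CTRL = 0x00  # CTRL sub-protocol within that stream


def ipa_wrap_ctrl(text: str) -> bytes:
    """Wrap a CTRL text message into one IPA frame (hypothetical helper)."""
    payload = bytes([IPAC_PROTO_EXT_CTRL]) + text.encode("ascii")
    # The length field counts everything after the 3-byte header.
    return struct.pack(">HB", len(payload), IPAC_PROTO_OSMO) + payload


def ipa_split(buf: bytes):
    """Split a TCP byte stream into (stream_id, payload) frames.

    Returns the complete frames plus any trailing partial data, which a
    receiver has to keep until more bytes arrive.
    """
    frames = []
    offset = 0
    while len(buf) - offset >= 3:
        length, proto = struct.unpack_from(">HB", buf, offset)
        end = offset + 3 + length
        if end > len(buf):
            break  # incomplete frame, wait for more data
        frames.append((proto, buf[offset + 3:end]))
        offset = end
    return frames, buf[offset:]


stream = ipa_wrap_ctrl("SET 60166139 rf_locked 1")
frames, rest = ipa_split(stream)
for proto, payload in frames:
    if proto == IPAC_PROTO_OSMO and payload[:1] == bytes([IPAC_PROTO_EXT_CTRL]):
        print("CTRL:", payload[1:].decode("ascii"))
```

A receiver that never gets its read callback invoked would leave such a frame sitting in the socket buffer: the kernel ACKs the TCP segment, but the application never parses or answers it, which matches what the PCAP shows.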
Around the failure date I see this libosmo-sccp commit, which is probably related:
The important detail here is that the testcase is failing for -latest, but passing for -master! So I don't think those commits are relevant here.
These commits are totally relevant, but I think you already figured that out from your later comments :)
It's a bug in the osmo_io code, most probably triggered in the libosmo-sccp IPA stack. It needs to be tracked down and fixed.