Bug #6422

open

latest: BSC_Tests.TC_ctrl_location started to fail on March 15th

Added by laforge about 1 month ago. Updated 24 days ago.

Status: Feedback
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 03/26/2024
Due date:
% Done: 0%
Spec Reference:

Description

This test used to pass, but it started failing on March 15 and has failed consistently ever since. Interestingly, master is not affected.

I don't see any commits to osmo-ttcn3-hacks.git touching the BSC code which were merged on March 14th.

Does anyone have any ideas?


Actions #1

Updated by fixeria about 1 month ago

Attaching a PCAP from build 2167, here is what I see:

  • [frame 391, 42341 -> 4249] Tx "CTRL data: SET 189111396 bts.0.location 1234567,fix3d,0.340000,0.560000,0.780000"
  • [frame 392] GSMTAP logging: osmo-bsc logs the received CTRL command
  • [frame 395, 34985 -> 5000] Rx "CTRL data: TRAP 0 bts.0.location-state 1234567,fix3d,0.340000,0.560000,0.780000,operational,unlocked,on,001,01"
  • [frame 397, 4249 -> 42341] Rx "CTRL data: TRAP 0 bts.0.location-state 1234567,fix3d,0.340000,0.560000,0.780000,operational,unlocked,on,001,01"
  • [frame 399, 4249 -> 42341] Rx "CTRL data: SET_REPLY 189111396 bts.0.location 1234567,fix3d,0.340000,0.560000,0.780000"
  • [frame 404, 5000 -> 34985] Tx "CTRL data: SET 60166139 rf_locked 1"
    • this TCP packet is ACKed, but never answered
    • who is listening at port 34985?
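
The frame list above can be checked mechanically by pairing each CTRL request with its reply via the transaction ID. The following is only a minimal sketch: the message strings are copied from the frames above, while the helper names (`parse_ctrl`, `unanswered_requests`) are hypothetical, and both connections are merged into one list on the assumption that the IDs do not collide.

```python
import re

# CTRL payloads copied from the frames listed above, in capture order.
CTRL_LINES = [
    "SET 189111396 bts.0.location 1234567,fix3d,0.340000,0.560000,0.780000",
    "TRAP 0 bts.0.location-state 1234567,fix3d,0.340000,0.560000,0.780000,operational,unlocked,on,001,01",
    "TRAP 0 bts.0.location-state 1234567,fix3d,0.340000,0.560000,0.780000,operational,unlocked,on,001,01",
    "SET_REPLY 189111396 bts.0.location 1234567,fix3d,0.340000,0.560000,0.780000",
    "SET 60166139 rf_locked 1",
]

# "<VERB> <id> <variable> [<value>]" as seen in the capture.
CTRL_RE = re.compile(r"^(?P<verb>[A-Z_]+) (?P<id>\d+) (?P<var>\S+)(?: (?P<value>.*))?$")


def parse_ctrl(line):
    """Split one CTRL text line into verb, id, variable and value."""
    match = CTRL_RE.match(line)
    if match is None:
        raise ValueError("not a CTRL line: %r" % line)
    return match.groupdict()


def unanswered_requests(lines):
    """Return the GET/SET requests for which no reply with the same ID was seen."""
    pending = {}
    for line in lines:
        msg = parse_ctrl(line)
        if msg["verb"] in ("GET", "SET"):
            pending[msg["id"]] = msg
        elif msg["verb"] in ("GET_REPLY", "SET_REPLY", "ERROR"):
            pending.pop(msg["id"], None)
        # TRAPs are unsolicited and carry ID 0, so they are ignored here.
    return list(pending.values())


for msg in unanswered_requests(CTRL_LINES):
    print("no reply for", msg["verb"], msg["id"], msg["var"], msg["value"])
```

Run against the lines above, only the `SET 60166139 rf_locked 1` request from frame 404 is left without a reply, which matches the observation in the list.
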
Actions #2

Updated by fixeria about 1 month ago

  • Status changed from In Progress to Feedback
  • Assignee changed from fixeria to pespin

Unfortunately, my knowledge about sccplite is very limited.
pespin: git-blame tells me you implemented the testcase; could you please take a look?

Actions #3

Updated by pespin about 1 month ago

  • Assignee changed from pespin to fixeria

tcp/ipa port 5000 is the emulated SCCPLite MSC.

Nowadays and since a while ago, osmo-bsc-sccplite uses libosmo-sccp as an SCCPLite stack iirc.

AFAICT osmo-bsc is not processing the TCP/IPA/CTRL messages received from the SCCPLite MSC containing "SET rf_locked 1". I think it's not even seeing them, so it's probably a polling/fd bug somewhere.

Around the failure date I see this libosmo-sccp commit, which is probably related:

commit 9257cd896e255403822bee6f87f5487a92fd3c11
Author: Harald Welte <laforge@osmocom.org>
Date:   Mon Mar 4 13:10:10 2024 +0100

    xua + ipa: Add support for I/O in OSMO_IO mode

    This switches osmo_stream_{cli,srv} over to using the OSMO_IO
    mode instead of the classic OSMO_FD mode.  The difference is that
    we no longer read/write directly to a file descriptor, but we pass
    message buffers to/from the library.

    This in turn allows the library to use more efficient I/O mechanisms
    as osmo_io backend, for example the Linux kernel io_uring.

    This re-introduces Change-Id: I7d02037990f4af405839309510dc6c04e36c3369
    which was previously reverted due to regressions caused by a missing
    change in libosmo-netif.

    Depends: libosmo-netif.git I6cf5bad5f618e71c80017960c38009b089dbd6a1
    Depends: libosmocore.git I89eb519b22d21011d61a7855b2364bc3c295df82
    Closes: OS#5752
    Change-Id: Ia1910f3b99d918ec2a34d5304c3f40ba015c25c9

According to gitk in here: "Committer: Harald Welte <> 2024-03-13 22:18:36"

Last known successful run: #2155 (Mar 14, 2024, 5:48 AM)
First known failing run: #2156 (Mar 15, 2024, 5:48 AM)

There are also several libosmocore osmo-io related commits merged on the 14th which are probably causing the regression: 5fcfbe0c699dbe2f9f800ea90452c525988e51ce..9c0004ad0da4af2365be5c6734ba9b8c1c4eec33
There seems to be no relevant change in libosmo-netif during that day.
So probably some regression from laforge / jolly

Actions #4

Updated by fixeria 29 days ago

pespin wrote in #note-3:

tcp/ipa port 5000 is the emulated SCCPLite MSC.

Yes, this is clear. This port number can be git-grep'ed in the repository. But there seems to be more than one CTRL connection.

Nowadays and since a while ago, osmo-bsc-sccplite uses libosmo-sccp as an SCCPLite stack iirc.

So basically SCCPLite is all about using TCP/IPA as the transport protocol. The implementation is provided by libosmo-sccp, ack.

AFAICT osmo-bsc is not processing the TCP/IPA/CTRL messages received from the SCCPLite MSC containing "SET rf_locked 1". I think it's not even seeing them, so it's probably a polling/fd bug somewhere.

Yes, this is also my observation. The TCP packet is ACKed, but no logging or whatever is seen in the PCAPs.

Around the failure date I see this libosmo-sccp commit, which is probably related:

The important detail here is that the testcase is failing for -latest, but passing for -master! So I don't think those commits are relevant here.
Most of the -latest releases were made months ago, except libosmo-abis v1.5.2, a patch release that was tagged around the time.
I will try downgrading libosmo-abis version to see if this could be related somehow.

Actions #5

Updated by fixeria 27 days ago

fixeria wrote in #note-4:

The important detail here is that the testcase is failing for -latest, but passing for -master! So I don't think those commits are relevant here.

Oh, there is some confusion here. The ticket description states that it's about the '-latest' and "master is not affected". But actually it's exactly about '-master', the '-latest' is passing just fine!

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-sccplite/test_results_analyzer/
https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bsc-test-sccplite-latest/test_results_analyzer/

Actions #6

Updated by pespin 25 days ago

fixeria wrote in #note-4:

pespin wrote in #note-3:

tcp/ipa port 5000 is the emulated SCCPLite MSC.

Yes, this is clear. This port number can be git-grep'ed in the repository. But there seems to be more than one CTRL connection.

The other one is the usual CTRL conn that a tool uses to interact with the program more or less locally, similar to a VTY telnet client.
The test uses that one to change the rf_lock status and see if commands/traps coming from the MSC over the IPA/CTRL multiplex of the SCCPlite connection still keep working as desired.
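
To illustrate what "IPA/CTRL multiplex" means here, this is a rough sketch of how a CTRL command could be framed on the SCCPLite link. It assumes the usual IPA header layout (16-bit big-endian payload length, then a one-byte stream identifier) and the constant values 0xEE for the Osmocom extension stream and 0x00 for the CTRL extension, as I recall them from libosmocore; please verify both against the headers. The function names are hypothetical.

```python
import struct

# Assumed values, to be checked against libosmocore's IPA definitions.
IPAC_PROTO_OSMO = 0xEE      # IPA stream identifier for Osmocom extensions
IPAC_PROTO_EXT_CTRL = 0x00  # extension identifier for CTRL within that stream


def ipa_ctrl_frame(ctrl_text):
    """Wrap a CTRL text command in an IPA header plus the extension byte."""
    payload = bytes([IPAC_PROTO_EXT_CTRL]) + ctrl_text.encode("ascii")
    # The length field counts the bytes after the 3-byte header.
    return struct.pack("!HB", len(payload), IPAC_PROTO_OSMO) + payload


def ipa_split(buf):
    """Yield (stream_id, payload) for every complete IPA frame in buf."""
    offset = 0
    while offset + 3 <= len(buf):
        length, stream_id = struct.unpack_from("!HB", buf, offset)
        end = offset + 3 + length
        if end > len(buf):
            break  # incomplete frame, wait for more data
        yield stream_id, buf[offset + 3:end]
        offset = end


frame = ipa_ctrl_frame("SET 60166139 rf_locked 1")
for stream_id, payload in ipa_split(frame):
    if stream_id == IPAC_PROTO_OSMO and payload[:1] == bytes([IPAC_PROTO_EXT_CTRL]):
        print("CTRL over IPA:", payload[1:].decode("ascii"))
```

The direct CTRL connection carries the same text commands, while on the SCCPLite link they share one TCP connection with the SCCP traffic and are told apart by the stream identifier.
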

Around the failure date I see this libosmo-sccp commit, which is probably related:

The important detail here is that the testcase is failing for -latest, but passing for -master! So I don't think those commits are relevant here.

These commits are totally relevant, but I think you already figured that out from your later comments :)

It's a bug in osmo_io code which seems to be triggered most probably in the libosmo-sccp ipa stack. It needs to be chased and fixed.

Actions #7

Updated by laforge 24 days ago

I very briefly looked at the code and a bit to my surprise, the sccplite code is indeed using normal osmo_ss7.

The only unusual thing (compared to normal osmo_ss7 users) is that it uses this unknown_cb to register for receiving data for unknown IPA stream identifiers.
