Feature #5917

open

immediately detect SCTP SHUTDOWN in SCCP link / in SCCP user / in active SCCP connections

Added by neels about 1 year ago. Updated about 1 year ago.

Status: Feedback
Priority: Normal
Assignee:
Target version: -
Start date: 02/21/2023
Due date:
% Done: 0%
Spec Reference:

Description

This is a question about SCCP concepts for link loss detection.

I'm trying to trigger an SCCP link loss to examine the cleanup / leak behavior of osmo-hnbgw.
I'd like to find out how an SCTP link loss would propagate to osmo-hnbgw code and signal an SCCP link loss.
(In other words, I'm looking for a place that would trigger an FSM event like MY_SCCP_EV_LINK_LOST.)

I have two scenarios: one where osmo-stp is killed, the other where the remote entity behind osmo-stp is disconnected.


(1) kill osmo-stp

I tried this:
  • in HNBGW_Tests.ttcn, cranked up T_guard to 10000.0 s
  • in a RAB Assignment test, just after the RAB is established, insert f_sleep(5000.0)
  • run the test to establish a context mapping with SCCP connection in osmo-hnbgw
  • kill osmo-stp

I immediately see lots of low-level DLINP, DLSS7 and some DLSCCP logging showing that an SCTP SHUTDOWN event was processed, and that the XUA AS restarts and tries to reconnect. But none of this makes its way up into osmo-hnbgw.

After about 15 minutes(!), I receive sccp_sap_up(N-DISCONNECT.indication) on the SCCP connection.
So we do have a cleanup trigger, but:

  • Is it expected to take this long, given that an SCTP SHUTDOWN is detected in libosmo-sigtran immediately?
  • Do we only get an N-DISCONNECT on individual SCCP conns? My idea was to trigger a LINK_LOST event on the SCCP link to the CN, i.e. a signal that the entire SCCP layer is gone. Does that exist, conceptually?

(log attached)

In contrast, when the RUA side of osmo-hnbgw sees an SCTP SHUTDOWN, osmo-hnbgw immediately registers that all HNB are disconnected, by means of the read cb() passed to osmo_stream_srv_create().


(2) disconnect remote SCCP peer, STP still up
I.e. in TTCN, after the conn is established, call f_ran_adapter_cleanup(g_msc); f_ran_adapter_cleanup(g_sgsn);
then continue to f_sleep(5000.0), keeping the HNB connected.

Here I immediately see an N-PCSTATE.indication containing a DUNA (Destination Unavailable) coming up the SCCP user SAP, one each for the MSC and the SGSN point-code. osmo-hnbgw ignores N-PCSTATE so far; I guess it might be a good idea to implement acting on the DUNA messages. I see now that we can simply read out prim->u.pcstate.

Since the DUNA is so far ignored, the same happens as in (1): after about 15 minutes, sccp_scoc.c sends up an N-DISCONNECT for the individual SCCP connection and we do clean up, eventually.
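
A minimal sketch of what acting on N-PCSTATE might look like. It is deliberately self-contained and does not use the libosmo-sigtran headers: struct pcstate_ind, enum sp_status, struct cn_link, cn_link_rx_pcstate() and the MY_SCCP_EV_LINK_UP event are hypothetical stand-ins, assumed here only for illustration, for prim->u.pcstate and for whatever per-CN-link state osmo-hnbgw keeps. The point-code values used with it are made up as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the signalling point status carried in N-PCSTATE. */
enum sp_status {
	SP_STATUS_INACCESSIBLE,
	SP_STATUS_ACCESSIBLE,
};

/* Hypothetical stand-in for the parameters read out of prim->u.pcstate. */
struct pcstate_ind {
	uint32_t affected_pc;
	enum sp_status sp_status;
};

/* Events for a per-CN-link FSM; only the LINK_LOST name comes from the
 * description above, the others are assumptions. */
enum cn_link_event {
	CN_LINK_EV_NONE,
	MY_SCCP_EV_LINK_LOST,
	MY_SCCP_EV_LINK_UP,
};

/* Hypothetical per-CN-link state (one for the MSC, one for the SGSN). */
struct cn_link {
	const char *name;
	uint32_t remote_pc;
	bool reachable;
};

/* Decide which event, if any, an N-PCSTATE indication should raise on this
 * link. Indications for other point-codes and repeated indications that do
 * not change the reachability state raise nothing. */
enum cn_link_event cn_link_rx_pcstate(struct cn_link *link,
				      const struct pcstate_ind *ind)
{
	bool now_reachable;

	assert(link && ind);

	if (ind->affected_pc != link->remote_pc)
		return CN_LINK_EV_NONE;

	now_reachable = (ind->sp_status == SP_STATUS_ACCESSIBLE);
	if (now_reachable == link->reachable)
		return CN_LINK_EV_NONE;

	link->reachable = now_reachable;
	return now_reachable ? MY_SCCP_EV_LINK_UP : MY_SCCP_EV_LINK_LOST;
}
```

In real code the returned event would presumably be handed to the link's FSM (e.g. via osmo_fsm_inst_dispatch()) from the N-PCSTATE branch of sccp_sap_up(), instead of being returned to the caller.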


So in summary:
  • when a remote entity behind the STP goes bust, I can already trigger my LINK_LOST event when osmo-hnbgw sees a PCSTATE indicating that the remote point-code for the CS / PS CN has become unavailable.
  • when the first SCTP hop goes bust (kill osmo-stp), maybe we can implement some prim going up the SCCP user SAP? Would that also be a DUNA, based on the active SCCP conns' remote point-code, or is that a hacky layer violation?
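
Whichever primitive ends up signalling the loss, the cleanup it triggers would amount to releasing every active SCCP connection towards the affected point-code right away, rather than waiting for the individual N-DISCONNECTs. A rough, self-contained sketch of that fan-out follows; struct sccp_conn and release_conns_for_pc() are hypothetical names invented for this illustration, not osmo-hnbgw or libosmo-sigtran API, and a flat array stands in for whatever list of context mappings the real code keeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one active SCCP connection towards the CN. */
struct sccp_conn {
	uint32_t conn_id;
	uint32_t remote_pc;
	bool released;
};

/* Mark all not-yet-released connections towards lost_pc as released and
 * return how many were affected. Connections towards other point-codes
 * are left untouched. */
size_t release_conns_for_pc(struct sccp_conn *conns, size_t n, uint32_t lost_pc)
{
	size_t count = 0;
	size_t i;

	assert(conns || n == 0);

	for (i = 0; i < n; i++) {
		if (conns[i].remote_pc != lost_pc || conns[i].released)
			continue;
		/* Real code would free the context mapping here. */
		conns[i].released = true;
		count++;
	}
	return count;
}
```

Calling this once per lost point-code (MSC and SGSN separately) keeps the CS and PS sides independent, which matches the one-DUNA-per-point-code behavior seen in scenario (2).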

Files

stp_killed.log 139 KB neels, 02/21/2023 02:50 AM
stp_killed.pcapng 200 KB neels, 02/21/2023 02:56 AM
cn_disconnected.pcapng 107 KB neels, 02/21/2023 03:24 AM
cn_disconnected.log 64 KB neels, 02/21/2023 03:24 AM