Feature #4635

quirks when initializing SS7 ASP

Added by neels 10 days ago. Updated 9 days ago.

Status: New
Priority: Normal
Assignee: -
Target version: -
Start date: 06/24/2020
Due date:
% Done: 0%

Spec Reference:

Description

This is about starting up the SCCP links in osmo-bsc and osmo-msc (and possibly others), using the API provided by libosmo-sigtran.

There are problems when:
a) the 'cs7' / 'as' and 'asp' vty config is in the .cfg file
b) and/or when invoking osmo_sccp_simple_client() more than once

This came up because rapidly shutting down and restarting an SCTP connection to osmo-stp recently started triggering a race condition,
where osmo-stp's select() returned the new connection's accept before (or at the same time as) the old connection's close.
This is because osmo-bsc recently started re-running osmo_sccp_simple_client() for each MSC in the pool: three times in ttcn3-bsc-tests.

At first glance it seems that osmo-msc could also run it twice (for A and Iu),
but osmo-msc actually runs osmo_sccp_simple_client() only once when both A and Iu are on the same ss7 instance.

The first fix was to only re-start the ASP in osmo_sccp_simple_client() IF a new ASP was created.
That worked for osmo-bsc but only because of the peculiarity that the osmo-bsc.cfg does not contain an AS.
At the same time that broke osmo-msc tests, where the osmo-msc.cfg has AS and ASP configured in the cfg file.

I know, this is getting more and more entangled...

(I'll open this issue and then elaborate in follow-up comments.)


Related issues

Related to OsmoSTP - Bug #4625: osmo-stp crashes in ttcn3-bsc-tests on first M3UA message (Resolved, 06/20/2020)

History

#1 Updated by neels 10 days ago

Let's go through some start up sequences, chronologically:

osmo-bsc in ttcn3-bsc-tests

osmo-bsc.cfg:

cs7 instance 0
 asp asp-clnt-msc-0 2905 2905 m3ua
  remote-ip 172.18.2.200

  • no AS
  • no 'sctp-role client'

1. First, the cs7 vty gets parsed.

  • The 'asp' vty command creates an ASP with is_server = true.
  • osmo_ss7_vty_go_parent() exits the ASP node and runs osmo_ss7_asp_restart().
  • log says: DLSS7 NOTICE 0: asp-asp-clnt-msc-0: ASP Restart for server not implemented yet!

2. osmo_bsc_sigtran_init() runs

  • 'msc 0' gets set up by osmo_sccp_simple_client_on_ss7_id().
  • it finds no AS, so sets up a new one.
  • asp = osmo_ss7_asp_find_by_proto(as, prot); returns no ASP, because the AS was not in the config file.
  • in the if (!asp) conditional, the configured ASP gets found and added to the AS.
  • also the ASP gets set to is_server = false unconditionally.
  • our new libosmo-sccp patch thus restarts the ASP with osmo_ss7_asp_restart()
    (the patch intends to only start a newly created ASP, but the patch https://gerrit.osmocom.org/c/libosmo-sccp/+/18990/2/src/sccp_user.c actually pivots on whether the AS had an ASP instead)
  • 'msc 1' and 2 get set up
  • now osmo_sccp_simple_client_on_ss7_id() finds an AS with ASP and does not restart the ASP.

Result: the initial "server" restart from parsing the vty had no effect.
The first 'msc 0' invoked the ASP restart once.
Everything is fine.

what if I add an AS in the cfg

osmo-msc in ttcn3-msc-tests has this config:

cs7 instance 0 
 point-code 0.23.1
 asp asp-clnt-OsmoMSC-A 2905 0 m3ua
  remote-ip 172.18.1.200
 as as-clnt-OsmoMSC-A m3ua
  asp asp-clnt-OsmoMSC-A
  routing-key 3 0.23.1

  • there is an AS and an ASP
  • no 'sctp-role client'

1. First, the cs7 vty gets parsed.

  • The 'asp' vty command creates an ASP with is_server = true.
  • osmo_ss7_vty_go_parent() exits the ASP node and runs osmo_ss7_asp_restart().
  • log says: DLSS7 NOTICE 0: asp-asp-clnt-msc-0: ASP Restart for server not implemented yet!

2. osmo-msc's ss7_setup() runs

  • the A link gets set up by osmo_sccp_simple_client_on_ss7_id().
  • it finds an AS, and an ASP as part of that AS
  • the if (!asp) conditional is skipped
  • the ASP remains configured as is_server=true.
  • the Iu link uses the same SCCP instance, does not re-run the simple-client and does not restart the ASP.
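The two startup sequences above can be condensed into a toy model. This is only a sketch of the behaviour as described in this comment, not the libosmo-sccp code: every `model_*` name is a hypothetical stand-in, and the model assumes the config file always contains the ASP under the default name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libosmo-sigtran objects; not the real structs. */
struct model_asp {
	bool is_server;	/* the 'asp' vty command leaves this set to true */
	int restarts;	/* number of restarts that actually had an effect */
};

struct model_as {
	struct model_asp *asp;	/* NULL while no ASP is associated with the AS */
};

struct model_ss7 {
	struct model_as *as;		/* AS from the config file, or NULL */
	struct model_asp *cfg_asp;	/* ASP from the config file (default name) */
	struct model_as auto_as;	/* storage for an automatically created AS */
};

/* Models the restart: for a "server" ASP it only logs and does nothing. */
static void model_asp_restart(struct model_asp *asp)
{
	if (!asp->is_server)
		asp->restarts++;
}

/* Models the simple client setup with the patch applied, i.e. the restart
 * pivots on whether the AS already had an ASP. */
static void model_simple_client(struct model_ss7 *ss7)
{
	if (!ss7->as) {
		/* no AS in the config: create one, it has no ASP yet */
		ss7->auto_as.asp = NULL;
		ss7->as = &ss7->auto_as;
	}
	if (!ss7->as->asp) {
		/* the model only covers configs that contain the ASP */
		assert(ss7->cfg_asp != NULL);
		ss7->as->asp = ss7->cfg_asp;
		ss7->as->asp->is_server = false;
		model_asp_restart(ss7->as->asp);
	}
	/* else: the AS already had an ASP, nothing is touched or restarted */
}
```

Running the model with an ASP-only config (osmo-bsc) and three setup calls gives exactly one effective restart. With AS and ASP both configured (osmo-msc), the ASP stays is_server = true and is never started.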

conclusions

  • It is not a good idea to restart the ASP while the vty config file is being read.
    The osmo_ss7_vty.c should probably differentiate between a telnet vty shell and a vty config file,
    and it should only restart the ASP if the command comes from a vty shell.
    Each libosmo-sccp user should then make sure to start the ASP once the config is complete,
    particularly after setting asp->is_server = false.
    (Also consider that an ASP is added to an AS only after the 'asp' vty node is exited)
  • Trying to make osmo_sccp_simple_client_on_ss7_id() safe to invoke multiple times for the same cs7 instance is not a good idea:
    • if we restart the ASP only when it was created, then specific variants of incomplete 'cs7' config will never start the ASP.
    • if we restart the ASP every time the simple client setup is invoked, then we rapidly shut down and reopen the same SCTP link.
      With the osmo-stp fix in place that seems not so harmful anymore, but still is Not Good (tm).
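The first conclusion could look roughly like the following sketch. It assumes the vty layer can tell a config file from an interactive shell; the enum, the `restart_pending` flag and all `model_*` functions are hypothetical and do not exist in osmo_ss7_vty.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: where a vty command originates from. */
enum model_vty_source {
	MODEL_VTY_CONFIG_FILE,
	MODEL_VTY_TELNET_SHELL,
};

/* Hypothetical stand-in for an ASP. */
struct model_asp {
	bool is_server;
	bool restart_pending;	/* set while the config file is being read */
	int restarts;
};

static void model_asp_restart(struct model_asp *asp)
{
	asp->restart_pending = false;
	asp->restarts++;
}

/* Called when the 'asp' vty node is exited. */
static void model_go_parent_asp(struct model_asp *asp, enum model_vty_source src)
{
	if (src == MODEL_VTY_CONFIG_FILE) {
		/* defer: the program starts the ASP once the config is complete */
		asp->restart_pending = true;
		return;
	}
	/* interactive change: apply it right away */
	model_asp_restart(asp);
}

/* Called by the libosmo-sccp user after the whole config was read and
 * after it has decided on the role of the ASP. */
static void model_start_after_config(struct model_asp *asp, bool is_server)
{
	asp->is_server = is_server;
	if (asp->restart_pending)
		model_asp_restart(asp);
}
```

In this sketch, reading the config never starts the ASP; the single start happens after is_server has its final value, and telnet vty commands still restart immediately.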

solutions

How to fix osmo-bsc's multi-MSC startup?
  • With the osmo-stp fix in place, we could actually make osmo-bsc rapidly close and open the same SCTP link without crashing.
  • A better solution would be to fix osmo-bsc code so that it sets up AS and ASP per cs7 instance exactly once.
    • either by doing the things osmo_sccp_simple_client() does manually / more intelligently in osmo-bsc source directly,
    • or by making sure to invoke the simple client setup exactly once per cs7 instance.
      I know it seems appropriate to not use the "simple client" setup at all, but I think this is actually the simplest to implement.
      We have only one SCCP User (per cs7 instance) in osmo-bsc, so the setup is fairly simple.
How to fix osmo-msc startup?
  • we need to revert the libosmo-sccp patch that modifies the simple-client setup, because it makes the cs7 config fail in complex ways.
How to fix weird "server" startup log error for SCCP/M3UA client programs?
  • We should change osmo_ss7_vty.c to not start up components when the vty is read from a config file.
  • Possibly we should never restart components implicitly in go_parent(), but rather provide an explicit vty command to restart an ASP.
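For the osmo-bsc part, invoking the simple client setup exactly once per cs7 instance could be sketched as a lookup table keyed by the cs7 instance id. This is a minimal illustration, not osmo-bsc code: the table size and all `model_*` names are assumptions, and `model_simple_client_setup()` merely stands in for the real setup call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CS7 8	/* arbitrary limit for this sketch */

/* Hypothetical stand-in for the SCCP instance of one cs7 instance. */
struct model_sccp {
	int cs7_id;
	int setups;	/* how often the simple client setup ran */
};

static struct model_sccp model_instances[MODEL_MAX_CS7];
static bool model_in_use[MODEL_MAX_CS7];

/* Stand-in for the one-time simple client setup (creates AS/ASP, starts ASP). */
static void model_simple_client_setup(struct model_sccp *sccp, int cs7_id)
{
	sccp->cs7_id = cs7_id;
	sccp->setups++;
}

/* Every MSC asks for the SCCP instance of its cs7 instance; the setup
 * runs only for the first MSC that uses a given cs7 instance. */
static struct model_sccp *model_sccp_for_msc(int cs7_id)
{
	if (cs7_id < 0 || cs7_id >= MODEL_MAX_CS7)
		return NULL;
	if (!model_in_use[cs7_id]) {
		model_simple_client_setup(&model_instances[cs7_id], cs7_id);
		model_in_use[cs7_id] = true;
	}
	return &model_instances[cs7_id];
}
```

With three MSCs on cs7 instance 0, all three get the same instance and the setup (and thus the ASP start) runs once, so the SCTP link is not closed and reopened in quick succession.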

#2 Updated by neels 10 days ago

  • Related to Bug #4625: osmo-stp crashes in ttcn3-bsc-tests on first M3UA message added

#4 Updated by neels 9 days ago

I just now realized another quirk: the "simple client" auto configuration takes what AS or ASP already exist and completes the configuration by adding missing parts.
At least that's what I thought. Now I notice that this only works when the AS / ASP has exactly the default name.

Example: current osmo-bsc.cfg in ttcn3-bsc-tests:

cs7 instance 0
 asp asp-clnt-msc-0 2905 2905 m3ua
  remote-ip 172.18.2.200

This uses the name "asp-clnt-msc-0", which matches the default name given by osmo_bsc_sigtran_init().
However, when more MSCs are contacted via that ASP, the name "msc-0" does not make sense.
So in the osmo-bsc sources, I changed the name to "A-0-m3ua" (A-interface 0 on M3UA proto).
That made osmo-bsc unable to contact the STP, because of the resulting auto config:

OsmoBSC# show cs7 config 
cs7 instance 0
 point-code 0.23.3
 asp asp-clnt-msc-0 2905 2905 m3ua
  remote-ip 172.18.2.200
 asp asp-clnt-A-0-m3ua 2905 0 m3ua
  remote-ip 127.0.0.1
  sctp-role client
 as as-clnt-A-0-m3ua m3ua
  asp asp-clnt-A-0-m3ua
  routing-key 0 0.23.3

So instead of picking up the ASP found in the osmo-bsc.cfg, osmo-bsc created another ASP with the name "asp-clnt-A-0-m3ua".

The cause is this code in sccp_user.c:

        asp = osmo_ss7_asp_find_by_proto(as, prot);     <------- (1)
        if (!asp) {
                /* Check if the user has already created an ASP elsewhere under
                 * the default asp name. */
                asp_name = talloc_asprintf(ctx, "asp-clnt-%s", name);
                asp = osmo_ss7_asp_find_by_name(ss7, asp_name);         <---------- (2)
                if (!asp) {
                        LOGP(DLSCCP, LOGL_NOTICE, "%s: Creating ASP instance\n",
                             name);
                        asp =
                            osmo_ss7_asp_find_or_create(ss7, asp_name,
                                                        default_remote_port,
                                                        default_local_port,
                                                        prot);
[...]

(1): At first, we're looking for any ASP matching M3UA on the AS.
(This config had no AS, so the AS was added automatically just above this, hence the AS has no ASP associated.)
No ASP was found, so then (2) we look for an ASP not by protocol, but by name. That seems wrong to me.

To successfully complete a config where the .cfg file has only an ASP, the ASP has to exactly match the name that the program would choose automatically.
(depending on which msc nr appears first in the config, that could be any asp-clnt-msc-N where N is the msc nr.)
Our manuals describe that any ASP is automatically picked up, no matter which name.
IMHO we should not require the name to match, rather look for any ASP
(possibly one that is not bound to an AS yet??)
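The proposed lookup order could be sketched like this: keep the exact-name match, then fall back to any ASP of the right protocol that is not bound to an AS yet. The struct and function below are hypothetical illustrations and not part of sccp_user.c; whether unbound ASPs are the right fallback is exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum model_proto { MODEL_PROTO_M3UA, MODEL_PROTO_SUA };

/* Hypothetical stand-in for a configured ASP. */
struct model_asp {
	const char *name;
	enum model_proto proto;
	bool bound_to_as;
};

/* Returns the ASP to complete the config with, or NULL if the caller
 * should create a new one under the default name. */
static struct model_asp *model_pick_asp(struct model_asp *asps, size_t n,
					const char *default_name,
					enum model_proto proto)
{
	size_t i;

	/* first choice: the default name, as the current code does */
	for (i = 0; i < n; i++) {
		if (strcmp(asps[i].name, default_name) == 0)
			return &asps[i];
	}
	/* fallback: any ASP of the wanted protocol not yet part of an AS */
	for (i = 0; i < n; i++) {
		if (asps[i].proto == proto && !asps[i].bound_to_as)
			return &asps[i];
	}
	return NULL;
}
```

With the osmo-bsc.cfg from the example, a lookup for "asp-clnt-A-0-m3ua" would then return the configured "asp-clnt-msc-0" instead of leading to a second ASP.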
