Project

General

Profile

Actions

Feature #2623

open

SCCP/M3UA: detect restart of osmo-msc and osmo-sgsn

Added by neels over 6 years ago. Updated about 3 years ago.

Status:
New
Priority:
High
Assignee:
Target version:
-
Start date:
11/07/2017
Due date:
% Done:

50%

Spec Reference:

Description

Connecting osmo-bsc and osmo-hnbgw to the MSC and SGSN via an OsmoSTP instance, it is currently not possible to detect that the MSC or SGSN has restarted.

Scenario: using a sysmoBTS as a NITB, change MSC config, restart MSC -- now osmo-bsc happily continues to run and does not even notice that it is an entirely new MSC instance running in the core net now.

In the old days, the SCCPlite link would go down, but since now OsmoSTP is in-between and has no concept of who depends on who, no-one is notifying BSC or HNBGW that MSC or SGSN have gone down. Find out how this is intended to be solved if at all, and devise a way how osmo-bsc will restart and/or reconnect to a new MSC instance, and so forth.


Related issues

Related to OsmoSGSN - Bug #3403: osmo-sgsn doesn not connect properly with via SCCP when restartedNewlynxis07/17/2018

Actions
Related to Cellular Network Infrastructure - Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnectedResolvedneels08/11/2020

Actions
Actions #1

Updated by laforge over 6 years ago

  • Assignee set to laforge

neels wrote:

Connecting osmo-bsc and osmo-hnbgw to the MSC and SGSN via an OsmoSTP instance, it is currently not possible to detect that the MSC or SGSN has restarted.

The specified way to treat this is the A interface RESET procedure (and I'm sure Iu has the same?). So the MSC should perform a RESET procedure towards the BSC after it has started new, to erase all state in the BSC.

What's problematic here is that with our "dynamically accept any BSC from any point code" approach, the re-started MSC has no clue about where BSCs might be. One possible (but ugly) approach would be to simply flood this RESET to an entire range of point codes that's configurable at the MSC.

Scenario: using a sysmoBTS as a NITB, change MSC config, restart MSC -- now osmo-bsc happily continues to run and does not even notice that it is an entirely new MSC instance running in the core net now.

I presume you're hinting that the "MSC config change" included a change of the MSC's point code?

One could implement the classic SCCP messsages / primitives for infomring the BSC that the MSC is no longer reachable at the old point code. On the MTP-level, this is a MTP-STATUS.ind from the MTP up into the SCCP stack. The SCCP stack then would use N-PCSTATE.ind (Q.711 6.3.2.3.3)

The BSC would then receive a N-PCSTATE.ind and thus know the MSC is (at least temporarily) gone. However, an intermittent failure of the intermediate signaling network would look exactly the same, so there would probably need to be some kind of timeout, i.e. if the MSC is not again reachable shortly after it is gone, we behave as if we received an implicit RESET.

Another way to move forward is for the MSC to keep local state as to which BSCs were connected, so that after a crash it can send RESET to all of those point codes.

In the old days, the SCCPlite link would go down, but since now OsmoSTP is in-between and has no concept of who depends on who, no-one is notifying BSC or HNBGW that MSC or SGSN have gone down. Find out how this is intended to be solved if at all, and devise a way how osmo-bsc will restart and/or reconnect to a new MSC instance, and so forth.

Well, with an entire (routed!) signaling network between BSC and MSC, the status of the signaling link (M3UA connection) has nothing to do anymore with whether or not the MSC is reachable. Let's imagine one or multiuple STPs in between: Any of them can go down, or any of the links can temporarily fail and recover without the BSC or MSC ever being down or losing their state.

So I guess the best we can do is for the BSC to detect "MSC unavailability" by timeout on any of its SCCP connections or connectionless procedures. If and when the MSC detects any message from an (unknown) BSC, the MSC will generate a RESET procedure and remove all state. Isn't this what's already happening now?

Actions #2

Updated by neels over 6 years ago

laforge wrote:

The specified way to treat this is the A interface RESET procedure (and I'm sure Iu has the same?). So the MSC should perform a RESET procedure towards the BSC after it has started new, to erase all state in the BSC.

What's problematic here is that with our "dynamically accept any BSC from any point code" approach, the re-started MSC has no clue about where BSCs might be. One possible (but ugly) approach would be to simply flood this RESET to an entire range of point codes that's configurable at the MSC.

Scenario: using a sysmoBTS as a NITB, change MSC config, restart MSC -- now osmo-bsc happily continues to run and does not even notice that it is an entirely new MSC instance running in the core net now.

I presume you're hinting that the "MSC config change" included a change of the MSC's point code?

Not really. I mean if the MSC changes configuration items that affect the BSC (though nothing comes to mind); actually I think I also meant that subscribers are still considered attached though the restarted MSC does not. The point is, usually we restart programs when their "server" restarts, like when the BSC goes down, we restart the BTS and hence are sure they are in sync. If the MSC goes down, the BSC currently doesn't ever get notified, because of the above mentioned: we only do the BSSAP Reset dance when the BSC re-attaches.

One could implement the classic SCCP messsages / primitives for infomring the BSC that the MSC is no longer reachable at the old point code. On the MTP-level, this is a MTP-STATUS.ind from the MTP up into the SCCP stack. The SCCP stack then would use N-PCSTATE.ind (Q.711 6.3.2.3.3)

The BSC would then receive a N-PCSTATE.ind and thus know the MSC is (at least temporarily) gone. However, an intermittent failure of the intermediate signaling network would look exactly the same, so there would probably need to be some kind of timeout, i.e. if the MSC is not again reachable shortly after it is gone, we behave as if we received an implicit RESET.

who would trigger that, the OsmoSTP?

Another way to move forward is for the MSC to keep local state as to which BSCs were connected, so that after a crash it can send RESET to all of those point codes.

i.e. keep persistent state. That would solve it, but we would need persistent state ;)

So I guess the best we can do is for the BSC to detect "MSC unavailability" by timeout on any of its SCCP connections or connectionless procedures. If and when the MSC detects any message from an (unknown) BSC, the MSC will generate a RESET procedure and remove all state. Isn't this what's already happening now?

hmm, need to check

Actions #3

Updated by laforge over 6 years ago

Hi Neels,

On Mon, Dec 04, 2017 at 12:11:42PM +0000, neels [REDMINE] wrote:

actually I think I also meant that subscribers are still considered
attached though the restarted MSC does not.

This is what the RESET procedure is for.

The point is, usually we restart programs when their "server"
restarts, like when the BSC goes down, we restart the BTS and hence
are sure they are in sync.

BTS+BSC is different: They cannot function without each other. A BTS
cannot even allocate a radio channel without a BSC. So there's very
tight integration.

Between BSC and MSC it's a bit different. At least with modern features
like MOCN, you can actually have a BSC talking to multiple MSCs (even of
different operators) and there is not such a strict inter-dependency.

If the MSC goes down, the BSC currently doesn't ever get notified,
because of the above mentioned: we only do the BSSAP Reset dance when
the BSC re-attaches.

The BSC should get notified as soon as it sends the first packet to the
MSC, which triggers a RESET procedure from the MSC as it doesn't know
any state about.

This should be rather quick. The only situation in which this takes a
long time is if there's absolutely no activity from any MS.

One could implement the classic SCCP messsages / primitives for
infomring the BSC that the MSC is no longer reachable at the old
point code. On the MTP-level, this is a MTP-STATUS.ind from the MTP
up into the SCCP stack. The SCCP stack then would use
N-PCSTATE.ind (Q.711 6.3.2.3.3)

The BSC would then receive a N-PCSTATE.ind and thus know the MSC is
(at least temporarily) gone. However, an intermittent failure of
the intermediate signaling network would look exactly the same, so
there would probably need to be some kind of timeout, i.e. if the
MSC is not again reachable shortly after it is gone, we behave as if
we received an implicit RESET.

who would trigger that, the OsmoSTP?

Yes, the STP (or also the local libosmo-sigtran on the client side)
would generate such messages whenever point codes become available or
unavailable.

Another way to move forward is for the MSC to keep local state as to which BSCs were connected, so that after a crash it can send RESET to all of those point codes.

i.e. keep persistent state. That would solve it, but we would need persistent state ;)

well, this is why normally one configures the point code of each BSC
in the MSC. At that point the MSC can send a RESET after start to each
of them :)

So I guess the best we can do is for the BSC to detect "MSC
unavailability" by timeout on any of its SCCP connections or
connectionless procedures. If and when the MSC detects any message
from an (unknown) BSC, the MSC will generate a RESET procedure and
remove all state. Isn't this what's already happening now?

hmm, need to check

yes, if that's not the case we're definitely broken.

Actions #4

Updated by laforge over 5 years ago

  • Related to Bug #3403: osmo-sgsn doesn not connect properly with via SCCP when restarted added
Actions #5

Updated by laforge over 4 years ago

  • Priority changed from High to Urgent
Actions #6

Updated by laforge over 4 years ago

  • Assignee changed from laforge to neels
  • Priority changed from Urgent to High

ticket should have been moved back to neels after my response 5 months ago

Actions #7

Updated by neels over 3 years ago

  • Related to Feature #4701: implement OsmoSTP notification of peers disconnecting, e.g. for OsmoBSC to detect that a specific MSC in the pool is disconnected added
Actions #8

Updated by laforge about 3 years ago

  • % Done changed from 0 to 50

I've implemented notification of SCCP users via the SCCP-User SAP in https://gerrit.osmocom.org/c/libosmo-sccp/+/22778

This means that every time a [remote] point code becomes available or unavailable, the SCCP user application (MSC, BSC, SGSN, SMLC, ...) should now receive a N-PCSTATE.ind stating the "affected point code". and either
sp_status OSMO_SCCP_SP_S_INACCESSIBLE or OSMO_SCCP_SP_S_ACCESSIBLE.

This indication should then be used by the applications to trigger whatever internal logic that needs to happen if a remote point-code disappears or re-appears. I'll leave that part to neels. I'll just implement some tests to ensure we get those notifications as expected now.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)