
h1. Redundancy between GbProxy 

In a commercial setup with a redundant network and SGSNs (through SGSN pooling), having OsmoGbProxy be a single point of failure is not desirable. However, there is no official specification for providing redundancy at that level, because as far as the specifications are concerned a Gb proxy simply does not exist.

 OsmoGbProxy sits in between the BSS and SGSN and terminates the NS connections while transparently routing BSSGP messages back and forth. 

To provide redundancy towards the SGSN, multiple OsmoGbProxy processes need to appear as belonging to the same NS Entity. Either the SGSN needs to have different NS-VCs configured pointing to the different GbProxies, or the GbProxy advertises (through IP-SNS) the other GbProxy as another endpoint. This should be entirely transparent to the SGSN. Initially only IP-SNS will be supported on the SGSN side.
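
As a rough illustration (this is not the actual libosmogb API, and all addresses and weights are made up), the IPv4 endpoint list such a GbProxy could advertise in its SNS-CONFIG might look like this - note the peer's "foreign" endpoint listed as if it were our own:

<pre><code class="c">
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One IPv4 element of an SNS-CONFIG, loosely following TS 48.016;
 * this is an in-memory illustration, not the wire format. */
struct sns_ip4_elem {
	uint32_t ip_addr;
	uint16_t udp_port;
	uint8_t sig_weight;
	uint8_t data_weight;
};

int main(void)
{
	struct sns_ip4_elem endpoints[] = {
		/* our own endpoint (primary GbProxy), 192.168.1.1 */
		{ 0xc0a80101, 23000, 1, 1 },
		/* "foreign" endpoint of the secondary GbProxy (192.168.2.1),
		 * advertised as if it were ours so the SGSN sees one NSE;
		 * signalling weight 0 keeps signalling on the primary */
		{ 0xc0a80201, 23000, 0, 1 },
	};

	for (unsigned i = 0; i < 2; i++)
		printf("ep%u: ip=%#" PRIx32 " port=%d sig=%d data=%d\n", i,
		       endpoints[i].ip_addr, endpoints[i].udp_port,
		       endpoints[i].sig_weight, endpoints[i].data_weight);
	return 0;
}
</code></pre>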

 h2. Implications: 

 This means: 
 * NS needs to be able to announce "foreign" IP endpoints to the SGSN in SNS-CONFIG 
 * NS needs to be able to disable/enable the transmission of SNS-SIZE to the SGSN at runtime 
* the SNS-CONFIG from the SGSN (listing its IP endpoints) is only received by the "primary" gbproxy which has started the SNS-SIZE/CONFIG procedure
** we will likely have to replicate that SGSN-originated SNS-CONFIG to the secondary gbproxy; maybe simply spoof that UDP packet (and suppress sending a response) - at least this way we'd not need to invent new parsers, etc. (see the sketch below)
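
A minimal sketch of that spoof/replicate idea, assuming a plain UDP link between the two gbproxies; the PDU type constant and all addresses/ports are placeholders, not the real TS 48.016 values:

<pre><code class="c">
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder value, NOT the real TS 48.016 PDU type for SNS-CONFIG */
#define SNS_PDU_TYPE_CONFIG 0xff

static int repl_fd = -1;	/* UDP socket towards the secondary gbproxy */

/* Called by the primary for every NS PDU received from the SGSN:
 * copy an SGSN-originated SNS-CONFIG verbatim to the peer, so the peer
 * can feed it into its existing parser; the peer suppresses any response. */
static void on_sgsn_pdu(const uint8_t *pdu, size_t len)
{
	if (len < 1 || pdu[0] != SNS_PDU_TYPE_CONFIG)
		return;
	send(repl_fd, pdu, len, 0);
}

int main(void)
{
	struct sockaddr_in peer = {
		.sin_family = AF_INET,
		.sin_port = htons(4711),	/* made-up replication port */
	};
	inet_pton(AF_INET, "192.168.2.1", &peer.sin_addr);

	repl_fd = socket(AF_INET, SOCK_DGRAM, 0);
	connect(repl_fd, (struct sockaddr *)&peer, sizeof(peer));

	uint8_t fake_pdu[] = { SNS_PDU_TYPE_CONFIG, 0x00 };
	on_sgsn_pdu(fake_pdu, sizeof(fake_pdu));
	close(repl_fd);
	return 0;
}
</code></pre>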

 On the BSS-side we also need to share an NSE: 
* each BSS is one NSE with multiple NS-VCs (otherwise no redundancy is possible); there is no way to split that
* a likely implementation would implement a 1:1 mapping of NS-VCs from BSS to SGSN side (thus also a 1:1 mapping between BSS NSE and SGSN NSE)
* this also ensures downlink load sharing is performed inside the SGSN and gbproxy doesn't have to re-route user plane traffic
* if one NS-VC on the BSS side fails, we block the corresponding NS-VC on the SGSN side. This causes the SGSN to send the traffic over the remaining NS-VCs, as expected (see the sketch below)
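
A sketch of that block propagation under the 1:1 mapping; the types and names here are illustrative, not the actual osmo-gbproxy data model:

<pre><code class="c">
#include <stdbool.h>
#include <stdio.h>

/* One BSS-side NS-VC and its 1:1 SGSN-side counterpart */
struct nsvc_map {
	int bss_nsvci;
	int sgsn_nsvci;
	bool sgsn_side_blocked;
};

/* On a BSS-side NS-VC failure, block the mapped SGSN-side NS-VC so the
 * SGSN shifts its downlink traffic to the remaining NS-VCs. */
static void on_bss_nsvc_failure(struct nsvc_map *maps, int n, int bss_nsvci)
{
	for (int i = 0; i < n; i++) {
		if (maps[i].bss_nsvci != bss_nsvci)
			continue;
		maps[i].sgsn_side_blocked = true;
		printf("blocking SGSN-side NS-VC %d (BSS NS-VC %d is down)\n",
		       maps[i].sgsn_nsvci, bss_nsvci);
		/* real code would transmit an NS-BLOCK here */
	}
}

int main(void)
{
	struct nsvc_map maps[] = {
		{ 101, 201, false },
		{ 102, 202, false },
	};
	on_bss_nsvc_failure(maps, 2, 101);
	return 0;
}
</code></pre>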

Performing this 1:1 NSE mapping and 1:1 NS-VC mapping on the SGSN side will introduce the following externally visible changes:

* not just one NSE per gbproxy, but one NSE per BSS-side NSE
* one IP endpoint on the SGSN-facing gbproxy side per BSS NS-VC (one IP endpoint maps to one BSS-side NS-VC)
* there will be multiple SGSN-side NS-VCs for each of those endpoints, as the SGSN has multiple IP endpoints itself (typically at least one endpoint for user traffic and one for signalling traffic); see the fan-out sketch below
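
To illustrate the fan-out, a toy example with 2 BSS-side NS-VCs (hence 2 local SGSN-facing endpoints) and an SGSN with 2 endpoints, giving 4 SGSN-side NS-VCs; all addresses are made up:

<pre><code class="c">
#include <stdio.h>

int main(void)
{
	/* one local SGSN-facing endpoint per BSS-side NS-VC */
	const char *local_eps[] = { "10.0.0.1:23000", "10.0.0.2:23000" };
	/* SGSN endpoints, e.g. one for user traffic, one for signalling */
	const char *sgsn_eps[] = { "10.1.0.1:23000", "10.1.0.2:23001" };
	int n = 0;

	for (int i = 0; i < 2; i++)
		for (int j = 0; j < 2; j++)
			printf("NS-VC %d: %s <-> %s\n", ++n,
			       local_eps[i], sgsn_eps[j]);
	/* 2 BSS NS-VCs x 2 SGSN endpoints = 4 SGSN-side NS-VCs */
	return 0;
}
</code></pre>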

 To simplify handling of BVC signalling traffic: 
 * Only advertise signalling-weight > 0 on the primary GbProxy 
* The GbProxy failover detection needs to be faster than the IP-SNS failure detection (to respond with an SNS-CHANGEWEIGHT announcing a signalling weight > 0 for the secondary GbProxy before NS resets all state)
 * Any new BVC state is replicated to the secondary GbProxy. The primary waits for an ACK before it handles the message further (forward/reply to SGSN/BSS) 
* the primary/secondary decision must be made on a per-BSS (NSE) level, because different BSSs could have broken connections to different GbProxies (see the failover sketch after this list)
* Since there is no way to force signalling traffic over a particular NS-VC on the BSS side (with FR or non-SNS UDP), the secondary GbProxy will need to forward any signalling traffic it receives to the primary. A workaround (especially for IP-based BSS) would be to BLOCK the NS-VC on the secondary GbProxy during operation and unblock it during failover, before NS on the other side has a chance to detect that all other NS-VCs are down.
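
A sketch of the per-NSE takeover on the secondary GbProxy; the functions here are stand-ins for the real SNS-CHANGEWEIGHT/NS-UNBLOCK transmissions, and the failure detection itself is assumed:

<pre><code class="c">
#include <stdbool.h>
#include <stdio.h>

struct nse_role {
	int nsei;
	bool we_are_primary;
};

/* Stand-in for transmitting an SNS-CHANGEWEIGHT towards the SGSN */
static void send_sns_changeweight(int nsei, int sig_weight)
{
	printf("NSE %d: SNS-CHANGEWEIGHT sig_weight=%d\n", nsei, sig_weight);
}

/* Stand-in for unblocking our NS-VCs */
static void unblock_our_nsvcs(int nsei)
{
	printf("NSE %d: unblock our NS-VCs\n", nsei);
}

/* Must complete faster than the IP-SNS/NS failure detection on the
 * far side, otherwise NS resets all state before we take over. */
static void on_peer_gbproxy_dead(struct nse_role *r)
{
	if (r->we_are_primary)
		return;	/* we already carry the signalling */
	r->we_are_primary = true;
	send_sns_changeweight(r->nsei, 1);	/* was advertised as 0 */
	unblock_our_nsvcs(r->nsei);
}

int main(void)
{
	/* decided per BSS/NSE, not globally */
	struct nse_role r = { .nsei = 1001, .we_are_primary = false };
	on_peer_gbproxy_dead(&r);
	return 0;
}
</code></pre>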

h2. Changes in OsmoGbProxy

* osmo-gbproxy and possibly libosmogb will need some support to allow fine-grained control by the application (gbproxy) over which NS-VC a given packet will go to (see the hook sketch after this list)
* the IP-SNS state machine needs to be kept in sync; on the BSSGP level we need some state replication
* the per-BVC state of the BVC FSMs needs to be replicated for all BVCs
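
As a sketch of the first point, a hypothetical per-packet TX hook (not the current libosmogb API) that keeps a packet on the NS-VC mapped 1:1 to the BSS-side NS-VC it arrived on:

<pre><code class="c">
#include <stdio.h>

struct nsvc {
	int nsvci;
	int is_up;
};

/* gbproxy policy: keep the packet on the SGSN-side NS-VC mapped 1:1 to
 * the BSS-side NS-VC it arrived on, fall back only if that one is down */
static struct nsvc *select_tx_nsvc(struct nsvc *mapped, struct nsvc *fallback)
{
	return mapped->is_up ? mapped : fallback;
}

int main(void)
{
	struct nsvc mapped = { 201, 1 }, fallback = { 202, 1 };
	printf("tx on NS-VC %d\n", select_tx_nsvc(&mapped, &fallback)->nsvci);
	return 0;
}
</code></pre>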

 State that needs to be replicated: 
* gbproxy_bvc - nse->nsei, sgsn_facing, bvci
* gbproxy_cell - bvci, raid, cid
* The tlli/imsi cache can probably be ignored. The worst case would be a missed RESUME ACK, but it could also happen that our gbproxy goes down after receiving it and before replicating or forwarding the state. In that case the RESUME ACK would be resent after a timeout and the other gbproxy can route it.
* Instead of replaying all BSSGP messages (block/unblock/reset and their acks), we could simply transmit the new state to the secondary gbproxy (features, locally_block, block_cause, blocked or unblocked). We should be able to ignore the "transient states" like wait_reset_ack - gbproxy would simply repeat the reset procedure. On the other hand this probably opens the door wide for both gbproxies to go out of sync. So maybe we simply forward all BSSGP signalling messages (BVCI 0) and await an ack from our gbproxy peer before we continue (a minimal sketch of this replicate-then-ack flow follows below).
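
A minimal sketch of that replicate-then-ack flow; the replication transport and all names are assumptions:

<pre><code class="c">
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_PDU 1024

/* The last BVCI-0 signalling PDU we are holding until the peer acks */
struct pending_pdu {
	uint8_t buf[MAX_PDU];
	size_t len;
	bool acked;
};

static struct pending_pdu pending;

static void send_to_peer(const uint8_t *pdu, size_t len)
{
	(void)pdu;
	printf("replicating %zu-byte BVCI-0 PDU to peer\n", len);
}

/* Primary: hold each signalling PDU until the peer has confirmed it */
static void on_bvci0_pdu(const uint8_t *pdu, size_t len)
{
	if (len > MAX_PDU)
		return;
	memcpy(pending.buf, pdu, len);
	pending.len = len;
	pending.acked = false;
	send_to_peer(pdu, len);
}

/* Peer ack received: only now forward/answer towards SGSN/BSS */
static void on_peer_ack(void)
{
	pending.acked = true;
	printf("peer acked, resuming BSSGP handling of held PDU\n");
}

int main(void)
{
	uint8_t pdu[] = { 0x22, 0x04 };	/* made-up payload, not a real BVC-RESET */
	on_bvci0_pdu(pdu, sizeof(pdu));
	on_peer_ack();
	return 0;
}
</code></pre>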