Bug #4960
closed
VTY doesn't show BVCs getting blocked on transport network failure
100%
Description
If I start osmo-gbproxy, run a single TTCN3 test against it (so all BVC get up once), the output looks as follows:
OsmoGbProxy> show gbproxy bvc bss
NSEI 2003, SIG-BVCI 0 [UNBLOCKED]
NSEI 2003, PTP-BVCI 20031, RAI 262-42-13135-1 [UNBLOCKED]
NSEI 2003, PTP-BVCI 20032, RAI 262-42-13300-0 [UNBLOCKED]
NSEI 2003, PTP-BVCI 20033, RAI 262-42-13300-0 [UNBLOCKED]
NSEI 2001, SIG-BVCI 0 [UNBLOCKED]
NSEI 2001, PTP-BVCI 20011, RAI 262-42-13135-0 [UNBLOCKED]
NSEI 2002, SIG-BVCI 0 [UNBLOCKED]
NSEI 2002, PTP-BVCI 20021, RAI 262-42-13135-1 [UNBLOCKED]
NSEI 2002, PTP-BVCI 20022, RAI 262-42-13135-2 [UNBLOCKED]
OsmoGbProxy> show gbproxy bvc bss
NSEI 2003, SIG-BVCI 0 [UNBLOCKED]
NSEI 2003, PTP-BVCI 20031, RAI 262-42-13135-1 [UNBLOCKED]
NSEI 2003, PTP-BVCI 20032, RAI 262-42-13300-0 [UNBLOCKED]
NSEI 2003, PTP-BVCI 20033, RAI 262-42-13300-0 [UNBLOCKED]
NSEI 2001, SIG-BVCI 0 [UNBLOCKED]
NSEI 2001, PTP-BVCI 20011, RAI 262-42-13135-0 [UNBLOCKED]
NSEI 2002, SIG-BVCI 0 [UNBLOCKED]
NSEI 2002, PTP-BVCI 20021, RAI 262-42-13135-1 [UNBLOCKED]
NSEI 2002, PTP-BVCI 20022, RAI 262-42-13135-2 [UNBLOCKED]
However, even 10 minutes after the TTCN3 tester terminates (and hence all BSS and SGSN peers are gone), the output is still unchanged.
I guess a normal user would have expected that the BVCs would go into BLOCKED or some kind of recovery state if the underlying NSE disappears / becomes unavailable.
The same applies to
OsmoGbProxy> show gbproxy cell
BVCI 20031 RAI 262-42-13135-1: BSS NSEI 2003, SGSN NSEI 101 102
BVCI 20021 RAI 262-42-13135-1: BSS NSEI 2002, SGSN NSEI 101 102
BVCI 20011 RAI 262-42-13135-0: BSS NSEI 2001, SGSN NSEI 101 102
BVCI 20032 RAI 262-42-13300-0: BSS NSEI 2003, SGSN NSEI 101 102
BVCI 20022 RAI 262-42-13135-2: BSS NSEI 2002, SGSN NSEI 101 102
BVCI 20033 RAI 262-42-13300-0: BSS NSEI 2003, SGSN NSEI 101 102
where the NSEI are shown even a long time after those NSEI are gone. Interestingly, when you start another test, they temporarily become
OsmoGbProxy> show gbproxy cell
BVCI 20031 RAI 262-42-13135-1: BSS NSEI <none>, SGSN NSEI 101 102
BVCI 20021 RAI 262-42-13135-1: BSS NSEI <none>, SGSN NSEI 101 102
BVCI 20011 RAI 262-42-13135-0: BSS NSEI <none>, SGSN NSEI 101 102
BVCI 20032 RAI 262-42-13300-0: BSS NSEI <none>, SGSN NSEI 101 102
BVCI 20022 RAI 262-42-13135-2: BSS NSEI <none>, SGSN NSEI 101 102
BVCI 20033 RAI 262-42-13300-0: BSS NSEI <none>, SGSN NSEI 101 102
only to go back to 2001/2002/2003 a few seconds later. So the state is lost (maybe on BVC RESET?). In that case, maybe this would solve itself if the BVC went to BLOCKED or some other kind of state?
This may not seem super critical, but from an operational point of view, we will be wondering about this as soon as we go into deployment/testing, as will our users, AFAICT.
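For anyone who wants to check this from a script rather than by eye, here is a minimal sketch that parses the `show gbproxy bvc bss` text quoted above and lists PTP BVCs that are still UNBLOCKED although their NSE is known to be gone. The function names are hypothetical, the regular expression is derived only from the lines shown in this ticket, and the set of dead NSEIs has to come from elsewhere (it is simply passed in).

```python
import re

# Matches one record of `show gbproxy bvc bss` output as quoted in this ticket:
#   NSEI 2003, SIG-BVCI 0 [UNBLOCKED]
#   NSEI 2003, PTP-BVCI 20031, RAI 262-42-13135-1 [UNBLOCKED]
# Assumption: other versions may print a different format.
BVC_LINE = re.compile(
    r"NSEI (?P<nsei>\d+), (?P<kind>SIG|PTP)-BVCI (?P<bvci>\d+)"
    r"(?:, RAI (?P<rai>[\d-]+))? \[(?P<state>\w+)\]"
)


def parse_bvc_states(vty_output):
    """Return one dict per BVC record found in the VTY output text."""
    bvcs = []
    for m in BVC_LINE.finditer(vty_output):
        bvcs.append({
            "nsei": int(m.group("nsei")),
            "kind": m.group("kind"),
            "bvci": int(m.group("bvci")),
            "rai": m.group("rai"),      # None for the signaling BVC
            "state": m.group("state"),
        })
    return bvcs


def unblocked_ptp_on_dead_nse(bvcs, dead_nseis):
    """PTP BVCs still reported UNBLOCKED although their NSE is known to be gone."""
    return [
        b for b in bvcs
        if b["kind"] == "PTP"
        and b["state"] == "UNBLOCKED"
        and b["nsei"] in dead_nseis
    ]
```

With the output from the description and `dead_nseis = {2001, 2002, 2003}`, every PTP BVC would be reported, which is exactly the behaviour this ticket complains about.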
Updated by laforge over 3 years ago
And yes, I'm aware I wrote that code, so I'm not saying it's daniel's fault when assigning this to him. I'm just trying to focus on testing at the moment.
Updated by laforge over 3 years ago
"BLOCKED" is spec-wise the wrong state for the signaling BVC, as by definition it can never be blocked. At gbproxy start-up the SGSN-side BVCs are in WAIT_RESET_ACK state:
NSEI 101, SIG-BVCI 0 [WAIT_RESET_ACK]
NSEI 102, SIG-BVCI 0 [WAIT_RESET_ACK]
and the BSS side BVCs simply don't exist. I would argue that the PTP BVCs could actually be deleted when a BSS disappears. This would mean
- BLOCK each SGSN side PTP BVC for this BVCI
- destroy the BSS side PTP BVC object
- possibly also destroy the cell object?
On the other hand, that would also destroy any related counters etc. - and from the operational point of view it might be interesting to keep them around even if there is an outage. After all, the number of BSS/BVC is not something that changes frequently in a production network.
So as an alternative, we could simply mark the PTP BVC on the BSS side as blocked (we don't even need to start a BLOCKING procedure, as that will try to send packets and wait for ACKs). Plus start the BLOCK procedure on the SGSN side as described above.
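To make the two options concrete, here is a toy model of the bookkeeping. This is not osmo-gbproxy code: `ToyProxy`, `PtpBvc` and `bss_nse_dead` are hypothetical names, and setting a state string stands in for running the real BLOCK procedure towards the SGSN. With `destroy=False` the BSS-side PTP BVC is only marked blocked and its counters survive; with `destroy=True` the object is dropped.

```python
from dataclasses import dataclass, field


@dataclass
class PtpBvc:
    bvci: int
    nsei: int
    state: str = "UNBLOCKED"
    counters: dict = field(default_factory=dict)


class ToyProxy:
    """Hypothetical stand-in for the proxy's BVC bookkeeping."""

    def __init__(self):
        self.bss_bvcs = {}   # bvci -> PtpBvc on the BSS side
        self.sgsn_bvcs = {}  # (sgsn_nsei, bvci) -> PtpBvc on the SGSN side

    def add_cell(self, bvci, bss_nsei, sgsn_nseis):
        self.bss_bvcs[bvci] = PtpBvc(bvci, bss_nsei)
        for sgsn_nsei in sgsn_nseis:
            self.sgsn_bvcs[(sgsn_nsei, bvci)] = PtpBvc(bvci, sgsn_nsei)

    def bss_nse_dead(self, bss_nsei, destroy=False):
        """Handle a BSS NSE going away; return the affected BVCIs."""
        affected = [b.bvci for b in self.bss_bvcs.values() if b.nsei == bss_nsei]
        for bvci in affected:
            # SGSN side: a real proxy would start the BLOCK procedure here.
            for (_sgsn_nsei, sgsn_bvci), bvc in self.sgsn_bvcs.items():
                if sgsn_bvci == bvci:
                    bvc.state = "BLOCKED"
            if destroy:
                # Option 1: drop the BSS-side object (and its counters).
                del self.bss_bvcs[bvci]
            else:
                # Option 2: keep the object, mark it blocked locally
                # without sending anything towards the dead BSS.
                self.bss_bvcs[bvci].state = "BLOCKED"
        return affected
```

The trade-off is visible in the model: only the non-destroying variant keeps `counters` around across an outage.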
Maybe all of the above is a "Holzweg" and we should simply show the NSE ALIVE/DEAD state next to each BVC?
Any comments/ideas?
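For the last idea (showing the NSE state next to each BVC), a rough sketch of what a line could look like. The output format with `(ALIVE)` / `(DEAD)` after the NSEI is purely an assumption for discussion, and `format_bvc_line` is a hypothetical helper, not an existing VTY function.

```python
def format_bvc_line(nsei, bvci, bvc_state, nse_alive, rai=None):
    """Render one BVC line with the NSE liveness shown next to the NSEI."""
    kind = "SIG" if bvci == 0 else "PTP"   # BVCI 0 is the signaling BVC
    rai_part = f", RAI {rai}" if rai else ""
    nse_part = "ALIVE" if nse_alive else "DEAD"
    return f"NSEI {nsei} ({nse_part}), {kind}-BVCI {bvci}{rai_part} [{bvc_state}]"


print(format_bvc_line(2003, 0, "UNBLOCKED", False))
print(format_bvc_line(2003, 20031, "UNBLOCKED", False, "262-42-13135-1"))
```

This would leave the BVC state machine untouched and only make the stale state obvious to whoever reads the VTY.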
Updated by daniel almost 3 years ago
I believe this was caused by an issue that has since been resolved where IP-SNS NSEs would never be considered dead.
At least show gbproxy bvc bss will show blocked cells if the BSS-NSE disappears.
The SGSN-side BVCs will also appear blocked (if the SGSN NSE is still connected).
I have also observed show gbproxy cell with BSS NSEI <none> for a cell whose corresponding BSS NSE went down and didn't reconnect. It was still listed as connected (blocked) to the SGSN because we can't "delete" BVCs without resetting BVC 0 (which deletes all BVCs and is not what we want).
- BLOCK each SGSN side PTP BVC for this BVCI
- destroy the BSS side PTP BVC object
We already do those two.
- possibly also destroy the cell object?
This we don't do yet, but I'm not sure if we should. As soon as the BSS is gone and we block the BVC towards the SGSN we could also get rid of the cell. When the cell comes back (maybe on a different NSE or BVCI) we go through the RESET procedure anyway so we don't really need the cell.
Updated by daniel almost 3 years ago
- % Done changed from 0 to 40
Patch in Gerrit to clean up cells: https://gerrit.osmocom.org/c/osmo-gbproxy/+/24956
Updated by daniel almost 3 years ago
- Status changed from In Progress to Resolved
- % Done changed from 40 to 100
Merged