Bug #4312

GSUP keepalives / connection loss detection

Added by neels 11 months ago. Updated 10 days ago.

Status: New
Priority: High
Assignee: -
Target version: -
Start date: 12/06/2019
Due date:
% Done: 0%
Spec Reference:

Description

In the presence of an unreliable back-haul mesh between villages, the GSUP
connection cannot be considered reliable either. We would expect to see TCP
stalls due to packet loss, etc.

Have you considered this in your implementation and/or done any testing
based on simulated lossy networks to ensure we properly use either TCP
keepalives or IPA application-level PING/PONG to detect lost connections
and recover from such situations (by closing the old and
re-establishing)?
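
To make the application-level option concrete, here is a minimal sketch of PING/PONG liveness tracking. It is not the libosmo-abis implementation: the struct, the function names and the rule that any received message counts as proof of liveness are all assumptions made for illustration. The caller is expected to send the actual IPA PING and to close and re-establish the connection when told the link is dead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical keepalive state for one GSUP/IPA connection (not an Osmocom API). */
struct ka_state {
	uint32_t interval_s;  /* send a PING after this many seconds without rx */
	uint32_t timeout_s;   /* declare the link dead if a PING stays unanswered this long */
	uint64_t last_rx_s;   /* time of the last received message */
	uint64_t ping_sent_s; /* time the outstanding PING was sent */
	bool ping_pending;    /* true while a PING is waiting for its PONG */
};

enum ka_action { KA_NONE, KA_SEND_PING, KA_CONN_DEAD };

void ka_init(struct ka_state *ka, uint32_t interval_s, uint32_t timeout_s, uint64_t now_s)
{
	ka->interval_s = interval_s;
	ka->timeout_s = timeout_s;
	ka->last_rx_s = now_s;
	ka->ping_sent_s = 0;
	ka->ping_pending = false;
}

/* Call for every received message (assumption: any rx, not only PONG, proves liveness). */
void ka_rx(struct ka_state *ka, uint64_t now_s)
{
	ka->last_rx_s = now_s;
	ka->ping_pending = false;
}

/* Call periodically from a timer; tells the caller what to do next. */
enum ka_action ka_tick(struct ka_state *ka, uint64_t now_s)
{
	if (ka->ping_pending) {
		if (now_s - ka->ping_sent_s >= ka->timeout_s)
			return KA_CONN_DEAD; /* caller closes and reconnects */
		return KA_NONE;
	}
	if (now_s - ka->last_rx_s >= ka->interval_s) {
		ka->ping_pending = true;
		ka->ping_sent_s = now_s;
		return KA_SEND_PING; /* caller transmits the IPA PING */
	}
	return KA_NONE;
}
```

Keeping the timing logic free of socket calls like this would also make it easy to unit-test without a lossy network.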

Unreliable networks can be easily simulated by Linux built-in 'tc netem'
for providing configurable packet loss / latency / jitter.

I also saw some comments / code related to "if a second connection using
the same IPA ID arrives, we're screwed" (paraphrasing here). I would
expect this not to be uncommon even if every MSC/HLR out there is
configured correctly, precisely because e.g. the remote MSC/HLR has already
decided that the TCP/GSUP connection is dead and starts to reconnect by
performing a local-end release, while the "local" MSC/HLR still thinks the
old connection is alive. If the old connection "wins" (i.e. is preferred),
I see potential trouble here.
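
One possible way to avoid the stale connection "winning" is a policy where the newest connection for a given IPA ID replaces the old one. The sketch below illustrates only that policy; it does not describe what osmo-hlr or osmo-msc currently do, and the table layout, names and return codes are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PEERS 8
#define ID_LEN 32

#define PEER_NONE (-1) /* no older connection was displaced */
#define PEER_ERR  (-2) /* table full or ID too long */

/* Hypothetical registry mapping an IPA ID to the fd of its current connection. */
struct peer_slot {
	char ipa_id[ID_LEN];
	int fd; /* -1 when the slot is unused */
};

struct peer_table {
	struct peer_slot slots[MAX_PEERS];
};

void peer_table_init(struct peer_table *t)
{
	for (size_t i = 0; i < MAX_PEERS; i++) {
		t->slots[i].ipa_id[0] = '\0';
		t->slots[i].fd = -1;
	}
}

/* Register a connection. If the ID is already known, the new fd takes over
 * and the old fd is returned so the caller can close it. */
int peer_table_register(struct peer_table *t, const char *ipa_id, int new_fd)
{
	struct peer_slot *free_slot = NULL;

	if (strlen(ipa_id) >= ID_LEN)
		return PEER_ERR;

	for (size_t i = 0; i < MAX_PEERS; i++) {
		struct peer_slot *s = &t->slots[i];
		if (s->fd < 0) {
			if (!free_slot)
				free_slot = s;
			continue;
		}
		if (strcmp(s->ipa_id, ipa_id) == 0) {
			int old_fd = s->fd;
			s->fd = new_fd; /* newest connection wins */
			return old_fd;
		}
	}
	if (!free_slot)
		return PEER_ERR;
	strcpy(free_slot->ipa_id, ipa_id); /* length checked above */
	free_slot->fd = new_fd;
	return PEER_NONE;
}

/* Return the fd currently associated with an ID, or -1 if unknown. */
int peer_table_lookup(const struct peer_table *t, const char *ipa_id)
{
	for (size_t i = 0; i < MAX_PEERS; i++) {
		if (t->slots[i].fd >= 0 && strcmp(t->slots[i].ipa_id, ipa_id) == 0)
			return t->slots[i].fd;
	}
	return -1;
}
```

Whether "newest wins" is acceptable depends on how much the IPA ID can be trusted; without authentication, a misconfigured peer reusing an ID would displace a healthy connection.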

Situations like that probably warrant some carefully designed tests that
reproduce exactly those conditions.

Goals:
a) ensuring that keepalive on either TCP or IPA is enabled and works, and
b) creating situations where the same peer establishes a second new connection
while the old one is still not torn down (timeout not expired yet, FIN packets
lost, ...)

(Keeping as one issue because these aspects are tightly related...)

History

#1 Updated by pespin 10 days ago

I confirm that GSUP client connections currently neither use keepalive nor offer a way to configure it, unlike OML/RSL conns, where we already do both.

In OML/RSL IPA conns, we have all those already in place (IPA ping/pong and TCP keepalive), see e1_input_vty.c, vty commands like:
"e1_line <0-255> keepalive <1-300> <1-20> <1-300>"
"e1_line <0-255> ipa-keepalive <1-300> <1-300>"

Those values are applied in libosmo-abis/src/input/ipaccess.c in update_fd_settings() called during updown cb in ipa client and during listen cb in servers.

The ipaccess.c code uses the lower layer ipa_client_conn and ipa_server_conn APIs from libosmo-abis/src/input/ipa.c.

So in GSUP we are basically missing the same thing that's done in ipaccess.c, that is, during the updown cb and the listen callback, use something like update_fd_settings() to set params configured previously through VTY commands provided by libgsupclient/server.
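
As a rough illustration of what such a socket-settings helper could do for the TCP part, the sketch below enables keepalive on a connected fd using the Linux socket options. It is not copied from ipaccess.c; the struct and function name are hypothetical, and the three values stand in for whatever the VTY configuration would provide.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical container for VTY-configured keepalive parameters. */
struct tcp_ka_cfg {
	int idle_s;     /* idle seconds before the first probe (TCP_KEEPIDLE) */
	int probes;     /* unanswered probes before the connection is dropped (TCP_KEEPCNT) */
	int interval_s; /* seconds between probes (TCP_KEEPINTVL) */
};

/* Enable TCP keepalive on fd with the given timing (Linux-specific options).
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int apply_tcp_keepalive(int fd, const struct tcp_ka_cfg *cfg)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &cfg->idle_s, sizeof(cfg->idle_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cfg->probes, sizeof(cfg->probes)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &cfg->interval_s, sizeof(cfg->interval_s)) < 0)
		return -1;
	return 0;
}
```

With idle_s = 30, probes = 3 and interval_s = 10, a silently dead peer would be detected roughly 60 seconds after the last traffic (30 s idle plus 3 probes at 10 s each).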

#2 Updated by pespin 10 days ago

See https://gerrit.osmocom.org/c/osmo-hlr/+/20577 for reference on where to add the bits to be applied on the sockets (this commit only sets TCP_NODELAY on them).
