GSUP keepalives / connection loss detection
Given the unreliable back-haul mesh between villages, the GSUP
connection cannot be considered reliable either. We should expect to see
TCP stalls due to packet loss, etc.
Have you considered this in your implementation and/or done any testing
based on simulated lossy networks to ensure we properly use either TCP
keepalives or IPA application-level PING/PONG to detect lost connections
and recover from such situations (by closing the old and
Unreliable networks can be easily simulated with the Linux built-in
'tc netem', which provides configurable packet loss / latency / jitter.
I also saw some comments / code related to "if a second connection using
the same IPA ID arrives, we're screwed" (paraphrasing here). I would
expect this not to be uncommon, even if every MSC/HLR out there is
configured correctly, precisely because e.g. the remote MSC/HLR has
already decided that the TCP/GSUP connection is dead and starts to
reconnect by performing a local-end release, while the "local" MSC/HLR
still thinks the old connection is alive. If the old connection "wins"
(i.e. is preferred)
I see potential trouble here.
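
To illustrate the alternative, here is a minimal sketch of a "newest connection wins" policy. Everything in it (struct peer_table, peer_table_register(), the integer connection handles) is hypothetical and not taken from the existing GSUP code; it only shows the idea that a second connection with the same IPA ID replaces the stale one, and the caller closes the old handle.

```c
#include <string.h>

#define MAX_PEERS 8
#define ID_LEN    32

/* Hypothetical registry entry: one connection handle per IPA ID. */
struct peer_entry {
	char ipa_id[ID_LEN];
	int conn;	/* opaque connection handle, e.g. an fd */
	int used;
};

struct peer_table {
	struct peer_entry e[MAX_PEERS];
};

/* Register a new connection for ipa_id; the newest connection always wins.
 * Returns the replaced (stale) handle, which the caller should close,
 * -1 if there was no previous connection, or -2 if the table is full. */
int peer_table_register(struct peer_table *t, const char *ipa_id, int conn)
{
	struct peer_entry *free_slot = NULL;
	int i;

	for (i = 0; i < MAX_PEERS; i++) {
		struct peer_entry *p = &t->e[i];

		if (p->used && strncmp(p->ipa_id, ipa_id, ID_LEN) == 0) {
			int old = p->conn;
			p->conn = conn;	/* prefer the new connection */
			return old;
		}
		if (!p->used && !free_slot)
			free_slot = p;
	}
	if (!free_slot)
		return -2;

	strncpy(free_slot->ipa_id, ipa_id, ID_LEN - 1);
	free_slot->ipa_id[ID_LEN - 1] = '\0';
	free_slot->conn = conn;
	free_slot->used = 1;
	return -1;
}
```

Whether replacing the old connection is actually the right policy for GSUP is exactly what the tests below should help decide.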
Situations like that probably warrant some carefully designed tests to
create exactly those situations.
a) ensuring that keepalive on either TCP or IPA is enabled and works, and
b) creating situations where the same peer establishes a second new connection
while the old one is still not torn down (timeout not expired yet, FIN packets
(Keeping as one issue because these aspects are tightly related...)
I confirm that GSUP client connections currently use no keepalive and offer no way to configure one, unlike what we already do for OML/RSL connections.
In OML/RSL IPA conns, we have all those already in place (IPA ping/pong and TCP keepalive), see e1_input_vty.c, vty commands like:
"e1_line <0-255> keepalive <1-300> <1-20> <1-300>"
"e1_line <0-255> ipa-keepalive <1-300> <1-300>"
Those values are applied in libosmo-abis/src/input/ipaccess.c in update_fd_settings() called during updown cb in ipa client and during listen cb in servers.
The ipaccess.c code uses the lower layer ipa_client_conn and ipa_server_conn APIs from libosmo-abis/src/input/ipa.c.
So in GSUP we are basically missing what ipaccess.c already does: during the updown cb and the listen callback, use something like update_fd_settings() to apply the params previously configured through VTY commands provided by libgsupclient/server.
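
For reference, a minimal sketch of what the TCP part of such a settings function could look like, assuming Linux socket options. The struct and function names (keepalive_cfg, apply_tcp_keepalive) are hypothetical and not libosmo-abis or libosmo-gsup API, and mapping the three fields to the three VTY values is an assumption.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical parameter set; field names are illustrative only. */
struct keepalive_cfg {
	int idle_s;	/* idle seconds before the first probe is sent */
	int probes;	/* unanswered probes before the connection is dropped */
	int interval_s;	/* seconds between probes */
};

/* Enable TCP keepalive on fd with the given parameters (Linux-specific
 * TCP_KEEP* options). Returns 0 on success, -1 on error with errno set. */
int apply_tcp_keepalive(int fd, const struct keepalive_cfg *cfg)
{
	int on = 1;

	assert(cfg);

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
		       &cfg->idle_s, sizeof(cfg->idle_s)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
		       &cfg->probes, sizeof(cfg->probes)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
		       &cfg->interval_s, sizeof(cfg->interval_s)) < 0)
		return -1;
	return 0;
}
```

Note that TCP keepalive only fires on an idle connection; it does not cover the case where unacknowledged data is stuck in the send queue, which is one reason to also have the IPA-level PING/PONG.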