Bug #3394

CTRL iface bsc<->bsc_nat causes infinite ping-pong ERROR message type loop

Added by pespin 5 months ago. Updated 4 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 07/12/2018
Due date:
% Done: 100%
Spec Reference:

Description

Today I found the following line showing up constantly in the logs of a running osmo-bsc-nat, about once per second:

<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 
<0025> control_cmd.c:369 Invalid message ID number: "err" 

I then took a pcap trace and saw what appears to be an endless exchange of CTRL messages between one specific BSC and the BSCNAT:
BSC->BSCNAT: "ERROR err Failed to parse control message."
BSCNAT->BSC: "ERROR err Failed to parse command."

The message text in the trace doesn't match the error printed in the log. It seems the callers craft their own error text instead of passing on the parser's. That part is fixed in:
https://gerrit.osmocom.org/#/c/openbsc/+/9973 "nat: ctrl: Use ctrl_cmd_parse2 to obtain detailed error"
https://gerrit.osmocom.org/#/c/openbsc/+/9974 "bsc: ctrl: Use ctrl_cmd_parse2 to obtain detailed error"

However, the infinite loop is still there.
The cause: when ctrl_cmd_parse2 cannot parse a CTRL message, it returns a ctrl_cmd structure of type ERROR, which is then sent back to the sender. The receiver of that ERROR message in turn fails to parse it (because its ID field is "err" rather than a valid ID number), so it generates a new ERROR message and sends it back to the sender, creating the loop.
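The ping-pong can be reproduced with a small toy model. This is only a sketch under stated assumptions: `toy_id_is_numeric`, `toy_naive_handle` and `toy_count_bounces` are hypothetical names, not libosmocore functions, and the parser is reduced to a single check that the ID token is numeric.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a strict CTRL parser (not libosmocore code): returns 1
 * only if the second token, the message ID, consists solely of digits. */
static int toy_id_is_numeric(const char *msg)
{
	const char *id = strchr(msg, ' ');

	if (!id || !isdigit((unsigned char)id[1]))
		return 0;
	for (id++; *id && *id != ' '; id++)
		if (!isdigit((unsigned char)*id))
			return 0;
	return 1;
}

/* Naive handler: on any parse failure, write an ERROR reply with ID "err".
 * Returns 1 if a reply was produced, 0 otherwise. */
static int toy_naive_handle(const char *in, char *out, size_t out_len)
{
	if (toy_id_is_numeric(in))
		return 0; /* parsed fine; real code would execute the command */
	snprintf(out, out_len, "ERROR err Failed to parse command.");
	return 1;
}

/* Bounce a message between two peers that both use the naive handler and
 * count the replies, giving up after max_rounds. */
static int toy_count_bounces(const char *first, int max_rounds)
{
	char a[128], b[128];
	int rounds = 0;

	assert(max_rounds > 0);
	snprintf(a, sizeof(a), "%s", first);
	while (rounds < max_rounds && toy_naive_handle(a, b, sizeof(b))) {
		rounds++;
		memcpy(a, b, sizeof(a)); /* the reply becomes the peer's input */
	}
	return rounds;
}
```

In this model, a well-formed message such as "GET 1 bts.0.location" produces no bounces, while a single unparsable message keeps bouncing until the `max_rounds` cap is reached, because every ERROR reply is itself rejected for its non-numeric ID.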

What we should do:
1- Fix ctrl_cmd_parse2 to accept the "err" token as the ID in ERROR messages.
2- Create a new API, ctrl_cmd_parse3, with an extra out bool param that tells whether parsing the message failed. This way callers can differentiate between a received ERROR message and a local parsing error. In the first case, they should drop the ERROR message and act on it locally (e.g. print a log line); in the second, they should send the generated ERROR message back to the sender.
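The two steps above could look roughly like the following sketch. It is a self-contained toy, not the libosmocore implementation: `toy_parse`, `toy_handle` and `enum toy_type` are hypothetical names, and the real ctrl_cmd_parse3 works on ctrl_cmd structures rather than plain strings. Only the idea of an out bool flag is taken from the proposal.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

enum toy_type { TOY_CMD, TOY_ERROR };

/* Toy parser in the spirit of the proposal (not the libosmocore API):
 * accepts "err" as the ID of an ERROR message, and reports through
 * *parse_failed whether the returned TOY_ERROR was generated locally. */
static enum toy_type toy_parse(const char *msg, bool *parse_failed)
{
	const char *id = strchr(msg, ' ');
	bool is_error = strncmp(msg, "ERROR ", 6) == 0;

	*parse_failed = false;
	/* Step 1: an ERROR message may carry the literal ID "err". */
	if (is_error && strncmp(id + 1, "err", 3) == 0 &&
	    (id[4] == ' ' || id[4] == '\0'))
		return TOY_ERROR;
	/* Otherwise the ID must be a run of digits. */
	if (id && isdigit((unsigned char)id[1])) {
		for (id++; isdigit((unsigned char)*id); id++)
			;
		if (*id == ' ' || *id == '\0')
			return is_error ? TOY_ERROR : TOY_CMD;
	}
	/* Step 2: tell the caller this ERROR comes from a parse failure. */
	*parse_failed = true;
	return TOY_ERROR;
}

/* Returns true if a reply was written to out. */
static bool toy_handle(const char *in, char *out, size_t out_len)
{
	bool parse_failed;
	enum toy_type type;

	assert(out != NULL && out_len > 0);
	type = toy_parse(in, &parse_failed);
	if (type != TOY_ERROR)
		return false; /* real code would execute the command here */
	if (!parse_failed) {
		/* The peer reported an error: log and drop it, never answer. */
		fprintf(stderr, "dropping received ERROR: %s\n", in);
		return false;
	}
	snprintf(out, out_len, "ERROR err Failed to parse command.");
	return true;
}
```

With this split, an unparsable message still gets an ERROR reply, but feeding that reply to the peer's `toy_handle` yields no further reply, so the exchange stops after one round.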

History

#1 Updated by pespin 5 months ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 90

Related patches submitted:

libosmocore:
https://gerrit.osmocom.org/#/c/libosmocore/+/9972 ctrl: Log CMD TYPE on invalid ID number
https://gerrit.osmocom.org/#/c/libosmocore/+/9977 ctrl: Fix parsing of ERROR recvd msgs with id=err
https://gerrit.osmocom.org/#/c/libosmocore/+/9978 ctrl: Introduce ctrl_cmd_parse3 API

openbsc:
https://gerrit.osmocom.org/#/c/openbsc/+/9973/ nat: ctrl: Use ctrl_cmd_parse2 to obtain detailed error
https://gerrit.osmocom.org/#/c/openbsc/+/9974/ bsc: ctrl: Use ctrl_cmd_parse2 to obtain detailed error
https://gerrit.osmocom.org/#/c/openbsc/+/9979 nat: ctrl: use strtol instead of atoi as it has explicit error documentation
https://gerrit.osmocom.org/#/c/openbsc/+/9980 nat: ctrl: Avoid sending back received ERROR msgs
https://gerrit.osmocom.org/#/c/openbsc/+/9980 bsc: ctrl: Avoid sending back received ERROR msgs

osmo-bsc:
https://gerrit.osmocom.org/#/c/osmo-bsc/+/9982/ ctrl: Avoid sending back received ERROR msgs

Since the bug affects both osmo-bsc-sccplite and osmo-bsc-nat, we'll need to build new experimental images as well as the Debian OBS packages once all the patches are merged.

#2 Updated by pespin 5 months ago

Added 2 more libosmocore commits with another fix and unit tests:
https://gerrit.osmocom.org/#/c/libosmocore/+/9983 ctrl: ctrl_handle_msg: Avoid sending back received ERROR msgs
https://gerrit.osmocom.org/#/c/libosmocore/+/9984 tests: ctrl: Test received ERROR messages are handled correctly

#3 Updated by pespin 5 months ago

Merged, closing.

#4 Updated by pespin 4 months ago

  • Status changed from Feedback to Resolved
  • % Done changed from 90 to 100
