Project

General

Profile

Bug #3178

gbproxy: failed to parse invalid BSSGP-UNITDATA message

Added by pespin 6 months ago. Updated 22 days ago.

Status:
Feedback
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
04/16/2018
Due date:
% Done:

0%

Spec Reference:

Description

Today while checking log output (debug all) of osmo-gbproxy 8343b4adbbe9afd4294232429696ee9736ec1004 in a testing/production set up:

<000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing e5225438 -> da1d5ada
<000f> gb_proxy.c:767 NSEI=110 proxying BTS->SGSN (NS_BVCI=9108, NSEI=9108)
<0012> gprs_llc_parse.c:82 LLC SAPI=9 C   U GEA? IOV-UI=0x000000 FCS=0x62a1a7 CMD=UI DATA
<0012> gprs_gb_parse.c:408 Got LLC message, CRC: 62a1a7 (computed 62a1a7)
<000f> gprs_gb_parse.c:535 LLC: Got TLLI e5225438, BSSGP RAID 274-8-4120-0
<000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = e5225438, assigned = 00000000, net_vld = 1, bss_vld = 1}, e5225438)
<000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = da1d5ada, assigned = 00000000, net_vld = 1, bss_vld = 1}, da1d5ada)
<000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing e5225438 -> da1d5ada
<000f> gb_proxy.c:767 NSEI=110 proxying BTS->SGSN (NS_BVCI=9108, NSEI=9108)
<000f> gprs_gb_parse.c:535 BSSGP: Got BSSGP RAID 274-8-4120-0, BSSGP PTMSI f3687732, IMSI 274080000004803
<000f> gb_proxy_patch.c:184 Patching BSSGP P-TMSIs: Replacing f3687732 -> e155cfea
<000f> gb_proxy.c:1080 NSEI=9108(SGSN) BSSGP PAGING routing by RAI to peer BVCI=9108
<000f> gb_proxy.c:791 NSEI=9108 proxying SGSN->BSS (NS_BVCI=0, NSEI=110)
<0012> gprs_llc_parse.c:82 LLC SAPI=1 C   U GEA? IOV-UI=0x000000 FCS=0x38436c CMD=UI DATA
<0012> gprs_gb_parse.c:408 Got LLC message, CRC: 38436c (computed 38436c)
<000f> gb_proxy.c:575 NSEI=110(BSS) patching: failed to parse invalid BSSGP-UNITDATA message
<000f> gprs_gb_parse.c:535 BSSGP-UNITDATA: Got TLLI 00000000, BSSGP RAID 274-8-4120-0
<000f> gb_proxy.c:579 NSEI=110(BSS) invalid message was: [L2]> 00 00 23 94 01 e1 55 cf ea 00 00 04 08 88 72 f4 80 10 18 00 9c 40 00 80 0e 00 06 01 c0 41 6c 43 38
<0012> gprs_llc_parse.c:82 LLC SAPI=9 C   U GEA? IOV-UI=0x000000 FCS=0xa65d4d CMD=UI DATA
<0012> gprs_gb_parse.c:408 Got LLC message, CRC: a65d4d (computed a65d4d)
<000f> gprs_gb_parse.c:535 LLC: Got TLLI e5225438, BSSGP RAID 274-8-4120-0
<000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = e5225438, assigned = 00000000, net_vld = 1, bss_vld = 1}, e5225438)
<000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = da1d5ada, assigned = 00000000, net_vld = 1, bss_vld = 1}, da1d5ada)
<000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing e5225438 -> da1d5ada
<000f> gb_proxy.c:767 NSEI=110 proxying BTS->SGSN (NS_BVCI=9108, NSEI=9108)
<0012> gprs_llc_parse.c:82 LLC SAPI=9 C   U GEA? IOV-UI=0x000000 FCS=0x2bbfe2 CMD=UI DATA
<0012> gprs_gb_parse.c:408 Got LLC message, CRC: 2bbfe2 (computed 2bbfe2)
<000f> gprs_gb_parse.c:535 LLC: Got TLLI e5225438, BSSGP RAID 274-8-4120-0
<000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = e5225438, assigned = 00000000, net_vld = 1, bss_vld = 1}, e5225438)
<000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = da1d5ada, assigned = 00000000, net_vld = 1, bss_vld = 1}, da1d5ada)
<000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing e5225438 -> da1d5ada
<000f> gb_proxy.c:767 NSEI=110 proxying BTS->SGSN (NS_BVCI=9108, NSEI=9108)
<0012> gprs_llc_parse.c:82 LLC SAPI=9 R   U GEA? IOV-UI=0x000000 FCS=0x376834 CMD=UI DATA
<0012> gprs_gb_parse.c:408 Got LLC message, CRC: 376834 (computed 376834)
<000f> gprs_gb_parse.c:535 LLC: Got TLLI da1d5ada, IMSI 274085000030459
<000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = da1d5ada, assigned = 00000000, net_vld = 1, bss_vld = 1}, da1d5ada)
<000f> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = e5225438, assigned = 00000000, net_vld = 1, bss_vld = 1}, e5225438)
<000f> gb_proxy_patch.c:154 Patching TLLIs: Replacing da1d5ada -> e5225438

So it seems there are some messages we are not handling correctly.

History

#1 Updated by laforge 5 months ago

  • Assignee set to stsp

#2 Updated by stsp 4 months ago

#3 Updated by stsp 4 months ago

I have written a small test to reproduce the problem: https://gerrit.osmocom.org/#/c/osmo-sgsn/+/9500

The parsing failure happens when this part of the message is parsed as TLV type TLV_TYPE_TvLV:

ea 00 00 04 08 88 72 f4 80 10 18 00 9c 40 00 80 0e 00 06 01 c0 41 6c 43 38

libosmocore's TLV parser (src/gsm/tlv_parser.c) ends up "return -2" in line 214 at the end of this block of code:

194         case TLV_TYPE_TvLV:
195                 if (*(buf+1) & 0x80) {
196                         /* like TLV, but without highest bit of len */
197                         if (buf + 1 > buf + buf_len)
198                                 return -1;
199                         *o_val = buf+2;
200                         *o_len = *(buf+1) & 0x7f;
201                         len = *o_len + 2;
202                         if (len > buf_len)
203                                 return -2;
204                         break;
205                 }
206                 /* like TL16V, fallthrough */
207         case TLV_TYPE_TL16V:
208                 if (2 > buf_len)
209                         return -1;
210                 *o_val = buf+3;
211                 *o_len = *(buf+1) << 8 | *(buf+2);
212                 len = *o_len + 3;
213                 if (len > buf_len)
214                         return -2;

In gdb, we see:

(gdb) p len
$14 = 2187
(gdb) p buf_len
$15 = 22

#4 Updated by stsp 4 months ago

  • Status changed from New to In Progress

#5 Updated by stsp 4 months ago

Manually decoded message according to:

ETSI TS 148 018 V10.6.0 (2012 07) 96
3GPP TS 48.018 version 10.6.0 Release 10
Table 10.2.2: UL-UNITDATA PDU content

    00    - PDU type UL-UNITDATA (ok)

        11.3.35 Temporary logical link Identity (TLLI)
    00    - TLLI[0]
    23    - TLLI[1]
    94    - TLLI[2]
    01    - TLLI[3]
          TLLI == "00239401" 

    e1    - QOS[0] (bit rate MSB)
    55    - QOS[1] (bit rate LSB)
          bit rate = "57685" (57685*100000 bit/s per PBRG)
    cf    - QOS[2] PBRG = 11 (bit rate is expressed in 100000 bit/s increments),
            C/R 0 (contains LLC ACK/SACK),
            T 0 (contains signalling),
            A 1 (radio if uses MAC/UNITDATA,
            Precedence 111 (reserved value)

    ea    - CELL_ID[0] (TLV IEI: wrong, should be 0x08)
    00    - CELL_ID[1] (length 1)
    00    - CELL_ID[2] (length 2)
        lenth == 0
    04    -- CELL_ID[3]
    08    -- CELL_ID[4]
    88    -- CELL_ID[5]
    72    -- CELL_ID[6]
    f4    -- CELL_ID[7]
    80    -- CELL_ID[8]
    10    -- CELL_DI[9]

    18    -- QOSP[0] OoS Profile IEI
        not allowed in BSSGP Userdata
    00    -- QOSP[1]
    9c    -- QOSP[2]
    40    -- QOSP[3]
    00    -- QOSP[4]

    80    -- IEI for "E-UTRAN Inter RAT Handover Info" 
        not allowed in BSSGP Userdata
    0e    -- length (14 bytes -- only 8 bytes remain)
    00
    06
    01
    c0
    41
    6c
    43
    38

EDIT: typos

#6 Updated by stsp 4 months ago

As best as I can tell this message is invalid.
Apart from tne first byte (and perhaps the TLLI) none of the data makes sense.

It would be nice to get a live pcap of such a message with lower-layer headers so we can tell where it originated. Would it be possible to reproduce this problem and record a pcap file?

#8 Updated by pespin 4 months ago

We can most probably reproduce it by connecting to any machine currently using osmo-gbproxy in production.

Once we know if this message is uplink or downlink, I guess it's worth looking at related code generating this kind of message in osmo-pcu or osmo-sgsn to see if we can find some code malforming the packet there.

#9 Updated by stsp 4 months ago

We already know that the message is uplink, based on where it was logged.

EDIT: To be clearer: The message processing failure was logged in gb_proxy.c:575 which is processing BSSGP uplink messages.

#10 Updated by stsp 4 months ago

Invalid BSSGP messages could slip through Rx in osmo-pcu and libosmocore as well.

See https://gerrit.osmocom.org/#/c/osmo-pcu/+/9728/
and https://gerrit.osmocom.org/#/c/libosmocore/+/9729

#11 Updated by stsp 4 months ago

I suggest we run another test with the above patches applied and see if more invalid messages show up.

I don't have a test setup ready. Can someone either test this on my behalf or guide me through the process of getting a test setup going, with my patches?

#12 Updated by stsp 4 months ago

We could also return an error notification to the sender if parsing fails:
https://gerrit.osmocom.org/#/c/libosmocore/+/9730/

#13 Updated by stsp 2 months ago

This issue is still waiting for tests. I will need to coordinate with Pau to see how I could run such a test.

#14 Updated by pespin 2 months ago

Running some tests with the patches mentioned above already applied we got the following:

<000f> gprs_ns.c:584 NSEI=18033 Tx NS ALIVE_ACK (NSVCI=18033)
<0011> gprs_llc_parse.c:81 LLC SAPI=1 C   U GEA? IOV-UI=0x000000 FCS=0x63f476 CMD=UI DATA
<0011> gprs_gb_parse.c:408 Got LLC message, CRC: 63f476 (computed 63f476)
<0011> gprs_gb_parse.c:384 Unhandled GSM 04.08 message type unknown 0x46 for protocol discriminator SM.
<000e> gprs_gb_parse.c:535 GMM: Got TLLI f820483d, BSSGP RAID 274-08-8276-0
<000e> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = f820483d, assigned = 00000000, net_vld = 1, bss_vld = 1}, f820483d)
<000e> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = c5101308, assigned = 00000000, net_vld = 1, bss_vld = 1}, c5101308)
<000e> gb_proxy_patch.c:161 Patching TLLIs: Replacing f820483d -> c5101308
<000e> gb_proxy.c:770 NSEI=110 proxying BTS->SGSN (NS_BVCI=18033, NSEI=18033)
<0011> gprs_llc_parse.c:81 LLC SAPI=1 R   U GEA? IOV-UI=0x000000 FCS=0xc39c48 CMD=UI DATA
<0011> gprs_gb_parse.c:408 Got LLC message, CRC: c39c48 (computed c39c48)
<0011> gprs_gb_parse.c:384 Unhandled GSM 04.08 message type unknown 0x47 for protocol discriminator SM.          <------------------------ SEE THIS LINE
<000e> gprs_gb_parse.c:535 GMM: Got TLLI c5101308, IMSI 274080000008245
<000e> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = c5101308, assigned = 00000000, net_vld = 1, bss_vld = 1}, c5101308)
<000e> gb_proxy_tlli.c:317 gbproxy_validate_tlli({current = f820483d, assigned = 00000000, net_vld = 1, bss_vld = 1}, f820483d)
<000e> gb_proxy_patch.c:161 Patching TLLIs: Replacing c5101308 -> f820483d

#15 Updated by stsp 2 months ago

<0011> gprs_gb_parse.c:384 Unhandled GSM 04.08 message type unknown 0x47 for protocol discriminator SM

I see no evidence that this message is related to the original problem described in this issue. I think we should treat it as a separate problem until there is such evidence.
This message appeared with a Sony Ericsson phone which had trouble getting online.
I tried to find the corresponding frame in wireshark for quite some time yesterday but failed to locate it. I didn't keep a copy of captured frames unfortunately. But I believe I know how to reproduce this.
The message did not appear with a Samsung Galaxy S2 I tried later on.

The original error "gb_proxy.c:575 NSEI=110(BSS) patching: failed to parse invalid BSSGP-UNITDATA message" did not appear with either phone I tried.

#16 Updated by stsp 2 months ago

The meaning of message type 0x47 is "DEACT PDP ACK"

From gsm_04_08_gprs.h:

/* Table 10.4a, GPRS Session Management (GSM) */
#define GSM48_MT_GSM_ACT_PDP_REQ        0x41
#define GSM48_MT_GSM_ACT_PDP_ACK        0x42
#define GSM48_MT_GSM_ACT_PDP_REJ        0x43
#define GSM48_MT_GSM_REQ_PDP_ACT        0x44
#define GSM48_MT_GSM_REQ_PDP_ACT_REJ    0x45
#define GSM48_MT_GSM_DEACT_PDP_REQ      0x46
#define GSM48_MT_GSM_DEACT_PDP_ACK      0x47

The function gprs_gb_parse_dtap() doesn't handle this message.

#17 Updated by stsp 2 months ago

It would be good to have more information about the conditions in which the original problem occurred. Right now, I see two ways forward:

1) Keep trying various phones in the lab and hope the issue occurs eventually.

2) Wait until someone runs into the problem again in whatever circumstances and then updates this ticket with more information.

Frankly, from my point of view, the quality of bug reports we are filing in Osmocom's tracker is relatively low. I am under the impression that, in general, many bug reports lack descriptions of the environmental conditions a problem was found in. This is especially important information when problems are observed with hardware rather than a reproducible test case. A bug report which contains a single error message is not much to go on. In other projects I am involved in I see issues being closed regularly based on lack of such information.

#18 Updated by laforge 2 months ago

On Wed, Aug 15, 2018 at 12:42:41PM +0000, stsp [REDMINE] wrote:

2) Wait until someone runs into the problem again in whatever circumstances and then updates this ticket with more information.

Full ACK to that. Set it to stalled and low priority, or simlpy set it
to feedback and assign to @pespin. It's not worth spending too much time
on something that's not obvious to reproduce. We have no shortage of
other issues where we can use our time more productively than hard to
reproduce issues.

Frankly, from my point of view, the quality of bug reports we are
filing in Osmocom's tracker is relatively low. I am under the
impression that, in general, many bug reports lack descriptions of the
environmental conditions a problem was found in. This is especially
important information when problems are observed with hardware rather
than a reproducible test case.

I agree in this case. pespin, please take that to heart, at least when taking
this current bug report as an example. From your report here we don't
see what kind of phone you were using, what you were doing with the
phone at the time, ...

Even if one files a bug report assigned to oneself, chances are high
that 4 months later you don't remember any of the context that's not in
the ticket.

However, there are also plenty of better bug reports in the tracker in general.
I don't see this as a general problem.

#19 Updated by pespin 2 months ago

laforge wrote:

On Wed, Aug 15, 2018 at 12:42:41PM +0000, stsp [REDMINE] wrote:

2) Wait until someone runs into the problem again in whatever circumstances and then updates this ticket with more information.

Full ACK to that. Set it to stalled and low priority, or simlpy set it
to feedback and assign to @pespin. It's not worth spending too much time
on something that's not obvious to reproduce. We have no shortage of
other issues where we can use our time more productively than hard to
reproduce issues.

Agree. I think we can still look into the error message we got a few days ago which I recorded here and make sure we handle those type of messages, or create a task explaining what's needed. If it's expected that w do nothing with those message, we shouldn't print error messages which distracts people operating the software. stsp you can do that or otherwise re-assign the ticket to me.

I agree in this case. pespin, please take that to heart, at least when taking
this current bug report as an example. From your report here we don't
see what kind of phone you were using, what you were doing with the
phone at the time, ...

Even if one files a bug report assigned to oneself, chances are high
that 4 months later you don't remember any of the context that's not in
the ticket.

I saw that message while busy operating the software doing some other work, so I decided to log it here for later investigation, that's why I didn't provide much information on first place.

#20 Updated by stsp 2 months ago

I have created #3466 about the mysterious 0x47 message.

I want to add that I didn't mean to specifically point fingers at Pau for lack of details in this report. Filing informative bug reports isn't easy, especially when one can't afford sufficient mental focus while doing so. I've certainly filed my own share of bugs lacking vital information as well :)

#21 Updated by stsp 22 days ago

  • Status changed from In Progress to Feedback
  • Assignee changed from stsp to pespin

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)