Bug #6415 (open)
osmo_panic after truncated packet
Description
I'm running icE1usb connected to a Raspberry CM3 module.
A few times a day osmo-e1d crashes with the following output:
Fri Mar 22 17:16:59 2024 DLINP octoi_clnt_fsm.c:242 OCTOI_CLIENT(N11MD)[0x55a37878e0]{ACCEPTED}: Rx OCTOI ECHO_RESP (seq=690, rtt=777)
Fri Mar 22 17:17:02 2024 DE1D usb.c:150 (I0:L0) IN EP 82 ISO packet truncated: len-4 = 173
Assert failed size % 32 == 0 mux_demux.c:450
backtrace() returned 19 addresses
/usr/local/lib/libosmocore.so.21(osmo_generate_backtrace+0x18) [0x7f83880f60]
/usr/local/lib/libosmocore.so.21(+0x30aa0) [0x7f838a0aa0]
/usr/local/lib/libosmocore.so.21(osmo_panic+0xd4) [0x7f838a0b78]
osmo-e1d(+0x6e8c) [0x557f4a6e8c]
osmo-e1d(+0x787c) [0x557f4a787c]
/lib/aarch64-linux-gnu/libusb-1.0.so.0(+0xb1b4) [0x7f8381b1b4]
/lib/aarch64-linux-gnu/libusb-1.0.so.0(+0x116ec) [0x7f838216ec]
/lib/aarch64-linux-gnu/libusb-1.0.so.0(+0x1271c) [0x7f8382271c]
/lib/aarch64-linux-gnu/libusb-1.0.so.0(+0xabc8) [0x7f8381abc8]
/lib/aarch64-linux-gnu/libusb-1.0.so.0(libusb_handle_events_timeout_completed+0x218) [0x7f8381c09c]
/usr/local/lib/libosmousb.so.0(+0x1908) [0x7f83901908]
/usr/local/lib/libosmocore.so.21(+0x34398) [0x7f838a4398]
/usr/local/lib/libosmocore.so.21(+0x3450c) [0x7f838a450c]
/usr/local/lib/libosmocore.so.21(osmo_select_main+0x14) [0x7f838a4528]
osmo-e1d(+0x39a4) [0x557f4a39a4]
/lib/aarch64-linux-gnu/libc.so.6(+0x27780) [0x7f83627780]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0x98) [0x7f83627858]
osmo-e1d(+0x3bf0) [0x557f4a3bf0]
signal 6 received
As the packet is truncated, something has gone wrong in the USB communication or maybe in the firmware, but I think the process should not abort but try to continue.
// Peter
Updated by laforge 3 months ago
On Fri, Mar 22, 2024 at 05:44:25PM +0000, pfassberg wrote:
A few times a day osmo-e1d crashes with the following output:
Fri Mar 22 17:17:02 2024 DE1D usb.c:150 (I0:L0) IN EP 82 ISO packet truncated: len-4 = 173
This points to general USB stack issues. It means that either the USB host controller or the USB
host controller driver has decided to truncate an isochronous USB transfer, resulting in it no longer
containing an integral number of E1 frames (which are 32 bytes each).
The firmware always sends 32 byte frames, and never fractional frames.
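To make the arithmetic behind the failed assertion concrete, here is a minimal sketch in C. The helper names are hypothetical and not taken from osmo-e1d; only the 32-byte frame size and the 173-byte length come from the log above.

```c
#include <stdbool.h>
#include <stddef.h>

#define E1_FRAME_LEN 32	/* one E1 frame is 32 bytes, as stated above */

/* Hypothetical helper (not osmo-e1d code): true if the payload consists
 * of whole E1 frames only, i.e. the condition the assert checks. */
static bool e1_payload_is_aligned(size_t len)
{
	return (len % E1_FRAME_LEN) == 0;
}

/* Hypothetical helpers: complete frames in the payload, and the bytes of
 * a partial frame left over at the end. */
static size_t e1_whole_frames(size_t len)
{
	return len / E1_FRAME_LEN;
}

static size_t e1_leftover_bytes(size_t len)
{
	return len % E1_FRAME_LEN;
}
```

For the 173 bytes reported in the log, this gives 5 whole frames plus 13 leftover bytes, which is why `size % 32 == 0` fails. The leftover count says nothing about how many frames were lost before the truncation point.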
As the packet is truncated, something has gone wrong in the USB communication or maybe in the firmware, but I think the process should not abort but try to continue.
This kind of error points to a very serious problem on the USB side of things. It should never happen, and it
doesn't happen for me in (x86-based) icE1usb setups that have been running fine for many consecutive months,
in some cases likely years, without any such events.
I'm not sure we should simply plaster over it. We don't even know how many E1 frames might
have been truncated, so we don't know how many to substitute, etc.
Updated by laforge 20 days ago
- Project changed from E1/T1 Hardware Interface (including icE1usb) to osmo-e1d
- Category changed from firmware to icE1usb
Going back over old tickets...
I think the only situation in which the osmo_panic really hurts is when multiple devices are handled within one osmo-e1d. When one of them shows USB errors, there's no need to kill the daemon that serves the other devices, too.
So the proper course of action would be to close the [lib]usb device and re-open it, just as if somebody had re-plugged that single USB device.
This however depends on introducing a libusb hotplug event thread, see #4916
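A rough sketch of the per-device idea, assuming a hypothetical `struct e1_dev` and state enum (osmo-e1d's real data structures and its USB handling differ). Instead of asserting, only the affected device is flagged for a close/re-open cycle, and the other devices stay in service. The actual close and re-open through libusb is deliberately left out, since it depends on the hotplug work mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

#define E1_FRAME_LEN 32	/* bytes per E1 frame */

/* Hypothetical per-device state, for illustration only. */
enum dev_state { DEV_RUNNING, DEV_NEEDS_REOPEN };

struct e1_dev {
	int id;
	enum dev_state state;
};

/* Handle one received payload of 'len' bytes. A payload that is not a
 * whole number of frames flags this one device for re-opening instead of
 * aborting the whole process. Returns true if the payload was accepted. */
static bool dev_rx_payload(struct e1_dev *dev, size_t len)
{
	if (dev->state != DEV_RUNNING)
		return false;	/* device is waiting to be re-opened */

	if (len % E1_FRAME_LEN != 0) {
		dev->state = DEV_NEEDS_REOPEN;
		return false;
	}
	return true;
}
```

A caller would check for `DEV_NEEDS_REOPEN` from the main loop and perform the close/re-open there, treating it like a re-plug of that single device.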