Bug #6209
closedmodem: no response to ICMP Echo Req (ping to MS) with mssdr-ms
100%
Description
I am trying to ping the MS from the BTS side, but seeing lots of errors and almost no Echo Rsp.
trxcon logs errors about GPRS UL BLOCK.req without prior TBF CFG.req:
20231005180900657 DSCH NOTICE trxcon(0)[0x5572f10900]: Reset scheduler (sched_trx.c:190) 20231005180900657 DSCH NOTICE trxcon(0)[0x5572f10900]: Delete TDMA timeslot #0 (sched_trx.c:226) 20231005180900657 DSCH NOTICE trxcon(0)[0x5572f10900]: Add a new TDMA timeslot #3 (sched_trx.c:207) 20231005180900657 DSCH NOTICE trxcon(0)[0x5572f10900]: (Re)configure TDMA timeslot #3 as PDCH (sched_trx.c:276) 20231005180900657 DSCH NOTICE trxcon(0)[0x5572f10900]: TS3-PDTCH activating (sched_trx.c:476) 20231005180900657 DSCH NOTICE trxcon(0)[0x5572f10900]: TS3-PTCCH activating (sched_trx.c:476) 20231005180900843 DSCH NOTICE trxcon(0)[0x5572f10900]: Delete TDMA timeslot #3 (sched_trx.c:226) 20231005180900843 DGPRS ERROR trxcon(0)[0x5572f10900]: (PDCH-3) Rx UL BLOCK.req (fn=2003126, len=34), but this PDCH has no configured TBFs (l1gprs.c:643) 20231005180900871 DL1C NOTICE trxcon(0)[0x5572f10900]{PACKET_DATA}: Received reset request (FULL) (l1ctl.c:439) 20231005180900871 DSCH NOTICE trxcon(0)[0x5572f10900]: Reset scheduler and clock counter (sched_trx.c:190) 20231005180900872 DL1C NOTICE trxcon(0)[0x5572f10900]{RESET}: Received FBSB request (GSM900 979, timeout 100 TDMA FNs) (l1ctl.c:380) 20231005180900872 DSCH NOTICE trxcon(0)[0x5572f10900]: Add a new TDMA timeslot #0 (sched_trx.c:207) 20231005180900872 DSCH NOTICE trxcon(0)[0x5572f10900]: (Re)configure TDMA timeslot #0 as BCCH+CCCH+SDCCH/4+SACCH/4 (sched_trx.c:276) 20231005180900872 DSCH NOTICE trxcon(0)[0x5572f10900]: TS0-SCH activating (sched_trx.c:476) 20231005180900872 DSCH NOTICE trxcon(0)[0x5572f10900]: TS0-BCCH activating (sched_trx.c:476) 20231005180900872 DSCH NOTICE trxcon(0)[0x5572f10900]: TS0-RACH activating (sched_trx.c:476) 20231005180900872 DSCH NOTICE trxcon(0)[0x5572f10900]: TS0-CCCH activating (sched_trx.c:476) 20231005180900890 DAPP ERROR trxcon(0)[0x5572f10900]{FBSB_SEARCH}: Event TRXCON_EV_RX_DATA_IND not permitted (trxcon_shim.c:93) 20231005180901644 DL1C NOTICE trxcon(0)[0x5572f10900]{BCCH_CCCH}: Received 8-bit RACH request (offset=0, ra=0x78) (l1ctl.c:542) 20231005180901646 DSCHD NOTICE trxcon(0)[0x5572f10900]: TS0-RACH Scheduled 8-bit RACH (TS0: GSM, GMSK) at fn=2003300 (sched_lchan_rach.c:130) 20231005180901834 DL1C NOTICE trxcon(0)[0x5572f10900]{BCCH_CCCH}: Received L1CTL_DM_EST_REQ (tn=4, chan_nr=0xc4, tsc=7, tch_mode=SIGNALLING) (l1ctl.c:630) 20231005180901834 DL1C NOTICE trxcon(0)[0x5572f10900]{BCCH_CCCH}: L1CTL_DM_EST_REQ indicates single ARFCN GSM900 979 (l1ctl.c:572) 20231005180901834 DSCH NOTICE trxcon(0)[0x5572f10900]: Reset scheduler (sched_trx.c:190) 20231005180901834 DSCH NOTICE trxcon(0)[0x5572f10900]: Delete TDMA timeslot #0 (sched_trx.c:226) 20231005180901834 DSCH NOTICE trxcon(0)[0x5572f10900]: Add a new TDMA timeslot #4 (sched_trx.c:207) 20231005180901834 DSCH NOTICE trxcon(0)[0x5572f10900]: (Re)configure TDMA timeslot #4 as PDCH (sched_trx.c:276) 20231005180901834 DSCH NOTICE trxcon(0)[0x5572f10900]: TS4-PDTCH activating (sched_trx.c:476) 20231005180901834 DSCH NOTICE trxcon(0)[0x5572f10900]: TS4-PTCCH activating (sched_trx.c:476) 20231005180902026 DSCH NOTICE trxcon(0)[0x5572f10900]: Delete TDMA timeslot #4 (sched_trx.c:226) 20231005180902026 DGPRS ERROR trxcon(0)[0x5572f10900]: (PDCH-4) Rx UL BLOCK.req (fn=2003382, len=34), but this PDCH has no configured TBFs (l1gprs.c:643) 20231005180902054 DL1C NOTICE trxcon(0)[0x5572f10900]{PACKET_DATA}: Received reset request (FULL) (l1ctl.c:439) 20231005180902054 DSCH NOTICE trxcon(0)[0x5572f10900]: Reset scheduler and clock counter (sched_trx.c:190) 20231005180902055 DL1C NOTICE trxcon(0)[0x5572f10900]{RESET}: Received FBSB request (GSM900 979, timeout 100 TDMA FNs) (l1ctl.c:380) 20231005180902055 DSCH NOTICE trxcon(0)[0x5572f10900]: Add a new TDMA timeslot #0 (sched_trx.c:207) 20231005180902055 DSCH NOTICE trxcon(0)[0x5572f10900]: (Re)configure TDMA timeslot #0 as BCCH+CCCH+SDCCH/4+SACCH/4 (sched_trx.c:276) 20231005180902055 DSCH NOTICE trxcon(0)[0x5572f10900]: TS0-SCH activating (sched_trx.c:476) 20231005180902055 DSCH NOTICE trxcon(0)[0x5572f10900]: TS0-BCCH activating (sched_trx.c:476) 20231005180902055 DSCH NOTICE trxcon(0)[0x5572f10900]: TS0-RACH activating (sched_trx.c:476) 20231005180902055 DSCH NOTICE trxcon(0)[0x5572f10900]: TS0-CCCH activating (sched_trx.c:476) 20231005180902511 DL1D NOTICE trxcon(0)[0x5572f10900]: L1CTL connection error: read() failed (rc=0): Resource temporarily unavailable (l1ctl_server.c:55) 20231005180902511 DL1C NOTICE trxcon(0)[0x5572f10900]: Closing L1CTL connection (l1ctl_server.c:203) 20231005180902511 DSCH NOTICE trxcon(0)[0x5572f10900]: Shutdown scheduler (sched_trx.c:173) 20231005180902511 DSCH NOTICE trxcon(0)[0x5572f10900]: Delete TDMA timeslot #0 (sched_trx.c:226) <pre> Please find the PCAPs (MS side + BTS side) attached.
Files
Related issues
Updated by pespin 7 months ago
- Status changed from New to Feedback
- Assignee changed from pespin to fixeria
This looks like we are transmitting an UL block during free of a tbf and immediately after going back to CCCH.
It may be fixed by libosmo-gprs.git which was submitted earlier this week (df4f1cf0a3026b83177410008a8dd5794effe3ba).
In order to know I need matching pcaps with osmo-trx-ms logs, sorry.
Updated by fixeria 7 months ago
- File mssdr_gprs_ms_ping.pcap.gz mssdr_gprs_ms_ping.pcap.gz added
- Status changed from In Progress to Feedback
- Assignee changed from fixeria to pespin
Hi Pau,
I did some more testing, and somehow I am not seeing these trxcon errors anymore:
20231005180900843 DGPRS ERROR trxcon(0)[0x5572f10900]: (PDCH-3) Rx UL BLOCK.req (fn=2003126, len=34), but this PDCH has no configured TBFs (l1gprs.c:643)
and furthermore I saw 3 ICMP Echo Resp made it back to the GGSN side. But after that the modem got stuck on PDCH endlessly receiving dummy blocks.
I recorded a PCAP on the MS side only, but this time it should contain everything:
- osmo-trx-ms/trxcon logging,
- layer23/modem Um frames and logging.
Here is a patch that makes layer23/modem send GPRS blocks over GSMTAP:
https://gerrit.osmocom.org/c/osmocom-bb/+/34733 layer23: send UL/DL GPRS blocks over GSMTAP
Assigning back to you. Let me know if anything is missing.
Updated by pespin 7 months ago
fixeria mssdr_gprs_ms_ping.pcap.gz starting from frame_nr 2554 seems to be buggy, submitting empty GSMTAP blocks:
- We should probably fix your https://gerrit.osmocom.org/c/osmocom-bb/+/34733 patch to avoid submitting empty blocks to gsmtap?
- We should fix wireshark to show empty GSMTAP payload as malformed.
Updated by pespin 7 months ago
fixeria, Hoernchen the empty gsmtap packets seems to be buggy RLC/MAC blocks received on the DL at the ms-sdr, see for instance:
2550 12:13:30.287279 Oct 13, 2023 14:13:30.287279000 CEST 127.0.0.1 59719 127.0.0.1 4729 GSMTAP 230 trxcon(0)[0x55b23abd20]: TS3-PDTCH Received bad frame (rc=-1, ber=444/444) at fn=496509 2551 12:13:30.287485 Oct 13, 2023 14:13:30.287485000 CEST 127.0.0.1 58441 127.0.1.1 4729 GSMTAP 222 GPRS-RR(1)[0x557190c470]{PACKET_TRANSFER}: Received Event GRR_EV_PDCH_BLOCK_IND 2552 12:13:30.287541 Oct 13, 2023 14:13:30.287541000 CEST 127.0.0.1 58441 127.0.1.1 4729 GSMTAP 191 Rx from lower layers: L1CTL-PDCH_DATA.indication 2553 12:13:30.287584 Oct 13, 2023 14:13:30.287584000 CEST 127.0.0.1 58441 127.0.1.1 4729 GSMTAP 179 Dropping DL data block with length 0 2554 12:13:30.287649 Oct 13, 2023 14:13:30.287649000 CEST 127.0.0.1 51492 127.0.0.1 4729 GSMTAP 58 [Malformed Packet]
So since DL blocks cannot be decoded, that means no USF can be decoded and hence "modem" cannot transmit anymore on the UL because it is not "scheduled" by any USF from the network.
Updated by pespin 7 months ago
Regarding the problem of MS staying for too long (or forever) in PDCH instead of going back to CCCH:
At the time the MS stops receiving DL blocks, it has a UL and a DL TBF assigned. The timers below would be the responsibles in this scenario to move the MS back to CCCH:
-UL TBF:
T3180 "Wait for Uplink State Flag After Data Block"
When transmitting an RLC/MAC block to the
network an instance of T3180 is started for
the TBF for which the block was intended. 5s.
-DL TBF:
T3190 "Wait for Valid Downlink Data Received from the Network". 5s.
So both are 5s. Last time the timers would be armed:
2395 12:13:30.084197 Oct 13, 2023 14:13:30.084197000 CEST 127.0.0.1 51492 127.0.0.1 4729 GSM RLC/MAC 98 GPRS DL DATA: CS3 TFI=0 BSN=15 USF=7 Len=36 2549 12:13:30.265814 Oct 13, 2023 14:13:30.265814000 CEST 127.0.0.1 51492 127.0.0.1 4729 GSM RLC/MAC 92 GPRS UL DATA: CS2 TFI=0 BSN=0 CV=3 Len=30
T3190 (DL TBF) is actually working, as can be seen here:
5240 12:13:35.084188 Oct 13, 2023 14:13:35.084188000 CEST 127.0.0.1 58441 127.0.1.1 4729 GSMTAP 186 TBF(DL:NR-2:TLLI-f0085356) Timeout of T3190
However, T3180 is not yet implemented on the UL TBF side, and hence even if T3190 triggers on the DL TBF side and frees it, the MS still keeps the UL TBF established and as a result it stays in PDCH.
Implementing T3180 should fix that problem, I'll do it now.
Updated by pespin 7 months ago
Find here some timer & test fixes, plus a patch implementing missing T3180:
remote: https://gerrit.osmocom.org/c/libosmo-gprs/+/34770 rlcmac: Avoid re-arming T3182 in TBF Acknowledged mode [NEW]
remote: https://gerrit.osmocom.org/c/libosmo-gprs/+/34771 rlcmac: rlcmac: tests: Account for T3180 existance in test_ul_tbf_t3182_timeout [NEW]
remote: https://gerrit.osmocom.org/c/libosmo-gprs/+/34772 rlcmac: tests: Account for T3180 existance in test_ul_tbf_t3166_timeout [NEW]
remote: https://gerrit.osmocom.org/c/libosmo-gprs/+/34773 rlcmac: ul_tbf: Implement T3180 [NEW]
After those are merge, I'd expect the MS to go to CCCH after 5 seconds given the current situation with lower layers.