https://projects.osmocom.org/https://projects.osmocom.org/favicon.ico?16647414092020-09-16T16:50:25ZOpen Source Mobile CommunicationsOsmoPCU - Bug #4756: osmo-pcu: assert hithttps://projects.osmocom.org/issues/4756?journal_id=196482020-09-16T16:50:25Zpespin
<ul></ul><p>Happened again, during a call, probably because the PDCH TS where the MS was laying was deleted (because I have all TS set as dynamic TCH_H/TCH_F/PDCH).</p>
<pre>
20200916184620436 DRLCMACMEAS gprs_rlcmac_meas.cpp:46 MS(TLLI=0xe37fcdf2, IMSI=901700000015256, TA=0, 12/12,) Rx Measurement Report: NC1 Serv -50 dbm
20200916184620598 DRLCMACMEAS gprs_rlcmac_meas.cpp:46 MS(TLLI=0xd90756fa, IMSI=901700000015254, TA=0, 12/12,) Rx Measurement Report: NC1 Serv -49 dbm
20200916184621657 DRLCMACMEAS gprs_rlcmac_meas.cpp:108 MS(TLLI=0xe37fcdf2, IMSI=901700000015256, TA=0, 12/12, UL) UL RSSI: -12 dBm
20200916184625518 DRLCMACMEAS gprs_rlcmac_meas.cpp:46 MS(TLLI=0xe37fcdf2, IMSI=901700000015256, TA=0, 12/12,) Rx Measurement Report: NC1 Serv -50 dbm
20200916184629354 DBSSGP gprs_bssgp_bss.c:64 BSSGP (BVCI=0) Tx SUSPEND (TLLI=0xd90756fa)
20200916184631961 DBSSGP gprs_bssgp_bss.c:64 BSSGP (BVCI=0) Tx SUSPEND (TLLI=0xe37fcdf2)
20200916184638476 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=549952, TS=2 (curr FN 550017)
20200916184638476 DTBF tbf.cpp:856 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=ASSIGN EGPRS) Timeout for polling PACKET CONTROL ACK for PACKET DOWNLINK ASSIGNMENT.
20200916184638476 DTBF tbf.cpp:1177 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=ASSIGN EGPRS) Assignment was on PACCH
20200916184638476 DTBF tbf.cpp:1185 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=ASSIGN EGPRS) No downlink ACK received yet
20200916184638854 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=550038, TS=2 (curr FN 550099)
20200916184639256 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=550121, TS=2 (curr FN 550186)
20200916184639634 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=550207, TS=2 (curr FN 550268)
20200916184640036 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=550290, TS=2 (curr FN 550355)
20200916184640085 DTBF tbf.cpp:1136 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=ASSIGN EGPRS) releasing due to PACCH assignment timeout.
20200916184640085 DTBF tbf.h:516 TBF(TFI=0 TLLI=0xd90756fa DIR=DL STATE=RELEASING EGPRS) FIXME: Software error: Pending downlink assignment in state GPRS_RLCMAC_DL_ASS_WAIT_ACK. This may not happen, because the assignment message never gets transmitted. Please be sure not to free in this state. PLEASE FIX!
20200916184643396 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=551022, TS=2 (curr FN 551083)
20200916184643396 DTBF tbf.cpp:856 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=ASSIGN EGPRS) Timeout for polling PACKET CONTROL ACK for PACKET DOWNLINK ASSIGNMENT.
20200916184643396 DTBF tbf.cpp:1177 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=ASSIGN EGPRS) Assignment was on PACCH
20200916184643396 DTBF tbf.cpp:1185 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=ASSIGN EGPRS) No downlink ACK received yet
20200916184643775 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=551104, TS=2 (curr FN 551165)
20200916184644176 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=551187, TS=2 (curr FN 551252)
20200916184644556 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=551273, TS=2 (curr FN 551334)
20200916184644956 DTBF tbf.cpp:806 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=551356, TS=2 (curr FN 551421)
20200916184645029 DTBF tbf.cpp:1136 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=ASSIGN EGPRS) releasing due to PACCH assignment timeout.
20200916184645029 DTBF tbf.h:516 TBF(TFI=0 TLLI=0xe37fcdf2 DIR=DL STATE=RELEASING EGPRS) FIXME: Software error: Pending downlink assignment in state GPRS_RLCMAC_DL_ASS_WAIT_ACK. This may not happen, because the assignment message never gets transmitted. Please be sure not to free in this state. PLEASE FIX!
Assert failed pdch != NULL /git/osmo-pcu/src/encoding.cpp:562
backtrace() returned 21 addresses
/build/new/out/lib/libosmocore.so.16(+0x146c2e) [0x7ffff6a01c2e]
/build/new/out/lib/libosmocore.so.16(osmo_generate_backtrace+0x18) [0x7ffff6a01ffd]
/build/new/out/lib/libosmocore.so.16(+0x1469b8) [0x7ffff6a019b8]
/build/new/out/lib/libosmocore.so.16(osmo_panic+0x13c) [0x7ffff6a01afe]
/build/new/out/bin/osmo-pcu(+0x29eda5) [0x5555557f2da5]
/build/new/out/bin/osmo-pcu(_ZN8Encoding30write_packet_uplink_assignmentEP16RlcMacDownlink_thhjhPK18gprs_rlcmac_ul_tbfhhhhab+0x1428) [0x5555557f4d04]
/build/new/out/bin/osmo-pcu(_ZN15gprs_rlcmac_tbf13create_ul_assEjh+0xa54) [0x5555557ac19c]
/build/new/out/bin/osmo-pcu(+0x2d3153) [0x555555827153]
/build/new/out/bin/osmo-pcu(_Z25gprs_rlcmac_rcv_rts_blockP15gprs_rlcmac_btshhjh+0x832) [0x555555829267]
/build/new/out/bin/osmo-pcu(pcu_rx_rts_req_pdtch+0x5d) [0x55555577bd63]
/build/new/out/bin/osmo-pcu(+0x228756) [0x55555577c756]
/build/new/out/bin/osmo-pcu(_Z6pcu_rxhP10gsm_pcu_if+0x176) [0x555555785b86]
/build/new/out/bin/osmo-pcu(+0x2c537f) [0x55555581937f]
/build/new/out/bin/osmo-pcu(+0x2c5a6c) [0x555555819a6c]
/build/new/out/lib/libosmocore.so.16(osmo_fd_disp_fds+0xdb2) [0x7ffff69b901a]
/build/new/out/lib/libosmocore.so.16(+0xfe654) [0x7ffff69b9654]
/build/new/out/lib/libosmocore.so.16(osmo_select_main+0x16) [0x7ffff69b9741]
/build/new/out/bin/osmo-pcu(main+0x2bb6) [0x555555748196]
/usr/lib/libc.so.6(__libc_start_main+0xf2) [0x7ffff5a31152]
/build/new/out/bin/osmo-pcu(_start+0x2e) [0x555555744d7e]
Program received signal SIGABRT, Aborted.
0x00007ffff5a46615 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff5a46615 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff5a2f862 in abort () from /usr/lib/libc.so.6
#2 0x00007ffff6a019c2 in osmo_panic_default (
fmt=0x5555558a3940 "Assert failed %s %s:%d\n", args=0x7fffffffd340)
at /git/libosmocore/src/panic.c:49
#3 0x00007ffff6a01afe in osmo_panic (
fmt=0x5555558a3940 "Assert failed %s %s:%d\n")
at /git/libosmocore/src/panic.c:84
#4 0x00005555557f2da5 in gen_freq_params (freq_params=0x6340003608cc,
tbf=0x7ffff21bb860)
at /git/osmo-pcu/src/encoding.cpp:562
#5 0x00005555557f4d04 in Encoding::write_packet_uplink_assignment (
block=0x634000360860, old_tfi=1 '\001', old_downlink=0 '\000',
tlli=3641136890, use_tlli=1 '\001', tbf=0x7ffff21bb860, poll=1 '\001',
rrbp=0 '\000', alpha=0 '\000', gamma=0 '\000', ta_idx=-1 '\377',
use_egprs=true)
at /git/osmo-pcu/src/encoding.cpp:678
#6 0x00005555557ac19c in gprs_rlcmac_tbf::create_ul_ass (this=0x7ffff21bb860,
fn=561734, ts=7 '\a')
at /git/osmo-pcu/src/tbf.cpp:1372
#7 0x0000555555827153 in sched_select_ctrl_msg (trx=0 '\000', ts=7 '\a',
fn=561734, block_nr=7 '\a', pdch=0x555555b11800 <s_bts+4320>,
ul_ass_tbf=0x7ffff21bb860, dl_ass_tbf=0x0, ul_ack_tbf=0x0)
--Type <RET> for more, q to quit, c to continue without paging--
at /git/osmo-pcu/src/gprs_rlcmac_sched.cpp:215
#8 0x0000555555829267 in gprs_rlcmac_rcv_rts_block (
bts=0x555555b10728 <s_bts+8>, trx=0 '\000', ts=7 '\a', fn=561734,
block_nr=7 '\a')
at /git/osmo-pcu/src/gprs_rlcmac_sched.cpp:427
#9 0x000055555577bd63 in pcu_rx_rts_req_pdtch (trx=0 '\000', ts=7 '\a',
fn=561734, block_nr=7 '\a')
at /git/osmo-pcu/src/pcu_l1_if.cpp:386
#10 0x000055555577c756 in pcu_rx_rts_req (rts_req=0x7fffffffdaf4)
at /git/osmo-pcu/src/pcu_l1_if.cpp:420
#11 0x0000555555785b86 in pcu_rx (msg_type=16 '\020', pcu_prim=0x7fffffffdaf0)
at /git/osmo-pcu/src/pcu_l1_if.cpp:765
#12 0x000055555581937f in pcu_sock_read (bfd=0x555555b1a3e0 <pcu_sock_state>)
at /git/osmo-pcu/src/osmobts_sock.cpp:141
#13 0x0000555555819a6c in pcu_sock_cb (bfd=0x555555b1a3e0 <pcu_sock_state>,
flags=1) at /git/osmo-pcu/src/osmobts_sock.cpp:196
#14 0x00007ffff69b901a in osmo_fd_disp_fds (_rset=0x7fffffffddc0,
_wset=0x7fffffffde60, _eset=0x7fffffffdf00)
at /git/libosmocore/src/select.c:227
#15 0x00007ffff69b9654 in _osmo_select_main (polling=0)
at /git/libosmocore/src/select.c:265
#16 0x00007ffff69b9741 in osmo_select_main (polling=0)
at /git/libosmocore/src/select.c:274
--Type <RET> for more, q to quit, c to continue without paging--
#17 0x0000555555748196 in main (argc=7, argv=0x7fffffffe1f8)
at /git/osmo-pcu/src/pcu_main.cpp:357
</pre> OsmoPCU - Bug #4756: osmo-pcu: assert hithttps://projects.osmocom.org/issues/4756?journal_id=196492020-09-16T17:15:56Zpespin
<ul></ul><p>I confirm I can reproduce this by placing a call between 2 MS whenever all timeslots are set to TCH/F_TCH/H_PDCH.</p>
<p>1- MS are GPRS attached to the network, probably doing data transfers<br />2- call MO_MT between them<br />3- Take the call, leave it for a few seconds and hang the call<br />4- pcu crashes.</p> OsmoPCU - Bug #4756: osmo-pcu: assert hithttps://projects.osmocom.org/issues/4756?journal_id=196502020-09-16T18:16:17Zfixeria
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul> OsmoPCU - Bug #4756: osmo-pcu: assert hithttps://projects.osmocom.org/issues/4756?journal_id=196532020-09-16T21:28:41Zfixeria
<ul></ul><p>This is very interesting, because we used to dereference that pointer without any checks before my hopping patches:</p>
<pre><code class="c syntaxhl"><span class="kt">void</span> <span class="n">Encoding</span><span class="o">::</span><span class="n">write_packet_uplink_assignment</span><span class="p">(...)</span>
<span class="p">{</span>
<span class="c1">// ...</span>
<span class="cm">/* Frequency Parameters IE */</span>
<span class="n">fp</span><span class="o">-></span><span class="n">TSC</span> <span class="o">=</span> <span class="n">tbf</span><span class="o">-></span><span class="n">tsc</span><span class="p">();</span> <span class="c1">// Training Sequence Code (TSC)</span>
<span class="n">fp</span><span class="o">-></span><span class="n">UnionType</span> <span class="o">=</span> <span class="mh">0x00</span><span class="p">;</span> <span class="c1">// Single ARFCN</span>
<span class="n">fp</span><span class="o">-></span><span class="n">u</span><span class="p">.</span><span class="n">ARFCN</span> <span class="o">=</span> <span class="n">tbf</span><span class="o">-></span><span class="n">trx</span><span class="o">-></span><span class="n">arfcn</span><span class="p">;</span>
<span class="p">}</span>
</code></pre>
<p>here is the definition of tbf->tsc():</p>
<pre><code class="c syntaxhl"><span class="kt">uint8_t</span> <span class="n">gprs_rlcmac_tbf</span><span class="o">::</span><span class="n">tsc</span><span class="p">()</span> <span class="k">const</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">trx</span><span class="o">-></span><span class="n">pdch</span><span class="p">[</span><span class="n">first_ts</span><span class="p">].</span><span class="n">tsc</span><span class="p">;</span>
<span class="p">}</span>
</code></pre>
<p>so I am pretty sure we had this bug, but never noticed it?</p> OsmoPCU - Bug #4756: osmo-pcu: assert hithttps://projects.osmocom.org/issues/4756?journal_id=196542020-09-16T21:31:30Zfixeria
<ul></ul><blockquote>
<p>so I am pretty sure we had this bug, but never noticed it?</p>
</blockquote>
<p>I'll try to implement a TTCN-3 test case, so we can see if the problem does exist in the 'latest'.</p> OsmoPCU - Bug #4756: osmo-pcu: assert hithttps://projects.osmocom.org/issues/4756?journal_id=196922020-09-21T14:11:24Zfixeria
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li><li><strong>Assignee</strong> changed from <i>fixeria</i> to <i>pespin</i></li></ul><p>Unfortunately, I was not able to reproduce the problem, neither in GPRS nor in EGPRS mode.</p> OsmoPCU - Bug #4756: osmo-pcu: assert hithttps://projects.osmocom.org/issues/4756?journal_id=197042020-09-22T17:48:53Zpespin
<ul></ul><p>I have been looking at this for a while and there's definetly a bug with TS allocation producing a later crash, but I'm still unsure at what's exactly the issue.</p>
<p>So as a reminder, I have TS=CCCH+SDCCH4, TS1..5=TCH/F_TCH/H_PDCH, TS6..7=PDCH. I also have "channel allocator descending", which means TS 4 and 5 are switched to TCH to handle the call, which means they get deactivated in PCU. That's working fine. I also added a VTY command to verify during the call the TS are marked as disabled in PCU.<br /><pre>
20200922184508812 DL1IF pcu_l1_if.cpp:523 BTS available
20200922184508813 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=1): tsc=7, hopping=no
20200922184508813 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=2): tsc=7, hopping=no
20200922184508813 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=3): tsc=7, hopping=no
20200922184508813 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=4): tsc=7, hopping=no
20200922184508813 DL1IF pcu_l1_if.cpp:151 Sending deactivate request: trx=0 ts=5
20200922184508813 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=6): tsc=7, hopping=no
20200922184508813 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=7): tsc=7, hopping=no
20200922184511181 DL1IF pcu_l1_if.cpp:744 GPRS Suspend request received: TLLI=0xc35fc59c RAI=234-70-5-0
20200922184511181 DBSSGP gprs_bssgp_bss.c:64 BSSGP (BVCI=0) Tx SUSPEND (TLLI=0xc35fc59c)
20200922184511656 DL1IF pcu_l1_if.cpp:523 BTS available
20200922184511656 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=1): tsc=7, hopping=no
20200922184511656 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=2): tsc=7, hopping=no
20200922184511656 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=3): tsc=7, hopping=no
20200922184511656 DL1IF pcu_l1_if.cpp:151 Sending deactivate request: trx=0 ts=4
20200922184511656 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=6): tsc=7, hopping=no
20200922184511656 DL1IF pcu_l1_if.cpp:679 PDCH (trx=0, ts=7): tsc=7, hopping=no
</pre></p>
<p>Now, I added some more debugging during chan allocation:<br /><pre>
20200922184517883 DTBF tbf.cpp:997 Allocating UL TBF: MS_CLASS=12/12
20200922184517883 DTBF tbf.cpp:1009 TBF(TFI=0 TLLI=0x00000000 DIR=UL STATE=NULL EGPRS) Enabled EGPRS, mode EGPRS
20200922184517883 DRLCMAC gprs_rlcmac_ts_alloc.cpp:895 [UL] algo B <multi> (suggested TRX: 0): reserved slots: ul=0x0c dl=0x0e, first_comon_ts=ffffffff
20200922184517883 DRLCMAC gprs_rlcmac_ts_alloc.cpp:904 [UL] algo B <multi> (suggested TRX: 0): first_ts=2
20200922184517883 DRLCMAC gprs_rlcmac_ts_alloc.cpp:929 [UL] algo B <multi> (suggested TRX: 0): first_common_ts=3
20200922184517883 DRLCMAC gprs_rlcmac_ts_alloc.cpp:941 [UL] algo B <multi> (suggested TRX: 0): using 1/1 slots (0x08)
20200922184517883 DL1IF bts.cpp:1071 TRX reserve mask dir=0: 0x0e
20200922184517884 DL1IF bts.cpp:1071 TRX reserve mask dir=1: 0x08
20200922184517884 DRLCMAC gprs_rlcmac_ts_alloc.cpp:807 - Assigning UL TS 3
20200922184517884 DRLCMAC gprs_rlcmac_ts_alloc.cpp:225 Attaching TBF tbf->pdch[3] = 0x555555b1e360
20200922184517884 DTBF tbf.cpp:539 TBF(TFI=1 TLLI=0x00000000 DIR=UL STATE=NULL EGPRS) Setting Control TS 3
20200922184517884 DTBF tbf.cpp:952 TBF(TFI=1 TLLI=0xf99b124d DIR=UL STATE=NULL EGPRS) Allocated: trx = 0, ul_slots = 08, dl_slots = 00
20200922184517884 DTBFUL tbf_ul.cpp:613 TBF(TFI=1 TLLI=0xf99b124d DIR=UL STATE=NULL EGPRS) setting EGPRS UL window size to 64, base(64) slots(1) ws_pdch(0)
20200922184517893 DTBF tbf.cpp:1379 TBF(TFI=1 TLLI=0xf99b124d DIR=UL STATE=ASSIGN EGPRS) start Packet Uplink Assignment (PACCH)
Assert failed pdch != NULL /home/pespin/dev/sysmocom/git/osmo-pcu/src/encoding.cpp:562
</pre></p>
<p>Not seen in the text above, but in the pcap file in debug I see (so available TS in C are correct: 1,2,3,7,8:<br /><pre>
63499 18:45:17.883736890 127.0.0.1 34946 127.0.0.1 4729 GSMTAP 189 - Possible DL/UL slots: (TS=0)".CCC..CC"(TS=7)
</pre></p>
<p>Then, after doing the calculations it decides the available slots are: reserved slots: ul=0x0c dl=0x0e so 00001100 and 00001110.<br />From there, it selects fn_first=2, and (only seen in debug pcap), afterwards it says "- Skipping TS 2, because num TBFs 1 > 0" and immediatelly after sets fn_common_first=3.<br />Later it calculates the amount of slots to 1.<br />So if you sum all the points above, there's only 1 TS being selected for this PDCH, yet fn_first and fn_common_first differ, which makes no sense to me. The issue seems to come from UL TBF branch calling allocate_usf() in alloc_algorithm_b(), which updates "reserved_ul_slots" and "ul_slots" later passed to update_ms_reserved_slots and assign_ul_tbf_slots, but forgets to update first_ts accordingly.</p> OsmoPCU - Bug #4756: osmo-pcu: assert hithttps://projects.osmocom.org/issues/4756?journal_id=197072020-09-22T18:41:48Zpespin
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>90</i></li></ul><p>Should be fixed by:<br /><a class="external" href="https://gerrit.osmocom.org/c/osmo-pcu/+/20254">https://gerrit.osmocom.org/c/osmo-pcu/+/20254</a> Fix crash accessing NULL tbf->pdch[first_ts]</p>
<p>AFAICT this bug has been there since Jan 31 2018.</p> OsmoPCU - Bug #4756: osmo-pcu: assert hithttps://projects.osmocom.org/issues/4756?journal_id=197572020-09-30T20:03:02Zlaforge
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul>