https://projects.osmocom.org/https://projects.osmocom.org/favicon.ico?16647414092021-08-06T17:11:22ZOpen Source Mobile CommunicationsOsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=224302021-08-06T17:11:22Zlaforge
<ul><li><strong>Assignee</strong> set to <i>neels</i></li></ul> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=224312021-08-06T17:12:14Zlaforge
<ul></ul> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=224452021-08-09T16:40:26Zneelsnhofmeyr@sysmocom.de
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>70</i></li></ul><p>submitted a patch to fix the immediate crash cause: <a class="external" href="https://gerrit.osmocom.org/c/osmo-pcu/+/25182">https://gerrit.osmocom.org/c/osmo-pcu/+/25182</a></p>
<p>I haven't identified any higher level reason why the DL TBF would be NULL even though it apparently is always expected non-NULL.<br />Anyway, not crashing the entire PCU is a good thing, maybe we still need to figure out a higher level error later on.</p> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=225372021-09-15T09:25:42Zlaforge
<ul><li><strong>Assignee</strong> changed from <i>neels</i> to <i>pespin</i></li></ul> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=225612021-09-15T12:35:22Zpespin
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li><li><strong>Assignee</strong> changed from <i>pespin</i> to <i>iedemam</i></li></ul><p>As far as I know this should be fixed in master versions of osmo-pcu. I asked <a class="user active" href="https://projects.osmocom.org/users/732038">iedemam</a> a few days ago to test it with a recent version of osmo-pcu after the FSM refactorings where merged, and afair the issue was solved.</p>
<p>Assigning to him for feedback.</p> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=225892021-09-24T10:05:57Ziedemam
<ul></ul><p>I think this crash is still an issue. Here is a trace from a version which includes all submitted fixes.</p>
<pre>
# osmo-pcu --version
OsmoPCU version 0.9.0-35d51ca4
</pre>
<pre>
<0008> tbf.cpp:381 TBF(TFI=5 TLLI=0xb9ce6756 DIR=UL STATE=FLOW EGPRS) N3101 exceeded MAX (10)
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216522 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216527 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216531 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216535 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216540 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216544 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216548 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1216557 but previous FN=1216553 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=1216557 but previous FN=1216553 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=1216553): TBF(TFI=1 TLLI=0x8dc3ee16 DIR=DL STATE=ASSIGN EGPRS)
<0008> tbf.cpp:538 TBF(TFI=1 TLLI=0x8dc3ee16 DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=1216553, TS=7 (curr FN 1216557)
<0008> tbf_dl_ass_fsm.c:196 TBF(TFI=1 TLLI=0x8dc3ee16 DIR=DL STATE=ASSIGN EGPRS) Timeout for polling PACKET CONTROL ACK for PACKET DOWNLINK ASSIGNMENT: |Assignment was on PACCH|No downlink ACK received yet|
Program received signal SIGSEGV, Segmentation fault.
gprs_rlcmac_pdch::rcv_control_egprs_dl_ack_nack (this=this@entry=0x7af430, ack_nack=ack_nack@entry=0x8e6e50, fn=fn@entry=1216561, meas=meas@entry=0x7fffffffe290) at pdch.cpp:511
511 pdch.cpp: No such file or directory.
No symbol "all" in current context.
$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {128672, 0, 160, 0, 89408, 0, 40297152, 0, 1666373180, 32765, 682903690, 32695, 1, 0, 160, 0, 29012848, 0, 35819584, 0, 44, 0, 6622689, 0, 8149536, 0, 40171264, 0}, _kill = {si_pid = 128672, si_uid = 0}, _timer = {si_tid = 128672, si_overrun = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _rt = {si_pid = 128672, si_uid = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _sigchld = {si_pid = 128672, si_uid = 0, si_status = 160, si_utime = 384004436000768, si_stime = 173074949961940992}, _sigfault = {si_addr = 0x1f6a0, _addr_lsb = 160, _addr_bnd = {_lower = 0x15d40, _upper = 0x266e2c0}}, _sigpoll = {si_band = 128672, si_fd = 160}}}
</pre>
<p>It looks like another unprotected tbf read is occurring? That section of code is:</p>
<pre>
tbf = as_dl_tbf(poll->tbf_poll.poll_tbf);
if (tbf->tfi() != tfi) {
LOGPDCH(this, DRLCMAC, LOGL_NOTICE, "EGPRS PACKET DOWNLINK ACK with "
"wrong TFI=%d, ignoring!\n", tfi);
return;
}
</pre> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=225902021-09-24T10:13:10Ziedemam
<ul></ul><p>Same version, a new type of SEGV.</p>
<pre>
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=5) Expiring FN=1369155 but previous FN=1369103 is still reserved!
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=5) Expiring FN=1369155 but previous FN=1369108 is still reserved!
<0008> tbf.cpp:431 TBF(TFI=25 TLLI=0x959e1446 DIR=DL STATE=RELEASING EGPRS) T3191 timeout expired, freeing TBF: |Assignment was on PACCH|Downlink ACK was received|
<0008> tbf_fsm.c:363 TBF(TFI=3 TLLI=0xa3415a40 DIR=UL STATE=ASSIGN EGPRS) releasing due to PACCH assignment timeout.
<0002> bts.cpp:314 Detected FN jump! 1369160 -> 1369173
<0008> tbf.cpp:431 TBF(TFI=6 TLLI=0x8cbbee1e DIR=DL STATE=RELEASING EGPRS) T3191 timeout expired, freeing TBF: |Assignment was on PACCH|Downlink ACK was received|
<0002> bts.cpp:314 Detected FN jump! 1369173 -> 1369207
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=4) Expiring FN=1369207 but previous FN=1369160 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=4) Timeout for registered POLL (FN=1369160): TBF(TFI=18 TLLI=0xf7b1f400 DIR=DL STATE=WAIT_RELEASE EGPRS
<0008> tbf.cpp:538 TBF(TFI=18 TLLI=0xf7b1f400 DIR=DL STATE=WAIT_RELEASE EGPRS poll timeout for FN=1369160, TS=4 (curr FN 1369207)
<0008> tbf_ul_ass_fsm.c:224 TBF(TFI=18 TLLI=0xf7b1f400 DIR=DL STATE=WAIT_RELEASE EGPRS Timeout for polling PACKET CONTROL ACK for PACKET UPLINK ASSIGNMENT: |Assignment was on PACCH|Downlink ACK was received|
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=6) Expiring FN=1369207 but previous FN=1369181 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=6) Timeout for registered POLL (FN=1369181): TBF(TFI=24 TLLI=0xe5643a40 DIR=DL STATE=ASSIGN EGPRS)
<0008> tbf.cpp:538 TBF(TFI=24 TLLI=0xe5643a40 DIR=DL STATE=ASSIGN EGPRS) poll timeout for FN=1369181, TS=6 (curr FN 1369207)
<0008> tbf_dl_ass_fsm.c:196 TBF(TFI=24 TLLI=0xe5643a40 DIR=DL STATE=ASSIGN EGPRS) Timeout for polling PACKET CONTROL ACK for PACKET DOWNLINK ASSIGNMENT: |Assignment was on PACCH|No downlink ACK received yet|
<0002> bts.cpp:314 Detected FN jump! 1369207 -> 1369220
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=1369220 but previous FN=1369038 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=1369038): TBF(TFI=1 TLLI=0xb622b676 DIR=UL STATE=ASSIGN EGPRS)
<0008> tbf.cpp:538 TBF(TFI=1 TLLI=0xb622b676 DIR=UL STATE=ASSIGN EGPRS) poll timeout for FN=1369038, TS=7 (curr FN 1369220)
<0008> tbf_ul_ass_fsm.c:224 TBF(TFI=1 TLLI=0xb622b676 DIR=UL STATE=ASSIGN EGPRS) Timeout for polling PACKET CONTROL ACK for PACKET UPLINK ASSIGNMENT: |Assignment was on PACCH|No uplink data received yet|
<0002> pdch_ul_controller.c:315 PDCH(bts=0,trx=1,ts=7) Expiring FN=1369220 but previous FN=1369069 is still reserved!
<0002> pdch_ul_controller.c:329 PDCH(bts=0,trx=1,ts=7) Timeout for registered POLL (FN=1369069): TBF(TFI=3 TLLI=0xffffffff DIR=UL STATE=NULL EGPRS)
<0008> tbf.cpp:538 TBF(TFI=3 TLLI=0xffffffff DIR=UL STATE=NULL EGPRS) poll timeout for FN=1369069, TS=7 (curr FN 1369220)
Program received signal SIGSEGV, Segmentation fault.
gprs_rlcmac_tbf::poll_timeout (this=0xd3f030, pdch=0x7af430, poll_fn=1369069, reason=PDCH_ULC_POLL_UL_ASS) at tbf.cpp:551
551 tbf.cpp: No such file or directory.
No symbol "all" in current context.
$1 = {si_signo = 11, si_errno = 0, si_code = 1, _sifields = {_pad = {52, 0, 160, 0, 77632, 0, 35389632, 0, -561014292, 32766, -17229686, 32701, 1, 0, 160, 0, 24068976, 0, 30876096, 0, 44, 0, 6622689, 0, 8149536, 0, 34555216, 0}, _kill = {si_pid = 52, si_uid = 0}, _timer = {si_tid = 52, si_overrun = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _rt = {si_pid = 52, si_uid = 0, si_sigval = {sival_int = 160, sival_ptr = 0xa0}}, _sigchld = {si_pid = 52, si_uid = 0, si_status = 160, si_utime = 333426901123072, si_stime = 151997312057475072}, _sigfault = {si_addr = 0x34, _addr_lsb = 160, _addr_bnd = {_lower = 0x12f40, _upper = 0x21c00c0}}, _sigpoll = {si_band = 52, si_fd = 160}}}
</pre>
<p>And the code context for this crash is:</p>
<pre>
void gprs_rlcmac_tbf::poll_timeout(struct gprs_rlcmac_pdch *pdch, uint32_t poll_fn, enum pdch_ulc_tbf_poll_reason reason)
{
gprs_rlcmac_ul_tbf *ul_tbf = as_ul_tbf(this);
LOGPTBF(this, LOGL_NOTICE, "poll timeout for FN=%d, TS=%d (curr FN %d)\n",
poll_fn, pdch->ts_no, bts_current_frame_number(bts));
if (ul_tbf && reason == PDCH_ULC_POLL_UL_ACK && tbf_ul_ack_exp_ctrl_ack(ul_tbf, poll_fn, pdch->ts_no)) {
bts_do_rate_ctr_inc(bts, CTR_RLC_ACK_TIMEDOUT);
bts_do_rate_ctr_inc(bts, CTR_PUAN_POLL_TIMEDOUT);
if (state_is(TBF_ST_FINISHED)) {
if (ul_tbf->n_inc(N3103)) {
bts_do_rate_ctr_inc(bts, CTR_PUAN_POLL_FAILED);
osmo_fsm_inst_dispatch(this->state_fsm.fi, TBF_EV_MAX_N3103, NULL);
return;
}
}
osmo_fsm_inst_dispatch(ul_tbf->ul_ack_fsm.fi, TBF_UL_ACK_EV_POLL_TIMEOUT, NULL);
551: } else if (ul_ass_state_is(TBF_UL_ASS_WAIT_ACK)) {
bts_do_rate_ctr_inc(bts, CTR_RLC_ASS_TIMEDOUT);
bts_do_rate_ctr_inc(bts, CTR_PUA_POLL_TIMEDOUT);
if (n_inc(N3105)) {
osmo_fsm_inst_dispatch(this->state_fsm.fi, TBF_EV_MAX_N3105, NULL);
bts_do_rate_ctr_inc(bts, CTR_RLC_ASS_FAILED);
bts_do_rate_ctr_inc(bts, CTR_PUA_POLL_FAILED);
return;
}
/* Signal timeout to FSM to reschedule UL assignment */
osmo_fsm_inst_dispatch(this->ul_ass_fsm.fi, TBF_UL_ASS_EV_ASS_POLL_TIMEOUT, NULL);
</pre> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=226052021-09-28T15:12:35Zpespin
<ul></ul><p>Hi <a class="user active" href="https://projects.osmocom.org/users/732038">iedemam</a>, I submitted a set of patches doing some general clean up and refactoring which was still pending on that side of the code. See following commits in gerrit:<br /><pre>
tbf_fsm: Ignore event DL_ACKNACK_MISS in state RELEASING
assert if tbf pointer for POLL event is NULL
cosmetic: Fix missing space
pdch: refactor rcv_control_ack() with a switch statement
tests: TbfTest: Fix wrong behavior in test_tbf_dl_reuse()
tbf: refactor poll_timeout() with a switch statement
nacc: Introduce helper function nacc_fsm_exp_ctrl_ack()
tbf: poll_timeout(): Validate expected poll reason
</pre></p>
<p>Those link to <a class="external" href="https://gerrit.osmocom.org/c/osmo-pcu/+/25625">https://gerrit.osmocom.org/c/osmo-pcu/+/25625</a> and follow-up patches.</p>
<p>I'm not sure they will fix the crash but they will at least add some extra levels of security against unexpected paths (it already caught a wrong set of steps from one of the unit tests).</p>
<p>From looking at your traces, the exact issue is not clear. It seems to come from the PDCH ULC (uplink controller), where it reports a timeout from a tbf poll, but the tbf seems gone from memory or corrupted or whatever. I added a few asserts in some patches which should help clarify the issue. Also if you can, building osmo-pcu with --enable-sanitize would help find some possible heap-use-after-free.</p>
<p>BTW, I'm seeing a lot of FN clock jumps in your setup (eg: "bts.cpp:314 Detected FN jump! 1369160 -> 1369173"). This indicates something is wrong with your setup, and it may be somehow responsible to the crashes (I'm not saying osmo-pcu is fine crashing there, just that it may be triggering some code paths usually not triggered). Are you using master osmo-bts-trx too? Do you have NOPE.ind implemented in your TRX?</p>
<p>PS: Feel free to contact me on jabber if you want to debug this issue or discuss whatever related together.</p> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=227422021-10-15T15:48:58Zpespin
<ul></ul><p>I believe this should be fixed in current osmo-pcu.git master b0aba591433c7c22298035453713724172d1cfbc</p>
<p>Please <a class="user active" href="https://projects.osmocom.org/users/732038">iedemam</a> see if you can still reproduce.</p> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=228112021-10-25T06:03:39Zlaforge
<ul></ul><p>pespin wrote:</p>
<blockquote>
<p>I believe this should be fixed in current osmo-pcu.git master b0aba591433c7c22298035453713724172d1cfbc</p>
<p>Please <a class="user active" href="https://projects.osmocom.org/users/732038">iedemam</a> see if you can still reproduce.</p>
</blockquote>
<p>any update here, can the issue be closed?</p> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=228142021-10-25T08:37:01Ziedemam
<ul></ul><p>Have not been able to reproduce this. Looking good and can be closed. Thanks!</p> OsmoPCU - Bug #5205: Crash: SEGV on current master 945be910https://projects.osmocom.org/issues/5205?journal_id=228202021-10-25T09:22:22Zpespin
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>70</i> to <i>100</i></li></ul>