Bug #2176
closedCS paging during active PS TBF (regression)
100%
Description
According to a use report, it seems that commit b78a4a6dfef217c538d45949a6ae725e22a36b05 in osmo-pcu has broken the successful delivery of CS paging messages via the PCU during ongoing TBFs.
Please verify / investigate / fix accordingly.
Related issues
Updated by neels almost 7 years ago
Despite me being the author of the commit that seems to cause this behavior, unfortunately I have hardly a clue of the osmo-pcu code, nor what that particular function should do. I'll see how far I get:
commit b78a4a6dfef in osmo-pcu was:
"fix segfault: check for NULL tbf in sched_select_ctrl_msg()"
It came about this way:
I had come across a coverity warning complaining about a possible NULL dereference, and made a (naive) fix for it.
Then during the GSM operations at the 33C3 congress I found osmo-pcu crashing, and just by chance tried the fix for the coverity issue: the problem seemed to be solved by it.
The current problem report means there is a more complex problem than this NULL dereference. Checking the evolution of that function:
Dereference of tbf at "tbf->rotate_in_list();" existed as early as 2013 7a5f3c2153beb -- but apparently that only happens when originally msg != NULL, which apparently implies that tbf is then always also != NULL. So at first sight it looks like the code could never have worked, but on second thought there probably never was a missing tbf in that code path.
Two more dereferences were added shortly before I came across the segfault in 2016-12, commit da7250ad2c1 by radisys: tbf is dereferenced to "update the dl ctrl msg counter" in the case where there is originally a msg present (and presumably also tbf != NULL), but also for the paging case of msg NULL with presumably tbf NULL. I suspect that the paging case tbf dereference added by radisys is bogus.
The details I received for this bug report revolves around paging: in the failing version, no paging is seen on PACCH. The code path excluded by my 'if (!tbf) return NULL' happens to include a paging. I added the early return because I saw every code path below using tbf. But the abovementioned more recent patch was the one adding the tbf dereference in the paging code path below.
I attempt an improved fix in https://gerrit.osmocom.org/2420 -- hopefully review will confirm that I'm right.
Here is an annotated "map" of the code in question:
static struct msgb *sched_select_ctrl_msg( uint8_t trx, uint8_t ts, uint32_t fn, uint8_t block_nr, struct gprs_rlcmac_pdch *pdch, struct gprs_rlcmac_tbf *ul_ass_tbf, struct gprs_rlcmac_tbf *dl_ass_tbf, struct gprs_rlcmac_ul_tbf *ul_ack_tbf) { struct msgb *msg = NULL; struct gprs_rlcmac_tbf *tbf = NULL; struct gprs_rlcmac_tbf *next_list[3] = { ul_ass_tbf, dl_ass_tbf, ul_ack_tbf }; for (size_t i = 0; i < ARRAY_SIZE(next_list); ++i) { tbf = next_list[(pdch->next_ctrl_prio + i) % 3]; if (!tbf) continue; <-------- this loop at first looks like tbf shouldn't be NULL but in fact leaving the loop with tbf == NULL is possible. /* * Assignments for the same direction have lower precedence, * because they may kill the TBF when the CONTROL ACK is * received, thus preventing the others from being processed. */ if (tbf == ul_ass_tbf && tbf->ul_ass_state == GPRS_RLCMAC_UL_ASS_SEND_ASS_REJ) msg = ul_ass_tbf->create_packet_access_reject(); else if (tbf == ul_ass_tbf && tbf->direction == GPRS_RLCMAC_DL_TBF) if (tbf->ul_ass_state == GPRS_RLCMAC_UL_ASS_SEND_ASS_REJ) msg = ul_ass_tbf->create_packet_access_reject(); else msg = ul_ass_tbf->create_ul_ass(fn, ts); else if (tbf == dl_ass_tbf && tbf->direction == GPRS_RLCMAC_UL_TBF) msg = dl_ass_tbf->create_dl_ass(fn, ts); else if (tbf == ul_ack_tbf) msg = ul_ack_tbf->create_ul_ack(fn, ts); if (!msg) { tbf = NULL; continue; } pdch->next_ctrl_prio += 1; pdch->next_ctrl_prio %= 3; break; } if (!msg) { /* * If one of these is left, the response (CONTROL ACK) from the * MS will kill the current TBF, only one of them can be * non-NULL */ if (dl_ass_tbf) { tbf = dl_ass_tbf; <-------- This looks like msg = dl_ass_tbf->create_dl_ass(fn, ts); msg != NULL means tbf != NULL } else if (ul_ass_tbf) { tbf = ul_ass_tbf; msg = ul_ass_tbf->create_ul_ass(fn, ts); } } if (!tbf) <---------- My null dereference fix, return NULL; <---------- this apparently breaks PACCH paging /* any message */ if (msg) { tbf->rotate_in_list(); <-------- exists since 2013. LOGP(DRLCMACSCHED, LOGL_DEBUG, "Scheduling control " A pending message apparently "message at RTS for %s (TRX=%d, TS=%d)\n", implies a non-null tbf. tbf_name(tbf), trx, ts); /* Updates the dl ctrl msg counter for ms */ tbf->ms()->update_dl_ctrl_msg(); <-------- added in December 2016 by radisys, probably ok return msg; } /* schedule PACKET PAGING REQUEST */ <------ here we handle a case where there was no msg. msg = pdch->packet_paging_request(); I believe this should run even if tbf == NULL. if (msg) { LOGP(DRLCMACSCHED, LOGL_DEBUG, "Scheduling paging request " "message at RTS for (TRX=%d, TS=%d)\n", trx, ts); /* Updates the dl ctrl msg counter for ms */ tbf->ms()->update_dl_ctrl_msg(); <-------- added in December 2016 by radisys, probably bogus. return msg; } return NULL; }
Updated by neels almost 7 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 90
https://gerrit.osmocom.org/2420 was merged, but I'm only 90% convinced.
Updated by neels almost 7 years ago
- Has duplicate Bug #2241: PACCH pagings are omitted added
Updated by neels almost 7 years ago
- Related to Feature #2204: osmo-gsm-tester: add test case: CS paging while GPRS is active added
Updated by neels almost 7 years ago
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100
I assume this is solved. We should have a dedicated test for this, see related issue.