Project

General

Profile

Actions

Bug #2176

closed

CS paging during active PS TBF (regression)

Added by laforge almost 7 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
High
Assignee:
Target version:
-
Start date:
04/24/2017
Due date:
% Done:

100%

Spec Reference:

Description

According to a use report, it seems that commit b78a4a6dfef217c538d45949a6ae725e22a36b05 in osmo-pcu has broken the successful delivery of CS paging messages via the PCU during ongoing TBFs.

Please verify / investigate / fix accordingly.


Related issues

Related to OsmoGSMTester - Feature #2204: osmo-gsm-tester: add test case: CS paging while GPRS is activeResolvedpespin04/27/2017

Actions
Has duplicate OsmoPCU - Bug #2241: PACCH pagings are omittedRejected05/05/2017

Actions
Actions #1

Updated by neels almost 7 years ago

Despite me being the author of the commit that seems to cause this behavior, unfortunately I have hardly a clue of the osmo-pcu code, nor what that particular function should do. I'll see how far I get:

commit b78a4a6dfef in osmo-pcu was:
"fix segfault: check for NULL tbf in sched_select_ctrl_msg()"

It came about this way:

I had come across a coverity warning complaining about a possible NULL dereference, and made a (naive) fix for it.
Then during the GSM operations at the 33C3 congress I found osmo-pcu crashing, and just by chance tried the fix for the coverity issue: the problem seemed to be solved by it.

The current problem report means there is a more complex problem than this NULL dereference. Checking the evolution of that function:

Dereference of tbf at "tbf->rotate_in_list();" existed as early as 2013 7a5f3c2153beb -- but apparently that only happens when originally msg != NULL, which apparently implies that tbf is then always also != NULL. So at first sight it looks like the code could never have worked, but on second thought there probably never was a missing tbf in that code path.

Two more dereferences were added shortly before I came across the segfault in 2016-12, commit da7250ad2c1 by radisys: tbf is dereferenced to "update the dl ctrl msg counter" in the case where there is originally a msg present (and presumably also tbf != NULL), but also for the paging case of msg NULL with presumably tbf NULL. I suspect that the paging case tbf dereference added by radisys is bogus.

The details I received for this bug report revolves around paging: in the failing version, no paging is seen on PACCH. The code path excluded by my 'if (!tbf) return NULL' happens to include a paging. I added the early return because I saw every code path below using tbf. But the abovementioned more recent patch was the one adding the tbf dereference in the paging code path below.

I attempt an improved fix in https://gerrit.osmocom.org/2420 -- hopefully review will confirm that I'm right.

Here is an annotated "map" of the code in question:

static struct msgb *sched_select_ctrl_msg(
                    uint8_t trx, uint8_t ts, uint32_t fn,
                    uint8_t block_nr, struct gprs_rlcmac_pdch *pdch,
                    struct gprs_rlcmac_tbf *ul_ass_tbf,
                    struct gprs_rlcmac_tbf *dl_ass_tbf,
                    struct gprs_rlcmac_ul_tbf *ul_ack_tbf)
{
        struct msgb *msg = NULL;
        struct gprs_rlcmac_tbf *tbf = NULL;
        struct gprs_rlcmac_tbf *next_list[3] = { ul_ass_tbf, dl_ass_tbf, ul_ack_tbf };

        for (size_t i = 0; i < ARRAY_SIZE(next_list); ++i) {
                tbf = next_list[(pdch->next_ctrl_prio + i) % 3]; 
                if (!tbf)
                        continue;  <-------- this loop at first looks like tbf shouldn't be NULL
                                             but in fact leaving the loop with tbf == NULL is possible.

                /*
                 * Assignments for the same direction have lower precedence,
                 * because they may kill the TBF when the CONTROL ACK is
                 * received, thus preventing the others from being processed.
                 */
                if (tbf == ul_ass_tbf && tbf->ul_ass_state ==
                                GPRS_RLCMAC_UL_ASS_SEND_ASS_REJ)
                        msg = ul_ass_tbf->create_packet_access_reject();
                else if (tbf == ul_ass_tbf && tbf->direction ==
                                GPRS_RLCMAC_DL_TBF)
                        if (tbf->ul_ass_state ==
                                        GPRS_RLCMAC_UL_ASS_SEND_ASS_REJ)
                                msg = ul_ass_tbf->create_packet_access_reject();
                        else
                                msg = ul_ass_tbf->create_ul_ass(fn, ts);
                else if (tbf == dl_ass_tbf && tbf->direction == GPRS_RLCMAC_UL_TBF)
                        msg = dl_ass_tbf->create_dl_ass(fn, ts);
                else if (tbf == ul_ack_tbf)
                        msg = ul_ack_tbf->create_ul_ack(fn, ts);

                if (!msg) {
                        tbf = NULL;
                        continue;
                }

                pdch->next_ctrl_prio += 1;
                pdch->next_ctrl_prio %= 3;
                break;
        }

        if (!msg) {
                /*
                 * If one of these is left, the response (CONTROL ACK) from the
                 * MS will kill the current TBF, only one of them can be
                 * non-NULL
                 */
                if (dl_ass_tbf) {
                        tbf = dl_ass_tbf;                          <-------- This looks like
                        msg = dl_ass_tbf->create_dl_ass(fn, ts);             msg != NULL means tbf != NULL
                } else if (ul_ass_tbf) {
                        tbf = ul_ass_tbf;
                        msg = ul_ass_tbf->create_ul_ass(fn, ts);
                }
        }

        if (!tbf)                <---------- My null dereference fix,
                return NULL;     <---------- this apparently breaks PACCH paging

        /* any message */
        if (msg) {
                tbf->rotate_in_list();                                <-------- exists since 2013.
                LOGP(DRLCMACSCHED, LOGL_DEBUG, "Scheduling control "            A pending message apparently
                        "message at RTS for %s (TRX=%d, TS=%d)\n",              implies a non-null tbf.
                        tbf_name(tbf), trx, ts);
                /* Updates the dl ctrl msg counter for ms */
                tbf->ms()->update_dl_ctrl_msg();                      <-------- added in December 2016 by radisys, probably ok
                return msg;
        }
        /* schedule PACKET PAGING REQUEST */       <------ here we handle a case where there was no msg.
        msg = pdch->packet_paging_request();               I believe this should run even if tbf == NULL.
        if (msg) {
                LOGP(DRLCMACSCHED, LOGL_DEBUG, "Scheduling paging request " 
                        "message at RTS for (TRX=%d, TS=%d)\n", trx, ts);

                /* Updates the dl ctrl msg counter for ms */
                tbf->ms()->update_dl_ctrl_msg();                      <-------- added in December 2016 by radisys, probably bogus.
                return msg;
        }

        return NULL;
}
Actions #2

Updated by neels almost 7 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 90

https://gerrit.osmocom.org/2420 was merged, but I'm only 90% convinced.

Actions #3

Updated by neels almost 7 years ago

  • Has duplicate Bug #2241: PACCH pagings are omitted added
Actions #4

Updated by neels almost 7 years ago

  • Related to Feature #2204: osmo-gsm-tester: add test case: CS paging while GPRS is active added
Actions #5

Updated by neels almost 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100

I assume this is solved. We should have a dedicated test for this, see related issue.

Actions #6

Updated by laforge over 6 years ago

  • Status changed from Resolved to Closed
Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)