Project

General

Profile

Actions

Bug #6191

closed

osmo-pcu from OBS latest feeds transmits Dummy blocks a few more FNs after last TBF in TS was freed

Added by pespin 5 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
-
Start date:
09/26/2023
Due date:
% Done:

100%

Spec Reference:

Description

After new release 1.3.0/1.3.1, it was spotted that the following tests pass in ttcn3-pcu-test (nightly) but not in ttcn3-pcu-test-latest (running1.3.1).

TC_n3105_max_t3195
TC_pcuif_suspend_active_tbf
TC_pdch_energy_saving
TC_ta_ptcch_idle
TC_ul_tbf_reestablish_with_pkt_resource_req_t3168

I can reproduce this locally using the docker setup with TC_n3105_max_t3195. Using default IMAGE_SUFFIX=master, the test passes. However, using IMAGE_SUFFIX=latest, the test fails.

This is strange since current master has no real big changes over latest release 1.3.1, only a commit modifying a bit a log message.
In fact, I tried running with IMAGE_SUFFIX=master and OSMO_PCU_BRANCH pointing to commit in 1.3.1, and then it passes too. So that means it has something to do with how osmo-pcu binary in the OBS package was built.

I attach a successful pcap and a failing pcap as reference.

Among other thing, the test expects that after T_3195 times out it should receive an empty dl data block from the PCU because there's no more TBFs active in that TS. This seems to work as expected when running against master or 1.3.1 compiled by the docket container, as seen in the good pcap. After the last DUMMY dl block is sent, one can see how the MS and TBF is freed and hence no more DUMMY dl blocks are sent.

However, when running against the package from OBS ("latest"), one can see that after the TBF and MS is freed, the PCU still submits DUMMY DL DATA blocks for a few (3-4) more FN rounds, which is wrong.

The following patch was submitted in the ttcn3 test to more clearly show the problem:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/34536 pcu: Fail immediately in TC_n3105_max_t3195

So all in all, it looks like:
1- Bug in the compiler used in OBS
2- Some bug of ours only appeareing when compiled against specific compile flags or compiler version

The logic on whether to submit a DL Dummy block or an empty one, is done in osmo-pcu at gprs_rlcmac_rcv_rts_block():

    /* Prio 3: send dummy control message if need to poll or USF */
    else {
        /* If there's no TBF attached to this PDCH, we can submit an empty
         * data_req since there's nothing to transmit nor to poll/USF. This
         * way we help BTS energy saving (on TRX!=C0) by sending nothing
         * instead of a dummy block. The early return is done here and
         * not at the start of the function because the condition below
         * (num_tbfs==0) may not be enough, because temporary dummy TBFs
         * created to send Imm Ass Rej (see handle_tbf_reject()) don't
         * have a TFI assigned and hence are not attached to the PDCH
         * TS, so they don't show up in the count below.
         */
        const unsigned num_tbfs = pdch->num_tbfs(GPRS_RLCMAC_DL_TBF)
                    + pdch->num_tbfs(GPRS_RLCMAC_UL_TBF);
        bool skip_idle = (num_tbfs == 0);
#ifdef ENABLE_DIRECT_PHY
        /* In DIRECT_PHY mode we want to always submit something to L1 in
         * TRX0, since BTS is not preparing dummy bursts on idle TS for us */
        skip_idle = skip_idle && trx != 0;
#endif
        if (!skip_idle && (msg = sched_dummy())) {
            /* increase counter */
            gsmtap_cat = PCU_GSMTAP_C_DL_DUMMY;
        } else {
            msg = NULL; /* submit empty frame */
        }
    }

In theory ENABLE_DIRECT_PHY should not be enabled in the OBS build, so in summary, the dummy message should only be generated based on:

        const unsigned num_tbfs = pdch->num_tbfs(GPRS_RLCMAC_DL_TBF)
                    + pdch->num_tbfs(GPRS_RLCMAC_UL_TBF);
        bool skip_idle = (num_tbfs == 0);
                if (!skip_idle)
                     msg = sched_dummy()


Files

Actions #1

Updated by pespin 5 months ago

Since a while ago (last release 1.3.0/1.3.1), debian/rules contain:

debian/rules
27:    dh_auto_configure -- --with-systemdsystemunitdir=/lib/systemd/system --enable-manuals --enable-er-e1-ccu

Mind the "--enable-er-e1-ccu" part.

This activates the following in configure.ac:

AC_MSG_CHECKING([whether to enable direct E1 CCU access for PDCH of Ericsson RBS])
AC_ARG_ENABLE(er-e1-ccu,
                AC_HELP_STRING([--enable-er-e1-ccu],
                                [enable code for Ericsson RBS E1 CCU [default=no]]),
                [enable_er_e1_ccu="$enableval"],[enable_er_e1_ccu="no"])
AC_MSG_RESULT([$enable_er_e1_ccu])
AM_CONDITIONAL(ENABLE_ER_E1_CCU, test "x$enable_er_e1_ccu" = "xyes")
AS_IF([test "x$enable_er_e1_ccu" = "xyes"], [
    PKG_CHECK_MODULES(LIBOSMOABIS, libosmoabis >= 1.5.0)
    PKG_CHECK_MODULES(LIBOSMOTRAU, libosmotrau >= 1.5.0)
])

which in turn activates:

if ENABLE_ER_E1_CCU
AM_CPPFLAGS += -DENABLE_DIRECT_PHY
endif

So in summary, after latest release, define ENABLE_DIRECT_PHY is enabled, while it wasn't before. This means ENABLE_DIRECT_PHY is being used by people potentially using the osmo-pcu binary against an osmo-bts-trx, which is unexpected.

Actions #2

Updated by pespin 5 months ago

As a result of ENABLE_DIRECT_PHY being enabled, Dummy dl blocks are always transmitted in TRX0, which is the case here. This unexpected when submitting the data to an osmo-bts-trx, breaks the testsuite and it's a regression which needs to be fixed and a new patch release made.

dexter this code path should only be enabled when transmitting against a BTS which is going through direct phy, and never when connected to an osmo-bts-trx:

#ifdef ENABLE_DIRECT_PHY
        /* In DIRECT_PHY mode we want to always submit something to L1 in
         * TRX0, since BTS is not preparing dummy bursts on idle TS for us */
        skip_idle = skip_idle && trx != 0;
#endif

Commit which introduced the regression:

commit 4bac39892399e2056fc03981140d654eb65b6e54
Author: Philipp Maier <pmaier@sysmocom.de>
Date:   Fri Feb 3 16:52:42 2023 +0100

    support for Ericsson RBS E1 CCU

    Ericsson RBS series BTSs do not have a built in PCU. Rather than having
    the PCU on board the PCU is co-located to the BSC in those cases. Just
    like the MGW the PCU would connect to the CCU (Channel Coding Unit) via
    E1 line. The PCU is connected via an unix domain socket (pcu_sock) to
    the BSC to receive RACH requests and to control Paging and TBF assignment.

    This patch adds all the required functionality to run an Ercisson RBS in
    GPRS/EGPRS mode. It supports 16k I.460 E1 subslots and full 64k E1
    timeslots (recommended)

    Change-Id: I5c0a76667339ca984a12cbd2052f5d9e5b0f9c4d
    Related: OS#5198

Actions #3

Updated by pespin 5 months ago

  • Status changed from New to Feedback
  • Assignee changed from pespin to dexter

dexter , first of all you should check if RBS works well if we send empty blocks (or no blocks) to it. It should generate dummy bursts as explained here:

        /* If there's no TBF attached to this PDCH, we can submit an empty
         * data_req since there's nothing to transmit nor to poll/USF. This
         * way we help BTS energy saving (on TRX!=C0) by sending nothing
         * instead of a dummy block. The early return is done here and
         * not at the start of the function because the condition below
         * (num_tbfs==0) may not be enough, because temporary dummy TBFs
         * created to send Imm Ass Rej (see handle_tbf_reject()) don't
         * have a TFI assigned and hence are not attached to the PDCH
         * TS, so they don't show up in the count below.
         */
        const unsigned num_tbfs = pdch->num_tbfs(GPRS_RLCMAC_DL_TBF)
                    + pdch->num_tbfs(GPRS_RLCMAC_UL_TBF);
        bool skip_idle = (num_tbfs == 0);
#ifdef ENABLE_DIRECT_PHY
        /* In DIRECT_PHY mode we want to always submit something to L1 in
         * TRX0, since BTS is not preparing dummy bursts on idle TS for us */
        skip_idle = skip_idle && trx != 0;
#endif
    /* If there's no TBF attached to this PDCH, we can skip Tx of PTCCH
     * since there's nothing worthy of being transmitted. This way BTS can
     * identify idle blocks and send nothing or dumy blocks with reduced
     * energy for the sake of energy saving.
     */
    const unsigned num_tbfs = pdch->num_tbfs(GPRS_RLCMAC_DL_TBF)
                + pdch->num_tbfs(GPRS_RLCMAC_UL_TBF);
    bool skip_idle = (num_tbfs == 0);
#ifdef ENABLE_DIRECT_PHY
        /* In DIRECT_PHY mode we want to always submit something to L1 in
         * TRX0, since BTS is not preparing dummy bursts on idle TS for us: */
        skip_idle = skip_idle && trx != 0;
#endif

Then, we should add some way for osmo-pcu to know if a given BTS is an RBS, to know what to do based on which BTS it is generating a block for. sysmobts seems to be using:

        if ((info_ind->flags & PCU_IF_FLAG_SYSMO)
         && info_ind->trx[trx_nr].hlayer1) {

Actions #4

Updated by pespin 5 months ago

  • Priority changed from Normal to High

dexter this should be worked on soon since it may be creating problems with users of osmo-bts-trx using latest. We'll probably need to do a patch release once we fix it.

Actions #5

Updated by dexter 5 months ago

  • Status changed from Feedback to In Progress
Actions #6

Updated by dexter 5 months ago

  • % Done changed from 0 to 30

When I understand the problem correctly than the define constant ENABLE_DIRECT_PHY is used incorrectly. ENABLE_DIRECT_PHY only means that the support for direct phy access is enabled, it does not tell anything about if we actually accessing the phy directly during runtime. I have uploaded a fix to gerrit, but I didn't test it with TTCN3 yet:

https://gerrit.osmocom.org/c/osmo-pcu/+/34575 gprs_rlcmac_sched: check if we really use direct phy

Actions #7

Updated by dexter 5 months ago

To make it more visible what PCU_IF_FLAG_SYSMO is actually for, I have renamed it to PCU_IF_FLAG_DIRECT_PHY

https://gerrit.osmocom.org/c/osmo-pcu/+/34582 pcuif_proto: rename PCU_IF_FLAG_SYSMO to PCU_IF_FLAG_DIRECT_PHY [NEW]
https://gerrit.osmocom.org/c/osmo-bsc/+/34583 pcuif_proto: rename PCU_IF_FLAG_SYSMO to PCU_IF_FLAG_DIRECT_PHY [NEW]
https://gerrit.osmocom.org/c/osmo-bts/+/34584 pcuif_proto: rename PCU_IF_FLAG_SYSMO to PCU_IF_FLAG_DIRECT_PHY [NEW]
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/34585 rename sysmo_direct_dsp to direct_phy [NEW]

Actions #8

Updated by dexter 4 months ago

Unfortunately we found out that we cannot decide so easily if we have to generate dummy dl blocks or not as it really depends on the BTS model and not on the fact whether we use direct phy access or not. This means we have to signal the BTS model to the PCU. The following patches implement such a mechanism. Once they made it into master we can easily use the BTS model info to fix dl-block generation problem.

https://gerrit.osmocom.org/c/osmo-pcu/+/34647 pcu_l1_if: signal BTS model via PCUIF
https://gerrit.osmocom.org/c/osmo-bts/+/34648 pcuif_proto: signal BTS model via PCUIF
https://gerrit.osmocom.org/c/osmo-bsc/+/34649 pcuif_proto: signal BTS model via PCUIF

Actions #9

Updated by dexter 3 months ago

The patches above have enough review points to get merged, but we still cannot merge them right now. The TTCN3 testsuite also needs to be updated. Eventually it is also very important that the docker tests are aware of the PCUIF version change so that the configs are changed on the fly to the correct PCUIF version number (depending on whether we test on latest or on master).

This means that the following two patches must be merged together with the patches above at once:
https://gerrit.osmocom.org/c/docker-playground/+/34998 ttcn3-bts/pcu-test: Use PCUIF v12 for current master and PCUIF v11 for latest
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/34934 PCUIF: upgrade to PCUIF v12

Actions #10

Updated by dexter 3 months ago

The patches for PCU, BTS and BSC are ready to get merged. However we must not merge them without merging the corresponding patches for TTCN3 and docker. Those are the two patches mentioned in #9 (above). I have tested everything in docker locally to make sure there is no fallout. Everything looks good and I would not expect any problems.

Actions #11

Updated by laforge 3 months ago

On Mon, Nov 20, 2023 at 02:00:37PM +0000, dexter wrote:

The patches for PCU, BTS and BSC are ready to get merged. However we must not merge them without merging the corresponding patches for TTCN3 and docker. Those are the two patches mentioned in #9 (above). I have tested everything in docker locally to make sure there is no fallout. Everything looks good and I would not expect any problems.

if the review is sufficient, go ahead and merge all of them on a day when you are prepared to fix any
potential fall-out the next 1-2 days. So I think now is a good time as the week still has a few days
to catch fall-out or revert.

Actions #12

Updated by dexter 3 months ago

The patches are now all merged. Unfortunately there is a problem in ttcn3-bts-test-latest, the following patch should fix it:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/35090 PCUIF_Types: make PCUIF_info_ind field bts_model optional

Actions #13

Updated by dexter 3 months ago

  • % Done changed from 30 to 50

I have now revisited the older patches and fixed everything so that it uses the new bts->bts_model flag to decide whether we have to generate dummy blocks or not:

https://gerrit.osmocom.org/c/osmo-pcu/+/35096 gprs_rlcmac_sched: fix condition for generating dummy blocks on idle

Actions #14

Updated by dexter 3 months ago

  • % Done changed from 50 to 80

The last patch is now merged, as far as my understanding goes the problem should be fixed now. pespin can you check back if the problem that was observed initially is now fixed?

Actions #15

Updated by dexter 3 months ago

  • Status changed from In Progress to Feedback
  • Assignee changed from dexter to pespin
Actions #16

Updated by dexter 3 months ago

The few testcases that currently fail were indeed related to PCUIF (not protocol wise, but we didn't set a realistic BTS model yet)

I hope this fixes the problem:
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/35144 PCU_Tests: tell PCU a more realistic BTS model

Actions #17

Updated by dexter 3 months ago

  • % Done changed from 80 to 100

The patch above is merged. As far as I can see ttcn3-pcu-test and its variants look good. As far as I can see this ticket could be closed now.

Actions #18

Updated by fixeria 3 months ago

For the record, we had a Jitsi call with jolly who volunteered to do some testing. We decided that it would be best to confirm that things are working as expected by sniffing the Um interface.

For this purpose, I hacked up a GPRS sniffer based on the Sylvain's burst_ind hack:

https://cgit.osmocom.org/osmocom-bb/log/?h=fixeria/burst_ind_gprs

All you need to do is:

  • patch osmocom, in case you're using CP2102 (grep for I_HAVE_A_CP210x);
  • patch timeslot number / TSC in app_ccch_scan.c (grep for RSL_CHAN_OSMO_PDCH);
  • compile the firmware and load the 'layer1';
  • run ccch_scan -i 127.0.0.1 -a ARFCN.
Actions #19

Updated by fixeria 3 months ago

fixeria wrote in #note-18:

All you need to do is:
  • run ccch_scan -i 127.0.0.1 -a ARFCN.

This app writes raw BURST.ind into a file and additionally decodes PDTCH blocks on the fly, so you should see GSMTAP in wireshark.
The BURST.ind captures (looking like bursts_20231106_2218_1018_1767319_c7.dat) can later be decoded using the gprsdecode tool:

fixeria@LEGION:/home/fixeria/projects/osmocom/burst_ind/src/host/gprsdecode$ ./gprsdecode -i 127.0.0.1 -c /tmp/bursts_20231106_2218_1018_1766812_c7.dat
Actions #20

Updated by fixeria 3 months ago

Together with jolly we found in the code that in theory sysmoBTS (and the likes) can handle empty PDCH DATA.req. Here is the related code:

static int ph_data_req(struct gsm_bts_trx *trx, struct msgb *msg,
                       struct osmo_phsap_prim *l1sap, bool use_cache)
{    
        struct femtol1_hdl *fl1 = trx_femtol1_hdl(trx);
        struct msgb *l1msg = l1p_msgb_alloc();
        struct gsm_lchan *lchan;
        uint32_t u32Fn;
        uint8_t u8Tn, subCh, u8BlockNbr = 0, sapi = 0;
        uint8_t chan_nr, link_id;
        int len;

        if (!msg) {
                LOGP(DL1C, LOGL_FATAL, "PH-DATA.req without msg. " 
                        "Please fix!\n");
                abort();
        }

        len = msgb_l2len(msg);  // <-- (!) segfault, jolly has a patch

        // ...

        /* convert l1sap message to GsmL1 primitive, keep payload */
        if (len) {
                // ...
        } else {
                /* empty frame */
                GsmL1_Prim_t *l1p = msgb_l1prim(l1msg);

                empty_req_from_l1sap(l1p, fl1, u8Tn, u32Fn, sapi, subCh, u8BlockNbr);
        }

Using the GPRS Um sniffer it should be possible to see what exactly it's transmitting in this case.

Actions #21

Updated by pespin 2 months ago

  • Assignee changed from pespin to osmith

I think this ticket can be closed whenever a new osmo-pcu is released.
We should check that ttcn3-pcu-test-latest PCU_Tests.TC_n3105_max_t3195 turns green after releasing.

Assigning to osmith since I think he's tracking some ticket already about those releases.

Actions #22

Updated by osmith 2 months ago

  • Status changed from Feedback to In Progress
  • % Done changed from 100 to 80

ack, preparing a release.

I've looked at commits since the last release and thought about whether tagging a new release from current master or whether backporting patches. I will make backports since current master requires a new libosmocore: https://gerrit.osmocom.org/c/osmo-pcu/+/34521

Actions #23

Updated by osmith 2 months ago

  • % Done changed from 80 to 90

-> https://gerrit.osmocom.org/c/osmo-pcu/+/35320 and previous patches

pespin wrote in #note-21:

I think this ticket can be closed whenever a new osmo-pcu is released.
We should check that ttcn3-pcu-test-latest PCU_Tests.TC_n3105_max_t3195 turns green after releasing.

I've verified that it turns green:
https://jenkins.osmocom.org/jenkins/job/osmith-ttcn3-pcu-test/test_results_analyzer/

I went with 1.4.0 instead of 1.3.2, because there is a new PCUIF version.

Actions #25

Updated by osmith 2 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 90 to 100
Actions #26

Updated by fixeria 2 months ago

  • Status changed from Resolved to In Progress
  • Assignee changed from osmith to fixeria
  • % Done changed from 100 to 90

We're not done yet, docker-playground.git needs to be updated to use PCUIFv12 for osmo-pcu v1.4.0 too:

https://gerrit.osmocom.org/c/docker-playground/+/35368 ttcn3-pcu-test: set PCUIFv12 for both -master and -latest

Actions #27

Updated by fixeria 2 months ago

I submitted a patch for ttcn3-bts-test:

https://gerrit.osmocom.org/c/docker-playground/+/35369 ttcn3-bts-test: set PCUIFv12 for both -master and -latest

This patch also fixes ttcn3-bts-test-2023q1 by setting PCUIF version to 10, like we already do for ttcn3-pcu-test-2023q1.

Actions #28

Updated by fixeria 2 months ago

More PCUIF related patches for osmo-ttcn3-hacks.git:

https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/35371 library/PCUIF_Types: fix padding attribute for PCUIF_Message
https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/35372 library/PCUIF_Types: clean up, drop remains of PCUIFv10

Actions #29

Updated by fixeria 2 months ago

  • Status changed from In Progress to Feedback
  • % Done changed from 90 to 100
Actions #30

Updated by fixeria 2 months ago

  • Status changed from Feedback to Resolved

Ok, ttcn3-pcu-test-latest looks good today:

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-pcu-test-latest/1778/
https://jenkins.osmocom.org/jenkins/view/TTCN3-centos/job/TTCN3-centos-pcu-test-latest/1018/

  • [PASS] TC_n3105_max_t3195
  • [PASS] TC_pcuif_suspend_active_tbf
  • [PASS] TC_pdch_energy_saving
  • [PASS] TC_ul_tbf_reestablish_with_pkt_resource_req_t3168

Also, TTCN3-centos-bts-test-2023q1 (all PCUIF related tests were broken) looks a lot better now:

https://jenkins.osmocom.org/jenkins/view/TTCN3-centos/job/TTCN3-centos-bts-test-2023q1/

Now we're done. Marking as resolved.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)