Project

General

Profile

Actions

Bug #5020

closed

osmo-pcu: Poll and SBA allocation improvements and fixes

Added by pespin about 3 years ago. Updated almost 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
-
Start date:
02/11/2021
Due date:
% Done:

100%

Spec Reference:

Description

Current implementations are quite basic and as far as i can tell prone to errors and sometimes unable to schedule CTRL polling on TBFs (RRBP) or SBAs (RACH).

  • SBA blocks allocated (reserved) are stored in SBAController class under gprs_rlcmac_bts object [bts_sba(bts)]. It sets the SBA block on FN + "AGCH_START_OFFSET=52" (also theoretically linked to timer X2002).
  • SBAController doesn't seem to have into account that a TBF may have already reserved that FN when allocating a SBA (hence either there may be a collision at that time).
  • Current allocation of TBF polls consist on always allocating poll on RRBP as FN+13. This has the advantage that there's no need to check for collisions between different TBFs on a given PDCH, since on a given FN only a TBF is selected/executed and hence only that one can even reserve FN+13 (because the +13 is fixed offset). However, it may happen sometimes that a TBF tries to allocate FN+13 (tbf->check_polling()/set_polling()), but that FNa+13 is already taken by a previous SBA which allocated it to FNb+52 such that FNb=FNa-52+13, and hence the TBF fails to allocate a poll for that CTRL message, which in the end it means it ends up failing to construct the CTRL message and the scheduler ends up sending a RLCMAC Dummy Block instead.
  • Since that can happen, the tbf->check_polling() should try harder finding a poll FN by trying other RRBP values than FN+13.
  • TBF class implementation of polling seems to only support 1 active poll at a time. That means, If for instance scheduler selects the same TBF at let's say FNa=10 and immediatelly afeter at FNb=10+4=14, If both where to request poll allocation, it would fail in tbf->check_polling() due to "poll_state != GPRS_RLCMAC_POLL_NONE" being hit. That's basically because the TBF class stores the active poll FN in class attribute "tbf->poll_fn/ts". So ideally there should be some sort of linkedlist or circular buffer or whatever supporting several active pollings. This becomes more important at the time we start using bigger values for RRBP, where the "poll active" state can be there for longer periods and hence probability to hti this issue increments.

Related specs:
3GPP TS 44.060 "10.4.5 Relative Reserved Block Period (RRBP) field"


Related issues

Related to OsmoPCU - Bug #5033: N3101 potentially being updated only during RBBP poll but not during USF pollResolvedpespin02/17/2021

Actions
Related to OsmoPCU - Feature #5122: Choose SBA allocation offset based on AGCH queue load in the BTSNewpespin04/20/2021

Actions
Actions #1

Updated by pespin about 3 years ago

The last point (TBF only supporting 1 active poll FN) is worse than expected, because PollController seems to be timing out non-acked polls for a larger period of time than the FN+13 one:

void bts_set_current_frame_number(struct gprs_rlcmac_bts *bts, int fn)
{
    /* The UL frame numbers lag 3 behind the DL frames and the data
     * indication is only sent after all 4 frames of the block have been
     * received. Sometimes there is an idle frame between the end of one
     * and start of another frame (every 3 blocks).  So the timeout should
     * definitely be there if we're more than 8 frames past poll_fn. Let's
     * stay on the safe side and say 13 or more. An additional delay can
     * happen due to the block processing time in the DSP, so the delay of
     * decoded blocks relative to the timing clock can be much larger.
     * Values up to 50 frames have been observed under load. */
    const static int max_delay = 60;

    bts->cur_fn = fn;
    bts->pollController->expireTimedout(bts->cur_fn, max_delay);
}

For the ACKed one's is fine, since gprs_rlcmac_pdch::rcv_control_ack() calls:

TBF_POLL_SCHED_UNSET(tbf);

Actions #2

Updated by pespin about 3 years ago

  • Related to Bug #5033: N3101 potentially being updated only during RBBP poll but not during USF poll added
Actions #3

Updated by pespin about 3 years ago

Some more facts after further investigation:

  • bts->cur_fn holds the current/last MAC block received on the UL, so it drives the clock.
  • The bts->cur_fn ield is updated in 2 code paths:
    • bts_set_current_frame_number: the main way, triggered by PCUIF time_ind received on every start of MAC block FN from BTS (even when direct phy is used, TIME.IND in lower layers are only received in BTS)
    • bts_set_current_block_frame_number: used only by sysmo direct PHY (not used by other direct phy like oc2g) to correct the clock (bts->cur_fn) in case there was some drift detected between what comes from PCUIF info_ind and its own direct phy.

So, the idea here would be to:
1- Make sure generic osmo-pcu code supports receiving data_len=0 minimally: "pcu_rx_data() => pcu_rx_data_ind_pdtch() => pdch->rcv_block()" This should already be fine OK since mcs_get_by_size_ul() will return UNKNOWN=0 and early return printing an error. So this first step is done.
2- Make sure all phys always send data_ind even if no data received, or corrupted, by setting data_len = 0 (aka NOPE.ind)
3- In osmo-pcu, ignore received time_ind messages (at most cross-check the FN and print warning otherwise), and update the clock (bts_set_current_frame_number) during data_ind instead.
4- In osmo-pcu upper layers, stored scheduled USF for that BTS+TRX+TS+FN and if data_ind data_len=0 (NOPE.ind), then increment N3101 for that TBF.

Step 2 (backend sending NOPE.ind data_len=0) requires:
2.1- Allow direct phy osmo-pcu code to submit corrupt/empty data_ind. For example, for sysmo direct phy:

handle_ph_data_ind (sysmo_l1_if.c)
    pcu_rx_block_time
        bts_set_current_block_frame_number    <--- RX FN number is corrected here
            bts->pollController->expireTimedout(fn, max_delay);
    if (data_ind->msgUnitParam.u8Size == 0)
        return -1;                              <--- NOPE.ind is discarded here
    " /* drop incomplete UL block */ if (data_ind->msgUnitParam.u8Buffer[0] != GsmL1_PdtchPlType_Full) break;    <--- NOPE.ind is discarded here
    pcu_rx_data_ind_pdtch

2.1- Allow indrect phy from osmo-bts to send PCUIF INFO_IND with data_len=0 (osmo-bts.git/src/common/l1sap.c):
l1sap_ph_data_ind:
    if (ts_is_pdch(&trx->ts[tn])) {
        ...
        /* don't send bad frames to PCU */
        if (len == 0)
            return -EINVAL;            <--- NOPE.ind is discarded here
        /* drop incomplete UL block */
        if (pr_info != PRES_INFO_BOTH)
            return 0;                  <--- NOPE.ind is discarded here
        pcu_tx_data_ind(...)               <----- This function contains a memcpy(x, y, len) which must be done conditonal on len>0 (undefined behavior of memcpy for len=0).
        reutrn 0;
    }

Actions #4

Updated by pespin about 3 years ago

I updated osmo-bts-trx, osmo-pcu and TTCN3 PCUIF_Components to send all DATA.ind even if len=0. Related patches:

osmo-bts.git: osmo-pcu.git: osmo-ttcn3-hacks.git: docker-plaground.git:

I did some manual tests with osmo-bts-trx and osmo-pcu patches applied and everything looks good.
I also checked that TTCN3 patches still work fine with older versions of osmo-pcu.

osmo-ttcn3-hacks + docker can already be merged. The first osmo-pcu patch silently ignorig data_len=0 too. Other ones I want to first implement it for all BTS types and are marked as WIP in gerrit.

Actions #5

Updated by pespin about 3 years ago

pespin wrote:

osmo-bts.git:
  • TODO: implement change for lower layers of other bts types!

Actually the code in osmo-bts.git is OK and already letting pass of size=0 and alike, due to previous work we already did with TCH channels.
The missing changes are in osmo-pcu for the direct phy case. I submitted a patch here:
https://gerrit.osmocom.org/c/osmo-pcu/+/23268 direct_phy: Support submitting DATA.ind with len=0 to upper layers

Actions #6

Updated by pespin about 3 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 60

remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23486 Fix: left shift cannot be repesented in type int [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23487 sched: Fix scheduling UL TBF not matching conditions [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23488 sched: Simplify usf selection code [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23489 Set matching USF if available when polling a UL TBF [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23490 pdch: Add mising pdch_ulc_release_node in Rx Cell Change Notif [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23491 pdch_ulc: Create helper API pdch_ulc_release_node [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23492 Track scheduled UL blocks through USF [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23493 Properly implement N3101 [NEW]

After this bunch of patches become merged, we already track all sort of UL blocks in a unified way. What's missing regarding this ticket:
  • Improve SBA scheduler to offer something else than N+52 (AGCH_START_OFFSET)
  • Improve SBA/POLL scheduler to offer something else than N+13
Actions #7

Updated by pespin about 3 years ago

So, as per SBA allocation (Imm Assignment on CCCH):

The scheduled FN for the SBA is currently hardcoded to 52 (sba.c):

/* starting time for assigning single slot
 * This offset must be a multiple of 13. */
#define AGCH_START_OFFSET 52

struct gprs_rlcmac_sba *sba_alloc(void *ctx, struct gprs_rlcmac_pdch *pdch, uint8_t ta)
{
    struct gprs_rlcmac_sba *sba;
    sba = talloc_zero(ctx, struct gprs_rlcmac_sba);
    if (!sba)
        return NULL;

    sba->pdch = pdch;
    sba->ta = ta;

    /* TODO: request ULC for next available FN instead of hardcoded AGCH_START_OFFSET */
    sba->fn = next_fn(pdch->last_rts_fn, AGCH_START_OFFSET);

    pdch_ulc_reserve_sba(pdch->ulc, sba);
    return sba;
}

Then, that FN is fed into Imm Assignment when encoding it (bts.cpp bts_rcv_rach):

sba = bts_alloc_sba(bts, ta);
sb_fn = sba->fn;
        plen = Encoding::write_immediate_assignment(
            &bts->trx[trx_no].pdch[ts_no], tbf, bv,
            false, rip->ra, Fn, ta, usf, false, sb_fn,
            bts_get_ms_pwr_alpha(bts), bts->pcu->vty.gamma, -1,
            rip->burst_type);

And sba_Fn is used here in Encoding::write_immediate_assignment as ref_fn to indicate the offset to the MS to start using it:

    bitvec_write_field(dest, &wp,(ref_fn / (26 * 51)) % 32,5); // T1'
    bitvec_write_field(dest, &wp,ref_fn % 51,6);               // T3
    bitvec_write_field(dest, &wp,ref_fn % 26,5);               // T2

Now, the question is why is 52 being used, and why it must be multiple of 13... Need to find that in the spec.

The symbol AGCH_START_OFFSET and the related comment about being multiple of 13 was added here, without much explanation:

commit 07e97cf8a551b05d7f5f3f9583b68b2eff0f1c23
Author: Andreas Eversberg <jolly@eversberg.eu>
Date:   Tue Aug 7 16:00:56 2012 +0200

    Adding single block allocation

    It is mandatory to support it because MS may request a single block.
    In this case the network must assign a single block.

    It is possible to force single block allocation for all uplink requests
    on RACH. (VTY option)

Actions #8

Updated by pespin almost 3 years ago

After these patches, FN for SBA and RRBP poll is scheduled based on available/unreserved FNs in the UL scheduler:
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23520 Pick unreserved UL FN when allocating an SBA
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23522 pdch_ulc: Support picking RRBP other than N+13 [NEW]

So, after those are merged, what's TODO:
  • modify SBA allocator offset in sba_alloc() based on AGCH queue load in the BTS
  • Update gprs_rlcmac_tbf poll_state/poll_fn/poll_ts implementation to support multiple concurrent polls allocated.
Actions #9

Updated by pespin almost 3 years ago

  • % Done changed from 60 to 80

I submitted a bunch of commits to get rid of old tbf's poll_fn/poll_ts infra which only supported 1 concurrent poll request per tbf here:
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23525 pdch_ulc: Store TBF poll reason [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23526 tbf: Get rid of unneeded poll_scheduled() [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23527 tbf: Allow multiple concurrent polls [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23528 Remove unneeded poll_state check [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23529 tbf: get rid of poll_state completely [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23530 Get rid of param 'poll' with constant value [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23531 tbf: Get rid of attribute poll_fn [NEW]
remote: https://gerrit.osmocom.org/c/osmo-pcu/+/23532 tbf: Get rid of attribute poll_ts [NEW]

Actions #10

Updated by pespin almost 3 years ago

  • Related to Feature #5122: Choose SBA allocation offset based on AGCH queue load in the BTS added
Actions #11

Updated by pespin almost 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

Patches were merged.

I created a separate ticket to track the SBA allocation fixed delay here:
https://osmocom.org/issues/5122

Hence, closing this ticket. Other detailed ticket can be created in the future to track specific issues/improvements.

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)