Bug #1798
Closed
dynpdch and repairing of broken channels
100%
Description
Going through the lchan issue and looking at rsl_rx_rf_chan_rel_ack(), I see that there is a path where we don't do the PDCH switching. What is the reason for it?
static int rsl_rx_rf_chan_rel_ack(struct gsm_lchan *lchan)
{
	DEBUGP(DRSL, "%s RF CHANNEL RELEASE ACK\n", gsm_lchan_name(lchan));

	/* Stop all pending timers */
	osmo_timer_del(&lchan->act_timer);
	osmo_timer_del(&lchan->T3111);

	/*
	 * The BTS didn't respond within the timeout to our channel
	 * release request and we have marked the channel as broken.
	 * Now we do receive an ACK and let's be conservative. If it
	 * is a sysmoBTS we know that only one RF Channel Release ACK
	 * will be sent. So let's "repair" the channel.
	 */
	if (lchan->state == LCHAN_S_BROKEN) {
		int do_free = is_sysmobts_v2(lchan->ts->trx->bts);
		LOGP(DRSL, LOGL_NOTICE,
		     "%s CHAN REL ACK for broken channel. %s.\n",
		     gsm_lchan_name(lchan),
		     do_free ? "Releasing it" : "Keeping it broken");
		if (do_free)
			do_lchan_free(lchan);
		!!!! !!!! No switch of the PDCH here! !!!! !!!!
		return 0;
	}

	if (lchan->state != LCHAN_S_REL_REQ && lchan->state != LCHAN_S_REL_ERR)
		LOGP(DRSL, LOGL_NOTICE, "%s CHAN REL ACK but state %s\n",
		     gsm_lchan_name(lchan),
		     gsm_lchans_name(lchan->state));

	do_lchan_free(lchan);

	/*
	 * Put a dynamic TCH/F_PDCH channel back to PDCH mode iff it was
	 * released successfully. If in error, the PDCH ACT will follow after
	 * T3111 in error_timeout_cb().
	 *
	 * Any state other than LCHAN_S_REL_ERR became LCHAN_S_NONE after above
	 * do_lchan_free(). Assert this, because that's what ensures a PDCH ACT
	 * on a dynamic channel in all cases.
	 */
	OSMO_ASSERT(lchan->state == LCHAN_S_NONE
		    || lchan->state == LCHAN_S_REL_ERR);

	if (lchan->ts->pchan == GSM_PCHAN_TCH_F_PDCH
	    && lchan->state == LCHAN_S_NONE)
		return rsl_ipacc_pdch_activate(lchan->ts, 1);

	return 0;
}
Updated by neels over 7 years ago
The reason is probably that I have not yet covered that case.
To discuss, let's describe the various facets of this:
dyn type:
- ip.access style (TCH/F_PDCH)
- Osmocom style (TCH/F_TCH/H_PDCH)
broken channel:
- marked broken in BSC
- incoming chan rel ack (this issue)
- incoming chan act ack ( https://gerrit.osmocom.org/713 )
- marked broken in BTS
- ?
I'm trying to figure out how these things relate to each other.
Any hints/facts would be welcome.
Updated by neels over 7 years ago
broken channel state can come from
- chan act timeout
- chan deact timeout
- rx chan act nack
(see rsl_lchan_mark_broken() in abis_rsl.c)
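To keep the three triggers apart while discussing, here is a minimal self-contained sketch. It is not the abis_rsl.c code: the enum names, the struct and mark_broken() are made up for illustration, and only the three triggers themselves come from the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for this sketch, not the OpenBSC definitions. */
enum lchan_state { ST_NONE, ST_ACT_REQ, ST_REL_REQ, ST_BROKEN };

/* Hypothetical enum naming the three triggers listed above. */
enum broken_reason {
	BROKEN_NOT_SET,
	BROKEN_ACT_TIMEOUT,	/* chan act timeout */
	BROKEN_DEACT_TIMEOUT,	/* chan deact timeout */
	BROKEN_ACT_NACK,	/* rx chan act nack */
};

struct lchan {
	enum lchan_state state;
	enum broken_reason reason;
};

/* All three triggers end in the same state; only the recorded reason differs. */
static void mark_broken(struct lchan *lchan, enum broken_reason reason)
{
	assert(lchan != NULL);
	assert(reason != BROKEN_NOT_SET);
	lchan->state = ST_BROKEN;
	lchan->reason = reason;
}
```

Recording the reason separately from the state is a choice of this sketch; it would let a later ACK handler decide per trigger whether a repair is safe.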
Updated by neels over 7 years ago
I've hacked fake act delays into osmo-bts and tested various situations
with my SysmoBTS. (the prompt says "root@sysmobts-v2:~#" so I assume it's v2,
which is interesting because of the do_free condition above.)
Some may not be strictly related to this issue as reported, but I'd like to
discuss here and split into new issues later.
(1)
With a 10 second delay hacked into the TCH/H channel activation ack from
osmo-bts, the act ack comes after the lchan->act_timer expired and the
channel is marked BROKEN_UNUSABLE.
I do this only the first time, so the BSC would recover if it tried the
same lchan a second time.
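The timing of this test can be modelled roughly as follows. This is only a sketch, not the actual hack in osmo-bts: the function names are invented, the 4 second activation timeout is an assumption matching my setup, and the tie-break (an ACK arriving exactly at the timeout counts as late) is a choice of the model.

```c
#include <assert.h>
#include <stdbool.h>

#define ACT_TIMEOUT_S	4	/* assumed act_timer value */
#define FAKE_DELAY_S	10	/* delay hacked into the BTS */

enum lchan_state { ST_NONE, ST_ACT_REQ, ST_ACTIVE, ST_BROKEN };

/* Hypothetical one-shot delay: only the first activation ack is held back. */
static int fake_ack_delay_s(void)
{
	static bool delayed_once = false;

	if (!delayed_once) {
		delayed_once = true;
		return FAKE_DELAY_S;
	}
	return 0;
}

/* State the BSC reaches for one activation attempt with the given ack delay. */
static enum lchan_state activation_result(int ack_delay_s, int timeout_s)
{
	assert(ack_delay_s >= 0);
	assert(timeout_s > 0);

	/* The timer wins unless the ack arrives strictly before it fires. */
	if (ack_delay_s >= timeout_s)
		return ST_BROKEN;
	return ST_ACTIVE;
}
```

Calling activation_result(fake_ack_delay_s(), ACT_TIMEOUT_S) twice gives a broken lchan on the first attempt and an active one on the second, which is why a retry on the same lchan would succeed.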
(1a)
For a plain, non-dynamic TCH/H pchan, I observe that the lchan is
never recovered. It remains marked broken forever:
20160824153815711 DRLL <0000> chan_alloc.c:367 (bts=0,trx=0,ts=5,pchan=TCH/H) Allocating lchan=0 as TCH_H
20160824153815711 DRSL <0004> abis_rsl.c:1727 (bts=0,trx=0,ts=5,ss=0) Activating ARFCN(868) SS(0) lctype TCH_H r=CALL ra=0x47 ta=0
20160824153815711 DRSL <0004> abis_rsl.c:533 (bts=0,trx=0,ts=5,pchan=TCH/H) Tx RSL Channel Activate with act_type=INITIAL
20160824153815711 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=0) state NONE -> ACTIVATION REQUESTED
[osmo-bts doesn't respond with an act ack]
[4 seconds later, act_timer fires]
20160824153819711 DRSL <0004> abis_rsl.c:1116 (bts=0,trx=0,ts=5,ss=0) TCH_H lchan broken: activation timeout
20160824153819711 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=0) state ACTIVATION REQUESTED -> BROKEN UNUSABLE
[another 6 seconds and the act ack comes in late]
20160824153825725 DRSL <0004> abis_rsl.c:1456 (bts=0,trx=0,ts=5,ss=0) CHANNEL ACTIVATE ACK
20160824153825725 DRSL <0004> abis_rsl.c:1146 (bts=0,trx=0,ts=5,ss=0) CHAN ACT ACK for broken channel.
[another 6 seconds pass and the BTS signals conn failure on RSL:]
20160824153831420 DRSL <0004> abis_rsl.c:1222 (bts=0,trx=0,ts=5,ss=0) CONNECTION FAIL: RELEASING state BROKEN UNUSABLE CAUSE=0x01(Radio Link Failure)
In abis_rsl.c:1222 rsl_rx_conn_fail(), the BSC could free the lchan, but does not because
the lchan state is not LCHAN_S_ACTIVE. rsl_rx_conn_fail() calls rsl_rf_chan_release_err():
/*
 * Special handling for channel releases in the error case.
 */
static int rsl_rf_chan_release_err(struct gsm_lchan *lchan)
{
	if (lchan->state != LCHAN_S_ACTIVE)
		return 0;
	return rsl_rf_chan_release(lchan, 1, SACCH_DEACTIVATE);
}
After this, the lchan remains marked broken:
OpenBSC> show lchan summary
BTS 0, TRX 0, Timeslot 0 CCCH+SDCCH4, Lchan 0, Type NONE, State ACTIVE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm
BTS 0, TRX 0, Timeslot 5 TCH/H, Lchan 0, Type NONE, State BROKEN UNUSABLE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm
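The effect of the guard in rsl_rf_chan_release_err() can be reproduced in a small stand-alone model. The types and the release_err() name below are stand-ins, the return value 1 for "release sent" is a convention of this sketch, and the switch to a release-error state for an active lchan is my assumption about what rsl_rf_chan_release() does in error mode.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for this sketch, not the OpenBSC definitions. */
enum lchan_state { ST_NONE, ST_ACTIVE, ST_REL_ERR, ST_BROKEN };

struct lchan {
	enum lchan_state state;
};

/*
 * Same guard as in rsl_rf_chan_release_err(): anything that is not
 * ACTIVE is left untouched. Returns 1 if a release was requested.
 */
static int release_err(struct lchan *lchan)
{
	assert(lchan != NULL);

	if (lchan->state != ST_ACTIVE)
		return 0;

	/* Assumed behaviour: an error release moves the lchan to REL_ERR. */
	lchan->state = ST_REL_ERR;
	return 1;
}
```

With a broken lchan, release_err() returns 0 and the state stays ST_BROKEN, which matches the VTY output above: the CONNECTION FAIL does not give the lchan a way out of the broken state.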
If the nitb config here has only one TCH/H TS (the rest as SDCCH8, a.k.a. disabled),
the call does not succeed: one TCH/H lchan remains broken, which leaves only one
working TCH/H lchan for two phones that each need one.
If there are two TCH/H timeslots, the first TCH/H goes broken, but the call succeeds
because the phones get assigned different, working TCH/H lchans, which are still
available.
Nevertheless, it looks too harsh to keep this lchan broken forever without
even a second try.
(1b)
For dyn TS (TCH/F_TCH/H_PDCH), the situation is the same as for plain TCH/H.
Before being able to fix dyn TS, we should probably resolve the plain
TCH/* recovery.
(2)
With a 10 second delay hacked into the TCH/H channel deactivation ack
(activation ack back to normal), things look better. The lchan hits the
"Releasing it" condition above and gets freed back to NONE state.
(2a)
For plain TCH/H, all is well.
20160824161705795 DRLL <0000> abis_rsl.c:1917 (bts=0,trx=0,ts=5,ss=1) SAPI=0 RELEASE INDICATION
20160824161705795 DRSL <0004> abis_rsl.c:807 (bts=0,trx=0,ts=5,ss=1) RF Channel Release
20160824161705798 DRSL <0004> abis_rsl.c:2334 (bts=0,trx=0,ts=5,ss=1) IPAC_DLCX_IND
[...]
20160824161709798 DRSL <0004> abis_rsl.c:1116 (bts=0,trx=0,ts=5,ss=1) TCH_H lchan broken: de-activation timeout
20160824161709798 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=1) state RELEASE REQUESTED -> BROKEN UNUSABLE
[...]
20160824161715837 DRSL <0004> abis_rsl.c:864 (bts=0,trx=0,ts=5,ss=1) RF CHANNEL RELEASE ACK
20160824161715837 DRSL <0004> abis_rsl.c:882 (bts=0,trx=0,ts=5,ss=1) CHAN REL ACK for broken channel. Releasing it.
20160824161715837 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=1) state BROKEN UNUSABLE -> NONE
OpenBSC> show lchan summary
BTS 0, TRX 0, Timeslot 0 CCCH+SDCCH4, Lchan 0, Type NONE, State ACTIVE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm
(2b)
For dyn TS TCH/F_TCH/H_PDCH, the situation is the same, but a switchover
back to PDCH operation is indeed missing. Testing and fixing now...
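One possible shape of the missing switchover, as a self-contained model rather than the actual patch: all types and function names below are stand-ins, PCHAN_DYN stands for any dynamic timeslot, and the pdch_act_sent flag only records that a PDCH activation would have been sent. The point is that the broken-but-repaired path runs the same "back to PDCH" step as the normal path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for this sketch, not the OpenBSC definitions. */
enum lchan_state { ST_NONE, ST_REL_REQ, ST_REL_ERR, ST_BROKEN };
enum pchan { PCHAN_TCH_H, PCHAN_DYN };

struct ts {
	enum pchan pchan;
	bool pdch_act_sent;	/* set when a PDCH activation would be sent */
};

struct lchan {
	enum lchan_state state;
	struct ts *ts;
};

/* Model of the free step: REL_ERR waits for T3111, the rest go to NONE. */
static void lchan_free(struct lchan *lchan)
{
	if (lchan->state != ST_REL_ERR)
		lchan->state = ST_NONE;
}

/* Switch a dynamic timeslot back to PDCH once its lchan is in NONE. */
static void pdch_activate_if_dyn(struct lchan *lchan)
{
	if (lchan->ts->pchan == PCHAN_DYN && lchan->state == ST_NONE)
		lchan->ts->pdch_act_sent = true;
}

/* bts_acks_once plays the role of the do_free condition. */
static int rx_rf_chan_rel_ack(struct lchan *lchan, bool bts_acks_once)
{
	assert(lchan != NULL && lchan->ts != NULL);

	if (lchan->state == ST_BROKEN) {
		if (!bts_acks_once)
			return 0;	/* keep it broken */
		lchan_free(lchan);
		pdch_activate_if_dyn(lchan);	/* the step missing so far */
		return 0;
	}

	lchan_free(lchan);
	assert(lchan->state == ST_NONE || lchan->state == ST_REL_ERR);
	pdch_activate_if_dyn(lchan);
	return 0;
}
```

In this model a repaired lchan on a dynamic TS ends up in NONE with the PDCH activation requested, while a plain TCH/H behaves exactly as in (2a).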
Updated by zecke over 7 years ago
On 24 Aug 2016, at 16:24, neels [REDMINE] <redmine@lists.osmocom.org> wrote:
Issue #1798 has been updated by neels.
> (1)
> With a 10 second delay hacked into the TCH/H channel activation ack from
> osmo-bts, the act ack comes after the lchan->act_timer expired and the
> channel is marked BROKEN_UNUSABLE.
Yes, that is known (see my other gerrit change to release the channel in that case).
> (2b)
> For dyn TS TCH/F_TCH/H_PDCH, the situation is the same, but a switchover
> back to PDCH operation is indeed missing. Testing and fixing now...
That was the point. Good that you have an understanding of the issue now.
Updated by neels over 7 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 90