I've hacked artificial ack delays into osmo-bts and tested various situations
with my SysmoBTS. (The prompt says "root@sysmobts-v2:~#" so I assume it's v2,
which is interesting because of the do_free condition above.)
Some of the findings may not be strictly related to this issue as reported, but
I'd like to discuss them here and split them into new issues later.
(1)
With a 10 second delay hacked into the TCH/H channel activation ack from
osmo-bts, the act ack arrives after lchan->act_timer has expired and the
channel has been marked BROKEN UNUSABLE.
I delay only the first activation, so the BSC would recover if it tried the
same lchan a second time.
(1a)
For a plain, non-dynamic TCH/H pchan, I observe that the lchan is
never recovered. It remains marked broken forever:
20160824153815711 DRLL <0000> chan_alloc.c:367 (bts=0,trx=0,ts=5,pchan=TCH/H) Allocating lchan=0 as TCH_H
20160824153815711 DRSL <0004> abis_rsl.c:1727 (bts=0,trx=0,ts=5,ss=0) Activating ARFCN(868) SS(0) lctype TCH_H r=CALL ra=0x47 ta=0
20160824153815711 DRSL <0004> abis_rsl.c:533 (bts=0,trx=0,ts=5,pchan=TCH/H) Tx RSL Channel Activate with act_type=INITIAL
20160824153815711 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=0) state NONE -> ACTIVATION REQUESTED
[osmo-bts doesn't respond with an act ack]
[4 seconds later, act_timer fires]
20160824153819711 DRSL <0004> abis_rsl.c:1116 (bts=0,trx=0,ts=5,ss=0) TCH_H lchan broken: activation timeout
20160824153819711 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=0) state ACTIVATION REQUESTED -> BROKEN UNUSABLE
[another 6 seconds and the act ack comes in late]
20160824153825725 DRSL <0004> abis_rsl.c:1456 (bts=0,trx=0,ts=5,ss=0) CHANNEL ACTIVATE ACK
20160824153825725 DRSL <0004> abis_rsl.c:1146 (bts=0,trx=0,ts=5,ss=0) CHAN ACT ACK for broken channel.
[another 6 seconds pass and the BTS signals conn failure on RSL:]
20160824153831420 DRSL <0004> abis_rsl.c:1222 (bts=0,trx=0,ts=5,ss=0) CONNECTION FAIL: RELEASING state BROKEN UNUSABLE CAUSE=0x01(Radio Link Failure)
In abis_rsl.c:1222 rsl_rx_conn_fail(), the BSC could free the lchan, but does
not, because the lchan state is not LCHAN_S_ACTIVE: rsl_rx_conn_fail() calls
rsl_rf_chan_release_err(), which returns early for any other state:
/*
 * Special handling for channel releases in the error case.
 */
static int rsl_rf_chan_release_err(struct gsm_lchan *lchan)
{
	if (lchan->state != LCHAN_S_ACTIVE)
		return 0;
	return rsl_rf_chan_release(lchan, 1, SACCH_DEACTIVATE);
}
After this, the lchan remains marked broken:
OpenBSC> show lchan summary
BTS 0, TRX 0, Timeslot 0 CCCH+SDCCH4, Lchan 0, Type NONE, State ACTIVE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm
BTS 0, TRX 0, Timeslot 5 TCH/H, Lchan 0, Type NONE, State BROKEN UNUSABLE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm
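To make the effect of that guard concrete, here is a minimal stand-alone sketch. It is a toy model, not the openbsc code: struct toy_lchan, the TOY_S_* states and toy_rf_chan_release() are hypothetical stand-ins, and only the state check is copied from rsl_rf_chan_release_err() as quoted above.

```c
#include <assert.h>

/* Toy stand-ins; only the state names mirror the log output above. */
enum toy_lchan_state {
	TOY_S_NONE,
	TOY_S_ACT_REQ,
	TOY_S_ACTIVE,
	TOY_S_REL_REQ,
	TOY_S_BROKEN,
};

struct toy_lchan {
	enum toy_lchan_state state;
	int release_sent;	/* number of RF Channel Release messages "sent" */
};

/* Hypothetical stand-in for rsl_rf_chan_release(): only records the call. */
static int toy_rf_chan_release(struct toy_lchan *lchan)
{
	lchan->release_sent++;
	lchan->state = TOY_S_REL_REQ;
	return 1;
}

/* Same guard as in the quoted rsl_rf_chan_release_err(). */
static int toy_rf_chan_release_err(struct toy_lchan *lchan)
{
	if (lchan->state != TOY_S_ACTIVE)
		return 0;
	return toy_rf_chan_release(lchan);
}

/* CONNECTION FAIL on a broken lchan: no release is sent, state is unchanged. */
static void toy_conn_fail_on_broken(void)
{
	struct toy_lchan lchan = { TOY_S_BROKEN, 0 };

	assert(toy_rf_chan_release_err(&lchan) == 0);
	assert(lchan.release_sent == 0);
	assert(lchan.state == TOY_S_BROKEN);
}
```

Calling toy_conn_fail_on_broken() from a main() passes all asserts, which matches the VTY output: after the CONNECTION FAIL, nothing is left that would ever move the lchan out of BROKEN UNUSABLE.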
If the nitb config here has only one TCH/H TS (the rest as SDCCH8 a.k.a. disabled),
the call does not succeed -- one TCH/H lchan remains broken, leaving only one
working TCH/H lchan for two phones that each want one.
If there are two TCH/H, the first TCH/H goes broken, but the call succeeds
because the phones get assigned different, working TCH/H lchans, which are
still available.
Nevertheless, it looks too harsh to keep this lchan broken forever without
even a second try.
(1b)
For dyn TS (TCH/F_TCH/H_PDCH), the situation is the same as for plain TCH/H.
Before being able to fix dyn TS, we should probably resolve the plain
TCH/* recovery.
(2)
With a 10 second delay hacked into the TCH/H channel deactivation ack
(activation ack back to normal), things look better. The lchan hits the
"Releasing it" condition mentioned above and gets freed back to NONE state.
(2a)
For plain TCH/H, all is well.
20160824161705795 DRLL <0000> abis_rsl.c:1917 (bts=0,trx=0,ts=5,ss=1) SAPI=0 RELEASE INDICATION
20160824161705795 DRSL <0004> abis_rsl.c:807 (bts=0,trx=0,ts=5,ss=1) RF Channel Release
20160824161705798 DRSL <0004> abis_rsl.c:2334 (bts=0,trx=0,ts=5,ss=1) IPAC_DLCX_IND
[...]
20160824161709798 DRSL <0004> abis_rsl.c:1116 (bts=0,trx=0,ts=5,ss=1) TCH_H lchan broken: de-activation timeout
20160824161709798 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=1) state RELEASE REQUESTED -> BROKEN UNUSABLE
[...]
20160824161715837 DRSL <0004> abis_rsl.c:864 (bts=0,trx=0,ts=5,ss=1) RF CHANNEL RELEASE ACK
20160824161715837 DRSL <0004> abis_rsl.c:882 (bts=0,trx=0,ts=5,ss=1) CHAN REL ACK for broken channel. Releasing it.
20160824161715837 DRSL <0004> abis_rsl.c:1126 (bts=0,trx=0,ts=5,ss=1) state BROKEN UNUSABLE -> NONE
OpenBSC> show lchan summary
BTS 0, TRX 0, Timeslot 0 CCCH+SDCCH4, Lchan 0, Type NONE, State ACTIVE - L1 MS Power: 0 dBm RXL-FULL-dl: -110 dBm RXL-FULL-ul: -110 dBm
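The asymmetry between (1a) and (2a) can be summarized as a small transition table. This is only a sketch of the transitions visible in the logs above, not the actual abis_rsl.c logic; all toy_* names are hypothetical.

```c
#include <assert.h>

enum toy_state { TOY_NONE, TOY_ACT_REQ, TOY_ACTIVE, TOY_REL_REQ, TOY_BROKEN };
enum toy_event { TOY_EV_TIMEOUT, TOY_EV_ACT_ACK, TOY_EV_REL_ACK };

/* Next state for one event, as observed in the logs of (1a) and (2a). */
static enum toy_state toy_next(enum toy_state s, enum toy_event ev)
{
	switch (ev) {
	case TOY_EV_TIMEOUT:
		/* timeout while waiting for either ack marks the lchan broken */
		if (s == TOY_ACT_REQ || s == TOY_REL_REQ)
			return TOY_BROKEN;
		return s;
	case TOY_EV_ACT_ACK:
		if (s == TOY_ACT_REQ)
			return TOY_ACTIVE;
		/* late act ack on a broken lchan: logged, state unchanged (1a) */
		return s;
	case TOY_EV_REL_ACK:
		/* a late rel ack on a broken lchan still frees it (2a) */
		if (s == TOY_REL_REQ || s == TOY_BROKEN)
			return TOY_NONE;
		return s;
	}
	return s;
}

/* Feed a sequence of events through the toy state machine. */
static enum toy_state toy_run(enum toy_state s, const enum toy_event *evs, int n)
{
	int i;

	for (i = 0; i < n; i++)
		s = toy_next(s, evs[i]);
	return s;
}
```

Starting from TOY_ACT_REQ, the sequence timeout, act ack ends in TOY_BROKEN; starting from TOY_REL_REQ, the sequence timeout, rel ack ends in TOY_NONE. In this model the only exit from the broken state is a release ack, which never arrives in case (1a) because no release is ever sent.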
(2b)
For dyn TS TCH/F_TCH/H_PDCH, the situation is the same, but a switchover
back to PDCH operation is indeed missing. Testing and fixing now...