Bug #4694
closed
Radio link timeout would never expire after RF-lock (resource leak)
100%
Description
I noticed that "RF-locking" a transceiver with active connections (e.g. voice calls) causes a resource leak: the BSC continues to consider the associated logical channels occupied, so the MSC also considers the CS connections (if any) active. Even more annoying, phones that were previously connected and lost the signal are unable to attach again, because the MSC thinks they are still connected.
23:34 < fixeria> interestingly, both BSC and MSC still consider the connection (silent call) as established
23:37 <@LaF0rge> fixeria: The BTS radio link timeout should still be counting down, which in turn should result in a channel release after some seconds
23:37 <@LaF0rge> fixeria: if that doesn't happen, it's a bug (please file one)
23:38 <@LaF0rge> fixeria: so basically to the higher layers it doesn't matter why the RF connection is gone (no more signal, bad antenna/cable, broken power amplifier, or TRX locked) - the radio link is gone and it must detect that and handle it identical
23:39 <@LaF0rge> once the BTS reports radio link failure to the BSC, the BSC will start the release procedure for the A interface SCCP connection and the MSC will release all related resources
23:39 < fixeria> LaF0rge: yep, looks like a resource leak
23:42 < fixeria> lol, after that osmo-msc refuses to accept a Location Update
23:43 < fixeria> "Cannot associate with VLR subscr, another connection is already active at IMSI-262423403******:MSISDN-******:TMSI-0x7CB52857:GERAN-A-2:PAGING_RESP"
23:45 < fixeria> LaF0rge: I guess the problem is that rf-locking causes osmo-bts to reset the scheduler, from what I see in the logs
23:47 < fixeria> "DL1C NOTICE scheduler.c:616 Exit scheduler for trx=0" then "DL1C NOTICE scheduler.c:591 Init scheduler for trx=0"
23:48 < fixeria> this is basically a result of trx_sched_reset() that calls trx_sched_exit() and then trx_sched_init()
23:49 < fixeria> so if we reset the scheduler, the BTS would simply "forget" all active connections and never report "radio link failure(s)"
23:59 <@LaF0rge> fixeria: I think the radio link timeout etc. are implemented above L1SAP
23:59 <@LaF0rge> fixeria: but maybe if no more events are coming up from the bts-model part, those are never triggered.
00:00 < fixeria> LaF0rge: ok, I'll open a ticket with all the findings
I think we can fix this by sending the RSL Radio Link Failure for all (still) active DCCHs as soon as the ramping down is completed.
And this seems to be what the other BTS models are doing when a transceiver gets RF-locked.
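To make the proposal concrete, here is a rough Python model of it (not osmo-bts code). The `Lchan` type, the `LchanState` enum and the `send_conn_fail` callback are hypothetical stand-ins; in osmo-bts the callback would presumably correspond to something like rsl_tx_conn_fail() with a radio link failure cause.

```python
from dataclasses import dataclass
from enum import Enum


class LchanState(Enum):
    """Hypothetical, reduced set of logical channel states."""
    NONE = 0
    ACTIVE = 1


@dataclass
class Lchan:
    """Hypothetical stand-in for a logical channel on a transceiver."""
    name: str
    state: LchanState


def report_failures_after_ramp_down(lchans, send_conn_fail):
    """Report a radio link failure for every still-active channel.

    Meant to be called once power ramping down has completed.
    Returns the names of the channels that were reported.
    """
    reported = []
    for lchan in lchans:
        if lchan.state is LchanState.ACTIVE:
            send_conn_fail(lchan)  # assumed to emit the RSL failure indication
            reported.append(lchan.name)
    return reported
```

Inactive channels are skipped, so the BSC would only be told about connections that were really lost when the transceiver was locked.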
Related issues
Updated by fixeria over 3 years ago
- % Done changed from 0 to 40
> I think we can fix this by sending the RSL Radio Link Failure for all (still) active DCCHs as soon as the ramping down is completed.
https://gerrit.osmocom.org/c/osmo-bts/+/19536 rsl: constify the 'lchan' argument of rsl_tx_conn_fail()
https://gerrit.osmocom.org/c/osmo-bts/+/19537 osmo-bts-trx: fix resource leak in bts_model_trx_deact_rf()
This change still needs to be tested, and should ideally be accompanied by a TTCN-3 test case. Leaving this up to somebody else.
Updated by laforge over 3 years ago
- Subject changed from Radio link temeout would never expire after RF-lock (resource leak) to Radio link timeout would never expire after RF-lock (resource leak)
Updated by laforge over 3 years ago
I'm not sure if we should introduce more "special purpose" code paths just for ramping down.
The fundamental problem seems to be that somehow the scheduler stops running, when all we want
is actually the radio transmitter (and possibly receiver) to stop running.
If the scheduler would continue to run, but not transmit or receive any
valid uplink bursts, the radio link timeout should happen automatically
without any special-case code paths.
Even if we only disabled transmission (or replaced all transmit bursts
with idle bursts), the UE would at some point stop transmitting, which
in turn means a radio link timeout happens.
Regards,
Harald
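The argument above can be illustrated with a small Python model of the radio link counter S (a sketch, not the osmo-bts implementation). It uses the counting scheme from 3GPP TS 45.008: S starts at RADIO_LINK_TIMEOUT, drops by 1 for every SACCH block that cannot be decoded, rises by 2 for every decoded one, and is capped at the start value. That the BTS applies the same scheme to uplink SACCH blocks is an assumption here, as are the class and function names and the timeout value of 32.

```python
from dataclasses import dataclass, field


@dataclass
class RadioLinkCounter:
    """Simplified model of the radio link timeout counter S."""
    timeout: int = 32          # RADIO_LINK_TIMEOUT in SACCH blocks (assumed value)
    s: int = field(init=False)

    def __post_init__(self):
        self.s = self.timeout

    def on_sacch_block(self, decoded_ok: bool) -> bool:
        """Account for one SACCH block; return True once S has reached zero."""
        if decoded_ok:
            self.s = min(self.s + 2, self.timeout)
        else:
            self.s = max(self.s - 1, 0)
        return self.s == 0


def blocks_until_timeout(counter, blocks):
    """Feed decode results to the counter, one per SACCH block.

    Returns the 1-based index of the block at which S reaches zero,
    or None if the input ends before that happens.
    """
    for n, decoded_ok in enumerate(blocks, start=1):
        if counter.on_sacch_block(decoded_ok):
            return n
    return None


# Scheduler keeps running, but no valid uplink bursts arrive: S counts down.
running = blocks_until_timeout(RadioLinkCounter(), [False] * 40)

# Scheduler was reset and delivers no block events at all: S never moves.
stopped = blocks_until_timeout(RadioLinkCounter(), [])
```

In this model `running` is the block at which the timeout fires, while `stopped` is None, which is the leak described in this ticket: without events from the scheduler the counter is never decremented. With the assumed value of 32 and a SACCH period of roughly 480 ms on a traffic channel, an unbroken run of bad blocks takes about 15 seconds to reach zero.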
Updated by fixeria over 3 years ago
- Related to Bug #4696: osmo-bts-trx keeps sending dummy bursts after being RF-locked added
Updated by pespin over 3 years ago
As far as I remember from reading the OML specs, switching the Administrative State to Locked shouldn't automatically result in calls being dropped or in specific messages being sent from BTS to BSC; the spec simply says the BTS stops physically transmitting on the air interface, that's all. I agree with Harald that we shouldn't send any RSL Radio Link Failure or the like, but simply let the usual mechanism trigger it.
Updated by fixeria over 3 years ago
- Status changed from New to Resolved
- % Done changed from 40 to 100
Fixed by pespin, radio link timeout is now triggered as expected:
DMEAS NOTICE l1sap.c:1192 (bts=0,trx=0,ts=2,ss=0) radio link timeout counter S reached zero, dropping connection