Bug #1703
osmo-bts-lc15 100% CPU usage after disconnecting OML/RSL link (Closed)
Description
Copied from a report received by e-mail:
Yesterday, I found another issue related to closing the OML/RSL link in osmo-bts-lc15.
It makes the LC15 unit unresponsive (100% CPU load) after the OML/RSL link is disconnected from the NITB side.
Steps to reproduce the issue:
- Start Osmo-NITB
- Start the LC15 BTS from the command line in a console
- Stop Osmo-NITB at the moment the LC15 BTS software starts to ramp up Tx power, to simulate loss of the Abis link
- osmo-bts-lc15 then starts consuming 100% CPU time and the system becomes unresponsive
Investigation:
- I added some traces to the 'sign_link_down' function in osmo-bts/src/common/abis.c
- I found that the code gets stuck when calling trx_link_estab(trx) after trx->rsl_link has already been set to NULL. See the code snippet below:
/* Then iterate over the RSL signalling links */
llist_for_each_entry(trx, &g_bts->trx_list, list) {
	if (trx->rsl_link) {
		e1inp_sign_link_destroy(trx->rsl_link);
		trx->rsl_link = NULL;
		trx_link_estab(trx);
	}
}
In my opinion, moving 'trx->rsl_link = NULL;' to after the call to 'trx_link_estab(trx);' will solve this issue.
So far I have not seen 100% CPU load on the LC15 unit with this change.
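The reordering proposed above can be modeled in a few lines of C. This is a toy sketch, not osmo-bts code: `struct sign_link`, `struct trx`, `sign_link_destroy()` and `trx_link_estab_stub()` are hypothetical stand-ins for the real structures, `e1inp_sign_link_destroy()` and `trx_link_estab()`. To keep the model free of undefined behaviour, "destroying" a link only marks it as dead instead of freeing it. The model shows that, with this ordering, the re-establish call runs while `trx->rsl_link` still refers to a link that has already been torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the osmo-bts data structures. */
struct sign_link {
	bool destroyed;
};

struct trx {
	struct sign_link *rsl_link;
	bool estab_saw_dead_link; /* set by the stub below */
};

/* Stand-in for e1inp_sign_link_destroy(). The model only marks the link
 * as destroyed so that inspecting it afterwards stays well-defined. */
static void sign_link_destroy(struct sign_link *link)
{
	assert(link != NULL);
	link->destroyed = true;
}

/* Stand-in for trx_link_estab(): records whether it was handed a link
 * pointer that is non-NULL but refers to an already destroyed link. */
static void trx_link_estab_stub(struct trx *trx)
{
	trx->estab_saw_dead_link =
		trx->rsl_link != NULL && trx->rsl_link->destroyed;
}

/* Loop body with the ordering proposed in the report. */
static void rsl_link_down_reordered(struct trx *trx)
{
	if (trx->rsl_link) {
		sign_link_destroy(trx->rsl_link);
		trx_link_estab_stub(trx); /* runs before the pointer is cleared */
		trx->rsl_link = NULL;
	}
}
```

Assuming the real `e1inp_sign_link_destroy()` releases the link memory, any access through `trx->rsl_link` inside `trx_link_estab()` at that point would touch freed memory, which is why the reordering hides the symptom rather than removing the cause.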
Updated by laforge almost 8 years ago
laforge wrote:
In my opinion, moving 'trx->rsl_link = NULL;' to after the call to 'trx_link_estab(trx);' will solve this issue.
I think this merely works around the bug. If the signalling link (trx->rsl_link) is destroyed, we should set the pointer to NULL.
What I'm wondering is why we should even try to re-establish the link at all. In the very early osmo-bts days, we actually attempted to recover from such situations by re-establishing certain links directly. However, to be safer in those error paths, we adopted a 'fail fast' policy: exit osmo-bts and have systemd restart/re-spawn it. So I think the proper solution would be to simply delete the trx_link_estab() line.
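A minimal sketch of the fail-fast handling described above, again with hypothetical stand-in types and stubs rather than the osmo-bts API; it illustrates the policy and is not the change merged in gerrit 235. The OML teardown step is an assumption based on the "Then iterate over the RSL signalling links" comment in the snippet. Every link is destroyed and its pointer cleared, nothing is re-established, and the function returns a failure status so that the caller can exit and leave the restart to the service manager.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model; not the osmo-bts data structures. */
struct sign_link {
	int id;
};

struct trx {
	struct sign_link *rsl_link;
};

#define NUM_TRX 2

static struct sign_link *g_oml_link;
static struct trx g_trx[NUM_TRX];

/* Stand-in for e1inp_sign_link_destroy(): releases the link. */
static void sign_link_destroy(struct sign_link *link)
{
	assert(link != NULL);
	free(link);
}

/* Hypothetical helper: creates one OML link and one RSL link per TRX. */
static int setup_links(void)
{
	g_oml_link = calloc(1, sizeof(*g_oml_link));
	if (!g_oml_link)
		return -1;
	for (int i = 0; i < NUM_TRX; i++) {
		g_trx[i].rsl_link = calloc(1, sizeof(struct sign_link));
		if (!g_trx[i].rsl_link)
			return -1;
	}
	return 0;
}

/* Fail-fast link-down handling: tear down, clear pointers, never try to
 * re-establish. Returns the status the caller should pass to exit(). */
static int sign_link_down_fail_fast(void)
{
	if (g_oml_link) {
		sign_link_destroy(g_oml_link);
		g_oml_link = NULL;
	}
	for (int i = 0; i < NUM_TRX; i++) {
		struct trx *trx = &g_trx[i];
		if (trx->rsl_link) {
			sign_link_destroy(trx->rsl_link);
			trx->rsl_link = NULL; /* no dangling pointer left behind */
			/* deliberately no trx_link_estab(trx) here */
		}
	}
	return EXIT_FAILURE;
}
```

A caller would then do something like `exit(sign_link_down_fail_fast());`, and a service manager configured to restart the unit (for example systemd with `Restart=always`) would start a fresh osmo-bts-lc15 process. Because the pointers are cleared, calling the function a second time does not destroy any link twice.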
Updated by laforge almost 8 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
https://gerrit.osmocom.org/#/c/235/ has been merged.