Bug #1703

closed

osmo-bts-lc15 100% CPU usage after disconnecting OML/RSL link

Added by laforge almost 8 years ago. Updated almost 8 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
05/09/2016
Due date:
% Done:

100%

Spec Reference:

Description

Copied from a report received by e-mail:

Yesterday I found another issue related to closing the OML/RSL link in osmo-bts-lc15.
It makes the LC15 unit unresponsive (100% CPU load) after the OML/RSL link is disconnected from the NITB side.

Steps to reproduce this issue:

- Start Osmo-NITB
- Start the LC15 BTS from the command line in a console
- Stop Osmo-NITB at the moment the LC15 BTS software starts to ramp up Tx power, so that the Abis link is lost
- At that moment osmo-bts-lc15 starts to consume 100% CPU time and the system becomes unresponsive

Investigation:
- I put some traces in the 'sign_link_down' function in osmo-bts/src/common/abis.c
- I found that the code gets stuck when calling trx_link_estab(trx) after trx->rsl_link has already been set to NULL. See the code snippet below

/* Then iterate over the RSL signalling links */
        llist_for_each_entry(trx, &g_bts->trx_list, list) {
                if (trx->rsl_link) {
                        e1inp_sign_link_destroy(trx->rsl_link);
                        trx->rsl_link = NULL;
                        trx_link_estab(trx);
                }
        }

In my opinion, moving the 'trx->rsl_link = NULL;' assignment to after the call to 'trx_link_estab(trx);' will solve this issue.

I have not seen 100% CPU load on the LC15 unit with this change so far.

Actions #1

Updated by laforge almost 8 years ago

laforge wrote:

In my opinion, moving the 'trx->rsl_link = NULL;' assignment to after the call to 'trx_link_estab(trx);' will solve this issue.

I think this merely works around the bug. If the signalling link
(trx->rsl_link) is destroyed, we should set the pointer to NULL.

What I'm wondering is why we should even try to re-establish the link at
all. In the very early osmo-bts days, we actually attempted to recover
from such situations by re-establishing certain links directly.
However, to be safer in those error paths, we adopted a 'fail fast'
policy: exit osmo-bts and have systemd re-start/re-spawn it. So I
think the proper solution would be to simply delete the trx_link_estab()
line.
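
To illustrate the suggestion above, here is a minimal, self-contained sketch of the teardown loop with the re-establishment call removed. It is not the actual osmo-bts code: 'struct sign_link', 'struct trx', 'sign_link_destroy()' and 'rsl_links_down()' are hypothetical stand-ins for the real types and for e1inp_sign_link_destroy(), and a plain singly linked list replaces llist_for_each_entry(). Under the 'fail fast' policy, the caller would be expected to exit afterwards and leave the restart to systemd.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the signalling link object. */
struct sign_link {
	int id;
};

/* Hypothetical stand-in for a TRX entry; the real code uses an llist. */
struct trx {
	struct trx *next;
	struct sign_link *rsl_link;
};

/* Hypothetical stand-in for e1inp_sign_link_destroy(). */
static void sign_link_destroy(struct sign_link *link)
{
	free(link);
}

/*
 * Destroy every RSL link in the list and clear the pointer right away,
 * so no dangling pointer remains. Nothing is re-established here.
 * Returns the number of links that were destroyed.
 */
static int rsl_links_down(struct trx *trx_list)
{
	struct trx *trx;
	int destroyed = 0;

	for (trx = trx_list; trx != NULL; trx = trx->next) {
		if (trx->rsl_link) {
			sign_link_destroy(trx->rsl_link);
			trx->rsl_link = NULL;
			destroyed++;
		}
	}
	return destroyed;
}
```

Calling rsl_links_down() a second time on the same list is harmless, since every pointer has already been cleared; that is the property the NULL assignment is meant to guarantee.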

Actions #2

Updated by laforge almost 8 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100