Bug #5324
closed
MULTI BSS Handover: Target BTS is NULL, sigsegv in chan_counts_for_bts()
100%
Description
It looks like something with Multi BSS handover is broken:
DHODEC handover_decision_2.c:1470 (lchan 0.020 TCH_F SPEECH_AMR) (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f) MEASUREMENT REPORT (1 neighbors)
DHODEC handover_decision_2.c:1475 (lchan 0.020 TCH_F SPEECH_AMR) (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f) 0: arfcn=247 bsic=63 neigh_idx=0 rxlev=63 flags=0
DHODEC handover_decision_2.c:1522 (lchan 0.020 TCH_F SPEECH_AMR) (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f) Avg RX level = -47 dBm, +0 dBm AFS bias = -47 dBm; Avg RX quality = 0, +0 AFS bias = 0
DHODEC handover_logic.c:241 (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f) HO-none: There are explicit neighbors configured for this cell
DHODEC handover_logic.c:254 (subscr subscr-IMSI-262423203000396-TMSI-0x7a3e1e7f) HO-none: Found remote target cell(s) CGI[1]:{334-07-274-101}
Program received signal SIGSEGV, Segmentation fault.
0x00005555555b98de in chan_counts_for_bts (bts_counts=bts_counts@entry=0x7fffffff7a70, bts=0x0) at chan_counts.c:133
From a previous run:
#0 chan_counts_for_bts (bts_counts=bts_counts@entry=0x7fffffff7a10, bts=0x0) at chan_counts.c:137
#1 0x00005555555c0cad in candidate_set_free_tch (c=c@entry=0x7fffffff8240) at handover_decision_2.c:1030
#2 0x00005555555c2bd7 in collect_handover_candidate (lchan=lchan@entry=0x7ffff7e9f1f0, nmp=nmp@entry=0x7ffff7e9f36c, clist=clist@entry=0x7fffffff89b0, candidates=candidates@entry=0x7fffffff899c, include_weaker_rxlev=include_weaker_rxlev@entry=false, rxlev_current=rxlev_current@entry=63, neighbors_count=0x7fffffff8914) at handover_decision_2.c:1146
#3 0x00005555555c5813 in collect_candidates_for_lchan (lchan=lchan@entry=0x7ffff7e9f1f0, clist=clist@entry=0x7fffffff89b0, candidates=candidates@entry=0x7fffffff899c, _rxlev_current=_rxlev_current@entry=0x7fffffff8998, include_weaker_rxlev=include_weaker_rxlev@entry=false) at handover_decision_2.c:1224
#4 0x00005555555c6af4 in find_alternative_lchan (lchan=lchan@entry=0x7ffff7e9f1f0, include_weaker_rxlev=include_weaker_rxlev@entry=false, request_upgrade_to_tch_f=false) at handover_decision_2.c:1303
#5 0x00005555555c7f8f in on_measurement_report (mr=0x7ffff7e9f540) at handover_decision_2.c:1577
#6 0x00005555555d2647 in ho_meas_rep (mr=0x7ffff7e9f540) at handover_logic.c:95
#7 ho_logic_sig_cb (subsys=<optimized out>, signal=<optimized out>, handler_data=<optimized out>, signal_data=<optimized out>) at handover_logic.c:316
#8 0x00007ffff72e3ca4 in osmo_signal_dispatch (subsys=subsys@entry=3, signal=signal@entry=8, signal_data=signal_data@entry=0x7fffffffd170) at signal.c:118
#9 0x0000555555582a72 in send_lchan_signal (resp=0x7ffff7e9f540, lchan=<optimized out>, sig_no=8) at abis_rsl.c:67
#10 rsl_rx_meas_res (msg=msg@entry=0x555555bc0350) at abis_rsl.c:1455
#11 0x00005555555879e5 in abis_rsl_rx_dchan (msg=0x555555bc0350) at abis_rsl.c:1544
#12 abis_rsl_rcvmsg (msg=0x555555bc0350) at abis_rsl.c:3056
#13 0x00007ffff6eac542 in handle_ts1_read () from /usr/local/lib/libosmoabis.so.10
#14 0x00007ffff6eaca2b in ipaccess_fd_cb () from /usr/local/lib/libosmoabis.so.10
#15 0x00007ffff72e36fc in poll_disp_fds (n_fd=<optimized out>) at select.c:361
#16 _osmo_select_main (polling=<optimized out>) at select.c:393
#17 0x00007ffff72e37e6 in osmo_select_main_ctx (polling=<optimized out>) at select.c:449
#18 0x0000555555575909 in main (argc=<optimized out>, argv=<optimized out>) at osmo_bsc_main.c:1087
In candidate_set_free_tch ():
(gdb) p c->target
$13 = {ab = {arfcn = 249, bsic = 63 '?'}, cell_ids = {id_discr = CELL_IDENT_WHOLE_GLOBAL, id_list = {{global = {lai = {plmn = {mcc = 334, mnc = 7, mnc_3_digits = false}, lac = 274}, cell_identity = 102}, lac_and_ci = {lac = 334, ci = 7}, ci = 334, lai_and_lac = {plmn = {mcc = 334, mnc = 7, mnc_3_digits = false}, lac = 274}, lac = 334, global_ps = {rai = {lac = {plmn = {mcc = 334, mnc = 7, mnc_3_digits = false}, lac = 274}, rac = 102 'f'}, cell_identity = 0}}, {global = {lai = {plmn = {mcc = 0, mnc = 0, mnc_3_digits = false}, lac = 0}, cell_identity = 0}, lac_and_ci = {lac = 0, ci = 0}, ci = 0, lai_and_lac = {plmn = {mcc = 0, mnc = 0, mnc_3_digits = false}, lac = 0}, lac = 0, global_ps = {rai = {lac = {plmn = {mcc = 0, mnc = 0, mnc_3_digits = false}, lac = 0}, rac = 0 '\000'}, cell_identity = 0}} <repeats 126 times>}, id_list_len = 1},
  bts = 0x0, rxlev = 63, rxlev_afs_bias = 0, free_tchf = 0, min_free_tchf = 0, free_tchh = 0, min_free_tchh = 0, next_tchf_reduces_tchh = 0, next_tchh_reduces_tchf = 0}
Related issues
Updated by keith over 2 years ago
- Related to Bug #5246: sigsegv in bts_count_free_ts() added
Updated by keith over 2 years ago
In handover_decision_2.c:1129:
/* For cells in a remote BSS, we cannot query the target cell's handover config, and hence
* instead assume the local BTS' config to apply. */
neigh_cfg = (neighbor_bts ? : bts)->ho;
So, IIUC, we expect that neighbor_bts may be NULL at this point.
In that case, we go on to define c as a struct ho_candidate with member .target.bts = 0x0.
We pass &c to candidate_set_free_tch(), which calls chan_counts_for_bts(&bts_counts, c->target.bts) at line 1030.
That function dereferences the NULL pointer: llist_for_each_entry(trx, &bts->trx_list, list) -> BOOM!
Updated by keith over 2 years ago
I also notice that a few lines further down in handover_decision_2.c (line 1172) we do:
    if (neighbor_bts) {
        check_requirements(&c);
    } else
        check_requirements_remote_bss(&c);
So maybe something like this is enough?
--- a/src/osmo-bsc/handover_decision_2.c
+++ b/src/osmo-bsc/handover_decision_2.c
@@ -1143,7 +1143,8 @@ static void collect_handover_candidate(struct gsm_lchan *lchan, struct neigh_mea
                         .rxlev = neigh_meas_avg(nmp, ho_get_hodec2_rxlev_neigh_avg_win(bts->ho)),
                 },
         };
-        candidate_set_free_tch(&c);
+        if (neighbor_bts)
+                candidate_set_free_tch(&c);
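For readers without the osmo-bsc tree at hand, the call-site guard proposed above can be modelled in isolation. This is only a sketch: the model_* structs and functions are hypothetical, heavily simplified stand-ins and not the real osmo-bsc definitions. The only assumption is the one stated in this ticket, namely that a candidate in a remote BSS has no local BTS pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a local BTS (not the osmo-bsc struct). */
struct model_bts {
	int free_tchf;
	int free_tchh;
};

/* Hypothetical stand-in for a handover candidate. */
struct model_candidate {
	const struct model_bts *bts; /* NULL when the target cell is in a remote BSS */
	int free_tchf;
	int free_tchh;
};

/* Models candidate_set_free_tch(): only meaningful for a local BTS. */
static void model_set_free_tch(struct model_candidate *c)
{
	assert(c->bts != NULL); /* the precondition that was violated in the crash */
	c->free_tchf = c->bts->free_tchf;
	c->free_tchh = c->bts->free_tchh;
}

/* Models the patched call site: skip channel counting for remote candidates. */
static struct model_candidate model_collect(const struct model_bts *neighbor_bts)
{
	struct model_candidate c = { .bts = neighbor_bts };

	if (neighbor_bts)
		model_set_free_tch(&c);
	return c;
}
```

With this guard, a remote-BSS candidate simply keeps its zero-initialized free channel counts, which matches the free_tchf = 0 and free_tchh = 0 values seen in the gdb dump above.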
Updated by keith over 2 years ago
I've tested this patch (above in #5324-3) and am now successfully running a multi-BSS system with HO working.
neels, do you want to take a look and see whether this simple check is enough?
Looks like it was introduced in
https://osmocom.org/projects/osmobsc/repository/osmo-bsc/revisions/d946e5b280dbce0131234a10d28524d910c76553
thnx
Updated by neels over 2 years ago
Thanks for reporting this!
I'll try to find out why the inter-BSC HO ttcn3 testing doesn't catch this problem.
I also have an alternative patch that makes candidate_set_free_tch() safe to call for inter-BSC candidates.
Still testing...
At first I thought the bug was introduced by the recent channel counting refactoring,
but indeed you are right that the problem was introduced much earlier, almost a year ago, when we added
sane handover target channel selection -- which of course doesn't apply for inter-BSC HO.
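The alternative mentioned above, making the counting code itself safe for candidates without a local BTS, could look roughly like the sketch below. It is not the change submitted to gerrit; the sketch_* types and the function are hypothetical simplifications (a plain singly linked list instead of llist_head), written only to illustrate guarding inside the callee instead of at the call site.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified TRX: one node in a singly linked list. */
struct sketch_trx {
	struct sketch_trx *next;
	int free_tch;
};

/* Hypothetical simplified BTS holding its TRX list. */
struct sketch_bts {
	struct sketch_trx *trx_list;
};

/* Hypothetical result struct, loosely modelled on per-BTS channel counts. */
struct sketch_counts {
	int free_tch;
};

/* Count free channels of a BTS; a NULL bts (remote BSS) yields all-zero counts. */
static void sketch_chan_counts_for_bts(struct sketch_counts *counts,
				       const struct sketch_bts *bts)
{
	assert(counts != NULL); /* the output must be valid, the BTS need not be */
	memset(counts, 0, sizeof(*counts));

	if (!bts)
		return; /* inter-BSC candidate: nothing to count locally */

	for (const struct sketch_trx *trx = bts->trx_list; trx; trx = trx->next)
		counts->free_tch += trx->free_tch;
}
```

The trade-off against the call-site check is that every caller is protected, at the cost of silently returning zeros where a caller might have preferred to be told the BTS is remote.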
Updated by neels over 2 years ago
neels wrote in #note-5:
I'll try to find out why the inter-BSC HO ttcn3 testing doesn't catch this problem.
Of course: in ttcn3 we so far just trigger a handover via VTY directly instead of letting
measurements cause HO candidate selection... That's why we didn't catch the bug.
f_vty_transceive(BSCVTY, "handover any to arfcn 123 bsic any");
Given the serious nature of the bug, I'll try to find a way to test that code path...
Updated by neels over 2 years ago
- Status changed from New to In Progress
- Assignee set to neels
- % Done changed from 0 to 90
I am able to reproduce the bug in https://gerrit.osmocom.org/c/osmo-ttcn3-hacks/+/26354
and submitted an alternative fix in https://gerrit.osmocom.org/c/osmo-bsc/+/26352
Updated by neels over 2 years ago
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100
Updated by pespin over 2 years ago
- Related to Bug #5385: Segmentation fault in chan_counts_for_bts() added
Updated by keith about 2 years ago
- Related to Bug #5525: Multi BSS Handover: gsm_bts_cell_id() passed NULL bts added