Project

General

Profile

Actions

Bug #5563

open

OsmoMSC sometimes stalls for dozens of seconds in a production deployment

Added by laforge almost 2 years ago. Updated over 1 year ago.

Status:
Stalled
Priority:
High
Assignee:
Category:
SMS
Target version:
-
Start date:
05/14/2022
Due date:
% Done:

50%

Resolution:
Spec Reference:

Description

When we take a long-term (8 hours) bpftrace showing us the delay between subsequent calls to poll() (by libosmocore/src/select.c) in osmo-msc, we get the following histogram (units in milli-seconds):

@poll: 
[0]               532245 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                13088 |@                                                   |
[2, 4)              5621 |                                                    |
[4, 8)              5566 |                                                    |
[8, 16)             2746 |                                                    |
[16, 32)            5282 |                                                    |
[32, 64)            5262 |                                                    |
[64, 128)           6139 |                                                    |
[128, 256)         14273 |@                                                   |
[256, 512)         18357 |@                                                   |
[512, 1K)          13806 |@                                                   |
[1K, 2K)            4222 |                                                    |
[2K, 4K)            1331 |                                                    |
[4K, 8K)             450 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)             0 |                                                    |
[32K, 64K)             5 |                                                    |
[64K, 128K)           17 |                                                    |
[128K, 256K)           2 |                                                    |
So as we can see
  • the majority is very low (sub-second to 128ms)
  • there is a smaller peak in the order of 128ms to 1s (surprisingly long)
  • there are still several thousand of instances where the delay isn the 1s..4s. interval (too long!)
  • there ar rare occasions where we don't return to poll for 32, 64, or evne more than 128 seconds (crazy!)

If we contrast this with the amount of time we spent in dbi_conn_queryf, this is clearly not the culprit:

@dbi_query: 
[0]                37008 |@                                                   |
[1]              1640233 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)           1245771 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@             |
[4, 8)             21406 |                                                    |
[8, 16)              325 |                                                    |
[16, 32)              71 |                                                    |
[32, 64)              17 |                                                    |

So the longest duration DB query was in the order of 32..63 ms. Not good, but not a problem either with all the MSC (MM, CC, SMS, BSSAP, SCCP, ...) time-outs being in the multiple-second range.

So now we have to find out if the stalls are

  1. due to excessive system load (like I/O) outside of osmo-msc, or
  2. due to something osmo-msc is doing by itself (like calling thousands of database queries of several milli-seconds each) without going through the libosmocore poll main loop.

Related issues

Related to OsmoMSC - Bug #5564: blocking database I/O by SMS databaseStalledlaforge05/15/2022

Actions
Related to OsmoMSC - Bug #5559: OsmoMSC at 100% CPU and unresponsive for up to several minutes!Stalledlaforge05/13/2022

Actions
Related to OsmoMSC - Feature #5566: avoid using synchronous = FULLResolvedlaforge05/17/2022

Actions
Actions #1

Updated by laforge almost 2 years ago

The result of all of this is long stalling is (of course) that all of the timers time out, up to the point where the BSC detects that the MSC is gone and it starts dropping new subscriber connections due to MSC inavailability:

May 13 23:59:20 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x28282D3B: No suitable MSC for this Complete Layer 3 request found
May 13 23:59:20 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x60BE0328: No suitable MSC for this Complete Layer 3 request found
May 13 23:59:21 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x611E7F66: No suitable MSC for this Complete Layer 3 request found
May 13 23:59:21 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x5D9F98F2: No suitable MSC for this Complete Layer 3 request found
May 13 23:59:21 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0xA8A19D59: No suitable MSC for this Complete Layer 3 request found
May 13 23:59:21 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x02376F59: No suitable MSC for this Complete Layer 3 request found
The logic for this detection is in a_reset.c:
  • every time a SCCP CR is not responded to for 20s: increment a counter
  • every time a SCCP CR is positively acknowledged: set the counter to zero
  • if the counter reaches 3: declare the MSC as dead

So if the MSC stalls for anything close to 20s or more, the above condition will be met. Likewise, of course, many timers in other protocols run out. This simply must not happen.

Actions #2

Updated by laforge almost 2 years ago

for the record, this is the bpftrace script used to generate the above histograms:

BEGIN {
        @last_elapsed = nsecs;
}

tracepoint:syscalls:sys_enter_poll /comm == "osmo-msc"/ {
        $since_last = (nsecs - @last_elapsed)/1000000;
        @last_elapsed = nsecs;
        @poll = hist($since_last);
}

uprobe:/usr/lib/x86_64-linux-gnu/libdbi.so.1.1.0:dbi_conn_queryf {
        @query_start = nsecs;
}

uretprobe:/usr/lib/x86_64-linux-gnu/libdbi.so.1.1.0:dbi_conn_queryf {
        $query_duration = (nsecs - @query_start)/1000000;
        @dbi_query = hist($query_duration);
}

interval:s:10 {
        printf("==========================================================");
        time("%Y-%m-%d %H:%M:%S\n");
        print(@poll);
        print(@dbi_query);
}

Actions #3

Updated by laforge almost 2 years ago

  • Category set to SMS
  • Status changed from New to In Progress

The culprit is likely sms_submit_pending() where we do up to 1000 iterations of a loop that performs a database read operation. So if the database takes 4ms each, that means 4s of stalling - and of course more if the DB operations take longer.

Actions #4

Updated by laforge almost 2 years ago

laforge wrote in #note-3:

The culprit is likely sms_submit_pending() where we do up to 1000 iterations of a loop that performs a database read operation. So if the database takes 4ms each, that means 4s of stalling - and of course more if the DB operations take longer.

I've now added the following snipped to the bpftrace so we know how long sms_submit_pending takes:

uprobe:/usr/bin/osmo-msc:sms_submit_pending { 
        @sms_submit_pending_start = nsecs;
}
uretprobe:/usr/bin/osmo-msc:sms_submit_pending { 
        $duration = (nsecs - @sms_submit_pending_start)/1000000;
        @sms_submit_pending = hist($duration);
}

Actions #5

Updated by laforge almost 2 years ago

"Unfortunately" it doesn't seem to be sms_submit_pending, at least not from what I can see so far:

==========================================================2022-05-14 06:35:10
@poll: 
[0]                59773 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                  550 |                                                    |
[2, 4)                96 |                                                    |
[4, 8)               171 |                                                    |
[8, 16)              181 |                                                    |
[16, 32)             239 |                                                    |
[32, 64)             290 |                                                    |
[64, 128)            276 |                                                    |
[128, 256)           895 |                                                    |
[256, 512)          1956 |@                                                   |
[512, 1K)           2010 |@                                                   |
[1K, 2K)             952 |                                                    |
[2K, 4K)            1098 |                                                    |
[4K, 8K)            1069 |                                                    |

@dbi_query: 
[0]                  736 |@                                                   |
[1]                32393 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)              5356 |@@@@@@@@                                            |
[4, 8)               328 |                                                    |

@sms_submit_pending: 
[128, 256)            89 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
[256, 512)           150 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

So we have [again] had 1096 times where the time between to successive poll() executions was > 4s, but at the same time the maximum duration of sms_submit_pending was in the 256..512ms bucket.

One possible error is that our '@poll' histogram represents just the absolute time between polls, irrespective of whether or not there actually was any timeout that would have constrained that poll. So assuming no osmo_timers are expiring and no network traffic happens, the value can of course go into the seconds range.

I've now added a '@exit_to_poll' histogram, which masures the time between exit of the last poll until we re-renter poll, and therefore it is a good measure for how much time we spend in osmo-msc code (potentially doing blocking file I/O via sqlite3) before again processing the next I/O event via poll.

total bpftrace script running now:

BEGIN {
        @last_elapsed = nsecs;
        @last_poll_exit = nsecs;
}

tracepoint:syscalls:sys_enter_poll /comm == "osmo-msc"/ {
        /* time since last poll */
        $since_last = (nsecs - @last_elapsed)/1000000;
        @last_elapsed = nsecs;
        @poll = hist($since_last);

        /* time since last poll exit (time spent in osmo-msc) */
        $since_exit = (nsecs - @last_poll_exit)/1000000;
        @exit_to_poll = hist($since_exit);
}

tracepoint:syscalls:sys_exit_poll /comm == "osmo-msc"/ {
        @last_poll_exit = nsecs;
}

uprobe:/usr/lib/x86_64-linux-gnu/libdbi.so.1.1.0:dbi_conn_queryf {
        @query_start = nsecs;
}

uretprobe:/usr/lib/x86_64-linux-gnu/libdbi.so.1.1.0:dbi_conn_queryf {
        $query_duration = (nsecs - @query_start)/1000000;
        @dbi_query = hist($query_duration);
}

uprobe:/usr/bin/osmo-msc:sms_submit_pending { 
        @sms_submit_pending_start = nsecs;
}
uretprobe:/usr/bin/osmo-msc:sms_submit_pending { 
        $duration = (nsecs - @sms_submit_pending_start)/1000000;
        @sms_submit_pending = hist($duration);
}

interval:s:10 {
        printf("==========================================================");
        time("%Y-%m-%d %H:%M:%S\n");
        print(@poll);
        print(@exit_to_poll);
        print(@dbi_query);
        print(@sms_submit_pending);
}
Actions #6

Updated by laforge almost 2 years ago

Running under low load for jus a few minutes, I can see the following (expected) correlation:

@exit_to_poll: 
[0]                 2936 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                   38 |                                                    |
[2, 4)                28 |                                                    |
[4, 8)                69 |@                                                   |
[8, 16)                0 |                                                    |
[16, 32)               0 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)            18 |                                                    |
[256, 512)            36 |                                                    |

@sms_submit_pending: 
[128, 256)            20 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
[256, 512)            34 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

The two longet-duration buckets of 'exit_to_poll' rather nicely correspond with the two (existing at all) buckets for sms_submit_pending.

So let's wait until SMS load increases during daytime, how far both of these will increment.

Actions #7

Updated by laforge almost 2 years ago

We just witnessed one such event, where sms_submit_pending took more than 64 seconds (!):

Until 07:04:05am local time everything was fine:

==========================================================2022-05-14 07:04:05
@poll:
[0]                10822 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                  306 |@                                                   |
[2, 4)                62 |                                                    |
[4, 8)               123 |                                                    |
[8, 16)               65 |                                                    |
[16, 32)             163 |                                                    |
[32, 64)             123 |                                                    |
[64, 128)            132 |                                                    |
[128, 256)           365 |@                                                   |
[256, 512)           479 |@@                                                  |
[512, 1K)            439 |@@                                                  |
[1K, 2K)             180 |                                                    |
[2K, 4K)              44 |                                                    |
[4K, 8K)               8 |                                                    |

@exit_to_poll:
[0]                12540 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                  183 |                                                    |
[2, 4)                85 |                                                    |
[4, 8)               287 |@                                                   |
[8, 16)                3 |                                                    |
[16, 32)               1 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)            36 |                                                    |
[256, 512)           176 |                                                    |

@dbi_query:
[0]                  674 |                                                    |
[1]                57332 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)             11394 |@@@@@@@@@@                                          |
[4, 8)               299 |                                                    |
[8, 16)                0 |                                                    |
[16, 32)               1 |                                                    |

@sms_submit_pending:
[128, 256)            39 |@@@@@@@@@@@                                         |
[256, 512)           173 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

But then, 10s later, we saw this:

==========================================================2022-05-14 07:04:15
@poll:
[0]                10906 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                  307 |@                                                   |
[2, 4)                68 |                                                    |
[4, 8)               123 |                                                    |
[8, 16)               65 |                                                    |
[16, 32)             163 |                                                    |
[32, 64)             123 |                                                    |
[64, 128)            134 |                                                    |
[128, 256)           366 |@                                                   |
[256, 512)           479 |@@                                                  |
[512, 1K)            439 |@@                                                  |
[1K, 2K)             180 |                                                    |
[2K, 4K)              44 |                                                    |
[4K, 8K)               8 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)             0 |                                                    |
[32K, 64K)             0 |                                                    |
[64K, 128K)            1 |                                                    |

@exit_to_poll:
[0]                12626 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                  183 |                                                    |
[2, 4)                91 |                                                    |
[4, 8)               289 |@                                                   |
[8, 16)                3 |                                                    |
[16, 32)               1 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)            36 |                                                    |
[256, 512)           176 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)             0 |                                                    |
[32K, 64K)             0 |                                                    |
[64K, 128K)            1 |                                                    |

@dbi_query:
[0]                  680 |                                                    |
[1]                62473 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)             11517 |@@@@@@@@@                                           |
[4, 8)               299 |                                                    |
[8, 16)                0 |                                                    |
[16, 32)               1 |                                                    |

@sms_submit_pending:
[128, 256)            39 |@@@@@@@@@@@                                         |
[256, 512)           173 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K)              0 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)             0 |                                                    |
[32K, 64K)             0 |                                                    |
[64K, 128K)            1 |                                                    |

So sms_submit_pending took 64..127s and that of course also now shows up in exit_to_poll and poll. So I think we've established evidences that sms_submit_pending is at least one culprit.

I guess the best option would be to not process 1000 database queries at onec but simply only do 1..10, keep the statement / cursor and then continue in the next iteration.

Actions #8

Updated by laforge almost 2 years ago

logs around that time:

May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smc.c:575 SMC(0) message MMSMS-EST-CNF received in state MM_CONN_PENDING
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smc.c:189 SMC(0) send CP data
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smc.c:142 SMC(0) new CP state MM_CONN_PENDING -> WAIT_CP_ACK
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smc.c:96 SMC(0) instance created for network
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smr.c:92 SMR(0) instance created for network.
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm_04_11.c:1134 trans(SMS IMSI-334020504656624:MSISDN-69610126530:TMSI-0x1DC26424 callref-0x400004a3 tid-0) Going to send a MT SMS
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smr.c:422 SMR(0) message SM-RL-DATA_REQ received in state IDLE
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smr.c:146 SMR(0) new RP state IDLE -> WAIT_FOR_RP_ACK
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smc.c:475 SMC(0) message MNSMS-EST-REQ received in state IDLE
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smc.c:142 SMC(0) new CP state IDLE -> MM_CONN_PENDING
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smc.c:96 SMC(0) instance created for network
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smr.c:92 SMR(0) instance created for network.
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm_04_11.c:1134 trans(SMS IMSI-334020333482858:MSISDN-69610145942:TMSI-0x21200B95 callref-0x400004a4 tid-0) Going to send a MT SMS
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm_04_11.c:408 trans(SMS IMSI-334020333482858:MSISDN-69610145942:TMSI-0x21200B95 callref-0x400004a4 tid-0) sending status report for SMS reference 45
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smr.c:422 SMR(0) message SM-RL-DATA_REQ received in state IDLE
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smr.c:146 SMR(0) new RP state IDLE -> WAIT_FOR_RP_ACK
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smc.c:475 SMC(0) message MNSMS-EST-REQ received in state IDLE
May 14 07:03:00 huautla-bsc osmo-msc[29081]: DLSMS INFO gsm0411_smc.c:142 SMC(0) new CP state IDLE -> MM_CONN_PENDING
May 14 07:03:01 huautla-bsc osmo-bsc[5253]: DCHAN NOTICE abis_rsl.c:2947 lchan(1-0-4-TCH_H-1)[0x5582ef9a05f0]{WAIT_RF_RELEASE_ACK}: (type=TCH_H) Rx IPACC DLCX IND (cause=Normal event, unspecified [ 0f ])
May 14 07:03:01 huautla-bsc osmo-bsc[5253]: DCHAN NOTICE abis_rsl.c:2947 lchan(1-0-3-TCH_H-0)[0x5582ef9de120]{WAIT_RF_RELEASE_ACK}: (type=TCH_H) Rx IPACC DLCX IND (cause=Normal event, unspecified [ 0f ])
May 14 07:03:01 huautla-bsc CRON[14949]: pam_unix(cron:session): session opened for user root by (uid=0)
May 14 07:03:01 huautla-bsc CRON[14950]: (root) CMD (/var/rhizomatica/bin/network_update60.sh)
May 14 07:03:01 huautla-bsc CRON[14949]: pam_unix(cron:session): session closed for user root
May 14 07:03:02 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 14 07:03:14 huautla-bsc osmo-bsc[5253]: DMSC ERROR gsm_08_08.c:527 SUBSCR_CONN(msc0-conn32891_subscr-IMSI-334020240287401-TMSI-0x0323b238)[0x5582efb68ff0]{WAIT_CLEAR_CMD}: Event MO_COMPL_L3 not permitted
May 14 07:03:14 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: Operation not permitted.
May 14 07:03:27 huautla-bsc osmo-bsc[5253]: DMSC ERROR gsm_08_08.c:527 SUBSCR_CONN(msc0-conn32891_subscr-IMSI-334020240287401-TMSI-0x0323b238)[0x5582efb68ff0]{WAIT_CLEAR_CMD}: Event MO_COMPL_L3 not permitted
May 14 07:03:27 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: Operation not permitted.
May 14 07:03:32 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 14 07:03:37 huautla-bsc dhclient[546]: DHCPREQUEST for 192.168.1.107 on eth0 to 192.168.1.1 port 67
May 14 07:03:37 huautla-bsc dhclient[546]: DHCPACK of 192.168.1.107 from 192.168.1.1
May 14 07:03:37 huautla-bsc dnsmasq[1076]: reading /etc/resolv.conf
May 14 07:03:37 huautla-bsc dnsmasq[825]: reading /etc/resolv.conf
May 14 07:03:37 huautla-bsc dnsmasq[825]: using nameserver 192.168.1.1#53
May 14 07:03:37 huautla-bsc dnsmasq[1076]: using nameserver 192.168.1.1#53
May 14 07:03:37 huautla-bsc dhclient[546]: bound to 192.168.1.107 -- renewal in 220 seconds.
May 14 07:03:46 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 14 07:03:46 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[other]: no resources for SDCCH 0x13, retrying with TCH_H
May 14 07:03:47 huautla-bsc osmo-bsc[5253]: DMSC ERROR gsm_08_08.c:555 SUBSCR_CONN(msc0-conn32899_subscr-TMSI-0x110b4690)[0x5582efa57de0]{WAIT_CC}: Event MO_DTAP not permitted
May 14 07:03:48 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 14 07:03:48 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[other]: no resources for SDCCH 0x16, retrying with TCH_H
May 14 07:03:49 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 14 07:03:49 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[Location updating]: no resources for SDCCH 0x6, retrying with TCH_H
May 14 07:03:51 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 14 07:03:51 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[Location updating]: no resources for SDCCH 0xf, retrying with TCH_H
May 14 07:03:53 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 14 07:03:53 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[Location updating]: no resources for SDCCH 0x7, retrying with TCH_H
May 14 07:03:55 huautla-bsc osmo-bsc[5253]: DRESET NOTICE bssmap_reset.c:91 bssmap_reset(msc-0)[0x5582ef9cdce0]{DISC}: link lost
May 14 07:03:55 huautla-bsc osmo-bsc[5253]: DMSC NOTICE a_reset.c:54 (msc0) BSSMAP assocation is down
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/15@mgw DLCX: deleting connection(s) ...
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/15@mgw DLCX: connection successfully deleted
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/15@mgw DLCX: deleting connection(s) ...
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/15@mgw DLCX: connection successfully deleted
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/9@mgw DLCX: deleting connection(s) ...
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/9@mgw DLCX: connection successfully deleted
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/9@mgw DLCX: deleting connection(s) ...
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/9@mgw DLCX: connection successfully deleted
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/7@mgw DLCX: deleting connection(s) ...
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/7@mgw DLCX: connection successfully deleted
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/7@mgw DLCX: deleting connection(s) ...
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/7@mgw DLCX: connection successfully deleted
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/f@mgw DLCX: deleting connection(s) ...
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/f@mgw DLCX: connection successfully deleted
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/f@mgw DLCX: deleting connection(s) ...
May 14 07:03:55 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/f@mgw DLCX: connection successfully deleted
May 14 07:03:56 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 14 07:03:56 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[Location updating]: no resources for SDCCH 0xb, retrying with TCH_H
May 14 07:03:56 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x9B0C8433: No suitable MSC for this Complete Layer 3 request found
May 14 07:03:56 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x9b0c8433)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:03:56 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:03:56 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x9b0c8433)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:03:56 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:03:57 huautla-bsc osmo-bsc[5253]: DCHAN NOTICE abis_rsl.c:2947 lchan(1-0-5-TCH_H-1)[0x5582ef9dbff0]{WAIT_RF_RELEASE_ACK}: (type=TCH_H) Rx IPACC DLCX IND (cause=Normal event, unspecified [ 0f ])
May 14 07:03:57 huautla-bsc osmo-bsc[5253]: DCHAN NOTICE abis_rsl.c:2947 lchan(1-0-2-TCH_H-1)[0x5582ef9de8c0]{WAIT_RF_RELEASE_ACK}: (type=TCH_H) Rx IPACC DLCX IND (cause=Normal event, unspecified [ 0f ])
May 14 07:03:57 huautla-bsc osmo-bsc[5253]: DCHAN NOTICE abis_rsl.c:2947 lchan(1-0-2-TCH_H-0)[0x5582ef9e4710]{WAIT_RF_RELEASE_ACK}: (type=TCH_H) Rx IPACC DLCX IND (cause=Normal event, unspecified [ 0f ])
May 14 07:03:57 huautla-bsc osmo-bsc[5253]: DCHAN NOTICE abis_rsl.c:2947 lchan(1-0-4-TCH_H-0)[0x5582ef9cf050]{WAIT_RF_RELEASE_ACK}: (type=TCH_H) Rx IPACC DLCX IND (cause=Normal event, unspecified [ 0f ])
May 14 07:03:58 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x0476645D: No suitable MSC for this Complete Layer 3 request found
May 14 07:03:58 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x0476645d)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:03:58 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:03:58 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x0476645d)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:03:58 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x9472613F: No suitable MSC for this Complete Layer 3 request found
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x9472613f)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x9472613f)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x30BCC50E: No suitable MSC for this Complete Layer 3 request found
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x30bcc50e)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x30bcc50e)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:03:59 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:01 huautla-bsc CRON[15018]: pam_unix(cron:session): session opened for user root by (uid=0)
May 14 07:04:01 huautla-bsc CRON[15019]: (root) CMD (/var/rhizomatica/bin/network_update60.sh)
May 14 07:04:01 huautla-bsc CRON[15018]: pam_unix(cron:session): session closed for user root
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DLSCCP ERROR sccp_scoc.c:1031 SCCP-SCOC(32895)[0x5582efa25280]{WAIT_CONN_CONF}: transition to state IDLE not permitted!
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x0476645D: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x0476645d)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x0476645d)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:02 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DCHAN ERROR lchan_fsm.c:1801 lchan(1-0-1-SDCCH8-1)[0x5582ef9cfcd0]{WAIT_RLL_RTP_ESTABLISH}: (type=SDCCH) lchan allocation failed in state WAIT_RLL_RTP_ESTABLISH: Timeou
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x912B8764: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x912b8764)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x912b8764)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:02 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:04 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x3B133113: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:04 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x3b133113)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:04 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:04 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x3b133113)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:04 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x0E00702C: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x0e00702c)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x0e00702c)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x2205655E: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x2205655e)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x2205655e)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x9472613F: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x9472613f)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x9472613f)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x912B8764: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x912b8764)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x912b8764)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:05 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:06 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x30BCC50E: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:06 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x30bcc50e)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:06 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:06 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x30bcc50e)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:06 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x1F782AE6: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x1f782ae6)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x1f782ae6)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x0476645D: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x0476645d)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x0476645d)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:07 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:08 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 14 07:04:08 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[other]: no resources for SDCCH 0x13, retrying with TCH_H
May 14 07:04:08 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x3CAD94A2: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:08 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x3cad94a2)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:08 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:08 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x3cad94a2)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:08 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:10 huautla-bsc osmo-bsc[5253]: DLSCCP ERROR sccp_scoc.c:1031 SCCP-SCOC(32896)[0x5582efae7f40]{WAIT_CONN_CONF}: transition to state IDLE not permitted!
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x1CC2E34F: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x1cc2e34f)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x1cc2e34f)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x7C863499: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x7c863499)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x7c863499)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x18F30D88: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x18f30d88)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x18f30d88)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: IMSI-334020534773701: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-334020534773701)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Cle
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-IMSI-334020534773701)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP messa
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_LOC_UPD_REQUEST: TMSI-0x4E610EFD: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x4e610efd)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x4e610efd)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x9472613F: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x9472613f)[0x5582efb32560]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x9472613f)[0x5582efb32560]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:11 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:12 huautla-bsc osmo-bsc[5253]: DRSL ERROR gsm_08_08.c:473 MM GSM48_MT_MM_CM_SERV_REQ: TMSI-0x3B133113: No suitable MSC for this Complete Layer 3 request found
May 14 07:04:12 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:171 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x3b133113)[0x5582efb63470]{WAIT_CLEAR_CMD}: Unable to deliver BSSMAP Clear Re
May 14 07:04:12 huautla-bsc osmo-bsc[5253]: DMSC ERROR osmo_bsc_sigtran.c:366 MSC is not connected. Dropping.
May 14 07:04:12 huautla-bsc osmo-bsc[5253]: DMSC ERROR bsc_subscr_conn_fsm.c:143 SUBSCR_CONN(msc4294967295-conn4294967295_subscr-TMSI-0x3b133113)[0x5582efb63470]{WAIT_SCCP_RLSD}: Unable to deliver SCCP message
May 14 07:04:12 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: No such file or directory.
May 14 07:04:13 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 14 07:04:13 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[Location updating]: no resources for SDCCH 0x2, retrying with TCH_H
Actions #9

Updated by laforge almost 2 years ago

So

May 14 07:03:55 huautla-bsc osmo-bsc[5253]: DRESET NOTICE bssmap_reset.c:91 bssmap_reset(msc-0)[0x5582ef9cdce0]{DISC}: link lost
May 14 07:03:55 huautla-bsc osmo-bsc[5253]: DMSC NOTICE a_reset.c:54 (msc0) BSSMAP assocation is down

means that at 07:03:55 we had more than three consecutive SCCP CR from BSC->MSC that received no response within 20s. This means the actual lock-up must have occurred at least 20s earlier, 07:03:35. That's just after a cron job was executed and exactly at the time a DHCP renewal happened. coincidence?

Actions #10

Updated by laforge almost 2 years ago

I can find similar time coincidences earlier in the log

May 13 23:56:20 huautla-bsc dhclient[546]: DHCPREQUEST for 192.168.1.107 on eth0 to 192.168.1.1 port 67
May 13 23:56:20 huautla-bsc dhclient[546]: DHCPACK of 192.168.1.107 from 192.168.1.1
May 13 23:56:20 huautla-bsc dnsmasq[1076]: reading /etc/resolv.conf
May 13 23:56:20 huautla-bsc dnsmasq[825]: reading /etc/resolv.conf
May 13 23:56:20 huautla-bsc dnsmasq[1076]: using nameserver 192.168.1.1#53
May 13 23:56:20 huautla-bsc dnsmasq[825]: using nameserver 192.168.1.1#53
May 13 23:56:20 huautla-bsc dhclient[546]: bound to 192.168.1.107 -- renewal in 266 seconds.
May 13 23:56:21 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 13 23:56:51 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 13 23:57:01 huautla-bsc CRON[15156]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:57:01 huautla-bsc CRON[15157]: (root) CMD (/var/rhizomatica/bin/network_update60.sh)
May 13 23:57:02 huautla-bsc CRON[15156]: pam_unix(cron:session): session closed for user root
May 13 23:57:21 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 13 23:57:51 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 13 23:58:01 huautla-bsc CRON[15207]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:58:01 huautla-bsc CRON[15208]: (root) CMD (/var/rhizomatica/bin/network_update60.sh)
May 13 23:58:01 huautla-bsc CRON[15207]: pam_unix(cron:session): session closed for user root
May 13 23:58:02 huautla-bsc osmo-bsc[5253]: DRESET NOTICE bssmap_reset.c:91 bssmap_reset(msc-0)[0x5582ef9cdce0]{DISC}: link lost
May 13 23:34:46 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 13 23:35:01 huautla-bsc CRON[12216]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:35:01 huautla-bsc CRON[12217]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:35:01 huautla-bsc CRON[12220]: (root) CMD (/var/rhizomatica/bin/network_update60.sh)
May 13 23:35:01 huautla-bsc CRON[12215]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:35:01 huautla-bsc CRON[12219]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:35:01 huautla-bsc CRON[12214]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:35:01 huautla-bsc CRON[12222]: (root) CMD (/var/rhizomatica/bin/platform_update_rrd.sh)
May 13 23:35:01 huautla-bsc CRON[12218]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:35:01 huautla-bsc CRON[12221]: (root) CMD (/home/rhizomatica/bin/monitor_rapi.sh > /dev/null 2>&1)
May 13 23:35:01 huautla-bsc CRON[12223]: (root) CMD (/home/rhizomatica/bin/monitor_bts.sh > /dev/null 2>&1)
May 13 23:35:01 huautla-bsc CRON[12224]: (root) CMD (/var/rhizomatica/bin/network_update_rrd.sh)
May 13 23:35:01 huautla-bsc CRON[12225]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 13 23:35:01 huautla-bsc CRON[12214]: pam_unix(cron:session): session closed for user root
May 13 23:35:01 huautla-bsc CRON[12215]: pam_unix(cron:session): session closed for user root
May 13 23:35:01 huautla-bsc CRON[12217]: pam_unix(cron:session): session closed for user root
May 13 23:35:02 huautla-bsc CRON[12216]: pam_unix(cron:session): session closed for user root
May 13 23:35:06 huautla-bsc CRON[12218]: pam_unix(cron:session): session closed for user root
May 13 23:35:13 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:855 endpoint:rtpbridge/3@mgw CRCX: creating new connection ...
May 13 23:35:13 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1091 endpoint:rtpbridge/3@mgw CI:CC7B5F4C CRCX: connection successfully created
May 13 23:35:13 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:855 endpoint:rtpbridge/4@mgw CRCX: creating new connection ...
May 13 23:35:13 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1091 endpoint:rtpbridge/4@mgw CI:5C7786CB CRCX: connection successfully created
May 13 23:35:13 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1124 endpoint:rtpbridge/4@mgw MDCX: modifying existing connection ...
May 13 23:35:13 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_sdp.c:436 endpoint:rtpbridge/4@mgw CI:5C7786CB Got media info via SDP: port:17106, addr:172.16.0.11, duration:20, payload-types:3=GSM
May 13 23:35:13 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1312 endpoint:rtpbridge/4@mgw CI:5C7786CB MDCX: connection successfully modified
May 13 23:35:14 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:855 endpoint:rtpbridge/4@mgw CRCX: creating new connection ...
May 13 23:35:14 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_sdp.c:436 endpoint:rtpbridge/4@mgw CI:D995ACEF Got media info via SDP: port:7530, addr:172.16.0.1, duration:20, payload-types:3=GSM
May 13 23:35:14 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1091 endpoint:rtpbridge/4@mgw CI:D995ACEF CRCX: connection successfully created
May 13 23:35:14 huautla-bsc osmo-mgw[481]: DRTP mgcp_network.c:970 (rtpbridge/3@mgw I:CC7B5F4C) RTP packet too short (1 < 12)
May 13 23:35:16 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 13 23:35:18 huautla-bsc osmo-sip-connector[755]: Fri May 13 23:35:18 2022 DMNCC ERROR mncc.c:54 command(0x204) never arrived for leg(9495)
May 13 23:35:18 huautla-bsc osmo-sip-connector[755]: Fri May 13 23:35:18 2022 DSIP ERROR sip.c:463 cause2status(): Cause(unknown 0x0) not found in map.
May 13 23:35:19 huautla-bsc CRON[12219]: pam_unix(cron:session): session closed for user root
May 13 23:35:28 huautla-bsc osmo-bsc[5253]: DMSC ERROR gsm_08_08.c:527 SUBSCR_CONN(msc0-conn26826_subscr-IMSI-334020504491834-TMSI-0x14e5ca1e)[0x5582efa936e0]{WAIT_CLEAR_CMD}: Event MO_COMPL_L3 not permitted
May 13 23:35:28 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: Operation not permitted.
May 13 23:35:36 huautla-bsc osmo-bsc[5253]: DMSC ERROR gsm_08_08.c:527 SUBSCR_CONN(msc0-conn26826_subscr-IMSI-334020504491834-TMSI-0x14e5ca1e)[0x5582efa936e0]{WAIT_CLEAR_CMD}: Event MO_COMPL_L3 not permitted
May 13 23:35:36 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: Operation not permitted.
May 13 23:35:36 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 13 23:35:36 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[other]: no resources for SDCCH 0x16, retrying with TCH_H
May 13 23:35:43 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 13 23:35:43 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[other]: no resources for SDCCH 0x16, retrying with TCH_H
May 13 23:35:43 huautla-bsc osmo-bsc[5253]: DMSC ERROR gsm_08_08.c:527 SUBSCR_CONN(msc0-conn26826_subscr-IMSI-334020504491834-TMSI-0x14e5ca1e)[0x5582efa936e0]{WAIT_CLEAR_CMD}: Event MO_COMPL_L3 not permitted
May 13 23:35:43 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: Operation not permitted.
May 13 23:35:45 huautla-bsc osmo-bsc[5253]: DRESET NOTICE bssmap_reset.c:91 bssmap_reset(msc-0)[0x5582ef9cdce0]{DISC}: link lost
May 13 23:33:01 huautla-bsc CRON[12091]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:33:01 huautla-bsc CRON[12092]: (root) CMD (/var/rhizomatica/bin/network_update60.sh)
May 13 23:33:01 huautla-bsc CRON[12091]: pam_unix(cron:session): session closed for user root
May 13 23:33:06 huautla-bsc osmo-bsc[5253]: DMSC ERROR gsm_08_08.c:527 SUBSCR_CONN(msc0-conn26733_subscr-IMSI-334020504491834-TMSI-0x14e5ca1e)[0x5582efa01730]{WAIT_CLEAR_CMD}: Event MO_COMPL_L3 not permitted
May 13 23:33:06 huautla-bsc osmo-bsc[5253]: DLINP ERROR input/ipaccess.c:432 Bad signalling message, sign_link returned error: Operation not permitted.
May 13 23:33:07 huautla-bsc osmo-bsc[5253]: DRESET NOTICE bssmap_reset.c:91 bssmap_reset(msc-0)[0x5582ef9cdce0]{DISC}: link lost
May 13 23:33:07 huautla-bsc osmo-bsc[5253]: DMSC NOTICE a_reset.c:54 (msc0) BSSMAP assocation is down
May 13 23:07:01 huautla-bsc CRON[8905]: pam_unix(cron:session): session opened for user root by (uid=0)
May 13 23:07:01 huautla-bsc CRON[8906]: (root) CMD (/var/rhizomatica/bin/network_update60.sh)
May 13 23:07:01 huautla-bsc CRON[8905]: pam_unix(cron:session): session closed for user root
May 13 23:07:02 huautla-bsc osmo-bsc[5253]: DMSC NOTICE gsm_08_08.c:79 Tx MSC SAPI N REJECT (dlci=0xc3, cause='BSS NOT EQUIPPED')
May 13 23:07:07 huautla-bsc osmo-bsc[5253]: DLSCCP NOTICE sccp_scoc.c:1597 Received message for opc=185=0.23.1 on conn with mismatching remote pc=0=0.0.0
May 13 23:07:07 huautla-bsc osmo-bsc[5253]: DLSCCP NOTICE sccp_scoc.c:1597 Received message for opc=185=0.23.1 on conn with mismatching remote pc=0=0.0.0
May 13 23:07:07 huautla-bsc osmo-bsc[5253]: DLSCCP NOTICE sccp_scoc.c:1597 Received message for opc=185=0.23.1 on conn with mismatching remote pc=0=0.0.0
May 13 23:07:08 huautla-bsc osmo-bsc[5253]: DCHAN ERROR abis_rsl.c:1257 lchan(1-0-1-SDCCH8-6)[0x5582ef9d0090]{ESTABLISHED}: (type=SDCCH) CONNECTION FAIL (cause=Radio Link Failure [ 01 ])
May 13 23:07:09 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 13 23:07:17 huautla-bsc osmo-bsc[5253]: DLSCCP NOTICE sccp_scoc.c:1597 Received message for opc=185=0.23.1 on conn with mismatching remote pc=0=0.0.0
May 13 23:07:17 huautla-bsc osmo-bsc[5253]: DLSCCP NOTICE sccp_scoc.c:1597 Received message for opc=185=0.23.1 on conn with mismatching remote pc=0=0.0.0
May 13 23:07:17 huautla-bsc osmo-bsc[5253]: DLSCCP NOTICE sccp_scoc.c:1597 Received message for opc=185=0.23.1 on conn with mismatching remote pc=0=0.0.0
May 13 23:07:26 huautla-bsc osmo-sip-connector[755]: Fri May 13 23:07:26 2022 DSIP NOTICE sip.c:511 Ending leg(0x55c516abf510) in connected state.
May 13 23:07:26 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/9@mgw DLCX: deleting connection(s) ...
May 13 23:07:26 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/9@mgw DLCX: connection successfully deleted
May 13 23:07:26 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/9@mgw DLCX: deleting connection(s) ...
May 13 23:07:26 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/9@mgw DLCX: connection successfully deleted
May 13 23:07:26 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/a@mgw DLCX: deleting connection(s) ...
May 13 23:07:26 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/a@mgw DLCX: connection successfully deleted
May 13 23:07:26 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1341 endpoint:rtpbridge/a@mgw DLCX: deleting connection(s) ...
May 13 23:07:26 huautla-bsc osmo-mgw[481]: DLMGCP mgcp_protocol.c:1458 endpoint:rtpbridge/a@mgw DLCX: connection successfully deleted
May 13 23:07:28 huautla-bsc osmo-bsc[5253]: DCHAN NOTICE abis_rsl.c:2947 lchan(1-0-4-TCH_H-1)[0x5582ef9a05f0]{WAIT_RF_RELEASE_ACK}: (type=TCH_H) Rx IPACC DLCX IND (cause=Normal event, unspecified [ 0f ])
May 13 23:07:39 huautla-bsc ssh[2713]: debug1: client_input_global_request: rtype keepalive@openssh.com want_reply 1
May 13 23:07:44 huautla-bsc apcupsd[801]: Communications with UPS lost.
May 13 23:07:47 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 13 23:07:47 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[Location updating]: no resources for SDCCH 0x9, retrying with TCH_H
May 13 23:07:56 huautla-bsc osmo-bsc[5253]: DRLL NOTICE lchan_select.c:305 (bts=1) Failed to select SDCCH channel
May 13 23:07:56 huautla-bsc osmo-bsc[5253]: DRSL NOTICE abis_rsl.c:2199 (bts=1) CHAN RQD[Location updating]: no resources for SDCCH 0x2, retrying with TCH_H
May 13 23:07:59 huautla-bsc osmo-bsc[5253]: DCHAN ERROR lchan_fsm.c:1801 lchan(1-0-1-SDCCH8-1)[0x5582ef9cfcd0]{WAIT_RLL_RTP_ESTABLISH}: (type=SDCCH) lchan allocation failed in state WAIT_RLL_RTP_ESTABLISH: Timeou
May 13 23:08:01 huautla-bsc osmo-bsc[5253]: DRESET NOTICE bssmap_reset.c:91 bssmap_reset(msc-0)[0x5582ef9cdce0]{DISC}: link lost
May 13 23:08:01 huautla-bsc osmo-bsc[5253]: DMSC NOTICE a_reset.c:54 (msc0) BSSMAP assocation is down

so there always is a cron job for /var/rhizomatica/bin/network_update60.sh running right before the link is lost...

Actions #11

Updated by laforge almost 2 years ago

  • Related to Bug #5564: blocking database I/O by SMS database added
Actions #12

Updated by keith almost 2 years ago

  • Related to Bug #5559: OsmoMSC at 100% CPU and unresponsive for up to several minutes! added
Actions #13

Updated by laforge almost 2 years ago

I'm at a point where I'm able to reproduce the problem (or something that looks like it) locally in a synthetic test case. Note that my local setup differs in several ways:
  • Debian unstable instead of 10 (different kernel/libc/libsqlite3)
  • all builds with --enable-asan and hence much slower than normal builds

I've created a TC_sms_load TTCN3 test which simulates 16 concurrent subscribers (by 16 ConnHdlr components), each performing LU + MO-SMS. This is executed in a loop. The recipients are intentionally all offline, resulting in all SMS to be stored in the database.

When using osmo-msc 2db1966e25bbecb4dfb275f88e03aacace7863d0 (before switching sqlite3 to single-thread operation), I can send about 70 SMS before the code in the test suite interacting with the VTY times out (2s timeout). I can also see in the bpftrace output:

@dbi_query:
[8, 16)                9 |                                                    |
[16, 32)              40 |@@@                                                 |
[32, 64)             629 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)            101 |@@@@@@@@                                            |
[128, 256)            16 |@                                                   |
[256, 512)             5 |                                                    |
[512, 1K)              1 |                                                    |
[1K, 2K)               0 |                                                    |
[2K, 4K)               1 |                                                    |

@sqlite3_step:
[0]               336853 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                   16 |                                                    |
[2, 4)                 2 |                                                    |
[4, 8)                 5 |                                                    |
[8, 16)               33 |                                                    |
[16, 32)              34 |                                                    |
[32, 64)              14 |                                                    |
[64, 128)              3 |                                                    |

@sms_submit_pending:
[256, 512)             1 |@@@@@@                                              |
[512, 1K)              1 |@@@@@@                                              |
[1K, 2K)               3 |@@@@@@@@@@@@@@@@@@@                                 |
[2K, 4K)               8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@smsq_take_next_sms:
[256, 512)             1 |@@@@@@                                              |
[512, 1K)              1 |@@@@@@                                              |
[1K, 2K)               3 |@@@@@@@@@@@@@@@@@@@                                 |
[2K, 4K)               8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

so indeed, we are exceeding 2s several times in the sms_submit_pending / smsq_take_next_sms

When using current master (63494a6bef1a54f87742e83d9ad18f64d2dae2af) with sqlite3 in single-theraded mode, I can see similar results. After 82 SMS the timeout is hit:

@sms_submit_pending:
[512, 1K)              2 |@@@@@@@@@@@@@                                       |
[1K, 2K)               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[2K, 4K)               8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K)               2 |@@@@@@@@@@@@@                                       |
[8K, 16K)              1 |@@@@@@                                              |

@sms_submit_pending_start: 3793923232666688
@smsq_take_next_sms:
[512, 1K)              2 |@@@@@@@@@@@@@                                       |
[1K, 2K)               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[2K, 4K)               8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[4K, 8K)               2 |@@@@@@@@@@@@@                                       |
[8K, 16K)              1 |@@@@@@                                              |

@sqlite3_step:
[0]               529179 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                   20 |                                                    |
[2, 4)                 4 |                                                    |
[4, 8)                 3 |                                                    |
[8, 16)               55 |                                                    |
[16, 32)              45 |                                                    |
[32, 64)               3 |                                                    |
[64, 128)              1 |                                                    |
[128, 256)             1 |                                                    |

When switching to my current laforge/sqlite3 branch using libsqlite3 directly and prepared statements, I can insert more than 1600 SMS successfully without running into any timeouts:

@sqlite3_step: 
[0]                18337 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                14812 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          |
[2, 4)              3110 |@@@@@@@@                                            |
[4, 8)              1488 |@@@@                                                |
[8, 16)              997 |@@                                                  |
[16, 32)             342 |                                                    |
[32, 64)              44 |                                                    |
[64, 128)              4 |                                                    |
[128, 256)             1 |                                                    |
[256, 512)             1 |                                                    |

@sms_submit_pending: 
[64, 128)             28 |@@@@@@@@                                            |
[128, 256)           168 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256, 512)           143 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        |
[512, 1K)             18 |@@@@@                                               |

@smsq_take_next_sms: 
[64, 128)             28 |@@@@@@@@                                            |
[128, 256)           169 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[256, 512)           142 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@         |
[512, 1K)             18 |@@@@@                                               |

So as we can see, we occasionally exceeded 512ms, but never reached 1s.

When looking at the osmo-msc log concurrently with the bpftrace output, it is obvious that the culprit is db_sms_get_next_unsent_rr_msisdn() which is called up to 100 times per smsq_take_next_sms(). In the old code (libdbi), we are not preparing (pre-compiling) the query, so we not only have 100 executions, but also 100 compilations of that very same SQL statement.

Actions #14

Updated by laforge almost 2 years ago

  • % Done changed from 0 to 50

So, with 1600 MO-SMS for currently detached subscribers getting fed into osmo-msc, and then later on delivered to their respective recipients, I'm getting this plot on current laforge/sqlite3 (a37c1d3c4977e7a01b27ee109958e4bfb05ea763):

@sqlite3_step: 
[0]                 7096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1]                 3801 |@@@@@@@@@@@@@@@@@@@@@@@@@@@                         |
[2, 4)              2110 |@@@@@@@@@@@@@@@                                     |
[4, 8)              2731 |@@@@@@@@@@@@@@@@@@@@                                |
[8, 16)             2238 |@@@@@@@@@@@@@@@@                                    |
[16, 32)             837 |@@@@@@                                              |
[32, 64)              85 |                                                    |
[64, 128)              8 |                                                    |
[128, 256)             5 |                                                    |
[256, 512)             1 |                                                    |

@sms_submit_pending: 
[4, 8)                 5 |@@                                                  |
[8, 16)               43 |@@@@@@@@@@@@@@@@@@@@@                               |
[16, 32)              72 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                |
[32, 64)             102 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[64, 128)             60 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
[128, 256)            64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                    |
[256, 512)            36 |@@@@@@@@@@@@@@@@@@                                  |
[512, 1K)              2 |@                                                   |

@smsq_take_next_sms: 
[0]                   34 |@@@@                                                |
[1]                  366 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2, 4)               297 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@          |
[4, 8)               139 |@@@@@@@@@@@@@@@@@@@                                 |
[8, 16)               91 |@@@@@@@@@@@@                                        |
[16, 32)              30 |@@@@                                                |
[32, 64)               5 |                                                    |
[64, 128)              7 |                                                    |
[128, 256)            31 |@@@@                                                |
[256, 512)            30 |@@@@                                                |
[512, 1K)              2 |                                                    |

So as we can see, all latencies less than 1s. This looks much better.

Actions #15

Updated by laforge almost 2 years ago

Actions #16

Updated by laforge over 1 year ago

  • Status changed from In Progress to Stalled
Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)