Bug #3704
closedWhen Paging request for SMS is unanswered, osmo-msc sends infinit number of paging requests
100%
Description
The problem occurs when the MSC is attempting to page for an MT SM but the BSC is not responding in any way. The paging requests repeat over and over, even when the SCCP connection is shut down one can still see by the logs that the BSC still trying to page.
The reason for this problem seems not to be located in the code that controls the paging, but rather in the code that runs the SM Queue. The problem is that when the paging fails the SM Queue is trying to deliver the SM again and again. However, there is indeed a mechanism implemented that shall prevent undeliverable SM cycling through the queue again and again. When there are errors during the delivery, the function db_sms_inc_deliver_attempts() is called and once the counters reach a certain maximum the SM is removed from the queue. For some reason this is only done when some error during the delivery happens, but not when the delivery could not be started because the paging did not succeed.
Files
Related issues
Updated by dexter over 5 years ago
- % Done changed from 0 to 40
The problem can be reproduced using the following TTCN3 test: (MSC_Tests.TC_lu_and_mt_sms_paging_and_nothing)
https://gerrit.osmocom.org/#/c/osmo-ttcn3-hacks/+/11897/
(The normal intended count for re-delivery attempts seems to be set to 10)
Updated by dexter over 5 years ago
- % Done changed from 40 to 70
A fix can be found here. I have marked the patch as WIP since I did not test with the full set of testcases yet. Once I have done that I will remove the WIP flag.
https://gerrit.osmocom.org/#/c/osmo-msc/+/11899
Updated by fixeria over 5 years ago
(The normal intended count for re-delivery attempts seems to be set to 10)
I don't think this is a good idea to initiate paging 10 times in row,
without any pause in between. Where does this '10 times' comes from?
Any spec. references? Why not e.g. 5?
Also, instead of giving up after X attempts, we can just wait for a periodic
Location Update, which should trigger some signal that is observed by SMSqueue.
Updated by dexter over 5 years ago
- File infinite_paging.pcapng infinite_paging.pcapng added
I have checked the problem back on the lab setup and it seems to be indeed a real world problem. When I send an SMS to a subscriber that has done an LU, but has silently vanished (battery out), the infinite paging happens as well (see attached trace)
Thu Nov 22 19:32:48 2018 DREF <000a> gsm_04_11.c:75 VLR subscr MSISDN:23001 usage decreases to: 2 Thu Nov 22 19:32:48 2018 DREF <000a> gsm_subscriber.c:109 VLR subscr MSISDN:23001 usage decreases to: 1 Thu Nov 22 19:32:49 2018 DREF <000a> db.c:772 VLR subscr MSISDN:23001 usage increases to: 2 Thu Nov 22 19:32:49 2018 DREF <000a> sms_queue.c:121 VLR subscr MSISDN:23001 usage increases to: 3 Thu Nov 22 19:32:49 2018 DMM <0002> gsm_subscriber.c:158 Subscriber MSISDN:23001 not paged yet, start paging. Thu Nov 22 19:32:49 2018 DBSSAP <0010> a_iface.c:216 Tx BSSMAP paging message from MSC RI=SSN_PC,PC=0.23.1,SSN=BSSAP to BSC RI=SSN_PC,PC=0.23.1,SSN=BSSAP (imsi=001010000000001, tmsi=0x79dbda07, lac=9070) Thu Nov 22 19:32:49 2018 DREF <000a> gsm_subscriber.c:166 VLR subscr MSISDN:23001 usage increases to: 4 Thu Nov 22 19:32:49 2018 DREF <000a> db.c:772 VLR subscr MSISDN:23001 usage increases to: 5 Thu Nov 22 19:32:49 2018 DREF <000a> gsm_04_11.c:75 VLR subscr MSISDN:23001 usage decreases to: 4 Thu Nov 22 19:32:59 2018 DPAG <0005> gsm_subscriber.c:73 Paging failure for MSISDN:23001 (event=1) Thu Nov 22 19:32:59 2018 DPAG <0005> osmo_msc.c:352 Paging can stop for MSISDN:23001 Thu Nov 22 19:32:59 2018 DPAG <0005> gsm_subscriber.c:100 Calling paging cbfn. Thu Nov 22 19:32:59 2018 DLSMS <0017> sms_queue.c:155 Sending SMS 1 failed 0 times. Thu Nov 22 19:32:59 2018 DREF <000a> sms_queue.c:128 VLR subscr MSISDN:23001 usage decreases to: 3 Thu Nov 22 19:32:59 2018 DREF <000a> gsm_04_11.c:75 VLR subscr MSISDN:23001 usage decreases to: 2 Thu Nov 22 19:32:59 2018 DREF <000a> gsm_subscriber.c:109 VLR subscr MSISDN:23001 usage decreases to: 1 Thu Nov 22 19:33:00 2018 DREF <000a> db.c:772 VLR subscr MSISDN:23001 usage increases to: 2 Thu Nov 22 19:33:00 2018 DREF <000a> sms_queue.c:121 VLR subscr MSISDN:23001 usage increases to: 3 Thu Nov 22 19:33:00 2018 DMM <0002> gsm_subscriber.c:158 Subscriber MSISDN:23001 not paged yet, start paging. Thu Nov 22 19:33:00 2018 DBSSAP <0010> a_iface.c:216 Tx BSSMAP paging message from MSC RI=SSN_PC,PC=0.23.1,SSN=BSSAP to BSC RI=SSN_PC,PC=0.23.1,SSN=BSSAP (imsi=001010000000001, tmsi=0x79dbda07, lac=9070) Thu Nov 22 19:33:00 2018 DREF <000a> gsm_subscriber.c:166 VLR subscr MSISDN:23001 usage increases to: 4 Thu Nov 22 19:33:00 2018 DREF <000a> db.c:772 VLR subscr MSISDN:23001 usage increases to: 5 Thu Nov 22 19:33:00 2018 DREF <000a> gsm_04_11.c:75 VLR subscr MSISDN:23001 usage decreases to: 4 Thu Nov 22 19:33:10 2018 DPAG <0005> gsm_subscriber.c:73 Paging failure for MSISDN:23001 (event=1) Thu Nov 22 19:33:10 2018 DPAG <0005> osmo_msc.c:352 Paging can stop for MSISDN:23001 Thu Nov 22 19:33:10 2018 DPAG <0005> gsm_subscriber.c:100 Calling paging cbfn. Thu Nov 22 19:33:10 2018 DLSMS <0017> sms_queue.c:155 Sending SMS 1 failed 0 times. Thu Nov 22 19:33:10 2018 DREF <000a> sms_queue.c:128 VLR subscr MSISDN:23001 usage decreases to: 3 Thu Nov 22 19:33:10 2018 DREF <000a> gsm_04_11.c:75 VLR subscr MSISDN:23001 usage decreases to: 2 Thu Nov 22 19:33:10 2018 DREF <000a> gsm_subscriber.c:109 VLR subscr MSISDN:23001 usage decreases to: 1 Thu Nov 22 19:33:11 2018 DREF <000a> db.c:772 VLR subscr MSISDN:23001 usage increases to: 2 Thu Nov 22 19:33:11 2018 DREF <000a> sms_queue.c:121 VLR subscr MSISDN:23001 usage increases to: 3 Thu Nov 22 19:33:11 2018 DMM <0002> gsm_subscriber.c:158 Subscriber MSISDN:23001 not paged yet, start paging. Thu Nov 22 19:33:11 2018 DBSSAP <0010> a_iface.c:216 Tx BSSMAP paging message from MSC RI=SSN_PC,PC=0.23.1,SSN=BSSAP to BSC RI=SSN_PC,PC=0.23.1,SSN=BSSAP (imsi=001010000000001, tmsi=0x79dbda07, lac=9070) Thu Nov 22 19:33:11 2018 DREF <000a> gsm_subscriber.c:166 VLR subscr MSISDN:23001 usage increases to: 4 Thu Nov 22 19:33:11 2018 DREF <000a> db.c:772 VLR subscr MSISDN:23001 usage increases to: 5 Thu Nov 22 19:33:11 2018 DREF <000a> gsm_04_11.c:75 VLR subscr MSISDN:23001 usage decreases to: 4 Thu Nov 22 19:33:21 2018 DPAG <0005> gsm_subscriber.c:73 Paging failure for MSISDN:23001 (event=1) Thu Nov 22 19:33:21 2018 DPAG <0005> osmo_msc.c:352 Paging can stop for MSISDN:23001 Thu Nov 22 19:33:21 2018 DPAG <0005> gsm_subscriber.c:100 Calling paging cbfn. Thu Nov 22 19:33:21 2018 DLSMS <0017> sms_queue.c:155 Sending SMS 1 failed 0 times. Thu Nov 22 19:33:21 2018 DREF <000a> sms_queue.c:128 VLR subscr MSISDN:23001 usage decreases to: 3 Thu Nov 22 19:33:21 2018 DREF <000a> gsm_04_11.c:75 VLR subscr MSISDN:23001 usage decreases to: 2 Thu Nov 22 19:33:21 2018 DREF <000a> gsm_subscriber.c:109 VLR subscr MSISDN:23001 usage decreases to: 1 Thu Nov 22 19:33:22 2018 DREF <000a> db.c:772 VLR subscr MSISDN:23001 usage increases to: 2 Thu Nov 22 19:33:22 2018 DREF <000a> sms_queue.c:121 VLR subscr MSISDN:23001 usage increases to: 3 Thu Nov 22 19:33:22 2018 DMM <0002> gsm_subscriber.c:158 Subscriber MSISDN:23001 not paged yet, start paging. Thu Nov 22 19:33:22 2018 DBSSAP <0010> a_iface.c:216 Tx BSSMAP paging message from MSC RI=SSN_PC,PC=0.23.1,SSN=BSSAP to BSC RI=SSN_PC,PC=0.23.1,SSN=BSSAP (imsi=001010000000001, tmsi=0x79dbda07, lac=9070) Thu Nov 22 19:33:22 2018 DREF <000a> gsm_subscriber.c:166 VLR subscr MSISDN:23001 usage increases to: 4 Thu Nov 22 19:33:22 2018 DREF <000a> db.c:772 VLR subscr MSISDN:23001 usage increases to: 5 Thu Nov 22 19:33:22 2018 DREF <000a> gsm_04_11.c:75 VLR subscr MSISDN:23001 usage decreases to: 4
Updated by dexter over 5 years ago
- Status changed from New to In Progress
- Assignee set to dexter
I am currently investigating where the 10sec exactly are coming from. The sms_queue seems to be a bit complex. However, it is important that we get grip on this problem since on the SGs interface there exists a PAGING REJECT message and the MSC should not try paging again and again once the paging was rejected. We also may consider to add logic that kills paging immediately just to satisfy the SGs interface but in the end it feels not right that there is infinite paging when the subscriber does not respond to paging.
(Maybe the paging behavior we see is not even infinite, maybe it will stop after the 1000 th. attempt. I still couldn't get my head around this completely and maybe a discussion is required here.)
Updated by dexter over 5 years ago
- Related to Feature #3615: SGs subscriber state machine added
Updated by dexter over 5 years ago
- % Done changed from 70 to 90
The current master of osmo-msc now passes the TTCN3 test. Presumably this has to do with Vadims Patch Change ID:I4a07ece80d8dd40b23da6bb1ffc9d3d745b54092. Things look much better now.
Updated by dexter over 5 years ago
While the problem is solved in osmo-msc, the related test is not through the review yet. I have rebased the ttcn3 patch and checked if it still works. To me it looks fine.
https://gerrit.osmocom.org/#/c/osmo-ttcn3-hacks/+/11897/
Updated by dexter over 5 years ago
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100
The related TTCN3 test has been merged and the related test is passing.