Bug #5564

open

blocking database I/O by SMS database

Added by laforge 3 months ago. Updated 3 months ago.

Status: Stalled
Priority: High
Assignee:
Category: SMS
Target version: -
Start date: 05/15/2022
Due date:
% Done: 20%
Resolution:
Spec Reference:

Description

When OsmoMSC was split from OsmoNITB, we externalized the HLR database and removed the database-stored counters. This leaves the internal SMS queue / database code as the only remaining part of the code which performs potentially blocking disk I/O.

As seen in #5563 this is a real issue.

I spent half a day on reviewing the code in detail and playing with different ideas, including
  1. ripping out the sms_queue.c / db.c code completely into an external osmo-smsc which then uses GSUP
  2. just moving db.c into a separate thread; make DB operations asynchronous
  3. move sms_queue + db.c into a separate thread

moving sms_queue + DB code to new osmo-smsc, interfaced via GSUP

osmo-msc already contains code to do SMS via GSUP, so no mandatory modification to osmo-msc is expected in this approach.

the major disadvantages of this approach are:
  • SMPP code would have to move to SMSC, and it is more tied into the MSC/VLR codebase -> extra effort
  • GSUP SMS interface is at a lower level than current sms_queue interface -> extra effort of migrating/reimplementing that stuff in SMSC

SMS related VTY commands (not an issue, SMSC would have its own VTY)

this would cover the following API parts

  • sms_queue_clear
  • sms_queue_set_max_failure
  • sms_queue_set_max_pending
  • sms_queue_stats
  • sms_queue_sms_is_pending
  • sms_queue_trigger
  • vty_out

incoming signals into sms_queue

  • SS_SUBSCR / S_SUBSCR_ATTACHED
    • FIXME: unclear how this is handled in the GSUP case?
  • SS_SMS / S_SMS_DELIVERED
    • -> gsm411_gsup_mt_fwd_sm_res()
  • SS_SMS / S_SMS_MEM_EXCEEDED
    • -> gsm411_gsup_mt_fwd_sm_err()
  • SS_SMS / S_SMS_UNKNOWN_ERROR
    • -> gsm411_gsup_mt_fwd_sm_err()
  • SS_SMS / S_SMS_SUBMITTED
    • -> gsm411_gsup_mo_fwd_sm_req()
  • SS_SMS / S_SMS_SMMA
    • -> gsm411_gsup_mo_ready_for_sm_req()
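
The signal-to-GSUP mapping above can be written down as a small lookup table. This is only a sketch: the enum is a hypothetical local mirror of the SS_SMS signals (the real definitions live in the osmo-msc headers), and the table stores the names of the GSUP functions from the list rather than calling them. S_SUBSCR_ATTACHED is left out on purpose, since its handling in the GSUP case is still an open FIXME.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local mirror of the SS_SMS signals named in this ticket.
 * The real definitions live in the osmo-msc headers and may differ. */
enum sk_sms_signal {
	SK_S_SMS_DELIVERED,
	SK_S_SMS_MEM_EXCEEDED,
	SK_S_SMS_UNKNOWN_ERROR,
	SK_S_SMS_SUBMITTED,
	SK_S_SMS_SMMA,
	SK_S_SMS_NUM	/* number of entries, not a signal */
};

/* Name of the GSUP function each signal would be forwarded to. */
static const char *const sk_gsup_handler_name[] = {
	[SK_S_SMS_DELIVERED]     = "gsm411_gsup_mt_fwd_sm_res",
	[SK_S_SMS_MEM_EXCEEDED]  = "gsm411_gsup_mt_fwd_sm_err",
	[SK_S_SMS_UNKNOWN_ERROR] = "gsm411_gsup_mt_fwd_sm_err",
	[SK_S_SMS_SUBMITTED]     = "gsm411_gsup_mo_fwd_sm_req",
	[SK_S_SMS_SMMA]          = "gsm411_gsup_mo_ready_for_sm_req",
};

/* Fail the build if a signal is added without a mapping. */
static_assert(sizeof(sk_gsup_handler_name) / sizeof(sk_gsup_handler_name[0]) == SK_S_SMS_NUM,
	      "every signal needs a mapping");

/* Return the GSUP function name for a signal, or NULL if out of range. */
const char *sk_gsup_handler_for(int sig)
{
	if (sig < 0 || sig >= SK_S_SMS_NUM)
		return NULL;
	return sk_gsup_handler_name[sig];
}

/* True if both signals end up in the same GSUP function. */
bool sk_share_handler(int a, int b)
{
	const char *ha = sk_gsup_handler_for(a);
	const char *hb = sk_gsup_handler_for(b);

	return ha && hb && strcmp(ha, hb) == 0;
}
```

Both error signals land in the same gsm411_gsup_mt_fwd_sm_err(), so the distinction between "memory exceeded" and "unknown error" would have to travel as a cause value inside the message.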

DB (not an issue, DB code would then run in SMSC)

  • db_sms_delete_oldest_expired_message
  • db_sms_delete_sent_message_by_id
  • db_sms_get
  • db_sms_get_next_unsent_rr_msisdn
  • db_sms_get_unsent_for_subscr
  • db_sms_inc_deliver_attempts

SMS transmission

  • gsm411_send_sms calls by sms_queue
    • would have to be mapped to OSMO_GSUP_MSGT_MT_FORWARD_SM_REQUEST
  • sms_free
    • FIXME: what about vsub pointer/references?
  • vlr_subscr_msisdn_or_name
    • just for logging, can be avoided

making just the DB code async / run in separate thread

This is not easy, as all of the call sites assume synchronous return values/results:
db_sms_get
  • sms_resend_pending
    • resend_pending timer
      • sms_queue_start
        • => can be executed from separate thread
db_sms_get_next_unsent_rr_msisdn
  • smsq_take_next_sms
    • sms_submit_pending
      • sms_send_next
        • sms_sms_cb / S_SMS_DELIVERED
          • => happens from the send_next it_Q completion handler
      • push_queue_timer
        • sms_queue_start
          • => can be executed from separate thread
db_sms_get_unsent_for_subscr
  • sms_send_next
    • sms_sms_cb / S_SMS_DELIVERED
      • => request to it_Q; completion then might add SMS to pending + gsm411_send_sms
  • sub_ready_for_sm
    • sms_subscr_cb / S_SUBSCR_ATTACHED
      • => request to it_Q; completion then might add SMS to pending + gsm411_send_sms
db_sms_delete_sent_message_by_id
  • sms_sms_cb / S_SMS_DELIVERED
    • => no return value, no success check: async it_Q
db_sms_inc_deliver_attempts
  • sms_sms_cb / S_SMS_UNKNOWN_ERROR
    • => no return value, no success check: async it_Q
db_sms_delete_oldest_expired_message
  • sms_sms_cb / any signal
    • => no return value, no success check: async it_Q
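
To illustrate the two kinds of call sites above, here is a minimal single-threaded sketch of the request/completion pattern. All names (db_req, db_req_submit, fake_db_op, ...) are made up for illustration and are not osmo-msc or libosmocore API. A request with done == NULL corresponds to the "no return value, no success check" cases; a request with a completion callback corresponds to the cases where the result decides what happens next. In a real implementation the processing loop would run in the DB thread and completions would travel back via it_q.

```c
#include <assert.h>
#include <stddef.h>

/* One queued DB operation (hypothetical layout). */
struct db_req {
	struct db_req *next;
	unsigned long long sms_id;		/* input */
	int result;				/* filled in by the worker */
	void (*done)(struct db_req *req);	/* NULL: fire and forget */
};

/* Singly linked FIFO; tail points at the last 'next' pointer. */
struct db_req_queue {
	struct db_req *head;
	struct db_req **tail;
};

void db_req_queue_init(struct db_req_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Caller side: returns immediately, the result arrives later (if at all). */
void db_req_submit(struct db_req_queue *q, struct db_req *req)
{
	assert(q && req);
	req->next = NULL;
	*q->tail = req;
	q->tail = &req->next;
}

/* Stand-in for a blocking DB call: even IDs succeed, odd IDs fail. */
static int fake_db_op(unsigned long long sms_id)
{
	return (sms_id % 2 == 0) ? 0 : -1;
}

/* Worker side: drain the queue and run completions.
 * Returns the number of requests processed. */
unsigned int db_req_process_all(struct db_req_queue *q)
{
	unsigned int n = 0;

	while (q->head) {
		struct db_req *req = q->head;

		q->head = req->next;
		if (!q->head)
			q->tail = &q->head;
		req->result = fake_db_op(req->sms_id);
		n++;
		if (req->done)
			req->done(req);
	}
	return n;
}

/* Example completion handler: just counts how often it was called. */
unsigned int completions_seen;

void note_completion(struct db_req *req)
{
	(void)req;
	completions_seen++;
}
```

The painful part is not the queue itself but the call sites: everything that today runs after a synchronous db_sms_get*() has to move into a completion handler.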

moving sms_queue + DB code to separate thread

access to pending_sms linked list

There are quite a number of accesses to the pending_sms linked list. Given that most are reads and only some are writes, we might use a rwlock?

  • sms_find_pending [R]
    • sms_sms_cb
    • sms_queue_sms_is_pending
  • sms_queue_sms_is_pending [R]
    • sms_submit_pending
      • timer
    • vty
  • sms_subscriber_find_pending [R]
    • sub_ready_for_sm
      • SS_SUBSCR / S_SUBSCR_ATTACHED
    • sms_subscriber_is_pending
      • sms_submit_pending
        • timer
      • sms_send_next
        • sms_sms_cb / S_SMS_DELIVERED
  • sms_pending_from [R]
    • sms_submit_pending
      • timer
    • sms_send_next
      • sms_sms_cb / S_SMS_DELIVERED
  • sms_pending_free [W]
    • sms_pending_failed
      • sms_sms_cb / S_SMS_UNKNOWN_ERROR
    • sms_resend_pending
      • sms_sms_cb / S_SMS_DELIVERED
      • sms_sms_cb / S_SMS_MEM_EXCEEDED
    • sms_queue_clear
      • vty
  • sms_resend_pending [R]
    • timer
  • sms_queue_stats [R]
    • vty
  • sms_queue_clear [W]
    • vty
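
A rough sketch of what the rwlock idea could look like, using a POSIX pthread_rwlock_t. The pending_entry / pending_list types and function names are simplified stand-ins I made up; the real pending_sms entries carry more state and live on a libosmocore llist. The [R] accessors take the read lock, the [W] ones take the write lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-in for a pending_sms entry. */
struct pending_entry {
	struct pending_entry *next;
	unsigned long long sms_id;
};

struct pending_list {
	pthread_rwlock_t lock;
	struct pending_entry *head;
};

void pending_list_init(struct pending_list *pl)
{
	int rc = pthread_rwlock_init(&pl->lock, NULL);

	assert(rc == 0);
	(void)rc;
	pl->head = NULL;
}

/* [W] add an entry at the head of the list */
bool pending_add(struct pending_list *pl, unsigned long long sms_id)
{
	struct pending_entry *e = malloc(sizeof(*e));

	if (!e)
		return false;
	e->sms_id = sms_id;
	pthread_rwlock_wrlock(&pl->lock);
	e->next = pl->head;
	pl->head = e;
	pthread_rwlock_unlock(&pl->lock);
	return true;
}

/* [R] like sms_queue_sms_is_pending: returns a bool, never a pointer,
 * because an entry may be freed as soon as the read lock is dropped */
bool pending_is_pending(struct pending_list *pl, unsigned long long sms_id)
{
	bool found = false;

	pthread_rwlock_rdlock(&pl->lock);
	for (const struct pending_entry *e = pl->head; e; e = e->next) {
		if (e->sms_id == sms_id) {
			found = true;
			break;
		}
	}
	pthread_rwlock_unlock(&pl->lock);
	return found;
}

/* [W] like sms_pending_free: unlink and free one entry */
bool pending_remove(struct pending_list *pl, unsigned long long sms_id)
{
	bool removed = false;

	pthread_rwlock_wrlock(&pl->lock);
	for (struct pending_entry **pp = &pl->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->sms_id == sms_id) {
			struct pending_entry *victim = *pp;

			*pp = victim->next;
			free(victim);
			removed = true;
			break;
		}
	}
	pthread_rwlock_unlock(&pl->lock);
	return removed;
}

/* [W] like sms_queue_clear: free everything, return number of entries */
unsigned int pending_clear(struct pending_list *pl)
{
	unsigned int n = 0;

	pthread_rwlock_wrlock(&pl->lock);
	while (pl->head) {
		struct pending_entry *e = pl->head;

		pl->head = e->next;
		free(e);
		n++;
	}
	pthread_rwlock_unlock(&pl->lock);
	return n;
}
```

Note that the existing sms_find_pending-style functions hand out pointers into the list, which does not combine well with a lock that is dropped on return. That alone is an argument for keeping the whole list inside one thread.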

Conclusion

I think the following approach is best:
  • have a separate "SMS" thread
  • all database access happens from that thread only
  • inter-thread message queues (libosmocore it_q) between main thread and SMS thread
  • sms_queue timers (push_queue_timer, resend_pending_timer) run in that thread
  • other input (mainly signals today) is serialized via it_q in main -> SMS direction
  • other output (mainly gsm411_send_sms) is serialized via it_q in SMS -> main direction
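
The overall shape of this proposal, sketched with plain pthreads. This deliberately does not use the libosmocore it_q API: the msg_queue below is a simple bounded queue with a mutex and a condition variable, standing in for the two it_q instances, and all names are hypothetical. The main thread pushes "signal" messages towards the SMS thread, which would do its blocking DB work and then push a "please send this SMS" message back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define Q_CAP 16

enum msg_type { MSG_SIGNAL, MSG_SEND_SMS, MSG_STOP };

struct msg {
	enum msg_type type;
	unsigned long long sms_id;
};

/* Bounded FIFO standing in for one it_q direction. */
struct msg_queue {
	pthread_mutex_t mtx;
	pthread_cond_t changed;
	struct msg buf[Q_CAP];
	unsigned int head, count;
};

static void mq_init(struct msg_queue *q)
{
	pthread_mutex_init(&q->mtx, NULL);
	pthread_cond_init(&q->changed, NULL);
	q->head = q->count = 0;
}

/* Blocks while the queue is full. */
static void mq_put(struct msg_queue *q, struct msg m)
{
	pthread_mutex_lock(&q->mtx);
	while (q->count == Q_CAP)
		pthread_cond_wait(&q->changed, &q->mtx);
	q->buf[(q->head + q->count) % Q_CAP] = m;
	q->count++;
	pthread_cond_broadcast(&q->changed);
	pthread_mutex_unlock(&q->mtx);
}

/* Blocks while the queue is empty. */
static struct msg mq_get(struct msg_queue *q)
{
	struct msg m;

	pthread_mutex_lock(&q->mtx);
	while (q->count == 0)
		pthread_cond_wait(&q->changed, &q->mtx);
	m = q->buf[q->head];
	q->head = (q->head + 1) % Q_CAP;
	q->count--;
	pthread_cond_broadcast(&q->changed);
	pthread_mutex_unlock(&q->mtx);
	return m;
}

struct sms_thread_ctx {
	struct msg_queue to_sms;	/* main -> SMS direction */
	struct msg_queue to_main;	/* SMS -> main direction */
};

/* SMS thread: the only place where DB access would happen. */
static void *sms_thread_fn(void *arg)
{
	struct sms_thread_ctx *ctx = arg;

	for (;;) {
		struct msg in = mq_get(&ctx->to_sms);

		if (in.type == MSG_STOP)
			break;
		/* potentially blocking DB I/O would go here */
		struct msg out = { .type = MSG_SEND_SMS, .sms_id = in.sms_id };
		mq_put(&ctx->to_main, out);
	}
	return NULL;
}

/* Main-thread side of the demo; returns the number of send requests
 * that came back from the SMS thread. */
unsigned int sms_thread_demo(unsigned int n_signals)
{
	struct sms_thread_ctx ctx;
	pthread_t thr;
	unsigned int i, sent = 0;

	assert(n_signals < Q_CAP);	/* keeps both queues from ever filling up */
	mq_init(&ctx.to_sms);
	mq_init(&ctx.to_main);
	if (pthread_create(&thr, NULL, sms_thread_fn, &ctx) != 0)
		return 0;

	for (i = 0; i < n_signals; i++)
		mq_put(&ctx.to_sms, (struct msg){ .type = MSG_SIGNAL, .sms_id = i + 1 });
	mq_put(&ctx.to_sms, (struct msg){ .type = MSG_STOP });

	for (i = 0; i < n_signals; i++) {
		struct msg m = mq_get(&ctx.to_main);

		if (m.type == MSG_SEND_SMS)
			sent++;
	}
	pthread_join(thr, NULL);
	return sent;
}
```

Unlike this sketch, the main thread of osmo-msc must never block waiting on a queue; it has to pick up the SMS -> main messages from its normal select loop.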

Serialize SS_SMS signals

  • we really only need to serialize paging_result and sms->id
  • submit them into it_q to SMS thread

serialize SS_SUBSCR signal

  • sms_subscriber_find_pending() can be done in main thread before serialization
  • check for vsub->lu_complete and zero MSISDN before serialization
  • we really only need to serialize the MSISDN
  • db_sms_get_unsent_for_subscr() then happens from SMS thread
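
A sketch of the main-thread side of this, under the assumption that "zero MSISDN" means an empty MSISDN string. The vsub_sketch struct is a reduced, hypothetical stand-in for the VLR subscriber with only the two fields mentioned above, and MSISDN_MAX is an arbitrary buffer size chosen for the example. The sms_subscriber_find_pending() step is left out here. The point is that the message contains a copy of the MSISDN and no pointer to the vsub, so the SMS thread never touches VLR state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MSISDN_MAX 32	/* assumption for this sketch, not the real limit */

/* Hypothetical, reduced stand-in for the VLR subscriber. */
struct vsub_sketch {
	bool lu_complete;
	char msisdn[MSISDN_MAX];
};

/* What actually crosses the thread boundary: just the MSISDN, by value. */
struct subscr_attached_msg {
	char msisdn[MSISDN_MAX];
};

/* Runs in the main thread on S_SUBSCR_ATTACHED.
 * Returns true if 'out' was filled and should be queued to the SMS thread,
 * false if the signal can be dropped right away. */
bool subscr_attached_serialize(const struct vsub_sketch *vsub,
			       struct subscr_attached_msg *out)
{
	assert(vsub && out);

	if (!vsub->lu_complete)
		return false;
	if (vsub->msisdn[0] == '\0')
		return false;

	memset(out, 0, sizeof(*out));
	strncpy(out->msisdn, vsub->msisdn, sizeof(out->msisdn) - 1);
	return true;
}
```

The SMS thread would then look up unsent messages by that MSISDN, and anything it wants delivered has to come back as a separate message in the SMS -> main direction.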

move push_queue_timer + resend_pending_timer to SMS thread

serialize db_sms_store() (MO-SMS, SMPP)

  • failure to store in database would only be known asynchronously!
  • we can probably just ignore that.

serialize db_sms_mark_delivered()

  • we don't care about success right now anyway, so async is no problem

VTY

  • remove 'sms send pending' or implement different command via it_Q
  • remove 'sms delete expired' or implement different command via it_Q
  • serialize 'subscriber ... sms ...' via it_Q

Related issues

Related to OsmoMSC - Bug #5563: OsmoMSC sometimes stalls for dozens of seconds in a production deployment (Stalled, laforge, 05/14/2022)
Related to OsmoMSC - Feature #5566: avoid using synchronous = FULL (Resolved, laforge, 05/17/2022)
Related to Cellular Network Infrastructure - Feature #3587: Possibility to route SMS messages over GSUP (Resolved, fixeria, 09/25/2018)
