Project

General

Profile

Bug #3064

ofono: crash after merging osmo-gsm-tester patch to read IMSI

Added by pespin 9 months ago. Updated 8 months ago.

Status:
New
Priority:
Normal
Assignee:
Target version:
-
Start date:
03/14/2018
Due date:
% Done:

0%

Spec Reference:

Description

ofono crashes consistenly after 2nd-3rd test in osmo-gsm-tester_prod after merging this patch in osmo-gsm-tester to read the SIM from the modem: https://gerrit.osmocom.org/#/c/7274/

osmo-gsm-tester output:

13:13:35.834710 tst                        /sierra_2: DBG: got SIM properties {'MobileCountryCode': '901', 'LockedPins': [], 'PinRequired': 'none', 'FixedDialing': False, 'SubscriberNumbers': [], 'Retries': {'network': 40, 'service': 0, 'corppuk': 0, 'puk2': 10, 'pin2': 3, 'phone': 24, 'networkpuk': 60, 'corp': 253, 'netsubpuk': 157, 'pin': 3, 'puk': 10, 'netsub': 48, 'firstphonepuk': 253, 'firstphone': 51, 'servicepuk': 253}, 'MobileNetworkCode': '70', 'SubscriberIdentity': '901700000015252', 'Present': True, 'BarredDialing': False, 'CardIdentifier': '8988211000000152523'}  [trial-215↪ussd:sysmo↪/sierra_2]  [modem.py:438]
13:13:35.835678 run             osmo-nitb_10.42.42.2: Add subscriber {imsi='901700000015252', msisdn='8465'}  [trial-215↪ussd:sysmo↪osmo-nitb_10.42.42.2]  [osmo_nitb.py:140]
13:13:35.836527 bus Ctrl(host='10.42.42.2', port=4249): DBG: Connecting  [trial-215↪ussd:sysmo↪osmo-nitb_10.42.42.2↪CTRL(10.42.42.2:4249)↪Ctrl(host='10.42.42.2', port=4249)]  [osmo_ctrl.py:54]
13:13:35.837398 bus Ctrl(host='10.42.42.2', port=4249): DBG: Sending {data='SET 0 subscriber-modify-v1 901700000015252,8465,comp128v1,EBAB63D06C3F546A16C977CB40E57C68'}  [trial-215↪ussd:sysmo↪osmo-nitb_10.42.42.2↪CTRL(10.42.42.2:4249)↪Ctrl(host='10.42.42.2', port=4249)]  [osmo_ctrl.py:65]
13:13:35.921488 bus Ctrl(host='10.42.42.2', port=4249): DBG: Receiving {data=b'\x00$\xee\x00SET_REPLY 0 subscriber-modify-v1 OK'}  [trial-215↪ussd:sysmo↪osmo-nitb_10.42.42.2↪CTRL(10.42.42.2:4249)↪Ctrl(host='10.42.42.2', port=4249)]  [osmo_ctrl.py:71]
13:13:35.922253 bus            CTRL(10.42.42.2:4249): DBG: Created subscriber {imsi='901700000015252', msisdn='8465'}  [trial-215↪ussd:sysmo↪osmo-nitb_10.42.42.2↪CTRL(10.42.42.2:4249)]  [osmo_nitb.py:198]
13:13:35.922719 bus Ctrl(host='10.42.42.2', port=4249): DBG: Disconnecting  [trial-215↪ussd:sysmo↪osmo-nitb_10.42.42.2↪CTRL(10.42.42.2:4249)↪Ctrl(host='10.42.42.2', port=4249)]  [osmo_ctrl.py:60]
13:13:35.932170 bus                        /sierra_2: DBG: Powered == True  [trial-215↪ussd:sysmo↪/sierra_2↪/sierra_2]  [modem.py:314]
13:13:35.932820 tst                        /sierra_2: DBG: Power cycling  [trial-215↪ussd:sysmo↪/sierra_2]  [modem.py:579]
13:13:35.933375 bus                        /sierra_2: DBG: has_interface(org.ofono.ConnectionManager) == False  [trial-215↪ussd:sysmo↪/sierra_2↪/sierra_2]  [modem.py:306]
13:13:35.934287 bus                        /sierra_2: Setting Online False  [trial-215↪ussd:sysmo↪/sierra_2↪/sierra_2]  [modem.py:322]
13:13:35.943841 bus                        /sierra_2: DBG: Online == False  [trial-215↪ussd:sysmo↪/sierra_2↪/sierra_2]  [modem.py:314]
13:13:35.944772 bus                        /sierra_2: Setting Powered False  [trial-215↪ussd:sysmo↪/sierra_2↪/sierra_2]  [modem.py:322]
13:13:36.002454 tst           assert_extension.py:18: ERR: Error: g-dbus-error-quark: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying (4)  [trial-215↪ussd:sysmo↪assert_extension.py:18]  [proxy_method.py:75: 0, timeout_to_glib(timeout), None).unpack()]
13:13:36.005071 tst           assert_extension.py:18: TRACEBACK: Traceback (most recent call last):
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/test.py", line 61, in run
    self.path)
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/util.py", line 326, in run_python_file
    spec.loader.exec_module( importlib.util.module_from_spec(spec) )
  File "<frozen importlib._bootstrap_external>", line 673, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/suites/ussd/assert_extension.py", line 18, in <module>
    ms.connect(nitb.mcc_mnc())
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/modem.py", line 593, in connect
    self.power_cycle()
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/modem.py", line 580, in power_cycle
    self.power_off()
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/modem.py", line 570, in power_off
    self.set_powered(False)
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/modem.py", line 413, in set_powered
    return self.dbus.set_powered(powered=powered)
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/modem.py", line 328, in set_powered
    self.set_bool('Powered', powered)
  File "/home/jenkins/workspace/osmo-gsm-tester_run-prod/osmo-gsm-tester/src/osmo_gsm_tester/modem.py", line 323, in set_bool
    self.interface(iface).SetProperty(name, Variant('b', val))
  File "/usr/local/lib/python3.5/dist-packages/pydbus/proxy_method.py", line 75, in __call__
    0, timeout_to_glib(timeout), None).unpack()
GLib.GError: g-dbus-error-quark: GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying (4)
  [trial-215↪ussd:sysmo↪assert_extension.py:18]  [test.py:104]
13:13:36.008739 tst           assert_extension.py:18: Test FAILED (41.5 sec)  [trial-215↪ussd:sysmo↪assert_extension.py:18]  [test.py:105]

It can be seen that ofono crashes while powering off the modem. The different in behaviour from osmo-gms-testser since the patch was applied, was that we now set Powered=true before and fetch the IMSI with SimManager.getProperties(), then when we call connect(), we call power_cycle which finds the modem ON so it turns it OFF. Before the patch, the modem was already OFF because it was not previously turned ON to read the IMSI.

ofono logs during the same time show the crash occurring:

Mar 14 13:13:35.947250 osmo-gsm-tester-prod ofonod[26036]: src/modem.c:modem_change_state() old state: 2, new state: 0
Mar 14 13:13:35.947307 osmo-gsm-tester-prod ofonod[26036]: src/modem.c:flush_atoms()
Mar 14 13:13:35.947394 osmo-gsm-tester-prod ofonod[26036]: src/message-waiting.c:mw_remove() atom: 0x56188db370c0
Mar 14 13:13:35.947438 osmo-gsm-tester-prod ofonod[26036]: plugins/push-notification.c:push_notification_cleanup() 0x56188dbff1c0
Mar 14 13:13:35.947483 osmo-gsm-tester-prod ofonod[26036]: plugins/smart-messaging.c:smart_messaging_cleanup() 0x56188dbff160
Mar 14 13:13:35.947988 osmo-gsm-tester-prod ofonod[26036]: src/sms.c:sms_remove() atom: 0x56188dba71a0
Mar 14 13:13:35.948017 osmo-gsm-tester-prod ofonod[26036]: drivers/qmimodem/sms.c:qmi_sms_remove()
Mar 14 13:13:35.962295 osmo-gsm-tester-prod ofonod[26036]: src/radio-settings.c:radio_settings_remove() atom: 0x56188db67370
Mar 14 13:13:35.962345 osmo-gsm-tester-prod ofonod[26036]: drivers/qmimodem/radio-settings.c:qmi_radio_settings_remove()
Mar 14 13:13:35.962385 osmo-gsm-tester-prod ofonod[26036]: src/phonebook.c:phonebook_remove() atom: 0x56188db67340
Mar 14 13:13:35.962408 osmo-gsm-tester-prod ofonod[26036]: src/stk.c:stk_remove() atom: 0x56188db41a10
Mar 14 13:13:35.962470 osmo-gsm-tester-prod ofonod[26036]: src/location-reporting.c:location_reporting_remove() atom: 0x56188dc2eb70
Mar 14 13:13:35.962490 osmo-gsm-tester-prod ofonod[26036]: drivers/qmimodem/location-reporting.c:qmi_location_reporting_remove()
Mar 14 13:13:35.967282 osmo-gsm-tester-prod ofonod[26036]: src/voicecall.c:voicecall_remove() atom: 0x56188db3bcd0
Mar 14 13:13:35.967313 osmo-gsm-tester-prod ofonod[26036]: drivers/qmimodem/voicecall.c:qmi_voicecall_remove()
Mar 14 13:13:35.967590 osmo-gsm-tester-prod ofonod[26036]: plugins/bluez5.c:bt_unregister_profile() Bluetooth: Unregistering profile /bluetooth/profile/hfp_ag
Mar 14 13:13:35.968854 osmo-gsm-tester-prod ofonod[26036]: src/sim.c:sim_remove() atom: 0x56188db66930
Mar 14 13:13:35.968883 osmo-gsm-tester-prod ofonod[26036]: drivers/qmimodem/sim.c:qmi_sim_remove()
Mar 14 13:13:35.969121 osmo-gsm-tester-prod ofonod[26036]: Aborting (signal 11) [/usr/local/sbin/ofonod]
Mar 14 13:13:35.972237 osmo-gsm-tester-prod ofonod[26036]: ++++++++ backtrace ++++++++
Mar 14 13:13:35.987586 osmo-gsm-tester-prod ofonod[26036]: #0  0x7f76371ff060 in /lib/x86_64-linux-gnu/libc.so.6
Mar 14 13:13:35.987831 osmo-gsm-tester-prod ofonod[26036]: #1  0x7f7637dda517 in /lib/x86_64-linux-gnu/libglib-2.0.so.0
Mar 14 13:13:35.988221 osmo-gsm-tester-prod ofonod[26036]: #2  0x56188d571ea0 in /usr/local/sbin/ofonod
Mar 14 13:13:36.006677 osmo-gsm-tester-prod systemd[1]: ofono.service: Main process exited, code=exited, status=1/FAILURE
Mar 14 13:13:36.008039 osmo-gsm-tester-prod systemd[1]: ofono.service: Unit entered failed state.
Mar 14 13:13:36.008154 osmo-gsm-tester-prod systemd[1]: ofono.service: Failed with result 'exit-code'.
Mar 14 13:13:38.236776 osmo-gsm-tester-prod systemd[1]: ofono.service: Service hold-off time over, scheduling restart.
Mar 14 13:13:38.238182 osmo-gsm-tester-prod systemd[1]: Stopped Telephony service.
Mar 14 13:13:38.242292 osmo-gsm-tester-prod systemd[1]: Starting Telephony service...

trial-215-run.tgz trial-215-run.tgz 409 KB pespin, 03/14/2018 12:48 PM
ofono-gdb-session.txt ofono-gdb-session.txt 87.2 KB pespin, 03/14/2018 01:03 PM

History

#2 Updated by pespin 9 months ago

(gdb) bt
#0  0x00007ffff7b20517 in g_queue_is_empty () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1  0x00005555556acea0 in sim_fs_op_free (pointer=0x555555974270) at src/simfs.c:101
#2  0x00007ffff7b205fc in g_queue_foreach () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007ffff7b2065b in g_queue_free_full () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00005555556acf44 in sim_fs_free (fs=0x5555559995b0) at src/simfs.c:125
#5  0x0000555555681ae1 in sim_remove (atom=0x555555971bc0) at src/sim.c:3167
#6  0x000055555564e3a8 in flush_atoms (modem=0x5555559ed690, new_state=MODEM_STATE_POWER_OFF) at src/modem.c:432
#7  0x000055555564e5f6 in modem_change_state (modem=0x5555559ed690, new_state=MODEM_STATE_POWER_OFF)
    at src/modem.c:510
#8  0x000055555564f1d2 in set_powered (modem=0x5555559ed690, powered=0) at src/modem.c:896
#9  0x000055555564f985 in modem_set_property (conn=0x55555596b8d0, msg=0x5555559719e0, data=0x5555559ed690)
    at src/modem.c:1120
#10 0x00005555556c6800 in process_message (connection=0x55555596b8d0, message=0x5555559719e0,
    method=0x555555940128 <modem_methods+40>, iface_user_data=0x5555559ed690) at gdbus/object.c:259
#11 0x00005555556c8305 in generic_message (connection=0x55555596b8d0, message=0x5555559719e0,
    user_data=0x5555559bf3b0) at gdbus/object.c:1070
#12 0x00007ffff7899733 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#13 0x00007ffff788ad84 in dbus_connection_dispatch () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#14 0x00005555556c41b4 in message_dispatch (data=0x55555596b8d0) at gdbus/mainloop.c:72
#15 0x00007ffff7b0f6aa in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#16 0x00007ffff7b0fa60 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007ffff7b0fd82 in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x000055555564cb3a in main (argc=1, argv=0x7fffffffec18) at src/main.c:306

(gdb) frame 1
#1  0x00005555556acea0 in sim_fs_op_free (pointer=0x555555974270) at src/simfs.c:101
101             if (fs->session && g_queue_is_empty(fs->op_q)) {
(gdb) print fs
$1 = (struct sim_fs *) 0x555555a7aad0
(gdb) print *fs
$2 = {op_q = 0x1b0, op_source = 32,
  bitmap = "\000\000\000\000\200֖UUU", '\000' <repeats 11 times>, "O\261\367\377\177\000\000!\000\000", fd = 0,
  sim = 0x7ffff72a0030, driver = 0x55555599c2b0, contexts = 0x20, session = 0x51, session_id = 1937339183,
  watch_id = 1986356271}

I'd say the fs pointer contains garbage, as if it was freed already before that point.

#3 Updated by pespin 9 months ago

I rebased our osmo-gsm-tester branch on top of ofono 1.23 upstream because I saw some sim related fixes, but the bug is still there:

#0  0x00007ffff7b20517 in g_queue_is_empty () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#1  0x00005555556adcdd in sim_fs_op_free (pointer=0x5555559cb990) at src/simfs.c:101
#2  0x00007ffff7b205fc in g_queue_foreach () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007ffff7b2065b in g_queue_free_full () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00005555556add81 in sim_fs_free (fs=0x5555559c0780) at src/simfs.c:125
#5  0x00005555556828f3 in sim_remove (atom=0x5555559cb000) at src/sim.c:3175
#6  0x000055555564f16f in flush_atoms (modem=0x555555a8fb00, new_state=MODEM_STATE_POWER_OFF) at src/modem.c:432
#7  0x000055555564f3bd in modem_change_state (modem=0x555555a8fb00, new_state=MODEM_STATE_POWER_OFF)
    at src/modem.c:510
#8  0x000055555564ff99 in set_powered (modem=0x555555a8fb00, powered=0) at src/modem.c:896
#9  0x000055555565074c in modem_set_property (conn=0x55555596c8d0, msg=0x55555596e460, data=0x555555a8fb00)
    at src/modem.c:1120
#10 0x00005555556c76b6 in process_message (connection=0x55555596c8d0, message=0x55555596e460,
    method=0x555555941128 <modem_methods+40>, iface_user_data=0x555555a8fb00) at gdbus/object.c:259
#11 0x00005555556c91bb in generic_message (connection=0x55555596c8d0, message=0x55555596e460,
    user_data=0x5555559e0430) at gdbus/object.c:1070
#12 0x00007ffff7899733 in ?? () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#13 0x00007ffff788ad84 in dbus_connection_dispatch () from /lib/x86_64-linux-gnu/libdbus-1.so.3
#14 0x00005555556c506a in message_dispatch (data=0x55555596c8d0) at gdbus/mainloop.c:72
#15 0x00007ffff7b0f6aa in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#16 0x00007ffff7b0fa60 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#17 0x00007ffff7b0fd82 in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#18 0x000055555564d901 in main (argc=1, argv=0x7fffffffebf8) at src/main.c:306
(gdb) frame 1
#1  0x00005555556adcdd in sim_fs_op_free (pointer=0x5555559cb990) at src/simfs.c:101
101             if (fs->session && g_queue_is_empty(fs->op_q)) {
(gdb) print fs
$1 = (struct sim_fs *) 0x5555559c8fd0
(gdb) print *fs
$2 = {op_q = 0x150, op_source = 32,
  bitmap = "\000\000\000\000\220\t\227UUU", '\000' <repeats 11 times>, "O\261\367\377\177\000\000\001\002\000",
  fd = 0, sim = 0x5555559c9160, driver = 0x0, contexts = 0x5555559c90a0, session = 0xffffff9c, session_id = 0,
  watch_id = 0}

#4 Updated by pespin 9 months ago

I can easily reproduce this running ussd:sysmo in RnD.
I guess we only see the issue with tests running with 1 MS, since when 2 MS are used there's more time in between we used the SIM service and we power off the same MS.

#5 Updated by pespin 9 months ago

According to ofono maintainer, it may be a similar issue as discussed in https://www.mail-archive.com/ofono@ofono.org/msg17488.html

#6 Updated by pespin 9 months ago

I added a patch to hopefully workaround the issue until we and/or upstream find a proper fix: https://gerrit.osmocom.org/#/c/7291/
I'll leave it running tests in RnD for a while and see if I can still crash ofono with it. I'll try to change the prod jenkins job this night to use branch pespin/workaround-3064 which contains the workaround on top of current master, and see if tests pass fine during night.

#7 Updated by pespin 9 months ago

I didn't see the crash since I added the 1sec delay between using SImManager and powering off. Merging it for now as a workaround, but the issue is still there and should be fixed.

Way to reproduce: Remove the sleep 1 before set_power(false) in modem.py power_cycle(), then run in a loop ussd suite and it will eventually crash after a few runs. A standalone python script to cause the crash can probably be generated too, by SetProperty Powered true, then call GetProperties() until the property containing the IMSI appears, then call setProperty Powered False.

#8 Updated by pespin 9 months ago

Denis shared following patch which seems to be fixing the issue: https://pastebin.ca/4003426

I expect it to be merged soon.

    diff --git a/src/simfs.c b/src/simfs.c
    index efee13e5..c7ddbb38 100644
    --- a/src/simfs.c
    +++ b/src/simfs.c
    @@ -98,10 +98,8 @@ static void sim_fs_op_free(gpointer pointer)
            struct sim_fs *fs = node->context->fs;

            /* only release the session if there are no pending reads */
    -       if (fs->session && g_queue_is_empty(fs->op_q)) {
    +       if (fs->watch_id && g_queue_is_empty(fs->op_q))
                   __ofono_sim_remove_session_watch(fs->session, fs->watch_id);
    -              fs->watch_id = 0;
    -       }

            g_free(node->buffer);
            g_free(node);
    @@ -900,6 +898,13 @@ static void session_read_info_cb(const struct ofono_error *error,
            }
     }

    +static void session_destroy_cb(void *userdata)
    +{
    +       struct sim_fs *fs = userdata;
    +
    +       fs->watch_id = 0;
    +}
    +
     static void get_session_cb(ofono_bool_t active, int session_id,
                   void *data)
     {
    @@ -955,7 +960,7 @@ static gboolean sim_fs_op_next(gpointer user_data)
                          else
                                 fs->watch_id = __ofono_sim_add_session_watch(
                                               fs->session, get_session_cb,
    -                                          fs, NULL);
    +                                          fs, session_destroy_cb);
                   }
            } else {
                   switch (op->structure) {

#9 Updated by pespin 9 months ago

Merged in https://git.kernel.org/pub/scm/network/ofono/ofono.git/commit/?id=a99c0be535410a92773ffdfbebb766bec66b66fe.

TODO:
- Rebase osmo-gsm-tester branch on top of this commit, and deploy it in RnD and prod. It was already tested in RnD that indeed it fixes the crash.
- Submit patch to gerrit reverting osmo-gsm-tester f8d12196f05b20aa28b3103db26ea37ba6849362

#10 Updated by lynxis 8 months ago

  • Assignee changed from lynxis to pespin

pespin: I'm unsure, what I should do here. It seems to fixed everything :).

#11 Updated by pespin 8 months ago

  • Assignee changed from pespin to lynxis

The TODO list is still in TODO ;)

- Rebase osmo-gsm-tester branch on top of this commit <- this may have already been done. However, there were several QMI mem leaks fixed recently, I think it makes sense to rebase on top of master.
- Deploy it in RnD and prod. It was already tested in RnD that indeed it fixes the crash. <- Once rebased, this needs to be done.
- Submit patch to gerrit reverting osmo-gsm-tester f8d12196f05b20aa28b3103db26ea37ba6849362 <- Once new ofono is in place, this needs to be done.

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)