Project

General

Profile

Bug #4097

osmo-remsim-bankd leaking memory

Added by roh 5 months ago. Updated 8 days ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
Start date:
07/10/2019
Due date:
% Done:

60%


Description

maybe this is similar to #4096

Version 0.1.0.17.6d8a

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
after ~40days:
root       797  0.0 34.1 764612 348240 ?       Ssl  May29   1:04 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5
after restart:
root     24975  0.3  0.4 420884  4244 ?        Ssl  18:22   0:00 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5

Version 0.1.0.21.c8bc after restart:

root     27575  0.0  0.4 420884  4168 ?        Ssl  18:26   0:00 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5


Related issues

Related to osmo-remsim - Bug #4096: osmo-remsim-server leaking memoryClosed07/10/2019

Related to osmo-remsim - Bug #4119: osmo-remsim-client-st2 leaking memoryClosed07/18/2019

History

#1 Updated by laforge 5 months ago

remsim-bankd has the usual 'SIGUSR1' handler of most osmocom projects, i.e. sending
a SIGURS1 to the process should render a talloc_report_full to stdout. From there
it's probably quite easy to see which objects are leaking.

#2 Updated by roh 5 months ago

talloc output:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     27575  0.0  1.1 428140 11952 ?        Ssl  Jul10   0:02 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5

Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of main thread:
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'global' (total    889 bytes in   6 blocks)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     logging                        contains    889 bytes in   5 blocks (ref 0) 0x564da7e8d5f0
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:         struct log_target              contains    208 bytes in   2 blocks (ref 0) 0x564da7e8d9e0
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:             struct log_category            contains     40 bytes in   1 blocks (ref 0) 0x564da7e8daf0
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:         struct log_info                contains    680 bytes in   2 blocks (ref 0) 0x564da7e8d660
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:             struct log_info_cat            contains    640 bytes in   1 blocks (ref 0) 0x564da7e8d6f0
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(0)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     bankd-worker(0)                contains     16 bytes in   1 blocks (ref 0) 0x7faf10000a00
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7faf10000990
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(1)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     bankd-worker(1)                contains     16 bytes in   1 blocks (ref 0) 0x7faf14000a00
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7faf14000990
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(2)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     bankd-worker(2)                contains     16 bytes in   1 blocks (ref 0) 0x7faf1c000a00
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7faf1c000990
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(3)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     bankd-worker(3)                contains     16 bytes in   1 blocks (ref 0) 0x7faf18000a00
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7faf18000990
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(4)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     bankd-worker(4)                contains     16 bytes in   1 blocks (ref 0) 0x7faf0c000a00
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7faf0c000990

seems fine to me...

i ran valgrind --leak-check=yes on this and those were the most interesting details of multiple runs

==28334== 4,305 bytes in 3 blocks are definitely lost in loss record 45 of 48
==28334==    at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28334==    by 0x52C0661: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.8)
==28334==    by 0x56F2B61: msgb_alloc_c (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0)
==28334==    by 0x4E6C1C8: ipa_msg_alloc (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0)
==28334==    by 0x4E6CB3A: ipa_msg_recv_buffered (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0)
==28334==    by 0x50A99F5: ??? (in /usr/lib/x86_64-linux-gnu/libosmoabis.so.6.0.2)
==28334==    by 0x56F26F6: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0)
==28334==    by 0x10AFFB: main (bankd_main.c:337)
==28334== 
==28334== 7,175 bytes in 5 blocks are definitely lost in loss record 46 of 48
==28334==    at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28334==    by 0x52C0661: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.8)
==28334==    by 0x56F2B61: msgb_alloc_c (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0)
==28334==    by 0x4E6C1C8: ipa_msg_alloc (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0)
==28334==    by 0x4E6CB3A: ipa_msg_recv_buffered (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0)
==28334==    by 0x50A99F5: ??? (in /usr/lib/x86_64-linux-gnu/libosmoabis.so.6.0.2)
==28334==    by 0x56F26F6: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0)
==28334==    by 0x10B002: main (bankd_main.c:337)

i stopped the systemd services for now and ran the bankd in screen and valgrind to see how what output i get tomorrow.

#3 Updated by roh 5 months ago

run from valgrind:

==28369== 20,527,675 bytes in 14,305 blocks are definitely lost in loss record 48 of 49
==28369==    at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28369==    by 0x52C0661: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.8)
==28369==    by 0x56F2B61: msgb_alloc_c (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0)
==28369==    by 0x4E6C1C8: ipa_msg_alloc (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0)
==28369==    by 0x4E6CB3A: ipa_msg_recv_buffered (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0)
==28369==    by 0x50A99F5: ??? (in /usr/lib/x86_64-linux-gnu/libosmoabis.so.6.0.2)
==28369==    by 0x56F26F6: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0)
==28369==    by 0x10B002: main (bankd_main.c:337)
==28369== 
==28369== 20,534,850 bytes in 14,310 blocks are definitely lost in loss record 49 of 49
==28369==    at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==28369==    by 0x52C0661: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.8)
==28369==    by 0x56F2B61: msgb_alloc_c (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0)
==28369==    by 0x4E6C1C8: ipa_msg_alloc (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0)
==28369==    by 0x4E6CB3A: ipa_msg_recv_buffered (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0)
==28369==    by 0x50A99F5: ??? (in /usr/lib/x86_64-linux-gnu/libosmoabis.so.6.0.2)
==28369==    by 0x56F26F6: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0)
==28369==    by 0x10AFFB: main (bankd_main.c:337)
==28369== 
==28369== LEAK SUMMARY:
==28369==    definitely lost: 41,062,525 bytes in 28,615 blocks
==28369==    indirectly lost: 0 bytes in 0 blocks
==28369==      possibly lost: 16,534 bytes in 57 blocks
==28369==    still reachable: 95,707 bytes in 703 blocks
==28369==         suppressed: 0 bytes in 0 blocks
==28369== Reachable blocks (those to which a pointer was found) are not shown.
==28369== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==28369== 
==28369== For counts of detected and suppressed errors, rerun with: -v
==28369== ERROR SUMMARY: 28 errors from 28 contexts (suppressed: 0 from 0)
Killed
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     28369  0.0 10.3 241736 105576 pts/1   Sl+  Jul11   0:38      \_ /usr/bin/valgrind.bin --leak-check=yes /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5

freshly started in valgrind:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     31277 34.1  4.3 180792 44524 pts/1    Sl+  16:53   0:04      \_ /usr/bin/valgrind.bin --leak-check=yes --leak-check=full --show-leak-kinds=all /usr/bin/osmo-remsim-bankd -i 10.9.99

#4 Updated by laforge 5 months ago

  • Assignee set to laforge
  • Priority changed from Normal to Urgent

#5 Updated by laforge 5 months ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 60

This one should be fixed by https://gerrit.osmocom.org/c/osmo-remsim/+/14831 roh feel free to test.

I pushed all the related patches to a laforge/memleak branch.

#6 Updated by laforge 5 months ago

  • Related to Bug #4096: osmo-remsim-server leaking memory added

#7 Updated by laforge 5 months ago

  • Related to Bug #4119: osmo-remsim-client-st2 leaking memory added

#8 Updated by fixeria 5 months ago

BTW, some talloc allocations can be done using NULL context. You won't see this in regular reports, but there is a special debug facility - just call talloc_enable_leak_report() in the main(), so then you will be able to call talloc_report(NULL, stderr) or talloc_report_full(NULL, stderr).

#9 Updated by laforge 5 months ago

On Thu, Jul 18, 2019 at 05:23:17PM +0000, fixeria [REDMINE] wrote:

BTW, some talloc allocations can be done using NULL context. You won't see this in regular reports, but there is a special debug facility - just call talloc_enable_leak_report() in the main(), so then you will be able to call talloc_report(NULL, stderr) or talloc_report_full(NULL, stderr).

Please don't suggest to do this in multithreaded projects. We explicitly disable any form of
NULL tracking in remsim-server and remsim-bankd because it's not thread safe.

#10 Updated by roh 5 months ago

talloc from the binary of yesterday:

Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of main thread:
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'global' (total    889 bytes in   6 blocks)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     logging                        contains    889 bytes in   5 blocks (ref 0) 0x5632f3fda5f0
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:         struct log_target              contains    208 bytes in   2 blocks (ref 0) 0x5632f3fda9e0
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:             struct log_category            contains     40 bytes in   1 blocks (ref 0) 0x5632f3fdaaf0
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:         struct log_info                contains    680 bytes in   2 blocks (ref 0) 0x5632f3fda660
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:             struct log_info_cat            contains    640 bytes in   1 blocks (ref 0) 0x5632f3fda6f0
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(0)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     bankd-worker(0)                contains     16 bytes in   1 blocks (ref 0) 0x7f7cbc000a00
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7f7cbc000990
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(1)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     bankd-worker(1)                contains     16 bytes in   1 blocks (ref 0) 0x7f7cb8000a00
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7f7cb8000990
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(2)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     bankd-worker(2)                contains     16 bytes in   1 blocks (ref 0) 0x7f7cc4000a00
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7f7cc4000990
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(3)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     bankd-worker(3)                contains     16 bytes in   1 blocks (ref 0) 0x7f7cc0000a00
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7f7cc0000990
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(4)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total     16 bytes in   3 blocks)
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     bankd-worker(4)                contains     16 bytes in   1 blocks (ref 0) 0x7f7cc8000a00
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]:     asn1                           contains      0 bytes in   1 blocks (ref 0) 0x7f7cc8000990

#11 Updated by laforge 3 months ago

  • Status changed from In Progress to Feedback
  • Assignee changed from laforge to roh

as the leak wasn't visible in talloc report before, the new talloc output isn't really going to confirm or deny the leak has been fixed. You have to use the same method to verify as before where you saw the leak.

#12 Updated by roh 3 months ago

this one is running for 2 weeks now...

root 314 0.0 0.4 420900 4624 ? Ssl Sep03 0:32 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5

i think this is done

#13 Updated by laforge 8 days ago

  • Status changed from Feedback to Resolved

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)