Bug #4097
closedosmo-remsim-bankd leaking memory
60%
Description
maybe this is similar to #4096
Version 0.1.0.17.6d8a
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND after ~40days: root 797 0.0 34.1 764612 348240 ? Ssl May29 1:04 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5 after restart: root 24975 0.3 0.4 420884 4244 ? Ssl 18:22 0:00 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5
Version 0.1.0.21.c8bc after restart:
root 27575 0.0 0.4 420884 4168 ? Ssl 18:26 0:00 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5
Related issues
Updated by laforge almost 5 years ago
remsim-bankd has the usual 'SIGUSR1' handler of most osmocom projects, i.e. sending
a SIGURS1 to the process should render a talloc_report_full to stdout. From there
it's probably quite easy to see which objects are leaking.
Updated by roh almost 5 years ago
talloc output:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 27575 0.0 1.1 428140 11952 ? Ssl Jul10 0:02 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5
Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of main thread: Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'global' (total 889 bytes in 6 blocks) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: logging contains 889 bytes in 5 blocks (ref 0) 0x564da7e8d5f0 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: struct log_target contains 208 bytes in 2 blocks (ref 0) 0x564da7e8d9e0 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: struct log_category contains 40 bytes in 1 blocks (ref 0) 0x564da7e8daf0 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: struct log_info contains 680 bytes in 2 blocks (ref 0) 0x564da7e8d660 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: struct log_info_cat contains 640 bytes in 1 blocks (ref 0) 0x564da7e8d6f0 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(0) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: bankd-worker(0) contains 16 bytes in 1 blocks (ref 0) 0x7faf10000a00 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7faf10000990 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(1) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: bankd-worker(1) contains 16 bytes in 1 blocks (ref 0) 0x7faf14000a00 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7faf14000990 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(2) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: bankd-worker(2) contains 16 bytes in 1 blocks (ref 0) 0x7faf1c000a00 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7faf1c000990 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(3) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: bankd-worker(3) contains 16 bytes in 1 blocks (ref 0) 0x7faf18000a00 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7faf18000990 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: === Talloc Report of bankd-worker(4) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: bankd-worker(4) contains 16 bytes in 1 blocks (ref 0) 0x7faf0c000a00 Jul 11 16:06:06 remsimbank osmo-remsim-bankd[27575]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7faf0c000990
seems fine to me...
i ran valgrind --leak-check=yes on this and those were the most interesting details of multiple runs
==28334== 4,305 bytes in 3 blocks are definitely lost in loss record 45 of 48 ==28334== at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==28334== by 0x52C0661: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.8) ==28334== by 0x56F2B61: msgb_alloc_c (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0) ==28334== by 0x4E6C1C8: ipa_msg_alloc (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0) ==28334== by 0x4E6CB3A: ipa_msg_recv_buffered (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0) ==28334== by 0x50A99F5: ??? (in /usr/lib/x86_64-linux-gnu/libosmoabis.so.6.0.2) ==28334== by 0x56F26F6: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0) ==28334== by 0x10AFFB: main (bankd_main.c:337) ==28334== ==28334== 7,175 bytes in 5 blocks are definitely lost in loss record 46 of 48 ==28334== at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==28334== by 0x52C0661: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.8) ==28334== by 0x56F2B61: msgb_alloc_c (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0) ==28334== by 0x4E6C1C8: ipa_msg_alloc (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0) ==28334== by 0x4E6CB3A: ipa_msg_recv_buffered (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0) ==28334== by 0x50A99F5: ??? (in /usr/lib/x86_64-linux-gnu/libosmoabis.so.6.0.2) ==28334== by 0x56F26F6: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0) ==28334== by 0x10B002: main (bankd_main.c:337)
i stopped the systemd services for now and ran the bankd in screen and valgrind to see how what output i get tomorrow.
Updated by roh almost 5 years ago
run from valgrind:
==28369== 20,527,675 bytes in 14,305 blocks are definitely lost in loss record 48 of 49 ==28369== at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==28369== by 0x52C0661: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.8) ==28369== by 0x56F2B61: msgb_alloc_c (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0) ==28369== by 0x4E6C1C8: ipa_msg_alloc (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0) ==28369== by 0x4E6CB3A: ipa_msg_recv_buffered (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0) ==28369== by 0x50A99F5: ??? (in /usr/lib/x86_64-linux-gnu/libosmoabis.so.6.0.2) ==28369== by 0x56F26F6: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0) ==28369== by 0x10B002: main (bankd_main.c:337) ==28369== ==28369== 20,534,850 bytes in 14,310 blocks are definitely lost in loss record 49 of 49 ==28369== at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==28369== by 0x52C0661: talloc_named_const (in /usr/lib/x86_64-linux-gnu/libtalloc.so.2.1.8) ==28369== by 0x56F2B61: msgb_alloc_c (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0) ==28369== by 0x4E6C1C8: ipa_msg_alloc (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0) ==28369== by 0x4E6CB3A: ipa_msg_recv_buffered (in /usr/lib/x86_64-linux-gnu/libosmogsm.so.12.0.0) ==28369== by 0x50A99F5: ??? (in /usr/lib/x86_64-linux-gnu/libosmoabis.so.6.0.2) ==28369== by 0x56F26F6: osmo_select_main (in /usr/lib/x86_64-linux-gnu/libosmocore.so.12.1.0) ==28369== by 0x10AFFB: main (bankd_main.c:337) ==28369== ==28369== LEAK SUMMARY: ==28369== definitely lost: 41,062,525 bytes in 28,615 blocks ==28369== indirectly lost: 0 bytes in 0 blocks ==28369== possibly lost: 16,534 bytes in 57 blocks ==28369== still reachable: 95,707 bytes in 703 blocks ==28369== suppressed: 0 bytes in 0 blocks ==28369== Reachable blocks (those to which a pointer was found) are not shown. ==28369== To see them, rerun with: --leak-check=full --show-leak-kinds=all ==28369== ==28369== For counts of detected and suppressed errors, rerun with: -v ==28369== ERROR SUMMARY: 28 errors from 28 contexts (suppressed: 0 from 0) Killed
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 28369 0.0 10.3 241736 105576 pts/1 Sl+ Jul11 0:38 \_ /usr/bin/valgrind.bin --leak-check=yes /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5
freshly started in valgrind:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 31277 34.1 4.3 180792 44524 pts/1 Sl+ 16:53 0:04 \_ /usr/bin/valgrind.bin --leak-check=yes --leak-check=full --show-leak-kinds=all /usr/bin/osmo-remsim-bankd -i 10.9.99
Updated by laforge almost 5 years ago
- Assignee set to laforge
- Priority changed from Normal to Urgent
Updated by laforge almost 5 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 60
This one should be fixed by https://gerrit.osmocom.org/c/osmo-remsim/+/14831 roh feel free to test.
I pushed all the related patches to a laforge/memleak
branch.
Updated by laforge almost 5 years ago
- Related to Bug #4096: osmo-remsim-server leaking memory added
Updated by laforge almost 5 years ago
- Related to Bug #4119: osmo-remsim-client-st2 leaking memory added
Updated by fixeria almost 5 years ago
BTW, some talloc allocations can be done using NULL context. You won't see this in regular reports, but there is a special debug facility - just call talloc_enable_leak_report() in the main(), so then you will be able to call talloc_report(NULL, stderr) or talloc_report_full(NULL, stderr).
Updated by laforge almost 5 years ago
On Thu, Jul 18, 2019 at 05:23:17PM +0000, fixeria [REDMINE] wrote:
BTW, some talloc allocations can be done using NULL context. You won't see this in regular reports, but there is a special debug facility - just call talloc_enable_leak_report() in the main(), so then you will be able to call talloc_report(NULL, stderr) or talloc_report_full(NULL, stderr).
Please don't suggest to do this in multithreaded projects. We explicitly disable any form of
NULL tracking in remsim-server and remsim-bankd because it's not thread safe.
Updated by roh over 4 years ago
talloc from the binary of yesterday:
Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of main thread: Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'global' (total 889 bytes in 6 blocks) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: logging contains 889 bytes in 5 blocks (ref 0) 0x5632f3fda5f0 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: struct log_target contains 208 bytes in 2 blocks (ref 0) 0x5632f3fda9e0 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: struct log_category contains 40 bytes in 1 blocks (ref 0) 0x5632f3fdaaf0 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: struct log_info contains 680 bytes in 2 blocks (ref 0) 0x5632f3fda660 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: struct log_info_cat contains 640 bytes in 1 blocks (ref 0) 0x5632f3fda6f0 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(0) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: bankd-worker(0) contains 16 bytes in 1 blocks (ref 0) 0x7f7cbc000a00 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7f7cbc000990 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(1) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: bankd-worker(1) contains 16 bytes in 1 blocks (ref 0) 0x7f7cb8000a00 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7f7cb8000990 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(2) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: bankd-worker(2) contains 16 bytes in 1 blocks (ref 0) 0x7f7cc4000a00 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7f7cc4000990 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(3) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: bankd-worker(3) contains 16 bytes in 1 blocks (ref 0) 0x7f7cc0000a00 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7f7cc0000990 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: === Talloc Report of bankd-worker(4) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: full talloc report on 'top' (total 16 bytes in 3 blocks) Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: bankd-worker(4) contains 16 bytes in 1 blocks (ref 0) 0x7f7cc8000a00 Jul 24 16:23:45 remsimbank osmo-remsim-bankd[930]: asn1 contains 0 bytes in 1 blocks (ref 0) 0x7f7cc8000990
Updated by laforge over 4 years ago
- Status changed from In Progress to Feedback
- Assignee changed from laforge to roh
as the leak wasn't visible in talloc report before, the new talloc output isn't really going to confirm or deny the leak has been fixed. You have to use the same method to verify as before where you saw the leak.
Updated by roh over 4 years ago
this one is running for 2 weeks now...
root 314 0.0 0.4 420900 4624 ? Ssl Sep03 0:32 /usr/bin/osmo-remsim-bankd -i 10.9.99.10 -n 5
i think this is done