Bug #4136
closedosmo-trx-lms: makes kernel eat all system memory when run under realtime priority
100%
Description
Originally https://osmocom.org/issues/3971, I deleted it by error.
Initially found and described in detail here: https://osmocom.org/issues/3339?#note-15
Related LimeSuite github issue: https://github.com/myriadrf/LimeSuite/issues/263
My system totally freezes for 2-5 seconds before/during the time osmo-trx starts failing reading/writing on OS#3339. That happens about 30 second after starting osmo-trx-lms. My XServer blocks and music playing from a youtube video on the background also either stops or plays in a 1 sec loop. When I recover control of my system, I can see in the logs of osmo-trx the read/write failure from OS#3339.
Through htop one can easily see that upon starting osmo-trx-lms, memory suddenly grows until filling my 16GB, and then is when my system freezes and osmo-trx starts failing, during that time kernel is working heavily to free up memory.
Interestingly, if I strace the osmo-trx-lms I don't see this kind of issue, but it's true too that the CPU consumption drops a lot too. strace only shows heavy use of calls: accept(), poll() and select().
If I ctrl+z (SIGSTOP) the osmo-trx-lms, the kernel stops acquiring memory (and releases most of it). Once I use "fg" to SIGCONTINUE the process, it continues acquiring memory like crazy. Same if I use gdb to do the same kind of operation.
Allocation happens in kernel memory, not process-related memory:
kernel dynamic memory 10.2G 1009.3M 9.2G <-----!!!!!!! OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 26117200 26114592 99% 0.25K 816487 32 6531896K filp <---!!!!!!! 26120640 26118794 99% 0.06K 408135 64 1632540K kmalloc-64 <---!!!!!!!
- Reproducible both on LimeSDR-USB and LimeSDR-mini HW.
- reproducible both on USB2 and USB3.
- Reproducible both on LimeSuite 18.10.* and 19.01.*
- Reproducible both on kernel 4.19.4-arch1-1-ARCH and 5.0.9-arch1-1-ARCH
- Reproducible on 1.0.22-1
- Reproducible both with ASan enabled or disabled.
---------------
I found how to reproduce it or avoid reproducing it on my system:
Add "rt-prio 18" on osmo-trx-lms.cfg -> BUG
remove it -> no memleak.
So somehow changing the process to use realtime priority makes the kernel not free stuff on time. looks like it's not really a memleak, since if you pause the process the memory is freed at some point a few secs later. But still looks like the kernel is not freeing memory quick enough to keep up with the allocation pace.
"rt-prio 18" in osmo-trx-lms.cfg basically means osmo-trx-lms is going to call this during startup:
struct sched_param param; memset(¶m, 0, sizeof(param)); param.sched_priority = 18; rc = sched_setscheduler(getpid(), SCHED_RR, ¶m);
----------------------------
Issue still happening on my Thinkpad x280 laptop with:
LimeSite: 19.04.0.r0.abf82140
kernel: 5.2.2-arch1-1-ARCH x86_64
libusb: 1.0.22-1
A comment was added in related github issue from LimeSuite developer:
If it libusb ralated, then it might be "libusb_handle_events_timeout_completed()" that is being run in a loop (the loop is started when opening device). The timeout is set to 250ms in LimeSuite so it blocks for up to 250ms if there is no USB events. I don't know what it does internally while waiting for events.
Attached a perf record of osmo-trx-lms running with rt-prio and seeing memory growing (around 2GB till stopped iirc).
Samples: 106K of event 'cycles', Event count (approx.): 88212958338 Overhead Command Shared Object Symbol 10.52% osmo-trx-lms [kernel.kallsyms] [k] do_syscall_64 ◆ 5.61% osmo-trx-lms [kernel.kallsyms] [k] syscall_return_via_sysret ▒ 5.48% osmo-trx-lms [kernel.kallsyms] [k] entry_SYSCALL_64 ▒ 5.20% osmo-trx-lms [kernel.kallsyms] [k] preempt_count_add ▒ 4.72% osmo-trx-lms [kernel.kallsyms] [k] _raw_spin_lock ▒ 4.04% osmo-trx-lms [kernel.kallsyms] [k] kmem_cache_alloc ▒ 2.67% osmo-trx-lms [kernel.kallsyms] [k] __alloc_file ▒ 2.24% osmo-trx-lms [kernel.kallsyms] [k] preempt_count_sub ▒ 1.89% osmo-trx-lms [kernel.kallsyms] [k] get_mem_cgroup_from_mm ▒ 1.86% osmo-trx-lms [kernel.kallsyms] [k] kmem_cache_free ▒ 1.74% osmo-trx-lms [kernel.kallsyms] [k] __call_rcu ▒ 1.60% osmo-trx-lms [kernel.kallsyms] [k] _raw_spin_lock_irq ▒ 1.58% osmo-trx-lms [kernel.kallsyms] [k] debug_smp_processor_id ▒ 1.50% osmo-trx-lms libpthread-2.29.so [.] __libc_accept ▒ 1.45% osmo-trx-lms [kernel.kallsyms] [k] truncate_inode_pages_range ▒ 1.41% osmo-trx-lms [kernel.kallsyms] [k] in_lock_functions ▒ 1.39% osmo-trx-lms [kernel.kallsyms] [k] memcg_kmem_get_cache ▒ 1.34% osmo-trx-lms [kernel.kallsyms] [k] __sys_accept4 ▒ 1.31% osmo-trx-lms [kernel.kallsyms] [k] __fput ▒ 1.18% osmo-trx-lms [kernel.kallsyms] [k] task_work_run ▒ 1.06% osmo-trx-lms [kernel.kallsyms] [k] task_work_add ▒ 1.05% osmo-trx-lms [kernel.kallsyms] [k] __fget ▒ 0.98% osmo-trx-lms [kernel.kallsyms] [k] _raw_spin_lock_bh ▒ 0.97% osmo-trx-lms [kernel.kallsyms] [k] rcu_segcblist_enqueue ▒ 0.90% osmo-trx-lms [kernel.kallsyms] [k] inet_csk_accept ▒ 0.88% osmo-trx-lms [kernel.kallsyms] [k] fput_many ▒ 0.87% osmo-trx-lms [kernel.kallsyms] [k] new_slab ▒ 0.81% osmo-trx-lms [kernel.kallsyms] [k] do_softirq.part.0 ▒ 0.78% osmo-trx-lms [kernel.kallsyms] [k] __fsnotify_parent ▒ 0.76% osmo-trx-lms [kernel.kallsyms] [k] exit_to_usermode_loop ▒ 0.75% osmo-trx-lms [kernel.kallsyms] [k] __srcu_read_lock ▒ 0.72% osmo-trx-lms [kernel.kallsyms] [k] syscall_trace_enter ▒ 0.70% osmo-trx-lms [kernel.kallsyms] [k] __rcu_read_unlock ▒ 0.70% osmo-trx-lms [kernel.kallsyms] [k] inet_release ▒ 0.62% osmo-trx-lms [kernel.kallsyms] [k] _raw_spin_unlock ▒ 0.60% osmo-trx-lms [kernel.kallsyms] [k] __fget_light ▒ 0.59% osmo-trx-lms [kernel.kallsyms] [k] __audit_syscall_exit ▒ 0.58% osmo-trx-lms [kernel.kallsyms] [k] memset_erms ▒ 0.56% osmo-trx-lms [kernel.kallsyms] [k] __local_bh_enable_ip ▒ 0.56% osmo-trx-lms libpthread-2.29.so [.] __pthread_enable_asynccancel ▒ 0.56% osmo-trx-lms [kernel.kallsyms] [k] __srcu_read_unlock ▒ 0.55% osmo-trx-lms [kernel.kallsyms] [k] inet_accept ▒ 0.55% osmo-trx-lms [kernel.kallsyms] [k] _raw_spin_trylock ▒ 0.55% osmo-trx-lms [kernel.kallsyms] [k] __rcu_read_lock ▒ 0.55% osmo-trx-lms [kernel.kallsyms] [k] __alloc_fd ▒ 0.54% osmo-trx-lms libpthread-2.29.so [.] __pthread_disable_asynccancel ▒ 0.53% osmo-trx-lms [kernel.kallsyms] [k] sock_alloc_inode ▒ 0.53% osmo-trx-lms [kernel.kallsyms] [k] kmem_cache_alloc_trace ▒ 0.52% osmo-trx-lms [kernel.kallsyms] [k] sock_close ▒ 0.51% osmo-trx-lms [kernel.kallsyms] [k] _atomic_dec_and_lock ▒ 0.50% osmo-trx-lms [kernel.kallsyms] [k] fsnotify ▒ 0.49% osmo-trx-lms [kernel.kallsyms] [k] __this_cpu_preempt_check ▒ 0.48% osmo-trx-lms [kernel.kallsyms] [k] down_write ▒ 0.48% osmo-trx-lms [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe ▒ 0.48% osmo-trx-lms [kernel.kallsyms] [k] __d_instantiate ▒ 0.48% osmo-trx-lms [kernel.kallsyms] [k] up_write ▒ 0.47% osmo-trx-lms [kernel.kallsyms] [k] percpu_counter_add_batch ▒ 0.47% osmo-trx-lms [kernel.kallsyms] [k] memcg_kmem_put_cache ▒ 0.46% osmo-trx-lms [kernel.kallsyms] [k] __x64_sys_accept ▒ 0.45% osmo-trx-lms [kernel.kallsyms] [k] inode_init_always ▒ 0.44% osmo-trx-lms [kernel.kallsyms] [k] __x86_indirect_thunk_rax ▒ 0.44% osmo-trx-lms [kernel.kallsyms] [k] _raw_spin_unlock_irq ▒ 0.44% osmo-trx-lms [kernel.kallsyms] [k] lockref_put_return ▒ 0.43% osmo-trx-lms [kernel.kallsyms] [k] truncate_inode_pages_final ▒ 0.41% osmo-trx-lms [kernel.kallsyms] [k] get_random_u32 ▒ 0.40% osmo-trx-lms [kernel.kallsyms] [k] tcp_release_cb ▒ 0.39% osmo-trx-lms [kernel.kallsyms] [k] __audit_syscall_entry ▒ 0.39% osmo-trx-lms [kernel.kallsyms] [k] map_id_range_down ▒ 0.37% osmo-trx-lms [kernel.kallsyms] [k] __sock_release ▒ 0.37% osmo-trx-lms [kernel.kallsyms] [k] memcpy_erms ▒ 0.34% osmo-trx-lms [kernel.kallsyms] [k] evict ▒ 0.34% osmo-trx-lms [kernel.kallsyms] [k] __d_alloc ▒ 0.31% osmo-trx-lms [kernel.kallsyms] [k] alloc_file ▒ 0.31% osmo-trx-lms [kernel.kallsyms] [k] __dentry_kill ▒ 0.29% osmo-trx-lms [kernel.kallsyms] [k] alloc_file_pseudo ▒ 0.27% osmo-trx-lms [kernel.kallsyms] [k] ksoftirqd_running ▒ 0.25% osmo-trx-lms [kernel.kallsyms] [k] sock_alloc ▒ 0.24% osmo-trx-lms [kernel.kallsyms] [k] locks_remove_file ▒ 0.23% osmo-trx-lms [kernel.kallsyms] [k] dput ▒ 0.23% osmo-trx-lms [kernel.kallsyms] [k] get_next_ino ▒ 0.23% osmo-trx-lms [kernel.kallsyms] [k] put_unused_fd ▒ 0.22% osmo-trx-lms [kernel.kallsyms] [k] release_sock ▒ 0.22% osmo-trx-lms [kernel.kallsyms] [k] __kthread_should_park ▒ 0.22% osmo-trx-lms [kernel.kallsyms] [k] iput ▒ 0.21% osmo-trx-lms [kernel.kallsyms] [k] dentry_unlink_inode ▒ 0.21% osmo-trx-lms [kernel.kallsyms] [k] d_set_d_op ▒ 0.20% osmo-trx-lms [kernel.kallsyms] [k] ____fput ▒ 0.20% osmo-trx-lms [kernel.kallsyms] [k] find_next_zero_bit ▒ 0.20% osmo-trx-lms [kernel.kallsyms] [k] get_page_from_freelist ▒
Files