Bug #5774

open

Drop pcu_sock conn if queue length grows up to a given threshold

Added by pespin 10 days ago.

Status: New
Priority: Normal
Assignee: sysmocom
Category: -
Target version: -
Start date: 11/18/2022
Due date:
% Done: 0%
Spec Reference:

Description

<pespin_> we seem to have some memleak or mem-kept problem with osmo-bts-trx after a while when the pcu is paused in gdb
<pespin_> my osmo-bts-trx was taking 35% of my 16GB RAM
<LaF0rge> pespin_: well, probably lots of primitives queued for the pcu socket ;)
<pespin_> probably osmo-bts-trx sees the PCU socket still alive (process paused by gdb) and keeps queueing stuff forever in the PCUIF queue
<pespin_> yeah
<fixeria> maybe it's queueing the PCUIF messages somewhere?
<pespin_> we should block the queue from growing like that
<pespin_> keep it at a meaningful maximum size and drop older messages when adding new ones if the threshold is reached
<pespin_> I'll create a ticket to remind us
<fixeria> libosmocore's write_queue allows to set the limit, AFAIR
<pespin_> yeah but it's probably not dropping the old ones
<Hoernchen_> just drop all, no point trying to fix it by keeping "recent" messages, since the whole relationship of messages is fucked up anyway
<fixeria> but it will be rejecting new messages, not old
<pespin_> Hoernchen_, still it may be able to recover that way
<Hoernchen_> how?
<pespin_> just by keeping processing new messages?
<Hoernchen_> yeah, new ones, but why bother with the ones in the queue?
<pespin_> that's why I said "up to a meaningful length"
<Hoernchen_> it just adds complexity keeping track without actually affecting the probability of the "catching up" part after missing a lot of messages
<pespin_> no need to keep 1000 of them
<fixeria> the messages are queued because the fd is not writeable, so by processing out-of-queue you'll simply block the process
<pespin_> not much complexity: if (len > 10), llist_del(list_head->next); llist_add_tail(new_msg)
<LaF0rge> i would probably simply close the pcu_socket if we detect long queue lengths
<LaF0rge> let the pcu reconnect if it recovers/restarts
<LaF0rge> so have a osmo_wqueue, and if it overflows, close the fd, done.
<LaF0rge> [and flush the queue of course when closing]
<pespin_> ack makes sense

So let's establish a good threshold (configurable, so that the PCU can still be debugged without breaking the connection?) at which osmo-bts-trx decides the PCU is stuck, closes the connection, and drops the queued messages until the PCU can successfully reconnect.
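
A minimal sketch of the behaviour agreed on above: a bounded queue that, on overflow, closes the connection and flushes everything instead of dropping individual messages. It is not the osmo-bts or libosmocore code. To stay self-contained it uses a plain singly linked list rather than osmo_wqueue, and every name in it (struct pcu_conn, pcu_conn_enqueue(), max_queue_len, ...) is hypothetical. Real code would also close and unregister the socket fd where the comments say so.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not the real PCUIF types. */
struct pcu_msg {
	struct pcu_msg *next;
	int id;
};

struct pcu_conn {
	bool connected;        /* true while the PCU socket is considered usable */
	size_t max_queue_len;  /* configurable threshold */
	size_t queue_len;      /* number of messages currently queued */
	struct pcu_msg *head;  /* oldest queued message */
	struct pcu_msg *tail;  /* newest queued message */
	unsigned long dropped; /* messages discarded so far */
};

static void pcu_conn_init(struct pcu_conn *c, size_t max_queue_len)
{
	assert(max_queue_len > 0);
	c->connected = true;
	c->max_queue_len = max_queue_len;
	c->queue_len = 0;
	c->head = c->tail = NULL;
	c->dropped = 0;
}

static struct pcu_msg *pcu_msg_alloc(int id)
{
	struct pcu_msg *m = malloc(sizeof(*m));
	assert(m != NULL);
	m->next = NULL;
	m->id = id;
	return m;
}

/* Free every queued message and count it as dropped. */
static void pcu_conn_flush(struct pcu_conn *c)
{
	while (c->head) {
		struct pcu_msg *m = c->head;
		c->head = m->next;
		free(m);
		c->dropped++;
	}
	c->tail = NULL;
	c->queue_len = 0;
}

/* Treat the peer as stuck: flush the queue and mark the connection closed.
 * Real code would close() and unregister the fd here. */
static void pcu_conn_close(struct pcu_conn *c)
{
	pcu_conn_flush(c);
	c->connected = false;
}

/* Called when the PCU connects again (hypothetical hook). */
static void pcu_conn_reconnect(struct pcu_conn *c)
{
	c->connected = true;
}

/* Takes ownership of msg. Returns 0 if queued, -1 if the message was dropped. */
static int pcu_conn_enqueue(struct pcu_conn *c, struct pcu_msg *msg)
{
	if (c->connected && c->queue_len >= c->max_queue_len)
		pcu_conn_close(c); /* threshold reached: give up on this peer */

	if (!c->connected) {
		free(msg);
		c->dropped++;
		return -1;
	}

	msg->next = NULL;
	if (c->tail)
		c->tail->next = msg;
	else
		c->head = msg;
	c->tail = msg;
	c->queue_len++;
	return 0;
}
```

With max_queue_len set to 3, the fourth enqueue finds the queue full, closes the connection, and discards all four messages. Further messages are dropped until pcu_conn_reconnect() is called. While debugging a PCU paused in gdb, the threshold could be raised so that the connection survives the pause, at the cost of more queued memory.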
