Bug #5774
Drop pcu_sock conn if queue length grows up to a given threshold
Start date:
11/18/2022
Due date:
% Done:
90%
Spec Reference:
Description
<pespin_> we seem to have some memleak or mem-kept problem with osmo-bts-trx after a while when the pcu is paused in gdb
<pespin_> my osmo-bts-trx was taking 35% of my 16GB RAM
<LaF0rge> pespin_: well, probably lots of primitives queued for the pcu socket ;)
<pespin_> probably osmo-bts-trx sees the PCU socket still alive (process paused by gdb) and keeps queueing stuff forever in the PCUIF queue
<pespin_> yeah
<fixeria> maybe it's queueing the PCUIF messages somewhere?
<pespin_> we should block the queue from growing like that
<pespin_> keep it at a meaningful maximum size and drop older messages when adding new ones if the threshold is reached
<pespin_> I'll create a ticket to remind us
<fixeria> libosmocore's write_queue allows to set the limit, AFAIR
<pespin_> yeah but it's probably not dropping the old ones
<Hoernchen_> just drop all, no point trying to fix it by keeping "recent" messages, since the whole relationship of messages is fucked up anyway
<fixeria> but it will be rejecting new messages, not old
<pespin_> Hoernchen_, still it may be able to recover that way
<Hoernchen_> how?
<pespin_> just by keeping processing new messages?
<Hoernchen_> yeah,new ones, but why bother with the ones in the queue?
<pespin_> that's why I said "up to a meaningul length"
<Hoernchen_> it just adds complexity keping track without actually affecting the probability of the "catching up" part after missing a lot of messages
<pespin_> no need to keep 1000 of them
<fixeria> the messages are queued because the fd is not writeable, so by processing out-of-queue you'll simply block the process
<pespin_> not much complexity: if (len > 10), llist_del(list_head->next); llist_add_tail(new_msg)
<LaF0rge> i would probably simply close the pcu_socket if we detect long queue lengths
<LaF0rge> let the pcu reconnect if it recovers/restarts
<LaF0rge> so have a osmo_wqueue, and if it overflows, close the fd, done.
<LaF0rge> [and flush the queue of course when closing]
<pespin_> ack makes sense
So let's establish a good threshold (configurable, to allow debugging the PCU without breaking the conn?) at which osmo-bts-trx can decide the PCU is stuck, close the conn, and drop the queued messages until the PCU can reconnect successfully.
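To make the proposal concrete, here is a minimal standalone sketch of the "close the fd and flush on overflow" approach discussed above. It deliberately does not use libosmocore: `struct pcu_msg`, `struct pcu_conn` and the `pcu_conn_*()` functions are hypothetical names invented for illustration, and the actual osmo-bts code and osmo_wqueue API may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical types for illustration only, not libosmocore structures. */
struct pcu_msg {
	struct pcu_msg *next;
	size_t len;
	unsigned char data[64];
};

struct pcu_conn {
	int fd;                 /* -1 when not connected */
	struct pcu_msg *head;   /* oldest queued message */
	struct pcu_msg *tail;   /* newest queued message */
	unsigned int queue_len; /* number of messages currently queued */
	unsigned int max_len;   /* threshold, assumed to come from config */
};

static void pcu_conn_init(struct pcu_conn *c, int fd, unsigned int max_len)
{
	c->fd = fd;
	c->head = c->tail = NULL;
	c->queue_len = 0;
	c->max_len = max_len;
}

/* Free every queued message. */
static void pcu_conn_flush(struct pcu_conn *c)
{
	while (c->head) {
		struct pcu_msg *m = c->head;
		c->head = m->next;
		free(m);
	}
	c->tail = NULL;
	c->queue_len = 0;
}

/* Flush the queue and close the socket; the peer is expected to reconnect. */
static void pcu_conn_close(struct pcu_conn *c)
{
	pcu_conn_flush(c);
	if (c->fd >= 0)
		close(c->fd);
	c->fd = -1;
}

/* Queue a copy of buf. Returns 0 on success, -1 if the message was dropped
 * (not connected, message too long, out of memory, or queue overflow).
 * On overflow the whole connection is closed and the queue is flushed. */
static int pcu_conn_enqueue(struct pcu_conn *c, const void *buf, size_t len)
{
	struct pcu_msg *m;

	assert(c != NULL && buf != NULL);

	if (c->fd < 0 || len > sizeof(m->data))
		return -1;

	if (c->queue_len >= c->max_len) {
		/* Peer looks stuck: give up on this connection entirely. */
		pcu_conn_close(c);
		return -1;
	}

	m = calloc(1, sizeof(*m));
	if (!m)
		return -1;
	memcpy(m->data, buf, len);
	m->len = len;

	if (c->tail)
		c->tail->next = m;
	else
		c->head = m;
	c->tail = m;
	c->queue_len++;
	return 0;
}
```

With this shape, memory use is bounded by `max_len` messages per connection, and messages arriving while disconnected are dropped immediately instead of piling up. Making `max_len` configurable would let someone raise it while debugging a paused PCU, as suggested above.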