Project

General

Profile

Bug #1934

osmo-sip-connector log message is too verbose

Added by oramadan 12 months ago. Updated 9 months ago.

Status:
Closed
Priority:
Urgent
Assignee:
Target version:
-
Start date:
02/01/2017
Due date:
% Done:

0%

Resolution:

Description

A never ending stream of error messages is generated if a call is placed and the SIP server is down. It looks like the messages are generated from sofia-sip and I have managed to suppress the messages by setting the environment variable SU_DEBUG < 3: http://sofia-sip.sourcearchive.com/documentation/1.12.7/tport__internal_8h_e540a66790cd6f3122cd3590200b5429.html

These messages should be disabled by default and the call properly terminated.

This is the error stream that filled disks in production:

58): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f25

History

#1 Updated by zecke 12 months ago

  • Assignee changed from sysmocom to zecke

Only thing is.. tport_recv_dgram is not called directly by the osmo-sip-connector but through the sofia-glib integration. Is there an easy way for you to reproduce it? It is something I have seen myself but didn't use the opportunity to analyze back then. :(

#2 Updated by laforge 11 months ago

#3 Updated by laforge 11 months ago

Hi Holger, I'm wondering if you will have time to investigate this in the short term. If not (this is perfectly fine), we need to re-assign the ticket as it is considered Urgent. Thanks.

#4 Updated by zecke 11 months ago

On 11 Feb 2017, at 00:57, laforge [REDMINE] <> wrote:

Issue #1934 has been updated by laforge.

Hi Holger, I'm wondering if you will have time to investigate this in the short term. If not (this is perfectly fine), we need to re-assign the ticket as it is considered Urgent. Thanks.

I intend to look at it but my biggest issue is to reproduce it. My first "task" is to use your python MNCC code to emulate a NITB and send some pseudo SIP messages back. But most likely it is not enough to trigger the issue. We have also never seen this with LCR that is using sofia sip. which leads to..

a.) sofia sip glib is faulty
b.) I use it wrongly
c.) ???

holger

#5 Updated by zecke 11 months ago

sofia-sip seems unmaintained. FreeSWITCH includes a fork in their tree but I couldn't see a relevant patch, linphone seems to be the "best" new upstream and there is a debian ticket to build sofia-sip from this upstream. I sent a patch to debian (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=854848) and linphone to add the \n to the log message.

tport_base_wakeup
 -> tport_recv_event
   -> tport_recv_data
    -> tport_recv_dgram

and tport_recv_dgram does

static int tport_base_wakeup(tport_t *self, int events)
{
...
  if ((events & SU_WAIT_IN) && !self->tp_closed)
    tport_recv_event(self);
...
}
...
int tport_recv_dgram(tport_t *self)
{
  N = (ssize_t)su_getmsgsize(self->tp_socket);
  if (N == -1) {
    int err = su_errno();
    SU_DEBUG_1(("%s(%p): su_getmsgsize(): %s (%d)\n", __func__, (void *)self,
                su_strerror(err), err));
    return -1;
  }
  if (N == 0) {
    su_recv(self->tp_socket, sample, 1, 0);
    SU_DEBUG_3(("tport(%p): zero length packet\n", (void *)self));
    return 0;
  }

#6 Updated by zecke 11 months ago

So based on this reading it seems like that my Osmocom glue or sofia-sip-glib consider the socket as readable but nothing can be read from it. As msgsize returns 0 the fd must still be valid... Maybe we have enabled an OOB event but sofia-sip can not read it?

#7 Updated by zecke 11 months ago

remote:   https://gerrit.osmocom.org/1795 evpoll: Use {} in the middle of the if/else if/else block
remote:   https://gerrit.osmocom.org/1796 evpoll: Always initialize revents for the output
remote:   https://gerrit.osmocom.org/1797 evpoll: Don't try to be more smart than g_poll

I pushed these changes. The first one is cosmetic, the second two could fix issues.. As I am not able to reproduce it right now.. please test.

#8 Updated by zecke 11 months ago

  • Status changed from New to Feedback

Okay. The issue is finally understood.

  • sofia-sip sets IP_RECVERR on the UDP socket to get IMCP (and other) errors to more quickly fail a call
  • By implementing poll using select we never set POLLERR on the pollfd
  • sofia-sip is using recvmsg that fails as the error must be read first (and will never be read because POLLERR is not set in revents).

=> Work around: https://gerrit.osmocom.org/#/c/1985/
=> Implement select using poll: https://gerrit.osmocom.org/#/c/1987/ https://gerrit.osmocom.org/#/c/1988/

Need to decide what to merge (or all of it)

#9 Updated by 9muir 10 months ago

zecke wrote:

Okay. The issue is finally understood.

  • sofia-sip sets IP_RECVERR on the UDP socket to get IMCP (and other) errors to more quickly fail a call
  • By implementing poll using select we never set POLLERR on the pollfd
  • sofia-sip is using recvmsg that fails as the error must be read first (and will never be read because POLLERR is not set in revents).

=> Work around: https://gerrit.osmocom.org/#/c/1985/
=> Implement select using poll: https://gerrit.osmocom.org/#/c/1987/ https://gerrit.osmocom.org/#/c/1988/

Need to decide what to merge (or all of it)

Thanks for getting to the bottom of this. I'll see if I can reproduce the originally bug using my test setup so that we can test the solution(s).

#10 Updated by zecke 10 months ago

On 9 Mar 2017, at 18:34, 9muir [REDMINE] <> wrote:

Issue #1934 has been updated by 9muir.

Need to decide what to merge (or all of it)

Thanks for getting to the bottom of this. I'll see if I can reproduce the originally bug using my test setup so that we can test the solution(s).

It should be enough to put "127.0.0.2" as remote PBX address or any IP that will trigger ICMP unreachable.

#11 Updated by 9muir 10 months ago

zecke wrote:

On 9 Mar 2017, at 18:34, 9muir [REDMINE] <> wrote:

Issue #1934 has been updated by 9muir.

Need to decide what to merge (or all of it)

Thanks for getting to the bottom of this. I'll see if I can reproduce the originally bug using my test setup so that we can test the solution(s).

It should be enough to put "127.0.0.2" as remote PBX address or any IP that will trigger ICMP unreachable.

That sounds about right. From memory I usually encountered this problem in my dev environment when I was floundering around trying to get the initial config created. I assume from your statement that you too are able to easily reproduce the problem now then?

#12 Updated by laforge 10 months ago

  • Project changed from OsmoBSC to osmo-sip-connector

#13 Updated by zecke 10 months ago

The fix itself is merged, the change from select->poll is not merged yet. I think we want to map select to ppoll. It would be nice if the pending branch could be tested too.

#14 Updated by zecke 9 months ago

  • Status changed from Feedback to Closed

Please re-open if you see this again.

Also available in: Atom PDF