Bug #1934
closed
osmo-sip-connector log message is too verbose
Added by oramadan about 7 years ago.
Updated about 7 years ago.
Description
A never ending stream of error messages is generated if a call is placed and the SIP server is down. It looks like the messages are generated from sofia-sip and I have managed to suppress the messages by setting the environment variable SU_DEBUG < 3: http://sofia-sip.sourcearchive.com/documentation/1.12.7/tport__internal_8h_e540a66790cd6f3122cd3590200b5429.html
These messages should be disabled by default and the call properly terminated.
This is the error stream that filled disks in production:
58): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f258): zero length packettport(0xb7e9f25
- Assignee changed from 4368 to zecke
Only thing is.. tport_recv_dgram is not called directly by the osmo-sip-connector but through the sofia-glib integration. Is there an easy way for you to reproduce it? It is something I have seen myself but didn't use the opportunity to analyze back then. :(
Hi Holger, I'm wondering if you will have time to investigate this in the short term. If not (this is perfectly fine), we need to re-assign the ticket as it is considered Urgent. Thanks.
On 11 Feb 2017, at 00:57, laforge [REDMINE] <redmine@lists.osmocom.org> wrote:
Issue #1934 has been updated by laforge.
Hi Holger, I'm wondering if you will have time to investigate this in the short term. If not (this is perfectly fine), we need to re-assign the ticket as it is considered Urgent. Thanks.
I intend to look at it but my biggest issue is to reproduce it. My first "task" is to use your python MNCC code to emulate a NITB and send some pseudo SIP messages back. But most likely it is not enough to trigger the issue. We have also never seen this with LCR that is using sofia sip. which leads to..
a.) sofia sip glib is faulty
b.) I use it wrongly
c.) ???
holger
sofia-sip seems unmaintained. FreeSWITCH includes a fork in their tree but I couldn't see a relevant patch, linphone seems to be the "best" new upstream and there is a debian ticket to build sofia-sip from this upstream. I sent a patch to debian (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=854848) and linphone to add the \n to the log message.
tport_base_wakeup
-> tport_recv_event
-> tport_recv_data
-> tport_recv_dgram
and tport_recv_dgram does
static int tport_base_wakeup(tport_t *self, int events)
{
...
if ((events & SU_WAIT_IN) && !self->tp_closed)
tport_recv_event(self);
...
}
...
int tport_recv_dgram(tport_t *self)
{
N = (ssize_t)su_getmsgsize(self->tp_socket);
if (N == -1) {
int err = su_errno();
SU_DEBUG_1(("%s(%p): su_getmsgsize(): %s (%d)\n", __func__, (void *)self,
su_strerror(err), err));
return -1;
}
if (N == 0) {
su_recv(self->tp_socket, sample, 1, 0);
SU_DEBUG_3(("tport(%p): zero length packet\n", (void *)self));
return 0;
}
So based on this reading it seems like that my Osmocom glue or sofia-sip-glib consider the socket as readable but nothing can be read from it. As msgsize returns 0 the fd must still be valid... Maybe we have enabled an OOB event but sofia-sip can not read it?
remote: https://gerrit.osmocom.org/1795 evpoll: Use {} in the middle of the if/else if/else block
remote: https://gerrit.osmocom.org/1796 evpoll: Always initialize revents for the output
remote: https://gerrit.osmocom.org/1797 evpoll: Don't try to be more smart than g_poll
I pushed these changes. The first one is cosmetic, the second two could fix issues.. As I am not able to reproduce it right now.. please test.
- Status changed from New to Feedback
Okay. The issue is finally understood.
- sofia-sip sets IP_RECVERR on the UDP socket to get IMCP (and other) errors to more quickly fail a call
- By implementing poll using select we never set POLLERR on the pollfd
- sofia-sip is using recvmsg that fails as the error must be read first (and will never be read because POLLERR is not set in revents).
=> Work around: https://gerrit.osmocom.org/#/c/1985/
=> Implement select using poll: https://gerrit.osmocom.org/#/c/1987/ https://gerrit.osmocom.org/#/c/1988/
Need to decide what to merge (or all of it)
zecke wrote:
Okay. The issue is finally understood.
- sofia-sip sets IP_RECVERR on the UDP socket to get IMCP (and other) errors to more quickly fail a call
- By implementing poll using select we never set POLLERR on the pollfd
- sofia-sip is using recvmsg that fails as the error must be read first (and will never be read because POLLERR is not set in revents).
=> Work around: https://gerrit.osmocom.org/#/c/1985/
=> Implement select using poll: https://gerrit.osmocom.org/#/c/1987/ https://gerrit.osmocom.org/#/c/1988/
Need to decide what to merge (or all of it)
Thanks for getting to the bottom of this. I'll see if I can reproduce the originally bug using my test setup so that we can test the solution(s).
On 9 Mar 2017, at 18:34, 9muir [REDMINE] <redmine@lists.osmocom.org> wrote:
Issue #1934 has been updated by 9muir.
Need to decide what to merge (or all of it)
Thanks for getting to the bottom of this. I'll see if I can reproduce the originally bug using my test setup so that we can test the solution(s).
It should be enough to put "127.0.0.2" as remote PBX address or any IP that will trigger ICMP unreachable.
zecke wrote:
On 9 Mar 2017, at 18:34, 9muir [REDMINE] <redmine@lists.osmocom.org> wrote:
Issue #1934 has been updated by 9muir.
Need to decide what to merge (or all of it)
Thanks for getting to the bottom of this. I'll see if I can reproduce the originally bug using my test setup so that we can test the solution(s).
It should be enough to put "127.0.0.2" as remote PBX address or any IP that will trigger ICMP unreachable.
That sounds about right. From memory I usually encountered this problem in my dev environment when I was floundering around trying to get the initial config created. I assume from your statement that you too are able to easily reproduce the problem now then?
- Project changed from OsmoBSC to osmo-sip-connector
The fix itself is merged, the change from select->poll is not merged yet. I think we want to map select to ppoll. It would be nice if the pending branch could be tested too.
- Status changed from Feedback to Closed
Please re-open if you see this again.
Also available in: Atom
PDF