Project

General

Profile

Actions

Bug #6299

closed

logging: segmentation fault in _output_buf()

Added by fixeria 5 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
-
Start date:
12/10/2023
Due date:
% Done:

100%

Spec Reference:

Description

ttcn3-bts-test is broken for a few days. The Console Output indicates that something is wrong with the virtphy container:

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/2240/console

Error response from daemon: Cannot kill container: jenkins-ttcn3-bts-test-2240-virtphy: No such container: jenkins-ttcn3-bts-test-2240-virtphy

The artifacts contain a core file:

https://jenkins.osmocom.org/jenkins/view/TTCN3/job/ttcn3-bts-test/2240/artifact/logs/virtphy/

I can reproduce the problem locally, virtphy crashes immediately after being started:

(gdb) r
Starting program: /home/fixeria/projects/osmocom/bb/src/host/virt_phy/src/virtphy 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7d31468 in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7d31468 in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff7d0be16 in snprintf () from /usr/lib/libc.so.6
#2  0x00007ffff7d799c7 in ?? () from /usr/lib/libc.so.6
#3  0x00007ffff7d79b2f in ctime_r () from /usr/lib/libc.so.6
#4  0x00007ffff7f67037 in _output_buf (buf=buf@entry=0x7fffffffcc90 "", buf_len=buf_len@entry=4096, target=target@entry=0x555555588140, subsys=subsys@entry=2, level=level@entry=3, file=file@entry=0x5555555643ce "virtphy.c", 
    line=248, cont=0, format=0x5555555643e0 "Virtual physical layer starting up...\n", ap=0x7fffffffdd00) at ../../../../src/libosmocore/src/core/logging.c:520
#5  0x00007ffff7f675de in _output (target=target@entry=0x555555588140, subsys=subsys@entry=2, level=level@entry=3, file=file@entry=0x5555555643ce "virtphy.c", line=line@entry=248, cont=cont@entry=0, 
    format=0x5555555643e0 "Virtual physical layer starting up...\n", ap=0x7fffffffdd00) at ../../../../src/libosmocore/src/core/logging.c:606
#6  0x00007ffff7f678ee in osmo_vlogp (subsys=<optimized out>, level=3, file=0x5555555643ce "virtphy.c", line=248, cont=0, format=0x5555555643e0 "Virtual physical layer starting up...\n", ap=0x7fffffffdd70)
    at ../../../../src/libosmocore/src/core/logging.c:698
#7  0x00007ffff7f67adf in logp2 (subsys=<optimized out>, level=<optimized out>, file=<optimized out>, line=<optimized out>, cont=<optimized out>, format=<optimized out>) at ../../../../src/libosmocore/src/core/logging.c:734
#8  0x0000555555557bd8 in main (argc=1, argv=0x7fffffffdf78) at virtphy.c:248

Related issues

Related to Core testing infrastructure - Bug #6301: All hopping related tests of ttcn3-bts-test fail with "N/A"Rejectedfixeria12/11/2023

Actions
Actions #1

Updated by fixeria 5 months ago

  • Status changed from New to In Progress
  • Priority changed from Normal to High

It's a regression in libosmocore, reverting this patch makes the problem go away:

commit d71331bc20dd69ed092240c079bf2ac07463675e (origin/neels/log, gerrit/neels/log)
Author: Neels Hofmeyr <nhofmeyr@sysmocom.de>
Date:   Mon Dec 4 07:51:15 2023 +0100

    logging: fix nul octets in log output / use osmo_strbuf

Looks like only specific logging configurations are being affected, the problem also goes away if I:

  • either comment out log_set_print_timestamp(osmo_stderr_target, 1),
  • or comment out log_set_use_color(osmo_stderr_target, 0).

BTW, Coverity CID#336550 "Explicit null dereferenced" is exactly about a problem in _output_buf().

Actions #2

Updated by fixeria 5 months ago

  • Project changed from OsmocomBB to libosmocore
  • Subject changed from virtphy: segmentation fault in ms_log_init() to logging: segmentation fault in _output_buf()
  • Category deleted (virtphy)

Any other application using the logging API would also crash with this logging configuration:

log stderr
 logging filter all 1
 logging timestamp 1
 logging color 1

Moving this ticket to libosmocore.

Actions #3

Updated by fixeria 5 months ago

  • % Done changed from 0 to 80

The problem is that we're indeed passing NULL to ctime_r(), here is a patch:

https://gerrit.osmocom.org/c/libosmocore/+/35309 logging: fix NULL pointer dereference in _output_buf()

I also noticed that OSMO_STRBUF_REMAIN macro is:

  • safe against passing a struct osmo_strbuf with .buf being NULL, but
  • not safe against passing a struct osmo_strbuf with .pos being NULL.

So the following code:

char buf[256];
struct osmo_strbuf sb = { .buf = &buf[0], .len = sizeof(buf) };
printf("Remaining: %zu\n", OSMO_STRBUF_REMAIN(sb));

gives me the following output:

Remaining: 140733300489524

I will also propose a patch while on it.

Actions #4

Updated by fixeria 5 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 80 to 100

The fix has been merged by Neels:

https://cgit.osmocom.org/libosmocore/commit/?id=9c603e64bf553ac282af8534a2d7592f719d092f

fixeria wrote in #note-3:

I also noticed that OSMO_STRBUF_REMAIN macro is:

  • safe against passing a struct osmo_strbuf with .buf being NULL, but
  • not safe against passing a struct osmo_strbuf with .pos being NULL.

https://gerrit.osmocom.org/c/libosmocore/+/35312 utils: fix OSMO_STRBUF_REMAIN to handle sb.pos == NULL correctly
https://gerrit.osmocom.org/c/libosmocore/+/35313 utils: improve readability of OSMO_STRBUF_CHAR_COUNT

I expect ttcn3-bts-test to recover tomorrow, marking as resolved.

Actions #5

Updated by fixeria 5 months ago

  • Related to Bug #6301: All hopping related tests of ttcn3-bts-test fail with "N/A" added
Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)