Project

General

Profile

Actions

Bug #3764

closed

libosmocore 1.0.x makes gbproxy_test of older osmo-sgsn fail

Added by laforge about 5 years ago. Updated about 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
libosmogb
Target version:
-
Start date:
01/22/2019
Due date:
% Done:

0%

Spec Reference:

Description

When building osmo-sgsn 1.3.0 against libosmocore-1.0.x I get issues like
https://build.opensuse.org/package/live_build_log/network:osmocom:latest/osmo-sgsn/Debian_9.0/x86_64

where "make check" suddenly fails with things like

[   80s] -==> got signal NS_UNBLOCK, NS-VC 0x2001/1.2.3.4:1111
[   80s] -
[   80s]  MESSAGE to BSS at 0x01020304:1111, msg length 1
[   80s]  07
[   80s]
[   80s] +==> got signal NS_UNBLOCK, NS-VC 0x2001/1.2.3.4:1111
[   80s] +
[   80s]  result (UNBLOCK) = 1

as the diff. So clearly some re-ordering happening here.

This appears when building the "latest" osmo-sgsn version (1.3.0) against a current libosmocore. As libosmocore remains API compatible, this is building fine, as expected.

Actions #1

Updated by laforge about 5 years ago

However, ther is commit·

commit 797558ea1768e464f9559c5f7a4f3f4285c5de25
Author: Stefan Sperling <ssperling@sysmocom.de>
Date:   Mon Nov 19 17:18:41 2018 +0100

    send NS_POUT_UNBLOCK_ACK before signalling S_NS_UNBLOCK

    In gprs_ns_process_msg(), we were dispatching the S_NS_UNBLOCK
    signal before sending out the NS_POUT_UNBLOCK_ACK message.

    Signal handlers might send messages to the other side, assuming
    that NS is now unblocked. However, since such messages will arrive
    before the UNBLOCK_ACK message the receiver might discard them.
    This problem has been observed with our TTCN3 BSSGP_Emulation
    as a peer to osmo-pcu.

    This patch makes TTCN3 PCU TC_paging() test pass regardless of
    whether the test or osmo-pcu is started first. Before this patch,
    this test would only pass if the test was started before osmo-pcu.
    A remaining problem is that the test does not yet keep passing
    reliably unless osmo-pcu is restarted between test runs.

    Change-Id: I3af54a14bb6bcfa167c9a9d9f67835e7f5b9f1bb
    Related: OS#2890
    Related: OS#2388

which is a fix, and is appreciated. However, it causes breakage, at least in unit tests that rely on testing the old behavior, such as the gbproxy test.

This immediately showed up, and one day later we have the following commit

commit eefb70df2c6aa7b1362d5e8decf52af3ff95ecb9
Author: Stefan Sperling <ssperling@sysmocom.de>
Date:   Tue Nov 20 11:33:17 2018 +0100

    update gbproxy test expected output

    libosmocore commit 797558ea1768e464f9559c5f7a4f3f4285c5de25
    changed the order of NS_UNBLOCK_ACK transmission dispatching
    of the NS_UNBLOCK signal. Update expected output of gbproxy
    tests accordingly to make these tests pass again.

    Change-Id: Ia3df811755b1c88cf7a27a466677b24a6c32fd8e
    Related: OS#2388

which is making sure the test passes from now onwards, but which of course doesn't do anything about older, already-released versions of osmo-sgsn, which will all fail to pass "make check" - forever. There may be people who want to rely on older versions of osmo-sgsn in the future, and they will think they have non-working software if "make check" doesn't pass :/

So what do we take from this? Be careful with changing existing library code semantics. Even if the function signature is not modified, there are existing applications and/or test cases that make assumptions. And we should not break those assumptions.

If something like this shows up ever again, I think the correct approach is to

  1. keep the old semantics as it is, making sure that old applications/tests will continue to work/pass, even on modern library versions or master of the day
  2. introduce a new function/symbol implementing the new semantics. Or, in this case (as it's some library-internal function dispatching events/signals), probably some kind of flag is needed. New code, depending on new library versions, may then set that flag and elicit the new behavior, while old code gets the old behavior.

Please let's make sure we don't fall into the same trap in the future again.

The process of avoiding this can be helped by having automatic build testing of old/previous applications against "master of the day" of the libraries. Let's track the latter in #3765.

Actions #2

Updated by laforge about 5 years ago

  • Status changed from In Progress to Closed

we have to close this ticket without being able to fix it. Let's focus on #3765 to prevent this from ever happening again!

Actions

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 48.8 MB)