Bug #3764
closedlibosmocore 1.0.x makes gbproxy_test of older osmo-sgsn fail
0%
Description
When building osmo-sgsn 1.3.0 against libosmocore-1.0.x I get issues like
https://build.opensuse.org/package/live_build_log/network:osmocom:latest/osmo-sgsn/Debian_9.0/x86_64
where "make check" suddenly fails with things like
[ 80s] -==> got signal NS_UNBLOCK, NS-VC 0x2001/1.2.3.4:1111 [ 80s] - [ 80s] MESSAGE to BSS at 0x01020304:1111, msg length 1 [ 80s] 07 [ 80s] [ 80s] +==> got signal NS_UNBLOCK, NS-VC 0x2001/1.2.3.4:1111 [ 80s] + [ 80s] result (UNBLOCK) = 1
as the diff. So clearly some re-ordering happening here.
This appears when building the "latest" osmo-sgsn version (1.3.0) against a current libosmocore. As libosmocore remains API compatible, this is building fine, as expected.
Updated by laforge over 5 years ago
However, ther is commit·
commit 797558ea1768e464f9559c5f7a4f3f4285c5de25 Author: Stefan Sperling <ssperling@sysmocom.de> Date: Mon Nov 19 17:18:41 2018 +0100 send NS_POUT_UNBLOCK_ACK before signalling S_NS_UNBLOCK In gprs_ns_process_msg(), we were dispatching the S_NS_UNBLOCK signal before sending out the NS_POUT_UNBLOCK_ACK message. Signal handlers might send messages to the other side, assuming that NS is now unblocked. However, since such messages will arrive before the UNBLOCK_ACK message the receiver might discard them. This problem has been observed with our TTCN3 BSSGP_Emulation as a peer to osmo-pcu. This patch makes TTCN3 PCU TC_paging() test pass regardless of whether the test or osmo-pcu is started first. Before this patch, this test would only pass if the test was started before osmo-pcu. A remaining problem is that the test does not yet keep passing reliably unless osmo-pcu is restarted between test runs. Change-Id: I3af54a14bb6bcfa167c9a9d9f67835e7f5b9f1bb Related: OS#2890 Related: OS#2388
which is a fix, and is appreciated. However, it causes breakage, at least in unit tests that rely on testing the old behavior, such as the gbproxy test.
This immediately showed up, and one day later we have the following commit
commit eefb70df2c6aa7b1362d5e8decf52af3ff95ecb9 Author: Stefan Sperling <ssperling@sysmocom.de> Date: Tue Nov 20 11:33:17 2018 +0100 update gbproxy test expected output libosmocore commit 797558ea1768e464f9559c5f7a4f3f4285c5de25 changed the order of NS_UNBLOCK_ACK transmission dispatching of the NS_UNBLOCK signal. Update expected output of gbproxy tests accordingly to make these tests pass again. Change-Id: Ia3df811755b1c88cf7a27a466677b24a6c32fd8e Related: OS#2388
which is making sure the test passes from now onwards, but which of course doesn't do anything about older, already-released versions of osmo-sgsn, which will all fail to pass "make check" - forever. There may be people who want to rely on older versions of osmo-sgsn in the future, and they will think they have non-working software if "make check" doesn't pass :/
So what do we take from this? Be careful with changing existing library code semantics. Even if the function signature is not modified, there are existing applications and/or test cases that make assumptions. And we should not break those assumptions.
If something like this shows up ever again, I think the correct approach is to
- keep the old semantics as it is, making sure that old applications/tests will continue to work/pass, even on modern library versions or master of the day
- introduce a new function/symbol implementing the new semantics. Or, in this case (as it's some library-internal function dispatching events/signals), probably some kind of flag is needed. New code, depending on new library versions, may then set that flag and elicit the new behavior, while old code gets the old behavior.
Please let's make sure we don't fall into the same trap in the future again.
The process of avoiding this can be helped by having automatic build testing of old/previous applications against "master of the day" of the libraries. Let's track the latter in #3765.
Updated by laforge over 5 years ago
- Status changed from In Progress to Closed
we have to close this ticket without being able to fix it. Let's focus on #3765 to prevent this from ever happening again!