recent failures of HLR_Tests.ttcn for both master and latest
I'd like to ask you to look into sudden instability of the HLR_Tests.ttcn
test suite during the last five builds. Of course it can be a pure coincidence,
but it looks like nothing else was changed in the HLR (or its tests) beyond
the changes you made related to "subscriber create on demand".
In the last 5 builds (since 491), we're seeing failures related to "timeout waiting
for VTY prompt" or "unexpected VTY response", see
Interestingly, the "latest" tests also start to fail around the same time,
hinting that it's not the HLR that has a regression, but some changes in the tests?
Thanks for looking into this.
- % Done changed from 0 to 10
I looked through the check IMEI test / create subscriber on demand test patches again, and did not notice anything that could cause other tests to fail. The new tests get executed after the existing tests. No existing code was modified, with the exception of Check IMEI related IEs.
That being said, it is strange that the TTCN3 HLR tests passed at least 25 times in a row, and then, shortly after the new tests were added, random tests started failing. So maybe it is related and I'm overlooking something.
Here's an overview of what failed:
## master
491: TC_gsup_check_imei_invalid_len => (expected failure, because related fix was not merged yet)
492: TC_mo_sss_reject => g_Tguard timeout
493: -
494: TC_gsup_purge_cs => VTY timeout for prompt
495: TC_gsup_ul => VTY timeout for prompt
496: -

## latest
241: TC_gsup_purge_ps => VTY timeout for prompt
242: -
243: -
244: TC_gsup_purge_ps, TC_gsup_ul => VTY timeout for prompt (both)
245: -
I was not able to reproduce any of the failures locally: whenever I run the tests, all of them pass (with and without Docker). I then took an in-depth look at two recent ones, TC_gsup_ul and TC_gsup_purge_cs. Both ran into a 2s VTY timeout after sending these commands:
subscriber imsi 262420176541756 create
subscriber imsi 262428655547458 update msisdn 491613408534
There was nothing else in the TTCN3 logs that hinted at why this was failing. I suspect that OsmoHLR fails to notify the connected client ("Osmocom TTCN-3 GSUP Simulator") in time in hlr.c:osmo_hlr_subscriber_update_notify(). Unfortunately, we don't have the osmo-hlr logs from the failed test runs, so I can't be sure.
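To chase the timing outside of the test suite, something like the following rough sketch could be used to measure how long the VTY takes to return a prompt after each command. It is not part of the test suite or of osmo-hlr. The helper names (wait_for_prompt, vty_command, slowest_command) are made up for illustration, and the prompt endings ("> " and "# ") as well as the host and port passed in by the caller are assumptions about the local setup. Matching the prompt by suffix is only a heuristic and may trigger early if a response happens to end in the same characters.

```python
import socket
import time

# Assumed VTY prompt endings (view mode / enable mode); adjust for your setup.
PROMPTS = (b"> ", b"# ")


def wait_for_prompt(sock, timeout=2.0):
    """Read until the data ends in a prompt; return (data, seconds waited)."""
    start = time.monotonic()
    buf = b""
    while not buf.endswith(PROMPTS):
        remaining = timeout - (time.monotonic() - start)
        if remaining <= 0:
            raise TimeoutError(f"no VTY prompt within {timeout}s")
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            raise TimeoutError(f"no VTY prompt within {timeout}s") from None
        if not chunk:
            raise ConnectionError("VTY connection closed")
        buf += chunk
    return buf, time.monotonic() - start


def vty_command(sock, command, timeout=2.0):
    """Send one command and wait for the next prompt (same 2s default as the tests)."""
    sock.sendall(command.encode() + b"\r\n")
    return wait_for_prompt(sock, timeout)


def slowest_command(host, port, commands, rounds=100):
    """Repeat the given commands and return (seconds, command) of the slowest one."""
    worst = (0.0, None)
    with socket.create_connection((host, port)) as sock:
        wait_for_prompt(sock)        # initial banner and prompt
        vty_command(sock, "enable")  # subscriber commands need enable mode
        for _ in range(rounds):
            for cmd in commands:
                _, elapsed = vty_command(sock, cmd)
                if elapsed > worst[0]:
                    worst = (elapsed, cmd)
    return worst
```

Calling slowest_command() with the address of the osmo-hlr VTY and the two commands above would show whether the prompt ever gets close to the 2s limit on a given machine. Repeating the "create" command for the same IMSI will be rejected after the first round, but the VTY should still return a prompt, which is all that is measured here.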
Here is a patch to add logging, in case this happens again (in the last Jenkins run, all tests passed):