Bug #2325

sporadic crash of osmo-bts-trx in osmo-gsm-tester runs

Added by neels 14 days ago. Updated 3 days ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
osmo-bts-trx
Target version:
-
Start date:
06/13/2017
Due date:
% Done:

0%

Spec Reference:

Description

The osmo-bts-trx process sometimes dies prematurely during osmo-gsm-tester runs using the Ettus B210.
The cause is not clear yet; no core file seems to be created.
For example, see http://jenkins.osmocom.org/jenkins/view/osmo-gsm-tester/job/osmo-gsm-tester_run/591/


Related issues

Related to OsmoGSMTester - Feature #2327: document NTP as cause for failing osmo-bts-trx runs in osmo-gsm-manuals? Rejected 06/14/2017
Related to OsmoBTS - Bug #2339: osmo-bts-trx uses non-monotonic system clock for frame number timer In Progress 06/24/2017

History

#1 Updated by neels 13 days ago

This happens quite frequently, apparently in more than one out of ten runs.

#2 Updated by neels 13 days ago

Instances of this are seen in the above-mentioned run 591, and again in runs 595, 597 and 599.
Each run has two trx runs, so, roughly: out of 20 runs, four hit the "Process ended prematurely" for osmo-bts-trx.

#3 Updated by neels 13 days ago

  • Related to Bug #2321: osmo-gsm-tester: store properly coredump files when a process crashes added

#4 Updated by neels 13 days ago

  • Related to deleted (Bug #2321: osmo-gsm-tester: store properly coredump files when a process crashes)

#5 Updated by neels 13 days ago

The sporadic "crashes" are actually intentional shutdowns by osmo-bts-trx:

20170614021016906 DOML <0001> bts.c:208 Shutting down BTS 0, Reason No clock from osmo-trx

The end of the log shows "Shutdown timer expired", which has a three second expiry time.
About 30 logging lines above that, the shutdown reason is logged.

The "No clock" is logged about two seconds after the OML has successfully set up the BTS.
Immediately following that, the OML is torn down again, concluding in

Shutdown timer expired

I am not sure how to fix this; I will have to ask the osmo-trx guys.

#6 Updated by neels 13 days ago

  • Status changed from New to In Progress
  • Assignee set to neels

https://gerrit.osmocom.org/2909 <-- sets l1c logging level to notice

results:

20170614032014399 DRSL <0000> rsl.c:2333 (bts=0,trx=0,ts=0,ss=2) Fwd RLL msg EST_IND from LAPDm to A-bis
20170614032018533 DL1C <0006> scheduler_trx.c:1451 PC clock skew: elapsed uS 4136730
20170614032018533 DOML <0001> bts.c:208 Shutting down BTS 0, Reason No clock from osmo-trx
20170614032018533 DL1C <0006> scheduler.c:240 Exit scheduler for trx=0
20170614032018533 DL1C <0006> scheduler.c:216 Init scheduler for trx=0
20170614032018533 DOML <0001> oml.c:280 OC=RADIO-CARRIER INST=(00,00,ff) AVAIL STATE OK -> Off line
[...]
Shutdown timer expired

I asked on openbsc@: http://lists.osmocom.org/pipermail/openbsc/2017-June/010802.html
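
As a cross-check of the log excerpt above, the gap between the last RSL line and the "PC clock skew" line can be computed from the log timestamps. This is only a sketch: it assumes the timestamp prefix has the form YYYYMMDDhhmmssmmm and that both stamps fall on the same calendar day, and the helper names are hypothetical, not part of osmo-bts.

```c
#include <stdio.h>

/* Parse a log timestamp assumed to have the form YYYYMMDDhhmmssmmm and
 * return the milliseconds since midnight, or -1 if it cannot be parsed.
 * Hypothetical helper for illustration only. */
static long ms_since_midnight(const char *stamp)
{
	int hh, mm, ss, ms;

	/* Skip the 8 date digits, then read hh, mm, ss and milliseconds. */
	if (sscanf(stamp, "%*8d%2d%2d%2d%3d", &hh, &mm, &ss, &ms) != 4)
		return -1;
	return ((hh * 60L + mm) * 60L + ss) * 1000L + ms;
}

/* Difference between two stamps in milliseconds, assuming both are
 * well-formed and from the same calendar day. */
static long stamp_delta_ms(const char *earlier, const char *later)
{
	return ms_since_midnight(later) - ms_since_midnight(earlier);
}
```

Calling stamp_delta_ms("20170614032014399", "20170614032018533") yields 4134 ms, which is in line with the 4136730 µs reported by the skew message.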

#7 Updated by neels 13 days ago

  • Related to Feature #2327: document NTP as cause for failing osmo-bts-trx runs in osmo-gsm-manuals? added

#8 Updated by neels 13 days ago

Vadim indicated NTP as possible cause for clock skews. I switched off the ntp service on the main unit now. It will come back upon reboot, so we may need to switch it off again. If this proves to be the cause of failures, we can uninstall ntp completely or something.

#9 Updated by laforge 3 days ago

  • Category set to osmo-bts-trx

FYI:
  • osmo-bts uses gettimeofday() to determine how much system time has elapsed between two FN indications from OsmoTRX
  • the maximum permitted value is 50 TDMA frames of about 4.615 ms each, i.e. 230750 µs or 230.75 ms
  • the actual skew as reported here is about 4.14 seconds (4136730 µs in the log above)

I assume that the -r or --realtime argument is used when starting osmo-bts-trx, at least http://jenkins.osmocom.org/jenkins/view/osmo-gsm-tester/job/osmo-gsm-tester_run/591/ indicates so.
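
The check described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the actual osmo-bts code: the constant and function names are assumptions, and treating a negative elapsed time as a skew is a choice of this sketch. The last helper reads CLOCK_MONOTONIC, which is not affected by wall-clock adjustments such as NTP steps, in contrast to gettimeofday().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Assumed names for illustration only. */
#define FRAME_DURATION_US 4615	/* one TDMA frame, in microseconds */
#define MAX_FN_SKEW       50	/* permitted skew, in TDMA frames */
#define MAX_SKEW_US       (MAX_FN_SKEW * FRAME_DURATION_US)	/* 230750 */

/* Microseconds elapsed between two gettimeofday() readings. */
static int64_t elapsed_us(const struct timeval *prev, const struct timeval *now)
{
	return (int64_t)(now->tv_sec - prev->tv_sec) * 1000000
		+ (now->tv_usec - prev->tv_usec);
}

/* True if the elapsed time is outside the permitted window. A negative
 * value means the wall clock jumped backwards between the two readings. */
static bool clock_skew_exceeded(int64_t elapsed)
{
	return elapsed < 0 || elapsed > MAX_SKEW_US;
}

/* Current time in microseconds from a clock that never jumps backwards.
 * Returns -1 if the clock cannot be read. */
static int64_t monotonic_now_us(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return -1;
	return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}
```

With these numbers, the 4136730 µs from the log is far above the 230750 µs limit, so clock_skew_exceeded() returns true for it.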

#10 Updated by laforge 3 days ago

  • Related to Bug #2339: osmo-bts-trx uses non-monotonic system clock for frame number timer added

#11 Updated by laforge 3 days ago
