Bug #4379
closed
ttcn3-ggsn-test(-latest) sporadic all-test failures
100%
Description
See for instance ttcn3-ggsn-test-latest run 467 or ttcn3-ggsn-test 928.
All tests fail with:
Verdict: fail reason: VTY Timeout for prompt
That's because osmo-ggsn is not running: the osmo-ggsn-{master,latest} docker container never starts it (no ggsn.log is created), because the container fails during setup:
+ docker run --cap-add=NET_ADMIN --device /dev/net/tun:/dev/net/tun --sysctl net.ipv6.conf.all.disable_ipv6=0 --rm --network ttcn3-ggsn-test --ip 172.18.3.201 -v /home/osmocom-build/jenkins/workspace/ttcn3-ggsn-test-latest/logs/ggsn:/data --name jenkins-ttcn3-ggsn-test-latest-467-ggsn -d osmocom-build/osmo-ggsn-latest /bin/sh -c osmo-ggsn -c /data/osmo-ggsn.cfg >/data/osmo-ggsn.log 2>&1
4c6702e4fed587d1c44687f7cb247d872ed4e0f5e0b37cf151b011d74367f482
docker: Error response from daemon: linux runtime spec devices: error gathering device information while adding custom device "/dev/net/tun": no such file or directory.
The error line doesn't appear when tests run fine, so the docker container is really failing to start.
I could find this same error here: https://drone.spritsail.io/spritsail/iodine/4/1/5
The error line seems to suggest that /dev/net/tun doesn't exist on the host running docker, which is a bit strange because sometimes the host does have it (tests pass) and sometimes it doesn't.
The issue seems to be that when tests succeed, they are run on:
Building remotely on build2-deb9build-ansible (ttcn3 obs osmo-gsm-tester-build osmocom-gerrit-debian9 osmocom-master-debian9 coverity) in workspace /home/osmocom-build/jenkins/workspace/ttcn3-ggsn-test-latest
But when they fail, they are run on:
Building remotely on admin2-deb9build (ttcn3 obs osmo-gsm-tester-build osmocom-gerrit-debian9 osmocom-master-debian9 coverity) in workspace /home/osmocom-build/jenkins/workspace/ttcn3-ggsn-test-latest
So "build2-deb9build-ansible" vs "admin2-deb9build". I'd assume admin2-deb9build is missing the tun kernel module (either not installed or not loaded with modprobe).
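The failure described above (docker run aborting because the host has no /dev/net/tun) could be caught before any container is started, so the job fails with a clear message instead of every test reporting a VTY timeout. A minimal sketch, assuming a hypothetical helper named check_tun_device; it is not part of the actual jenkins job scripts:

```shell
#!/bin/sh
# Hypothetical pre-flight helper (name and usage are assumptions,
# not taken from the real job scripts).
# Returns 0 if the given path exists and is a character device.
check_tun_device() {
    dev="${1:-/dev/net/tun}"
    if [ -c "$dev" ]; then
        return 0
    fi
    echo "ERROR: $dev is missing or not a character device on $(uname -n)" >&2
    return 1
}

# Possible usage before "docker run --device /dev/net/tun:/dev/net/tun ...":
# check_tun_device /dev/net/tun || exit 1
```

Printing the node name in the error message would have pointed directly at admin2-deb9build as the failing build slave.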
Updated by pespin about 4 years ago
The host does have the tun module loaded, but /dev/net doesn't exist...
root@admin2-deb9build:~# lsmod | grep tun
tun 28672 2
root@admin2-deb9build:~# ls /dev/net
ls: cannot access '/dev/net': No such file or directory
Updated by pespin about 4 years ago
- Assignee changed from pespin to roh
I tried removing the module and loading it again, but it looks like a kernel upgrade took place or new kernel modules are not available?
# modprobe -r tun
modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.9.0-9-amd64/modules.dep.bin'
root@admin2-deb9build:/dev# uname -a
Linux admin2-deb9build 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u2 (2019-05-13) x86_64 GNU/Linux
root@admin2-deb9build:/dev# ls /lib/modules/4.9.0-
4.9.0-5-amd64/ 4.9.0-8-amd64/
Assigning the ticket to roh since he usually takes care of upgrading the systems and similar stuff.
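The modprobe error above comes down to /lib/modules having no directory for the running kernel release (4.9.0-9-amd64 is running, only 4.9.0-5 and 4.9.0-8 are installed). A small sketch of that comparison, assuming a hypothetical helper named has_kernel_modules; the second argument exists only so the function can be pointed at a different modules root:

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration).
# Returns 0 if a modules directory exists for the given kernel release.
# Defaults: the running kernel (uname -r) and /lib/modules.
has_kernel_modules() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    [ -d "$root/$release" ]
}

# Possible usage:
# has_kernel_modules || echo "no modules installed for $(uname -r)" >&2
```

A mismatch here does not by itself mean the system is broken; it only shows that modprobe cannot work from inside this environment.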
Updated by roh about 4 years ago
this is an lxc container, so module loading inside the vm doesn't happen.
the module is loaded for the host, and should be available to the vm as well.
this suggests we can add some config to make the device available in the vm.
Updated by roh about 4 years ago
- Status changed from New to Feedback
- Assignee changed from roh to pespin
- % Done changed from 0 to 50
added to the lxc vm config:
lxc.mount.entry = /dev/net/tun /var/lib/lxc/osmo-build-debian9/rootfs/dev/net/tun none bind,create=file
and the device node seems there now.
please test whether it is also working/accessible, or whether we need to do something about permissions as well. if it's working, this can be closed.
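The check requested above (is the node there, and is it accessible) can be sketched as a small status function to run inside the lxc container. The name tun_access_status and its output strings are assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration).
# Prints one of: missing, not-char-device, no-permission, ok
tun_access_status() {
    dev="${1:-/dev/net/tun}"
    if [ ! -e "$dev" ]; then
        echo "missing"
    elif [ ! -c "$dev" ]; then
        echo "not-char-device"
    elif [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
        echo "no-permission"
    else
        echo "ok"
    fi
}

# Possible usage inside the container:
# tun_access_status /dev/net/tun
```

This only inspects the file type and the permission bits as seen by the current user. Even with an "ok" result, actually opening the device may still be denied by the container's device restrictions, so a passing ttcn3-ggsn-test run remains the real confirmation.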
Updated by pespin about 4 years ago
- Status changed from Feedback to Resolved
- % Done changed from 50 to 100