Bug #4060

closed

TTCN3 test run failures on jenkins:

Added by laforge almost 5 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 06/13/2019
Due date:
% Done: 60%
Spec Reference:

Description

We quite frequently see tests fail like this:

+ collect_logs
+ fix_perms
+ docker_images_require debian-stretch-build
+ local from_line
+ local pull_arg
+ [ -z  ]
+ pull_arg=--pull
+ grep ^FROM ../debian-stretch-build/Dockerfile
+ from_line=FROM debian:stretch
+ echo FROM debian:stretch
+ grep -q $USER
+ echo Building image: debian-stretch-build (export NO_DOCKER_IMAGE_BUILD=1 to prevent this)
Building image: debian-stretch-build (export NO_DOCKER_IMAGE_BUILD=1 to prevent this)
+ PULL=--pull make -C ../debian-stretch-build
make: Entering directory '/home/osmocom-build/jenkins/workspace/ttcn3-bscnat-test/debian-stretch-build'
docker build --build-arg USER=osmocom-build --build-arg OSMO_TTCN3_BRANCH=master \
    --build-arg OSMO_BSC_BRANCH=master \
    --build-arg OSMO_BTS_BRANCH=master \
    --build-arg OSMO_GGSN_BRANCH=master \
    --build-arg OSMO_HLR_BRANCH=master \
    --build-arg OSMO_IUH_BRANCH=master \
    --build-arg OSMO_MGW_BRANCH=master \
    --build-arg OSMO_MSC_BRANCH=master \
    --build-arg OSMO_NITB_BRANCH=master \
    --build-arg OSMO_PCU_BRANCH=master \
    --build-arg OSMO_SGSN_BRANCH=master \
    --build-arg OSMO_SIP_BRANCH=master \
    --build-arg OSMO_STP_BRANCH=master \
    --pull -t docker.io/osmocom-build/debian-stretch-build:latest .
Sending build context to Docker daemon  4.608kB

Step 1/3 : FROM debian:stretch
Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on 192.168.111.1:53: read udp 192.168.111.6:42724->192.168.111.1:53: i/o timeout
../make/Makefile:56: recipe for target 'docker-build' failed
make: *** [docker-build] Error 1
make: Leaving directory '/home/osmocom-build/jenkins/workspace/ttcn3-bscnat-test/debian-stretch-build'
+ exit 1
Build step 'Execute shell' marked build as failure
Recording test results
Sending e-mails to: laforge@gnumonks.org
Archiving artifacts
Finished: FAILURE
The odd parts about this are:
  • why is "192.168.111.1:53" used as DNS server despite the underlying operating system using "real" DNS server IP addresses (213.133.98.98, 213.133.98.99, 213.133.98.100)? Even if I manually start a container with "docker run --rm -it busybox", its /etc/resolv.conf is set "correctly", i.e. it doesn't show any 192.168.111.1 IP
  • why are we rebuilding that container image during the "collect logs" step? This means that we have invested significant time to execute an entire test suite, and then throw away all those results just because a random image for collecting log files wasn't up to date.
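One possible way to avoid the rebuild in the final stage, sketched here under assumptions: the log itself mentions that exporting NO_DOCKER_IMAGE_BUILD=1 prevents the image build, so the log-collection step could honour that variable before touching the network. The function names below are hypothetical and are not taken from the actual jenkins scripts; the sketch only prints what it would do instead of calling docker.

```shell
#!/bin/sh
# Hypothetical helper: decide whether an image rebuild should be attempted.
# Only the variable NO_DOCKER_IMAGE_BUILD comes from the log above; the
# function names are assumptions for illustration.
should_build_image() {
    if [ "${NO_DOCKER_IMAGE_BUILD:-}" = "1" ]; then
        return 1    # rebuild explicitly disabled
    fi
    return 0
}

# Hypothetical stand-in for the image check done before collecting logs.
# It only reports the decision; a real script would run the build here.
require_image_for_logs() {
    image="$1"
    if should_build_image; then
        echo "would build image: $image"
    else
        echo "skipping build of image: $image"
    fi
}
```

With NO_DOCKER_IMAGE_BUILD=1 set before the log-collection stage, a DNS hiccup at that point could no longer turn an otherwise complete test run into a failure.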
Actions #1

Updated by laforge almost 5 years ago

I also cannot find any iptables nat rules or the like which would explain this 192.168.111.1.

Actions #2

Updated by laforge almost 5 years ago

ok, 192.168.111.1 is the IP address of the host operating system on the lxcbr0 device:

3: lxcbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:16:3e:00:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.112.1/24 scope global lxcbr0
       valid_lft forever preferred_lft forever

We're running docker inside a debian9 lxc. So it seems lxc might actually be to blame for this. I will dig further.

Actions #3

Updated by laforge almost 5 years ago

  • Status changed from New to In Progress

Ok, so I was mistaken. I was looking at build slaves on admin2, whereas the failures were on build2. And indeed, the /etc/resolv.conf inside the lxc jails on build2 listed only 192.168.111.1 as DNS server. I'm fixing this now and checking other buildhosts.
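A quick way to spot this condition inside a jail is to list the nameserver entries of a resolv.conf file. The following is a minimal sketch; the function names are made up for illustration and the file path is passed in as an argument so the check can be pointed at any copy of the file.

```shell
#!/bin/sh
# list_nameservers FILE: print the address of every "nameserver" line.
# Comment lines and other directives (search, options, ...) are ignored.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# only_nameserver_is FILE ADDR: succeed if ADDR is the one and only
# nameserver configured in FILE (the situation described for build2).
only_nameserver_is() {
    [ "$(list_nameservers "$1")" = "$2" ]
}
```

For example, `only_nameserver_is /etc/resolv.conf 192.168.111.1` would succeed inside a jail that is still configured the way the comment above describes.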

Actions #4

Updated by laforge almost 5 years ago

  • % Done changed from 0 to 30

https://gerrit.osmocom.org/c/docker-playground/+/14433 should at least prevent this problem from appearing again in the final stage during fix_perms/collect_logs.

Actions #5

Updated by laforge almost 5 years ago

  • % Done changed from 30 to 60

I've now ensured that /etc/resolv.conf contains the "real" name server IP addresses on all our current build slaves.
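Such a check could be scripted roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, and how the file is fetched from each build slave is left out. It reports every expected address that has no matching nameserver line in the given file.

```shell
#!/bin/sh
# missing_nameservers FILE ADDR...: print each expected ADDR that has no
# matching "nameserver" line in FILE. Prints nothing if all are present.
missing_nameservers() {
    file="$1"
    shift
    for addr in "$@"; do
        # awk exits 0 only if a "nameserver ADDR" line was seen
        if ! awk -v a="$addr" \
            '$1 == "nameserver" && $2 == a { found = 1 } END { exit !found }' \
            "$file"; then
            echo "$addr"
        fi
    done
}
```

Called as `missing_nameservers /etc/resolv.conf 213.133.98.98 213.133.98.99 213.133.98.100`, empty output means all three addresses mentioned in the description are configured.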

Actions #6

Updated by laforge over 4 years ago

  • Status changed from In Progress to Resolved

We haven't seen any DNS-related failures for ~3 weeks now - yay!
