Bug #4839

docker.io sometimes returns EOF, breaking our builds

Added by laforge about 1 month ago. Updated about 1 month ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version: -
Start date: 11/02/2020
Due date:
% Done: 100%
Spec Reference:

Description

We frequently see docker.io return EOF (i.e. nothing) when pulling a base image such as debian:stretch. The failed pull causes our Jenkins job (e.g. a TTCN3 test) to fail, even though nothing is wrong on our side.

This already happened before Docker introduced rate limiting today, so it is unrelated to that.


Related issues

Related to Core testing infrastructure - Feature #4840: migrate osmo-gsm-tester docker images to registry.osmocom.org (New, 11/02/2020)

Related to osmo-gbproxy - Bug #4850: ttcn3-gbproxy-test* are not generated by jenkins-job-builder (Resolved, 11/05/2020)

History

#1 Updated by laforge about 1 month ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 10

I tried to use our docker/registry instance at registry.sysmocom.de as a 'pull-through cache', as documented at https://docs.docker.com/registry/recipes/mirror/

This is broken: it has been a known bug in docker since 2016, see https://github.com/docker/distribution/issues/1486 and many other reports such as https://www.reddit.com/r/docker/comments/bek6yv/how_do_you_do_registrymirror_with_auth/

So what we are moving towards is a setup where:
  • one jenkins job does a daily pull of all our base images from docker.io, and pushes them to the private registry
  • our jenkins jobs will then always pull directly from that private registry instead of the public one

If the pull from docker.io then fails occasionally, it will fail that re-sync jenkins job, but the (ttcn3 and other) jobs that verify osmocom software will not fail, and simply use the 1..N days old base image.
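The re-sync job described above could look roughly like the following. This is only a sketch, not the actual job script: the function names (dst_for, sync_image, sync_all) and the DOCKER override variable are made up for illustration, and unlike a script run with `sh -e` it keeps going after a failed pull so that one bad image does not block the others.

```shell
#!/bin/sh
# Sketch of a daily base-image re-sync job (all names here are illustrative).
REGISTRY="${REGISTRY:-registry.osmocom.org}"
DOCKER="${DOCKER:-docker}"   # overridable, e.g. DOCKER=echo for a dry run

# Map an upstream image name to its name in the private registry.
dst_for() {
    echo "$REGISTRY/$1"
}

# Pull one image from docker.io, retag it, and push it to the private registry.
sync_image() {
    src="$1"
    dst="$(dst_for "$src")"
    echo "======= $src"
    $DOCKER pull "$src" &&
    $DOCKER tag "$src" "$dst" &&
    $DOCKER push "$dst"
}

# Sync every image given as an argument; continue after a failure,
# but report it through the exit status so the job is marked as failed.
sync_all() {
    rc=0
    for img in "$@"; do
        sync_image "$img" || rc=1
    done
    return $rc
}
```

It would be called as e.g. `sync_all debian:stretch debian:buster`. Jobs that consume the images then pull `registry.osmocom.org/debian:stretch` and never talk to docker.io directly.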

#2 Updated by laforge about 1 month ago

https://gerrit.osmocom.org/c/docker-playground/+/21019 prepares our Dockerfiles with a way to override the registry when building images.
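One common way to make the registry overridable is a build argument declared before FROM. The sketch below illustrates that general technique and is not a copy of the patch; the argument name REGISTRY, its docker.io default, the image tag example-image, and both function names are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: a Dockerfile whose base-image registry is a build argument.
# An ARG declared before FROM may be referenced in the FROM line.
write_dockerfile() {
    cat <<'EOF'
ARG REGISTRY=docker.io
FROM ${REGISTRY}/debian:stretch
EOF
}

# Build from that Dockerfile (read from stdin), taking the base image
# from the registry given as $1.
build_with_registry() {
    write_dockerfile | docker build --build-arg REGISTRY="$1" -t example-image -
}
```

With this, `build_with_registry registry.osmocom.org` would use registry.osmocom.org/debian:stretch as the base image, while a plain build without the argument still falls back to docker.io.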

#3 Updated by laforge about 1 month ago

  • Status changed from In Progress to Resolved
  • % Done changed from 10 to 100

All related patches are merged; hopefully those problems are now gone.

I've manually verified that the registry-update-base-images job works, and also executed ttcn3-stp-test once to see if it actually pulls from registry.osmocom.org now.

#4 Updated by laforge about 1 month ago

  • Related to Feature #4840: migrate osmo-gsm-tester docker images to registry.osmocom.org added

#5 Updated by laforge about 1 month ago

And of course, on day 1 of this new mechanism, we see:

  • the docker image update job failing:
    [registry-update-base-images] $ /bin/sh -xe /tmp/jenkins5987388568045535390.sh
    + REGISTRY=registry.osmocom.org
    + IMAGES=debian:stretch debian:buster debian:jessie debian:sid ubuntu:zesty centos:centos8
    + src=debian:stretch
    + dst=registry.osmocom.org/debian:stretch
    + echo
    
    + echo ======= debian:stretch
    ======= debian:stretch
    + docker pull debian:stretch
    Error response from daemon: Get https://registry-1.docker.io/v2/library/debian/manifests/stretch: EOF
    Build step 'Execute shell' marked build as failure
    

while all other builds succeed, using base images from registry.osmocom.org.

yay.

#6 Updated by laforge 28 days ago

  • Related to Bug #4850: ttcn3-gbproxy-test* are not generated by jenkins-job-builder added
