2016-05-06

I'm giving Docker a shot again, after a couple of years, and this time I'm "dockerizing" an old Python
app that doesn't even have configuration management or orchestration. Docker is easy to install now. No more bullshit about
kernel features and filesystem support. They even claim it works seamlessly on OS X and Windows [*].

The good parts
**************

Docker has several things going for it: a good name, a huge community, and a huge number of images you
can start from. Sure, most images are not great, and even the official images can have issues, but you can always grab the
Dockerfile and customize things.

Right now it's hard to find an alternative. CoreOS is trying to make an alternative called "rkt".
With a Flickr-esque name at that! How do you even read it? ercati? erkit? rocket? rockit? rackit? reckit? You know what,
wreckit. If you think I'm lambasting the name unfairly then consider that most of the world doesn't speak very good English
and interpretable spelling is a problem.

Even if rkt didn't have such a terrible name, it still has a long way to go before reaching Docker's
convenience.

What irks me
************

Note

If it matters, I'm using Docker version 1.11.0. They call it the "Engine" now.

Docker has some strange oversights and quirks. There's a very strong focus on not breaking any interfaces, so it's a bit
depressing to look at the bug tracker if you're the impatient type.

These things annoy me:

Documentation has no search. Seriously? I understand that there's Google but come on, I don't want to look at docs for old
Docker versions and other junk Google gives me.

The command line seems clumsy:

Inflexible parsing, e.g.: docker build . --help ain't valid. That makes no sense, there's only a single non-option argument,
so why can't I have options after that single possible argument?

Most arguments have a short form (e.g.: -i) but not --help. Nooo, not that. No one needs a short form for that, God
forbid!

Bad help for most options. Is this supposed to help?

Cause that sure as hell doesn't tell me anything about what I can pass in there. Now I have to go into the docs with no
search. After rummaging through 10 useless pages of documentation I eventually get to this:

What. Why can't that be in the command line?

If you get a run or build error in docker-compose you're usually left with half-running containers and you have to
manually clean up the mess. If you don't pay attention you're left wondering why docker-compose build doesn't do anything.
I made some changes, ran docker-compose build, so why doesn't docker-compose up run the new images? Cause some stuff is
already running, that's why!
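
One way out, assuming a docker-compose recent enough to have the down command, is to just tear everything down
before rebuilding:

    docker-compose down     # stop and remove the old containers
    docker-compose build    # rebuild the images
    docker-compose up       # start fresh containers from the new images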

If you stop a docker-compose up, all services stop. If you stop a docker-compose up myservice, it leaves all the
dependencies running around. Argh! Why the inconsistency?

There is no garbage collection. Docker leaves around huge piles of useless containers and images. Sure, there's
docker rmi $(docker images -q) and docker rm $(docker ps -a -q) but that's like cleaning the dust out of your computer with
a water hose. It sure does the job but your computer probably won't work after that.
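
A slightly less scorched-earth approach is to only target the leftovers, using the status and dangling filters:

    # remove stopped containers only
    docker rm $(docker ps -aq -f status=exited)
    # remove untagged/dangling images only
    docker rmi $(docker images -qf dangling=true)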

Dockerfiles are quirky and non-orthogonal. Similarly to the command-line interface, the Dockerfile syntax is an accumulation
of ill-advised choices.

Support for multiline values is an afterthought. Dockerfiles are littered with "\" and "&&" all over the place.
Lots of noise, lots of mistakes.
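
A made-up but typical example of what I mean; every continuation needs its own "\" and "&&", and forgetting one
gets you a fun parse or runtime error:

    RUN apt-get update \
     && apt-get install -y --no-install-recommends \
            build-essential \
            libpq-dev \
     && rm -rf /var/lib/apt/lists/*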

There are two ways to do the same thing. The parser takes JSON for most commands, but if the parse fails
it takes whatever gunk is in there as a string.

For example, CMD and ENTRYPOINT take this to the extreme: they each have two exec modes. Take a look at this
crazy table:

                              No ENTRYPOINT               ENTRYPOINT exec_entry p1_entry  ENTRYPOINT ["exec_entry", "p1_entry"]
No CMD                        error, not allowed          /bin/sh -c exec_entry p1_entry  exec_entry p1_entry
CMD ["exec_cmd", "p1_cmd"]    exec_cmd p1_cmd             /bin/sh -c exec_entry p1_entry  exec_entry p1_entry exec_cmd p1_cmd
CMD ["p1_cmd", "p2_cmd"]      p1_cmd p2_cmd               /bin/sh -c exec_entry p1_entry  exec_entry p1_entry p1_cmd p2_cmd
CMD exec_cmd p1_cmd           /bin/sh -c exec_cmd p1_cmd  /bin/sh -c exec_entry p1_entry  exec_entry p1_entry /bin/sh -c exec_cmd p1_cmd

There is clear overlap there. Maybe there are historical reasons for it but that's why we have version numbers. Why do we
need to keep all that cruft?
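
To make the two modes concrete (gunicorn and its arguments are just placeholders here): the JSON form runs the
executable directly, while the plain-string form gets wrapped in /bin/sh -c, which is exactly how you end up with a
table like the one above:

    # exec form: PID 1 is gunicorn itself, CMD arguments get appended
    ENTRYPOINT ["gunicorn", "--bind", "0.0.0.0:8000", "app:main"]

    # shell form: PID 1 is /bin/sh -c "gunicorn ...", CMD is silently ignored
    ENTRYPOINT gunicorn --bind 0.0.0.0:8000 app:main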

No volumes/mounts during docker build. I have found ways to deal with this (read on) but still, it's an irk.

If you look in the bug tracker you'll notice that lots of people have come up with ideas to improve Dockerfiles, but these
issues just don't seem important enough.

There's something rotten about how Docker runs containers as root by default. Docker seems
to mitigate this by disabling some capabilities like ptrace by default. But this ain't purely a
security concern, it affects usability as well.

For example, once you install Docker on your Ubuntu machine you're faced with a choice: use sudo all over the place or give
your user full access to the daemon:
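
The "full access" route is the usual one-liner; keep in mind it effectively makes that user root-equivalent, since the
daemon itself runs as root:

    sudo usermod -aG docker $USER
    # log out and back in (or use newgrp docker) for the group change to apply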

Still not sure if that's better than a constant reminder that you're doing lots of stuff as root.

I know these might look like superficial complaints but they rile me up regardless. At least it's not this bad. </rant>

The ecosystem
*************

There are a lot of images, and you'll probably find anything you'll ever want. And
there's even a set of specially branded images that pop up almost everywhere: the "official" images. They are pretty good: up
to date, they verify signatures for whatever they download, consistent presentation etc.

Though I have a problem with the Python images. For some reason they decided to just
compile Python themselves. Sure, it's the latest point-release but they don't include any of the patches the Debian
maintainers made. I don't like those patches either (they customize package install paths and the import system) but it's
worse without them:

Suddenly gdb stops working. Just try a docker run --rm -it python:2.7 sh:

Looks like there's some import path trampling going on. I don't want broken debug tools when my app is broken.

You can't use any Python package from the APT repos. Sure, most of them are old and easy to install with pip, but there are
exceptions [2].

Strange C.UTF-8 locale. I can understand they don't want to put a specific language in there, but if you run any
locale-using applications you'll run into issues.
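
If you do need a real locale, the usual fix is to generate one and point LANG at it; a sketch for a Debian/Ubuntu
based image (the locale choice is just an example):

    RUN apt-get update \
     && apt-get install -y --no-install-recommends locales \
     && rm -rf /var/lib/apt/lists/* \
     && locale-gen en_US.UTF-8
    ENV LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8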

What I ended up using was the ubuntu:xenial image (Xenial being the new LTS). It ships the
latest point release of 2.7, so why compile it again? I took the good parts from python:2.7-slim and got this
Dockerfile:
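
Roughly along these lines (the exact package list is whatever your app needs; python2.7-dbg and gdb are the debugging
extras I mention below):

    FROM ubuntu:xenial

    RUN apt-get update \
     && apt-get install -y --no-install-recommends \
            python2.7 python2.7-dev python2.7-dbg python-pip \
            gdb strace lsof \
     && rm -rf /var/lib/apt/lists/*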

Everything works great, and there's some magic in those debug packages that makes gdb give me some really nice commands
like py-bt [3]. Note that I snuck in some other tools to help debugging.

All I'm saying here is that even official images need scrutiny. Check their Dockerfile and decide if it really fits your
needs.

The challenges
**************

Docker has some interesting challenges, or limitations. The build system has something called "layers" and there's a hard limit on
how many layers you can have. Each command in your Dockerfile makes a new layer. If you look at the "official" best practices guide you'll see most of the stuff there
revolves around this limitation. You inevitably end up with some damn ugly RUN commands.

There's a good thing about these layers: they are cached. However, the context is not. No single layer ever needs all the context,
or the same part of the context as another. Layers should be able to have individual contexts but, alas, docker build wasn't designed
with that in mind.

Another limitation by design is that docker build doesn't allow any mounts or volumes during the build. The only way to get
stuff into the container that eventually becomes the image is over the network or through the "context".

What's this context?
********************

When you run docker build foobar Docker will make an archive of foobar/* (minus what you have in .dockerignore) and
build an image according to what you have in foobar/Dockerfile. You can specify the context path and Dockerfile path
individually but, oddly enough, the Dockerfile must live inside the context. You can't get creative here.
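
The most you can do is point at the Dockerfile explicitly with -f, as long as it still sits under the context
directory (paths here are illustrative):

    docker build -f foobar/Dockerfile -t myimage foobar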

Optimizing the build process
****************************

You can parametrize this build process, but the lack of mounts or volumes exposes you to some pretty annoying slowness if you
have to build external packages, for example. This problem is still pervasive in Python: most of the stuff on PyPI is just source packages. Even though you can now publish Linux binaries on PyPI, it will be years until most packages publish those manylinux1
wheels [1]. Even if we had wheels for everything, there's still the question of network slowness. Setting up caching
proxies is inconvenient.

Most Dockerfiles I've seen have something like this:
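
Something along these lines, with names being purely illustrative:

    FROM python:2.7

    COPY requirements.txt /tmp/requirements.txt
    RUN pip install -r /tmp/requirements.txt

    COPY . /app
    WORKDIR /app
    CMD ["python", "app.py"]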

Now for simple projects this is fine, because you only have a handful of dependencies. But for larger projects, hundreds of
dependencies are the order of the day. Changing them or upgrading versions (as you should always pin versions [4]) will introduce
serious delays in build times. Because the container running the build process is pretty insulated (no volumes or mounts,
remember?) pip can't really cache anything.

Staging the build process
*************************

One way to solve this is to have a "builder image" that you run to build wheels for all your dependencies. When you run an
image you can use volumes and mounts.

Before jumping in, let's look briefly at the file layout. I like to have a docker directory and then another level for each
kind of image. Quite similar to the layout the builder for the official images has. And no weird filenames, just Dockerfile
everywhere:
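
Something like this, matching the image names used below:

    docker/
        builder/
            Dockerfile
        deps/
            Dockerfile
        base/
            Dockerfile
        web/
            Dockerfile
        worker/
            Dockerfile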

In this scenario we'd deploy two images: web and worker. The inheritance chain would look like this:

buildpack-deps:xenial ➜ builder

ubuntu:xenial ➜ deps ➜ base ➜ web

ubuntu:xenial ➜ deps ➜ base ➜ worker

In which:

builder has development libraries, compilers and other stuff we don't want in production.

deps only has python and the dependencies installed.

base has the source code installed.

web and worker have specific customizations (like installing Nginx or different settings).

And in .dockerignore we'd have:
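
The exact entries depend on the project; the point is just to keep junk out of the context. A plausible sketch:

    .git
    .tox
    *.pyc
    __pycache__
    *.egg-info
    build
    dist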

This layout might seem artificial, but it's not arbitrary:

Both the worker and web need the same source code.

The deps and base are not in the same image because their contexts are distinct: one needs a bunch of wheels and
the other one only needs the sources. This setup allows us to skip building the deps image if the requirement files did
not change.

The web and worker images do not need to have the source code in the context. This allows faster build times. For
development purposes we can just mount the source code. More about that later.

In builder/Dockerfile there would be:
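
A sketch of the idea; the wheelhouse location and the requirements file name are my own placeholders:

    FROM buildpack-deps:xenial

    ARG USER
    ARG UID
    ARG GID

    RUN apt-get update \
     && apt-get install -y --no-install-recommends \
            python2.7 python2.7-dev python-pip python-wheel \
     && rm -rf /var/lib/apt/lists/*

    # recreate the host user so the wheels written into the mounted
    # wheelhouse don't end up owned by root on the host
    RUN groupadd --gid "$GID" "$USER" \
     && useradd --uid "$UID" --gid "$GID" --create-home "$USER"

    USER $USER
    WORKDIR /build

And to run it, something like:

    docker build -t myapp-builder \
        --build-arg USER=$USER --build-arg UID=$(id -u) --build-arg GID=$(id -g) \
        docker/builder
    docker run --rm \
        -v $PWD:/build -v $PWD/wheelhouse:/wheelhouse \
        myapp-builder pip wheel --wheel-dir=/wheelhouse -r requirements.txt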

The interesting part here is the USER, UID and GID build arguments. Unless you do something special, the processes
inside the container run as root. This is fine, right? That's the whole point of using a container: processes in
the container actually have all sorts of limitations, so it doesn't matter what user runs inside. However, if you mount
something from the host inside the container, then the owner of any new file inside that mount is going to be the same user that
the container runs with. The result is that you're going to get a bunch of files owned by root on the host. Not nice.

Because I don't do development with a root account, and because user namespaces are surprisingly inconvenient to use [6], I have resorted to
recreating my user inside the container. It needs to have the exact uid and gid, otherwise I get files owned by
an account that doesn't exist.

Similarly to what was shown before, deps/Dockerfile would have:
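
Again, just a sketch; it assumes the context for this image holds the requirements file plus the wheelhouse produced by
the builder:

    FROM ubuntu:xenial

    RUN apt-get update \
     && apt-get install -y --no-install-recommends python2.7 python-pip \
     && rm -rf /var/lib/apt/lists/*

    # the context only needs the requirements and the pre-built wheels
    COPY requirements.txt /tmp/requirements.txt
    COPY wheelhouse /tmp/wheelhouse

    # install strictly from the local wheels: no compiling, no network
    RUN pip install --no-index --find-links=/tmp/wheelhouse -r /tmp/requirements.txt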

And base/Dockerfile:
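
Also a sketch; myapp-deps stands in for whatever the deps image got tagged as:

    FROM myapp-deps

    # this context only needs the sources
    COPY . /app
    WORKDIR /app
    RUN pip install --no-deps --no-index /app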
