2016-12-13

One year of rkt 1.x: pods, Kubernetes, and OCI

Since the release of rkt 1.0 at the beginning of this year, the project has powered ahead with over 20 new
stable versions on a regular release cycle. The goal of rkt has always been to provide a container engine that is not only reliable but also composable
and standards-driven, allowing easy operation and integration with other best-in-class tools in the container ecosystem. Today we wanted to provide
an update on the ongoing work to integrate rkt with two such projects - the Kubernetes cluster orchestration system, and the Open Container Initiative
(OCI) container standards - and chart the course for rkt's future in the year ahead.

rkt and pods: why is rkt’s design different?

From the start, rkt was built as a pod-native container engine. This means that the basic unit of execution is a pod, linking together resources and
user applications in a self-contained environment. rkt's pods naturally follow the same pod concept
popularized by Kubernetes.

To provide a smooth experience, rkt takes care to set up the application context so that it resembles the usual Linux
environment as closely as possible. Because of this, and the facilities that a pod-native container engine provides, an application developer doesn’t have to worry about:

Bundling multiple applications in a single container image

Supervising and orchestrating additional helper processes

Running apps as PID 1, with its unexpected signal semantics and child-reaping duties

At the same time, rkt offers SREs and systems administrators a comfortable tool that supports daily operational needs:

Logging, cgroups, and service integration with existing tools (e.g. runit or systemd)

Isolating unrelated pods and workloads, segregating them as decoupled Linux processes

Customizable, modular networking configuration, decoupled from the container runtime

rkt's technical implementation is internally aligned with this concept as well: the engine harnesses modular components, like the
Container Network Interface (CNI) for networking, systemd
for service management, and systemd-journald for logging. These components are independently developed by dedicated communities to handle
specific tasks.
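
As a small illustration of the networking piece, here is a Go sketch that generates a minimal CNI network configuration of the kind rkt loads from its networking configuration directory (commonly /etc/rkt/net.d/). The field names follow the CNI specification; the concrete values, plugin choice, and file path are illustrative assumptions rather than a recommended setup.

// A minimal sketch of a CNI network configuration, of the kind rkt loads
// from its networking configuration directory (commonly /etc/rkt/net.d/).
// Field names follow the CNI specification; the concrete values and plugin
// choice are illustrative assumptions.
package main

import (
    "encoding/json"
    "fmt"
    "log"
)

// NetConf mirrors a small subset of a CNI network configuration.
type NetConf struct {
    Name string   `json:"name"`
    Type string   `json:"type"` // which CNI plugin to invoke, e.g. "bridge" or "ptp"
    IPAM IPAMConf `json:"ipam"`
}

// IPAMConf selects and configures the IP address management plugin.
type IPAMConf struct {
    Type   string `json:"type"`   // e.g. "host-local"
    Subnet string `json:"subnet"` // pod addresses are allocated from this range
}

func main() {
    conf := NetConf{
        Name: "example-net",
        Type: "bridge",
        IPAM: IPAMConf{Type: "host-local", Subnet: "10.10.0.0/24"},
    }
    out, err := json.MarshalIndent(conf, "", "  ")
    if err != nil {
        log.Fatal(err)
    }
    // Saved as e.g. /etc/rkt/net.d/10-example.conf, this network can then be
    // requested per pod without touching the container engine itself.
    fmt.Println(string(out))
}

Dropped into the configuration directory, a file like this makes the network selectable per pod (for example via rkt's --net flag), keeping networking decoupled from the runtime as described above.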

A kubelet-first runtime with CRI

Kubernetes is a cluster orchestration system, and upstream Kubernetes is a central component of CoreOS Tectonic. Kubernetes runs all
applications in pods of containers, and it achieves this by delegating runtime tasks to a container engine. As Kubernetes has matured, users have
requested the ability to use different execution engines like rkt or Hyper in a Kubernetes cluster. Earlier this
year we introduced the initial version of rktnetes, a
project to add support for rkt as the first alternative container engine. This process involved a considerable amount of work to make the Kubernetes
Kubelet codebase less Docker-specific, removing assumptions and special cases from the source to form a truly modular abstraction.

This work, combined with community effort and discussion, led to the creation of the Container Runtime Interface (CRI), an API specification for
low-level interaction between the kubelet and container runtimes. For more details on the history of CRI, see the
original proposal.
For more about what this means for rkt and Kubernetes, read on.
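
To give a rough sense of the interface's shape, the Go sketch below shows the split the CRI makes between pod sandbox lifecycle and per-container lifecycle. The real CRI is defined as a gRPC/protobuf API inside the Kubernetes tree; the trimmed method set and placeholder types here are a simplified approximation for illustration only.

// A trimmed, illustrative view of the split the CRI introduces between pod
// sandbox lifecycle and per-container lifecycle. The real interface is a
// gRPC/protobuf API in the Kubernetes repository; the names and types below
// are a simplified approximation, not the authoritative definition.
package crisketch

// SandboxConfig and ContainerConfig stand in for the much richer CRI messages.
type SandboxConfig struct {
    Name      string
    Namespace string
}

type ContainerConfig struct {
    Name  string
    Image string
    Args  []string
}

// RuntimeService is the kubelet-facing contract a container engine implements.
type RuntimeService interface {
    // Pod sandbox lifecycle: the shared environment applications run in.
    RunPodSandbox(config SandboxConfig) (podSandboxID string, err error)
    StopPodSandbox(podSandboxID string) error
    RemovePodSandbox(podSandboxID string) error

    // Container lifecycle: individual applications inside a sandbox.
    CreateContainer(podSandboxID string, config ContainerConfig) (containerID string, err error)
    StartContainer(containerID string) error
    StopContainer(containerID string, timeoutSeconds int64) error
    RemoveContainer(containerID string) error
}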

Introducing pod sandboxes

The introduction of the CRI into Kubernetes brings interesting new possibilities for pods, with granular control over the lifecycle of individual
applications. This increased flexibility enables a variety of new use cases, like updating single applications within running pods, or dynamically
injecting debugging capabilities. Notably, it implies that pods are now mutable: empty pods can be created, and pods can continue to exist after
their applications exit. In CRI terms, this concept is called a “pod sandbox”.

rkt has already introduced support for pod sandboxes. A new, experimental subcommand (currently called rkt app and enabled by an
RKT_EXPERIMENT_APP environment variable) allows the creation and manipulation of mutable pod sandboxes. It can start a new empty
environment, and then allow users to add, start, stop, and remove applications within a running pod sandbox. This was first introduced in
rkt 1.19.0 and is currently on its way to stabilization, with more documentation to follow soon.
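
The sketch below drives that workflow from Go via os/exec. Because the subcommand is experimental and gated behind RKT_EXPERIMENT_APP, the exact verbs and flags shown (add, start, rm, --name, --app) are assumptions based on the current experimental surface and may change before stabilization.

// A rough sketch of the mutable pod sandbox workflow, driving rkt's
// experimental CLI from Go. The subcommand is gated behind RKT_EXPERIMENT_APP
// and its verbs and flags are still experimental, so the exact invocations
// below are assumptions that may change before stabilization.
package main

import (
    "log"
    "os"
    "os/exec"
)

// rktApp runs `rkt app <args...>` with the experiment flag enabled.
func rktApp(args ...string) error {
    cmd := exec.Command("rkt", append([]string{"app"}, args...)...)
    cmd.Env = append(os.Environ(), "RKT_EXPERIMENT_APP=true")
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    return cmd.Run()
}

func main() {
    if len(os.Args) != 2 {
        log.Fatal("usage: sandbox-sketch <pod-uuid>")
    }
    // UUID of an empty sandbox started earlier (e.g. with `rkt app sandbox`).
    podUUID := os.Args[1]

    // Inject an application into the running sandbox, start it, then remove
    // it again, all without tearing down the pod itself. The image reference
    // is a placeholder and is assumed to already be in the local store.
    if err := rktApp("add", podUUID, "example.com/worker", "--name=worker"); err != nil {
        log.Fatal(err)
    }
    if err := rktApp("start", podUUID, "--app=worker"); err != nil {
        log.Fatal(err)
    }
    if err := rktApp("rm", podUUID, "--app=worker"); err != nil {
        log.Fatal(err)
    }
}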



Interactivity and attach functionality

Multiple applications are typically run side by side in a pod, with each application’s output and error streams (stdout/stderr) used for logging
purposes. Historically, rkt has taken care of multiplexing the pod’s I/O to the outside world by using systemd-journald. Because of this, there was
only limited support for attaching to applications directly, or redirecting their I/O.

The Kubernetes CRI allows for more sophisticated scenarios, like piping input to applications and attaching to running processes. To satisfy these
requirements we contributed streaming support to systemd itself and are in the process of adding
the following selectable I/O modes to rkt:

interactive: the application runs under the TTY of the invoking parent process, i.e. an interactive user terminal. This mode is limited to at most one app per pod and allows the user to interact directly with the running application.

TTY: the application runs with a newly allocated TTY, with full terminal capabilities. This allows attaching to already-running applications.

streaming: the application’s output or input is supervised by a separate multiplexer running in the pod context. This allows for
attaching/detaching/piping, even without a dedicated TTY. The TTY demystified post touches on
terminal-related topics in more detail.

logging: the application’s output is supervised by a separate logging process running in the pod, and its output lines are processed as individual
log entries. This is the original default mode for applications which don’t require interactivity.

null: the application stream will simply be closed, and any output discarded.

These modes are configurable per application, and individually for each app’s stdin/stdout/stderr streams, to offer the most flexibility. Look for this
feature in an upcoming rkt release.
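
To make the streaming mode more concrete, here is a conceptual Go sketch of the underlying idea: a small multiplexer owns an application's stdout and relays it to attach clients over a Unix socket, so attaching and detaching need no TTY. This illustrates the mechanism only; it is not rkt's or systemd's actual implementation, and the socket path is an arbitrary assumption.

// A conceptual illustration of the "streaming" I/O mode: a small multiplexer
// owns the application's stdout and relays it to attach clients over a Unix
// socket, so attaching and detaching need no TTY. This sketches the mechanism
// only; it is not rkt's or systemd's actual implementation, and the socket
// path is an arbitrary assumption.
package main

import (
    "io"
    "log"
    "net"
    "os"
    "os/exec"
)

func main() {
    // Stand-in for the supervised application inside the pod.
    app := exec.Command("sh", "-c", "while true; do date; sleep 1; done")
    out, err := app.StdoutPipe()
    if err != nil {
        log.Fatal(err)
    }
    if err := app.Start(); err != nil {
        log.Fatal(err)
    }

    // Per-app attach endpoint (hypothetical path).
    const sock = "/tmp/pod-worker-stdout.sock"
    os.Remove(sock)
    l, err := net.Listen("unix", sock)
    if err != nil {
        log.Fatal(err)
    }
    defer l.Close()

    // Serve one attach client at a time; a real multiplexer would also fan the
    // stream out to the logging process and to multiple concurrent clients.
    for {
        conn, err := l.Accept()
        if err != nil {
            log.Fatal(err)
        }
        // Detaching is simply the client closing its connection; the
        // application keeps running regardless.
        io.Copy(conn, out)
        conn.Close()
    }
}

A client could then attach with any Unix-socket tool (for example socat) and detach simply by disconnecting, while the application keeps running.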

An OCI-first runtime

At CoreOS we believe properly designed and employed standards are key to unlocking the power of open source. Software standards mean
developers and teams can write tools that compose and interoperate in predictable, consistent ways, without being beholden to particular
implementations.

The Open Container Initiative (OCI) is a Linux Foundation effort to create a truly portable software container. In the last twelve months we have seen
the two key OCI specifications, "image-spec" and "runtime-spec", march towards their important 1.0 releases.

OCI image spec: the new standard image format

Pursuing our commitment to open specifications, rkt is currently transitioning its internal architecture to the new OCI standard - starting with the
OCI Image Specification. rkt developers are ramping up efforts for native support of OCI images,
including fetching, storing, and running them. The upstream roadmap details
technical adjustments that will happen over the coming weeks, and a tracker project maps the status of the ongoing effort.

However, as we work through this transition internally, users can rest assured that our compatibility guarantees will still apply. Moving forward, we
recommend building and using OCI images, but the rkt 1.x series will continue to fully support retrieving and executing ACIs just as it does today.

The overarching goal is to remove internal ACI translation and embrace OCI natively as soon as the
image-spec is finalized and the format is stable enough for production use.
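
For readers unfamiliar with the image-spec, the Go sketch below decodes a minimal OCI image manifest using locally defined types. The field names and media types follow the draft specification; these are not the canonical Go bindings published by the image-spec project, and the digests and sizes are placeholder example values.

// Decodes a minimal OCI image manifest using locally defined types. The field
// names and media types follow the draft image-spec; these are not the
// canonical Go bindings published by the image-spec project, and the digests
// and sizes are placeholder example values.
package main

import (
    "encoding/json"
    "fmt"
    "log"
)

// Descriptor points at a content-addressed blob in the image layout.
type Descriptor struct {
    MediaType string `json:"mediaType"`
    Digest    string `json:"digest"`
    Size      int64  `json:"size"`
}

// Manifest ties an image configuration to its ordered filesystem layers.
type Manifest struct {
    SchemaVersion int          `json:"schemaVersion"`
    Config        Descriptor   `json:"config"`
    Layers        []Descriptor `json:"layers"`
}

const example = `{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7",
    "size": 7023
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0",
      "size": 32654
    }
  ]
}`

func main() {
    var m Manifest
    if err := json.Unmarshal([]byte(example), &m); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("config %s, %d layer(s)\n", m.Config.Digest, len(m.Layers))
}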

OCI runtime spec: executing Linux containers

In parallel with the image format, OCI is developing the so-called runtime-spec, which describes the
runtime execution environment that container engines should provide. This specification is being developed in close tandem with
runc, a shared community effort to create a reference implementation of the specification.

To provide the best possible support for the OCI runtime specification, rkt is gaining better integration with
runc as its internal application executor. This is made possible by rkt’s modular architecture, which allows
runc to be integrated as an alternative stage1 environment. The new implementation can thus be developed and adopted without any
disruptive impact on users, while allowing rkt to reduce feature duplication and stay better aligned with the rest of the ecosystem on its OCI journey.
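
As a rough picture of what the runtime-spec covers, this last Go sketch emits a heavily trimmed config.json, the document a runtime such as runc reads from an OCI bundle to learn what process to run and on which root filesystem. Only a handful of the spec's fields are modeled, and the version string is illustrative; this is not a complete or authoritative definition.

// Emits a heavily trimmed OCI runtime-spec config.json, the document a runtime
// such as runc reads from a bundle directory to learn what process to run and
// on which root filesystem. Only a handful of the spec's fields are modeled;
// this is an illustration, not a complete or authoritative definition.
package main

import (
    "encoding/json"
    "fmt"
    "log"
)

type Process struct {
    Terminal bool     `json:"terminal"`
    Args     []string `json:"args"`
    Cwd      string   `json:"cwd"`
}

type Root struct {
    Path     string `json:"path"`
    Readonly bool   `json:"readonly"`
}

type Spec struct {
    OCIVersion string  `json:"ociVersion"`
    Process    Process `json:"process"`
    Root       Root    `json:"root"`
    Hostname   string  `json:"hostname"`
}

func main() {
    spec := Spec{
        OCIVersion: "1.0.0-rc3", // illustrative; the spec was still pre-1.0 at the time of writing
        Process:    Process{Args: []string{"/bin/sh"}, Cwd: "/"},
        Root:       Root{Path: "rootfs", Readonly: true},
        Hostname:   "example",
    }
    out, err := json.MarshalIndent(spec, "", "  ")
    if err != nil {
        log.Fatal(err)
    }
    // Placed next to a rootfs/ directory as config.json, this forms a minimal
    // OCI bundle a runtime-spec implementation can act on.
    fmt.Println(string(out))
}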

rkt is home to innovative thinking about how a container engine works in the world of orchestrated clusters. From the CNI abstraction for making
network interaction modular, to the CRI formalizing the way clusters interact with the container engine on each member node to run applications
securely, simply, scalably, and reliably, rkt has been a center of cutting-edge code and a source of productive discussion with the wider community.
Join us today in improving both Kubernetes and the standards that define the containers and pods that package software today. If you’re new to rkt,
check out these introductory rkt videos on our blog. If you’re a veteran container cluster admin or
developer, take a look at the rkt documentation to start experimenting, or clone the
rkt repository on GitHub and start hacking.
