2016-05-09

Consider, for a moment, the following scenarios associated with installing and running a desktop based application on your own computer:

a learner installing software for a distance education course: course materials are produced in advance of the course and may be written with a particular version of the software in mind, distributed as part of the course materials. Learners may be running an arbitrary operating system (various versions of Windows or OS X), may be working on work computers with aggressive IT-enforced security policies, or may be working on shared/public computers. Some courses may require links between different applications (for example, a data analysis package and a database system); in addition, some students may not be able to install any software on their own computer – how can we support them?

academic research environment: much academic software is difficult to install and may require an element of sysadmin skill, as well as a particular operating system and particular versions of supporting libraries. Why should a digital humanities researcher who wants to work with the text analysis tools provided in a particular package also have to learn sysadmin skills to install the software before they can use the functions that actually matter to them? Or consider a research group environment, where it's important that research group members have access to the same software configuration, but on their own machines.

data journalism environment: another twist on the research environment, data journalists may want to compartmentalise and preserve a particular analysis of a particular dataset, along with the tools associated with running those analyses, as “evidence”, in case the story they write on it is challenged in court. Or maybe they need to fire up a particular suite of interlinked tools for producing a particular story in quick time (from accessing the raw data for the first time to publishing the story within a few hours), making sure they work from a clean set up each time.

What we have here is a packaging problem. We also have a situation where the responsibility for installing a single copy of the application, or of several linked applications, falls to an individual user or small team working on an arbitrary platform with few, if any, sysadmin skills.

So can Docker help?

A couple of recent posts on the Docker blog set out to explore what Docker is not.

The first – Containers are not VMs – argues that Docker “is not a virtualization technology, it’s an application delivery technology”. The post goes on:

In a VM-centered world, the unit of abstraction is a monolithic VM that stores not only application code, but often its stateful data. A VM takes everything that used to sit on a physical server and just packs it into a single binary so it can be moved around. But it is still the same thing. With containers the abstraction is the application; or more accurately a service that helps to make up the application.

With containers, typically many services (each represented as a single container) comprise an application. Applications are now able to be deconstructed into much smaller components which fundamentally changes the way they are managed in production.

So, how do you backup your container, you don’t. Your data doesn’t live in the container, it lives in a named volume that is shared between 1-N containers that you define. You backup the data volume, and forget about the container. Optimally your containers are completely stateless and immutable.

The key idea here is that with Docker we have a “something” (in the form of a self-contained container) that implements an application’s logic and publishes the application as a service, but isn’t really all that interested in preserving the state of, or any data associated with, the application. If you want to preserve data or state, you need to store it in a separate persistent data container, or an alternative data storage service, that is linked to the application containers that call on it.
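To make that concrete, here’s a minimal sketch of the pattern using the Docker command line (the image name example/myapp and the mount paths are placeholders, not a particular application):

    # Create a named volume that holds the data independently of any container
    docker volume create appdata

    # Run the (placeholder) application container with the volume mounted at its data directory
    docker run -d --name myapp -v appdata:/var/lib/myapp/data example/myapp

    # Back up the data volume, not the container: archive its contents out to the host
    docker run --rm -v appdata:/data -v "$(pwd)":/backup busybox \
        tar czf /backup/appdata-backup.tar.gz -C /data .

    # The application container itself is disposable: remove it and recreate it at will
    docker rm -f myapp

The data survives in the appdata volume however many times the application container is torn down and rebuilt.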

The second post – There’s Application Virtualization and There’s Docker – suggests that “Docker is not application virtualization” in the sense of “put[ting] the application inside of a sandbox that includes the app and all its necessary DLLs. Or, … hosting the application on a server, and serving it up remotely…”, but I take issue with the way this can be misinterpreted as a general claim.

The post explicitly considers such application virtualisation in the context of applications that are “monolithic in that they contain their own GUI (vs. a web app that is accessed via a browser)”, things like Microsoft Office or other “traditional” desktop based applications, for example.

But many of the applications I am interested in are ones that publish their user interface as a service, of sorts, over HTTP in the form of a browser-based HTML UI, or that are accessed via the command line. For these sorts of applications, I believe that Docker represents a powerful environment for personal, disposable, application virtualisation. For example, dedicated readers of this blog may already be aware of my demonstrations (a rough sketch of the basic patterns follows this list) of how to:

run Jupyter notebooks, OpenRefine, RStudio, or R Shiny apps, either on their own or as linked applications, on the desktop or in the cloud, and access them via an HTTP UI;

run command-line apps in Docker containers and access them from a host command line;

run a desktop app in a container and expose it via a browser using Guacamole desktop virtualisation.
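As a rough sketch of those first two patterns (the images are real ones from Docker Hub, but the ports, password and filenames are just examples), running a browser-accessed application and a containerised command-line tool looks something like this:

    # Publish a browser-based UI from a container: a Jupyter notebook server on localhost:8888
    docker run -d -p 8888:8888 --name notebook jupyter/minimal-notebook

    # Similarly, RStudio Server in the browser on localhost:8787 (example password)
    docker run -d -p 8787:8787 -e PASSWORD=changeme --name rstudio rocker/rstudio

    # Run a command-line tool from the host shell, mounting the current directory as its workspace
    docker run --rm -v "$(pwd)":/work -w /work rocker/r-base Rscript analysis.R

    # An alias makes the containerised tool feel like a locally installed command
    alias Rscript='docker run --rm -v "$(pwd)":/work -w /work rocker/r-base Rscript'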

Via Paul Murrell, I also note an approach to defining pipelines for running Docker containers: An OpenAPI Pipeline for NZ Crime Data. Pipeline steps are defined in separate XML modules, and the whole pipeline is defined in another XML file. For example, the module step OpenAPI/NZcrime/region-Glendowie.xml runs a specified R command in a Docker container fired up to execute just that command. The pipeline definition file identifies the component modules as nodes in some sort of execution graph, along with the edges connecting them as steps in the pipeline. The pipeline manager handles the execution of the steps in order and passes state between steps in one of several ways (for example, via a shared file or a passed variable). (Further work on the OpenAPI pipeline approach is described in An Improved Pipeline for CPI Data.)
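I won’t try to reproduce the OpenAPI XML module format here, but the underlying step – firing up a throwaway container just to execute a single command and then discarding it – is essentially of this form (the image and filename are illustrative, not taken from the module):

    # Spin up a disposable R container to run one script, sharing state via the mounted directory;
    # the --rm flag removes the container as soon as the command completes
    docker run --rm -v "$(pwd)":/work -w /work rocker/r-base Rscript step.R

State can then be handed on to the next step in the pipeline via files written to the shared directory, one of the mechanisms mentioned above.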

What these examples show is that, as well as providing devops provisioning support for scalable applications and an environment for effective testing and rapid development of applications, Docker containers may also have a role to play in “user-pulled” applications.

This is not so much about thinking of Docker from an enterprise perspective, as an environment that supports the development and auto-scaled deployment of containerised applications and services; nor is it the view a web hosting service might take, of Docker images as an appropriate packaging format for the self-service deployment of long-lived services such as blogs or wikis (a Docker hub and deployment system to rival cPanel, for example).

Instead, it views containers from a single-user, desktop perspective, seeing Docker and its ilk as providing an environment for off-the-shelf, ready-to-run tools and applications that can be run locally or in the cloud, individually or in concert with each other.

Next up in this series: a reflection on the possibilities of a “Digital Library Application Shelf”.
