Ansible is a very pragmatic and powerful configuration management system that
is easy to get started with.
Connections and Inventory
Ansible is typically used to connect to one or more remote hosts
via ssh and bring them into a desired state. The connection method is
pluggable: other methods include local, which simply invokes the commands on
the local host instead, and docker, which connects through the Docker daemon
to configure a running container.
To tell Ansible where and how to connect, you write an inventory file,
called hosts by default. In the inventory file, you can define hosts and
groups of hosts, and also set variables that control how to connect to them.
(In versions prior to Ansible 2.0, you had to use ansible_ssh_user instead
of ansible_user.) See the introduction to inventory files for more details.
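For example, a small inventory could look like this (the hostnames are
illustrative; www01.example.com and the web and database groups match the
examples used below):

```ini
[web]
www01.example.com
www02.example.com

[database]
db01.example.com

# connection variables can be set per host or per group
[all:vars]
ansible_user=admin
```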
To test the connection, you can use the ping module on the command line:
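Using the myinventory file and the web group from above, such an invocation
and its output might look roughly like this (the exact output format varies
between Ansible versions):

```shell
$ ansible -i myinventory web -m ping
www01.example.com | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
www02.example.com | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
```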
Let's break the command line down into its components: -i myinventory
tells Ansible to use the myinventory file as inventory. web tells
Ansible which hosts to work on. It can be a group, as in this example, or a
single host, or several such things separated by a colon. For example,
www01.example.com:database would select one of the web servers and all of
the database servers. Finally, -m ping tells Ansible which module to
execute. ping is probably the simplest module: it simply returns the
response "pong" and reports that it did not change the remote host.
These commands run in parallel on the different hosts, so the order in which
these responses are printed can vary.
If there is a problem with connecting to a host, add the option -vvv to get
more detailed debugging output.
Ansible implicitly gives you the group all which -- you guessed it --
contains all the hosts configured in the inventory file.
Modules
Whenever you want to do something on a host through Ansible, you invoke a
module to do that. Modules usually take arguments that specify what exactly
should happen. On the command line, you can add those arguments with ansible
-m module -a 'arguments', for example:
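A simple example, running a command on the web group via the shell module:

```shell
ansible -i myinventory web -m shell -a 'uptime'
```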
Ansible comes with a wealth of built-in modules and an ecosystem of
third-party modules as well. Here I want to present just a few commonly-used
ones.
The shell Module
The shell module
executes a shell command on the host and accepts some options such as chdir
to change into another working directory first:
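On the command line, options such as chdir are passed as key=value pairs
alongside the free-form command, for example:

```shell
ansible -i myinventory web -m shell -a 'ls -l chdir=/tmp'
```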
It is pretty generic, but also an option of last resort. If there is a more
specific module for the task at hand, you should prefer the more specific
module. For example, you could ensure that system users exist using the shell
module, but the more specialized user
module is much easier to
use for that, and likely does a better job than an improvised shell script.
The copy Module
With copy you can
copy files verbatim from the local to the remote machine:
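For example (the local source path is illustrative; -b enables privilege
escalation, which writing to /etc requires):

```shell
ansible -i myinventory web -b -m copy -a 'src=files/motd dest=/etc/motd'
```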
The template Module
The template module works like copy, but it interprets the source file as a
Jinja2 template before transferring it to the
remote host.
This is commonly used to create configuration files and to incorporate
information from variables (more on that later).
Templates cannot be used directly from the command line, but rather in
playbooks, so here is an example of a simple playbook.
More on playbooks later, but what you can see is that this defines a variable
team, sets it to the value Slacker, and the template interpolates this
variable.
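A sketch of such a playbook, assuming a template file motd.j2 next to it
(the file names and the message text are illustrative):

```yaml
# motd.yml
- hosts: database
  become: true        # needed to write to /etc/motd
  vars:
    team: Slacker
  tasks:
    - template:
        src: motd.j2
        dest: /etc/motd
```

with motd.j2 containing, for example:

```
This machine is maintained by team {{ team }}.
```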
When you run the playbook with ansible-playbook, it creates a file /etc/motd
on the database server whose contents are the rendered template, with the
team variable interpolated.
The file Module
The file module manages
attributes of files, such as permissions, but also allows you to create
directories and soft and hard links.
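For example, to ensure that a directory exists with certain permissions
(the path is illustrative):

```shell
ansible -i myinventory web -m file -a 'path=/tmp/mydir state=directory mode=0755'
```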
The apt Module
On Debian and derived distributions, such as Ubuntu, installing and removing
packages is generally done with package managers from the apt family, such
as apt-get, aptitude, and in newer versions, the apt binary directly.
The apt module manages
this from within Ansible:
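For example, ensuring that the screen package is installed might look
roughly like this (output abbreviated; the exact format varies between
Ansible versions):

```shell
$ ansible -i myinventory web -b -m apt -a 'name=screen state=present'
www01.example.com | SUCCESS => {
    "changed": false,
    ...
}
```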
Here the screen package was already installed, so the module didn't change
the state of the system.
Separate modules are available for managing
apt keys, with which
repositories are cryptographically verified, and for managing the
repositories themselves.
The yum and zypper Modules
For RPM-based Linux distributions, the yum
module (core) and zypper
module (not in core, so
must be installed separately) are available. They manage package installation
via the package managers of the same name.
The package Module
The package module
tries to use whatever package manager it detects. It is thus more generic than
the apt and yum modules, but supports far fewer features. For example in
the case of apt, it does not provide any control over whether to run apt-get
update before doing anything else.
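A generic invocation looks much like the apt example above:

```shell
ansible -i myinventory web -b -m package -a 'name=screen state=present'
```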
The modules presented so far are fairly close to the system, but there are
also modules for achieving common, application-specific tasks. Examples
include dealing with databases such as MySQL and PostgreSQL,
network-related services,
clustering solutions such as
Kubernetes, and so on.
Playbooks
Playbooks can contain multiple calls to modules in a defined order and limit
their execution to individual hosts or groups of hosts.
They are written in the YAML file format, a data
serialization file format that is optimized for human readability.
Here is an example playbook that installs the newest version of the go-agent
Debian package, the worker for Go Continuous Delivery:
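A playbook along these lines could look as follows. The repository URL, the
signing-key URL, the extra package list, and the GO_SERVER variable in
/etc/default/go-agent are assumptions based on typical GoCD installations,
and the server hostname is illustrative:

```yaml
# go-agent.yml
- hosts: go-agent
  become: true
  vars:
    go_server: go-server.example.com
  tasks:
    - name: Ensure apt can fetch over HTTPS
      apt: name=apt-transport-https state=present

    - name: Add the GoCD apt repository     # URL is an assumption
      apt_repository:
        repo: 'deb https://download.gocd.io /'
        state: present

    - name: Add the GoCD signing key        # URL is an assumption
      apt_key:
        url: https://download.gocd.io/GOCD-GPG-KEY.pub
        state: present

    - name: Install the newest go-agent package
      apt: name=go-agent state=latest update_cache=yes

    - name: Install additional packages     # package list is illustrative
      apt: name={{ item }} state=present
      with_items:
        - git
        - openjdk-8-jre-headless

    - name: Point the agent at the GoCD server
      lineinfile:
        dest: /etc/default/go-agent
        regexp: '^GO_SERVER='
        line: 'GO_SERVER={{ go_server }}'

    - name: Ensure the agent is running and starts on boot
      service: name=go-agent state=started enabled=yes
```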
The top level element in this file is a one-element list. The single element
starts with hosts: go-agent, which limits execution to hosts in the group
go-agent. This is the relevant part of the inventory file that goes with it:
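Such an inventory section might look like this (hostnames illustrative):

```ini
[go-agent]
go-worker01.example.com
go-worker02.example.com
```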
Then it sets the variable go_server to a string, here this is the hostname
where a GoCD server runs.
Finally, the meat of the playbook: the list of tasks to execute.
Each task is a call to a module, some of which have already been discussed.
A quick overview:
First, the Debian package apt-transport-https is installed, to make sure
that the system can fetch metadata and files from Debian repositories over
HTTPS.
The next two tasks use the apt_repository
and apt_key modules
to configure the repository from which the actual go-agent package shall
be fetched.
Another call to apt installs the desired package. Also, some more
packages are installed with a loop.
The lineinfile module
searches by regex for a line in a text file, and replaces the matching line
with pre-defined content. Here we use that to configure the GoCD server that
the agent connects to.
Finally, the service
module starts the agent if it's not yet running (state=started), and
ensures that it is automatically started on reboot (enabled=yes).
Playbooks are invoked with the ansible-playbook command.
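For example (the playbook file name is illustrative):

```shell
ansible-playbook -i myinventory go-agent.yml
```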
There can be more than one list of tasks in a playbook, which is a common
use-case when they affect different groups of hosts:
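A sketch with two plays, each targeting a different group (the package
names are illustrative):

```yaml
- hosts: web
  become: true
  tasks:
    - apt: name=nginx state=present

- hosts: database
  become: true
  tasks:
    - apt: name=postgresql state=present
```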
Variables
Variables are useful both for controlling flow inside a playbook, and for
filling out spots in templates to generate configuration files.
There are several ways to set variables. One is directly in playbooks, via
vars: ..., as seen before. Another is to specify them at the command line:
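For example, using the -e (alias --extra-vars) option with the motd playbook
from earlier (the playbook file name is illustrative):

```shell
ansible-playbook -i myinventory motd.yml -e 'team=Slacker'
```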
Another, very flexible way is to use the group_vars feature. For each group
that a host is in, Ansible looks for a file group_vars/thegroup.yml and
for files matching group_vars/thegroup/*.yml. A host can be in several
groups at once, which gives you quite some flexibility.
For example, you can put each host into two groups, one for the role the
host is playing (like webserver, database server, DNS server etc.), and one
for the environment it is in (test, staging, prod). Here is a small example
that uses this layout:
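An inventory following this layout might look like this (hostnames
illustrative):

```ini
[webserver]
www01.example.com
www02.example.com

[database]
db01.example.com
db02.example.com

[test]
www01.example.com
db01.example.com

[prod]
www02.example.com
db02.example.com
```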
To roll out changes to only the test hosts, you can limit execution to the
test group, and put environment-specific variables in group_vars/test.yml and
group_vars/prod.yml, and web server specific variables in
group_vars/webserver.yml.
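Assuming a playbook named site.yml, such a limited run could look like:

```shell
ansible-playbook -i myinventory site.yml --limit test
```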
You can use nested data structures in your variables, and if you do, you can
configure Ansible to merge those data structures for you. You can configure it
by creating a file called ansible.cfg with this content:
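The setting in question is hash_behaviour (the default is replace):

```ini
[defaults]
hash_behaviour = merge
```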
That way, you can have a file group_vars/all.yml that sets the defaults.
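For example (apart from myapp.db.username, the concrete keys and values
here are illustrative):

```yaml
# group_vars/all.yml
myapp:
  db:
    host: localhost
    username: myapp
    password: secret
```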
And then override individual elements of that nested data structure, for
example in group_vars/test.yml:
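For example, overriding only the database host (the key and value are
illustrative):

```yaml
# group_vars/test.yml
myapp:
  db:
    host: db01.test.example.com
```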
The keys that the test group vars file didn't touch, for example
myapp.db.username, are inherited from the file all.yml.
Roles
Roles are a way to encapsulate parts of a playbook into a reusable component.
Let's consider a real world example that leads to a simple role definition.
For deploying software, you always want to deploy the exact
version you have chosen,
so the relevant part of the playbook is:
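With the apt module, an exact version can be requested by appending it to
the package name, so the task might look like this (using the go-agent
package from the earlier example):

```yaml
- apt:
    name: go-agent={{ package_version }}
    state: present
```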
But this requires you to supply the package_version variable whenever you
run the playbook, which is not practical when you instead configure a new
machine and need to install several software packages, each with its own
version number.
Hence, we generalize the code to deal with the case that the version number is
absent:
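A sketch of one way to do this: install the pinned version only when
package_version is defined, and the latest version otherwise:

```yaml
- apt:
    name: go-agent={{ package_version }}
    state: present
  when: package_version is defined

- apt:
    name: go-agent
    state: latest
  when: package_version is not defined
```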
If you run several such playbooks on the same host, you'll notice that it
likely spends most of its time running apt-get update for each playbook. This
is necessary the first time, because you might have just uploaded a new
package to your local Debian mirror prior to the deployment, but subsequent
runs are unnecessary. So you can store the information that a host has already
updated its cache in a fact,
which is a per-host kind of variable in Ansible.
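Putting this together might look as follows. The fact name
apt_cache_updated is an illustrative choice; this works across plays within
a single ansible-playbook run, or across runs if fact caching is enabled:

```yaml
- apt: update_cache=yes
  when: apt_cache_updated is not defined

- set_fact:
    apt_cache_updated: true

- apt:
    name: go-agent={{ package_version }}
    state: present
  when: package_version is defined

- apt:
    name: go-agent
    state: latest
  when: package_version is not defined
```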
As you can see, the code base for sensibly installing a package has grown a
bit, and it's time to factor it out into a role.
Roles are collections of YAML files with pre-defined names and locations.
The following commands create an empty skeleton for a role named
custom_package_installation:
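For a role that only needs tasks, creating the directory layout by hand is
enough (ansible-galaxy init is an alternative that generates a fuller
skeleton):

```shell
mkdir -p roles/custom_package_installation/tasks
touch roles/custom_package_installation/tasks/main.yml
```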
The tasks that previously went into all the playbooks now go into the file
tasks/main.yml below the role's main directory:
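A sketch of that file, generalized to take the package name as a role
parameter (the variable names package and apt_cache_updated are illustrative
choices):

```yaml
# roles/custom_package_installation/tasks/main.yml
- apt: update_cache=yes
  when: apt_cache_updated is not defined

- set_fact:
    apt_cache_updated: true

- apt:
    name: "{{ package }}={{ package_version }}"
    state: present
  when: package_version is defined

- apt:
    name: "{{ package }}"
    state: latest
  when: package_version is not defined
```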
To use the role, first add the line roles_path = roles in the file
ansible.cfg in the [defaults] section, and then include it in a playbook:
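A sketch of such a playbook (the pre_tasks and tasks entries are
illustrative):

```yaml
- hosts: go-agent
  become: true
  pre_tasks:
    - apt: name=apt-transport-https state=present
  roles:
    - { role: custom_package_installation, package: go-agent }
  tasks:
    - service: name=go-agent state=started enabled=yes
```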
pre_tasks and tasks are optional; a playbook consisting of only roles
being included is totally fine.
Summary
Ansible offers a pragmatic approach to configuration management, and is easy
to get started with.
It offers modules for low-level tasks such as transferring files and
executing shell commands, but also higher-level tasks like managing packages and
system users, and even application-specific tasks such as managing PostgreSQL
and MySQL users.
Playbooks can contain multiple calls to modules, and also use and set
variables and consume roles.
Ansible has many more features, like handlers, which allow you to restart
services only once after any changes, dynamic inventories for more flexible
server landscapes, vault for encrypting
variables, and a rich
ecosystem of existing roles for managing common applications and middleware.
For learning more about Ansible, I highly recommend the excellent book
Ansible: Up and Running by Lorin Hochstein.
I'm writing a book on automating
deployments. If this topic interests you, please sign up for the Automating Deployments
newsletter. It will keep you informed about automating and continuous
deployments. It also helps me to gauge interest in this project, and your
feedback can shape the course it takes.