2015-10-07

An instance metadata service is a server available to virtual machines hosted on
cloud providers (often at http://169.254.169.254/). It provides useful
information about the VM itself and its environment, which the VM would
otherwise not have access to.

It is often used in scripts to configure VM instances and distinguish them from
each other, and it helps a great deal in bootstrapping cluster orchestrators
such as Kubernetes and Mesos.

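In a nutshell, the metadata server works like this: a program on the VM sends a
plain HTTP GET request to the metadata address and gets a small text or JSON
response back.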
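As a rough illustration of that request/response cycle, here is a minimal Python sketch. The helper names `metadata_url` and `fetch_metadata` are made up for this example, and the paths under the address differ between providers. `fetch_metadata` only works from inside a VM, since the address is not reachable from elsewhere.

```python
import urllib.request
from typing import Dict, Optional

# Link-local address that the providers covered here use for the metadata server.
METADATA_HOST = "http://169.254.169.254"


def metadata_url(path: str) -> str:
    """Join a provider-specific path onto the metadata host."""
    return f"{METADATA_HOST}/{path.lstrip('/')}"


def fetch_metadata(path: str,
                   headers: Optional[Dict[str, str]] = None,
                   timeout: float = 2.0) -> str:
    """GET a metadata path and return the response body as text."""
    request = urllib.request.Request(metadata_url(path), headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

On a VM, a call such as `fetch_metadata("latest/meta-data/")` (an EC2-style path) would return a plain-text listing of the available keys.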

In this article, I look at the metadata service offerings of AWS EC2, Google
Compute Engine and DigitalOcean and compare them. At the time of writing,
Microsoft Azure does not provide a metadata service similar to these.

Table of Contents

1. DigitalOcean: Highlights
2. AWS EC2: Highlights
3. Google Compute Engine: Highlights
4. Feature Comparison Chart
5. Performance Benchmarks
6. Conclusion

1. DigitalOcean: Highlights

Documentation: https://developers.digitalocean.com/…/metadata/

Although DigitalOcean is not a big player or a full-blown cloud provider, their
VPS offering is widely adopted and their lean approach to cloud instances
(droplets) is very practical to use.

Good:

It is minimalist, simply because the environment is. It provides user-data
(cloud-init), public IP, region etc.

The directory queries can be retrieved as JSON (instead of plain text) if you
append .json to the URL.

Bad:

It could’ve made a DigitalOcean API token available on the metadata service to
automate certain operations (such as scale up) within the droplet.

No dynamic metadata. user-data cannot be changed after the droplet is created.
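The `.json` suffix mentioned above can be sketched as follows. This is only an illustration: the `/metadata/v1/` prefix is taken from the DigitalOcean documentation rather than from this article, the function names are made up, and the sample listing in the usage note is hypothetical.

```python
import json
from typing import Any, List


def json_variant(directory_url: str) -> str:
    """Return the URL that asks for a directory as JSON instead of plain text."""
    return directory_url.rstrip("/") + ".json"


def parse_plain_listing(body: str) -> List[str]:
    """Split a plain-text directory listing into its entries, one per line."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def parse_json_listing(body: str) -> Any:
    """Decode the JSON form of the same directory."""
    return json.loads(body)
```

For example, `json_variant("http://169.254.169.254/metadata/v1/")` yields `http://169.254.169.254/metadata/v1.json`, while a plain-text body such as `"id\nhostname\nregion\n"` parses into three entries.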

2. AWS EC2: Highlights

Documentation: https://docs.aws.amazon.com/…/ec2-instance-metadata.html

Amazon Web Services was pretty much the first player in the cloud market; in
fact, they may well be the ones who invented the whole concept of an “instance
metadata service” and the IP address 169.254.169.254.

Although it is very much the de facto standard of metadata services, I found it
dated and not really dynamic.

Good:

The ami-launch-index field numbers the instances (0, 1, 2, …) when
multiple instances of the same AMI are launched. This is only marginally useful.

If there are IAM roles associated with the instance, security credentials are
available on the metadata service, which rotates them automatically.

However, the metadata service does not take certain measures to protect
them (read on for what GCE does).
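Because the credentials rotate, a script has to read them again before they expire. Below is a minimal sketch of handling the credentials document, assuming it is the JSON served under `latest/meta-data/iam/security-credentials/<role-name>` with the `AccessKeyId`, `SecretAccessKey`, `Token` and `Expiration` fields described in the AWS documentation. The function names and the five-minute margin are illustrative choices, not part of any AWS API.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


def parse_credentials(body: str) -> Dict[str, Any]:
    """Extract the temporary credentials and their expiry from the JSON body."""
    doc = json.loads(body)
    # Expiration is an ISO 8601 UTC timestamp such as 2015-10-07T18:30:00Z.
    expires_at = datetime.strptime(
        doc["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "access_key_id": doc["AccessKeyId"],
        "secret_access_key": doc["SecretAccessKey"],
        "token": doc["Token"],
        "expires_at": expires_at,
    }


def needs_refresh(credentials: Dict[str, Any],
                  now: datetime,
                  margin: timedelta = timedelta(minutes=5)) -> bool:
    """Return True once we are within `margin` of the expiry time."""
    return now >= credentials["expires_at"] - margin
```

A long-running script would call `needs_refresh(credentials, datetime.now(timezone.utc))` before each use and fetch the document again when it returns `True`.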

Bad:

Tags provided for the instance on EC2 Management Console are not available on
the metadata service. I wonder why.

The AWS CLI does not automatically authenticate, even though credentials may be
available on the metadata service.

Versioning is confusing: they have version numbers like 1.0 and 2015-01-05,
with no way to tell which one is the newest. Luckily, you can just say latest in
the URL.

It seems like there is a JSON endpoint instance-identity/document, but it
looks like a soup.

I observed a lack of strict validation of the URI segments.

3. Google Compute Engine: Highlights

Documentation: https://cloud.google.com/compute/docs/metadata

Maybe it’s the advantage of being the last one to join the party, but GCE’s
metadata service is just perfect. It provides a great deal of flexibility and is
very dynamic, yet it is still not rocket science.

Good:

Google allows you to set dynamic project-wide metadata (key-value pairs, up to 32k). Any project metadata is available to all VMs within the project. Imagine this as the shared metadata among members of a machine cluster.

Also, you can set custom instance metadata (k/v pairs and tags) on the instance and these will be available to the VM within 10 seconds. The “dynamic” aspect is a key differentiator.

The gcloud command-line tool automatically authenticates and works out of the box when the VM is provisioned (for instance, you can delete the VM you are currently on). This is very neat.

Speaking of dynamic metadata, if you provide ?wait_for_change=true, the metadata service holds your request open and returns a response when something changes (such as a new tag being added or the VM migration policy being changed), although I could not get it working with external IP changes.

The metadata service makes transparent maintenance notices available when your VM is about to get rebooted or migrated. You can subscribe to these using ?wait_for_change=true.

Like AWS EC2, GCE also makes service credentials (such as for Storage and BigQuery) available on the metadata service and rotates these keys automatically.

As a security measure, to prevent accidental proxied access to the metadata service, it refuses to respond to queries containing the X-Forwarded-For header. I think it is a nice touch.

Like DigitalOcean, you can get a JSON response by adding ?recursive=true to your request (although this does not work for tokens in instance/service-accounts/).
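The ?wait_for_change=true behaviour amounts to a long-polling loop on the client side. Here is a minimal sketch, assuming the `computeMetadata/v1/` path prefix from the GCE documentation (it is not spelled out in this article). The function names `build_watch_request` and `watch` are made up for illustration, and `watch` only works from inside a GCE VM.

```python
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Assumed base path, taken from the GCE documentation.
GCE_BASE = "http://169.254.169.254/computeMetadata/v1/"


def build_watch_request(path: str, recursive: bool = False) -> urllib.request.Request:
    """Build a request that blocks until the metadata at `path` changes."""
    params = {"wait_for_change": "true"}
    if recursive:
        params["recursive"] = "true"  # ask for the whole subtree as JSON
    url = GCE_BASE + path.lstrip("/") + "?" + urllib.parse.urlencode(params)
    # GCE rejects requests that lack this header.
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def watch(path: str,
          on_change: Callable[[str], None],
          max_events: Optional[int] = None) -> None:
    """Call `on_change` with the new value each time the server responds."""
    seen = 0
    while max_events is None or seen < max_events:
        with urllib.request.urlopen(build_watch_request(path)) as response:
            on_change(response.read().decode("utf-8"))
        seen += 1
```

Something like `watch("instance/tags", print)` would then print the tag list every time it changes. A production version would also want timeouts and retry handling, which this sketch leaves out.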

Bad:

You have to provide the Metadata-Flavor: Google header on every request. I am
not sure why this is needed.

There is an instance/virtualClock endpoint that is not documented. No big deal.

The VM description is available on the metadata service, but the disk
description is not.
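The two header rules described in this section (the required Metadata-Flavor: Google header and the rejection of requests carrying X-Forwarded-For) can be modelled with a toy check like the one below. This is purely an illustration of the observed behaviour, not Google's actual implementation, and `should_serve` is a hypothetical name.

```python
from typing import Dict


def should_serve(headers: Dict[str, str]) -> bool:
    """Toy model of the header rules described above."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    if "x-forwarded-for" in normalized:
        return False  # the request appears to have come through a proxy
    return normalized.get("metadata-flavor") == "Google"
```

Under this model, a request with only `Metadata-Flavor: Google` is served, while one that has no such header, or that also carries `X-Forwarded-For`, is refused.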

4. Feature Comparison Chart

Feature                  DO    AWS   GCE
cloud-init               Yes   Yes   Yes
External IP              Yes   Yes   Yes
SSH Public Keys          Yes   Yes   Yes
Region/Zone              Yes   Yes   Yes
Disks                    N/A   Yes   Yes
Machine type/size        No    Yes   Yes
Dynamic custom metadata  No    No    Yes
Watch for changes        No    No    Yes
Security credentials     No    Yes   Yes
JSON response format     Yes   Meh   Yes

5. Performance Benchmarks

Metadata services are often meant to be used only once to bootstrap things or
maybe a few times a day, so you don’t really care about performance. However,
out of curiosity, I tested the performance of these metadata services by sending
10,000 requests (100 requests in parallel) to each and seeing how they performed.
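A load test of this shape can be approximated with a thread pool. The sketch below is not the tool used for the numbers in this article. It only illustrates the 10,000 requests at a concurrency of 100, and the names `fetch_status`, `run_benchmark` and `percentile` are made up for the example.

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for one GET; HTTP errors are counted, not raised.

    Connection failures and timeouts still propagate as exceptions.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def run_benchmark(fetch: Callable[[], int],
                  total: int = 10_000,
                  concurrency: int = 100) -> Tuple[Counter, List[float]]:
    """Call `fetch` `total` times with at most `concurrency` calls in flight."""
    def timed_call(_: int) -> Tuple[int, float]:
        started = time.perf_counter()
        status = fetch()
        return status, time.perf_counter() - started

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total)))
    statuses = Counter(status for status, _ in results)
    latencies = sorted(elapsed for _, elapsed in results)
    return statuses, latencies


def percentile(sorted_values: List[float], fraction: float) -> float:
    """Pick the value at the given fraction of an already sorted list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]
```

From inside a VM, `run_benchmark(lambda: fetch_status(url))` with a provider-specific `url` returns a count per status code plus the sorted latencies, so `percentile(latencies, 0.99)` gives a feel for the long tail.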

DigitalOcean applied some form of throttling (presumably based on an
undocumented rate limit) in some test runs, but it usually recovered quickly
afterwards.

Google Compute Engine performed really well at this concurrency level. When I
bumped up the load and the concurrency, a long tail started to show up and the
server got slower, as expected. I observed no explicit throttling.

The AWS EC2 instance metadata service performed far worse than the others
under load and frequently returned HTTP 409 Conflict responses. I managed to get
a fully successful run once I lowered the concurrency level to below 10.

6. Conclusion

It’s clear that the Google Compute Engine instance metadata service is well
thought out and carefully designed. I can see it being useful in many
scenarios such as cluster bootstrapping.

AWS EC2 and DigitalOcean do not support custom metadata and are not very
dynamic, which has been a big turn-off for me.

I appreciate any comments, discussion and possibly comparisons with other
environments such as OpenStack Nova.
