2015-02-04

Apache Mesos wants to make developing software for your data center as easy as for your laptop. I had put off reading up on Mesos for a while as it is more applicable to larger sites, but I had been hearing about it more frequently in the context of Docker so I finally did some research. Here is my quick summary.

What is Mesos

Mesos is interesting if you have a significant number of machines, especially if running different workloads. For example you may have a public web site you are hosting and a Hadoop cluster for bulk data processing. Mesos can help you get better utilization from a set of machines. If you run your own data center, you want to get the most out of the hardware you have as adding more hardware can be slow and expensive. If hosted by a cloud provider, adding more hardware is easier (not your problem) but you still have to pay for it. If you can use less hardware it reduces your operational costs, especially if you rent by the month (rather than say by the minute).

So Mesos is not likely to be interesting if you have only one or two servers running an application.

What Mesos does is allocate work to machines in a cluster. Imagine your machines as one big computer. It’s just your processors don’t have shared memory, meaning units of work have to talk to each other over sockets. A key difference is Mesos also worries about moving that work if (when!) a machine fails.

A cool little demo video can be found at http://mesosphere.com/learn/ – just click the “Watch Demo” button near the top of the page. Mesosphere is a commercial offering on top of Apache Mesos. It shows visualizing the load on your existing servers, adding new servers, support for fault tolerance when a server dies, and so on.

Before you get too excited however, I don’t believe reality will be quite as wonderful as the above video depicts. Mesos is not going to mean you never need to worry about tuning applications. And deciding to deploy a new Hadoop cluster is not the sort of thing you do every day. The good news is it does not have to be that wonderful. What is interesting to me is the ease at which you can flex your capacity up and down, and cope with failures (and even trigger them to test your system before you go live). The speed at which adjustments can be made can be particularly useful in emergencies where you need to react to a production issue quickly. This is where the benefits of such tools shine. Put another way, for me keeping things running matters more than setting things up for the first time.

Going Deeper

I am not going to summarize all of Mesos. Here is my 50,000 ft overview.

Consider say a high end Magento site with a cluster of 5 Varnish instances in front of a cluster of 50 web servers with a cluster of 1 master and 5 replica MySQL databases. (This is a made up configuration.) What Mesos will do is pick servers for these different units of work to run on. If a host dies, it will start up the work on a different server. Each unit of work might be, for example, code running in a Docker container. There would be Docker images for Varnish, the web server, etc, and Mesos would spin up instances of the images on various servers in the cluster.

To connect the containers together Mesos (or to be precise Marathon) can use HA Proxy instances on each server to direct network traffic to the appropriate destination. Periodically each server checks back to a master to see if the HA Proxy configuration needs updating (e.g. if new machines are available or have been removed). This means the units of work don’t need major reconfiguration when a server fails – topology changes are rolled out by changing HA Proxy configuration files on machines in the cluster. See http://mesosphere.com/docs/getting-started/service-discovery/ for more details.

If you only run the above Magento installation, you may decide against using Mesos in favor of controlling your topology more carefully (with the aim of getting better performance). Where Mesos gets more interesting is when you also have say a Hadoop cluster that ever night analyzes your site activity for the day. You are happy for this to run over night during off peak. So ideally each night you would like to reduce the number of web servers running and increase the number of Hadoop nodes. Then the next morning you switch capacity back to the web servers. Making it easier to switch workloads is where I can see real benefit in technologies like Mesos.

One thing to realize is Mesos will recover if a server dies, but it is not always instantaneous. It might take it a minute or two to realize and adjust. HA Proxy is configured to point to multiple possible destinations, so most of the time one server going down won’t be noticed (with appropriately redundant nodes) – but there are some faults that will. It is still better to have your site come back without a human being around, even if the site does go down for a few minutes. Especially if it is the middle of the night!  And of course you can change the heartbeat rate to detect problems faster if needed.

What Mesos Does Not Do

Mesos is not magic. It is a useful part of an overall solution, but it does not solve all High Availability (HA) problems. For example, Mesos will not make your application fault tolerant. You need to design your code so it can be killed at any time and restarted at any time, ideally leaving and joining a cluster gracefully. Mesos is also not a load balancer. And it also does not impose quotas or limits.

Mesos is a scheduler – allocating work to servers, reallocating it if the server the work was running on dies. It can also spin up new instances of that work, or remove instances. Oh, and there is Chronos for scheduling jobs (a version of cron for your whole data center), health checks, and support for rolling deployments (allowing new software to be deployed without the site going down).

Application Design Implications

So how should you design an application to run under Mesos? Here are some qualities to take into account. The good news is these are not specific to Mesos.

Fault tolerant – it should be designed so a server can be killed but the overall application keeps running.

Scalable – adding more hardware should make your application run faster. If you bottleneck on a shared resource then adding more hardware may not help.

Elastic – it should be possible to add or remove hardware without your application stopping.

Multi-tenant – the code needs to be good neighbors as other applications may be running on the same server. Hammering the network or local disk too hard is not good on a shared server.

Rolling deployments – well designed code can co-exist with one older version of the code running on same cluster (particularly tricky when it comes to database schema changes).

Health checks – design monitoring support into your application by working out what metrics best represent the health of the application.

These qualities have implications on how to design your application.

Decompose your application into discrete services on the boundaries of fault domains, scaling, and data workload. For example, a web server should not depend on local file storage to survive outages. Using a distributed Redis cache may provide the required resilience here. Magento having all the processing logic in the web server makes it easier to recover from problems than if the business logic was distributed across multiple servers.

You may decide to have more lower-powered servers to minimize the impact of losing a node. Don’t design your code assuming super-powered servers.

Make as many things as possible stateless.

When dealing with state, think about issues such as concurrency, bottlenecks, throughput, latency, and durability of the data. Shared resources are necessary, but a common source of throughput bottlenecks.

Keep an eye on network utilization. If you split your business logic on the wrong boundaries, you may end up saturating your network with traffic between services. Understanding the limitations of your data center or cloud hosting provider is important here. Even if two hosting providers support the same hardware configurations, the network between them can have very different behaviors.

Coming back to Magento as an example, Varnish nodes are independent – it’s safe to add or remove nodes behind a load balancer at any time. So that is good. Web servers running the Magento code don’t have to preserve state – that is stored in MySQL (or Redis or memcached). So that is also good. Redis has distributed support. That leaves the database.

The biggest weakness in the design of most Magento topologies trying to achieve HA is the database. One approach is to have a single master database with read-only replicas, where if the master goes down one of the replicas is then reconfigured to become the master. Another approach is to have master-master replication, but only write to one of the MySQL instances (allowing flipping to the other master if the first master goes down with less configuration changes). Yet another approach is to use a different database technology like MySQL Cluster or Clustrix which build fault tolerance into the base product.

Conclusions

Mesos does look a useful tool for organizations running multiple workloads. It can simplify spinning up and down workloads on machines. This in turn can give you efficiencies if you can plan out your workload so that, for example, you run more batch jobs during off peak hours. Making it trivial (common) to spin up and down web server nodes for example takes the risk out of this (which is also good for seasonal peak times). This can help organizations get more out of their hardware.

For me with my personal focus on Magento, Mesos does not change my interest in Docker for defining standard “lego blocks” from which to build a Magento installation. For a large site, Mesos could be used to make sure the Docker containers stay running. For a small site, you would just use Docker directly. But it is rather cool to see where the industry is heading.

Show more