Mysqlperformanceblog.com

Migrating MongoDB Away from MMAPv1

2017-03-09

This is another post in the series of blogs on the Percona Server for MongoDB 3.4 bundle release. In this blog post, we’ll discuss moving away from the MMAPv1 storage engine.

Introduction

WIth the MongoDB v3.0 release in February of 2015, the long-awaited ability to choose storage engines became a reality. As of version 3.0, you could choose two engines in MongoDB Community Server and, if you use Percona Server for MongoDB, you could choose from four. Here’s a table for ease of consumption:

Here’s a table for easy consumption:

Storage Engine

Percona Server for MongoDB

MongoDB Community Server

MongoDB Enterprise Server (licensed)

MMAPv1

✔

✔

✔

WiredTiger

✔

✔

✔

MongoRocks

✔

In-memory

✔

✔

Encrypted

✔

Why change engines?

With increased possibilities comes an increase in the decision-making process difficult (a concept that gets reinforced every time I take my mother out a restaurant with a large menu – ordering is never quick). In all seriousness, the introduction of the storage engine API to MongoDB is possibly the single greatest feature MongoDB, Inc has released to-date.

One of the biggest gripes from the pre-v3.0 days was MongoDB’s lack of scale. This was mostly due to the MMAPv1 storage engine, which suffered from a very primitive locking scheme. If you would like a illustration of the problem, think of the world’s biggest supermarket with one checkout line – you might be able to fit in lots of shoppers, but they’re not going to accomplish their goal quickly. So, the ability to increase performance and concurrency with a simple switch is huge! Additionally, modern storage engines support compression. This should reduce your space utilization when switching by at least 50%.

All the way up to MongoDB v3.2, the default storage engine was MMAPv1. If you didn’t make a conscious decision about what storage engine to choose when you started using MongoDB, there is a good chance that MMAPv1 is what you’re on. If you’d like to find out for sure what engine you’re using, simply run the command below. The output will be the name of the storage engine. As you can see, I was running the MMAPv1 storage engine on this machine. Now that we understand where we’re at, let’s get into where we can be in the future.

Public Service Announcement

Before we get into what storage engine(s) to evaluate, we need to talk about testing. In my experience, a majority of the MySQL and MongoDB community members are rolling out changes to production without planning or testing. If you’re in the same boat, you’re in very good company (or at least in a great deal of company). However, you should stop this practice. It’s basic “sample size” in statistics – when engaged in risk-laden behavior, the optimal time to stop increasing the sample size is prior to the probability of failure reaching “1”. In other words, start your testing and planning process today!

At Percona, we recommend that you thoroughly test any database changes in a testing or development environment before you decide to roll them into production. Additionally, prior to rolling the changes into production (with a well thought out plan, of course), you’ll need to have a roll-back plan in case of unintended consequences. Luckily, with MongoDB’s built-in replication and election protocols, both are fairly easy. The key here is to plan. This is doubly true if you are undertaking a major version upgrade, or are jumping over major versions. With major version upgrades (or version jumps) comes the increased likelihood of a change in database behavior as it relates to your application’s response time (or even stability).

What should I think about?

In the table above, we listed the pre-packaged storage engine options that are available to us and other distributions. We also took a look at why you should consider moving off of MMAPv1 in the preceding section. To be clear, in my opinion a vast majority of MongoDB users that are on MMAPv1 can benefit from a switch. Which engine to switch to is the pressing question. Your first decision should be to evaluate whether or not your workload fits into the sweet spot for MMAPv1 by reading the section below. If that section doesn’t describe your application, then the additional sections should help you narrow down your choices.

Now, let’s take a look at what workloads match up with what storage engines.

MMAPv1

Believe it or not, there are some use cases where MMAPv1 is likely to give you as good (or better) performance as any other engine. If you’re not worried about the size of your database on disk, then you may not want to bother changing engines. Users that are likely to see no benefit from changing have read-heavy (or 100%) read applications. Also, certain update-heavy use cases, where you’re updating small amounts of data or performing $set operations, are likely to be faster on MMAPv1.

WiredTiger

WiredTiger is a the new default storage engine for MongoDB. It is a good option for general workloads that are currently running on MMAPv1. WiredTiger will give you good performance for most workloads and will reduce your storage utilization with compression that’s enabled by default. If you have a write-heavy workload, or are approaching high I/O utilization (>55%) with plans for it to rise, then you might benefit from a migration to WiredTiger.

MongoRocks (RocksDB from Facebook)

This is Facebook’s baby, which was forged in the fires of the former Parse business unit. MongoRocks, which uses LSM indexing, is advertised as “designed to work with fast storage.” Don’t let this claim fool you. For workloads that are heavy on writes, highly concurrent or approaching disk bound, MongoRocks could give you great benefits. In terms of compression, MongoRocks has the ability to efficiently handle deeper compression algorithms, which should further decrease your storage requirements.

In-Memory

The in-memory engine, whether we’re speaking about the MongoDB or Percona implementation, should be used for workloads where extreme low latency is the most important requirement. The types of applications that I’m talking about are usually low-latency, “real-time” apps — like decision making or user session tracking. The in-memory engine is not persistent, so it operates strictly out of the cache allocated to MongoDB. Consequently, the data may (and likely will) be lost if the server crashes.

Encrypted

This is for applications in highly secure environments where on-disk encryption is necessary for compliance. This engine will protect the MongoDB data in the case that a disk or server is stolen. On the flip side, this engine will not protect you from a hacker that has access to the server (MongoDB shell), or can intercept your application traffic. Another way to achieve the same level of encryption for compliance is using volume level encryption like LUKS. An additional benefit of volume level encryption, since it works outside the database, is re-use on all compliant servers (not just MongoDB).

Getting to your new engine

Switching to the new engine is actually pretty easy, especially if you’re running a replica set. One important caveat is that unlike MySQL, the storage engine can only be defined per mongod process (not per database or collection). This means that it’s an all or nothing operation on a single MongoDB process. You’ll need to reload your data on that server. This is necessary because the data files from one engine are not compatible with another engine. Thus reloading the data to transform from one engine format to another is necessary. Here are the high-level steps (assuming you’re running a replica set):

Make sure you’re not in your production environment

Backup your data (it can’t hurt)

Remove a replica set member

Rename (or delete) the old data directory. The member will re-sync with the replica set

Make sure you have enough disk space if you’re going to keep a copy of the old data directory

Update the mongo.conf file to use a new storage engine. Here’s an example for RocksDB from our documentation:

Start the MongoDB process again

Join the member to the replica set (initial sync will happen)

When the updated member is all caught up, pick another member and repeat the process.

Continue until the primary is the only server left. At this point, you should step down the primary, but hold off switching storage engines until you are certain that the new storage engine meets your needs.

The Wrap Up

At this point I’ve explained how you can understand your options, where you can gain additional performance and what engines to evaluate. Please don’t forget to test your application with the new setup before launching into production. Please drop a comment below if you found this helpful or, on the other hand, if there’s something that would make it more useful to you. Chances are, if you’d find something helpful, the rest of the community will as well.