2015-01-08

"Should vegetarians open steakhouse restaurants?"

Though someone will probably give me several examples of why they should, I'll argue that they absolutely should not. How can someone who doesn't eat steak convince others to eat at their "steak-only" restaurant?

But this is something a "professional technology benchmarker" (PTB) struggles with on a regular basis. Hello, I'm Tim Callaghan, and I'm a PTB.

professional technology benchmarker, or PTB (noun) : One who compares two technologies as part of their job. One of these technologies is usually the product of the PTB's employer, the other is almost always not.
In a previous job I was tasked with comparing the performance of a fully in-memory database against Oracle and MySQL on a "TPC-C like" workload. At the time I was an Oracle expert working for the in-memory database company, but I had never started a single MySQL server in my life. At Tokutek I've run numerous comparisons of TokuDB and TokuMX against InnoDB and MongoDB. In fact, it's a large part of my job, and something I really enjoy.

In benchmarking competing technologies I always follow the same exact process:

1. Decide which competitive advantage to showcase (keep it simple).

2. Build the benchmark (borrow from existing apps).

3. Execute the benchmark (record everything).

4. Publish and explain the results (blog and encourage feedback).

Step 3 is where I'm always overly cautious. Here's a punch list of rules I follow:

To the extent possible, make sure that the benchmark environment is fair to everyone.

Nothing invalidates results faster than a misconfigured system.

Capture all details about the environment and publish them in your results.

Hardware, operating system, configuration parameters.

Get advice from the experts on any technology you aren't an expert in.

At a minimum, show them the results of your benchmark and ask for feedback before publishing.
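For the "capture all details" rule, a small script that snapshots the environment alongside each run makes those details hard to forget. Below is a minimal sketch using only the Python standard library. `capture_environment` is a hypothetical helper, and the settings passed to it are whatever you actually changed on the server. The TokuDB variable names in the example are assumed stock names, and things the standard library can't see portably (RAM size, disk model) go in as free-form notes.

```python
import json
import os
import platform


def capture_environment(db_config: dict, hardware_notes: str = "") -> dict:
    """Collect hardware, OS, and configuration details to publish with results.

    db_config holds the server settings the run actually used (for example the
    non-default lines of my.cnf). It is passed in rather than discovered,
    because every database exposes its settings differently.
    hardware_notes covers what the standard library cannot see portably,
    such as RAM size and the disk model.
    """
    return {
        "hardware": {
            "machine": platform.machine(),
            "processor": platform.processor(),  # may be "" on some platforms
            "logical_cpus": os.cpu_count(),
            "notes": hardware_notes,
        },
        "operating_system": {
            "system": platform.system(),
            "release": platform.release(),
            "version": platform.version(),
        },
        "configuration": dict(db_config),
    }


if __name__ == "__main__":
    env = capture_environment(
        {"tokudb_cache_size": "256M", "tokudb_directio": "ON"},
        hardware_notes="32GB RAM, Intel 480GB SSD",
    )
    print(json.dumps(env, indent=2))
```

Publishing the resulting JSON next to the charts gives reviewers the hardware, operating system, and configuration parameters in one place.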

In my opinion, this last bullet is the most important one. When I first started at Tokutek I was asked to improve the benchmarking. Tokutek's only product at the time was TokuDB, a MySQL storage engine competing with InnoDB. There were several people at Tokutek who could help me configure TokuDB, but InnoDB was another story. I needed to configure a few brand-new servers and start benchmarking immediately.

Did I tear open the server boxes and run benchmarks? Nope. Rather, I called my brother. He told me to reach out to Giuseppe Maxia (The Data Charmer) about optimally configuring CentOS servers and Vadim Tkachenko (MySQL Performance Expert) about configuring InnoDB for performance.

Before reaching out to Giuseppe and Vadim, I did my homework by reading as much of their web-based content as possible. I then sent them emails asking for assistance, and was amazed at how much they were willing to help. That was over 3 years ago, and they are still helpful whenever I have a question.

So where am I going with this?

I recently wrote a blog post titled "Can we improve the current state of benchmarking?", in which I proposed ways to improve the process of technology benchmarking, primarily through peer review. I discussed a mistake in the implementation of the STSsoft Database Benchmark, specifically how it incorrectly measured TokuDB's size: the benchmark code checked the uncompressed size rather than the compressed size. It was a simple error, and one that could easily have been reviewed and discussed before marketing claims about compression were put on their website.

Equally concerning to me in the benchmark results was TokuDB's insertion performance. The STSdb product page claims a "10x performance improvement" over Fractal Trees. Even though the benchmark workload used a random insertion pattern, TokuDB's "REPLACE INTO" optimization should have handled it with ease. Granted, the test hardware was not ideal: an Intel Celeron processor and a single 500GB 7.2K RPM SATA hard drive.

So I dug in and read the benchmark code some more...

Their performance chart shows STSdb 4.0 inserting at a very high rate, with exit throughput that looks to be just above 50,000 inserts per second. TokuDB's insert performance is horribly low; it's hard to read on the graph, but I'd estimate it at around 1,500 inserts per second.

While digging into the benchmark, I found that TokuDB's performance was bottlenecked on IO: in my test, a single SATA drive showed nearly 100% IO utilization. By default, TokuDB runs fully durable, meaning that every commit is followed by an fsync() operation. I'm not sure what STSdb's durability guarantee is (I'm the vegetarian in their steakhouse), but given that their documentation states that ACID is on the road-map, I find it hard to believe they are performing an fsync() for each commit, nor do I understand what an STSdb commit even is. I'm confident that a consumer-grade SATA drive isn't going to perform more than ~100 IOPS.
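To check whether a drive is saturated, the same "%util" figure that iostat reports can be computed on Linux from two samples of /proc/diskstats: the 10th counter on each device line is the cumulative milliseconds spent doing I/O, so its growth divided by the elapsed time is the fraction of time the device was busy. A minimal sketch follows; the function names and the default device name `sda` are assumptions. On SSDs or RAID devices that serve many requests in parallel, a value near 100% is a weaker saturation signal than on a single spinning disk.

```python
import time


def io_busy_ms(diskstats_text: str, device: str) -> int:
    """Return cumulative milliseconds spent doing I/O for one device.

    Each /proc/diskstats line is: major, minor, device name, then counters.
    The 10th counter ("time spent doing I/Os", ms) is field index 12 after
    splitting on whitespace.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise ValueError(f"device {device!r} not found in diskstats")


def utilization_pct(busy_before_ms: int, busy_after_ms: int, elapsed_ms: float) -> float:
    """Percent of the elapsed time during which the device had I/O in flight."""
    return 100.0 * (busy_after_ms - busy_before_ms) / elapsed_ms


def sample_utilization(device: str = "sda", interval_s: float = 1.0) -> float:
    """Linux only: sample /proc/diskstats twice, roughly like iostat's %util."""
    with open("/proc/diskstats") as f:
        before = io_busy_ms(f.read(), device)
    t0 = time.monotonic()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = io_busy_ms(f.read(), device)
    elapsed_ms = (time.monotonic() - t0) * 1000.0
    return utilization_pct(before, after, elapsed_ms)
```

Calling `sample_utilization("sda")` repeatedly while the benchmark runs and seeing values pinned near 100 is the kind of evidence that points at the disk rather than the CPU.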

So I ran two tests. In the first, I shut off fsync-on-commit in TokuDB, and the benchmark ran much faster. But I like the D in ACID, so in the second I left TokuDB fully durable and modified the benchmark application to perform 10000 inserts per batch instead of 1000, which reduces the number of fsync() operations by 90%. The results are dramatic.
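The change to the benchmark client amounts to committing once per batch, so the number of commits (and, with fsync-on-commit enabled, the number of log fsyncs) is the row count divided by the batch size. For 100 million rows that is 100,000 commits at 1,000 rows per batch versus 10,000 commits at 10,000 rows per batch, which is where the 90% reduction comes from. The Python sketch below illustrates the pattern. It is not the STSsoft benchmark code, the helper names are hypothetical, and sqlite3 is only a stand-in for a DB-API connection to MySQL (MySQL drivers typically use `%s` placeholders instead of `?`).

```python
import random
import sqlite3


def commits_needed(total_rows: int, batch_size: int) -> int:
    """Commits (and, when fully durable, log fsyncs) for one commit per batch."""
    return -(-total_rows // batch_size)  # ceiling division


def insert_in_batches(conn, total_rows: int, batch_size: int, seed: int = 0) -> int:
    """Insert rows with random keys, committing once per batch.

    conn is any DB-API connection; sqlite3 is used below purely as a
    stand-in for a MySQL/TokuDB server. Returns the number of commits made.
    """
    rng = random.Random(seed)
    cur = conn.cursor()
    commits = 0
    for start in range(0, total_rows, batch_size):
        n = min(batch_size, total_rows - start)
        rows = [(rng.getrandbits(63), "payload") for _ in range(n)]
        # REPLACE INTO overwrites on key collision, like the MySQL statement.
        cur.executemany("REPLACE INTO t (k, v) VALUES (?, ?)", rows)
        conn.commit()  # the durability point: one commit per batch
        commits += 1
    return commits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v TEXT)")
    print(insert_in_batches(conn, 100_000, 1_000))  # 100 commits
    print(commits_needed(100_000_000, 1_000), commits_needed(100_000_000, 10_000))
```

Raising `batch_size` tenfold cuts the commit count tenfold while keeping every commit durable, which is the trade the second test makes.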

Note that I'm running on TokuDB v7.5.3 for MySQL 5.5.40, stock defaults (no TokuDB variables defined in my.cnf other than a 256M cache and directIO), on an Ubuntu 14.04 desktop with a Core i7-4790K, 32GB RAM, and an Intel 480GB SSD. The benchmark client is running in a Windows 7 Virtual Machine (VMware Workstation 11.0) on the Ubuntu desktop.
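For reference, the two non-default settings above, plus the switch for the non-durable run, would look roughly like a small [mysqld] fragment. This Python sketch renders it with configparser; `render_mycnf` is a hypothetical helper, and the variable names `tokudb_cache_size`, `tokudb_directio`, and `tokudb_commit_sync` are assumed to be the stock TokuDB names, so check them against the documentation for your TokuDB version.

```python
import configparser
import io


def render_mycnf(cache_size: str = "256M", direct_io: bool = True,
                 commit_sync: bool = True) -> str:
    """Render a minimal [mysqld] section for the TokuDB settings in this test.

    Variable names are assumed stock TokuDB names; verify them for your version.
    """
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # preserve option names exactly as written
    cfg["mysqld"] = {
        "tokudb_cache_size": cache_size,
        "tokudb_directio": "ON" if direct_io else "OFF",
    }
    if not commit_sync:
        # Non-durable comparison run only: do not fsync the log on each commit.
        cfg["mysqld"]["tokudb_commit_sync"] = "OFF"
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_mycnf())
```

By default this prints only the cache and directIO lines, matching the durable configuration; passing `commit_sync=False` adds the line for the fsync-off comparison.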

Insert performance, 100 million rows, random keys, 1000 inserts per batch.



Insert performance, 100 million rows, random keys, 10000 inserts per batch.



Increasing the batch size from 1000 to 10000 improved TokuDB insert throughput by over 3x. This is largely explained by the fact that a single SATA disk offers few IOPS, so fsync operations were gating performance at the smaller batch size. Disabling fsync-on-commit makes it run even faster.
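A back-of-the-envelope model shows why batching helps and why the gain is smaller than the 10x reduction in fsyncs. Assume (a simplification, not measured TokuDB behavior) that each batch costs one fsync of latency f plus a per-row cost c, so modeled throughput is B / (f + B·c) for batch size B; on a ~100 IOPS drive, f is on the order of 10 ms. The function names below are hypothetical.

```python
def batch_throughput(batch_size: int, fsync_s: float, per_row_s: float) -> float:
    """Modeled rows/second when every batch pays one fsync plus a per-row cost.

    A deliberately simplified model: it ignores caching, tree maintenance,
    client round-trips, and the fact that one commit may need several IOs.
    """
    return batch_size / (fsync_s + batch_size * per_row_s)


def batching_speedup(small: int, large: int, fsync_s: float, per_row_s: float) -> float:
    """Ratio of modeled throughput at the larger batch size to the smaller one."""
    return (batch_throughput(large, fsync_s, per_row_s)
            / batch_throughput(small, fsync_s, per_row_s))
```

In this model the speedup from 1,000 to 10,000 rows per batch is 10(f + 1000c) / (f + 10000c), which is always below 10x once per-row work is nonzero, and setting f to zero (no fsync on commit) yields the 1/c ceiling that batching can only approach.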

Note: I can't explain why my insert performance was far higher than theirs, as my only changes to the stock TokuDB configuration were a 256M cache and directIO (to make sure this isn't an in-memory test). I'd guess it's their CPU and hard drive, but I'm not sure. And yes, I'd be happy to help figure it out.

So I'm back where I started. How can I improve things? I'm not an expert in every competing technology I benchmark against, yet as a professional technology benchmarker I want people to trust my results.

For now I can only wait for others to question my results, configurations, and benchmark applications. While I'm waiting I'll continue questioning the results of my peers. And it doesn't have to be all doom-and-gloom. I'll also be pointing out when I find a great benchmark, or benchmarker, or benchmarketer.
