Over the past few weeks as I've been using Google App Engine, I've come across people requesting benchmarks so they can compare App Engine performance to other solutions before they try it out. I don't really think comparing Google App Engine and its Datastore to something like Azure and SQL Server is all that useful (because you'd generally structure things very differently on each platform), but either way, it's interesting to see how things perform.
As well as comparing numbers with other platforms, I think it's worthwhile for App Engine developers to know how the different APIs perform (e.g. the difference between fetching something from Memcache and from the Datastore). Over the next few blog posts, I'm hoping to provide some numbers I gathered on the production App Engine servers. Please bear in mind that App Engine is still considered a "preview" and as things evolve, performance may change (hopefully for the better!).
The first set of data I gathered was on the performance of db.put(), and more specifically, the difference between calling db.put() multiple times with single entities vs calling it with a whole bunch of entities at once. It's very easy to call db.put() multiple times in a single request, but it's usually trivial to change your code to save all the entities in a single call. I thought that illustrating this difference with some pretty graphs might encourage people to use batch operations.
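To make the two approaches concrete, here's a minimal sketch. It's written against a stand-in `put` callable so that it runs anywhere; on App Engine you would pass `db.put`, which accepts either a single model instance or a list of them. The names `save_individually`, `save_batch` and `CountingPut` are hypothetical helpers for illustration only, not part of the App Engine SDK.

```python
def save_individually(entities, put):
    """Call put() once per entity, so one call per entity saved."""
    for entity in entities:
        put(entity)


def save_batch(entities, put):
    """Call put() a single time with the whole list of entities."""
    put(list(entities))


class CountingPut:
    """Stand-in for db.put (an assumption for this sketch).

    It saves nothing; it only records how many times it was called
    and how many entities it was handed in total.
    """

    def __init__(self):
        self.calls = 0
        self.saved = 0

    def __call__(self, entity_or_list):
        self.calls += 1
        if isinstance(entity_or_list, list):
            self.saved += len(entity_or_list)
        else:
            self.saved += 1
```

Passing the same three entities through both helpers results in three calls for the individual version and one call for the batch version, which is the difference the benchmarks below are measuring.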
As always, the size and shape of your data will affect your timings. In my sample I used a small entity with only three string properties (and a key_name). You can grab a copy of the data I gathered in CSV format: App Engine db.Put() Benchmarks
db.put() Performance
Each test was run 10 times, and both the mean and median values were plotted on a chart. I did this for varying numbers of entities from 10 to 500 (since db.put() has a limit of 500 entities).
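A harness along the following lines would produce the same kind of numbers. It's an illustrative sketch rather than the code behind the charts: `chunked`, `time_runs` and `summarize` are hypothetical helper names, `operation` stands in for whichever save you want to time, and it assumes Python 3 (on an older runtime you'd swap `time.perf_counter` for `time.time` and compute the statistics by hand).

```python
import statistics
import time

MAX_ENTITIES_PER_PUT = 500  # the db.put() batch limit mentioned above


def chunked(items, size=MAX_ENTITIES_PER_PUT):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


def time_runs(operation, runs=10):
    """Run `operation` `runs` times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        started = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - started)
    return durations


def summarize(durations):
    """Return the mean, median and spread (max minus min) of the timings."""
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "spread": max(durations) - min(durations),
    }
```

The `chunked` helper is only needed if you have more than 500 entities to save; each chunk it yields is small enough to hand to a single batch call. The spread value gives a rough idea of how consistent the timings are from run to run.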
As expected, in all cases, the batch method out-performs calling db.put() many times. Both operations scale linearly (note the first 3 points are not increments of 100, which is why the line doesn't appear straight). For very small numbers (e.g. 10 or fewer) the results are very similar, but as the number of entities increases, it becomes more important to batch up your requests.
It's worth noting that when you call db.put() with multiple entities, they are not combined into a transaction. If one of the writes fails, an error is raised, but any entities that have already been saved are not rolled back. If you want to update multiple entities as part of a transaction, you must do this the usual way by giving them the same parent and using run_in_transaction.
db.put() Performance Consistency
Since each test was run 10 times, I had enough data to draw a chart showing the consistency of the db.put() performance. The closer a line is to being completely horizontal, the more consistent the write performance is.
Individual db.put() Performance
Batch db.put() Performance
As you can see, the calls are fairly consistent, though the more entities you're saving, the bigger the variance. Although from the graphs it looks like the batch calls are less consistent, the two graphs are drawn at different scales. The batch calls vary by up to 1.5 seconds, whereas the individual calls vary by up to 5 seconds!
I hope to run some more benchmarks over the coming weeks showing the difference between other APIs such as using Memcache to avoid going to the Datastore on every request.