2014-06-17


Sometimes writing callback-style asynchronous code with Tornado is a pain. But the real hurt comes when you want to refactor your async code into reusable subroutines. Tornado's coroutines make refactoring easy. I'll explain the rules.

(This article updates my old "Refactoring Tornado Code With gen.engine". The updated code here demonstrates the current syntax for Tornado 3 and Motor 0.3.)

For Example

I'll use this blog to illustrate. I built it with Motor-Blog, a trivial blog platform on top of Motor, my asynchronous MongoDB driver for Tornado.

When you came here, Motor-Blog did three or four MongoDB queries to render this page.

1: Find the blog post at this URL and show you this content.

2 and 3: Find the next and previous posts to render the navigation links at the bottom.

Maybe 4: If the list of categories on the left has changed since it was last cached, fetch the list.

Let's go through each query and see how Tornado coroutines make life easier.

Fetching One Post

In Tornado, fetching one post takes a little more work than with blocking-style code:
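The original code isn't reproduced here. What follows is a stdlib-only sketch of the callback style, not Motor-Blog's code. It uses asyncio's event loop in place of Tornado's IOLoop so it runs without Tornado or MongoDB. find_one, POSTS, and PostHandler are hypothetical stand-ins, and the callback(result, error) signature is an assumption for illustration.

```python
import asyncio

# Hypothetical in-memory "collection" standing in for MongoDB.
POSTS = {"refactoring-tornado": {"title": "Refactoring Tornado Coroutines"}}


def find_one(query, callback):
    """Hypothetical callback-style query. The answer is delivered on a
    later event-loop iteration as callback(result, error)."""
    loop = asyncio.get_running_loop()
    post = POSTS.get(query["slug"])
    if post is None:
        loop.call_soon(callback, None, KeyError(query["slug"]))
    else:
        loop.call_soon(callback, post, None)


class PostHandler:
    """Toy stand-in for a tornado.web.RequestHandler."""

    def __init__(self):
        self.output = None
        # Lets the demo below wait until the response is complete.
        self.finished = asyncio.get_running_loop().create_future()

    def get(self, slug):
        # Start the query and return at once; the rest of the work
        # must live in a separate callback method.
        find_one({"slug": slug}, callback=self._on_post)

    def _on_post(self, post, error):
        # Every callback has to check for errors by hand.
        if error is not None:
            self.output = "500 Internal Server Error"
        else:
            self.output = post["title"]
        self.finished.set_result(None)


async def main(slug):
    handler = PostHandler()
    handler.get(slug)
    await handler.finished
    return handler.output


print(asyncio.run(main("refactoring-tornado")))  # Refactoring Tornado Coroutines
```

The logic for one request is split across get and _on_post, and the error branch has to be written out explicitly.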

Not so bad. But is it better with a coroutine?

Much better. If you don't pass a callback to find_one, then it returns a Future instance. A Future is nothing special; it's just a little object that represents an unresolved value. Some time hence, Motor will resolve the Future with a value or an exception. To wait for the Future to be resolved, yield it.
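As a runnable approximation of the coroutine version, here is a sketch that swaps in Python 3's asyncio with async def and await for Tornado 3's @gen.coroutine and yield. Tornado 5 and later run on asyncio's event loop under Python 3, so the shape carries over. find_one and POSTS are hypothetical stand-ins that return and resolve a Future the way the paragraph above describes.

```python
import asyncio

POSTS = {"refactoring-tornado": {"title": "Refactoring Tornado Coroutines"}}


def find_one(query):
    """Hypothetical query: returns an unresolved Future and resolves it
    on a later loop iteration, like find_one called without a callback."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    loop.call_soon(future.set_result, POSTS.get(query["slug"]))
    return future


async def get(slug):
    future = find_one({"slug": slug})
    print(future.done())      # False: the value isn't there yet
    post = await future       # suspend here until the Future is resolved
    return post["title"]


print(asyncio.run(get("refactoring-tornado")))
```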

The yield statement makes this function a generator. gen.coroutine is a brilliant invention that runs the generator until it's complete. Each time the generator yields a Future, gen.coroutine schedules the generator to be resumed when the Future is resolved. Read the source code of the Runner class for details; it's exhilarating. Or just enjoy the glow of putting all your logic in a single function again, without defining any callbacks.
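To make the "resume the generator when the Future is resolved" idea concrete, here is a toy runner built on the standard library's concurrent.futures.Future. It is a simplified sketch of the technique, not Tornado's Runner. The coroutine decorator and get_title below are hypothetical names, and get_title uses the Python 3.3+ generator return rather than gen.Return.

```python
import functools
from concurrent.futures import Future


def coroutine(func):
    """Toy version of the idea behind gen.coroutine: drive a generator,
    resuming it each time a Future it yielded is resolved."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = Future()          # resolved when the generator finishes
        gen = func(*args, **kwargs)

        def step(value=None, error=None):
            try:
                if error is not None:
                    yielded = gen.throw(error)   # re-raise at the yield
                else:
                    yielded = gen.send(value)    # resume with the value
            except StopIteration as stop:
                result.set_result(stop.value)    # generator returned
                return
            except Exception as exc:
                result.set_exception(exc)        # generator raised
                return

            def on_done(future):
                exc = future.exception()
                if exc is not None:
                    step(error=exc)
                else:
                    step(future.result())

            # Resume the generator once this Future is resolved.
            yielded.add_done_callback(on_done)

        step()
        return result

    return wrapper


@coroutine
def get_title(query_future):
    post = yield query_future      # pauses until the Future is resolved
    return post["title"]           # Python 3.3+ generator return


query = Future()                   # stands in for a pending database query
title_future = get_title(query)
print(title_future.done())         # False: still waiting on the query
query.set_result({"title": "Refactoring Tornado Coroutines"})
print(title_future.result())       # Refactoring Tornado Coroutines
```

This toy resumes the generator synchronously inside add_done_callback and only understands single Futures; Tornado's real Runner does more, such as waiting on a yielded list of Futures.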

Even better, you get normal exception handling: if find_one gets a network error or some other failure, it raises an exception. Tornado knows how to turn an exception into an HTTP 500, so we no longer need special code for errors.
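A small asyncio sketch of that behavior, assuming a hypothetical find_one whose Future is resolved with an exception to simulate a network error. dispatch is a rough, hypothetical stand-in for the framework code that turns an uncaught exception into an HTTP 500; it is not Tornado's implementation.

```python
import asyncio


def find_one(query):
    """Hypothetical query whose Future is resolved with an exception,
    simulating a network failure."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    loop.call_soon(future.set_exception, ConnectionError("server unreachable"))
    return future


async def get(slug):
    post = await find_one({"slug": slug})   # ConnectionError is raised here
    return 200, post["title"]


async def dispatch(slug):
    """Rough stand-in for how a web framework turns an uncaught
    exception into an HTTP 500."""
    try:
        return await get(slug)
    except Exception:
        return 500, "Internal Server Error"


print(asyncio.run(dispatch("refactoring-tornado")))  # (500, 'Internal Server Error')
```

The handler body contains no error-checking code at all; the exception surfaces at the await just as it would from a blocking call.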

This coroutine is much more readable than a callback, but it doesn't look any nicer than multithreaded code. It will start to shine when you need to parallelize some tasks.

Fetching Next And Previous

Once Motor-Blog finds the current post, it gets the next and previous posts so it can display their titles. Since the two queries are independent, we can save a few milliseconds by doing them in parallel. How does this look with callbacks?
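The original code isn't reproduced here, but a stdlib-only sketch shows the kind of bookkeeping involved. It uses asyncio's event loop in place of Tornado's IOLoop. find_neighbor, POSTS, and NavHandler are hypothetical, and a countdown of pending callbacks is one common way (an assumption, not necessarily Motor-Blog's) to know when both results have arrived.

```python
import asyncio

# Hypothetical posts, ordered by date.
POSTS = ["First post", "Second post", "Third post"]


def find_neighbor(index, step, callback):
    """Hypothetical callback-style query for the post before (step=-1)
    or after (step=+1) index; delivers callback(result, error) later."""
    loop = asyncio.get_running_loop()
    i = index + step
    post = POSTS[i] if 0 <= i < len(POSTS) else None
    loop.call_soon(callback, post, None)


class NavHandler:
    """Toy handler that runs both queries in parallel with callbacks.
    Error handling is omitted; it would add more branches to each callback."""

    def __init__(self):
        self.prev_post = self.next_post = None
        self.pending = 2           # how many callbacks haven't run yet
        self.finished = asyncio.get_running_loop().create_future()

    def get(self, index):
        find_neighbor(index, -1, callback=self._on_prev)
        find_neighbor(index, +1, callback=self._on_next)

    def _on_prev(self, post, error):
        self.prev_post = post
        self._maybe_finish()

    def _on_next(self, post, error):
        self.next_post = post
        self._maybe_finish()

    def _maybe_finish(self):
        # Only the callback that runs last may render the page.
        self.pending -= 1
        if self.pending == 0:
            self.finished.set_result((self.prev_post, self.next_post))


async def main(index):
    handler = NavHandler()
    handler.get(index)
    return await handler.finished


print(asyncio.run(main(1)))  # ('First post', 'Third post')
```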

This is completely disgusting, and it makes me want to give up on async. We need special logic in each callback to determine whether the other callback has already run. All that boilerplate can't be factored out. Will a coroutine help?

Yielding a list of Futures tells the coroutine to wait until they are all resolved.
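In Tornado 3 syntax that looks like prev_post, next_post = yield [future1, future2]. The runnable sketch below swaps in asyncio.gather, the standard library's way to wait on several awaitables at once. find_neighbor and POSTS are hypothetical, and the short sleep only simulates latency.

```python
import asyncio

POSTS = ["First post", "Second post", "Third post"]


async def find_neighbor(index, step):
    """Hypothetical query for the post before (step=-1) or after (step=+1)."""
    await asyncio.sleep(0.01)              # pretend network latency
    i = index + step
    return POSTS[i] if 0 <= i < len(POSTS) else None


async def get(index):
    # Start both queries, then wait until both have finished.
    prev_post, next_post = await asyncio.gather(
        find_neighbor(index, -1),
        find_neighbor(index, +1),
    )
    return prev_post, next_post


print(asyncio.run(get(1)))  # ('First post', 'Third post')
```

Both queries are in flight at the same time, so the total wait is roughly one round trip rather than two, and no counter is needed.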

Now our single get function is just as nice as it would be with blocking code.
In fact, the parallel fetch is far easier than if you were multithreading instead of using Tornado.
But what about factoring out a common subroutine that request handlers can share?

Fetching Categories

Every page on my blog needs to show the category list on the left side. Each request handler could just include this in its get method:

But that's terrible engineering. Here's how to factor it into a coroutine:

This coroutine does not have to be part of a request handler—it stands on its own at the module scope.

The raise gen.Return() statement is the weirdest syntax in this example. It's an artifact of Python 2, in which generators aren't allowed to return values. To hack around this limitation, Tornado coroutines raise a special kind of exception called a Return. gen.coroutine catches this exception and treats it like a returned value. In Python 3.3 and later, a simple return categories accomplishes the same result.
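To show how an exception can carry a value out of a generator, here is a minimal stdlib sketch. Return is a local class modeled on tornado.gen.Return, not Tornado's own, and get_categories and finish are hypothetical. finish unwraps the result from either a Return or a StopIteration, roughly as a coroutine runner would.

```python
class Return(Exception):
    """Stand-in modeled on tornado.gen.Return: an exception that carries
    a coroutine's result out of a Python 2 generator."""

    def __init__(self, value=None):
        super().__init__(value)
        self.value = value


def get_categories():
    # Hypothetical coroutine body: yield a pending query, get the rows back.
    rows = yield "pending query"
    categories = sorted(row["name"] for row in rows)
    raise Return(categories)     # Python 2 spelling of "return categories"


def finish(gen, reply):
    """Resume a generator once with reply, then unwrap its result from
    either a Return exception or a StopIteration."""
    next(gen)                     # run up to the first yield
    try:
        gen.send(reply)
    except Return as ret:
        return ret.value
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("generator yielded again")


rows = [{"name": "Python"}, {"name": "MongoDB"}]
print(finish(get_categories(), rows))  # ['MongoDB', 'Python']
```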

To call my new coroutine from a request handler, I do:

Since get_categories is a coroutine now, calling it returns a Future.
To wait for get_categories to complete, the caller can yield the Future.
Once get_categories completes, the Future it returned is resolved,
so the caller resumes.
It's almost like a regular function call!
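In Tornado 3 syntax the call is categories = yield get_categories(). A runnable asyncio sketch of the same reuse is below, with await in place of yield. CATEGORIES, post_page, and home_page are hypothetical. With native coroutines, calling get_categories() returns a coroutine object rather than a Future, but the caller waits on it the same way.

```python
import asyncio

# Hypothetical in-memory collection standing in for MongoDB.
CATEGORIES = [{"name": "Python"}, {"name": "MongoDB"}]


async def get_categories():
    """Module-level coroutine, not tied to any one handler."""
    await asyncio.sleep(0)                     # pretend to query the database
    return sorted(c["name"] for c in CATEGORIES)


async def post_page(title):
    # Awaiting the subroutine suspends this coroutine until it finishes.
    categories = await get_categories()
    return {"title": title, "categories": categories}


async def home_page():
    categories = await get_categories()         # same subroutine, reused
    return {"title": "Home", "categories": categories}


print(asyncio.run(post_page("Refactoring Tornado Coroutines")))
```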

Now that I've factored out get_categories, it's easy to add more logic to it. This is nice because I want to cache the categories between page views. get_categories can be updated very simply to use a cache:

(Note for nerds: I invalidate the cache whenever a post with a new category is added. The "new category" event is saved to a capped collection in MongoDB, which all the Tornado servers are always tailing. This is a simple way to use MongoDB as an event queue, which the multiple Tornado processes use to communicate with each other.)

Conclusion

Tornado's excellent documentation shows briefly how a method that makes a few async calls can be simplified using gen.coroutine, but the power really comes when you need to factor out a common subroutine. There are only three steps:

Decorate the subroutine with @gen.coroutine.

In Python 2, the subroutine returns its result with raise gen.Return(result).

Call the subroutine from another coroutine like result = yield subroutine().

That's all there is to it. Tornado's coroutines make asynchronous code efficient, clean—even beautiful.
