Broadcastingadam.com

Advanced Caching in Rails

2011-05-06

This post has been revised.
I highly suggest you check out that version. This version is outdated
but is left here for historical purposes. I've found some copy-related
errors in this post and fixed them in the newer version. Also, this post
is written for Rails 2. The revised post focuses on Rails 3.1+.

Caching in Rails is covered occasionally. It is covered in very basic
detail in the caching guide.
Advanced caching is left to the reader. Here's where I come in. I recently
read part of Ryan Bigg's upcoming Rails book, Rails 3 in Action
(review in the works), where he covers caching. He does a
wonderful job of giving the reader a basic sense of how you can use
page, action, and fragment caching. However, the examples only work well
in a simple application like the one he's developing in the book. I'm
going to show you how you can level up your caching with some new
approaches.

Different Caching Layers

First, let's start with a brief overview of the different types of
caching:

Page Caching: PRAISE THE GODS if you actually can use page
caching in your application. Page caching is the holy grail. Save
the entire thing. Don't hit the stack & give some prerendered stuff
back. Great for worthless applications without authentication and
other highly dynamic aspects.

Action Caching: Essentially the same as page caching, except all
the before filters are run, allowing you to check authentication
and other stuff that may prevent the request from rendering.

Fragment Caching: Store parts of views in the cache. Usually for
caching partials or large bits of HTML that are independent from
other parts, e.g., a list of top stories or something like that.

Rails.cache: All cached content except cached pages is stored
in the Rails.cache. Cached pages are stored as HTML on disk. We'll
use the fact that all the cached action and fragment content is
simply stored in Rails.cache. You can cache arbitrary content in
the Rails cache. For example, you may cache the result of a large,
complicated query so you don't have to wait for a ton of AR::Base
objects to be reinstantiated.

Under the Hood

Each caching layer is built on top of the one below it. Page caching is
the only exception: it does not use Rails.cache and instead writes
content to disk. The cache is essentially a key-value store. Different
things can be persisted. Strings are most common (for HTML fragments).
More complicated objects can be persisted as well. Let's go through some
examples of manually using the cache to store things. I am using
memcached with dalli for all these examples. Any driver that
implements the cache store pattern should work.
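To make the key-value idea concrete without needing memcached, here is a minimal sketch in plain Ruby. ToyCache is a hypothetical stand-in, not Rails.cache itself; it only mimics the shape of the read, write, fetch, exist? and delete calls that a cache store exposes.

```ruby
# A toy in-memory key-value store. It only mimics the shape of the
# calls a cache store offers; it is NOT Rails.cache or memcached.
class ToyCache
  def initialize
    @data = {}
  end

  # Store a value under a key.
  def write(key, value)
    @data[key] = value
    true
  end

  # Return the stored value, or nil on a miss.
  def read(key)
    @data[key]
  end

  def exist?(key)
    @data.key?(key)
  end

  # Remove a key; returns true if something was actually removed.
  def delete(key)
    existed = @data.key?(key)
    @data.delete(key)
    existed
  end

  # On a hit return the stored value. On a miss run the block,
  # store its result, and return it.
  def fetch(key)
    return @data[key] if @data.key?(key)
    value = yield
    write(key, value)
    value
  end
end

cache = ToyCache.new
cache.write("greeting", "<p>Hello</p>")   # strings are the common case
cache.write("numbers", [1, 2, 3])         # other objects work too
cache.read("missing")                     # a miss returns nil

# fetch only runs the expensive block on the first call.
2.times { cache.fetch("report") { "expensive result" } }
cache.delete("greeting")
```

The fetch call is the one worth remembering: it folds "check, compute on a miss, store" into a single step, which is the pattern every layer above relies on.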

Those are the basics of interacting with the Rails cache. The Rails
cache is a wrapper around whatever functionality is provided by the
underlying storage system. Now we are ready to move up a layer.

Understanding Fragment Caching

Fragment caching is taking rendered HTML fragments and storing them in
the cache. Rails provides a cache view helper for this. Its most
basic form takes no arguments besides a block. Whatever is rendered
during the block will be written back to the cache. The basic principle
behind fragment caching is that it takes much less time to fetch
pre-rendered HTML from the cache than it takes to generate a fresh copy.
This is very true. If you haven't noticed, view generation can be very
costly. Let's say you have generated a basic scaffold for a post:

Let's start with the most common use case: caching information specific
to one thing, i.e., one post. Here is a show view:

Let's say we wanted to cache this fragment. Simply wrap it in cache and
Rails will do it.

The first argument is the key for this fragment. The rendered HTML is
stored with this key: views/posts-1. Wait what? Where did that 'views'
come from? The cache view helper automatically prepends 'views' to all
keys. This is important later. When you first load the page you'll see
this in the log:

You can see the key and the operations. Rails is checking to see if the
specific key exists. It will fetch it or write it. In this case, it has
not been stored so it is written. When you reload the page, you'll see a
cache hit:

There we go. We got HTML from the cache instead of rendering it. Look at
the response times for the two requests:

Very small differences in this case: a 2ms difference in view generation.
This is a very simple example, but it can make a world of difference in
more complicated situations.

You are probably asking the question: "What happens when the post
changes?" This is an excellent question! Well, if the post changes,
the cached content will not be correct. It is up to us to remove
stuff from the cache or figure out a way to get new content from the
cache. Let's assume that our blog posts now have comments. What happens
when a comment is created? How can we handle this?

This is a very simple problem. What we need to figure out is this: How
can we create a cache miss when the associated object changes? We've
already demonstrated how we can explicitly set a cache key. What if we
made a key that's dependent on the time the object was last updated? We
can create a key composed of the record's ID and its updated_at
timestamp! This way the cache key will change as the content changes and
we will not have to expire things manually. (We'll come back to sweepers
later.) Let's change our cache key to this:
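In plain Ruby, building such a composite key might look like the sketch below. The Post struct and the timestamped_key method are hypothetical stand-ins for a real model and for whatever you pass to the cache helper; only the key format (ID plus updated_at) comes from the text.

```ruby
# Hypothetical stand-in for an ActiveRecord model; only the two
# attributes needed for the key are modelled.
Post = Struct.new(:id, :updated_at)

# Build "posts/<id>-<timestamp>" from the record's id and updated_at.
def timestamped_key(post)
  "posts/#{post.id}-#{post.updated_at.getutc.strftime('%Y%m%d%H%M%S')}"
end

post = Post.new(2, Time.utc(2011, 5, 1, 23, 27, 25))
puts timestamped_key(post)   # posts/2-20110501232725

# Updating the timestamp changes the key, so the old fragment is
# simply never read again.
post.updated_at = Time.utc(2011, 5, 2, 8, 0, 0)
puts timestamped_key(post)   # posts/2-20110502080000
```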

Now we can see we have a new cache key that's dependent on the object's
timestamp. Check out the Rails log:

Cool! Now let's make it so creating a comment updates the post's
timestamp:

Now all comments will touch the post and change the updated_at
time stamp. You can see this in action by touch'ing a post.

This concept is known as auto-expiring cache keys. You create a
composite key from the normal key and a time stamp. This will cause some
memory buildup as objects are updated and their old keys no longer
produce cache hits. For example, you have that fragment and it is
cached. Then someone updates the post. You now have two versions of the
fragment cached. If there are 10 updates, then there are 10 different
versions. Luckily for you, this is not a problem for memcached!
Memcached uses an LRU replacement policy. LRU stands for Least Recently
Used. That means the key that hasn't been requested for the longest time
will be replaced when new content needs to be stored. For example,
assume your cache can only hold 10 versions of the post. The next update
will create a new key and hence new content. The least recently used
version (here, the oldest) will be evicted and the newest version will
be stored in the cache. The total amount of memory is cycled between
things that are requested.

There are two things to consider in this approach. 1: You will not be
able to ensure that content is kept in the cache as long as possible.
2: You will never have to worry about expiring things manually as long
as timestamps are updated in the model layer. I've found it is orders of
magnitude easier to add a few :touch => true's to my relationships than
it is to maintain sweepers. More on sweepers later. We must continue
exploring cache keys.
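If the LRU behaviour feels abstract, this small sketch shows the idea in plain Ruby. TinyLRU is a hypothetical class written for illustration; memcached's real eviction is more involved, but the effect on stale timestamped keys is the same: nobody asks for them, so they are the first to go.

```ruby
# A tiny least-recently-used cache. This is a simplification for
# illustration only, not how memcached is implemented.
class TinyLRU
  def initialize(capacity)
    @capacity = capacity
    @data = {}   # Ruby hashes remember insertion order
  end

  # Reading a key re-inserts it, marking it as most recently used.
  def read(key)
    return nil unless @data.key?(key)
    value = @data.delete(key)
    @data[key] = value
  end

  # Writing may push the store over capacity; if so, evict the
  # least recently used key (the first one in the hash).
  def write(key, value)
    @data.delete(key)
    @data[key] = value
    @data.delete(@data.keys.first) if @data.size > @capacity
    value
  end

  def keys
    @data.keys
  end
end

cache = TinyLRU.new(2)
cache.write("posts/1-v1", "old fragment")
cache.write("posts/1-v2", "newer fragment")
cache.read("posts/1-v2")                      # v2 is still being requested
cache.write("posts/1-v3", "newest fragment")  # v1 is evicted
p cache.keys   # ["posts/1-v2", "posts/1-v3"]
```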

Rails uses auto-expiring cache keys by default. The problem is they
are not mentioned at all in the documentation or in the guides. There is
one very handy method: ActiveRecord::Base#cache_key. This will
generate a key like this: posts/2-20110501232725. This is the
exact same thing we did ourselves. This method is very important
because, depending on what type of arguments you pass into the cache
method, it will be called on them. For the time being, this code is
functionally equal to our previous examples.

The cache helper takes different forms for arguments. Here are some
examples:

If an Array is the first argument, Rails will use cache key expansion
to generate a string key. This means running some logic on each object,
then joining the results together with a '/'. Essentially, if the object
responds to cache_key, it will use that. Otherwise it falls back to
other conversions. Here's the source for expand_cache_key:
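As a rough approximation of that logic, here is a simplified sketch in plain Ruby. It is not the actual Rails source: expand_key and FakePost are hypothetical names, and the sketch skips details such as namespaces. It only follows the order described above: cache_key first, then arrays, then a string fallback.

```ruby
# A simplified approximation of cache key expansion, NOT the Rails source.
def expand_key(key)
  if key.respond_to?(:cache_key)
    key.cache_key                                   # models know their own key
  elsif key.is_a?(Array)
    key.map { |part| expand_key(part) }.join("/")   # expand each part, join with "/"
  elsif key.respond_to?(:to_param)
    key.to_param.to_s
  else
    key.to_s
  end
end

# Hypothetical record exposing cache_key the way a model would.
FakePost = Struct.new(:id, :stamp) do
  def cache_key
    "posts/#{id}-#{stamp}"
  end
end

post = FakePost.new(2, "20110501232725")
puts expand_key("sidebar")            # sidebar
puts expand_key([post, "comments"])   # posts/2-20110501232725/comments
puts expand_key([:v2, post, 3])       # v2/posts/2-20110501232725/3
```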

This is where all the magic happens. Our simple fragment caching example
could easily be converted into an idea like this: The post hasn't
changed, so cache the entire result of /posts/1. You can do this with
action caching or page caching.

Moving on to Action Caching

Action caching is an around filter for specific controller actions. It is
different from page caching since before filters are run and may prevent
access to certain pages. For example, you only want to serve the cached
page if the user is logged in. If the user is not logged in, they should
be redirected to the login page. This is different from page caching.
Page caching bypasses the Rails stack completely. Most web applications
of legitimate complexity cannot use page caching. Action caching is the
next logical step for most web applications. Let's break the idea down:
If the post hasn't changed, return the entire cached page as the HTTP
response, else render the show view, cache it, and return that as the
HTTP response. Or in code:

Declaring action caching is easy. Here's how you can cache the show
action:

Now refresh the page and look at what's been cached.

Now that the show action for post #2 is cached, refresh the page and see
what happens.

Damn. 16ms vs 1ms. You can see the difference! You can also see Rails
reading that cache key. The cache key is generated off the url with
action caching. Action caching is a combination of a before and around
filter. The around filter is used to capture the output and the before
filter is used to check to see if it's been cached. It works like this:

Execute before filter to check to see if cache key exists?

Key exists? - Read from cache and return HTTP Response. This
triggers a render and prevents any further code from being
executed.

No key? - Call all controller and view code. Cache output using
Rails.cache and return HTTP response.
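The steps above can be modelled in a few lines of plain Ruby. This is only a rough sketch: serve is a hypothetical method, a Hash stands in for Rails.cache, and the key string is illustrative. Real action caching lives inside Rails' filter chain.

```ruby
# A rough model of the action caching flow. Names are hypothetical.
def serve(cache, key, logged_in)
  # An ordinary before filter still runs and can halt the request.
  return [302, "redirect to login"] unless logged_in

  if cache.key?(key)
    [200, cache[key]]   # hit: return the cached body, skip the action
  else
    body = yield        # miss: run the controller and view code
    cache[key] = body   # store the rendered output
    [200, body]
  end
end

store = {}
renders = 0
action = -> { renders += 1; "<h1>Post 2</h1>" }
key = "views/localhost:3000/posts/2"   # illustrative URL-based key

serve(store, key, false, &action)   # halted by the filter, nothing cached
serve(store, key, true, &action)    # miss: renders and caches
serve(store, key, true, &action)    # hit: no rendering
puts renders   # 1
```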

Now you are probably asking the same question as before: "What do we do
when the post changes?" We do the same thing as before: we create a
composite key with a string and a time stamp. The question now is, how do
we generate a special key using action caching?

Action caching generates a key from the current url. You can pass extra
options using the :cache_path option. Whatever is in this value is
passed into url_for using the current parameters. Remember in the
view cache key examples what happened when we passed in a hash? We got a
much different key than before:

Rails generated a URL-based key instead of the standard views key. This
is because you may have different servers and things like that. This
ensures that each server has its own cache key, i.e., server 1 does not
collide with server 2. We could generate our own URL for this resource
by doing something like this:

This will generate this url:

Notice the '?tag=23481329847'. Look familiar from anywhere? Rails uses
this method to tag GET URLs for static assets. That way the browser does
not send a new HTTP request when it sees 'application.css?1234' since it
has already cached it. We can use this strategy with action caching as
well.

This calls url_for with the parameters already assigned to it by
the router plus whatever is returned by the block. Now if you refresh
the page, you'll have this:

And voilà! Now we have an expiring cache key for our post! Let's dig a
little deeper. We know the key. Let's look into the cache and see what
it actually is! You can see the key from the log. Look it up in the
cache.

It's just a straight HTML string. Easy to use and return as the body.
This method works well for singular resources. How can we handle the
index action? I've created 10,000 posts. It takes a good amount of time
to render that page on my computer. It takes over 10 seconds. The
question is, how can we cache this? We could use the most recently
updated post for the time stamp. That way, when one post is updated, it
will move to the top and create a new cache key. Here is the code
without any action caching:

Now with action caching:

Here's the code for action caching:

These are simple examples designed to show you how you can create
auto-expiring keys for different situations. At this point we have not
had to expire anything ourselves! The keys have done it all for us.
However, there are some times when you want more precise control over
how things exist in the cache. Enter Sweepers.

Sweepers

Sweepers are HTTP request dependent observers. They are loaded into
controllers and observe models the same way standard observers do.
However, there is one very important difference. They are only used
during HTTP requests. This means if you have things being created
outside the context of HTTP requests, sweepers will do you no good. For
example, say you have a background process running that syncs with an
external system. A model created there will never reach any sweeper.
So, if you have anything cached, it is up to you to expire it.
Everything I've demonstrated so far can be done with sweepers.

Each cache_* method has an opposite expire_* method. Here's the
mapping:

caches_page , expire_page

caches_action , expire_action

cache , expire_fragment

Their arguments work the same way, using cache key expansion to find a
key to read or delete. Depending on the complexity of your application,
it may be very easy to use sweepers or it may be impossible. Our simple
examples can use sweepers easily. We only need to tie into the save
event. For example, when an update or delete happens we need to expire
the cache for that specific post. When a create, update, or delete
happens we need to expire the index action. Here's what the sweeper
would look like:

I will not go into much depth on sweepers because they are the only
thing covered in the Rails caching guide. They work, but I feel they are
clumsy for complex applications. Let's say you have comments for posts.
What do you do when a comment is created for a post? Well, you have to
either create a comment sweeper or load the post sweeper into the
comments controller. You can do either. However, depending on the
complexity of your model layer, it may quickly become infeasible to do
cache expiration with sweepers. For example, let's say you have a
Customer. A customer has 15 different types of associated things. Do you
want to put the sweeper into 15 different controllers? You can, but you
may forget to at some point.

The real problem with sweepers is that they cannot be used once your
application works outside of HTTP requests. They can also be clumsy. I
personally feel it's much easier to create auto-expiring cache keys and
only use sweepers when I want to tie into very specific events.

Now you should have a good grasp on how the Rails caching methods work.
We've covered how fragment caching uses the current view to generate a
cache key. We introduced the concept of auto expiring cache keys using
ActiveRecord#cache_key to automatically expire cached content. We
introduced action caching and how it uses url_for to generate a cache
key. Then we covered how you can pass things into url_for to generate
a time stamped key to expire actions automatically. We've skipped page
caching because it's not applicable to many Rails applications. Now that
we understand how caching works we can address shortcomings in the
system.

Moving Away from the HTTP Request

Now we're going to write some code to address problems in the Rails
caching system. We know that action caching is dependent on URLs.
Fragment caching is dependent on the view being rendered. However, we
know that both of these methods use Rails.cache under the covers to
store content. We can use Rails.cache anywhere in our code. Unlike
caches_page, caches_action and cache, which will not hit the cache
if perform_caching is set to false, the Rails.cache methods will
always execute against the cache. Ideally, it would be nice to
create a simple observer for our models. It would be cool if we had
a class like this:

Then we can use that utility class anywhere in our code to expire
different things we have cached. First, we need to be able to generate
URLs from something other than a controller. You may be familiar with
this problem. Mailers are not controllers, but you can still generate
URLs. You need a host name to generate full URLs. Controllers have this
information because they accept HTTP requests, which carry that
information. Mailers do not. That's why the host name must be configured
in the different environments. We can create a Frankenstein class that
takes parts of ActionMailer to generate URLs. Once we can generate URLs
we can expire pages and actions. URL generation is included in this
module: Rails.application.routes.url_helpers. That's a shortcut method
for the generated module which contains url_for, path_for and all the
named route helpers. We also need a class level variable for the host
name. Here's what we can do so far:

Now we can pull in some knowledge on how the cache system works to fill
in the gaps. Some of this comes from reading the various source files
and observation in generating the cache keys. Here is the complete
class:

Since action and fragment caching all use Rails.cache under the hood, we
can simply generate the keys ourselves and remove them manually--all
without the fuss of HTTP Requests. Now you can create an initializer to
define a method on your application namespace so it's globally
accessible. I like this way because it's easy to reference in any piece
of code.

Now we can merrily go about our business expiring cached content from
anywhere. Here are some examples:

The expire_fragment and expire_action methods work just like the
ones described in the Rails guides. Only difference is, you can use them
anywhere. Now we can easily call this code in an observer. The observer
events will fire every time they happen anywhere in the codebase.
Here's an example. I am assuming a todo is created outside an HTTP
request through a background process. The observer will capture the
event.

The beauty here is that we can use this code anywhere. If you have more
complicated cache expirations you may have to use a background job. This
may not be acceptable because of processing time, but in some situations
you can afford a sweeping delay if the sweeping process takes a long
time. You could easily use this code with DelayedJob or Resque if
needed. After all, the generated rails code does reference a cache
observer--now you know how to write one.

Tag-Based Caching

This is an approach I came up with to work in this situation:

Maintain control over how long things are cached

Large number of different associations. Actions or fragments no
longer related to a specific resource.

Content could be invalidated through HTTP requests or any number of
background processes.

Hard to maintain specific keys. I thought of it as "resources".

There was a ton of cached content in the system: many different actions
and fragments. There was also a cache hierarchy. Expiring a specific
fragment would have to expire an action (so that a cache miss would
occur when the page was requested, causing the new fragment to be
displayed) while other things on the page stayed cached. One question to
ask is: how can I expire groups of things based on certain events? Well,
first you need a way to associate different keys. Once you can associate
different keys, then you can expire them together. Since you're tracking
the keys being sent to Rails.cache, you can simply use Rails.cache
to delete them. All of this is possible through one itty-bitty detail of
the Rails caching system.

You may have noticed something in the Cache class in the previous
section. There is a second argument for options. Anything in the
options argument is passed to the cache store. This is where we can tie
in the grouping logic. Also, since action and fragment caching use the
same mechanism to write to the cache, we simply have to override the
write_fragment method to add our tagging logic.
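To show the bookkeeping on its own, here is a minimal sketch in plain Ruby. A plain Hash stands in for the cache, and TagIndex with its write and expire methods is hypothetical; it is not Cashier's API. It also keeps the tag index in process memory, which is an assumption made for brevity. A real setup would need the index somewhere every process can see it.

```ruby
require "set"

# Sketch of tag bookkeeping on top of a key-value store.
# Class and method names are hypothetical.
class TagIndex
  def initialize(store)
    @store = store
    @tags = Hash.new { |hash, tag| hash[tag] = Set.new }
  end

  # Write the value and remember which tags the key belongs to.
  def write(key, value, tags: [])
    @store[key] = value
    tags.each { |tag| @tags[tag] << key }
    value
  end

  # Delete every key recorded under the tag, then forget the tag.
  # Returns the number of keys that were recorded under it.
  def expire(tag)
    keys = @tags.delete(tag) || Set.new
    keys.each { |key| @store.delete(key) }
    keys.size
  end
end

store = {}
index = TagIndex.new(store)
index.write("views/dashboard", "<div>...</div>", tags: ["customers"])
index.write("views/customers/index", "<ul>...</ul>", tags: ["customers", "lists"])
index.write("views/about", "<p>static</p>")

index.expire("customers")   # both customer-related entries are removed
p store.keys                # ["views/about"]
```

The caller never needs to know which actions or fragments were involved, only that everything tagged "customers" is stale.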

Through all of this trickery, you'll be able to express this type of
statement:

The content could come from anywhere, but all you know is that it's stale.

This is exactly where Cashier comes
in. It is my gem that allows you to associate actions and fragments with
one or more tags, then expire based on tags. Of course you can expire
the cache from anywhere in your code. Here are some examples:

Then you can expire like this:

All this is possible through this module:

I highly recommend you check out Cashier.
It may be useful in your application, especially if you have complicated
relationships with high performance requirements.

Caching Complicated Actions (or Methods)

Let's say you have an index action. However, it's more complicated than
a normal scaffold index. The user can search, filter, sort and apply
different query options. Think, for example, of a form built with
MetaWhere or Sunspot. There are an infinite number of combinations, but
the data is always the same. That is, a search for "EC2" will always
have the same results as another search for "EC2" as long as the
underlying data hasn't changed. We could easily cache the index action
if we could figure out how to represent each unique combination of input
parameters as a key. Memcached also has a key length limit (250 bytes),
so you should try to keep the key short. How can we do this? We use a
cryptographic hash. A cryptographic hash maps input of any length to a
short, fixed-length digest, and two different inputs are astronomically
unlikely to produce the same digest. In practice, this means you do not
need to worry about collisions.

The Ruby Standard Library comes with SHA1. SHA1 is a good hashing
function for this purpose, so we'll have no problems using it for these
examples. It takes a string input and generates a hash. We'll create a
composite key with a timestamp and a string representation of the input
parameters.
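Outside of a controller, the key generation itself can be sketched in plain Ruby like this. The search_cache_key method and the "posts/search/" prefix are hypothetical names chosen for illustration. The sketch sorts the parameters first, on the assumption that the same search should produce the same key no matter what order the parameters arrive in.

```ruby
require "digest/sha1"

# Build a short, fixed-length key from arbitrary search parameters plus
# a timestamp. Sorting makes the key independent of parameter order.
def search_cache_key(params, last_updated_at)
  canonical = params.map { |name, value| [name.to_s, value.to_s] }
                    .sort
                    .map { |pair| pair.join("=") }
                    .join("&")
  digest = Digest::SHA1.hexdigest(canonical)   # always 40 hex characters
  "posts/search/#{digest}-#{last_updated_at.getutc.strftime('%Y%m%d%H%M%S')}"
end

stamp = Time.utc(2011, 5, 1, 23, 27, 25)
a = search_cache_key({ q: "EC2", sort: "date" }, stamp)
b = search_cache_key({ "sort" => "date", "q" => "EC2" }, stamp)

puts a == b     # true: same search, same key
puts a.length   # 68, well under memcached's key length limit
```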

That will cache every combination of input parameters you can throw at
it. This is perfect for actions with pagination as well. It's perfect
for anything that uses the same underlying data based on input
parameters. This can save your bacon if a search takes a few seconds. If
one user just did the same search, the second user won't have to wait at
all. Hell, they might even be impressed.

Bringing Caching into the Model Layer

Caching isn't just for views. Some DB operations or methods may be
computationally intensive. We can use Rails.cache inside the models to
make them more efficient. Let's say you wanted to cache the listing of
all the top 100 posts on reddit.

I've used the most_recently_updated method a few times. It is not a
defined method, but a method named so that you understand what it is
doing. We can use these concepts to do more fun stuff. My main project
has companies and customers. An account has many customers and
companies. It's typical that I need to retrieve all the customers for an
account. This can be 10,000 records. That takes time. ActiveRecord
instantiation on that order is not free. However, I only care about
customers or companies in the scope of a specific account. That means I
only use the account and its customers/companies associations. Rails
gives you the ability to specify a different attribute for :touch on
belongs_to. I use this to my advantage to create an
association_name_updated_at column. Then specify :touch =>
:association_name_updated_at. Here's how it looks in code:

That gives me a timestamp I can use to generate all keys. Now I can use
Rails.cache to fetch different queries and keep them all cached. You can
wrap this functionality in a module and include in other associations.

all is a method that takes many options. We don't really care what's
passed in, we just need to be able to generate a cache key based on the
input parameters. Since we know when the association was last updated,
the method will return fresh content whenever records have been
modified. Include the extension in your association and you're on your
way!

These are just examples of what you can do with caching in the model
layer. You could even write this type of cached finder extension for
ActiveRecord::Base. This is different from SQL caching, which only
persists for a single request. This is cached throughout the entire
application.

CSRF and form_authenticity_token

Rails uses a CSRF
(Cross Site Request Forgery) token and a form authenticity token to
protect your application against attacks. These are generated per
request and each page gets unique values each time.
protect_from_forgery is added by default to ApplicationController.
You may have run into this problem before. You may have tried to submit
a POST and received an Unauthorized response. This is the
form_authenticity_token in action. You can fiddle with it and see what
happens to your application.

These tokens cause problems with cached HTML (depending on what Rails
version you're using). Caching a page or an action with a form may
generate unauthorized errors because the tokens were for a different
session or request. There are parts of the cached pages that need to be
replaced with new values before the application can be used. This is a
simple process, but it will take another HTTP request.

You'll need to create a controller to serve up some configuration
related information that's never cached. That way, a cached action will
load, then a separate request will be made for correct tokens.

NOTE: You may run into more problems on Rails 2. This is because
Rails 3 puts the form authenticity token and CSRF information in meta
tags in the HEAD of the document. This is for AJAX requests. You may
notice the rails.js file appends them to all AJAX requests. Forms
submitted via AJAX with something like $(form).serialize() will send the
form_authenticity_token since it's automatically included in all forms
generated with form_for or form_tag.

You need to create a new controller that responds to JavaScript requests
and returns some JS for the browser to evaluate. Here's how you can
replace the information in the meta tag for Rails 3. You can also use
this logic to update all form_authenticity_token inputs on the page.

Dealing with Relative Dates (or other content)

Many Rails applications use distance_of_time_in_words throughout
their application. This can cause major problems for any cached content
with a date. For example, say you cache a fragment that contains the
text '1 month ago'. A month later the fragment is still in the cache.
Since you stored a relative date in the cache, the fragment still says
'1 month ago' even though the content is now 2 months old. This is no
good. You can solve this problem easily with JavaScript.

JavaScript is better for handling dates/times than Rails is. This is
because Rails needs to know what the user's time zone is, then convert
all times into that time zone. JavaScript is better because it uses the
local time zone by default. How often do you want to display a time in a
different zone than the user's current locale? You can dump the UTC
representation of the date into the DOM, then use JS to parse it into a
relative date or something like strftime output. I've encapsulated this
process in a helper in my Rails applications. Once all the data is in
the DOM, you can do all the parsing in JavaScript.
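The server side of that idea can be sketched in plain Ruby. utc_time_tag is a hypothetical helper name, and the time element with a relative-time class is an assumed markup convention, not something Rails generates for you. The point is only that the cached HTML carries an absolute UTC time and never a relative phrase.

```ruby
require "time"   # adds Time#iso8601
require "cgi"    # CGI.escapeHTML

# Hypothetical helper: emit the absolute UTC time so that the cached
# HTML never contains a relative phrase such as "1 month ago".
def utc_time_tag(time, css_class = "relative-time")
  stamp = time.getutc.iso8601
  %(<time class="#{CGI.escapeHTML(css_class)}" datetime="#{stamp}">#{stamp}</time>)
end

puts utc_time_tag(Time.utc(2011, 5, 6, 12, 30, 0))
# <time class="relative-time" datetime="2011-05-06T12:30:00Z">2011-05-06T12:30:00Z</time>
```

The browser can then find every element with that class and rewrite its text relative to the visitor's own clock, so the cached fragment stays correct no matter how long it lives.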

Then, when the page loads you can use a library like date.js to create
more user friendly dates.

Time to Cash Out

I've covered a ton of material in this article. I've given a thorough
explanation of how all the Rails cache layers fit together and how to
use the lowest level to its full potential. I've provided a solution
for managing the cache outside the HTTP request cycle as well as shown
you how to bring caching into the model layer. This is not the
be-all and end-all of caching in Rails. It is an in-depth look at
caching in a Rails application. I'll leave you with a quick summary of
everything covered and a few goodies.

Page Caching

The honest to goodness best caching ever. Bypass Rails completely.

Usually not applicable to any web application. Have a form? No good,
the form_authenticity_token will be no good and Rails will reject
it.

Action Caching

Most bang for the buck. Can usually be applied in many different
circumstances.

Uses fragment caching under the covers.

Generates a cache key based off the current url and whatever other
options are passed in

Get more mileage by caching actions with a composite timestamped
key.

Fragment Caching

Good for caching reusable bits of HTML. Think shared partials or
forms.

Use a good cache key for each cache block.

Don't go overboard. Requests to memcached are not free. Maximize
benefits by caching a small number of large fragments instead of a
large number of small fragments.

Use auto expiring cache keys to invalidate the cache automatically.

General Points

Don't worry about sweepers unless you have to.

Understand the limitations of Rails' HTTP request cycle.

Use cryptographic hashes to generate cache keys when permutations of
input parameters are involved.

Don't be afraid to use Rails.cache in your models.

Only use sweepers when you have to.

Tag-based caching is useful in certain situations.

Consolidate your cache expiration logic in one place so it's easily
testable.

Test with caching turned on in complex applications.

Look into Varnish for more epic
wins.

belongs_to with :touch => true is your friend.

Use association timestamps

Spend time upfront considering your cache strategy.

Be wary of examples that expire by regex. This only works on cache
stores that have the ability to iterate over all keys. Memcached
is not one of those.