One of the most mysterious parts of the BIG-IP Application Acceleration Manager (AAM) is caching. Rarely is it explained, and there are very few documents that describe why you would or would not use one of the BIG-IP's caching facilities.
Even harder to find is some kind of description of what numbers you should use, or whether or not to push some specific caching button when trying configure your AAM policies or applications. So here's an overview of a select few bits of frequently asked AAM caching questions, and some explanation of why you would or would not do something with those pretty buttons and number fields.
To be clear, AAM does not use fast Cache, it has two entirely separate and distinct caching systems of its own: Metastor and the Small Object Cache. In this posting, however, we'll be talking about them, mostly, as if they are one in the same.
The 4 most commonly asked questions we get regarding caching are as follows:
· Why is there an option to turn off cache on first hit, and why would I ever enable this?
· What does Queue Parallel Requests do?
· Why would I ever set the maximum object size to anything less than infinity?
· OK, a maximum object size makes sense, but what about the minimum object size?
Each question is addressed using an analogy of putting marbles into a mason jar. We are, of course, talking about web objects and bytes of data, not marbles and weight.
1) "Why is there an option to turn off cache on first hit, and why would I ever do so?"
OK, well, let's start with a simple mental model of a cache. Imagine your website as just a bunch of marbles. To keep it simple, all your marbles are the same size. Now think of a cache as being like a Mason jar. Imagine if the Mason jar is just big enough to hold exactly one marble. You can think of the BIG-IP as a super-fast copying machine that can copy marbles, and store one copy of one marble.
Finally, imagine a single user sending requests for marbles to your website through the BIG-IP, where every policy node has "Cache marbles on first hit" turned on, and every marble is cacheable, and cached if requested. Pretty simple, right?
If you have "Cache marble on first hit" turned on, then the very first request your user makes for a marble will cause the BIG-IP to turn around, get that marble from the website, copy it, put that copy into the Mason jar, and then hand the original marble to your user. At this point, the Mason jar is full.
If the next request your user makes is for a different marble, then the first marble must be removed from the jar in order to make room for the one just requested.
Sadly, the effort and time it took to copy and put the first marble into the Mason jar was entirely wasted, and the user got both of his marbles later, and slower than he would have if the BIG-IP had simply taken them from the website, and handed them to your user.
If the third request the customer makes is for the first marble, then again the Mason jar has to be emptied and the first marble cached (remember only a single marble can be cached at any time). The BIG-IP is churning away, copying then putting a marble into the Mason jar, then emptying out the Mason jar, but never actually getting any value out of having that Mason jar.
If the user keeps switching back and forth between requesting the first marble and the second marble, the jar will never have the marble being requested, and the load on the back end servers has not been reduced. This is considered a zero cache scenario where the benefits of the cache are moot.
But imagine if "Cache marble on first hit" is turned off. Now the same marble has to be requested twice before the BIG-IP will copy it and put the copy in the Mason jar.
So now, with the first request the BIG-IP does nothing but pass it along. However, the BIG-IP remembers that the blue marble was requested once. The second request also does nothing but pass the marble along, but again, the BIG-IP remembers that, say, a red marble was requested once. At this point, if the user goes back and asks for the blue marble again, it has been requested twice, so it will be copied and stored in the Mason jar.
If the user then asks for a green marble, the BIG-IP remembers that the request was made, but does not discard the marble in the jar, as this is the first request. If the user requests the blue marble again, then the user will get a copy of that from the Mason jar, not from your website. You now have an effective cache where 1 in 5 requests have been offloaded from the origin server.
In summary, turn off "Cache object on first hit" for policy nodes where the objects either change very quickly, or where the time between requests is relatively long. This will prevent the cache from discarding an object that your users will hopefully be requesting more often, and more frequently.
Obviously, the flip side of that coin is that the BIG-IP will have to get the same object from your website twice, so if you are sure that the objects matched by a particular policy node are really popular, and that they will be requested quite frequently, (such as the company logo and navigation buttons) then copy 'em and dump them in the cache the first time they are requested.
2) What is "Queue Parallel Requests" and why would I turn it on?
Queuing parallel requests is interesting, as it interacts with caching, but it really only helps when you have a lot of users trying to get the same marble at the same time, and that marble is being cached for the first time.
A cache is kind of stupid, and it doesn't remember the marbles it threw away. As a result, any marble being put into it looks like it is being stored "for the first time", even when it is actually being put into the jar for the hundredth time.
"Queue Parallel Requests" basically makes all the users who are requesting the same marble wait for it to be fetched off of your website, and then copied once for each user by the BIG-IP.
That doesn't sound too interesting or useful until you realize that if you don't turn this on, then between the time you start the process of requesting that marble from your website and finish putting it into the jar, every other request for that same marble will have to be forwarded to your website. Image a scenario where a server takes 2 ms to respond to a request for an object. Every ms 2 new users request the object. In the time it has taken the server to respond to the first request 3 additional requests would have been sent for the server to process.
This has created unnecessary demand on the servers. With queuing turned on all subsequent requests for the object will be placed into a parking area to wait for the original response to be returned and cached.
Four requests doesn’t sound like it will cause a server to be overloaded, but what if it isn’t 4 but 400 requests. Suddenly, queuing sounds like a better idea, right? It is, but like any other feature, it is not a panacea. Turn it on for new, shareable, highly popular objects that remain the same for a relatively long time.
More to the point, however, if the web server that is giving one marble to the BIG-IP to copy and give to a bunch of users hiccups (say, you decide to take down one of the web servers in your pool, or as luck would have it, one of them fails in the middle of hand over that marble), all of those users will get part of a marble, and that is all. You are trading less pool traffic for what our engineers like to call a "single point of failure" risk.
But if you have a really rare and valuable marble that everyone wants a copy of, all at the same time, and your website pool is pretty stable and handing out marbles pretty efficiently, then request queuing will really reduce the traffic on your web servers!
3) There is an option to set the minimum and maximum cacheable object size. Why would I ever set the maximum object size to anything less than infinity?
Yeah, that's a tough one. First, go read the answer to "Why turn off Cache content on first hit". Then, let's imagine a Mason jar where instead of one marble, we have a jar big enough to store one thousand marbles.
In this scenario, however, we are going to assume exactly 16 simultaneous users, and also that the marbles they are requesting are in the jar. Obviously, the web servers in your pool are getting zero requests. Cool, right!? When caching is working, it can be really handy!
But now let us change one assumption: let's allow your web site objects to vary in size. We still have 16 users, but there is one marble that is twice the diameter as the marbles in our first example. When this marble is cached it reduces the total number of marbles that can be cached. Only 13 of the original 16 requests can be served from the jar, the other 3 requests have to go to the server pool.
If every marble in the cache is twice the diameter of the marbles in our first example, twelve of the 16 requests being made have to go to your pool.
At the extreme, if one object completely fills the Mason jar, that marble (well, bowling ball, really!) is the only object that can be served from cache; the other 15 requests have to go to your pool.
So you limit the maximum size of the marbles that can be stored in your Mason jar to configure the BIG-IP to serve the average number of simultaneous users you expect, and wish to serve. By the emergent properties of the system, it turns out that large objects are often times not that popular, anyway. Unless you are running a web server whose job it is to serve large patch files to end users, that is.
4) OK, a maximum object size makes sense. So why have a minimum object size?
OK, now we have to get explicit about the jar, and about knowing what has been requested, copied, and stored in the jar. Assume that we have a peg board that has exactly one thousand holes in it. Each time we dump a marble in the jar, we write out a tag that describes the marble, tie it to a peg, then put that peg into the peg board.
When we remove a marble from the jar, we remove its associated peg from the board. When the peg board is full, we can't store any more marbles in the Mason jar.
Now, what if your minimum size is that of a grain of sand, but your mason jar is big enough to fit 100 marbles with a diameter of 2 inches? If what is popular, and requested quite frequently is a bunch of grains of sand, you can end up running out of peg board space long, LONG before you even finish coating the bottom of your Mason jar with sand.
Giving your customers copies of those grains of sand will happen often, but will by definition be a smaller percentage of the total volume of traffic than if you made your minimum size larger, AND if you still have enough marbles of that minimum size on your website to fill your cache.
Another way of looking at it is in terms of a collection of marbles of all sizes. If a large marble is in cache, and it has to be displaced to make room on the peg board for a tag that records the information for a grain of sand, and then the grain of sand has to be displaced to make room for the large marble, you will have to get both off of your origin servers.
If you don't try to cache the sand grain, then when a user asks for the larger marble, the total weight of marbles requested from your server is going to be smaller. Even if that grain of sand has to be served from your server several times in order to keep the larger marble in the jar, that will be a lot less total grams of marbles moved, copied and stored or retrieved from the jar.
Obviously, there is a trade off here between the number of requests, versus the total weight of the marbles being requested.
Putting it all together
Knowing when and what to cache is an important step to ensure that BIG-IP and your application is performing optimally. Setting a parameter with a wrong value can have negative effects causing increased traffic on your origin servers and consuming resources unnecessarily on BIG-IP. Think about what you are trying to achieve, what other optimization features are enabled and the traffic patterns of your site when configuring the cache settings.
Thank you to my colleague John Stevens for assistance in writing this article.