The following is adapted from a talk given by Matt Williams at nginx.conf 2015, held in San Francisco in September. This blog post is the first of two parts, and is focused on load balancing; the second part, focused on caching and monitoring, will be uploaded soon. You can view the presentation slides or watch a video of the talk, including both parts.
My name is Matt Williams, and I’m the evangelist at Datadog. Datadog is a SaaS-based monitoring platform. We load up an agent on each host that you want to monitor and start collecting data on everything that’s going on, including metrics from NGINX, or Apache, or MySQL, or PostgreSQL, or the OS, or whatever, then bring it together on a nice dashboard. But I’m not here to talk about Datadog. Today we’re going to focus on “Scaling Web Applications with NGINX Load Balancing and Caching”.
This session is not going to tell you the one optimal configuration for load balancing and caching, because there is no such thing. It totally depends on your environment: what you’re serving, what the assets look like, what the pages look like, and what you are doing with NGINX. So there’s no way I can give you “the one thing that works for everyone”, because it doesn’t exist.
But what I can do is give an overview of all the load balancing and caching features and configuration options available, and then talk about how to test and verify that any changes you make actually improve things.
Table of Contents – Part I, Load Balancing (this post)
1:45 Benefits of Load Balancing/Caching
2:58 Load Balancing Methods
7:02 Which Method Should You Choose?
10:58 FYI Load Balancing
15:05 How to Ensure Session Persistence
Table of Contents – Part II, Caching and Monitoring (to come)
17:08 Caching
19:33 FYI Caching
21:09 FYI Tuning
23:46 How to Find the Right Configuration
25:17 Why Monitor
26:30 Datadog
27:50 NGINX Monitoring Tools
28:40 Tools to Test With
29:45 Key Metrics
30:29 Active Connections (total and per upstream)
31:00 Dropped Connections
31:33 Requests per Second
32:14 Error Rates
33:44 Request Processing Time
34:28 Available Servers per Upstream
35:15 Scaling Web Applications
1:45 Benefits of Load Balancing/Caching
What’s the purpose of load balancing and caching? Really, the purpose is to make the end-user experience a whole lot better. We accomplish that by distributing the load over multiple servers, which reduces lag and, hopefully, limits failures for the traffic coming in from users around the world. Once all this is in place, no single web server should be overloaded.
At least in theory. You could still become so popular that you overload everything, but hopefully you’ll have something in place that can handle all that extra load. To the user, it just feels like they’re connecting to one server, and everything works like magic.
Caching adds to this by basically taking the burden of serving static assets away from the web server and moving it to a more dedicated server that can really focus on doing that.
2:58 Load Balancing Methods
When setting up load balancing on NGINX, there are 5 major methods. The first four are available in both open source NGINX and NGINX Plus, while the last one is an NGINX Plus feature. On this slide and later ones NGINX Plus features are noted with a “+”.
Round Robin
This one is fairly self-explanatory, and it’s the default load balancing method.
You set up an upstream block and within it define the group of servers to be load balanced. The first request goes to webserver1, the second request to webserver2, and the third request to webserver3; then 1-2-3, 1-2-3, and so on.
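To make the rotation concrete, here is a minimal Python model of plain Round Robin. It only illustrates the idea and is not how NGINX implements it internally. The server names are the placeholders from the example.

```python
from itertools import cycle

# Placeholder names for the upstream group in the example.
SERVERS = ["webserver1", "webserver2", "webserver3"]


def round_robin(servers):
    """Return an endless iterator that hands out servers in order: 1-2-3, 1-2-3, ..."""
    return cycle(servers)


picker = round_robin(SERVERS)
first_six = [next(picker) for _ in range(6)]
print(first_six)
```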
In addition to simple Round Robin, you can also set up weighted Round Robin. A weight can be specified for each one of those servers. (IP Hash and Hash can be optionally weighted too; see the NGINX blog post on load balancing weights.)
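For example, if webserver1 has a weight of 2 and the other two servers keep the default weight of 1, the requests now go to webserver1, webserver1, webserver2, webserver3, webserver1, webserver1, webserver2, webserver3, and so forth. Pretty cool and pretty easy.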
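The weighted sequence above can be sketched in Python by repeating each server in the cycle as many times as its weight. The weights here are a hypothetical setup that reproduces the example. This is the simplest form of weighted round robin, not NGINX’s exact algorithm.

```python
from itertools import cycle

# Hypothetical weights: webserver1 counts double, the others use the default of 1.
WEIGHTS = {"webserver1": 2, "webserver2": 1, "webserver3": 1}


def weighted_round_robin(weights):
    """Build one cycle where each server appears `weight` times, then repeat it forever."""
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)


picker = weighted_round_robin(WEIGHTS)
first_eight = [next(picker) for _ in range(8)]
print(first_eight)
```

NGINX itself uses a “smooth” weighted round robin that spreads webserver1’s extra picks across each cycle rather than grouping them, but the 2:1:1 proportion comes out the same.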
Least Connected
Least Connected asks “Which upstream web server has the fewest active connections?” Whichever server that is gets the next request. For example, if webserver1 and webserver2 have 10 active connections each and webserver3 has only one, the next 9 requests all go to webserver3, assuming the 10 connections to webserver1 and webserver2 stay open the whole time.
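Here is a small Python sketch of that exact scenario, under the same assumption that no connection closes in the meantime. It is a model of the idea only. NGINX’s least_conn also takes server weights into account and round-robins among tied servers, and this sketch ignores both.

```python
# Active connection counts from the example in the text.
active = {"webserver1": 10, "webserver2": 10, "webserver3": 1}


def least_connected(active):
    """Return the server with the fewest active connections (first one wins ties)."""
    return min(active, key=active.get)


picks = []
for _ in range(9):
    server = least_connected(active)
    active[server] += 1  # the new request opens a connection that stays open
    picks.append(server)

print(picks)   # webserver3 nine times
print(active)  # all three servers now have 10 connections
```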
There are a couple of other load balancing methods that are used mostly to provide session persistence.
IP Hash
IP Hash takes the hash of the first three octets of the client’s IP address. Every time a request comes in from that same address (or any address sharing those three octets), it’s sent to the same web server, providing simple session persistence. In fact it’s a little more than session persistence: it’s basically persistence for the lifetime of the web server, so every time I come back from this IP address I get the same upstream web server.
Sometimes that’s good, sometimes bad. For example, if you have a corporate intranet where everybody reaches the site through the same proxy or NAT gateway, they probably all appear to have the same IP address. Now they’re all going to be sent to the same web server, which is no longer a very effective load-balancing scenario.
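A minimal Python model of the IP Hash idea follows, assuming IPv4 clients. zlib.crc32 is a stand-in for NGINX’s own hash function, so the server a given address lands on will not match a real NGINX deployment. What carries over is the behaviour: only the first three octets matter.

```python
import ipaddress
import zlib

SERVERS = ["webserver1", "webserver2", "webserver3"]


def ip_hash(client_ip: str, servers: list[str]) -> str:
    """Pick a server using only the first three octets of an IPv4 client address.

    zlib.crc32 is a stand-in for NGINX's real hash; the point is that every
    address in the same /24 maps to the same server.
    """
    first_three = ipaddress.IPv4Address(client_ip).packed[:3]
    return servers[zlib.crc32(first_three) % len(servers)]


# Addresses that differ only in the last octet always land on the same server.
print(ip_hash("203.0.113.10", SERVERS), ip_hash("203.0.113.99", SERVERS))
```

Clients behind one corporate gateway share a single address, so in this model, as in the scenario above, they all end up on one server.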
Generic Hash
For situations where IP Hash doesn’t work, we have Generic Hash. With Generic Hash we can totally customize how requests are distributed: the hash key could be a combination of the IP address and a query variable, the URL, or something else. Every time all those parameters match, the request goes to the same server.
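The sketch below models Generic Hash in Python with a key built from the client IP plus the request URI. The key choice and the helper name request_key are hypothetical. zlib.crc32 modulo the server count is a simplification; NGINX’s real implementation differs in its details.

```python
import zlib

SERVERS = ["webserver1", "webserver2", "webserver3"]


def generic_hash(key: str, servers: list[str]) -> str:
    """Map an arbitrary key to a server; identical keys always map to the same server."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def request_key(client_ip: str, uri: str) -> str:
    # Hypothetical key: client IP plus request URI. In NGINX this would be
    # something along the lines of `hash $remote_addr$request_uri;`.
    return f"{client_ip}{uri}"


key = request_key("203.0.113.10", "/cart?item=42")
print(generic_hash(key, SERVERS))  # same key, same server, every time
```

A plain modulo like this reshuffles most keys whenever a server is added or removed. NGINX’s hash directive accepts a `consistent` parameter, which switches to ketama consistent hashing to limit that reshuffling.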
Least Time
This fifth method is specific to NGINX Plus. Least Time is a variation of Least Connected: the request is sent to whichever server has the fewest active connections and the lowest response time. In other words, it goes to the server that is both least busy and responding most quickly. Like Round Robin, Least Time can also be weighted: add weight=2 or weight=10 to a server to send it two times, or ten times, as many requests.
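To show how response time and connection count can pull against each other, here is a toy Python model. The scoring formula that blends them is an assumption made up for illustration, not the one NGINX Plus uses, and the server names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Upstream:
    name: str
    avg_response_ms: float  # recent average response time
    active: int             # currently open connections
    weight: int = 1


def least_time(servers):
    """Pick the server with the lowest score under a made-up blending rule.

    score = avg_response_ms * (active + 1) / weight
    This formula is an illustrative assumption, not the one NGINX Plus uses.
    """
    return min(servers, key=lambda s: s.avg_response_ms * (s.active + 1) / s.weight)


pool = [
    Upstream("big-local", avg_response_ms=20, active=3),
    Upstream("small-local", avg_response_ms=45, active=1),
    Upstream("remote", avg_response_ms=120, active=0),
]
print(least_time(pool).name)  # big-local: score 80 vs 90 and 120
pool[0].active = 5
print(least_time(pool).name)  # small-local: big-local's score rose to 120
```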
7:02 Which Method Should You Choose?
Having these five ways to do load balancing is great, but which one do I choose? Well, it depends on what you’re serving out.
Round Robin works great if all the servers are identical, they’re all in the same location, and the requests are short-lived and roughly the same length. It’s also the default, so if you don’t make any changes, that’s what you’re gonna get.
Least Connected is really good if, again, all servers are identical and in the same location, but the sessions vary in length. For example, maybe I have three load balanced servers and the first request that comes in is really, really short. It gets processed super quickly, maybe in just 10ms. The next request that comes in takes a long time to handle. The third one is really short again. The fourth one, also really short. The fifth one, really long. Somehow this pattern keeps repeating, and with Round Robin server 2 keeps getting hit by all these huge requests. They last a long time, while the requests going to the other servers are really short.
Round Robin’s not really gonna work for that situation. As those long-running connections pile up on this one box, we could quickly get hundreds of connections on it and start approaching the maximum number of connections it can handle on its own, especially if you’re buying a super cheap Amazon EC2 instance.
As I reach that maximum number, this server’s going to take longer and longer to process each request. Or it might start serving out 500 errors, which is not gonna be a good user experience for that end user.
Least Connected avoids that problem. A server that has been getting short-lived requests has few open connections, so it handles the next request that comes in. The server that already has two, or two hundred, existing connections does not.
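The scenario above can be checked with a tiny discrete-time simulation in Python. Its assumptions are stated in the code: one request arrives per time step, and the traffic pattern and durations are hypothetical. The simulation reports the peak number of open connections on each server under both methods.

```python
from itertools import cycle


def simulate(durations, n_servers, policy):
    """Send one request per time step and return the peak open connections per server.

    durations[i] is how many time steps request i keeps its connection open.
    """
    open_until = [[] for _ in range(n_servers)]  # end times of each server's open requests
    peak = [0] * n_servers
    rr = cycle(range(n_servers))
    for t, duration in enumerate(durations):
        # Close every connection whose request has finished by time t.
        for ends in open_until:
            ends[:] = [end for end in ends if end > t]
        if policy == "round_robin":
            target = next(rr)
        elif policy == "least_connected":
            target = min(range(n_servers), key=lambda i: len(open_until[i]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        open_until[target].append(t + duration)
        peak[target] = max(peak[target], len(open_until[target]))
    return peak


# Hypothetical traffic: short, long, short, repeated; a long request holds its
# connection for 50 steps, a short one for 1.
pattern = [1, 50, 1] * 10
print("round robin:    ", simulate(pattern, 3, "round_robin"))
print("least connected:", simulate(pattern, 3, "least_connected"))
```

With Round Robin, all ten long requests end up open on the second server at once. Least Connected spreads them across all three.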
For simple session persistence, IP Hash uses the first 3 octets of the client IP address to route the request, and Generic Hash expands on that idea, allowing customized parameters to be used to route the request.
Least Time, an NGINX Plus feature, is interesting because the servers can be in different places and have different configurations. They could be big servers or small servers, there could be variable-length sessions, and NGINX Plus is still going to be able to handle all that. It does this by tracking each upstream server’s average response time alongside its number of active connections. NGINX Plus also has health checks, a feature where the load balancer sends its own requests to the upstream web servers.
Health checks are a kind of back channel conversation going on between the load balancer and each of the upstream web servers, and that back channel conversation is just asking “Hey, are you healthy? Are you there? Everything all good?”
That health check can go to the main URL, or to a special URL that you define for health monitoring. This way I could have a health check that verifies two things. It checks that the server is up and processing requests quickly like it should, and also that the SQL database behind it is all good. If everything is working well, the server reports back to the load balancer, “Everything’s good, I’m still here, I’m all good”. But if, for instance, that database is gone, then there’s a problem, and this web server shouldn’t receive any more requests.
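As a sketch of the application side of such a check, here is the decision logic a hypothetical /health endpoint might use. check_database is a placeholder for whatever cheap query your app can run. The web framework wiring is left out so the logic stands on its own.

```python
from typing import Callable, Tuple


def health_status(check_database: Callable[[], bool]) -> Tuple[int, str]:
    """Decide the response for a hypothetical /health endpoint.

    check_database is a placeholder for a cheap query your app can run
    against its database. NGINX Plus health checks treat 2xx and 3xx
    responses as healthy by default, so returning 503 takes this server
    out of rotation.
    """
    try:
        database_ok = check_database()
    except Exception:  # a connection error counts as unhealthy
        database_ok = False
    if database_ok:
        return 200, "OK"
    return 503, "database unavailable"


def database_gone():
    raise ConnectionError("database unreachable")


print(health_status(lambda: True))   # (200, 'OK')
print(health_status(database_gone))  # (503, 'database unavailable')
```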
Those are some different reasons why you might choose one method versus another.
10:58 FYI Load Balancing
Some things to keep in mind when it comes to Load Balancing.
First, the load balancer will drop any empty headers it gets. Headers with underscores in their names are dropped as well. If your upstream web servers rely on headers that have underscores in them, they won’t get them unless you enable the underscores_in_headers directive on your load balancer.
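The underscore rule can be modelled in a few lines of Python. This is a simplified sketch that assumes NGINX’s default ignore_invalid_headers on, under which header names may contain letters, digits, and hyphens, plus underscores once underscores_in_headers is enabled. It does not model the empty-header behaviour.

```python
import re

# Characters NGINX accepts in header names by default: letters, digits, hyphens.
DEFAULT_NAME = re.compile(r"[A-Za-z0-9-]+")
# With underscores_in_headers on, underscores are accepted too.
UNDERSCORE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def forwarded_headers(headers, underscores_in_headers=False):
    """Return only the headers whose names this simplified model would pass upstream."""
    pattern = UNDERSCORE_NAME if underscores_in_headers else DEFAULT_NAME
    return {name: value for name, value in headers.items() if pattern.fullmatch(name)}


incoming = {"X-Request-Id": "abc123", "X_Tenant": "acme"}
print(forwarded_headers(incoming))                               # X_Tenant is dropped
print(forwarded_headers(incoming, underscores_in_headers=True))  # both survive
```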
The load balancer also rewrites the Host header. By default, the request it sends upstream carries the upstream name from the proxy_pass directive rather than the Host header the client sent. This is totally configurable with the proxy_set_header directive; you can change it to whatever you like.
Hash methods, which we’ve talked about already, result in a kind of lifetime persistence: sessions stick to the same server for as long as that upstream web server stays up.
Well, servers might come up and go down. When a server goes down, you don’t want it to still be in that upstream pool of servers, so if you’re using NGINX Plus, you can handle that with health checks. Health checks use a back channel to verify the server is live and working as it should.
If you don’t have NGINX Plus, then you can rely on max_fails and fail_timeout. max_fails is the maximum number of failures allowed within a time period, and fail_timeout is that time period; it is also how long the server is then considered unavailable. For example, I could set max_fails=3 and fail_timeout=90s. Then, if I see 3 failures within 90 seconds, that server will no longer be sent requests. Every 90 seconds NGINX will try it again: “Oh, is it there? Nope, still down. Oh, is it there? No, still down.” That’s another way to make sure that the servers that should be there are up and running.
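Here is a toy Python model of that max_fails=3, fail_timeout=90s behaviour for one server. It captures the rule as described: enough failures inside the window take the server out for fail_timeout seconds. NGINX’s internal bookkeeping differs in its details.

```python
from collections import deque


class PassiveHealth:
    """Toy model of max_fails / fail_timeout for a single upstream server."""

    def __init__(self, max_fails=3, fail_timeout=90.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures = deque()  # timestamps (seconds) of recent failed attempts
        self.down_until = 0.0

    def record_failure(self, now):
        self.failures.append(now)
        # Only failures inside the fail_timeout window count toward max_fails.
        while now - self.failures[0] > self.fail_timeout:
            self.failures.popleft()
        if len(self.failures) >= self.max_fails:
            # Take the server out of rotation for fail_timeout seconds.
            self.down_until = now + self.fail_timeout
            self.failures.clear()

    def available(self, now):
        return now >= self.down_until


server = PassiveHealth(max_fails=3, fail_timeout=90)
for t in (0, 10, 20):  # three failures within 90 seconds
    server.record_failure(t)
print(server.available(50), server.available(110))  # False True
```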
Now, if I’ve got a load balancer and a bunch of web servers behind it, the reason I put the load balancer there is to avoid the problem that happens with just one box. If I reach a certain number of connections, that box might start serving out 500 errors, and I want to avoid that. But even with 4 load balanced web servers, if my website suddenly gets 4 times more popular, I still run up against that same threshold. So I might want to set a maximum number of connections per web server. With NGINX Plus, I can set that maximum to something like 500 connections. If I go over that, the request is added to a queue and sent to a web server once its number of connections drops below 500.
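The cap-and-queue behaviour can be sketched in Python as follows. It is a toy model with a cap of 2 standing in for 500 so the overflow is easy to see. It picks the least-busy server under its cap, which is an assumption for the sketch. The real NGINX Plus queue also has limits such as a timeout, which this model ignores.

```python
from collections import deque
from typing import List, Optional


class Server:
    def __init__(self, name: str, max_conns: int):
        self.name = name
        self.max_conns = max_conns
        self.active = 0


class CappedBalancer:
    """Toy model of a per-server connection cap with a waiting queue."""

    def __init__(self, servers: List[Server]):
        self.servers = {s.name: s for s in servers}
        self.waiting = deque()  # request IDs waiting for a free slot

    def submit(self, request_id: str) -> Optional[str]:
        """Send to the least-busy server under its cap, or queue the request."""
        open_servers = [s for s in self.servers.values() if s.active < s.max_conns]
        if not open_servers:
            self.waiting.append(request_id)
            return None
        target = min(open_servers, key=lambda s: s.active)
        target.active += 1
        return target.name

    def finish(self, server_name: str) -> Optional[str]:
        """A request on server_name completed; give its slot to the oldest queued request."""
        server = self.servers[server_name]
        server.active -= 1
        if self.waiting:
            request_id = self.waiting.popleft()
            server.active += 1
            return request_id
        return None


lb = CappedBalancer([Server("webserver1", 2), Server("webserver2", 2)])
print([lb.submit(f"r{i}") for i in range(5)])  # the fifth request has to wait
print(lb.finish("webserver1"))                 # r4 takes the freed slot
```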
Proxy buffers are something you’re gonna want to look at to smooth out speed differences between client and upstream connections. There are really two types of connections we’re dealing with in load balancing: the client-to-load-balancer connection is one part, and the load-balancer-to-upstream-server connection is the other. Chances are the load balancer to upstream connection is super fast, because they’re probably in the same data center. But the client to load balancer connection is often going to be significantly slower. To deal with that difference in speed, NGINX uses proxy buffers. With buffering on, NGINX reads the response from the upstream server as fast as it can and holds it in those buffers while it goes out to the slower client, so the upstream server is freed up sooner.
There are a lot of settings available to configure the proxy buffers. If you’re inside a corporate intranet where all the users are in the same place and connections are super fast, then you probably won’t need proxy buffers; you may be able to turn them off and get a little bit better performance. But if your users are out in the real world, you probably do need them. There are eight to ten different settings for tuning proxy buffers to make them just right for your environment.
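Buffers cost memory, so when tuning them it helps to do some back-of-the-envelope arithmetic. The Python helper below gives a rough upper bound that assumes every buffer on every proxied connection fills up. Its defaults mirror NGINX’s defaults on a system with 4 KiB memory pages (proxy_buffer_size 4k, proxy_buffers 8 4k). The connection count is hypothetical.

```python
def worst_case_buffer_kib(connections, buffer_size_kib=4, buffers_number=8, buffers_size_kib=4):
    """Rough upper bound on proxy buffer memory if every buffer on every connection fills.

    Defaults mirror NGINX's on a system with 4 KiB pages:
    proxy_buffer_size 4k; proxy_buffers 8 4k;
    """
    per_connection = buffer_size_kib + buffers_number * buffers_size_kib
    return connections * per_connection


total = worst_case_buffer_kib(10_000)
print(f"{total} KiB = {total / 1024:.1f} MiB")  # 360000 KiB = 351.6 MiB
```

Doubling either buffers_number or buffers_size_kib roughly doubles that figure, which is the trade-off to keep in mind when clients are slow.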
15:05 How to Ensure Session Persistence
We’ve already talked about two simple ways of ensuring session persistence – IP Hash and Generic Hash. These are kind of a crude way to do session persistence where the load balancer says, “Anytime I get the same IP address, or the same IP address and URL and query variables and all of that, send it to the same server”.
But if we’re using NGINX Plus, there are also some other ways. First, there’s Cookie Insertion. This is where any time the load balancer sees a new request come in, it injects a new cookie that says, “Hey this is a brand new request; let’s give it session number 1234”. Then every time another request comes in from that same client, it’s gonna send that same cookie over and the load balancer is going to say “oh this is session 1234 again” and hand it off to the same server.
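A minimal Python model of Cookie Insertion follows. The cookie name srv_id is a hypothetical choice (the real name is configurable), and here the cookie value simply names the chosen server. NGINX Plus encodes the server differently. New clients are assigned by round robin, which is an assumption for the sketch.

```python
from itertools import cycle

COOKIE = "srv_id"  # hypothetical cookie name; the real name is configurable


class StickyCookieBalancer:
    def __init__(self, servers):
        self.known = set(servers)
        self.rr = cycle(servers)

    def route(self, cookies):
        """Return (server, cookie_to_set); cookie_to_set is None for a returning client."""
        pinned = cookies.get(COOKIE)
        if pinned in self.known:
            return pinned, None
        # New client (or a cookie naming a server we no longer have):
        # pick a server normally, then tell the client to remember it.
        server = next(self.rr)
        return server, {COOKIE: server}


lb = StickyCookieBalancer(["webserver1", "webserver2", "webserver3"])
server, set_cookie = lb.route({})  # first visit: assigned and given a cookie
print(server, set_cookie)
print(lb.route(set_cookie))        # return visit: same server, no new cookie
```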
There’s also Learn, which is like cookie inspection, where you tell NGINX, “OK, look out for this particular cookie, and this particular parameter, for instance a session ID, and when you see that, make sure that all sessions with the same ID go to the same server.”
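Learn can be modelled as the load balancer watching the application’s own session cookie. Here is a minimal Python sketch: the cookie name SESSIONID is hypothetical, and new sessions are assigned by round robin as an assumption. The balancer notes which upstream server issued each session ID and routes later requests carrying that ID back to the same server.

```python
from itertools import cycle

SESSION_COOKIE = "SESSIONID"  # hypothetical name of the app's own session cookie


class LearningBalancer:
    def __init__(self, servers):
        self.rr = cycle(servers)
        self.sessions = {}  # learned session ID -> server that created it

    def route(self, request_cookies):
        session_id = request_cookies.get(SESSION_COOKIE)
        if session_id in self.sessions:
            return self.sessions[session_id]
        return next(self.rr)  # unknown session: pick a server normally

    def observe_response(self, server, response_cookies):
        """Learn from the upstream's Set-Cookie which server owns a session."""
        session_id = response_cookies.get(SESSION_COOKIE)
        if session_id is not None:
            self.sessions[session_id] = server


lb = LearningBalancer(["webserver1", "webserver2", "webserver3"])
first = lb.route({})                                   # new visitor
lb.observe_response(first, {SESSION_COOKIE: "abc123"}) # app starts a session
print(lb.route({SESSION_COOKIE: "abc123"}) == first)   # True: same server again
```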
Then there’s Sticky Route, which is a similar idea to Generic Hash. You can read up more about what makes Sticky Route and Hash different.
One other session-persistence feature with NGINX Plus is draining. Let’s say there are a bunch of connections that are coming in, but you know that this one server is going to come down for maintenance. You can start draining the sessions. When you turn that on, all new requests are going to go to the other servers, but the existing sessions on that server are going to continue being served by that same server. Finally, when all the sessions are completed, I can bring down that server.
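The draining rule can be sketched in Python. Sessions already on a draining server stay there, new sessions go elsewhere, and once the draining server has no sessions left it can be taken down. The classes, session IDs, and the fewest-sessions rule for new work are hypothetical simplifications.

```python
class Upstream:
    def __init__(self, name):
        self.name = name
        self.sessions = set()  # session IDs currently pinned to this server
        self.draining = False


def route(session_id, servers):
    """Keep existing sessions where they are; give new sessions only to non-draining servers."""
    for server in servers:
        if session_id in server.sessions:
            return server.name
    candidates = [s for s in servers if not s.draining]
    target = min(candidates, key=lambda s: len(s.sessions))
    target.sessions.add(session_id)
    return target.name


web1, web2 = Upstream("webserver1"), Upstream("webserver2")
pool = [web1, web2]
route("session-a", pool)            # lands on webserver1
web1.draining = True                # maintenance is coming for webserver1
print(route("session-a", pool))     # webserver1: existing session stays put
print(route("session-b", pool))     # webserver2: new session avoids the draining server
web1.sessions.discard("session-a")  # session-a finishes
print(web1.draining and not web1.sessions)  # True: safe to take webserver1 down
```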
So that’s load balancing. Now let’s talk a little about caching.
The post Scaling Web Applications with NGINX – Part I: Load Balancing appeared first on NGINX.