2014-01-18

You’ve built a website. It was fun, and it feels rewarding to see all those visitors pour in. The traffic increases slowly, until one day someone posts a link to your app on Reddit and Hacker News, the planets somehow align, GitHub is down or something, and people notice the post and storm in, breaking all barriers of reason and logic.

Your server chokes, and everything dies. Instead of gaining new customers or regular visitors during this epic peak (epeak?), you’re left with a blank page, scrambling to get the site up and running again, to no avail – even after a restart, the server can do nothing differently to survive the load. You’ve lost traffic – for good.

No one can anticipate traffic spikes like these. Very few of us plan that far in advance, unless we’re building a highly funded project that’s expected to perform well within a fixed time frame. How, then, does one avoid these problems? Two aspects need to be considered: optimization and scaling.

Optimization

We’ve written about this before, and a more advanced article is coming up next week, but the usual advice applies – upgrade to the latest version of PHP (currently 5.5, which has a built-in OPcache), index your database, and cache your static content (seldom-changed pages such as About, FAQ and similar), etc.
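
For instance, enabling OPcache mostly comes down to a few php.ini directives. The values below are illustrative starting points, not tuned recommendations:

```ini
; Illustrative OPcache settings for php.ini (PHP 5.5+);
; the extension path and the numbers need tuning per system
zend_extension=opcache.so
opcache.enable=1
opcache.memory_consumption=128
opcache.max_accelerated_files=4000
opcache.revalidate_freq=60
```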

One optimization worth singling out is not just caching static resources, but serving everything static through a non-Apache server such as Nginx, which is optimized for serving static content. You put a layer of Nginx in front of Apache, tell it to intercept requests for static resources (i.e. *.jpg, *.png, *.mp4, *.html…) and serve them directly, rather than letting those requests travel on to the Apache application layer. Such a setup is called a reverse proxy (it’s also sometimes conflated with a software load balancer – see below), and you can find out more about implementing it here.
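
As a rough sketch, such a setup might look like this in Nginx – the domain, paths and Apache port are assumptions for illustration:

```nginx
# Sketch of an Nginx reverse proxy in front of Apache.
# The domain, document root and Apache port (8080) are assumptions.
server {
    listen 80;
    server_name example.com;

    # Serve static resources directly from disk, bypassing Apache
    location ~* \.(jpg|jpeg|png|gif|mp4|css|js|html)$ {
        root /var/www/example.com/public;
        expires 30d;
    }

    # Everything else travels on to the Apache application layer
    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```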

That said, there’s nothing like scaling.

Scale

We say that a website is scalable when it can manage increases in traffic without needing software changes.

There are two types of scaling – horizontal and vertical.

Vertical scaling

Imagine having a server serving the web app in question. The server has 4GB of RAM, an i5 CPU, and a 1TB HDD. It performs its function well, but to better tolerate a higher influx of traffic, you decide to replace the 4GB of RAM with 16GB, you put in an i7 CPU, and you add a PCIe SSD/HDD hybrid drive. The server is now much more powerful and can handle a higher load. This is known as vertical scaling, or “scaling up” – you improved the machine to make it more powerful.

Horizontal scaling

At the other end of the spectrum, we have horizontal scaling. In the example above, the upgrade alone will likely cost as much as, if not more than, the starting machine did. That’s expensive, and often doesn’t produce the benefits we need – most scaling problems come down to concurrency, and if there aren’t enough cores to execute the logic fast enough, then no matter how powerful the CPU, the server will grind to a halt and force some visitors to wait.

Horizontal scaling is when you build a cluster of (often weaker) machines linked together to serve the website. In this case, a load balancer is used – a machine or program whose only role is to determine which machine a given request should be sent to. The machines in the cluster then divide the workload among themselves without even being aware of one another, and your site’s traffic capacity increases dramatically. This is also known as “scaling out”.

There are two main types of load balancers – hardware and software. Software load balancers are installed on a regular machine and accept all traffic, routing it to the appropriate handler. Nginx can act as one such load balancer, as in the case above under “Optimization” – it intercepts requests for static files and serves them on its own, without burdening Apache with them. Another popular software load balancer is Squid, one I’ve personally used extensively in my company, and one which provides truly deep control over all its aspects via a user-friendly interface.
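
To illustrate, a minimal software load balancer in Nginx is little more than an upstream block listing the application servers – the hostnames here are assumptions:

```nginx
# Minimal software load balancer sketch: Nginx round-robins
# requests across the app servers. Hostnames are assumptions.
upstream php_app {
    server app1.internal;
    server app2.internal;
    server app3.internal;
}

server {
    listen 80;

    location / {
        proxy_pass http://php_app;
        proxy_set_header Host $host;
    }
}
```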

Hardware balancers are dedicated machines with the sole purpose of being load balancers – no other software is usually installed on them. Some of the most popular ones designed for handling immense amounts of traffic can be read about in this list.

Note that the two are not mutually exclusive – you can scale up a machine (also called a node) in a scaled-out system, too. In this article, we’ll be focusing on horizontal scaling, because it’s generally the better choice (both cheaper and more effective), albeit more difficult to implement.

Challenges with data sharing

There are several tricky issues to overcome when scaling PHP applications. One is database bottlenecking (which we’ll cover in Part 2); another is managing session data – surely, if you log in on one machine, you’ll be logged out if the load balancer redirects you to another machine on your next request, right? There are ways around this – you can either share the local data between machines, or use a persistent load balancer.

Persistent load balancer

A persistent load balancer remembers where it previously sent each client, and routes their next request to the same place. So if I visit SitePoint and log in, the load balancer redirects me to, say, Server1, remembers me, and sends my next click after logging in to Server1 as well. Naturally, this all happens transparently. What if Server1 goes down, though? All session data on it is lost – I’m logged out, and I need to start over on another server. That’s a needless interruption of the user experience. What’s more, the load balancer now has so much to do (not only redirecting hundreds of thousands of people to various servers, but also remembering where it sent each of them) that it becomes a bottleneck itself and might benefit from some scaling out of its own. But if one load balancer then crashes, is the remembered mapping of clients to servers lost as well? WHO WATCHES THE WATCHMEN? The situation reeks of an obvious catch-22.
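
For completeness, one common way to get this stickiness in a software balancer like Nginx is the ip_hash directive, which pins each client IP to the same backend. This is a sketch rather than a recommendation, and the hostnames are placeholders:

```nginx
# ip_hash approximates a persistent balancer: each client IP
# always lands on the same backend. Hostnames are assumptions.
upstream php_app {
    ip_hash;
    server app1.internal;
    server app2.internal;
}
```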

Sharing local data

Sharing the session data across the entire cluster definitely seems like the go-to approach, and while it does require some architectural changes in your app, it’s well worth it – there is no bottleneck, and the entire cluster is fault tolerant: one server’s demise is completely irrelevant to the rest and isn’t even noticed (by the machines, of course – the humans in charge hastily replace the hardware as soon as the fault occurs).

Now, we know session data is exposed through the $_SESSION superglobal in PHP, and that, by default, PHP reads and writes that data from and to a file on disk. If said disk sits in one server, though, the other servers obviously have no access to it. How, then, do we make session data available across several machines?

First, note that session handlers can be overridden in PHP – you can define your own class/function to handle session management (we’ll sketch one below). For more information on how that’s done, please see the docs.

Using a database

Using a custom session handler, we can make sure the session data is always stored in a database. The database should live on a separate server (or a cluster of its own!), so that the load-balanced servers from the original story serve just the business logic. While this approach often works well, under truly high traffic the database becomes not only a single point of failure (lose it, and you lose everything), it also incurs significant connection overhead, with every application server writing session data to it at all times. It becomes the new bottleneck, and could use some scaling out of its own, which is another problem when using traditional databases like MySQL, PostgreSQL and similar (covered in Part 2).
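
Here’s a minimal sketch of such a handler, assuming PHP 5.4+’s SessionHandlerInterface, MySQL via PDO, and a hypothetical sessions table with id, data and modified columns – adapt the DSN, credentials and schema to your own setup:

```php
<?php
// Sketch of a database-backed session handler (PHP 5.4+).
// The DSN, credentials, and the sessions(id, data, modified)
// table are assumptions – adapt them to your own setup.
class DbSessionHandler implements SessionHandlerInterface
{
    private $pdo;

    public function open($savePath, $sessionName)
    {
        $this->pdo = new PDO(
            'mysql:host=sessions.internal;dbname=app',
            'app_user',
            'secret'
        );
        return true;
    }

    public function close()
    {
        $this->pdo = null;
        return true;
    }

    public function read($id)
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute(array($id));
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row ? $row['data'] : '';
    }

    public function write($id, $data)
    {
        // REPLACE is MySQL-specific; it inserts or overwrites the row
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, modified) VALUES (?, ?, NOW())'
        );
        return $stmt->execute(array($id, $data));
    }

    public function destroy($id)
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE id = ?');
        return $stmt->execute(array($id));
    }

    public function gc($maxlifetime)
    {
        $stmt = $this->pdo->prepare(
            'DELETE FROM sessions WHERE modified < NOW() - INTERVAL ? SECOND'
        );
        return $stmt->execute(array((int) $maxlifetime));
    }
}

session_set_save_handler(new DbSessionHandler(), true);
session_start();
```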

Using a shared file system

You might be tempted to set up a network file system to which all servers can write their session data. Don’t. This is the absolute worst approach: it’s prone to corruption and data loss, it’s extremely slow, and it’s a single point of failure, much like the database above. Activating it is as simple as changing the session.save_path value in php.ini, but it’s highly recommended you use a different approach. If you really insist on a shared file system, you’re much better off with a solution like GlusterFS.
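
For the record, the change itself really is a one-liner – the mount path below is hypothetical:

```ini
; Works, but don't: a shared mount is slow, corruption-prone, and a
; single point of failure. The mount path here is hypothetical.
session.save_path = "/mnt/nfs-sessions"
```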

Memcached

You can use memcached to store session data in RAM. This is arguably unsafe, because memcached evicts data as space runs out and offers no persistence – someone’s login will only be remembered for as long as the memcached server is running and has room to remember it. You might be wondering – but isn’t RAM separate on each machine? How does that apply to a cluster? Memcached can virtually pool the available RAM of multiple machines into one large whole.



The more machines you have, the bigger the pool gets as you dedicate extra RAM to it. You don’t have to give a machine’s RAM to the pool, but you can, and you can give arbitrary amounts from each. So a good chunk of RAM remains on the machine for regular use, while the rest is donated to cache, helping you not only store session data cluster-wide, but also allowing you to cache other content as you see fit – as long as there’s room. Memcached is a great solution, and this approach has widespread adoption.

Usage in PHP apps is as simple as changing some php.ini values:
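
Here’s a sketch assuming the legacy memcache extension – the server addresses are placeholders. (With the newer memcached extension, the handler name is memcached and the save path is a plain host:port list.)

```ini
session.save_handler = memcache
session.save_path = "tcp://memcache1.internal:11211,tcp://memcache2.internal:11211"
```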

Redis Cluster

Redis is an in-memory NoSQL data store, much like Memcached, but it supports persistence and more complex data types than just string-based key => value pairs. It doesn’t have stable cluster support yet, though, so using it in a horizontally scaled solution is not as straightforward as one might think, but it’s getting there – an alpha version of the clustering solution is already out and can be used: http://redis.io/topics/cluster-tutorial. For a more in-depth comparison of Memcached and Redis, see this StackOverflow answer. Compared to a typical caching solution like Memcached, Redis is more like a Memcached-turned-proper-database.
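
For a single Redis node, the phpredis extension can take over session storage much like memcache does, again via php.ini – the address is a placeholder, and this handler isn’t cluster-aware:

```ini
; Requires the phpredis extension; single node only for now
session.save_handler = redis
session.save_path = "tcp://redis1.internal:6379"
```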

Other solutions

ZSCM from Zend is an alternative, but requires Zend Server on every node in the cluster.

Other NoSQL stores and caching systems would work – try solutions like Scache, Cassandra or Couchbase, all blazingly fast and reliable.

Conclusion

As you can see, horizontally scaling PHP web apps is no picnic. There are various hurdles to overcome, and the solutions are not easily interchangeable – more often than not it’s all about picking one and sticking with it for better or worse, because by the time the traffic rolls in, it’s too late to make a smooth transition to something else.

I hope this short guide helped you decide on the best approach for your company, and if you’ve got alternative solutions or suggestions, we’d love to hear about them in the comments below. In Part 2, we’ll cover database scaling.
