Scalable LAMP: Caching

New blog, new site, first post. I've been wanting to write about some of my work and thoughts on scaling LAMP for a while. But every time I sit down to do it, it turns into a TOME that never gets finished. So I'm going to try a new approach: put it up here piecemeal, with no apologies and no claim that it's "done." I doubt I'll ever really be done.

I've decided to post some thoughts about caching first.  Anyone trying to scale LAMP beyond one or two servers should know that caching is a critical component of any stack.  There are many different types of caching.

Third-party CDNs (content delivery networks) such as Akamai, Level3, and others can be extremely helpful for scaling your site and keeping performance in line. A cost-benefit analysis of data center bandwidth and server costs versus the cost of a CDN will often show good bang for your buck at scale. You may find that the CDN's fees are actually lower than your origin bandwidth costs. And because the CDN absorbs much of the traffic, you may need less hardware overall in your origin data center.
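To make the cost-benefit comparison concrete, here is a minimal sketch of the arithmetic. It assumes a simple pricing model: the CDN bills per GB for everything it delivers, your origin pays bandwidth only for the cache misses the CDN fetches from you, and you estimate the origin server count with and without a CDN yourself. Every field name and number below is a hypothetical placeholder, not a real vendor quote.

```python
from dataclasses import dataclass


@dataclass
class MonthlyInputs:
    """All values are hypothetical placeholders; substitute your own quotes."""
    delivered_gb: float          # GB delivered to end users per month
    origin_price_per_gb: float   # your data center's bandwidth price
    cdn_price_per_gb: float      # the CDN's per-GB delivery price (assumed flat)
    cdn_hit_ratio: float         # fraction of bytes served from CDN edge caches (0..1)
    servers_without_cdn: int     # origin servers you'd need with no CDN
    servers_with_cdn: int        # origin servers you'd need once the CDN absorbs traffic
    server_monthly_cost: float   # cost per server per month


def cost_without_cdn(m: MonthlyInputs) -> float:
    # Every byte leaves your own data center, and you need the full server count.
    return m.delivered_gb * m.origin_price_per_gb + m.servers_without_cdn * m.server_monthly_cost


def cost_with_cdn(m: MonthlyInputs) -> float:
    # Assumed model: the CDN bills for every byte it delivers; your origin
    # only pays bandwidth for the misses the CDN has to fetch from you.
    cdn_fees = m.delivered_gb * m.cdn_price_per_gb
    origin_fill_gb = m.delivered_gb * (1.0 - m.cdn_hit_ratio)
    origin_bandwidth = origin_fill_gb * m.origin_price_per_gb
    return cdn_fees + origin_bandwidth + m.servers_with_cdn * m.server_monthly_cost


if __name__ == "__main__":
    example = MonthlyInputs(
        delivered_gb=100_000, origin_price_per_gb=0.10, cdn_price_per_gb=0.05,
        cdn_hit_ratio=0.9, servers_without_cdn=20, servers_with_cdn=14,
        server_monthly_cost=300.0,
    )
    print(f"without CDN: {cost_without_cdn(example):,.2f}")
    print(f"with CDN:    {cost_with_cdn(example):,.2f}")
```

With those made-up inputs the script prints 16,000.00 without a CDN and 10,200.00 with one. The interesting knobs are the CDN hit ratio and how many origin servers you can actually retire; plug in your own traffic and pricing before drawing conclusions.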

Another type of cache is the reverse proxy. Putting a reverse proxy in front of your application server to serve static content can be a quick win in many cases. Some reverse proxies, like Varnish, even bill themselves as "CMS accelerators." In general, a PHP application server is not optimized to serve static content; it's optimized to serve dynamic content generated by PHP pages. The most glaring example is the use of Apache's pre-fork model instead of a worker model. Because mod_php is usually run under prefork (many PHP extensions aren't thread-safe), every request, even one for a tiny image, ties up a large Apache process with the PHP interpreter loaded, which wastes resources. Three good proxies to consider are Nginx, Varnish, and Squid. Which one you choose depends very much on how you intend to scale over time and what your overall systems architecture will be.
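The division of labor described above can be modeled in a few lines. This is a toy, in-process sketch, not how Nginx, Varnish, or Squid are actually configured: the proxy classifies requests by file extension, answers static files itself (caching them after the first load), and forwards everything else to the PHP backend. The extension list, `ToyReverseProxy`, and the `load_static` and `backend` callables are all hypothetical stand-ins.

```python
import posixpath
from typing import Callable, Dict

# Hypothetical set of extensions treated as static content.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".html"}


def is_static(path: str) -> bool:
    """Classify a request by the extension of its path (query string ignored)."""
    bare_path = path.split("?", 1)[0]
    return posixpath.splitext(bare_path)[1].lower() in STATIC_EXTENSIONS


class ToyReverseProxy:
    def __init__(self, load_static: Callable[[str], bytes], backend: Callable[[str], bytes]):
        self.load_static = load_static   # stand-in for reading a file from disk
        self.backend = backend           # stand-in for the Apache/mod_php server
        self.static_cache: Dict[str, bytes] = {}
        self.static_loads = 0
        self.backend_calls = 0

    def handle(self, path: str) -> bytes:
        if is_static(path):
            # Full path (including any ?v=... cache-buster) is the cache key.
            if path not in self.static_cache:
                self.static_cache[path] = self.load_static(path)
                self.static_loads += 1
            return self.static_cache[path]   # the PHP server never sees this request
        self.backend_calls += 1
        return self.backend(path)            # dynamic request: forward to PHP
```

For example, with `proxy = ToyReverseProxy(lambda p: b"file", lambda p: b"page")`, two requests for `/logo.png` trigger one static load and no backend calls, while `/index.php?id=7` goes to the backend. The point is that the heavyweight PHP processes only ever handle the dynamic requests.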

The next cache to discuss is the PHP opcode cache, such as APC or XCache. They both have their charms, but I lean toward APC at the moment. Put simply, a PHP opcode cache stores the compiled form of a PHP script in local server memory, so the server doesn't have to recompile the script every time a user requests it. This can be a very large payoff for many sites with very little change to the code base. I've personally seen overall page rendering times for end users improve by 30% to 90%, depending on the scripts and the site setup. Both APC and XCache also support storing variables and other data in the cache. One big benefit is that, depending on how you use them, these caches can greatly reduce traffic to the underlying network file systems. That gives you more mileage out of your NAS.
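Python compiles source to bytecode much as PHP compiles scripts to opcodes, so the idea can be sketched as a Python analogue. This is an illustration of the concept, not APC's or XCache's implementation: compiled code objects are kept in memory keyed by file path, and the file's modification time is checked so an edited script gets recompiled. (APC exposes a similar freshness check through its `apc.stat` setting.) `CompiledScriptCache` is a hypothetical name.

```python
import os
from types import CodeType
from typing import Dict, Tuple


class CompiledScriptCache:
    """Toy analogue of an opcode cache: keep compiled code in memory so
    repeat requests for the same script skip the compile step."""

    def __init__(self) -> None:
        self._cache: Dict[str, Tuple[float, CodeType]] = {}
        self.compilations = 0   # how many times we actually compiled

    def get_code(self, path: str) -> CodeType:
        mtime = os.stat(path).st_mtime          # cheap freshness check
        cached = self._cache.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]                    # hit: reuse compiled code
        with open(path, "r", encoding="utf-8") as f:
            source = f.read()                   # miss: read and compile
        code = compile(source, path, "exec")
        self.compilations += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path: str) -> dict:
        """Execute the (possibly cached) compiled script in a fresh namespace."""
        namespace: dict = {}
        exec(self.get_code(path), namespace)
        return namespace
```

Running the same script twice through one `CompiledScriptCache` compiles it once; only the `os.stat` call touches the file system on the second request. Disabling even that check, as some opcode caches allow, is what cuts NFS traffic further, at the cost of having to clear the cache manually after a deploy.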

When you need to reduce database or file system load across a larger cluster of servers, you might want to move some of the caching you would otherwise do in an opcode cache into a dedicated cache. Using memcached, a distributed in-memory cache, to cache database query results and other objects can give your site an absolutely huge scalability and performance boost, both at the DB layer and for disk access. Used properly, memcached will dramatically reduce calls to disk and the DB, increasing scalability and performance in most cases. Remember, the big difference between memcached and opcode caching is context: an opcode cache is local to a single server instance, while memcached is either local OR shared among many instances.
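Caching query results usually follows the cache-aside pattern: check the cache, fall back to the database on a miss, then store the result with an expiry. The sketch below shows that pattern against an in-process stand-in for a memcached pool. `ToyDistributedCache` is not a real memcached client; it only mimics how classic clients hash each key to exactly one server in the pool. `cached_query` and the `run_query` callable are hypothetical. The SQL and parameters are hashed into the key because memcached keys are limited to 250 bytes and can't contain whitespace.

```python
import hashlib
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class ToyDistributedCache:
    """In-process stand-in for a memcached pool (NOT a real client)."""

    def __init__(self, num_servers: int, clock: Callable[[], float] = time.monotonic):
        self.servers: List[Dict[str, Tuple[float, Any]]] = [{} for _ in range(num_servers)]
        self.clock = clock

    def _server_for(self, key: str) -> Dict[str, Tuple[float, Any]]:
        # Stable hash -> server index, so every web server agrees where a key lives.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.servers[int.from_bytes(digest[:4], "big") % len(self.servers)]

    def get(self, key: str) -> Optional[Any]:
        server = self._server_for(key)
        entry = server.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del server[key]          # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._server_for(key)[key] = (self.clock() + ttl, value)


def cached_query(cache: ToyDistributedCache, sql: str, params: tuple,
                 run_query: Callable[[str, tuple], list], ttl: float = 60.0) -> list:
    """Cache-aside: try the cache, hit the DB only on a miss, then populate."""
    key = "sql:" + hashlib.sha1(repr((sql, params)).encode("utf-8")).hexdigest()
    rows = cache.get(key)
    if rows is not None:
        return rows
    rows = run_query(sql, params)    # the expensive DB round trip
    cache.set(key, rows, ttl)
    return rows
```

Within the TTL, repeated calls with the same SQL and parameters never reach the database. One design caveat: simple modulo hashing remaps most keys when you add or remove a server, which is why many memcached clients offer consistent hashing instead.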

That wraps up this caching installment. I've really only scratched the surface here, but you get the idea. I leave you with the mantra:

cache... Cache... CACHE!  then... in some cases, cache the caches.
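One possible reading of "cache the caches" is a two-tier lookup: a small, short-lived cache in each server's own memory (the role APC's variable cache can play) sitting in front of the shared memcached pool, so hot keys don't even cost a network round trip. The sketch below assumes that reading; `DictCache` and `TwoTierCache` are hypothetical names, and `DictCache` merely stands in for a shared cache.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class DictCache:
    """Minimal stand-in for a shared cache such as a memcached pool."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.gets = 0   # counts round trips to the "shared" tier

    def get(self, key: str) -> Optional[Any]:
        self.gets += 1
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


class TwoTierCache:
    """Check this server's short-lived local cache before the shared cache."""

    def __init__(self, shared: DictCache, local_ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.shared = shared
        self.local: Dict[str, Tuple[float, Any]] = {}
        self.local_ttl = local_ttl
        self.clock = clock

    def get(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self.local.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]               # tier 1: local memory, no network
        value = self.shared.get(key)      # tier 2: shared cache
        if value is None:
            value = compute()             # origin: DB, file system, etc.
            self.shared.set(key, value)
        self.local[key] = (now + self.local_ttl, value)
        return value
```

The trade-off is staleness: after the shared entry changes, each server may keep serving its local copy for up to `local_ttl` seconds, so keep the local TTL short for data that changes often.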