More Thoughts on Building Scalable Web Applications

A poorly written web application will perform poorly no matter what framework or language it uses.  It does not matter whether it is Python, Java, PHP, ASP.NET, Erlang, or any of the frameworks built on those languages.  The skill, diligence, and knowledge of your application architects, framework implementers, systems administrators, and graphic designers all contribute to the efficiency of any web application.

Regarding physical infrastructure, an ill-designed web application will eventually perform poorly no matter how much money you throw at the underlying hardware.  Throwing RAM and disk at a problem will only take you so far.  Have you heard this one?  "RAM/CPU/disk is cheap; if the app is slow we'll just add more."  Then, a little later, you hear something like, "I added 16GB of RAM and four quad-core CPUs, so why is my web site slow?"  *pause for a chuckle*  You can put all the hardware in the world behind your application, but if it is not designed and built to scale, any real amount of success will eventually make it fail.  It is a wasteful practice as well.  The sad underlying message is that a system usually fails just when your efforts to sell and market your service are succeeding the most, which makes the failure that much worse for everyone involved.

A few definitions are in order.  These first four definitions provide a base for the other concepts, and that conceptual base can help people create more scalable web applications.  These are my definitions, so they may differ a little from what you see elsewhere.  There's plenty of elbow room, so don't panic.

Transaction - Not the DB kind exclusively.  This is simply a unit of work like a page load, a hit on a web server, a web service call, etc.

Performance - How much time it takes for a transaction or group of transactions to complete.  This is how fast it is.  Performance is not the same as scalability: the two are related, but a fast application is not necessarily a scalable one.

Capacity - The number of transactions a given system can complete in some meaningful time frame without adverse performance issues.  You scale by successfully adding capacity.

Scalability - The ability of an entire system (software, hardware, network, and all) to gain capacity as more resources are added, thereby executing more transactions without performance degradation.

I hope that after reading those definitions the relationships between them are somewhat obvious.  Every choice made along the way will adjust, for better or for worse, the performance, capacity, and scalability of your application.  For the purpose of creating web applications that won't break your heart or your bank, these are important concepts.
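To make the relationships among performance, capacity, and scalability concrete, here is a minimal Python sketch built on a few assumptions of my own: a "transaction" is any zero-argument callable, performance is summarized as 95th-percentile latency, capacity is the highest load a load test reached before that latency exceeded a chosen budget, and scalability is expressed as scaling efficiency (capacity on n nodes divided by n times single-node capacity). The function names, the 0.5 second budget, and the load-test numbers are hypothetical illustrations, not measurements.

```python
import statistics
import time
from typing import Callable, Dict, List, Optional


def measure_latency(transaction: Callable[[], object], runs: int = 100) -> List[float]:
    """Performance: run one transaction repeatedly and record each duration (seconds)."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        transaction()
        durations.append(time.perf_counter() - start)
    return durations


def p95(durations: List[float]) -> float:
    """95th-percentile latency; use at least two samples."""
    return statistics.quantiles(durations, n=100, method="inclusive")[94]


def capacity(p95_by_load: Dict[int, float], latency_budget_s: float) -> Optional[int]:
    """Capacity: highest tested load (transactions/second) reached before p95
    latency first exceeded the budget. None if even the lowest load failed."""
    best = None
    for load in sorted(p95_by_load):
        if p95_by_load[load] > latency_budget_s:
            break  # performance has degraded; higher loads don't count
        best = load
    return best


def scaling_efficiency(capacity_one: float, capacity_n: float, n: int) -> float:
    """Scalability: share of ideal linear growth achieved going from 1 to n
    identical nodes. 1.0 means n nodes handle n times the load of one."""
    if n < 1 or capacity_one <= 0:
        raise ValueError("need n >= 1 and a positive single-node capacity")
    return capacity_n / (n * capacity_one)


if __name__ == "__main__":
    # Hypothetical transaction standing in for a page load or service call.
    samples = measure_latency(lambda: time.sleep(0.001), runs=50)
    print(f"p95 latency: {p95(samples) * 1000:.2f} ms")

    # Hypothetical load-test results: transactions/second -> p95 latency (s).
    one_node = {50: 0.12, 100: 0.15, 200: 0.31, 400: 1.90}
    four_nodes = {200: 0.13, 400: 0.18, 640: 0.42, 800: 1.20}
    c1 = capacity(one_node, latency_budget_s=0.5)    # 200
    c4 = capacity(four_nodes, latency_budget_s=0.5)  # 640
    print(f"efficiency: {scaling_efficiency(c1, c4, 4):.0%}")  # 80%
```

With these made-up numbers, four nodes deliver 640 transactions per second instead of the ideal 800, which is 80% efficiency. The gap between actual and ideal growth is typically where shared resources and early design choices start to show.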

For a web application to scale, the programmers, designers, and administrators must pay very careful attention to how the application is designed, how the code is written, how graphics files are created, how the network and server infrastructure is put together, which resources are shared, and how the database is designed and implemented.  Finally, when things break, and they will break, don't point fingers.  Get your team together and inspire teamwork to get through the problems.  Scaling web applications is neither easy nor an exact science.  Web applications run in a complex ecosystem of servers, routers, switches, data centers, programming languages, operating systems, and egos.  If you are trying to design a scalable web application, then you should design in particular for portability, cacheability, and partitionability.  That leads me to my next three definitions.

Portability - The ability of your application to function correctly wherever it is deployed, without hard-coded assumptions about its environment.  For example, can I just move it to another server or into a different directory path easily?  Can I move it from a Linux box to a Solaris box to a Windows box and back again if I want?  I should be able to do those things.

Cacheability - The ability of your application to properly leverage a variety of caching techniques to improve performance, capacity, and scalability.  Proper caching is critical to web application scalability.

Partitionability - The ability of your application or its data to be split (sharded, partitioned) on some key, such as users, row numbers, or tables, across multiple application and database instances.  Very few applications are capable of this as designed, because very few people seem to understand the impact their early design decisions have on it.
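Each of these three properties can be illustrated with a small amount of code. The Python sketch below is hypothetical throughout: the `APP_DATA_DIR` environment variable, the shard host names, and `render_product_page` are invented for illustration. It resolves paths relative to the code rather than hard-coding them (portability), caches expensive results for a fixed time-to-live (cacheability), and maps each user to one database instance with a stable hash (partitionability).

```python
import os
import time
import zlib
from functools import wraps
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple

# --- Portability: resolve paths relative to the code instead of hard-coding them.
BASE_DIR = Path(__file__).resolve().parent


def data_dir() -> Path:
    """APP_DATA_DIR is a hypothetical per-server override; otherwise use ./data
    next to the code. pathlib handles separators on Linux, Solaris, or Windows."""
    override = os.environ.get("APP_DATA_DIR")
    return Path(override) if override else BASE_DIR / "data"


# --- Cacheability: remember expensive results for a fixed time-to-live.
def ttl_cache(seconds: float) -> Callable:
    """Cache results per positional-argument tuple for `seconds` (this process only)."""
    def decorator(func: Callable) -> Callable:
        store: Dict[Tuple, Tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # still fresh: skip the expensive work
            value = func(*args)
            store[args] = (now, value)
            return value

        wrapper.cache_clear = store.clear  # explicit invalidation hook
        return wrapper
    return decorator


@ttl_cache(seconds=30)
def render_product_page(product_id: int) -> str:
    # Hypothetical stand-in for database queries plus templating.
    return f"<html>product {product_id}</html>"


# --- Partitionability: route each user to one of several database instances.
SHARDS: List[str] = ["db-users-0", "db-users-1", "db-users-2", "db-users-3"]  # hypothetical


def shard_for_user(user_id: str, shards: List[str] = SHARDS) -> str:
    """zlib.crc32 is stable across processes and machines, unlike Python's
    built-in hash() for strings, which is randomized per interpreter run."""
    return shards[zlib.crc32(user_id.encode("utf-8")) % len(shards)]
```

A few caveats about these design choices. The cache lives inside one process and never evicts expired entries on its own, so a real deployment would usually add a size bound or use a shared cache such as memcached. Simple modulo hashing moves most users to a different shard when the shard count changes; consistent hashing or a lookup table of user-to-shard assignments avoids that, at the cost of more machinery.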

Although it's often the first question asked, "What programming language should I use?" is of relatively little importance in the scalability arena.  Choosing your application framework and programming language is only part of the scalability puzzle, so just pick the language and framework that meet your needs and for which you can effectively marshal the resources.  Be warned that arguments of the "My XYZ language and framework is more scalable than your ZXY framework and language" variety are entertaining in a juvenile way but mostly pointless.

The areas I see causing the most scalability issues day after day are mostly consistent: shared resource contention, silly or missing caching implementations, bloated, poorly implemented, or improperly used ORM (object-relational mapping) layers, hand-written SQL queries that defy all logic and common sense, horrifying database schemas, bloated DB tables, poor DB indexing, bad archiving, and a general disregard for the fact that computing resources in any given system are finite and should be respected.

Other resources for further reading:
http://www.highscalability.com
http://www.royans.net/arch/2007/09/22/what-is-scalability/
Release It! by Michael Nygard
Building Scalable Web Sites: Building, scaling, and optimizing the next generation of web applications by Cal Henderson
Scalable Internet Architectures (Developer's Library) by Theo Schlossnagle

--
Kent Langley has been involved in building web applications since 1997 and is a Director at SolutionSet, LLC.  You can call him there if you need a website.  He also writes his own blog at http://www.productionscale.com