Getting Rid of the Relational Database

In this article, which is written in the style I hope to follow for many of my posts here at ProductionScale, I will discuss the topic of getting rid of the relational database. Afterwards, I will follow up with a brief analysis, or executive summary if you will, of what this might mean for businesses. The summary is written to be accessible to the various business managers out there who just need to know what it all means. Where appropriate, I have also tried to include references to source materials in an accessible way.

There seems to be a growing view in scalability circles that the relational database model is the proverbial ball and chain in the relationship between scalable applications and the underlying infrastructure. The quest for seamless, linear growth in technology applications is being hindered by the “elephant database.”1

What would Amazon do? In a recent talk2 at QCon London, Werner Vogels, the CTO of Amazon.com, clearly stated that the relational database model is essentially outdated as the primary data storage medium for modern applications. In other words, it is simply too slow and cumbersome.

Additionally, Mr. Vogels makes the critical point that in many, many cases relational databases are simply not necessary: simple key/value pairs (hashes) are all you need.
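To make the key/value idea concrete, here is a minimal sketch of storing a record as one serialized value under a key rather than as rows in a table. It assumes a plain Python dict as a stand-in for any key/value store (memcached or similar); the key scheme "user:<id>" and the helper names put_user and get_user are hypothetical, chosen only for illustration.

```python
import json
from typing import Dict, Optional

# A plain dict stands in for any key/value store (memcached or similar);
# this is an assumption for illustration, not a real product's API.
kv_store: Dict[str, str] = {}


def put_user(user_id: int, user: dict) -> None:
    # Hypothetical key scheme "user:<id>": the whole record is serialized
    # into a single value instead of being spread across table rows.
    kv_store[f"user:{user_id}"] = json.dumps(user)


def get_user(user_id: int) -> Optional[dict]:
    # One lookup by key; no query planner, no joins.
    raw = kv_store.get(f"user:{user_id}")
    return json.loads(raw) if raw is not None else None


put_user(42, {"name": "Alice", "email": "alice@example.com", "roles": ["admin"]})
print(get_user(42)["roles"])  # ['admin']
print(get_user(7))            # None
```

The trade-off is that any access pattern other than "fetch by key" has to be planned into the key scheme or handled in application code.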

Recently, a developer I work with, once given memcached to play with, began storing much more in the cache than I had originally intended. I found out when he was dismayed that memcached didn’t like eating anything over 2MB, and my first reaction was to ask why he was putting “big” files there at all. The answer to that question doesn’t matter in this context. What does matter is the opposite question: why not? So I thought, well, if you can just put everything in memcached as hashes, then the DB is just a backup of the cache state in case you have to restart the thing. Interesting. Who needs a DB anyway? But, say you need to run more complex queries.
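The "cache as primary store, database as backup" idea can be sketched as a write-through store whose reads never touch the database. This is a minimal sketch under stated assumptions: a Python dict stands in for memcached (a real setup would use a memcached client library), SQLite stands in for the backing database, and the class name CacheFirstStore and its methods are hypothetical.

```python
import sqlite3
from typing import Dict, Optional


class CacheFirstStore:
    """Sketch: an in-memory hash is the primary store and SQLite is only a
    durable backup used to rebuild the hash after a restart. The class and
    method names are hypothetical, chosen for illustration."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.cache: Dict[str, str] = {}  # stands in for memcached
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
        )

    def set(self, key: str, value: str) -> None:
        # Write-through: update the hash first, then persist a backup copy.
        self.cache[key] = value
        self.db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        self.db.commit()

    def get(self, key: str) -> Optional[str]:
        # Reads are served from the hash only; the database is never queried.
        return self.cache.get(key)

    def restart(self) -> None:
        # Simulate losing the in-memory state, then warm it from the backup.
        self.cache.clear()
        self.cache.update(self.db.execute("SELECT k, v FROM kv"))


cache_store = CacheFirstStore()
cache_store.set("session:abc", "user=42")
cache_store.restart()
print(cache_store.get("session:abc"))  # user=42
```

Note that in this design every write still pays the database's latency; only reads are freed from it. The database is reduced to exactly the "cache state backup" role described above.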

A recent architecture article by Todd Hoff on the website High Scalability3 touches on exactly this. It says, “Move cpu-intensive work out of the database layer to the applications layer: referential integrity, joins, sorting done in the application layer! Reasoning: app servers are cheap, databases are the bottleneck.” eBay chose to move traditional relational database work right up into the application layer. How interesting!
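To illustrate what "joins, sorting done in the application layer" can look like, here is a minimal sketch of a hash join plus sort performed in application code instead of in SQL. The rows, field names, and the function join_orders_with_users are hypothetical; they stand in for the results of two simple, un-joined lookups. This is not eBay's actual code, just one common way to do an in-application join.

```python
from operator import itemgetter

# Hypothetical rows, as they might come back from two separate single-table
# or key/value lookups with no join performed by the database.
users = [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
]
orders = [
    {"order_id": 10, "user_id": 2, "total": 30.0},
    {"order_id": 11, "user_id": 1, "total": 12.5},
    {"order_id": 12, "user_id": 2, "total": 7.25},
    {"order_id": 13, "user_id": 3, "total": 99.0},  # dangling reference
]


def join_orders_with_users(orders, users):
    # Hash join in the application: index one side by the join key.
    by_id = {u["user_id"]: u for u in users}
    joined = []
    for o in orders:
        user = by_id.get(o["user_id"])
        if user is None:
            # Referential integrity is now the application's job; this sketch
            # skips orphaned orders (a real system might reject the write).
            continue
        joined.append({**o, "name": user["name"]})
    # Sorting also happens here instead of in an ORDER BY clause.
    return sorted(joined, key=itemgetter("total"), reverse=True)


for row in join_orders_with_users(orders, users):
    print(row["order_id"], row["name"], row["total"])
```

The work has not disappeared; it has moved to app servers, which, per the quoted reasoning, are cheaper to add than database capacity.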

In a conversation between Margo Seltzer and Michael Stonebraker, we begin to get an idea of why the relational database model is overly cumbersome. It boils down to a single word: latency. Using bond arbitrage as an example, Stonebraker notes quite earnestly that it is a “latency arms race.” The arbitrager with the least latency in their system wins. What do they win? Money! So, the stakes are high. Stonebraker goes on to explain what I think is the most important point: what matters is not the latency of any individual component but the latency of the entire architecture, end to end. Seltzer picks up on this when she asks, “So, it’s not the latency of the instruction execution; it’s the latency of the architecture?”
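The point about end-to-end latency can be shown with a simple latency budget: the latency a user sees is the sum of every hop on the request path. The per-hop numbers below are invented purely for illustration and are not measurements of any real system; the hop names and the assumed 5-microsecond in-process lookup are likewise hypothetical.

```python
from typing import Dict

# Hypothetical latency budget (microseconds) for one request. The numbers
# are invented for illustration and are not measurements of any system.
BASELINE: Dict[str, int] = {
    "network in": 150,
    "app server": 80,
    "app -> DB round trip": 400,
    "DB query execution": 120,
    "network out": 150,
}


def end_to_end(path: Dict[str, int]) -> int:
    # The user experiences the sum of every hop, not any single one.
    return sum(path.values())


# Option A: make the query itself 10x faster, keeping the architecture.
faster_query = {**BASELINE, "DB query execution": 12}

# Option B: change the architecture so the lookup is an in-process hash
# (assumed 5 us), removing the database round trip and query entirely.
in_process = {
    k: v
    for k, v in BASELINE.items()
    if k not in ("app -> DB round trip", "DB query execution")
}
in_process["in-process hash lookup"] = 5

for name, path in [("baseline", BASELINE), ("faster query", faster_query), ("in-process", in_process)]:
    print(f"{name:>12}: {end_to_end(path)} us")
```

Under these made-up numbers, the 10x faster query saves 108 of 900 microseconds, while removing the hop altogether saves 515: optimizing one component matters far less than the shape of the architecture.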

So, is this conclusive evidence of the impending death of the relational database? Of course not. But it is trend spotting, in that people are again noticing that there are other ways, and that those other ways just might be quite a bit faster for modern applications.

So, to paraphrase Varnish4 software architect Poul-Henning Kamp, let’s stop doing things like it’s 1975 and get with the program.

What does this mean for Business?

This means you should be paying attention to your code quality and optimization. You should break out of a one-size-fits-all way of thinking when it comes to databases, data storage, and scalable systems. Vertical scaling by throwing hardware at the problem is no longer sufficient for modern web-scale applications. Current database and application technology has built-in limitations, rooted in clear and proven underlying mechanisms, that keep it from scaling much further this way. This is not only about money. It’s about finesse and the application of core scalability design theory from the forefront of technology. In summary, if you intend to run modern applications in truly scalable ways, you must break out of the mold we’ve been in for 30+ years and think about new ways to design and build your applications. This article and its supporting sources are a good place to start.

Addendum (8/12/2007)

I just found this on a new site launched by GigaOM. It's a little more along the same lines. I haven't read it in depth yet, but I wanted to post it.
http://future.gigaom.com/2007/08/10/data-20-how-the-web-disrupts-our-relational-database-world/


  1. A Conversation with Margo Seltzer and Michael Stonebraker. Source URL - http://delivery.acm.org/10.1145/1260000/1255430/p16-stanik.htm?key1=1255430&key2=3880943811&coll=&dl=ACM&CFID=15151515&CFTOKEN=6184618
  2. Werner Vogels: Scalability and Consistency. Source URL - http://www.infoq.com/presentations/availability-consistency
  3. eBay’s Architecture. Source URL - http://highscalability.com/ebay-architecture
  4. Varnish Project. Source URL - http://varnish.projects.linpro.no/