When is Big Data Actually Big?

There is a quandary for anyone trying to wrap their head around what “Big Data” means: when is big data really big? I had a good conversation today with a friend of mine, @ckenton, about some impactful things he and his company have done for clients over the last couple of years through careful and meaningful analytics of what would mostly be considered social media data. I was struck by the fact that they achieved significant impact using what, measured by volume, was actually not all that much data: perhaps a few gigabytes in aggregate in each case we discussed.

On another front, I have two active projects for two very different clients right now, where my teams and I have architected and built systems from scratch that crunch from tens of thousands to millions of pieces of data per day in near real time. We’ve created code, frameworks, and modules, and used off-the-shelf kit whenever we could. There have been moments of bliss and moments of solid wall-to-forehead-pounding frustration. We’re using tools like MongoDB, Riak, node.js, PHP, Redis, Scala, Java, Akka, AWS, Capistrano, Jenkins, Chef, Git and more. We’re using a flexible agile workflow model, with business agreements and contracts that match. We are doing all this to analyze data. With all of this we are crunching what some would call big data, and it’s definitely growing very, very quickly by volume. But it is not big data because of the ever-growing volume. It’s big data because impactful meaning is extracted from it, and the end users of these insights from otherwise chaotic-looking data streams can quickly make impactful business decisions for their contextual needs.

In summary, I’ve come to think that Big Data is Big when the insights derived from it are truly meaningful and potentially significantly impactful. It doesn’t matter if it’s a few gigabytes or a few petabytes of data. The technical challenges will vary, of course, depending on data volume, but what really matters is what you learn from the data you have and then, most importantly, what you do with that newfound knowledge once you have it in your grasp.