Hadoop - New Blog at developer.yahoo.com

Broadly speaking, Hadoop is two key things: one, a MapReduce implementation; two, a distributed file system called HDFS.  A new blog about it just launched, and I'll be keeping an eye on it for sure.


Programming against a MapReduce implementation is definitely something that has helped Google over time.  Now you can learn it too.
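
If you haven't seen the MapReduce model before, here is a rough sketch of the idea as a word count in plain Python. This is not Hadoop's API (Hadoop jobs are normally written in Java and run across a cluster); the function names map_phase, shuffle, reduce_phase, and word_count are made up for illustration, and everything runs in a single process.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, the step a framework does for you."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

The point of the split is that each map call and each reduce call only sees its own slice of the data, which is what lets a real framework spread them over many machines.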

A few points I pulled out:

http://research.yahoo.com/node/1849 - ZooKeeper Project

"In a distributed computing environment, different parts of an application run simultaneously across thousands of machines. Without someone or something in charge to manage all these machines, utter confusion can ensue.

A small group of Yahoo! Research scientists led by Benjamin Reed is trying to bring order to this chaos with a new service they developed called, appropriately enough, Zookeeper."

http://research.yahoo.com/node/1849 - Yahoo Pig

"creating infrastructure to support ad-hoc analysis of very large data sets. Parallel processing is the name of the game."

Those are a few things I'll be keeping an eye on over time.  Distributed computing is an exciting area of research, and much of what's being tested and developed now tends to make its way into everyday computing just a few years down the road.  They mention in the blog's video introduction that Hadoop could be the Apache of distributed computing in a few years.  Only time will tell, but they are on a very interesting course either way.

For my part, right now I'm mostly interested in playing around with HDFS, because a scalable, reliable, flexible, and fast file system is at the core of so many other useful things.