I attended the GraphLab Workshop 2013 on Monday this week in San Francisco. It was a good set of talks. I went to get another solid point of view on the state of the art in graph databases and analytics in the context of data mining and machine learning. A few things really stood out, and I thought I'd outline them here.
One of the standouts for me was the work that has been done with GraphChi as a derivative of GraphLab. I think this looks to be a very important tool for ad hoc analysis, for the development workflow, and for the general push forward in knowledge about graph analysis and databases. Historically, it has not been easy to get access to the types of systems needed to learn and test graph analytics. GraphChi really makes a dent by letting you work on graphs of substantial size in reasonable time frames, using very little hardware and with extremely low operational complexity. GraphLab itself looks to be planning some good work in this ease of use / getting started area as well. A funny question from the audience was, "Did you (GraphChi) single-handedly kill big data?" Well, of course not, but thinking it through does illustrate that big data isn't all about big infrastructure. From my point of view, big data is about big insight in the most effective manner possible!
I found myself a little surprised at the near-complete lack of mention (with one or two small exceptions) of other frameworks. This was a GraphLab event, so I guess that should not be that surprising. But given that much of my prior exploration in this area was with Spark/Shark, I was hoping for a little more comparative analysis.
A theme that came up again and again was the underlying technicalities of doing large-scale graph processing: at scale (in terms of graph size), it's very easy for communication to become the bottleneck. This makes sense, of course. However, what struck me in one of the talks in particular is just how big a graph you can store on a single machine today. For example, one slide from a Twitter speaker suggested (by my inference) that you can store 40 billion edges on a single commodity server with 288 GB of memory. In other words, you can do some very sophisticated things with relatively little hardware. So this might be one of those use cases where you need a good reason to go out and marshal hundreds or thousands of servers when you might just need one or two.
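To get a feel for what that slide implies, here is a quick back-of-the-envelope sketch. The only numbers taken from the talk are the 40 billion edges and 288 GB; the storage layouts and their per-edge sizes are my own illustrative assumptions, not anything the speaker described, and I'm treating GB as decimal gigabytes.

```python
# Back-of-the-envelope: how many bytes per edge does 288 GB allow for 40B edges?
# The layouts below are illustrative assumptions, not the speaker's actual design.

GB = 10**9  # decimal gigabytes (assumption; the slide may have meant GiB)

EDGES = 40 * 10**9   # edge count from the slide
MEMORY = 288 * GB    # server memory from the slide


def bytes_per_edge(memory_bytes, edge_count):
    """Average memory budget available for each edge."""
    return memory_bytes / edge_count


def edge_storage_gb(edge_count, bytes_each):
    """Memory in decimal GB needed to hold every edge at a fixed per-edge cost."""
    return edge_count * bytes_each / GB


# Hypothetical layouts and their per-edge cost in bytes (vertex offsets ignored).
LAYOUTS = {
    "edge list, two 64-bit ids": 16,
    "edge list, two 32-bit ids": 8,
    "adjacency array, one 32-bit neighbor id": 4,
}

if __name__ == "__main__":
    print(f"budget: {bytes_per_edge(MEMORY, EDGES):.1f} bytes per edge")
    for name, size in LAYOUTS.items():
        needed = edge_storage_gb(EDGES, size)
        verdict = "fits" if needed <= MEMORY / GB else "does not fit"
        print(f"{name}: {needed:.0f} GB ({verdict})")
```

The budget works out to roughly 7 bytes per edge, so a naive list of source/destination pairs would not fit, while a compact adjacency-style layout plausibly would. Note that 32-bit ids cap you at about 4.3 billion vertices, which is another assumption baked into this sketch.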
The "IceCube" project is unbelievable. Wow! How did I never hear about this? This is a neutrino detector for stuff that passes through the earth. Meaning, they have optical sensors turned toward the interior of the earth that collect and generate a LOT of data. Graph analysis is then used onsite (a very remote site) to separate the good signals from the bad, and only what really seems to matter gets sent off to the lab over the satellite link.
I was also struck by how much graph processing could potentially benefit from the tools and techniques embodied in technologies like Scala, Actor Model implementations like Akka, and features like Futures. Blocking is bad in graph analysis, and it just seems, although I need more data to back it up, that things like Actors and Futures could be very useful in this context.
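As a rough illustration of the non-blocking idea, here is a minimal sketch using Python's standard `concurrent.futures` as a stand-in for Scala Futures. Everything in it is hypothetical: the tiny `GRAPH`, the `fetch_neighbors` helper, and the simulated latency are made up for the example and have nothing to do with GraphLab or Akka.

```python
# Sketch: fan out per-vertex lookups as futures instead of waiting on each in turn.
# GRAPH and fetch_neighbors are hypothetical stand-ins for a remote graph store.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}


def fetch_neighbors(vertex):
    """Pretend to fetch a vertex's neighbors over the network."""
    time.sleep(0.01)  # simulated communication latency
    return GRAPH[vertex]


def out_degrees(vertices, max_workers=4):
    """Compute each vertex's out-degree, overlapping the slow lookups."""
    degrees = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every lookup up front; none of these calls block.
        pending = {pool.submit(fetch_neighbors, v): v for v in vertices}
        # Handle results in whatever order they finish.
        for future in as_completed(pending):
            degrees[pending[future]] = len(future.result())
    return degrees


if __name__ == "__main__":
    print(out_degrees(list(GRAPH)))
```

The point is only that the lookups overlap rather than run one after another, so time spent waiting on communication is shared across vertices. Whether Actors or Futures actually pay off in a real graph engine is exactly the part I still need data on.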
Lexis-Nexis was doing some very interesting work analyzing data from mobile devices in airports and other areas in concert with technology from companies like Cisco to provide indoor geolocation and help them analyze airport traffic flow patterns over time. This made for some lovely graphs.
I was introduced to BrickStream as well at this talk. In short, imagine putting a Kinect-like sensor in your warehouse or store thereby essentially giving it eyes. The internet of things is very alive and we'll definitely be needing powerful data analytics technologies to make any sense out of it at all. Graph processing seems to be much at the heart of all of this effort.
Demographics. There were 553 people in attendance and a large percentage were women. This is great! You read a lot about the lack of women in technology but they certainly were in attendance at this event in force.
That's about it off the top of my head. If any of this is interesting to you and you'd like to chat about it, reach out to me on Twitter @kentlangley. For my part, I have some very interesting applications I'm being asked to build related to all of this, and I'm looking forward to building even more awesome software that is faster, bigger, and smarter than ever before!