A few weeks ago I was in Lima, Peru on business. One of the things I had the great fortune to do was share Google Glass with a LOT of people. The video below is an interview for TEC channel 4 that I did that aired nationally on 10/13/13. The segment I'm in starts around 11:04.
Building an Application upon Riak - Part 1
- Making the Decision
- Learning
- Operating
- Scaling
- Mistakes
In any event, I'll not bore you with all the details, but we chose Riak. We originally chose it because we felt it would be easy to manage as our data volume grew and because published benchmarks looked very promising. We also wanted something based on the Dynamo model with adjustable CAP properties per "bucket", and it fit our speed requirements, our "schema", our data volume capacity plan, our data model, and a few other things.
The primary programming language for our project is Scala. There is currently no reasonable Scala client for Riak that is kept up to date, so we use the Java client.
We are running our application (a rather interesting business analytics platform if I do say so myself) on AWS using Ubuntu images.
We do all of our configuration management, EC2 instance management, monitoring harnesses, maintenance, and much more with Opscode Chef. But that's a whole other story.
We are currently running Riak 1.0.1 and will get to 1.0.2 soon. We started on 0.12.0 I think it was... maybe 0.13.0. I’ll have to go back and check.
On to some of the learning (and mistakes)
Up and Running - Getting started with Riak is very easy, very affordable, and covered well in the documentation. Honestly, it couldn't be much easier. But then... things get a bit more interesting.
REST ye not - Riak allows you to use a REST API over HTTP to interact with the data store. This is really nice for getting started, but it's really slow for actually building your applications. This was one of the first easy buttons we decommissioned. We had to move to the protocol buffers interface for everything. In hindsight this makes sense, but we really did originally expect to get more out of the REST interface. It was simply not usable in our case.
Balancing the Load - Riak doesn't do much for you when it comes to load balancing your various types of requests. We settled, courtesy of our crafty operations team, on running haproxy on each application node to shuttle requests to and from the various Riak nodes. Let me warn you: this has worked for us, but there be demons here! The configuration details of running haproxy in front of Riak are about as clear as mud, and there isn't much help to be found at the moment. This was one of those moments when I really wished the client were a bit smarter.
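For a rough illustration, a minimal haproxy stanza along these lines might look like the following (node names and addresses are hypothetical; this is a sketch, not our actual config):

```
# Hypothetical haproxy stanza fronting Riak's protocol buffers port (8087).
# Node names and IPs are illustrative only.
listen riak_pb 127.0.0.1:8087
    mode tcp
    balance leastconn
    server riak1 10.0.1.1:8087 check
    server riak2 10.0.1.2:8087 check
    server riak3 10.0.1.3:8087 check
```

The application then talks to haproxy on localhost and lets it spread protocol buffers connections across the cluster.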
Now, when nodes start dying, getting too busy, or whatever else might come up, you'll be relying on your proxy (haproxy or otherwise) to handle it for you. We don't consider ourselves done at all on this point, but we'll get there.
Link Walking (err.. Ambling) - We modeled much of our early data relationships using link walking. The learning? S-L-O-W. We had to remove it completely. Play with it, but don't plan on using this in production out of the gate. I think there is much potential here, and we'll be returning to this feature for some less latency-sensitive work, perhaps. Time will tell...
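What we moved to instead, roughly, was keeping related keys inline on the parent object and fetching them directly. Here is a minimal sketch of that pattern, with a plain Python dict standing in for a Riak bucket (the key scheme and record shapes are illustrative, not our actual data model):

```python
import json

# In-memory stand-in for a Riak bucket (key -> bytes). In real code these
# would be fetch/store calls through the Riak client.
store = {}

def put(key, obj):
    store[key] = json.dumps(obj).encode()

def get(key):
    return json.loads(store[key])

# Instead of Riak links from an order to its items, keep the child keys
# inline on the parent record.
put("item:1", {"sku": "A", "qty": 2})
put("item:2", {"sku": "B", "qty": 1})
put("order:42", {"user": "u7", "item_keys": ["item:1", "item:2"]})

def order_items(order_key):
    order = get(order_key)
    # N direct key lookups instead of a server-side link walk
    return [get(k) for k in order["item_keys"]]
```

Direct key fetches are the operation a KV store is fastest at, so trading a link walk for a handful of gets was a clear win for us.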
Watchoo Lookin' for?! Riak Search - When we started, search was a separate project. But we knew we would have a use for search in our application, so we did everything we could to plan ahead for that fact. By the time we were really getting all hot and heavy (post-1.0.0 deployment), we were finding out a few very interesting things about search. It's VERY slow when you have a large result set; it's just the nature of the way it's implemented. If you think your search will return > 2000 items, then think long and hard about using Riak's search functions for your primary search. This is, again, one of those things we've pulled back on quite a bit. But the most important bits of learning were to:
- Keep result sets small
- Use inline fields (this helped us a lot)
- Realize that searches run on ONE physical node and one vnode and WILL block (we didn't really feel this until data really started growing from hundreds of thousands of "facets" to millions)
OMG, it's broken, what's wrong? - The error codes in the early versions of Riak we used were useless to us, and because we did not start with an enterprise support contract it was sometimes difficult to get help. Thankfully, this has improved a lot over time.
Mailing List / IRC dosey-do - Dust off your IRC client and sub to the mailing list. They are great and the Basho Team takes responding there very seriously. We got help countless times this way. Thanks team Basho!
I/O - It's not easy to run Riak on AWS. It loves I/O. To be fair, Basho says this loud and clear, so that's my problem. We originally tried a fancy EBS setup to speed it up and make it persistent. In the end we ditched all that and went ephemeral. It was dramatically more stable for us overall.
Search Indexes (aka Pain) - Want to re-index? Dump your data and reload. Ouch. Enough said. We are working around this in a variety of ways but I have to believe this will change.
Basho Enterprise Support - Awesome. These guys know their shit. Once you become an enterprise customer they work very hard to help you. For a real world production application you want Enterprise support via the licensing model. Thanks again Basho!
The learning curve - It is a significant change for people to start thinking in terms of eventually consistent, distributed key-value stores and distributed, asynchronous applications. Having Riak under the hood means you NEED to think this way. It requires a shifted mindset that, frankly, not a lot of people have today. Build this fact into your dev cycle time or prepare to spend a lot of late nights.
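To make the mindset shift concrete, here is a tiny sketch (plain Python with illustrative data, not our actual code) of the kind of sibling merging eventual consistency forces on you when a bucket allows multiple values: two clients write concurrently, the store keeps both versions, and your application has to merge them.

```python
# With allow_mult=true, concurrent writes can produce "siblings" and the
# application must resolve them. For a set-like value, a union is a simple
# merge that never loses either writer's data.
def resolve_siblings(siblings):
    merged = set()
    for sibling in siblings:
        merged |= set(sibling)
    return sorted(merged)

# Two clients concurrently updated the same follower list:
sibling_a = ["alice", "bob"]
sibling_b = ["alice", "carol"]
resolved = resolve_siblings([sibling_a, sibling_b])
```

The point isn't this particular merge function; it's that "who wins?" becomes an application-level question, and your data model has to be designed so the question has a sane answer.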
Epiphany - One of the developers at work recently had an epiphany (or maybe we all had a group epiphany). Riak is a distributed key value data store. It is a VERY good one. It's not a search engine. It's not a relational database. It's not a graph database. Etc., etc. Let me repeat: Riak is an EXCELLENT distributed key value data store. Use it as such. Since we all had this revelation and adjusted things to take advantage of the fact, life has been increasingly nice day by day. Performance is up. Throughput is up. Things are scaling as expected.
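As a toy illustration of what "use it as a KV store" means in practice (the key scheme and counter example here are hypothetical), you precompute the answers you need at write time and store them under predictable keys, rather than searching or querying at read time:

```python
# Denormalize at write time: store materialized answers under predictable
# composite keys so reads are single key lookups. A dict stands in for the
# bucket; real Riak would need a read-modify-write for the counter.
store = {}

def record_pageview(day, page):
    key = f"views:{day}:{page}"
    store[key] = store.get(key, 0) + 1

def views(day, page):
    # One direct key lookup; no search, no scan
    return store.get(f"views:{day}:{page}", 0)

record_pageview("2011-11-01", "/home")
record_pageview("2011-11-01", "/home")
```

You trade write-time work and some storage for reads that stay fast no matter how big the data gets, which is exactly the trade a KV store rewards.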
In Summary - Reading back through this, I felt it came off a bit negative. That's not really fair, though; we're talking about nearly a year of learning. I love Riak overall and I would definitely use it again. It's not easy, and you really need to make sure the context is correct (as with any database). I think team Basho is just getting started but is off to a very strong start indeed. I still believe Riak will really show its stripes as we scale the application. We have an excellent foundation upon which to build, and our application is currently humming along and growing nicely.
I could not have even come close to getting where we are right now with the app we are working on without a good team as well. You need a good devops-like team to build complex distributed web applications.
Lastly, and this is the real summary: Riak is a very good key value data store. The rest of what it can do is neat, but for now I'd recommend using it as a KV data store.
I'm pretty open to the fact that even with several months of intense development and a near-ready product under our belts, we are still only scratching the surface.
What I'll talk about next is the stack, the choices we've made for developing a distributed scala based app, and how those choices have played out.
A New Year and Fun New Challenges
First of all, I’m working primarily for a company called SolutionSet now. SolutionSet is the 4th largest independent marketing agency in the United States and has four divisions. They are data, direct, local, and digital. I work for the digital division. The areas I focus on are Business Development, Systems Engineering and Architecture, and building up a new Ruby Web Site/Application development practice. I’ll be blogging here and on the SolutionSet blog about my various endeavors in those three areas over time. For a peek into this early part of the year, the kinds of projects I’m involved in at the moment are:
- I am working with an excellent team that I have taken over at SolutionSet. Together we are creating a more scalable Systems Engineering and Architecture practice. Every day we architect, design, build, deploy, and maintain websites for our awesome clients. We have a strong focus on systems automation using tools like Chef, Vagrant, and IaaS public and private clouds, and we are developing excellent and scalable processes to get ready for 2011.
- I’m working on a very exciting social media project for a major client on a project I can’t really talk that much about yet. I can say that it has the potential to shake up the way people view and use aggregated social media data for driving core business value. This project is interesting in particular because it’s using a very cool technolog stack. This will be a Scala/Erlang application with a Riak datastore on the back-end. I’ll post more about this stack in detail upcoming blog entries.
- I’m involved as the Systems Architect and Scalability Architect for the planning, design, deploy, and on-going operations of a very large and complicated website deploying on the Terremark eCloud. This one, if you can believe it is a .NET CMS / MSSQL server project. It’s interesting to see how the closed source models compare to the open source models I’ve done so much work with over the years.
- I’m doing a Scalability “Launch Rescue” mission for an up and coming geo location related service. When they got ready to go live they found that their system couldn’t handle nearly enough load to support the current users and what their promotions and social media campaigns would bring in. So, I was brought in by a partner to help them rework things so they can launch. This is a pretty cool project and if they give me permission to speak more freely I’ll definitely have a lot more to say about it soon as well. I think I’ll be able to say more after it’s live. They have a very clever idea and I think they could do quite well.
- A little side project with a friend of mine: a rapid application development environment for developer workstations, with the proper tools and Chef-based automation for deploying a highly optimized Ruby-centric technology stack to multiple public clouds. This is some seriously cool stuff, and we successfully used the prototype to launch an extremely successful, hyper-concentrated web traffic pre-ticket-sales event campaign for the Jonas Brothers last year. The other thing that was awesome about this project was the use of BrowserMob for some sophisticated load testing and Dynamic DNS for some serious scalability and flexibility way up the stack.
That's just a little preview! All in all I'm very excited and optimistic that 2011 is going to be a great year all around! Most importantly, there is an impressive array of quality technology available for doing things at big scale with concentrated effort and resources. This is a sweet time to be in the technology space, I think!
Lastly, I do have upcoming events. I’ll start posting them here as well.
In summary, 2010 was a hell of a year by any measure and 2011 looks promising.
Next Week's Event Highlights:
Engine Yard: Cloud Out Loud Podcast Interview - I’m being interviewed by Engine Yard (and am very excited to be working closely with them as a partner). I will post/tweet a cross link when the podcast is up on the site.
Using RVM on Ubuntu 10.10
I wanted to use RVM for some testing with Ubuntu 10.10 last night. Not one set of instructions I found around the web would work. At the end of the day it was actually something rather simple: a path issue.
As the documentation clearly points out, you need to make a couple of modifications to your .bashrc script for things to work properly.
Here is what all the instructions say to paste at the end:
[[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm" # This loads RVM into a shell session.
Here is what you really have to paste at the end:
[[ -s "$HOME/.rvm/src/rvm/scripts/rvm" ]] && . "$HOME/.rvm/src/rvm/scripts/rvm"
^^^^^ ^^^^^
Notice the path differences. I have no idea why this is the case, but it is; I had the same issue on previous Ubuntu versions as well. This cost me a little bit of time, but once I made that change, along with the other documented changes, everything has been just peachy.
root@mavrvm:~# rvm --default 1.8.7
root@mavrvm:~# ruby -v
ruby 1.8.7 (2010-08-16 patchlevel 302) [x86_64-linux]
root@mavrvm:~# rvm --default 1.9.2
root@mavrvm:~# ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
root@mavrvm:~#
Thanks to Wayne for catching my error and posting the info here in the comments. The reason I was having issues with my RVM installation on Ubuntu 10.10 is that when you install AS ROOT, the path in your .bashrc is going to look like this:
[[ -s "/usr/local/rvm/scripts/rvm" ]] && . "/usr/local/rvm/scripts/rvm"
I posted the entire .bashrc contents to a github gist file if you'd like to see the whole thing. The only modifications are to fix it up for RVM.
Gluster 3.1 GA Release
Over the past couple of months I have been taking a really close look at GlusterFS for potential use on a virtualization project. Today I saw the notice that version 3.1 was released. That's good news. They call it a scale-out NAS platform, which it is, but it's also a bit more than that.
I had the chance to speak at length with Anand Babu (AB) Periasamy and a few members of his team at VMWorld recently about 3.1 prior to release and it was genuinely interesting and exciting. I've been following the Gluster project for years and it really just seems to keep getting better and better. Not only that, they seem pretty passionate about what they do which is always a good thing.
Of particular interest in 3.1 is that you are now supposed to be able to add and remove nodes without impacting the applications using the cluster at all. This is CRITICAL and was previously a major barrier to adoption; before, you actually had to restart the cluster to expand it.
One of the things that can be challenging is large scale file sharing to many, and sometimes varying numbers of, application servers in large scale web environments. I could see GlusterFS 3.1 being very useful in this scenario. One recently published example of this is the way that Acquia uses GlusterFS for scaling Drupal.
Of course, other options exist, such as Swift from OpenStack, MongoDB with GridFS, Riak (perhaps for smaller file size scenarios), and Ceph, which was just released. The file/storage space is hot right now with change and even *gasp* innovation. It's pretty exciting, and more choice over the last few years has been a very good thing.
I suspect I'll be writing more about this in the future assuming I can get some of the testing I want to do completed. As usual, my lab in my secret lair is under powered and over utilized. *sigh*