Building43 GlusterFS Post Comments and Follow Up Thoughts

On the building43 site there was a post about running GlusterFS on Rackspace Cloud Servers (or Slicehost, same thing really).  It is a good article that can help you get up and running.  

I happen to really like GlusterFS and will almost certainly be blogging more about it in the near future as I have been doing some work with it lately.

However, in a Rackspace environment it does not seem very cost-efficient to use GlusterFS for storage.  The reason is pretty clear: the cost per GB of storage is much too high in my opinion.  It is so high because you can only get storage as part of a machine, and you can only get larger amounts of storage with the larger instances (which get quite expensive per month).  This is quite different from Amazon EC2, where you can provision EBS volumes to add storage to your storage nodes, which should help drive down storage costs on average.  Anyway, here are the comments I left on the post.  I encourage you to read the original post as well as check out the GlusterFS project.

The issue w/ running GlusterFS on Rackspace is that there is no way to add more block storage to an individual node. Also, according to the price list here:

So, a 620GB node would be ~$700 /mo or $1.12 per GB. Then, of course, you need at least two or you haven’t really done anything useful. So, your price per GB will double to $2.24 / GB. That’s quite expensive.

Or, you’re limited to tiny little nodes. A 256MB instance with 20GB of storage* for about $11 / mo is $0.55 / GB, times 2 for $1.10 / GB.

* You won’t be able to use all 20GB for your volumes.
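For anyone who wants to plug in their own numbers, here is a minimal sketch of the arithmetic above. The function name and the replicas parameter are my own for illustration, and the prices are the rough monthly figures from my comments, not an official price list. It assumes every byte is stored on two nodes, which is why the per-GB price doubles.

```python
def cost_per_usable_gb(monthly_price, disk_gb, replicas=2):
    """Monthly cost per GB of usable storage when each byte lives on `replicas` nodes.

    Hypothetical helper: it ignores the disk space lost to the OS and
    anything else on the node, so real numbers will be somewhat worse.
    """
    if disk_gb <= 0 or replicas < 1:
        raise ValueError("disk_gb must be positive and replicas at least 1")
    return monthly_price * replicas / disk_gb


# Rough figures from the comments above (assumed, not an official price list).
large = cost_per_usable_gb(700, 620)  # 620GB node at ~$700 / mo
small = cost_per_usable_gb(11, 20)    # 20GB node at ~$11 / mo

print(f"large nodes: ${large:.2f} / GB / mo")
print(f"small nodes: ${small:.2f} / GB / mo")
```

With the rounded $700 figure the large-node result comes out a cent or two higher than the number I quoted, which is just rounding. Either way, the small nodes are cheaper per GB but still not cheap.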

Basically RackspaceCloud is missing an EC2 EBS-like analog for more affordable block storage.

The network bandwidth is severely limited for the smaller instances. This could be problematic. Or, is it not limited on the internal interfaces? That is unclear to me at the moment.

I love rackspacecloud and use it all the time. But I probably would not use it for GlusterFS storage of any real size. Still, the way described in the article is a nice way to do active/active on a couple of nodes alongside applications that might already be running there in a load-balanced way.