140+ v2013.05.15

The field of Deep Learning has something that just rings true. http://bit.ly/139i756 very exciting field.

Deep Learning is a truly exciting area in the field of computer science and mathematics. I was initially brought into this way of thinking when I ran into a number of Jeff Hawkins' lectures on YouTube, then purchased and read his book "On Intelligence." Since then I've been learning about and exploring various iterations of the concept of deep learning as embodied in projects at Google (such as Google Now), IBM, and many others. I think the possibilities are nearly endless with this technology.

The inevitable question of human-level AI always comes up when discussing deep learning, but, in fact, I have little desire at the moment for a robot best friend, with or without benefits. What I would like to see is human augmentation beyond what we already have today.

"Traditional IT Department No Longer Tenable" How #CloudComputing Changes Enterprise IT Economics http://bit.ly/13kmsG3 

I started blogging about Cloud Computing right here on this blog in 2007/8. My first posts looked around to see what people wanted and were trying to do, teasing out what seems obvious now: the differences between SaaS, PaaS, and IaaS. Shortly after, I started a company called nScaled to actually build and use clouds. Since then I've helped dozens of companies build private clouds, public clouds, hybrid clouds, and lots more.

This tweet caught my attention because, having done all that, my reaction on seeing the post was that what's really not tenable is IT as an island. IT is simply part of any business now. It's deeply integrated and, one way or another, it's all about the cloud no matter which type you want to build. If you don't go that way, your business will not be able to compete.

Google I/O 2013 Extended - at the Universidad de Lima :) http://fb.me/RJEuL72H

I just spent a month in Lima, Peru, living with my family and working. While there I had the great opportunity to meet many local entrepreneurs and even visit one of the coolest startup incubators in South America, Wayra Peru, which is funded by Telefonica. Things are moving fast and growing quickly in the tropical zone, and I am super-excited by what I found while I was there.

Analysts Report that Cloud-Based Adoption Increased 40 Percent this Year for Supply Chain Software http://bit.ly/13kmnlO 

This one just goes to show you how deep cloud computing is getting. These are dyed-in-the-wool, you-gotta-have-it systems tied into the heart of the value chain, and big manufacturing companies are adopting cloud computing solutions for them at dizzying rates. How cool is that? To all those cloud haters from way back in 2007: I poke you in the eye today. There is no going back now.

About 140+

140+ is my periodic effort to expound further on 3-5 of my recent tweets because sometimes, 140 characters just isn't enough.

What is to Show for Five Years of Cloud Computing?

Just a few short years ago, launching a virtual machine in the cloud was simple and basic. With a couple of API calls and maybe a button click or two, you were up and running in just a few minutes. The choices were limited, but it was nice. You could even get a little storage to go along with the instance. Just before that, we were leasing, renting, and co-locating dedicated physical hardware in data centers, and it took weeks to order, provision, deploy, and set up the gear. Fast forward to today and we are full-on in a cloud computing revolution that is redefining how technology is deployed. There are so many choices, and so many of them good, that it can be completely overwhelming to those trying to make sense of it all. On top of that, every day I meet people who have never deployed anything in “the cloud.” It’s as easy as ever to launch a machine, but there is so much more available today to the would-be client of on-demand computing as a service. I was trying to think of what is really different or better today than it was five years ago. What’s really new to show for 5+ years of cloud computing innovation and effort?

First and foremost, people (meaning people other than Google) figured out that cloud computing is good for something really important. They figured out that the cloud in its various forms is phenomenal for capturing and processing what has come to be known as big data. This is a really important point. It’s never been easy, and still isn’t, to aggregate and process voluminous, high-speed, or wildly unstructured data. In fact, before cloud computing came of age, it was downright impossible both fiscally and technically. Now it’s all there at the click of a few buttons, as pretty as you like. You can spin up a supercomputer for just a few dollars an hour to crunch even your most gnarly data sets.

A second fairly dramatic improvement is in the orchestration of resources. There are far more resources available to orchestrate, for an almost limitless number of purposes, yet orchestrating them has never been easier (note, I did not say easy). Thanks to the widespread understanding of how to create and consume APIs, you can now quite literally use a single set of tools to launch servers at several different cloud providers, in different geographic locations, and even with different operating systems if you want. If you’re clever with tools like Puppet, Chef, CloudFormation, Cloud Foundry, or others, you can do it all from the comfort of your own laptop in just a few minutes. You can quickly and, historically speaking, relatively easily compose masses of servers into useful services for nearly anything you can dream up!

A third thing that’s changed is the raw power available via a command line or cloud console, and in newer implementations of older software architectures. You can now, in just a few moments, provision a server with 244 GiB of memory and high-speed 10 Gigabit Ethernet. And that is just a building block. The real power comes from massive improvements in distributed computation, storage, and software-defined networking, which allow you to provision dozens to thousands of these machines almost on a whim. Frankly, not many people can figure out what to do with all this power even if they do know how to provision it today. This has forced software architects and engineers to push forward much faster, with zeal, and learn how to write distributed applications, and in many cases they are rising to the occasion. So, raw power in both virtualized hardware and the software that can be deployed on it has come a very long way.

In summary, cloud computing had already been brewing for decades, with roots reaching far back in time. Grids, clusters, and more were all precursors. However, it is striking how far things have come in just about five years. There has been unprecedented improvement, at what feels like an ever-increasing pace. Good times indeed.

QSNB Meetup v1.3 - Feb 26!

The Quantified Self North Bay Meetup Group http://bit.ly/11fe6vb will be meeting on February 26th, 2013 for its 3rd scheduled meetup.

If you are interested in what Quantified Self is up to, this could be an interesting meeting to attend. So, if you find yourself in or near San Rafael, CA after work on 26 Feb 2013, come and join some curious folks who are all interested in what we can learn from ourselves and each other with smart tracking and analysis of the data we generate.

QS is a large and growing community, so if you aren't near here on the 26th, be sure to check out the upcoming meetings around the world on the main site.

http://quantifiedself.com/2013/01/the-quantified-self-community/

I hope to see you on the 26th!

http://www.meetup.com/quantified-self-north-bay/events/99809072/

It's 2013! Things Break, Services Falter. Move Forward.

It's a New Year, I have the cloud, but I still have many of the same old Single Points of Failure.

It's known that a single point of failure (SPOF) is a risk. It's an Achilles' heel, so to speak. That goes for people, companies, planets, AMIs, AZs, Regions, Countries, or beers in the fridge. Whatever processes you use for your day-to-day work should be able to deal with known SPOFs and be flexible enough to adjust to newly discovered failure modes. But, and this is important, there is a substantial cost associated with eliminating certain SPOFs. Let's say you decided you would no longer accept Earth as a SPOF for your awesome blog. In that case, you need a space program and an interplanetary network, which puts this desire out of reach unless you are NASA, Elon Musk, or Richard Branson. Admittedly, that is an extreme example, but my point is that your tolerance for risk and downtime must be considered carefully for any technology where you have implicit or assumed service level agreements with your users. Let's think about Netflix for a moment.

Netflix's service was severely impacted this last Christmas Eve by an outage affecting AWS ELBs in the US East region. Based on my arm's-length knowledge of Netflix operations from what they have made public, it is my opinion that Netflix understands the cost/benefit of using AWS far better than most organizations. They say themselves in a recent post:

"Our strategy so far has been to isolate regions, so that outages in the US or Europe do not impact each other."

"Netflix is designed to handle failure of all or part of a single availability zone in a region as we run across three zones and operate with no loss of functionality on two." Source: http://techblog.netflix.com/2012/12/a-closer-look-at-christmas-eve-outage.html
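To put a rough number on the "three zones, no loss of functionality on two" design quoted above, here is a minimal Python sketch of k-of-n availability. It assumes each zone fails independently with the same availability `a`; the 99.5% figure is purely hypothetical, not an AWS or Netflix number. Independence is exactly what a regional outage breaks, which is why zone redundancy alone could not help in the Christmas Eve case.

```python
from math import comb


def k_of_n_availability(n: int, k: int, a: float) -> float:
    """Probability that at least k of n zones are up.

    Assumes each zone is up independently with probability a
    (a simplification: regional outages are correlated failures).
    """
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    a = 0.995  # hypothetical per-zone availability
    print(f"single zone:  {a:.6f}")
    print(f"2 of 3 zones: {k_of_n_availability(3, 2, a):.6f}")
```

With these made-up inputs, needing any two of three zones moves availability from 0.995 (about 44 hours of expected downtime a year) to roughly 0.999925 (under an hour), which is the kind of cost/benefit trade being described.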

Netflix clearly understands the risk and has still chosen to take it. They were completely at the mercy of AWS in this last outage, since the failure was regional in nature and their systems do not YET allow multi-regional failover within a country for a single user account or group of accounts; but they are working on it.

As an AWS customer, Netflix has a reasonable expectation that the underlying primitives it uses from AWS to compose its services will work reliably. In this case, that primitive was the Elastic Load Balancer. Just as an EC2 instance is a virtual server, an ELB is something of a virtual load balancer. In VPCs, ELBs can span AZs, but the ELB itself is then a SPOF unless your service can re-initialize an ELB dynamically when it ceases to serve its purpose and then re-route traffic accordingly. This is non-trivial, but it can likely be handled if you understand the intricacies of geo-aware, anycast-backed DNS services.
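The "re-route traffic when the load balancer stops serving" idea boils down to an ordered failover decision. Below is a minimal Python sketch under stated assumptions: the hostnames are hypothetical, and `is_healthy` stands in for whatever health check you run. Publishing the chosen endpoint through your DNS provider's API is not shown.

```python
from typing import Callable, Optional, Sequence


def choose_endpoint(endpoints: Sequence[str],
                    is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the highest-priority endpoint that passes its health check,
    or None if none do (time to page a human)."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    # Hypothetical load balancer hostnames, in priority order.
    lbs = ["lb-us-east.example.com", "lb-us-west.example.com"]
    failed = {"lb-us-east.example.com"}  # simulate the primary failing its check
    print(choose_endpoint(lbs, lambda host: host not in failed))
```

In a real deployment the health checks should run from outside the region being checked, and DNS TTLs need to be short enough for the switch to take effect quickly.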

Someone asked me if the AWS outages of 2012 would make me rethink my plans for cloud computing in 2013. They do not change my cloud plans for 2013 in any way. But, to be clear, even though I really like AWS, AWS is not the cloud and the cloud is not AWS. AWS is a big and deeply important part of the cloud ecosystem, and I'm quite thankful for all they've done to further the understanding of cloud around the world. From my point of view, they are likely to stay on top for a long while. My teams and I deployed large amounts of AWS in 2012 supporting the services of numerous clients.

I don't think these outages will cause any meaningful pause in most cloud plans for 2013. Anyone who takes the time to understand these sorts of situations, doesn't just fall prey to FUD (Fear, Uncertainty, Doubt), and is serious about moving to a cloud computing model will keep marching forward. It's not perfect, but the benefits to business and technical agility far outweigh the risks and the up-front investment in knowledge needed to make full use of cloud computing.

Things break and outages happen. There are very few systems where this is not true, and those have been designed specifically for an extreme need for continuous availability. Complex systems, and systems deployed at large scale like AWS, can break in especially interesting ways. The fact that things break is not so bad. What matters is what is done next to keep the same things from breaking again and again. AWS does a pretty good job on this front in my opinion. It performs, communicates, and adjusts far better than most hosting providers I have worked with over the last 15 years or so. They have raised the bar substantially.

Regarding AWS's IaaS services: it is AWS's job to provide a reasonable SLA and maintain it. It is up to the users of those services to provide their own users with services that have a reasonable SLA, and to maintain it. Decoupling the service from the server is at the heart of the accelerating innovation in hosting internet-connected services that began quite some time ago and now marches under the banner of cloud computing. If you use their PaaS services, it's a bit of a different situation, but that's the subject of a whole different discussion, I suspect.

Supporting information, a blast from ProductionScale's blog past, can be found in the following older posts of mine (in no particular order):

The Traits of a Modern IT Organization, 8/2008
Thoughts on the Business Case for Cloud Computing, 4/2009
Get Your Head in the Clouds, 4/2008
Why Should Businesses Bother with Cloud Computing, 3/2009

Moving at the Speed of Cloud

The majority of my work in the last three years or so has been about receiving, getting, pushing, pulling, and generally wrangling streams of data (mostly social data) for analytics, comparison, or storage across a broad range of products and services, for startups (one of them my own) and Fortune 500 companies. It's been keeping me busy. The ultimate goal of all of this is helping businesses make better-informed decisions about products, services, and more.

During this time my colleagues and I have developed the relationships, partnerships, technology stacks, and processes necessary to deliver these types of applications very quickly and at a high level of quality. All in all this has been fun, and demand for it seems to be growing quickly.

To give a sense of the technology "stack" I've mostly settled on for solving these types of problems we are using:

Languages: Scala, Java, Node.js, PHP, Ruby

Frameworks: Symfony2, Play 2.0, Express.js, Twitter Bootstrap

Data Store: MySQL, MongoDB, Riak, Redis

Infrastructure: Amazon Web Services

Orchestration: Chef, Custom Scripting, AWS CloudFormation

That's just a high-level snapshot, of course. There are a lot of details inside each of those items, from favored libraries to DB clients and configuration management frameworks.

The best part for me is that, for the first time in a long time, many businesses seem to understand and believe in the value of applying technology to business problems as a first-order task.

The drive for big data aggregation and analytics is a natural evolution of the maturation of cloud computing as both a technology and a service/process. Programming languages, application frameworks, and the general understanding of distributed service-oriented architectures and how to program REST APIs are all improving at such an incredible rate that it's just an awesome time to be creating software.

So much of what we are doing now has been "around" in one form or another for a long time. The science in computer science laid the foundations quite some time ago. It's only now that so much is becoming so accessible and the information on how to use all these tools is readily available.

I read a recent article/survey posted to Forbes.com that said the cloud is still three years away from its full impact. The first CloudCamp, where I did a session on developing for the cloud, was in 2008. That's only four years ago, and look how much has changed! Awesome.

From where I sit, this is an exciting time with nearly unlimited possibilities. Ideas are critical. Execution is just as important. If you want to talk about any of these things, I'm usually found in either San Francisco or San Rafael, so let's chat! Good times!!

Data Goes Through Phases on the Way to Insights

Over the last few years I've been primarily building medium- to large-scale custom real-time analytics platforms for clients. It's kept me pretty busy. I've built some for startups and even one for a big Fortune 50 client. This has been awesome in a variety of ways. Much of that work is finally about to see sunlight ('net light) and start hitting the wires, which makes me happy of course.

Along the way I have seen patterns emerge in these types of systems. They are patterns that may seem obvious at a glance but are anything but when you are down in the weeds dealing with the various challenges of building real-time business analytics applications. For now, I've decided there are seven phases a data object like a tweet, post, G+ post, email, or support call goes through to become meaningful and ultimately measurable. This pipeline looks something like:

Capture -> Distill -> Index -> Compute -> Display -> Interact -> Measure

The Measure phase can feed right back into Capture, so the snake can eat its own tail. Most of the applications I've architected and built with my clients and teams have ended up just like this eventually. They didn't all need each piece right out of the gate, and of course there are more items you can add to augment this list. But no matter which direction, tools, or applications we chose, they all ended up looking a bit like this pipeline as they matured.
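To make the shape of the pipeline concrete, here is a minimal Python sketch that treats each phase as a function whose output feeds the next. The stage bodies are hypothetical toys, and Index, Interact, and Measure are left out to keep it short; in a real system each stage is typically its own service connected by queues rather than an in-process function call.

```python
from functools import reduce
from typing import Any, Callable, Dict, Iterable, List

Stage = Callable[[Any], Any]


def run_pipeline(data: Any, stages: Iterable[Stage]) -> Any:
    """Feed each stage's output into the next stage, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


# Toy stand-ins for a few of the phases (hypothetical, illustration only).
def capture(raw: List[str]) -> List[str]:
    """Keep non-empty records, trimmed."""
    return [r.strip() for r in raw if r.strip()]


def distill(posts: List[str]) -> List[str]:
    """Keep only posts that mention the tracked term."""
    return [p.lower() for p in posts if "cloud" in p.lower()]


def compute(posts: List[str]) -> Dict[str, int]:
    """Derive a simple metric from the distilled set."""
    return {"matching_posts": len(posts)}


def display(metrics: Dict[str, int]) -> str:
    """Render metrics for a human."""
    return ", ".join(f"{k}={v}" for k, v in sorted(metrics.items()))


if __name__ == "__main__":
    raw = ["Loving the CLOUD ", "", "Lunch time", "cloud outage again"]
    print(run_pipeline(raw, [capture, distill, compute, display]))
    # prints: matching_posts=2
```

Closing the loop would mean feeding what Measure learns (for example, which terms matter) back into the filters that Capture and Distill use.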

Capture. This keeps getting easier and easier. Much of it can even be outsourced very successfully now using tools like DataSift or Gnip. Aggregating and storing the data with things like Node.js, MongoDB, HBase, Cassandra, Redis, and others has made this a bit rote at this point. So, it is much easier now to capture and save arbitrary data streams than ever before.

Distill. This is a combination of things like manually curated filters, NLP for categorization or sentiment, and various other "metrics" of a sort. It can be heavily automated using a variety of very useful open source tools and algorithms, services like OpenAmplify, and much more. This phase is about taking that raw mess of data and filtering it down to something more meaningful, while deriving a data set that can be used later in the indexing and compute phases.
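As a rough illustration of the Distill step, here is a minimal Python sketch that drops posts matching no tracked keyword and attaches a naive sentiment score to the rest. The word lexicon and keyword set are hypothetical; real systems would use proper NLP libraries or a service, and summing word scores is only the crudest form of sentiment.

```python
import re
from typing import Dict, Optional, Set

# Tiny hypothetical sentiment lexicon; real systems use NLP libraries or services.
LEXICON: Dict[str, int] = {"love": 1, "great": 1, "fast": 1,
                           "hate": -1, "slow": -1, "broken": -1}
TOKEN = re.compile(r"[a-z']+")


def distill(post: str, keywords: Set[str]) -> Optional[Dict[str, object]]:
    """Drop posts that match no tracked (lowercase) keyword; otherwise return a
    derived record with the matched keywords and a lexicon-sum sentiment score."""
    tokens = TOKEN.findall(post.lower())
    matched = sorted(keywords.intersection(tokens))
    if not matched:
        return None
    score = sum(LEXICON.get(t, 0) for t in tokens)
    return {"text": post, "keywords": matched, "sentiment": score}


if __name__ == "__main__":
    tracked = {"riak", "search"}
    for post in ["I love how fast Riak is", "Lunch time!", "Riak search is slow"]:
        print(distill(post, tracked))
```

The derived records, not the raw posts, are what the Index and Compute phases would then work from.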

Indexing. Take the data you have saved and distilled and make it searchable. Doing this at low volumes and high latency is dead easy. Doing it at scale in a low-latency, high-throughput, highly available, and scalable fashion is very non-trivial. You'll notice things get harder as you move through the pipeline. Solr, ElasticSearch, and other tools have proven helpful in this area in various ways.
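The core data structure behind making distilled data searchable is the inverted index, which Solr and ElasticSearch implement in far more sophisticated form via Lucene. Here is a minimal in-memory Python sketch with an AND query; the document ids and texts are made up, and nothing here addresses the scale, latency, or availability concerns that make this phase hard.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(docs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an inverted index: token -> set of ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs:
        for token in TOKEN.findall(text.lower()):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """AND query: ids of documents that contain every token in the query."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    idx = build_index([("t1", "Riak is fast"),
                       ("t2", "Riak search is slow"),
                       ("t3", "Redis is fast")])
    print(search(idx, "riak fast"))  # {'t1'}
```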

Compute. All the rage at the moment is creating metrics and scores from the derived data that has been captured, distilled, and indexed. Even apparently simple, embarrassingly parallel algorithms can become an insane can of worms at this stage. When you hit the limits of scaling up, you had better have made wise choices at the beginning or you'll be facing a big rewrite. Writing code that scales and creating algorithms that can be scaled out is not easy at all. Akka and Fabric Engine are tools I've been working with and exploring quite a lot, as well as Hadoop of course and various options for stream-based processing. This is where a lot of the FUN FUN is right now, and it's technically a very exciting area.
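The property that keeps an "embarrassingly parallel" computation from turning into a rewrite is structure: independent work per batch plus an associative merge. Here is a minimal Python sketch of that map/reduce shape for term counting. It uses a thread pool only to keep the example self-contained; in CPython, threads will not speed up CPU-bound counting because of the GIL, so real workloads would use a process pool or a cluster framework such as Hadoop or Akka.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence


def chunk(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split the input into independent batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_terms(batch: Sequence[str]) -> Counter:
    """Map step: whitespace-token counts for one batch, no shared state."""
    counts: Counter = Counter()
    for text in batch:
        counts.update(text.lower().split())
    return counts


def parallel_term_counts(items: Sequence[str], batch_size: int = 1000,
                         workers: int = 4) -> Counter:
    """Run the map step on batches concurrently, then reduce by merging."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_terms, chunk(items, batch_size)))
    # Reduce step: Counter addition is associative, so merge order does not matter.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    print(parallel_term_counts(["a b", "B c", "c c"], batch_size=1, workers=2))
```

Because each batch is counted without shared state and the merge is associative, the same functions could be distributed across machines without changing their logic.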

Display. Displaying information in a meaningful way to a user takes serious, concerted effort. When millions and billions of things are being analyzed in near real time, the limitations of your user interface and your choices for displaying data will become evident very quickly. Be ready to pivot. This is extremely hard to get right the first time, and until you get solid user feedback it's nearly impossible. This is one of the reasons I'm a fan of Agile, Lean, Lean UX, etc. The dance between UX, IA, front-end development, API design, backend development, systems engineering, and more, to make these complex, distributed, high-throughput systems work in a performant and scalable way while being a joy to use, is definitely a massive challenge. Often, I've found, it is an exercise in keeping things simple and fighting to eliminate unnecessary complexity day after day. Complex systems definitely seem to trend toward entropy, not away from it.

I'm happy to say all of this is becoming easier quickly as frameworks mature, new tools come along, and knowledge among engineers, designers, and product teams continues to increase on average. We certainly build on the foundation of what came before us. Lastly, I'd be remiss not to point out that there is no silver bullet for any of these phases of the pipeline: there is no one programming language that makes it all better, and there is no secret-handshake message queue or database that solves all your ills. But this certainly is a fun time to be working in 'net space.

When is Big Data Actually Big?

There is a quandary for anyone trying to wrap their head around what “Big Data” means: when is big data really big? I had a good conversation today with a friend of mine, @ckenton, about some impactful things he and his company have done for clients over the last couple of years with careful and meaningful analytics of what would mostly be considered social media data. I was struck by the fact that there was significant impact from what, by data volume, was actually not all that much data; perhaps a few gigabytes in aggregate in each case we discussed.

On another front, I have two active projects for two very different clients right now where my teams and I have architected and built systems from scratch that crunch anywhere from tens of thousands to millions of pieces of data per day in near real time. We’ve created code, frameworks, and modules and used off-the-shelf kit whenever we could. There have been moments of bliss and moments of wall-to-forehead frustration. We’re using tools like MongoDB, Riak, node.js, PHP, Redis, Scala, Java, Akka, AWS, Capistrano, Jenkins, Chef, Git, and more. We’re using flexible agile workflow models with business agreements and contracts that match. We are doing all this to analyze data. With all of this we are crunching what some would call big data, and it’s definitely growing very, very quickly by volume. But it is not big data because of the ever-growing volume. It’s big data because impactful meaning is extracted from it, and the end users of these insights from otherwise chaotic-looking data streams can make impactful business decisions quickly for their contextual needs.

In summary, I’ve come to think that Big Data is Big when the insights derived from it are truly meaningful and potentially significantly impactful. It doesn’t matter if it’s a few gigabytes or a few petabytes of data. The technical challenges will of course vary with data volume, but what really matters is what you learn from the data you have and then, most importantly, what you do with that newfound knowledge once you have it in your grasp.

People, Process & Technology

I am often asked how I do things and why.  In most cases it really does come down to thinking about three things and making sure they are properly tended to at all times.  Those three things are people, process and technology in that order.

With the right people you can do anything you set your mind to within the realms of possibility.  With the right process you will be far more effective than the next team, and your operations will be able to scale.  With the right technology you will be far enough out on the edge to get ahead, yet rock solid and scalable enough that you will not crumble under brittleness or technical debt (also related to process, mind you).

I haven’t been blogging much lately because I’m really quite wrapped up in my work for my clients, as well as trying to spend more time with my family.  I am trying to find ways to get back into the blogging groove more than once a month, though.  I’ve been building Social CRM and Business Analytics solutions for some clients on two interesting technology stacks.  One is a Riak, Redis, RabbitMQ, Jetty, Scala/Java, Akka, Nginx core stack on AWS.  The other is a MongoDB, RabbitMQ, PHP/Symfony2, Node.js, Nginx core stack on AWS.  There are other techs involved, of course; both use Chef extensively, for example.  One uses CI with Jenkins as part of the process, and both use automated deployment tools in Ruby (primarily Capistrano).  Both are slated for public release shortly, so I’ll be able to say more then.  It’s been truly interesting working on them in parallel, and I’m very excited to see them go live over the next few weeks.

In both cases, it’s the team, the process, and the technology that have made it possible to process large amounts of data, deploy tremendous amounts of code fast, and create products that have excellent potential to make a difference to their users and scale well over time.

So, if you ask me to build you something you’ll be hearing me ask if you are willing to make the commitment to the people, processes, and technologies that will make it a reality fast.  When it all comes together it is great fun!

I hope to get some time to write more about the actual details of this trinity in the near future.  Be patient, I have to pay the bills you know!

How NOT to Sell a NoSQL Database

This is an account of my first experience with a newer NoSQL database, which we'll just call NoSQL Database #9999, that someone told me about and asked me for my overall opinion of.  I hadn't heard of it before, but I wanted to see what the deal was since I work with several others.  I'm always up for seeing if something is actually the new hotness.

I found marketecture diagrams everywhere.  Development is closed and opaque for both server and client.  There is a signup wall for a 30 day "free trial" that may lead to the download screen.  I'm not sure, since I didn't fill it out; I really don't feel like spending my time navigating a sales channel.  The license agreement was really fun.  The short version is that it is a non-exclusive licensing model with no ability to use or test in production to see if it really works.  The choice parts basically say that I can't use the software in production and that if it doesn't work, that's not their problem and they never said it would.  There is actually a warranty clause saying they don't warrant anything at all; it is just "as-is" without warranty!  I am not feeling the love at this point.  Then, I wanted to see the pricing.  I couldn't, of course.  The minimum contract term beyond the first 30 days is 12 months with, you guessed it, unspecified pricing unless I contact sales directly.

So, now I know why I've never heard of this software and why nothing meaningful has been written about it that is not PR- or marketing-driven.  There is really no way I would even consider adopting this software at this point.  It's most likely that it is not real.

So, NoSQL Database #9999, there are many other equally usable solutions that are far more transparent in the way they do business and foster community around their products.  This isn't about paying money.  This is about trust.  So, sorry NoSQL #9999, but I'll not be entering your sales cycle in this fashion or evaluating your product at this time.  Moving along now...

Happy Wednesday Everyone!

Building an Application upon Riak - Part 1

For the past few months some of my colleagues and I have been developing an application with Riak as the primary persistent data store.  This has been a very interesting journey from the beginning until now.  I wanted to take a few minutes and write a quick "off the top of my head" post about some of the things we learned along the way.  In writing this I realized that our journey breaks down into a handful of categories:
  • Making the Decision
  • Learning
  • Operating
  • Scaling
  • Mistakes
We made the decision to use Riak around January of 2011 for our application.  We looked at HBase, Cassandra, Riak, MySQL, Postgres, MongoDB, Oracle, and a few others.  There were a lot of things we didn’t know about our application back then.  This is a very important point.

In any event, I’ll not bore you with all the details, but we chose Riak.  We originally chose it because we felt it would be easy to manage as our data volume grew and because published benchmarks looked very promising.  We also wanted something based on the Dynamo model, with adjustable CAP properties per “bucket”, and it suited our speed requirements, our “schema”, our data volume capacity plan, our data model, and a few other things.
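The per-bucket tuning mentioned above refers to Dynamo-style replication parameters; in Riak these are bucket properties such as `n_val`, `r`, and `w`. A common rule of thumb is that a read quorum is guaranteed to overlap the most recent write quorum when R + W > N. This small Python sketch only encodes that arithmetic; Riak's sloppy quorums and hinted handoff mean the overlap is not a hard consistency guarantee during failures.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every set of r replicas shares at least one replica
    with every set of w replicas out of n (i.e. r + w > n)."""
    if n < 1 or not (1 <= r <= n) or not (1 <= w <= n):
        raise ValueError("need 1 <= r, w <= n")
    return r + w > n


if __name__ == "__main__":
    # N=3 with R=W=2 (majority quorums) versus R=W=1 (fastest, weakest).
    for r, w in [(2, 2), (1, 1), (1, 3)]:
        print(f"N=3 R={r} W={w} -> overlap: {quorums_overlap(3, r, w)}")
```

Lowering R or W buys latency and availability at the cost of possibly stale reads, which is the trade being tuned per bucket.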

Some of the Stack Details

The primary programming language for our project is Scala.  There is currently no reasonable, up-to-date Scala client for Riak, so we use the Java client.

We are running our application (a rather interesting business analytics platform if I do say so myself) on AWS using Ubuntu images.

We do all of our configuration management, cloud instance management, monitoring harnesses, maintenance, EC2 instance management, and much more with Opscode Chef.  But, that’s a whole other story.

We are currently running Riak 1.0.1 and will get to 1.0.2 soon.  We started on 0.12.0 I think it was... maybe 0.13.0.  I’ll have to go back and check.

On to some of the learning (and mistakes)

Up and Running - Getting started with Riak is very easy, very affordable, and covered well in the documentation.  Honestly, it couldn't be much easier.  But then... things get a bit more interesting.

REST ye not - Riak allows you to use a REST API over HTTP to interact with the data store.  This is really nice for getting started.  It’s really slow for actually building your applications.  This was one of the first easy buttons we decommissioned; we had to move to the protocol buffers interface for everything.  In hindsight this makes sense, but we really did originally expect to get more out of the REST interface.  It was completely unusable in our case.

Balancing the Load - Riak doesn’t do much for you when it comes to load balancing your various types of requests.  We settled, courtesy of our crafty operations team, on running HAProxy on each application node to shuttle requests to and from the various Riak nodes.  Let me warn you: this has worked for us, but there be demons here!  The configuration details of running HAProxy in front of Riak are about as clear as mud, and there isn’t much help to be found at the moment.  This was one of those moments when I really wished the client were a bit smarter.

Now, when nodes start dying, getting too busy, or whatever else comes up, you’ll be relying on your proxy (HAProxy or otherwise) to handle it for you.  We don’t consider ourselves done on this point at all, but we’ll get there.
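To illustrate the behavior you end up relying on the proxy for, here is a minimal Python sketch of client-side round-robin that skips nodes marked down until a cooldown expires. The node names and the 30-second cooldown are hypothetical, this is not part of any Riak client library, and it is not how HAProxy is configured; it only shows the selection logic.

```python
import itertools
import time
from typing import Dict, List, Optional


class RoundRobinPool:
    """Round-robin over nodes, skipping any marked down until its cooldown expires."""

    def __init__(self, nodes: List[str], cooldown_s: float = 30.0):
        self.nodes = list(nodes)
        self.cooldown_s = cooldown_s
        self._down_until: Dict[str, float] = {}
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node: str, now: Optional[float] = None) -> None:
        """Call after a timeout or connection error against `node`."""
        now = time.monotonic() if now is None else now
        self._down_until[node] = now + self.cooldown_s

    def next_node(self, now: Optional[float] = None) -> Optional[str]:
        """Return the next available node, or None if every node is cooling down."""
        now = time.monotonic() if now is None else now
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if self._down_until.get(node, 0.0) <= now:
                return node
        return None


if __name__ == "__main__":
    pool = RoundRobinPool(["riak1", "riak2", "riak3"], cooldown_s=10)
    pool.mark_down("riak2", now=0)  # e.g. after a connection timeout
    print([pool.next_node(now=1) for _ in range(4)])  # riak2 is skipped
```

A dedicated proxy adds active health checks and connection management on top of this, which is why we leaned on one rather than on the client.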

Link Walking (err.. Ambling) - We modeled many of our early data relationships using link walking.  The learning?  S-L-O-W.  We had to remove it completely.  Play with it, but don’t plan on using it in production out of the gate.  I think there is much potential here, and perhaps we’ll return to this feature for some less latency-sensitive work.  Time will tell...

Watchoo Lookin’ for?! Riak Search - When we started, search was a separate project.  But we knew we would have a use for search in our application, so we did everything we could to plan ahead for that.  By the time we were really getting all hot and heavy (post 1.0.0 deployment), we were finding out a few very interesting things about search.  It's VERY slow when you have a large result set; that's just the nature of the way it's implemented.  If you think your searches will return > 2000 items, think long and hard about using Riak's search functions for your primary search.  This is, again, one of those things we’ve pulled back on quite a bit.  But the most important bits of learning were to:
  • Keep result sets small
  • Use inline fields (this helped us a lot)
  • Realize that searches run on ONE physical node and one vnode and WILL block (we didn’t really feel this until data really started growing from hundreds of thousands of “facets” to millions).
At this point, we are doing everything that we can to minimize the use of search in our application, and where we do use it we’re limiting the result sets in various ways and using inline fields pretty successfully.  In any event, just remember that Riak Search (standalone or bundled post-1.0.0) is NOT a high-performance search engine.  Again, this seems obvious now, but we did design around it a bit and had higher hopes.
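One way to enforce the “keep result sets small” rule is to never ask for an unbounded result set at all, and instead page through hits with a hard ceiling. The sketch below assumes a search call that accepts a start offset and a row count; search_page is a hypothetical stand-in, not a real Riak client function, and the default limits are arbitrary.

```python
from typing import Callable, Iterator, List


def bounded_search(search_page: Callable[[str, int, int], List[str]],
                   query: str,
                   page_size: int = 100,
                   max_results: int = 1000) -> Iterator[str]:
    """Yield at most max_results hits, fetching page_size at a time.

    search_page(query, start, rows) is a hypothetical stand-in for whatever
    search call your client exposes; it must return at most `rows` hits.
    """
    start = 0
    while start < max_results:
        rows = min(page_size, max_results - start)
        page = search_page(query, start, rows)
        yield from page
        if len(page) < rows:
            break  # fewer hits than requested: nothing left to fetch
        start += rows


# Fake backend with 250 matching keys, for illustration only.
fake_hits = [f"key{i}" for i in range(250)]
fake_search = lambda q, start, rows: fake_hits[start:start + rows]
print(len(list(bounded_search(fake_search, "tag:foo", max_results=120))))  # 120
```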
 
OMG It’s broken what’s wrong - The error codes in the early version of Riak we used were useless to us and because we did not start w/ an enterprise support contract it was difficult sometimes to get help.  Thankfully, this has improved a lot over time.

Mailing List / IRC dosey-do - Dust off your IRC client and sub to the mailing list.  They are great and the Basho Team takes responding there very seriously.  We got help countless times this way.  Thanks team Basho!

I/O - It’s not easy to run Riak on AWS.  It loves I/O.  To be fair, they say this loud and clear, so that’s my problem.  We originally tried a fancy EBS setup to speed it up and make it persistent.  In the end we ditched all that and went ephemeral.  It was dramatically more stable for us overall.

Search Indexes (aka Pain) - Want to re-index?  Dump your data and reload.  Ouch.  Enough said.  We are working around this in a variety of ways but I have to believe this will change.

Basho Enterprise Support - Awesome.  These guys know their shit.  Once you become an enterprise customer they work very hard to help you.  For a real world production application you want Enterprise support via the licensing model.  Thanks again Basho!

The learning curve - It is a significant change for people to think in terms of eventually consistent, distributed key value storage or distributed asynchronous applications.  Having Riak under the hood means you NEED to think this way.  It requires a shifted mindset that, frankly, not a lot of people have today.  Build this fact into your dev cycle time or prepare to spend a lot of late nights.
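A concrete example of that mindset shift: when a Riak bucket allows multiple values (the allow_mult bucket property), concurrent writes can come back as several sibling values, and your application has to merge them. A minimal sketch, assuming the value is a set of IDs where every concurrently added item should survive; fetching the siblings from Riak is not shown.

```python
from functools import reduce
from typing import FrozenSet, Iterable


def resolve_siblings(siblings: Iterable[FrozenSet[str]]) -> FrozenSet[str]:
    """Merge conflicting versions of a set-valued object by union.

    Assumes concurrent adds should all be kept. A plain union cannot
    represent deletes; those need tombstones or a CRDT-style design.
    """
    return reduce(lambda acc, s: acc | s, siblings, frozenset())


# Two writers updated the same key concurrently and produced siblings.
print(sorted(resolve_siblings([frozenset({"a", "b"}), frozenset({"b", "c"})])))
# ['a', 'b', 'c']
```

A plain union quietly resurrects deleted items, which is exactly the kind of subtlety that eats those late nights.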

Epiphany - One of the developers at work recently had an epiphany (or maybe we all had a group epiphany).  Riak is a distributed key value data store.  It is a VERY good one.  It’s not a search engine.  It’s not a relational database.  It’s not a graph database.  Etc., etc.  Let me repeat.  Riak is an EXCELLENT distributed key value data store.  Use it as such.  Since we all had this revelation and adjusted things to take advantage of that fact, life has been increasingly nice day by day.  Performance is up.  Throughput is up.  Things are scaling as expected.
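Using Riak “as such” mostly comes down to designing keys so that the reads you care about are single lookups. The sketch below uses a hypothetical in-memory KVStore class as a stand-in for the real store (not a Riak client API) and keeps a denormalized per-user index, so “posts by user” is one GET instead of a search query or a link walk. The bucket and key names are illustrative.

```python
import json


class KVStore:
    """In-memory stand-in for a distributed key value store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def put(self, bucket, key, value):
        self._data[(bucket, key)] = json.dumps(value)

    def get(self, bucket, key):
        raw = self._data.get((bucket, key))
        return None if raw is None else json.loads(raw)


def save_post(store, user_id, post_id, body):
    store.put("posts", post_id, {"user": user_id, "body": body})
    # Denormalized index: one key per user listing their post IDs.
    index = store.get("user_posts", user_id) or []
    index.append(post_id)
    store.put("user_posts", user_id, index)


def posts_for_user(store, user_id):
    # One GET for the index, then direct key lookups; no search involved.
    return [store.get("posts", pid) for pid in store.get("user_posts", user_id) or []]
```

Note that the read-modify-write on the index is not safe under concurrent writers in an eventually consistent store; in Riak that is where sibling resolution comes in.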

In Summary - Reading back through this I felt it came off a bit negative.  That's not really fair though.  We're talking about nearly a year of learning.  I love Riak overall and I would definitely use it again.  It's not easy and you really need to make sure the context is correct (as with any database).  I think team Basho is just getting started but they are off to a very strong start indeed.  I still believe Riak will really show its stripes as we scale the application.  We have an excellent foundation upon which to build and our application is currently humming along and growing nicely.

I could not have even come close to getting where we are right now with the app we are working on without a good team as well.  You need a good devops-like team to build complex distributed web applications.

Lastly and this is the real summary, Riak is a very good key value data store.  The rest it can do is neat but for now, I'd recommend using it as a KV datastore.

I'm pretty open to the fact that even with several months of intense development and a nearly ready product under our belt, we are only scratching the surface.

What I'll talk about next is the stack, the choices we've made for developing a distributed Scala-based app, and how those choices have played out.

The SaaS Aggregation Benefit Mirage

In this service oriented on-demand world I’ve been running into something again and again lately that I’ve found interesting and a bit annoying.

To start, imagine I’m going to build an application that uses two 3rd party services on-demand.  We’ll just call them Service A and Service B and say each has two features.  For this example it does not really matter what the services do.

Service A
  Feature A-1
  Feature A-2
Service B
  Feature B-1
  Feature B-2

So, I create my application and it first uses Service A to do something, using Features A-1 and A-2.  Then, with the output of that, it uses Service B to do something else using Feature B-2.

Now, a few months down the line when things are going great, I get a call from my account manager at Service A telling me I can now get all the features of Service B directly from them, included with Service A.  So, what they are telling me is that my service structure now looks like this:

Service A
  Feature A-1
  Feature A-2
  Feature B-1
  Feature B-2
Service B
  Feature B-1
  Feature B-2

On the surface this looks really good.  It’s the same thing with less hassle right?  Maybe not.

This is where my annoyance surfaces.  Dig in and dig in well.  What I find again and again is that it’s simply not true because of what I’ll just call the filter effect.  What you really are getting with this new and improved Service A is more like this:

Service A
  Feature A-1
  Feature A-2
  Feature B-1
   Feature B-1

Notice that Feature B-2 is missing and that probably nobody mentioned it.  Or, it’s more like:

Service A
  Feature A-1
  Feature A-2
  Feature C-1
  Feature C-2
  Feature C-3
  Feature C-n-OMG
Service B
  Feature B-1
  Feature B-2

And you don’t care, because C isn’t B and all you need is A-1, A-2, and B-2.  They say it’s equivalent, but it is not, and the app uses Feature B-2, if you’ll recall.  How much time did you just spend?
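The “dig in” step can be as simple as writing down the features your app actually calls and diffing them against what the bundle offers. Using the hypothetical feature names from the example above:

```python
def missing_features(required, offered):
    """Features the app depends on that the offering does not include."""
    return sorted(set(required) - set(offered))


app_needs = {"A-1", "A-2", "B-2"}
bundle_filtered = {"A-1", "A-2", "B-1"}                # B-2 quietly dropped
bundle_swapped = {"A-1", "A-2", "C-1", "C-2", "C-3"}   # C instead of B

print(missing_features(app_needs, bundle_filtered))  # ['B-2']
print(missing_features(app_needs, bundle_swapped))   # ['B-2']
```

An empty list is the only answer that means the bundle really is “the same thing with less hassle.”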

So, by the time you get through all this and figure out that the new and improved Service A + B is pretty useless and all you really want is what you already have, you will have wasted a lot of time.  There are fewer features, more complexity, less control, and likely much worse service and support for the aggregated services, since you have no direct relationship with the end point provider.

So, rambling aside, the point is that these service provider mashup aggregators are not what they often seem on the surface.  I’m frequently finding that the best deal is going right to the source, and that any “savings” on the surface likely get eaten up later in a variety of ways that are difficult to predict.  In most cases, it’s best to go to the source to get what you want.

Brick and Mortal Retail Doomed, Doomed I Say

Not my usual blog topic, but hey, it’s Friday and I had a brutal week.  I just had to relay a retail experience I had a week or so ago.  I went into a local hardware store.  It’s a pretty good one but I’ve always thought they were a bit pricey.  But, I needed a new vanity mirror and I needed it now.  On the display they had one I immediately liked, and I was willing to pay the premium.  Once I found a salesperson I actually started w/ a question about some sconce lights I also liked.  He looks at me.  Looks me up and down and says, “you’re not going to like it.”  I said, “hit me.”  He did, I didn’t.  Whatever.  Bad start for sure.  In any event, I say, I’d like to get this mirror you have over here on the wall.  I took him to it and said I’d like one of these please.  Here are the two things he said to me:

1. We don’t have those in stock.
2. But, I bought one and it’s still in the box at home.  Want to make me an offer?  I might sell it.

Ohhhhkay... I said, alright, I think I’ll stick to the store here.  How long to get one?  Turns out it’s 7-10 business days.  Keep in mind this is a premium price.  Apparently there are six with their nearby suppliers.  At this point I had already snapped a picture with my phone and sent it to my wife for approval.  She says... LOVE IT... in response.  So, I take the product #.  I tap it into a Google search.  I find it on Amazon.com from an affiliate for 30% less, 5-7 days delivered, and a bit of tax.  Net savings over the store of around 20%.  I decide to just show the guy, and I said the following things:

1. I’d prefer to buy local.  Can you sell it to me for this price and get it here in about the same time frame?
2. I can just order it from here and it’ll come to my doorstep by pushing this buy button now.

The response floored me.  No.  He mentioned the one he had at home again.  I said okay.  I hit the buy button and went home.

Here’s the crazy part.  This is an employee owned store!

I’ve tested the “I’d like to buy this” request at several other stores recently.  Most of them simply have NO stock.  They are just display stores.  You cannot buy what they have and go home w/ it.

Astounding.  Physical retail has absolutely no chance with me using this kind of approach.  I’ll just order from home.  Shame, I wanted to buy local.  Either way, the mirror will be here in a couple of days along w/ the sconce lights, towel rack, and TP holder to match.  *sigh*

Stop Staring at my Polyglot!

I received an interesting comment/question via my blog recently.  It went a bit like this... 

I’m developing a distributed cloud application but my developers are pushing back on me for having a polyglot database strategy.  What should I do?

I won’t get into exactly what it is they are doing since that would take several more pages.  This is something of a stream of thought post so apologies if it is a little rough around the edges.  The easiest way to answer is in the context of an application I’ve been working on for a while that has some similarities to what this person wants to build.  Everything I’m describing is part of an app I’ve been building with a client since earlier this year.

Typically you'll need a few layers of “data storage” for any distributed batch or real time application (cloud native application), which is what I understand you are trying to build.

I consider anything that holds data that is for presentation, computation, or transformation part of the data storage architecture and I like to break it down by time in storage from least amount of time to most. 

Short or Very Short Term: Single node caches (like memcached) or volatile compute node memory
Mid-Term: Queues and IMDGs (in-memory data grids)
Long-Term: Durable storage; Dynamo and BigTable derivatives abound

There are numerous database products that live in or even between those tiers these days; more than ever before.  By no means is what follows even close to an exhaustive list.  A quick list of the ones I have worked with in the last few months personally looks like: 

Short Term: Memcached, Redis, RabbitMQ, ZeroMQ, DRAM, APC, MongoDB
Mid-Term: Redis, GridGain, RabbitMQ, ZeroMQ, MongoDB
Long-Term: Riak, MongoDB, S3, Ceph, Swift, CloudFiles, EBS, HBase

Short-Term storage is ALL in memory, not persisted to disk, and not intended to be used for long periods of time.  Your application also has to be able to deal with the fact that this type of storage is essentially ephemeral.  If the node gets a KILL signal from some source or another your app needs to know how to deal with this gracefully.  In other words, storage here is not durable at all.

Mid-Term storage is used for longer running processes.  It benefits greatly from being distributed and having a higher degree of durability.  This is generally still where most of the work is done in main system memory (no disk I/O), but also where you might do complex calculations or data transformations on your way to your goal.  You do it here because it’s fast.  You do things here because they can be shared amongst lots of workers (like queue subscribers).
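To make the “shared amongst lots of workers” point concrete, here is a minimal sketch of several workers draining one shared queue. Python’s in-process queue.Queue stands in for a real broker such as RabbitMQ, so nothing here is durable or distributed; it only shows the fan-out shape. The worker count is an arbitrary assumption.

```python
import queue
import threading


def run_workers(jobs, handle, num_workers=4):
    """Fan jobs out to several workers through one shared queue."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work for this worker
                q.task_done()
                return
            out = handle(item)
            with lock:  # results list is shared across threads
                results.append(out)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results


# Results arrive in whatever order workers finish, so sort for display.
print(sorted(run_workers(range(5), lambda x: x * x)))  # [0, 1, 4, 9, 16]
```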

Long-Term storage is used for exactly that, long term durable storage of important data that provides sufficient and reasonable interfaces from which to retrieve that data again when needed.  Preferably it’s possible to do things like map-reduce jobs so that you can iterate and retrieve what is necessary which you may then operate on at one of the higher levels up this stack.

You’ll see that I’ve put some of them in all or multiple categories, which might seem odd until you understand how they work and match the technology to whatever you are trying to achieve from a business perspective.
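Here is a small sketch of how the short-term and long-term tiers typically interact in a read path: check the cache first, fall back to durable storage on a miss, and repopulate the cache. Both tiers are plain dicts standing in for something like memcached and Riak, and the TTL value is an arbitrary assumption. Because the cache only ever holds copies, a wiped cache after a KILL signal just means a few slower reads.

```python
import time


class TieredStore:
    """Read-through lookup across a short-term cache and a durable tier.

    Both tiers are dicts here for illustration; the clock is injectable
    so expiry can be tested without sleeping.
    """

    def __init__(self, durable, ttl_seconds=60.0, clock=time.monotonic):
        self.durable = durable
        self.cache = {}  # key -> (value, expires_at)
        self.ttl = ttl_seconds
        self.clock = clock

    def get(self, key):
        hit = self.cache.get(key)
        if hit is not None:
            value, expires_at = hit
            if self.clock() < expires_at:
                return value
            del self.cache[key]  # expired entry
        value = self.durable.get(key)  # fall through to long-term storage
        if value is not None:
            self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put(self, key, value):
        self.durable[key] = value   # durable tier is the source of truth
        self.cache.pop(key, None)   # invalidate rather than update the cache
```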

I have a tendency to avoid things that carry overly complex operational management overhead when starting up projects, because I like to get my TCO (Total Cost of Ownership) over time (3-5 years) as low as possible while achieving the project goals and SLAs.  There are a couple of exceptions on the list above that do have more operational overhead (MongoDB and HBase), but they are good enough in the right context that you might want to learn and use them anyway.

Now, back to the question at hand.  Should they use one type of DB or many for the needs at hand?  In this case, I’ve told them that they should use as few as possible, possibly only one.  The reason for this was that in their case they will value speed, consistency, and lower cost of operations at this early stage of their project.  They are developing an interesting distributed system for cool reasons.  I recommended a choice to them that I think will help them get to their goals fast and cost effectively while allowing them to break off pieces of the application later as and if needed.

Parting words are that it will, over time, be nearly unavoidable that this (and most) applications of a distributed nature end up being database polyglotoumous.  However, I do think it adds a lot of complexity and overhead and in the early stages of a project it's not usually necessary unless what you are doing is of great complexity in which case you might want to break that down anyway to something more manageable.

Can New Clouds Teach Old Apps New Tricks?

Cramming the same old code, CMS, application, etc. into the cloud (any cloud) doesn't make the most of the capabilities of cloud computing in all its various forms.  I expect to be discussing this subject more in the near future.  But, I'll start by giving two examples and labeling them cloud native application design pattern and anti-pattern.

A Cloud Native Application Design Anti-Pattern

I'll pick on Drupal a bit (but with love).  If one installs Drupal at a cloud IaaS or PaaS provider, that does not make Drupal a cloud native application.  To me, this seems obvious, but I am not so sure it is obvious in general.  The Drupal CMS is not a Cloud Native Application.  Putting Drupal, WordPress, or CMS XYZ of your choice on the cloud computing IaaS or even PaaS provider of your choice essentially means you end up with a virtualized n-tier application running in the cloud with many of the same limitations of a hardware based deployment and only some of the benefits of being a cloud native application running on a cloud computer.  Yes, of course, and admirably (see billions of pageviews per month), Drupal can run IN the cloud.  But, that does not make it OF the cloud.  Still, based on personal experience, even considering all this it's likely the right choice in a great many cases to run it in the cloud.

A Cloud Native Application Design Pattern

If you want to see what a CMS can look like as a cloud native application then check out the Lily CMS project.  I personally might not choose this specific architecture and systems design to achieve the same goals.  However, there is more than one way to build a CNA.  They have done some great work there and are clearly on the right track!  It's excellent work and I have respect for what the Outerthought team has created with their platform.  It's actually potentially quite a lot more than just a CMS as well.  In any event, I think that with the exception of the default HBase high availability limitations (which I suspect will be addressed soon by the HBase project) this can be considered a cloud native application.  Coupled with the appropriate monitoring, automation, and even cloud environment awareness it would be a very powerful cloud native application.

All of this summarizes to me as one very simple fact.  There is a tremendous opportunity ahead!  Exciting times.

The NIST Definition of Cloud Computing (Draft)


I thought I'd start the week with a reminder of an oldie but goodie.  This document came out after the initial barrage of "what is cloud computing" and "cloud computing defined" posts from a few years ago.  But, I've always felt that NIST did a great job with it overall.

In my opinion it's still one of the best and most complete definitions of cloud computing out there.  So, on the off chance that you have not seen this definition of cloud computing, it is definitely worth the time to read through.

One of my earliest definition articles from April 2008, Get Your Head in the Clouds, is still my most trafficked article on this site most weeks.

"Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud  model  promotes availability and is composed of five essential characteristics, three service models, and four deployment models..."  --NIST Cloud Definition

In particular I like the way they break down the characteristics, service, and deployment models. 

 

Cloud Operating Systems: Do They Exist Yet?

I was asked an interesting question by a friend of mine a few days ago.  He asked simply, “Are there any Cloud Operating Systems?”

I then proceeded to argue that no, there really were not any.  None of the current crop of software claiming to be a cloud operating system met all of the prerequisites for being cloud native while also providing the basic services an operating system should provide, such that I could build cloud native applications upon it.

Let’s think about what an Operating System gives you.  An OS is essentially an abstraction layer over hardware that provides the necessary interfaces (drivers) to get the work done that is needed by computer programs.  At its most basic, an operating system provides storage, networking, and compute, and the management of those things, in a fairly transparent way.  It also provides a tremendous amount of flexibility to its user.

Wikipedia defines an Operating System as, “software, consisting of programs and data, that runs on computers, manages computer hardware resources, and provides common services for execution of various application software. The operating system is the most important type of system software in a computer system. Without an operating system, a user cannot run an application program on their computer, unless the application program is self booting.”  -wikipedia

If I were to write the definition for Wikipedia using the definition of an operating system as the base, then I would say, “A cloud operating system (COS) is software, consisting of programs and data, that runs on IaaS providers, manages IaaS resources, and provides common services for execution of various application software and SaaS. The cloud operating system is the most important type of system software in a cloud computing system. Without a cloud operating system, a user cannot run a cloud native application program on the cloud computer, unless the application program is self booting.”

To me, this sounds most like what we currently often call Platform as a Service (PaaS).

The problem I have with current PaaS systems is that they dictate heavily and take far too much of the underlying services away from the programmer.  This is a compromise, of course.  But, a true operating system is far more flexible than the existing crop of PaaS services.  Just imagine if your newest operating system upgrade wouldn’t let you access ½ the CPU cores or the networking services for some reason and your manufacturer just said, tough cookies, that’s just how it is.  You’d probably try to return it if you could.

Where this got interesting for me in thinking about it later was the concept of “self-booting” that was mentioned in the Wikipedia definition above.  If you build the necessary intelligence into your PaaS such that it can bootstrap an application and the storage, networking, and compute it needs to run, and then manage all of that moving forward, then you do, in fact, have something more resembling a cloud operating system as defined above.
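The “self-booting” idea is easiest to picture as a desired-state loop: compare what should be running with what is running, then provision or release the difference. This is a generic reconciliation sketch, not the design of any particular PaaS; the resource names and the provision/release hooks (which would call an IaaS API) are hypothetical.

```python
def reconcile(desired, actual, provision, release):
    """One pass of a desired-state loop.

    desired/actual map a resource name (e.g. "web", "db") to a count.
    provision(name, n) and release(name, n) are hypothetical hooks that
    would talk to an IaaS provider; here they are plain callbacks.
    """
    for name in sorted(set(desired) | set(actual)):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if want > have:
            provision(name, want - have)
        elif have > want:
            release(name, have - want)


log = []
reconcile({"web": 3, "db": 2}, {"web": 1, "cache": 1},
          provision=lambda n, k: log.append(("up", n, k)),
          release=lambda n, k: log.append(("down", n, k)))
print(log)  # [('down', 'cache', 1), ('up', 'db', 2), ('up', 'web', 2)]
```

Run repeatedly, a loop like this is also what lets the system manage things “moving forward”: a failed node simply shows up as a gap between desired and actual state.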

Welcome VGBuilder to the world!

Build custom apps + deploy them to your cloud host of choice -- straight from the command line.


The first applications using VGBuilder are live already and there are more in the works.

This Cloud Native Application Development toolset was created out of a genuine need for a big project about a year ago that needed to move very fast.  There were no tools that we felt really met our needs at the time so we created our own.  That’s a story that will be told soon in more detail I expect.

I consider VGBuilder a Cloud Native Application development tool set.  It does not meet all my requirements, as you’ll note if you read the above article.  But, in time it will, I believe.

To understand some of my thinking that went into VGBuilder it helps to read these articles:

Cloud Native Applications

I wouldn't claim necessarily that VGBuilder is revolutionary but it certainly is evolutionary.  One of the more powerful aspects of this toolset is that it removes the need for any middle man or centralized platform to get your ideas from your brain and into the cloud crazy fast and with quite a bit of style.  It allows you to build and deploy your applications in the cloud with almost no barriers.  It removes the need for PaaS services for many people right from the start.

Essentially your laptop or computer becomes your private cloud and either Amazon or Rackspace (supported today out of the box) becomes your public cloud.  This is essentially a hybrid cloud model that puts all the power in the developer’s hands and automates almost everything that is often tedious or cumbersome otherwise.

Things are still rough around the edges.  As much as anything I want to gauge interest and get feedback on this new set of tools.  There is a long way to go but preliminary case studies have been excellent.

Please contact me or sign up on the early access web page if you’d like to be kept informed of the progress and provided access to use the tools for your own projects as they are opened up to more early adopters.

 

ProductionScale Communication Problem Discovery

I discovered today a distressing thing about my blog.  For the last few months two things have been misconfigured.

One, my phone number was an old number that is out of service.  So, if you tried to call me then please just note the new number on the about page and try again.

Two, my contact submission form was going to an email address that was supposed to forward the mails but has not been properly performing its duty.  I have fixed this issue and gone back through and replied to all the emails that were legitimate.

My sincere apologies for those communication issues; if you have tried to reach me through this blog and I did not respond, that is likely why.  I always respond to legitimate requests for information, work, or engaging discourse.  So, please send me a new note and don't hesitate to reach out!  I have thoroughly enjoyed the connections I've made over the years from this blog and hate to think I let it lapse for a bit!

Sincerely, 

Kent Langley

Scale Planning and the AKF Scale Cube

There are a lot of ways to draw diagrams for availability and scalability.  I use different ones for different purposes all the time.  However, when I was reading The Art of Scalability by AKF Partners I ran across a nice compound diagram they call the AKF Scale Cube, which helps simplify the explanation of the multi-dimensional nature of scalability issues in complex web application scenarios.

I’ve been using this visualization model to help me explain how things fit together in both technical and business discussions.  It comes in very handy I must say.

Most recently I’ve been using it to describe a gnarly distributed application I’m working on for a client.  What follows is a generalized, compound version of the diagrams I have created for clients of mine, with some discussion of how each part is used.
 


Some base-line definitions are in order if you haven’t read the book mentioned above.

X-Axis - Horizontal Scalability, Clones, Scale Out.  These are terms often associated with the X-Axis.  In the case of this graph, say you build a data processor then make 10 copies of it.  Well, that’s scaling to 10 units on the x-axis in this graph.  Depending on your application, this can help you increase your capacity; but not always!

Y-Axis - I’ve called this axis functional decomposition for a long time.  It can be thought of as breaking the application down from a monolithic single instance into discrete stand-alone parts.  The examples here are a bit of a mix from various projects I’ve worked on in the past.

Z-Axis - This is the tricky one for most folks.  This is what people might call sharding, partitioning, etc.  Keep in mind, I’m not only talking about a database here.  I’m talking about an entire complex multi-faceted distributed highly-available web application.

0,0,0 - The intersection of all three axes, or 0,0,0.  This is what some would call an all-in-one server.  It’s often used for proof of concept or for launching without a care in the world for future growth needs.  There’s nothing wrong with it as long as you understand the limitations and technical debt associated with the approach.

Z-Axis Item Explanations

Client - Assuming this is a multi-tenant application, you may want to shard your application by client such that each client or group of clients is assigned somehow to a specific cluster of nodes.

Geography - Assuming you’d like to have built-in DRBC and your application is capable of surviving being split up into many pieces, then you could end up sharding your application by data center and broader geographies such as city, state, or country.

External Cloud - Using IaaS and PaaS resources outside of your own data centers.  For a refresher on IaaS and PaaS see the article I wrote, “Cloud Computing:  Get Your Head in the Clouds,” in 2008 that was heavily read over the years.

Internal Cloud - Using your own infrastructure resources BEHIND your own firewall to do whatever it is that your application does.  This doesn’t always have to be a cloud.  If you want to know what I think it takes to be a cloud then read the several articles I wrote over the years related to that topic.  I do set the bar pretty high though I’ve learned.

Purpose - You might want to simply partition along the Z-Axis by any generic purpose for various reasons.  I think of this a little bit like saying I want to put all widgets in data node 1 and all waggles in data node 2.  They’ll both fit in a single node but maybe I want to spread my risk around.  This one is a little nebulous but it can engender fun conversations about why things need to exist at all.

Shard Key - We see this all the time in traditional RDBMS style deployments and even in some of the newer tools in the NOSQL world.  It’s basically just an index of which node you put things on.  For those of you that had to deal with libraries before the internet, you’ll remember the lovely card catalog.  It was nicely set up to help you figure out which shard of the library your book was close to.  Then, when you got there, good old Dewey Decimal System kicked in to take you the rest of the way.
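In code, the simplest shard-key lookup is a stable hash of the key modulo the number of shards. A minimal sketch, assuming a client ID as the shard key and a fixed shard count; the key format is illustrative.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key (e.g. a client ID) to a shard number.

    Uses a stable hash (MD5 here) rather than Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same client always lands on the same shard.
print(shard_for("client-42", 4) == shard_for("client-42", 4))  # True
```

Plain modulo has a real drawback: changing num_shards remaps most keys. Consistent hashing, which Dynamo-style stores such as Riak use for their rings, limits how many keys move when nodes are added.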

Y-Axis Item Explanations

Data Processing - this could be some application that transforms data from one state to another.  For example, it might simply remove all the spaces in a document and replace them with dashes.  That’s a bit of a silly example, but just to make the point.

Data Aggregator - I’ve had to build project after project that needed one form or another of data aggregation.  So, just think of this as something that might consume an RSS feed and stick it in a database of some kind.
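A bare-bones version of that aggregator, using only the Python standard library: parse an RSS 2.0 document and store its items in SQLite, using each item’s link as a natural key so re-running it is harmless. Fetching the feed over HTTP is left out, and the table layout is an assumption for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def aggregate_rss(rss_xml: str, conn: sqlite3.Connection) -> int:
    """Parse an RSS 2.0 document and store its items; return rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (link TEXT PRIMARY KEY, title TEXT)"
    )
    root = ET.fromstring(rss_xml)
    inserted = 0
    for item in root.iter("item"):
        link = item.findtext("link")
        title = item.findtext("title", default="")
        if not link:
            continue  # no natural key, skip the item
        cur = conn.execute(
            # OR IGNORE: an item already seen (same link) is not duplicated.
            "INSERT OR IGNORE INTO items (link, title) VALUES (?, ?)",
            (link, title),
        )
        inserted += cur.rowcount
    conn.commit()
    return inserted
```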

Distributed Calculation - I’ve been doing work and research with Map-Reduce, Actor Models, the Bulk Synchronous Parallel Model and more exotic instruments from past, present, and future.  This is simply something that does some kind of math or calculation.  For example, counting all the uses of the word onomatopoeia in 50TB of English essays by high school students across hundreds of compute nodes.
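The onomatopoeia example is the classic map-reduce shape: each node counts matches in its own chunk (map) and the partial counts are summed (reduce). The sketch below runs both phases in a single process; the comment marks where the map step would be spread across nodes.

```python
import re


def map_count(chunk: str, word: str) -> int:
    """Map step: count whole-word, case-insensitive matches in one chunk."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, chunk, re.IGNORECASE))


def count_word(chunks, word):
    """Reduce step: sum the per-chunk partial counts."""
    partials = [map_count(c, word) for c in chunks]  # one chunk per node, in a real cluster
    return sum(partials)


essays = ["Onomatopoeia is fun. onomatopoeia!", "No match here",
          "ONOMATOPOEIA and onomatopoeias"]
print(count_word(essays, "onomatopoeia"))  # 3
```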

Processor App - This is just a generic discrete application that processes something, like an API request for example.

Web App - This is an application, in my case, written in a modern MVC framework that has the job of interacting with web users and getting things done in the back-ground in various ways with various services.

Base Installation - I think of this as just shared code.  One of the developers I have been working with recently suggested that we extract a number of commonly used components from various application pieces on the Y-Axis and build a library of sorts.  Great suggestion in this case, so I stuck it on my general diagram to remind me in the future.

What’s interesting about all these conceptual applications is that if you create them correctly and with the correct architectural models, each item that lives on the Y-Axis will also be able to scale on the X and Z axes.  For example, you could have your web application running 5 X-Axis copies in 4 Z-Axis partitions; say external cloud, by client, purpose, and by geography per client.  So, you’d end up using four AWS Availability Zones in 2 AWS locations running 96 application nodes in total.  Of course, your application has to be built correctly to take advantage of all of this distribution at every level.  But, that’s a topic for a later date I suppose.

In summary, this post was just to share some of my thinking around the use of a very nice visualization tool by the fine folks at AKF Partners.  So, a shout out to them for the nice tool, and hopefully this helps people understand a bit how it can be used and thought of in a variety of ways.
Just remember, it's not one-size-fits-all.  Your use, labels, and needs for such things will vary greatly depending on what you are trying to architect, develop, and deploy.

On Clouds and SPOF’s (or the Great AWS Outage of April 2011)


Just a couple of days after I posted about cloud native applications, Amazon raised the bar by having some issues in one of their data center regions.  These issues primarily affected EBS and RDS from what I’ve read.  So, pretty much everything was affected one way or another, since using AWS EC2 without EBS in any form is a little wacky for most applications that exist today.  This is because, without EBS, your instance’s local storage won’t survive the instance being stopped or terminated.  Most folks have not yet reached the operational nirvana of fully automated configuration management and application fault tolerance that makes this acceptable for them.

What level of SPOF (Single Point of Failure) are you willing to tolerate?  To get at that, I wanted to “scale up” the idea of the SPOF then bring it back down again.  Here we go.

If the earth stops working, so will your web application (admittedly there might be some satellite networks that don’t have this problem... but who cares at that point?)

So, let’s keep going.  Each of these is a potential single point of failure.

Earth > Continent > Country > State/Region > City > Neighborhood > Building > Floor > Room > Rack > Server > Server Component

And, at each tier, there are numerous dependencies and contexts that keep your service running at any given time.  There are the obvious ones, like the example above: if the Earth explodes, the neighborhood is pretty much shot to hell also.  But, that’s obvious.  It gets less obvious when you dig deeper into the data center and see that there are 5 servers, so that’s okay, right?  Maybe. Maybe not. What if it is something like this:

Domain Name System (DNS) > Load Balancer > Web Server > Application Server > Database Server

Then, if those 5 servers/services sit in one rack in one room of one data center in one building in one neighborhood in one city in one state in one country on one continent on one planet, the whole stack is looking pretty vulnerable.  Each of those servers is itself a single point of failure, because the chain breaks if any one link goes down.  In the grand scheme of things, the loss of one power supply in one machine could impact the entire planet’s capacity to retrieve whatever is on that DB that is so globally important: like a picture of your kid making a funny face on his 2nd birthday.
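To put rough numbers on that vulnerability, the usual reliability rule of thumb applies: in a serial chain the stack is up only when every link is up, so availabilities multiply, while a tier replicated n times is down only when all n copies are down. A sketch assuming independent failures and a hypothetical 99.9% availability per tier:

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Availability of a chain where every component must be up."""
    return prod(availabilities)


def parallel(availability: float, copies: int) -> float:
    """Availability of one tier replicated `copies` times, assuming any single
    surviving copy keeps the tier up and copies fail independently."""
    return 1 - (1 - availability) ** copies


tier = 0.999  # hypothetical per-tier availability
# DNS > load balancer > web > app > DB, one instance of each.
single = serial(*[tier] * 5)
# The same five tiers, each duplicated.
doubled = serial(*[parallel(tier, 2) for _ in range(5)])
print(f"{single:.5f}")   # 0.99501
print(f"{doubled:.7f}")  # 0.9999950
```

The doubled figure only holds if the copies really fail independently; two copies in the same rack share that rack's power and network, which is exactly the trap in the chain above.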

Do you think it is Amazon AWS’s fault if you put that database on one server in one rack in one place with no reasonable SLA and it goes away forever?  Not so much.  You are accountable and responsible.  You made that choice.

Now, how can we change this for the better?  We can develop applications that tolerate the failure of any single component at a sufficient granularity (Earth is a bit extreme today) such that they keep running when bad things like the AWS outage occur.  I call these Cloud Native Applications.  They have certain traits that should look a little familiar to cloud folks.
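Choosing a granularity turns into a check you can run against your own layout. The sketch below uses a hypothetical list-of-dicts placement format and illustrative AWS-style zone names; it asks whether losing any single domain at a given level (zone, region, rack) still leaves enough replicas running:

```python
from collections import Counter


def survives_single_loss(placements: list[dict[str, str]], level: str,
                         needed: int = 1) -> bool:
    """True if losing any one failure domain at `level` still leaves at least
    `needed` replicas running.

    Each placement maps a level name (e.g. "region", "zone") to the domain
    that replica lives in.
    """
    per_domain = Counter(p[level] for p in placements)
    # Worst case: the domain holding the most replicas is the one that fails.
    worst_loss = max(per_domain.values(), default=0)
    return len(placements) - worst_loss >= needed


# Hypothetical layout: three replicas across two zones of a single region.
replicas = [
    {"region": "us-east-1", "zone": "us-east-1a"},
    {"region": "us-east-1", "zone": "us-east-1a"},
    {"region": "us-east-1", "zone": "us-east-1b"},
]
print(survives_single_loss(replicas, "zone"))    # True
print(survives_single_loss(replicas, "region"))  # False
```

This layout survives losing a zone but not a region-wide problem, which is the kind of gap a regional event exposes.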

You cannot create a cloud native application by doing things the same way you always have.  It simply will not work.  The necessary software architecture and systems architecture have changed if you want your application to run on the cloud with no SPOFs.

Just needed to get that off my chest.  Some related links for good reading:

http://blog.basho.com/2011/04/21/Amazons-outage-proves-riaks-vision/

http://www.thestoragearchitect.com/2011/04/22/so-your-aws-based-application-is-down-dont-blame-amazon/

http://highscalability.com/blog/2011/4/22/stuff-the-internet-says-on-scalability-for-april-22-2011.html

http://www.infoq.com/news/2011/04/amazon-ec2-outage

And if you're REALLY keen to write some CNAs (contact me), read...

http://www.infoq.com/presentations/Actor-Thinking
http://www.infoq.com/presentations/1000-Year-old-Design-Patterns

 

Data Driven Diet (c) 2012