
Monitoring – Updated and Revisited

I’ve written several times on this blog about various monitoring tools and services. But in light of a project I’ve been working on for the last few months, I have some updates.

The Overall Monitoring Architecture

There are three areas of monitoring I like to pay close attention to with a live web application.

  1. Process Monitoring - This makes sure things are running and stay running within certain tolerances. Examples are God, Monit, and SMF. Your choice will depend on your operating system and your preferred scripting language.
  2. Resource Monitoring - This is fine-grained monitoring of CPU, memory, disk space, disk I/O, networking, application server threads, and much more. Examples are Nagios, Ganglia, and Munin; the right choice depends on your specific situation. There is also a worthy newcomer on the block called Reconnoiter that looks very promising.
  3. Uptime Monitoring - This is often the only monitoring people do, if they do any at all. It should come from a disinterested third party, both for accountability and as what I call a third-party “eye in the sky” should any dispute about uptime arise. I like Pingdom, and there are free services as well. I’ve also been using CloudKick for this purpose in some situations.
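At its core, the uptime check in item 3 is simple: request a URL, time the response, and decide whether the result counts as "up." Below is a minimal sketch of that decision, assuming a plain HTTP GET is a good enough probe. The names `check_uptime` and `fetch_status` and the 5-second latency threshold are my own hypothetical choices, not how Pingdom or CloudKick work internally. The point of a hosted service is that it runs from outside your network, which a script on your own servers can't give you.

```python
import time
import urllib.request
from typing import Callable, NamedTuple, Optional


class CheckResult(NamedTuple):
    """Outcome of a single uptime probe."""
    up: bool
    status: Optional[int]
    latency_s: float
    error: Optional[str]


def fetch_status(url: str, timeout: float) -> int:
    """Default fetcher: HTTP GET via the standard library, returning the status.

    urlopen follows redirects and raises HTTPError for 4xx/5xx responses,
    so those failures surface as exceptions rather than status codes here.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def check_uptime(url: str,
                 fetch: Callable[[str, float], int] = fetch_status,
                 timeout: float = 10.0,
                 max_latency_s: float = 5.0) -> CheckResult:
    """Probe `url` once; it counts as up only on a 2xx/3xx within max_latency_s."""
    start = time.monotonic()
    try:
        status = fetch(url, timeout)
    except Exception as exc:  # DNS failures, refused connections, timeouts, HTTP errors
        return CheckResult(False, None, time.monotonic() - start, str(exc))
    latency = time.monotonic() - start
    up = 200 <= status < 400 and latency <= max_latency_s
    return CheckResult(up, status, latency, None)
```

The fetcher is injectable so the up/down logic can be tested without touching the network; in real use you would call `check_uptime("https://your-site/")` on a schedule and alert when `up` is false.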

Those three are from a post I wrote some time ago. Today I’m adding a fourth item to the list, because it has finally become easy and affordable enough to do:

  4. Synthetic Transaction Monitoring - These monitors actually walk through the steps a user might take in your application and report any anomalies, along with an error report, a screenshot, and other data as appropriate. I’ve been using BrowserMob together with Selenium IDE for this: you create scripts with Selenium, upload them to BrowserMob, and then set up a monitor that runs them. That’s a simplified overview, of course, but it’s quite effective. Synthetic transaction monitoring used to be prohibitively expensive; this approach is relatively affordable by comparison.
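The shape of a synthetic transaction is a scripted user journey: run each step in order, stop at the first one that fails, and report which step broke and why. Here is a minimal plain-Python sketch of that idea. It is not Selenium or BrowserMob code; `run_transaction`, `TransactionReport`, and the `login`/`add_to_cart`/`checkout` steps are hypothetical stand-ins for steps that would really drive a browser.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# A step takes shared context (cookies, IDs, etc.) and raises if what the
# user would see is wrong, e.g. a page fails to load or text is missing.
Step = Callable[[dict], None]


@dataclass
class TransactionReport:
    passed: bool = True
    failed_step: Optional[str] = None
    error: Optional[str] = None
    timings: List[Tuple[str, float]] = field(default_factory=list)


def run_transaction(steps: List[Tuple[str, Step]]) -> TransactionReport:
    """Run a scripted user journey in order and stop at the first failing step."""
    context: dict = {}
    report = TransactionReport()
    for name, step in steps:
        start = time.monotonic()
        try:
            step(context)
        except Exception as exc:
            report.passed = False
            report.failed_step = name
            report.error = f"{type(exc).__name__}: {exc}"
            return report
        finally:
            # Record timing for every step that ran, including the failing one.
            report.timings.append((name, time.monotonic() - start))
    return report


# Hypothetical journey: in real use each step would drive a browser or HTTP client.
def login(ctx: dict) -> None:
    ctx["session"] = "abc123"


def add_to_cart(ctx: dict) -> None:
    if "session" not in ctx:
        raise RuntimeError("not logged in")


def checkout(ctx: dict) -> None:
    raise AssertionError("order confirmation text not found")


report = run_transaction([("login", login), ("add_to_cart", add_to_cart), ("checkout", checkout)])
print(report.passed, report.failed_step, report.error)
# False checkout AssertionError: order confirmation text not found
```

Stopping at the first failure matters because later steps usually depend on earlier ones (you can't check out without a session), so continuing would only produce noise in the report.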

The Monitoring Tools I use

Here are some of my current favorites for meeting the goals above.

Munin > http://munin.projects.linpro.no/

Enabling Munin across all of your systems is one of those tasks that sits on the to-do list for a while before it gets done. I use it a lot, and successfully. You can see a demo here:
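Part of what makes Munin easy to roll out is that its plugins are plain executables speaking a simple text protocol: run with the `config` argument, a plugin prints graph metadata; run with no arguments, it prints `field.value N` lines. Below is a minimal sketch of a custom plugin in Python, assuming a Unix host (for `os.getloadavg`). Munin already ships a load plugin, so treat this only as a template for your own metrics; the function name `munin_output` and the warning/critical thresholds are hypothetical placeholders.

```python
import os
import sys
from typing import Callable, List


def read_load() -> float:
    """1-minute load average from the OS (Unix only)."""
    return os.getloadavg()[0]


def munin_output(args: List[str], read: Callable[[], float] = read_load) -> str:
    """Return what the plugin should print for the given command-line args.

    `config` -> graph metadata; no argument -> the current value.
    """
    if args and args[0] == "config":
        return "\n".join([
            "graph_title Load average (Python sketch)",
            "graph_vlabel load",
            "graph_category system",
            "load.label load",
            "load.warning 5",    # placeholder threshold
            "load.critical 10",  # placeholder threshold
        ])
    return f"load.value {read():.2f}"


if __name__ == "__main__":
    print(munin_output(sys.argv[1:]))
```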

Monit > http://mmonit.com/monit/

Pingdom > http://www.pingdom.com

Things I’m testing and have high hopes for: Reconnoiter, BrowserMob monitoring, and CloudKick.

D-I-D Approach to Scalability - Article at AKF Partners

One of the blogs I frequent is AKF Partners. They write some top-quality content there. A recent post introduced what they call the D-I-D approach to scalability, which stands for Design, Implement, and Deploy. It advocates planning for scalability. *GASP* Say what?! Plan for scalability? I kid, I kid... This is something that's all too rare, and I'm usually told that planning for scalability is a waste of time. I 100% disagree with that attitude, and I like the approach AKF is espousing. I posted some comments to their blog entry and will just duplicate them here. Nothing like quoting yourself, right?

My comments:

Excellent write-up. Thank you. This is almost exactly what I advocate day after day regarding planning for scalability. I like that you’ve put a bit of a framework around it.

Personally, I treat scalability concerns the same as any other “feature” of a project in an Agile development context. It’s brought up and discussed briefly in scrums, possibly handled in a follow-up meeting, and then either put on the backlog to be prioritized with everything else or worked on next if necessary.

It’s often very, very difficult to get some teams to think proactively about scalability AND agree to table it for later. I think this is because once these discussions start, people realize that something needs to be “fixed” and can’t really help themselves.

Of course, these days, with cloud computing gaining so much traction and resources available on demand at a moment’s notice, it’s getting easier. But if you don’t architect for scalability at the outset, you may find when you get to the implementation and deployment phases that it’s impossible to do what needs to be done. But that’s a whole other topic, I suppose.

Cheers!

Drupal: Performance and Scalability

Since I work on Drupal sites from time to time and have built some very large ones, I keep an eye on the Drupal "stack" and how it changes. But I haven't revisited it in a while.

A few months ago I posted an article about a Drupal stack that I had tested with a big media company to handle 2.5 billion page views a month. It used a lot of servers and some sophisticated modifications to Drupal.
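To put 2.5 billion page views a month in perspective, a quick back-of-the-envelope calculation helps. This sketch assumes a 30-day month and perfectly even traffic, which no real site has; the 3x peak multiplier is purely an illustrative assumption, not a figure from that deployment. Keep in mind that each page view also triggers several HTTP requests for assets, so the request rate the stack sees is higher still.

```python
SECONDS_PER_DAY = 86_400


def average_rate(page_views: float, days: int = 30) -> float:
    """Average page views per second if traffic were spread evenly over `days`."""
    return page_views / (days * SECONDS_PER_DAY)


avg = average_rate(2.5e9)  # 2.5 billion page views in a 30-day month
peak_factor = 3  # hypothetical: peak hour at 3x the average rate
print(f"average: {avg:.1f}/s, assumed peak: {avg * peak_factor:.0f}/s")
# average: 964.5/s, assumed peak: 2894/s
```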

It's still a solid architecture, but there are some new entries I'd like to evaluate. I also deployed that one on Joyent, so I was curious what I might be able to do with other cloud vendors. This time I picked Rackspace's Cloud Servers, since they were kind enough to comp nScaled, Inc. some free time to test things out and demo to clients. So I've been beating up on them pretty good. I'm impressed, and considering what I did to some of their servers yesterday, I'm surprised I didn't get a cease and desist!

Here is the Drupal stack I built and configured.

Varnish: Extreme Front End Web Caching and Much More

I've written several times in the past about Varnish. But over the last six months it wasn't the right fit for the client work I was doing, so it drifted out of my world a bit. Today, while doing some research, I ran across an old bookmark and thought I should check in on the project. There was tons of great news!

Here are the top items that grabbed my attention: