
The Five Pillars of Monitoring – Pillar One

This post is the first in a series of five that will be released over the next few weeks. Today’s post introduces the series and covers the first pillar of monitoring. When a website is launched, one component of the system is deeply important: monitoring. It is often swept under the rug, left for last, or poorly done. This is a huge mistake that, in my opinion, is nearly inexcusable. A lack of monitoring of key items in key contexts will lead to poor performance, outages, and bad business decisions. Monitoring is simply a requirement for any production web application that you actually care about.

Monitoring – Updated and Revisited

I’ve written several times on this blog about various monitoring tools and services. In light of a project I’ve been working on for the last few months, I have some updates.

The Overall Monitoring Architecture

There are three areas of monitoring that I like to pay close attention to with a live web application.

  1. Process Monitoring - This makes sure things are running and stay running within certain tolerances. Examples are God, Monit, and SMF. Your choice will depend on your operating system and your preferred scripting language.
  2. Resource Monitoring - This is fine-grained monitoring of CPU, memory, disk space, disk I/O, networking, application server threads, and much more. Examples are Nagios, Ganglia, and Munin. The right choice depends on your specific situation. There is a worthy newcomer on the block called Reconnoiter that also looks very promising.
  3. Uptime Monitoring - This is usually the only monitoring people do, if they do any at all. It should be run by a disinterested third party to provide accountability, what I call a third-party eye in the sky, should any dispute about uptime arise. I like Pingdom, and there are free services as well. I’ve also been using CloudKick for this purpose in some situations.
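
To make the three kinds of checks concrete, here is a minimal Python sketch of one check per category. It is only an illustration under stated assumptions, not how God, Monit, Nagios, Munin, or Pingdom work internally. The names `CheckResult`, `check_process`, `check_disk`, and `check_http`, and the 90% disk threshold, are all made up for this example. The process check relies on POSIX signal semantics.

```python
import os
import shutil
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one check (hypothetical structure for this sketch)."""
    name: str
    ok: bool
    detail: str


def check_process(pid: int) -> CheckResult:
    """Process monitoring: is the process with this PID still running? (POSIX only)"""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only tests that the PID exists
    except ProcessLookupError:
        return CheckResult("process", False, f"pid {pid} is not running")
    except PermissionError:
        # The process exists but is owned by another user.
        return CheckResult("process", True, f"pid {pid} is running (other user)")
    return CheckResult("process", True, f"pid {pid} is running")


def check_disk(path: str = ".", max_used_fraction: float = 0.90) -> CheckResult:
    """Resource monitoring: is disk usage within the given tolerance?"""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return CheckResult("disk", used_fraction <= max_used_fraction,
                       f"{used_fraction:.0%} of {path} used")


def check_http(url: str, timeout: float = 5.0,
               opener=urllib.request.urlopen) -> CheckResult:
    """Uptime monitoring: does the URL answer with a 2xx or 3xx status?"""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except OSError as exc:  # URLError, HTTPError, and socket timeouts are OSErrors
        return CheckResult("http", False, f"{url} failed: {exc}")
    return CheckResult("http", 200 <= status < 400, f"{url} returned {status}")
```

Something like cron could run these on a schedule and alert on any result whose `ok` is false. For uptime in particular, the check should run from a machine outside your own infrastructure, which is the job a third-party service does for you.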

Those three items above are from a post I wrote some time ago. Today, I’m adding a 4th item to that list because it has finally become easy enough and reasonably affordable to do:

  4. Synthetic Transaction Monitors – These perform tests of processes a user might go through in your application and report back any anomalies, along with an error report, screenshot, and other data as appropriate. I’ve been using BrowserMob and Selenium IDE for this. You create scripts with Selenium, upload them to BrowserMob, and then set up a monitor script. That’s a simplified overview, of course, but it’s quite effective and relatively affordable; historically, synthetic transaction monitoring was prohibitively expensive.
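
The idea behind a synthetic transaction can be sketched in plain Python: run the steps a user would take, in order, time each one, and stop and report at the first failure. This is not the BrowserMob or Selenium API. `run_transaction` and the three step functions are hypothetical stand-ins, and a real monitor would drive a browser rather than a dictionary.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# A step is a label plus a function that acts on shared state and raises on failure.
Step = Tuple[str, Callable[[dict], None]]


def run_transaction(steps: List[Step], context: Optional[dict] = None) -> List[Dict]:
    """Run each step in order, timing it; stop at the first step that raises."""
    context = {} if context is None else context
    report = []
    for name, action in steps:
        started = time.monotonic()
        error = None
        try:
            action(context)
        except Exception as exc:
            error = repr(exc)
        report.append({"step": name, "ok": error is None,
                       "seconds": time.monotonic() - started, "error": error})
        if error is not None:
            break  # later steps depend on this one, so running them adds only noise
    return report


# Hypothetical steps for a pretend shop; real ones would click through the site.
def log_in(ctx: dict) -> None:
    ctx["session"] = "abc123"


def add_to_cart(ctx: dict) -> None:
    if "session" not in ctx:
        raise RuntimeError("not logged in")
    ctx["cart"] = ["widget"]


def check_out(ctx: dict) -> None:
    if not ctx.get("cart"):
        raise RuntimeError("cart is empty")


if __name__ == "__main__":
    steps = [("log in", log_in), ("add to cart", add_to_cart), ("check out", check_out)]
    for entry in run_transaction(steps):
        print(entry)
```

The per-step timing matters as much as the pass or fail result, since a checkout that succeeds but takes far longer than usual is also an anomaly worth reporting.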

The Monitoring Tools I Use

What follows are some of my personal current favorites to meet the above goals.

Munin > http://munin.projects.linpro.no/

Enabling Munin across your systems has been on my list for a while. I use it a lot, successfully. You can see a demo here:

Monit > http://mmonit.com/monit/

Pingdom > http://www.pingdom.com

 

Things I’m testing and have high hopes for are Reconnoiter, BrowserMob monitoring, and CloudKick.