IT Management and Cloud Blog

The Relativity of Outages

By John | February 20, 2008

What is the impact of a three-hour outage?

Of course the answer is, “It depends,” right? In the aftermath of last week’s Amazon S3 outage, there has been a lot of discussion about what the outage meant. Many have argued that it is cloud-specific. In one particular blog article, the author stated that 3 hours of outage per year would be paradise and that 3 hours per quarter would be just alarming; any more than that would be troublesome. For whom?

Three hours per quarter would not be acceptable for even this simple “ESM” blog. I would move to another vendor. On a good day, those three hours might cost me 10 possible new readers. What if my three hours of downtime happened to fall on the first day whurley, Coté, or Mark Hinkle decided to check out my site?

One of my first long-term consulting gigs was with the Federal Reserve. On my first day on the job, I was informed that under no circumstances could I bring down a production system. System outages at the Federal Reserve make the front page of the Wall Street Journal, I was politely told. A little over 10 years ago, I did a short stretch with GE-Capital, and our CTO was a disciple of Six Sigma from GE Aircraft. She always started a meeting with her classic “Five Nines (99.999%) equates to 25 plane crashes a year” speech. I am sure that she is explaining to someone right now how last Friday’s Amazon S3 outage was the equivalent of 34 plane crashes.
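To put those numbers on the same scale, here is a rough back-of-the-envelope sketch. It assumes a 365-day year and a service that is expected to be up around the clock, and the function names are just mine for illustration. It converts hours of downtime into an availability percentage, and an availability target back into the downtime it allows.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def availability_pct(downtime_hours_per_year):
    """Availability as a percentage, given total downtime hours in a year."""
    return 100.0 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted by an availability target."""
    return (1 - availability_percent / 100.0) * HOURS_PER_YEAR * 60


# Three hours once a year versus three hours every quarter (12 hours a year).
three_per_year = availability_pct(3)
three_per_quarter = availability_pct(3 * 4)

# The yearly downtime budget implied by "five nines".
five_nines_minutes = allowed_downtime_minutes(99.999)

print(f"3 hours per year:    {three_per_year:.3f}% available")
print(f"3 hours per quarter: {three_per_quarter:.3f}% available")
print(f"Five nines allows about {five_nines_minutes:.2f} minutes of downtime per year")
```

Under these assumptions, a single three-hour outage already uses up far more than a full year of a five-nines budget, which allows only a little over five minutes.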

I don’t know, but what does three hours of outage mean to you?

Topics: amazon, cloud computing

5 Responses to “The Relativity of Outages”

  1. Frank Kleinburg Says:
    February 21st, 2008 at 11:41 am

    There’s an additional point you may want to consider: one of confidence. Using your example, if I was a new visitor to your site and got a server unavailable error on my first visit, some of my first thoughts would be “This guy doesn’t know what he’s doing”, and I might never come back.

    Yes, there’s the potential for lost revenue from a returning customer not being able to place an order, but there is the loss of future business to consider as well. A three-hour outage is a big deal. It shows a basic lack of planning and testing. flk k

  2. John Says:
    February 21st, 2008 at 11:54 am

    I totally agree. As more things become commodities, the more important it will be to stand out in a crowd, and “anything” unavailable will not be tolerated.


  3. Berkay Says:
    February 21st, 2008 at 2:00 pm

    Hi John,
    Contrarian view :) A couple of hours of downtime is acceptable for many, many applications. It depends on when it happens, it depends on what the application is, and it depends on the cost of the downtime.
    Planes need more than 5 nines, because when a plane crashes, people die. If I couldn’t read your blog, well, it would hurt but I’d survive :)
    Just to note, your site was down when I first visited :) I think you were rebuilding your blog at the time. I came back later when I saw another post or link from someone else.
    For many companies, five nines is a dream. Not because they would not want to have it, but because the cost of the investment in hw/sw and expertise cannot be justified. It is expensive to get a lot of nines.
    There lies the attraction to clouds. If they can prove that they can provide that kind of reliability with low costs, it becomes quite an attractive option for many businesses IMHO.

  4. John Says:
    February 21st, 2008 at 2:49 pm

    I asked this question on the Tivoli message board this morning and didn’t get a strong response. I was hoping to hear more of the “Brick-n-Mortar” type responses. A total three-hour outage at one of the investment banking divisions of BofA or JPChase would minimally get someone fired if not killed.

    You are spot on about the clouds. They allow new companies to get higher availability at a much lower cost.


  5. February 2008 - Review Post | IT Management and Cloud Blog Says:
    July 15th, 2008 at 3:18 pm

    [...] The Relativity of Outages [...]