Look Mom, Two Nines – Amazon S3 Major Outage Today
By John | February 15, 2008
I guess it’s 99% today for Amazon S3 customers. Man, do I have timing or what? We literally moved this blog site server off of AWS this week in preparation for our move over to Mosso this weekend. I have seen reports of AWS being impacted for at least two hours, and some reports were longer. Some popular internet sites like Twitter, AdaptiveBlue, and 37Signals were impacted today.
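For a rough sense of what "two nines" means here, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name `availability` is made up for this example, the measurement period is assumed to be the calendar month of February 2008 (29 days), and the four-hour figure is a hypothetical stand-in for the "longer" reports, not a measured value.

```python
def availability(outage_hours: float, period_hours: float) -> float:
    """Return availability as a percentage over one measurement period."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    if not 0 <= outage_hours <= period_hours:
        raise ValueError("outage_hours must be between 0 and period_hours")
    return 100.0 * (1.0 - outage_hours / period_hours)


# Assumption: measure over February 2008, a leap-year month of 29 days.
FEB_2008_HOURS = 29 * 24

# 2 hours matches the "at least two hours" reports; 4 hours is hypothetical.
for outage in (2.0, 4.0):
    pct = availability(outage, FEB_2008_HOURS)
    print(f"{outage:.0f} h outage -> {pct:.2f}% availability for the month")
```

Under these assumptions, even a two-hour outage drops the month below 99.9%, which is why a single bad morning is enough to land in two-nines territory.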
The good news is that S3 is really just an API for storing data; it’s not the virtual server that AWS provides (i.e., EC2). Most AWS customers use S3’s storage APIs only for warm backups and/or snapshots. A few savvy vendors who rely on S3 for near-real-time access had already built redundancy plans for this scenario. I have been reading through a lot of the blogs about this morning’s outage, and here are some take-aways:
- Amazon S3 appears to be not as cloudy as we would have hoped. The design does have a single point of failure, so it will be interesting to see how they explain the outage.
- Remember, S3 is still in beta. There is a trade-off between S3’s easy, low-cost entry ramp for getting started and the risk of relying on its service. Also, the potential elasticity of EC2/S3 is very alluring, but startups have to weigh the risks as well.
- One of the key take-aways from today should not be that clouds are too risky; it should be about the maturity of the cloud provider. Other cloud providers out there have a lot more experience in providing computer hosting services to customers. In the end, cloud or no cloud, this is still about hosting computing services. When I first put my blog on EC2, in all honesty, I did it to be cool. I got lucky, however, and had 100% availability for over 3 months, although I was still concerned about periodic latency spikes and the fact that AWS has neither a mature service delivery model nor a support infrastructure. My blog is far from mission critical; however, my assessment was that AWS is too risky even for a small enterprise like mine. I was amused to find that one of the companies using AWS was complaining that the only support it had was the AWS developers forum, and even that went read-only at some point.
- Amazon provides an SLA for S3, but, since it is based on an API service, the customer has to keep very good log entries on a per-transaction basis. My guess is that a lot of these new startups are so busy just getting their bugs and new features mature that they probably didn’t have a lot of time for that old-fashioned RAS stuff. Therefore, I bet there probably aren’t going to be a lot of refunds next week.
- I will be interested to see a postmortem on any of the newer EnterpriseDB or MySQL implementations. From what I have seen in my research, most of the vendors who are implementing relational DB support on EC2 are using S3 as warm or near-time backups, requiring a perfect storm for something to go wrong. For example, if their EC2 image dropped during the S3 outage and they didn’t have a third backup plan, they would have been in a world of hurt.
- Customers who are looking to use utility or cloud implementations need to do more homework during their selection process. They need to fully understand who holds what responsibilities and when. If you are planning on using EC2 and S3, you need to understand what you are going to have to develop to provide outstanding services to your customers. Alternatively, other providers like RightScale provide a robust and redundant type of implementation on top of AWS, and they might be a better choice. As I have said before, AWS EC2 and S3 are game-changing technologies, and I consider them to be wonderful tools, but buyer beware.
- No one is immune to outages.
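To make the point about per-transaction logs concrete, here is a minimal Python sketch of the kind of record-keeping a customer might do around each storage call. Everything in it is an assumption for illustration: `RequestRecord`, `TransactionLog`, and the simulated failing upload are hypothetical names, no real AWS API is called, and the simple failed-over-total ratio is not Amazon's actual SLA formula.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, List


@dataclass
class RequestRecord:
    """One logged storage request (hypothetical record layout)."""
    timestamp: datetime
    operation: str
    key: str
    ok: bool
    detail: str = ""


class TransactionLog:
    """Wraps storage calls and records the outcome of every request."""

    def __init__(self) -> None:
        self.records: List[RequestRecord] = []

    def call(self, operation: str, key: str, fn: Callable[[], Any]) -> Any:
        """Run fn(), log success or failure, and re-raise any exception."""
        now = datetime.now(timezone.utc)
        try:
            result = fn()
        except Exception as exc:
            self.records.append(
                RequestRecord(now, operation, key, False, repr(exc)))
            raise
        self.records.append(RequestRecord(now, operation, key, True))
        return result

    def error_rate(self) -> float:
        """Fraction of logged requests that failed (0.0 if none logged)."""
        if not self.records:
            return 0.0
        failed = sum(1 for r in self.records if not r.ok)
        return failed / len(self.records)


def failing_put() -> None:
    """Stand-in for a storage call that fails during an outage."""
    raise ConnectionError("simulated storage outage")


log = TransactionLog()
log.call("GET", "backups/snapshot-1", lambda: b"data")
try:
    log.call("PUT", "backups/snapshot-2", failing_put)
except ConnectionError:
    pass  # the failure is already recorded in the log

print(f"logged {len(log.records)} requests, error rate {log.error_rate():.0%}")
```

A startup that wrapped its storage calls this way from day one would at least have timestamps and failure counts to point to when asking for a credit; one that did not has little beyond anecdotes.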
For more on this, see:
Topics: amazon, cloud computing, ec2, mosso, rightscale, s3


February 17th, 2008 at 3:09 am
Yeah, I’ve heard about that problem. Some of my friends are using Basecamp and they had a hard day too. Oh, well, I was lucky, because I switched to Wrike.com. Anyway, it’s understandable that online apps outsource some parts of the service to a third party. But they definitely should take responsibility, even if the fault is not theirs.