Amazon Web Services has been going through a much publicised outage, which has lasted by all appearances more than 12 hours. A range of services including Hootsuite, Reddit, Heroku, Foursquare, Quora and others have all faced major disruptions.
What is interesting is how they have positioned these outages: many have said EC2 is great, but they are having a bit of a problem at the moment. It appears these providers are taking the view that “whew, glad we outsourced our stuff so that it is clear that is not OUR fault that something like this has happened, and we can point to other vendors to prove the case that it wasn’t us – just imagine if we had have done this on our own servers and this happened, we would have been much more at fault!”
Just because systems are moved to the Cloud doesn’t mitigate the responsibility to ensure mission critical outages are mitigated. If a business has a use-case that cannot tolerate down time then that business needs to architect their solution in a way that prevents downtime. Cost tradeoffs are always an issue, but if something goes wrong, and the cost of that problem is too high, then perhaps the service isn’t really feasible.
Find our more information about legal issues in the Cloud.
Imagine an airline providing a service where they cut costs on safety in order to offer a cheap service… Doesn’t bear thinking about. Imagine that the airline outsourced their safety inspections to a third party and then wiped their hands of responsibility in the event of a “downtime”. No-one would buy that.
The whole point about the Cloud is that it enables you to free your thinking about one provider. Even if you stick with an Amazon only solution, or a Microsoft or Google or Salesforce or Rackspace or whatever solution, you still need to architect things in a way that allows you to accept the consequences of any flaws, no matter how they are caused.
After all, you are the service provider to your customer base – how you decide to deliver that is up to you.
A lot of people are learning a very hard lesson at the moment – there are good ways and bad ways of doing things. For some, a 12 hour outage is hardly a problem, but for others it can ruin lives.
Alan Perkins is CIO of Sydney-based software developer Altium. He is an award-winning Cloud computing pioneer recently recognised as a top 10 finalist for IDC Asia Pacific's CIO of the Year. This post appears on his blog Cloud81 and the views and opinions are his own and do not reflect those of his employer or any other corporation or individual.
Join the CIO Australia group on LinkedIn. The group is open to CIOs, IT Directors, COOs, CTOs and senior IT managers.