CIO

Amazon Cloud outage bad for business

Concerns about reliability raised following EC2 outage

Outages affecting Amazon’s Elastic Compute Cloud (EC2) offerings over the weekend have received plenty of global coverage, but they left at least one Australian business frustrated.

Simon Ellis, chief technology officer of Australian-Canadian startup LabSlice, told Computerworld Australia that the outage in Amazon’s eastern US region on Friday came at a bad time for the company.

The company, which uses EC2 for Web hosting, offers services to companies that want to create sales and training demos, evaluations, and training environments in the Cloud.

Ellis was attempting to give a demonstration of one of the company's products to clients at the time.

"It's the second time there has been an outage," Ellis said. "It doesn't reflect too well on our business or Amazon considering we are a Cloud service provider and we are promoting Amazon as a high-level product."

The company uses the Cloud service to host and download training images.

"As I was giving a demonstration, the images which are normally available in three minutes, I was left waiting for 15 minutes with nothing happening. It takes at least an hour until the fault status shows up on Amazon's website. We were in a ‘no man's land’ for a while."

Ellis said that while his company had mitigated the possible risks associated with services from third parties like Amazon, he would not be "too impressed" if an outage happened again.

"Everything is running now but this is the second time that I have been demoing the software and there was an outage. As long as it's not a multi-day outage, it is something we can brush off for the meantime."

Another Melbourne company, Cyclopic Energy, which also uses EC2, was luckier.

Technical director Rick Morgans said the engineering firm was an "atypical Amazon user", using the service for cluster computing rather than critical data storage.

"We'll fire up 16 cluster compute instances and have them running for 20 to 40 hours and then shut down," he said. "I'm not sure if the cluster compute instances were affected as we didn't have any running."

Popular social networking services including Reddit, Foursquare, Quora, Awe.sm and others were left without service for portions of the weekend as a result of the Amazon outage, which helped fuel commentary on the issue.

Since then, Amazon has gradually released information about the cause, which it attributed to a “networking event” that triggered a capacity shortage in some of the four availability zones that make up the eastern US region, operated out of Virginia. The issue ultimately led to increased latency and some data outages.

Amazon operates two US regions, in Virginia and California, as well as regions in Singapore, Ireland and, most recently, Japan. Each region is split into ‘availability zones’ that are used to provide multiple, redundant instances of client data and computing clusters.

Given the 99.5 per cent uptime promised in client service level agreements, the outage has called Amazon’s use of availability zones into question.

Though some have blamed Amazon for the outage, others have argued it was simply a matter of risk mitigation, one that CIOs using Cloud services had no excuse for failing to address.

Ellis warned that companies should have appropriate backup strategies in place, regardless of the provider or its claimed reliability.

Follow Hamish Barwick on Twitter: @HamishBarwick

Follow Computerworld Australia on Twitter: @ComputerworldAU