A fortnight after an electrical incident interrupted service, another outage hit an Amazon EC2 data center. This time, weather conditions caused the problem, which affected services such as Pinterest, Netflix, Instagram, and Heroku.
Thanks to the community of online service owners who have integrated our Radar, we were able to visualize the incident in real time. It lasted for hours. Here is a screenshot made by a member of our technical team:
One blogger on ZDNet France summarized this black day for the many online service managers using EC2, and stated that it is now time for technical managers to eliminate the risk of “SCOF” (Single Cloud of Failure), a variation of SPOF (Single Point of Failure).
While this type of incident is planned for and there are multiple systems designed to keep the data center operational, Murphy’s Law is omnipresent.
We’re actually working on a way to use Openmix to route traffic based on the electricity usage and cost in a data center. This variable is particularly useful when the main power grid supplying the infrastructure is cut and you would prefer to move traffic automatically to another data center. Of course, with Openmix, what to do in this situation is entirely up to your business and technical needs.
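To make the idea concrete, here is a minimal sketch in Python of what such a routing decision could look like. It is not the Openmix API: the `DataCenter` class, its fields (`on_grid_power`, `cost_per_kwh`, `available`), and the `choose_data_center` function are all hypothetical names invented for illustration. The rule shown (prefer sites still on grid power, then pick the cheapest electricity) is only one possible policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCenter:
    """Hypothetical description of a candidate data center."""
    name: str
    on_grid_power: bool    # False when the main power grid is cut
    cost_per_kwh: float    # assumed electricity cost, arbitrary currency
    available: bool = True # False when the site cannot serve traffic at all


def choose_data_center(candidates: List[DataCenter]) -> Optional[DataCenter]:
    """Pick a destination for traffic.

    Policy (an assumption, adjust to your own needs):
    1. Ignore data centers that are unavailable.
    2. Prefer data centers still supplied by the main power grid.
    3. Among the remaining candidates, pick the lowest electricity cost.
    """
    healthy = [dc for dc in candidates if dc.available]
    if not healthy:
        return None
    on_grid = [dc for dc in healthy if dc.on_grid_power]
    # Fall back to sites on backup power only if no site is on the grid.
    pool = on_grid or healthy
    return min(pool, key=lambda dc: dc.cost_per_kwh)


if __name__ == "__main__":
    sites = [
        DataCenter("us-east", on_grid_power=False, cost_per_kwh=0.05),
        DataCenter("us-west", on_grid_power=True, cost_per_kwh=0.09),
        DataCenter("eu-west", on_grid_power=True, cost_per_kwh=0.07),
    ]
    chosen = choose_data_center(sites)
    print(chosen.name if chosen else "no data center available")
```

In this sketch, a site that has lost grid power still receives traffic when it is the only one available, which reflects the point above: the right behavior depends on your business and technical needs.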
As Grandmother said: “Don’t put all your eggs in one basket.” Our services are designed around this famous saying: measure the performance of your Cloud providers and anticipate potential failures by deploying a multi-Cloud strategy.
As a bonus, here are two maps of disasters that might occur in the United States and around the world… If you plan to use infrastructure near (or in) one of these areas, this data can help you make a better choice.
A Map of the World That Reveals Natural Disaster Hot Zones (by io9)