
New Feature: Reason Code reporting

Cedexis’ global load balancing solution, Openmix, makes over 2.5 billion real-time delivery decisions every day. These routing decisions are based on a combination of the Radar community’s 14 billion daily real-user measurements and our customers’ defined business logic.

One thing we hear time and time again is, “It’s great that you are making all these decisions, but it would be very valuable to understand why you are switching pathways.” The “why” is hugely valuable in understanding the “what” (Decisions) and “when” (Time) of the Openmix decision-routing engine.

And so, we bring you: Reason Codes.

Reason Codes in Openmix applications log and identify the decisions being made, so you can easily establish why users were routed to particular providers or geographic locations. Reason Codes reflect factors such as Geo overrides, Best Round Trip Time, Data Problems, Preferred Provider Availability, or whatever other logic is built into your Openmix applications. Being able to see which Reason Codes (the “why”) drove which decisions allows you to see clearly where problems are arising in your delivery network, and to make adjustments where necessary.
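
To make the idea concrete, here is a rough sketch, purely illustrative and not the actual Openmix application API; the provider names, threshold and reason-code labels are hypothetical. The point is simply that the decision logic records a reason alongside every routing choice:

```python
# Illustrative sketch only -- not the actual Openmix application API.
# Provider names, thresholds and reason-code labels are hypothetical.

AVAILABILITY_THRESHOLD = 90.0  # percent

def choose_provider(candidates):
    """Pick a provider and record why it was chosen.

    `candidates` maps a provider alias to its latest measurements,
    e.g. {"cdn_a": {"avail": 99.9, "rtt": 82.0}}.
    Returns (alias, reason_code).
    """
    # Drop providers whose measured availability is below the threshold.
    healthy = {alias: m for alias, m in candidates.items()
               if m["avail"] >= AVAILABILITY_THRESHOLD}

    if not healthy:
        # Nothing is healthy: fall back to the least-bad option.
        alias = max(candidates, key=lambda a: candidates[a]["avail"])
        return alias, "All providers eliminated; best availability fallback"

    if len(healthy) < len(candidates):
        # At least one provider was ruled out on availability grounds.
        alias = min(healthy, key=lambda a: healthy[a]["rtt"])
        return alias, "Routed based on Availability data"

    # Everyone is healthy: route purely on performance.
    alias = min(healthy, key=lambda a: healthy[a]["rtt"])
    return alias, "Best Round Trip Time"

providers = {
    "cdn_a": {"avail": 99.9, "rtt": 82.0},
    "cdn_b": {"avail": 72.3, "rtt": 65.0},
}
print(choose_provider(providers))  # ('cdn_a', 'Routed based on Availability data')
```

The reason code travels with the decision, which is exactly what lets the Decision Report group and filter on the “why” later.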

Providing these types of insights is core to Cedexis’ DNA, so we are pleased to announce the general availability of Reason Codes as part of the Decision Report.  You can now view Reason Codes as both Primary and Secondary Dimensions, as well as through a specific filter for Reason Codes.

As a Cedexis Openmix user, you’ll want to get in on this right away. Being able to see what caused Openmix to route users from your preferred cloud or CDN provider to another one because of a certain event (perhaps a data outage in the UK) allows you to understand what transpired over a specific time period. No more second-guessing why decisions spiked in a certain country or network. Using Reason Codes, you can now easily see which applications are over- and under-performing, and why.

Here is an example of how you can start gaining insights.

You will notice in the first screenshot below that for a period of time, there was a spike in the number of decisions that Openmix made for two of the applications.

Now all you have to do is switch the primary dimension from Application to Reason Code, and you can quickly see that “Routed based on Availability data” was the main reason Openmix re-routed users.

Drilling down further, you can add Country as your Secondary Dimension and you can see that this was happening primarily in the United States.

All of a sudden, you’re in the know: there wasn’t just ‘something going on’ – there was a major Availability event in the US. Now it’s time to hunt down your rep from that provider and find out what happened, what the plan is to prevent it in the future, and how you can adjust your network to ensure continued excellent service for all your users.

Don’t Be Afraid of Microservices!

Architectural trends are to be expected in technology. From the original all-in-one-place COBOL behemoths half the world just learned existed because of Hidden Figures, to three-tier architecture, to hyper-tier architecture, to Service Oriented Architecture… really, it’s enough to give anyone a headache.

And now we’re in a time of what Gartner very snappily calls Mesh App and Service Architecture (or MASA). Whether everyone else is going for that particular nomenclature is less relevant than the reality that we’ve moved on from web services and SOA toward containerization, de-coupling, and the broadest possible use of microservices.

Microservices sound slightly disturbing, as though they’re very, very small components, of which one would need dozens if not hundreds to do anything. Chris Richardson of Eventuate, though, recently begged us not to assume that, just because of the name, these units are tiny. In fact, it makes more sense to think of them as ‘hyper-targeted’ or ‘self-contained’ services: their purpose should be to execute a discrete set of logic, which can exist in isolation, and simply provide easily-accessed public interfaces. So, for instance, one could imagine a microservice whose sole purpose was to find the best match from a video library for a given user: the requesting code would provide details on the user, and the service would return the recommendation. Enormous amounts of sophistication may go into ingesting the user-identifying data, relating it to metadata, analyzing past results, and coming up with that one shining, perfect recommendation… but from the perspective of the team using the service, they just need to send a properly-formed request, and receive a properly-formed response.
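
As a minimal sketch of that contract (the endpoint, request fields and recommendation logic below are hypothetical, and this is plain Python rather than any particular framework), the caller only ever sees a small, well-defined interface:

```python
# Hypothetical recommendation microservice -- the endpoint, fields and
# logic are illustrative, not any real Cedexis or Eventuate API.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def recommend(user):
    """All the sophistication (metadata joins, history analysis, ranking
    models) lives behind this one function; callers never see it."""
    favorites = user.get("favorite_genres", ["drama"])
    return {"title": f"Top pick in {favorites[0]}", "confidence": 0.9}

class RecommendationHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The caller sends a properly-formed request with user details...
        length = int(self.headers.get("Content-Length", 0))
        user = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(recommend(user)).encode()
        # ...and receives a properly-formed response. Nothing else leaks out.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), RecommendationHandler).serve_forever()
```

Everything behind recommend() can be rewritten, re-scaled or relocated without the calling apps noticing, which is the whole point.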

The apps we all rely upon on those tiny little computers we carry around in our pocketbooks or pockets (i.e., smartphones) fundamentally rely on microservices, whether or not their developers thought to describe them that way. That’s why they sometimes wake up and spring to life with goodness… and sometimes seem to drag, or even fail to get going. They rely upon a variety of microservices – not always hosted at their own home location – and it’s the availability of all those microservices that dictates the user experience. If one microservice fails, and the failure is not handled elegantly by the code, the experience becomes unsatisfactory.

If that feels daunting, it shouldn’t – one company managed to build the whole back end of a bank on this architecture.

Clearly, the point of greatest risk is the link to the microservice – the API call, if you will. If the code calls a static endpoint, the risk is that the endpoint isn’t available for some reason, or at least doesn’t respond at an acceptable speed. This is why there are any number of solutions for trying to ensure the microservice is available, often split between authoritative DNS services (which essentially take all the calls for a given location and then assign them to backend resources based on availability) and application delivery controllers (generally physical devices that perform the same service). Of course, if either is down, life gets tricky quickly.

In fact, the trick to planning for highly available microservices is to call endpoints that are managed by a cloud-based application delivery service. In other words, when the microservice is required, the call goes out to a service that combines both synthetic and real-user measurements to determine the most performant source and redirect the traffic there. This compounds the benefits of the microservice architecture: not only can the microservice itself be maintained and updated independently of the apps that use it, but the network and infrastructure necessary to its smooth and efficient delivery can also be tweaked without affecting existing users.
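
A minimal client-side sketch of that idea (the endpoint URLs and scores below are hypothetical, and this is not the Cedexis API) might rank candidate endpoints by their measured health and fall through to the next one on failure:

```python
# Illustrative client-side sketch: try endpoints in order of their measured
# health and fall through on failure. URLs and scores are hypothetical.
import urllib.request

# Scores as they might come from combined synthetic + real-user measurements
# (higher is better, e.g. availability weighted by response time).
ENDPOINT_SCORES = {
    "https://us-east.recs.example.com/recommend": 0.97,
    "https://eu-west.recs.example.com/recommend": 0.88,
    "https://ap-south.recs.example.com/recommend": 0.61,
}

def call_best_endpoint(payload: bytes, timeout: float = 2.0) -> bytes:
    """Call the best-measured endpoint; fail over to the next one if it is
    down or too slow to answer within the timeout."""
    ranked = sorted(ENDPOINT_SCORES, key=ENDPOINT_SCORES.get, reverse=True)
    last_error = None
    for url in ranked:
        try:
            request = urllib.request.Request(
                url, data=payload, headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.read()
        except OSError as err:
            last_error = err  # endpoint unavailable or slow: try the next one
    raise RuntimeError(f"All endpoints failed; last error: {last_error}")
```

In practice the scores would be refreshed continuously from live measurements rather than hard-coded, but the shape of the decision is the same.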

Microservices are the future. To make the most of them, first ensure that they independently address discrete purposes; then make sure that their delivery is similarly self-contained and flexible, without recourse to updating the apps that use them; then settle back and watch performance meet innovation.

What is the difference between the optimist and the pessimist?

Answer: nothing. Except the pessimist is better informed.

This old Russian joke is funny because it has some truth to it. The pessimist understands that things will fail. The pessimist is eventually always right, because eventually everything fails. There is a reason that most good system admins and operations professionals are pessimists.

Today’s discussion is about the availability (or lack thereof) of CDNs (Content Delivery Networks) and cloud services.

As we will see in a moment, clouds and CDNs can and do have availability issues. Regularly. These issues do not exhibit themselves as major outages that make the newspapers. Rather, they show up as thousands of micro-outages (or reachability issues) between ISPs and the clouds/CDNs.

I recently looked at Cedexis Live to get a sense of how many cloud outages we might see in a 10-day period. I randomly chose June 25th to July 5th. Over that time there were 156 Availability lapses and 24 Latency fluctuations.

[Screenshot: cloud provider availability and latency issues, June 25–July 5]

In the world of CDNs the micro-outages during this time frame came even hotter and heavier!

[Screenshot: CDN availability and latency issues, June 25–July 5]

As you can see – 638 Availability issues and 546 significant Latency fluctuations!

We have talked about Cedexis Live before and if you have not had a chance to see how messy the internet can be, I urge you to go check it out.

Sum of Availability

One of the best examples of an innovative company that has pursued this strategy to increase its Availability is Amplience.

This rapidly growing company has solved many really hard problems for its customer base, one of which is availability.

[Diagram: the sum of availability across multiple providers]

They solved the problem of availability by combining the natural availability provided by each provider into a 100% available solution. This is woven into their broader product in an integrated fashion, so that their customers do not even know it is there – and yet it all works flawlessly together.
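
The arithmetic behind the “sum of availability” is simple, if we make the back-of-the-envelope assumption that provider failures are independent (real outages only approximate this): the combined availability is one minus the product of the individual downtimes.

```python
# Back-of-the-envelope "sum of availability" arithmetic, assuming failures
# are independent (real-world outages only approximate this).
def combined_availability(availabilities):
    downtime = 1.0
    for a in availabilities:
        downtime *= (1.0 - a)   # probability that *every* provider is down
    return 1.0 - downtime

# Two providers at 99% each -> ~99.99%; add a third at 95% -> ~99.9995%.
print(combined_availability([0.99, 0.99]))        # ~0.9999
print(combined_availability([0.99, 0.99, 0.95]))  # ~0.999995
```

Two providers that are each merely "pretty good" combine into something far better than either alone – provided traffic can actually be steered to whichever one is up.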


You can learn more about how Amplience uses Cedexis here to see how it might work for you.

I will leave you with another great quote on the difference between the optimist and the pessimist.

A pessimist is a man who has been compelled to live with an optimist.
-Elbert Hubbard, 1927

Don’t be an optimist. Be a realist. Things always fail, but the good news is you can do something about it.

Cedexis September 8th roundup – Cedexis joins the SVA + Significant Cloud outage in Europe

We have a busy week this week. Cedexis is headed to IBC in Europe, and we will have some big announcements around that event. Stay tuned.

For now, here are a couple of important tidbits.

Cedexis is excited to announce its membership in the Streaming Video Alliance.


The SVA is an industry group dedicated to developing, publishing and promoting open standards, policies and best practices that allow the video streaming ecosystem to flourish. This is important work. Streaming video is more complex than delivering web pages and files, and that complexity has driven unique innovations and additional requirements in this segment of the Internet industry. Over time these innovations have become part of the fabric of Internet streaming, but, like barnacles on a ship at sea, they have started to slow overall progress.

“Scaling streaming video for future demand requires a significant collaboration among the entire online video value chain.”
— Emil Rensing, Chief Digital Officer, EPIX.


To learn more about the SVA, check out their site.

Radar Live

Last week, we announced a new service: Radar Live.

We are constantly detecting micro-outages on the Internet because of the unique view we get from the Radar community. Radar is a real-time, real-user monitoring solution that transforms every Internet user in the world into a probe, helping us all to understand the quality of service of your websites and the health of the Internet. Radar gathers billions of measurements a day to build an accurate and unprecedented view of the real-time status of any platform in the world, whether it is a cloud, a CDN or even a data center.

Radar Live exposes this data in real time in a really cool graphical interface.

As I mentioned, we see many outages, but we saw one on September 4 that deserves some attention. Radar Live alerted us to a major issue that affected many users – specifically ones trying to reach a major Cloud provider in Ireland. It looks like it lasted about an hour.
[Chart: availability drop for the cloud provider’s Ireland region on September 4]

Within a few minutes of this happening, we saw an avalanche of issues through Radar Live.

The Major Cloud provider acknowledged an issue around that time:
“2:56 PM PDT We are investigating Internet connectivity issues in the EU-WEST-1 Region.”

And Level 3 acknowledged an issue as well (https://twitter.com/Level3/status/639943238040862720).

As we have mentioned in many posts, the beauty of this for Cedexis customers is that they experience NO outage during this time. When outages occur on Clouds, CDNs or data centers, Cedexis customers have their traffic automatically routed away from the problem area. This can only be done if you are taking billions of measurements of the public Internet providers from every ISP in the world. Go take a look at Radar Live right now and see what’s happening in and around your infrastructure.

Click here to save your job

Consumers care about availability. Quite a bit.

If a web or mobile application has good availability people usually like it.

If it has poor availability they hate it.

The antithesis of availability is an outage – either a micro-outage or a major outage.

If the Fantastic-4 represent availability, then Dr. Doom causes outages. 

Outages can ruin your day, your weekend, your month, your life.

Outages can cause firings. Nobody likes to get fired.

There is a fairly simple way to avoid outages – it’s well understood and time tested. An operations team must have at least N+1 redundancy at every link in the operational chain. Even better than N+1 is Active-Active, where the components have transparent failover with both portions taking load at all times. We will talk a little more about that below.

It is amazing how many seasoned IT professionals will agree with everything I just said.

They would never consider putting their servers in a data center that did not have multiple power generators with readily available fuel, Uninterruptible Power Supply (battery) systems that could carry the entire DC load for a day, multiple IP providers configured using BGP, and multiple switches and firewalls at the top of the stack. These are basic best practices when selecting a data center and implementing a server farm.

In this article, Mr. Longbottom breaks out the considerations that an IT leader must analyze when deploying services within multiple data centers.

  1. Component failure – for example, where a power supply or a disk drive fails
  2. Assembly failure – for example, the failure of a complete server or a storage system
  3. Room failure – for example, through power distribution failure
  4. Building failure – for example, through fire or flood
  5. Site failure – for example, caused through a local power failure or a break in connectivity through cable/fibre fracture
  6. City failure – for example, due to major disruption such as terrorism activity, storm or power grid failure
  7. Regional failure – for example, due to major natural disaster such as earthquake or tsunami
  8. Country failure – for example, due to civil war or epidemic outbreak

That is a lot of crud that can fail. (He also mentions World Failure, but I think it’s a little premature to talk about data centers on Mars).

The interesting thing is, these leaders of IT who would never deploy services in a non-N+1 data center often change their tune when they move services to the Cloud and CDN, or even with regard to multiple origins.

So, why do so many people choose to bet their jobs, businesses and careers on a single provider? Seems pretty stupid, right?

We have spoken at length about the Cloud Maturity Model as well as the CDN Maturity Model.

It is really surprising how many IT leaders will just purchase one CDN and think that it is somehow different than just having one server in one data center. Likewise, it is surprising how many IT leaders will deploy on a single cloud instance – or a single cloud vendor. Obviously, we at Cedexis see the more visionary leaders that have already invested in moving to a multi-cloud or multi-CDN infrastructure. But, there are many who have not.

Because we have discussed the other topics in previous posts, today let’s talk a little about the importance of redundant origins.

First, what the heck do we mean by a redundant origin? It’s pretty simple. All video and web properties have a set of servers where the content originates. This is the origin.

If you have only a single origin, you have a single point of failure. If your origin goes down, your viewers can no longer view your content. You go out of business.

[Diagram: the single-origin dilemma]

The single-origin dilemma: a single origin is easy to maintain, but you risk losing everything.

Many people solve this problem using a Disaster Recovery (DR) strategy.

DR is a simple idea – you set up a geographically distinct service that has identical features and content, and you switch to it when disaster strikes. This set of servers and infrastructure lies dormant until the time comes to use it. Then, it involves pushing a button like this.

[Image: a big red “do not press” button]

That’s a scary red button. 

There are two big problems with this idea.

  1. A significant portion of the time, when you ACTUALLY have to switch over, it fails. Untested infrastructure often fails when you actually need it. The reasons for this are as varied as the reasons for failure generally. Humans implement systems, and humans are flawed.
  2. Underutilization of company resources. The IP, servers, switches, data center, Clouds and CDN that you are not using could actually be deployed, helping your site in real time, and utilizing company investment.

Let’s look at this second item more closely. It’s the important point here.

Active-Active Load Balancing of your Origin

What is Active-Active? It’s simple – you basically take your DR site and make sure that traffic is running on it at all times – so it’s no longer a DR site with a scary red button. Instead, it is actively improving performance and ensuring 100% availability.
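
Here is a minimal sketch of the idea (the origin URLs, weights and health flag are hypothetical, and this stands in for whatever load balancer or DNS layer actually makes the choice): both origins carry live traffic all the time, and losing one simply shifts the weights.

```python
# Illustrative active-active origin selection -- both origins carry live,
# weighted traffic, and an unhealthy origin is simply skipped.
# Origin URLs, weights and the health flag are hypothetical.
import random

ORIGINS = [
    {"url": "https://origin-us-east.example.com", "weight": 0.6, "healthy": True},
    {"url": "https://origin-eu-west.example.com", "weight": 0.4, "healthy": True},
]

def pick_origin():
    """Weighted choice among healthy origins, so every origin stays road-tested."""
    healthy = [o for o in ORIGINS if o["healthy"]]
    if not healthy:
        raise RuntimeError("No healthy origin available")
    total = sum(o["weight"] for o in healthy)
    r = random.uniform(0, total)
    for origin in healthy:
        r -= origin["weight"]
        if r <= 0:
            return origin["url"]
    return healthy[-1]["url"]

# Normal operation: roughly a 60/40 split. If one origin fails its health
# check, mark it unhealthy and all traffic flows to the survivor --
# no scary red button required.
ORIGINS[1]["healthy"] = False
print(pick_origin())  # always the us-east origin while eu-west is down
```

Because the second origin is always serving real users, you find out immediately if it drifts out of date – long before a disaster forces the question.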

[Diagram: multiple origins actively serving traffic]

By doing this, you take advantage of the 2nd (or 3rd) installation by actually using it.

Rather than sitting dormant, the additional installations are utilized and road-tested. This ensures two things.

  1. Better performance from the origin. For dynamic sites, this is VERY important.
  2. A fully-tested live failover that can be trusted when the inevitable crash happens.

From my perspective, no one that does this is losing their job anytime soon, at least not because of a system outage.

Cedexis specializes in Multi-CDN and Multi-Cloud configurations that have 100% availability. For more on this topic, check out our solution brief on configuring this type of setup.

Subscribers to Multi-CDN See No Snag from Saturday's CDN Service Suspension

Saturday was not a particularly good day for one of the major CDNs. From about 01:56 MST (Sat Jul 25 08:56 UTC) to 03:44 MST (Sat Jul 25 10:44 UTC), this particular CDN was experiencing significant issues that resulted in near 0% availability within the United States for around an hour. Something they did restored availability around 9:30 UTC, but a recurrence then took them back down to 60% availability for another 30 minutes or so before service was fully restored. The availability map below shows their availability plummeting and then recovering. Down is bad – up is good.

The CDN has identified the cause as a DNS resolution issue. Its services are currently fully restored.

[Chart: the CDN’s availability in the United States during the July 25 outage]

What about the rest of the world? Let’s look at the Radar measurements for this CDN from a wider variety of geographies. Again, this is availability, so lower is bad.

[Chart: the CDN’s availability worldwide during the July 25 outage]

From Poland to Australia, and Germany to Korea, this CDN was offline.

The interesting thing about this outage is that Cedexis has many customers that use this CDN. Because of this, we were able to see how traffic was routed to various CDNs and how different enterprises’ availability was affected.

For instance, here is the decision report for one of our customers during this time period. You can clearly see that the second CDN picked up the traffic that was lost to the faltering CDN. This resulted in zero downtime for our client – even as 50% of its CDN infrastructure failed.

[Chart: decision report for a two-CDN customer, with the second CDN absorbing traffic as the first fails]

It’s important to note that even if you are smart enough to multi-home your CDN traffic, if you are using geo-routing or round robin, your traffic will still be black-holed.

You only get the benefit of 100% availability by monitoring every CDN in real time – and routing traffic based on those measurements.
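
To see why, here is a toy comparison (the availability numbers are made up, not Radar data): round robin keeps sending half of all requests to an effectively dead CDN, while measurement-driven routing steers around it.

```python
# Toy comparison: round robin keeps sending half of all requests to a CDN
# that is effectively down, while measurement-driven routing steers around
# it. Availability figures are made up for illustration.
CDN_AVAILABILITY = {"cdn_a": 0.02, "cdn_b": 0.999}  # cdn_a is mid-outage

def round_robin_failures(requests):
    failures = 0
    cdns = list(CDN_AVAILABILITY)
    for i in range(requests):
        cdn = cdns[i % len(cdns)]                 # blind alternation
        failures += CDN_AVAILABILITY[cdn] < 0.9   # request hits a dead CDN
    return failures

def measurement_driven_failures(requests):
    failures = 0
    for _ in range(requests):
        # Route every request to the CDN with the best measured availability.
        cdn = max(CDN_AVAILABILITY, key=CDN_AVAILABILITY.get)
        failures += CDN_AVAILABILITY[cdn] < 0.9
    return failures

print(round_robin_failures(10_000))         # 5,000 requests black-holed
print(measurement_driven_failures(10_000))  # 0
```

The same logic applies to geo-routing: any scheme that ignores what the measurements are saying right now will keep handing traffic to a provider that cannot serve it.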

Let’s take a look at another customer, one that has four CDNs. These four CDNs work in conjunction to provide this customer with top performance and 100% availability.

[Chart: a four-CDN setup during the July 25 service interruption]

As you can see from the above two examples, CDN outages do not have to be your problem. We see literally hundreds of micro-outages a year (although, to be clear, this one was a major worldwide outage). The same is true for the cloud. The smart enterprise will protect itself from outages by multihoming its infrastructure across multiple CDN and cloud vendors. For more on how easy it is to implement multi-CDN, check out how we helped great brands like Accor Hotels.

Routing Adventures in Internet Land

Every morning, you navigate to websites from your home, from your smartphone in the car, or from your corporate network.

In each case, you traverse at least three networks belonging to three or more different operators. These operators have probably made entirely different technical decisions in the course of building and managing their networks. All of this can sometimes affect your morning mood (and, more importantly, your user experience).

Consider a simple example: on August 12, the Internet experienced a major incident. Several world-famous websites, as well as major telecom operators, CDNs, and clouds of American and European hosting providers, encountered connectivity issues. In just a few seconds, networks disappeared from the Internet even as the servers behind them worked perfectly. As strange as it may seem, this is quite common and can result from deliberate actions (malicious activity, political or government requests, etc.) or accidental ones (an incident during maintenance, a data center failure, a fiber cut caused by a squirrel, etc.).

The Internet: how does it work?

The Internet is a collection of networks (autonomous systems, or ASNs) that generate and exchange traffic. With our Radar installed by hundreds of customers worldwide, Cedexis detects around 40,000 ASNs and the many millions of IP addresses announced behind these networks.

To interconnect their networks, companies use routers (the “hearts” of the network) from different vendors (Cisco, Juniper, Huawei, Alcatel…). These machines “speak” to each other via BGP and use a regularly updated global routing table to know where, and to whom, to send traffic.

Over the years, with the growth in networks, subscribers, servers and devices, the routing table has grown steadily and was on the verge of exceeding 512,000 routes.

“So, it’s not our problem?” you might say. Oh, but it is – at least when people run equipment that was never provisioned (or simply never upgraded) to handle more than 512,000 routes, even though the manufacturers themselves have long provided ways to raise that limit.

On August 12, one network (ASN) announced enough routes to push the global table past 512,000 (read more here). Logically, the update propagated from network to network (router to router), as the global routing system is designed to do… and that is when quite a few companies started to sweat. Why? Because their routers had not been upgraded to accommodate more than 512,000 routes – even though this problem was not new, and people had known for some time that the upgrade needed to happen.

The result: when forced to hold an abnormal number of routes, routers that were not up to date crashed and disappeared from the Internet for several minutes or hours – the time it took the affected networks’ teams to identify the origin of the incident, correct the problem and reboot their routers. We saw this on the 12th via Radar, both in Europe and the United States, and for many companies, some of them very well known.
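
As a toy model of what happened (the limit, the prefixes and the failure behavior below are all illustrative – real routers degraded in several different ways), the failure mode is simply a fixed-size table meeting one route too many:

```python
# Toy model of the "512k" failure: a router whose forwarding table has a
# hard capacity limit falls over when the global table outgrows it.
# The limit, prefixes and failure behavior are illustrative only.
ROUTE_LIMIT = 512_000  # e.g. a default table allocation on older hardware

class ToyRouter:
    def __init__(self, limit=ROUTE_LIMIT):
        self.limit = limit
        self.routes = set()

    def install(self, prefix):
        if len(self.routes) >= self.limit:
            # Real routers handled this in different (often bad) ways:
            # dropping routes, falling back to slow-path forwarding, or crashing.
            raise OverflowError("forwarding table full -- router offline")
        self.routes.add(prefix)

router = ToyRouter()
try:
    for i in range(ROUTE_LIMIT + 1):
        router.install(f"prefix-{i}")  # stand-in for a real BGP prefix
except OverflowError as err:
    print(err)  # effectively what many un-upgraded routers hit on August 12
```
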

Thousands of disgruntled tweets from around the world showed great frustration at not being able to use certain services from the early morning of August 12. This went on all day.

Imagine the loss of revenue for the many companies selling online, the VoIP operators, and the SaaS companies providing services to businesses, all of whom depend on the affected providers.

The Internet’s infrastructure is a kind of magic, with an incredible ability to withstand all kinds of attacks and huge waves of traffic, yet it fails when people do not do their jobs properly. In defense of these providers, finding the right maintenance window to restart a router – one of the most critical network elements – while minimizing service disruption is a daunting task, and one that can sometimes cause unfortunate side effects.

While we cannot “officially” say that every service provider or online publisher that was down for several hours on August 12 was a casualty of the “512k routes” outage, the reports do not lie.

On our side, we can once again be pleased that Openmix fulfilled its role perfectly. Our choice to distribute our infrastructure across multiple anycast networks in 50 data centers, and to monitor every potentially vulnerable point (including, for example, network equipment), has proven its worth.

The Internet is composed of thousands of networks and hundreds of thousands of engineers, all of whom learn from their mistakes. All of this will be corrected in the coming hours, and tomorrow will be another day on the Internet, with plenty of outages that remain invisible… almost.

Journalists: need to speak with an expert? Contact us by email or on Twitter @Cedexis.

Internet hiccup: Trouble in Europe today

I love the man that can smile in trouble, that can gather strength from distress, and grow brave by reflection. ‘Tis the business of little minds to shrink, but he whose heart is firm, and whose conscience approves his conduct, will pursue his principles unto death.
–Thomas Paine

Like a strong Turkish coffee, there seems to be some trouble brewing today in Europe. Let me share some screenshots of the data we are seeing here at Cedexis. Since we do not know who the culprit is at this time, we will redact the names.

What you see here is a single CDN with a presence in Europe, measured from Belgium, the Netherlands, France, Switzerland and Germany. This particular CDN had a serious dip in availability that lasted well over an hour.

Now let us take a look at three different CDNs from a single location – Switzerland, for instance.

As you can see, multiple CDNs had similar issues in the same timeframe. This suggests a broader connectivity issue. It is also important to note that not all of the CDNs were equally affected. We will come back to this below.

Let us take a look at another CDN (different from the first one) from multiple locations, as we did in the first diagram. Interestingly enough, while there are issues from France, Switzerland and the Netherlands, it maintained good availability from some of the other countries. Again, this suggests connectivity issues that are affecting these providers but are not universal – so CDNs that are multi-homed in the right ways are avoiding these issues, at least from some countries.

A different CDN from Europe

Next, we measure a sample of five different European cloud providers from the US. These providers are hosted primarily in Germany and France. As you can see, there are significant availability issues from the US.

These are clues – clues that there is something rotten in Denmark, or, more appropriately, something rotten with connectivity in Europe. If there is any doubt in your mind about the relationship between availability and latency, go read this blog. We will update this post as more facts come to light.

If you want to understand more about this we suggest you open a free Cedexis account and see the state of the internet for yourself!

UPDATE: To see the explanation for this outage, go here.

Major outage by Major CDN provider. What RUM tells us. And why a Multi-CDN strategy makes the most sense.

Last Thursday, the 17th of July, one of the big CDNs had a major outage. The duration was between 45 and 70 minutes, depending on what ISP you were on. We have spoken a great deal recently about micro-outages and their impact. Last Thursday’s outage was a different beast: it was a fairly complete outage (meaning availability was severely diminished for a significant amount of time). I thought it would be interesting to compare this outage to a micro-outage to understand the differences.

What you see above is a heat map of what this outage looked like from most of the top-talker networks around the world. Real-user measurements (over 2 billion a day) from over 40k networks worldwide registered this outage. As you can see, this was a significant outage, measured both in severity and in length of time. Little to no traffic was being passed by this CDN during this window. If this were a storm in the Atlantic, it would get a name.

The availability from some locations dropped to as low as 2%. Other networks, such as Cox Communications, measured a 76-minute outage and over that time averaged 27% availability; for this one network we took over 7,400 measurements during the incident. Google registered 19.4% availability across 5,809 measurements over a 58-minute period. Sprint had almost 6,000 measurements over a 58-minute period, during which it averaged 21.4% availability. Suffice it to say, end users who were trying to consume content were not having a quality experience, and the enterprises using this CDN were losing customers – at least those enterprises singly-homed to this CDN.

There is a strong movement among enterprises that utilize CDNs to adopt a multi-CDN setup to protect themselves from this type of Single Point Of Failure (SPOF). Surveys have shown that the vast majority of companies spending more than $100k MRR have already adopted a multi-CDN strategy. Technology providers like Cedexis now allow companies with a much more modest spend to adopt these risk-mitigation strategies; the barrier to entry is much lower than it used to be.

Let’s take a look at what this outage looked like relative to other providers. Below is a screenshot of the last 7 days of Availability measurements. There are three CDNs listed, and you can see their availability was virtually the same before and after the incident. However, for the period of time that the outage occurred, one of the CDNs had severely diminished availability.

So, if there were a technology that could ingest this RUM data in real time and process it to provide dynamic, data-driven CDN load balancing, enterprises could avoid outages by having two or more CDN partners and directing traffic to the provider with the best availability and performance for every request. Well, the good news is that there is – it’s called Cedexis Openmix. We recommend it for any customer that has ever had a CDN outage, or any customer that wants to avoid ever having one. Go here to learn more about Openmix and how it might save your bacon, your job or your website.

BTW – for anyone who is curious about why we redact the names of CDNs that had outages, there are a couple of reasons. This blog is a public forum, and these CDNs are our partners. Every CDN and cloud has outages, and we are not here to lambast any provider. Quite the opposite – we are in partnership with them to make the Internet a better place. For the morbidly curious, you can create a free Cedexis account and look at the public Radar data yourself.