
Announcing Cedexis Netscope: Advanced Network Performance and Benchmarking Analysis

The Cedexis Radar community collects tens of billions of real user monitoring data points each day, giving Cedexis users unparalleled insight into how applications, videos, websites, and large file downloads are actually being experienced by their users. We’re excited to announce a product that offers a new lens into the Radar community’s dynamic data set: Cedexis Netscope.

Know how your service stacks up, down to the IP subnet
Metrics like network throughput, availability, and latency don’t tell the whole story of how your service is performing, because they are network-centric, not user-centric: however comprehensively you track network operations, what matters is the experience at the point of consumption. Cedexis Netscope provides you with additional user-centric context to assess your service, namely the ability to compare your service’s performance to the results of the “best” provider in your market. With up-to-date Anonymous Best comparative data, you’ll have a data-driven benchmark to use for network planning, marketing, and competitive analysis.

Highlight your Service Performance:

  • Relative to peers in your markets
  • In specific geographies
  • Compared with specific ISPs
  • Down to the IP subnet
  • Including both IPv4 and IPv6 addresses
  • With comprehensive data on latency and throughput
  • Covering both static and dynamic delivery

Actionable insights
Netscope provides detailed performance data that can be used to improve your service for end users. IT Ops teams can use automated or custom reports to view performance from your ASN versus peer groups in the geographies you serve. This lets you fully understand how you stack up versus the “best” service provider, using the same criteria. Real-time logs organized by ASN can be used to inform instant service repairs or for longer-term planning.

Powered by: the world’s largest user experience community
Real User Monitoring (RUM) is about understanding how internet performance impacts customer satisfaction and engagement. Cedexis gathers RUM data from each step between the client and any of the clouds, data centers, and CDNs hosting your applications to build a holistic picture of internet health. Every request creates more data, continuously updating this unique real-time virtual map of the web.

Data and alerts, your way
To effectively evaluate your service and enable real-time troubleshooting, Netscope lets you roll up data at the ASN, country, region, or state level. You can also zoom in on a specific ASN down to the IP subnet level, dissecting the data in any way your business requires. This data is stored in the cloud on an ongoing basis. Netscope also lets you easily set up flexible network alerts for performance and latency deviations.
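As a rough illustration of this kind of roll-up and alerting, here is a minimal sketch in TypeScript. The data shapes, field names, and thresholds are hypothetical, not the Netscope schema or API; the point is simply grouping latency samples by ASN and flagging deviations from a baseline:

```typescript
// Hypothetical RUM sample and roll-up shapes; not the actual Netscope schema.
interface RumSample {
  asn: number;        // autonomous system number of the measuring client
  country: string;
  subnet: string;     // e.g. "203.0.113.0/24"
  latencyMs: number;
}

interface AsnStats {
  asn: number;
  count: number;
  meanLatencyMs: number;
}

// Roll individual samples up to the ASN level.
export function rollUpByAsn(samples: RumSample[]): AsnStats[] {
  const byAsn = new Map<number, { sum: number; count: number }>();
  for (const s of samples) {
    const agg = byAsn.get(s.asn) ?? { sum: 0, count: 0 };
    agg.sum += s.latencyMs;
    agg.count += 1;
    byAsn.set(s.asn, agg);
  }
  return [...byAsn.entries()].map(([asn, { sum, count }]) => ({
    asn,
    count,
    meanLatencyMs: sum / count,
  }));
}

// Flag ASNs whose mean latency has drifted more than `thresholdPct` above a baseline.
export function latencyAlerts(
  stats: AsnStats[],
  baselineMs: Map<number, number>,
  thresholdPct = 25,
): AsnStats[] {
  return stats.filter((s) => {
    const base = baselineMs.get(s.asn);
    return base !== undefined && s.meanLatencyMs > base * (1 + thresholdPct / 100);
  });
}
```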

Netscope helps ISP Product Managers and Marketers better understand:

  • How well users connect to the major content distributors
  • How well users/businesses connect to public clouds (AWS, Google Cloud, Azure, etc.)
  • When, where, and how often outages and throughput issues happen
  • What happens during different times of day
  • Where the risks lie during big events (FIFA World Cup, live events, video/content releases)
  • How service on mobile looks versus web
  • How the ISP stacks up vs. “the best” ISP in the region

Bring advanced network analysis to your network
Netscope provides the critical data set you need for network planning and enhancement. With its real-time understanding of worldwide network health, Netscope gives you the context and actionable data you need to delight customers and increase your market share.

Ready to use this data with your team?

Set up a demo today


Introducing the All New Sonar: a cloud-native synthetic testing tool for any infrastructure

I never guess. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
Sir Arthur Conan Doyle

Synthetic monitoring built for hybrid cloud
Sonar tests all of your endpoints: in your public clouds, private clouds, data centers, or CDNs. This provides a comprehensive and uniform view of the overall health of your application delivery, no matter what the status of your various infrastructure components happens to be.
Sonar’s proactive testing acts like a virtual end user, testing to see how an application, video, or large file download would be experienced by your global customers. Being able to test your app from nine locations worldwide keeps the monitoring data fresh and low-latency, and therefore actually usable for your app delivery strategy.

Ultra-low latency synthetic monitoring, refreshed as often as every two seconds
Public cloud users are probably used to having access to some sort of synthetic app testing functionality as a core part of the services offered by their cloud provider. But where many cloud services check for availability only every 30 to 120 seconds, Sonar offers checks as frequently as every two seconds. Data that’s updated every few minutes really isn’t meaningful for a solution that needs to make real-time, automated delivery decisions. Not to mention the question of data objectivity when the source information comes from the provider of the very infrastructure being monitored.

Monitoring is passive. Cedexis is insight + action.
What makes Sonar different from other synthetic testing agents is that Sonar data can be used to shape application delivery decisions in real time. Data collected by Sonar feeds directly into the Cedexis application delivery platform, which uses fully user-configurable algorithms to route traffic to the endpoints that deliver the best customer experience at the lowest operational cost. Owing to the frequent health checks and the rapid calculation of optimal traffic routes, Cedexis provides the lowest-latency cloud-based application delivery service available, with automated delivery decisions made to route around traffic congestion less than 10 seconds after problems initially arise. By contrast, most cloud services, with less frequent synthetic checks and slower decisioning engines, may be expected to take two to four times as long to respond to emerging issues.
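To make the mechanics concrete, here is a minimal sketch of a two-second probe loop that keeps a health table and hands new requests to the fastest endpoint that passed its latest check. This is not the Sonar or Openmix implementation; the endpoint URLs, timeout, and selection rule are invented for illustration:

```typescript
interface EndpointHealth {
  url: string;         // health-check URL for a cloud, CDN, or data center endpoint
  available: boolean;
  lastRttMs: number;
}

const endpoints: EndpointHealth[] = [
  { url: "https://cdn-a.example.com/health", available: false, lastRttMs: Infinity },
  { url: "https://cloud-b.example.com/health", available: false, lastRttMs: Infinity },
];

// Probe one endpoint the way a virtual end user would, recording availability and RTT.
async function probe(ep: EndpointHealth): Promise<void> {
  const start = Date.now();
  try {
    const res = await fetch(ep.url, { signal: AbortSignal.timeout(1500) });
    ep.available = res.ok;
    ep.lastRttMs = Date.now() - start;
  } catch {
    ep.available = false;
    ep.lastRttMs = Infinity;
  }
}

// Re-probe every endpoint every two seconds.
setInterval(() => {
  endpoints.forEach((ep) => void probe(ep));
}, 2000);

// Route each new request to the fastest endpoint that passed its most recent check.
export function pickEndpoint(): EndpointHealth | undefined {
  return endpoints
    .filter((ep) => ep.available)
    .sort((a, b) => a.lastRttMs - b.lastRttMs)[0];
}
```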

Better data means better decisions.
Delivering applications over the internet, like all interactions with complex, dynamic systems, ultimately meets success or failure based on the data you use for making decisions. In this case, decisions are the “real-time” application delivery choices your platform makes to ensure apps and video reach your customers in a way that produces a great user experience. Using real user monitoring like Radar – the world’s largest real-time user experience community – provides data you can use to make automated delivery decisions on your hybrid infrastructure. But to enable your application delivery logic to fully understand and optimize delivery for all of your customers and potential customers worldwide, you need to proactively test networks. That’s where Cedexis’ Sonar functionality comes in.

The three pillars of Application Delivery
The Cedexis application delivery platform is powered by three services:

  • Radar: the world’s largest community of instantaneous and actionable user experience data
  • Fusion: a powerful 3rd party data ingestion tool that makes APM, Local Load Balancer, cloud metrics, and any other dataset actionable in delivery logic
  • [NEW!] Sonar: a massively scalable and architecture-agnostic synthetic testing tool that is immune to the latency issues of proprietary cloud tools


The Cedexis application delivery platform automates and optimizes the customer experience for apps, video, and static content while minimizing cloud and content delivery costs. It does this by combining billions of real user data points from over 50,000 networks, Sonar synthetic testing data, and any other dataset you use, so delivery is optimized using data from our entire network (not just your own customers).
If you haven’t created a Cedexis portal account yet, now’s the time. You can set up your global application delivery in a few minutes and see how Sonar works for yourself.   

Cedexis Solves Avoidable Outages in Real-Time

Portland, Ore. – August 15, 2017 – Cedexis, the leader in crowd-optimized application and content delivery for clouds, CDNs and data centers, today announced the release of its connected Sonar service, which uses low-latency synthetic monitoring to eliminate costly and avoidable outages by ensuring consistent application delivery. Providing exceptional quality of experience (QoE) to application consumers by eliminating outages and slowdowns is at the heart of building a profitable and sustainable cloud-native service, but it has proven to be an elusive goal.

Synthetic monitoring uses programmed requests of customer-designated endpoints to validate, on an ongoing basis, that those endpoints are available for use. Until now, the marketplace has offered only two substantial choices:

  • Implement disconnected synthetic monitoring, which requires manual intervention when problems arise. The time from anomaly detection to resolution can range from minutes to hours, often resulting in prolonged outages and slowdowns.
  • Implement cloud vendor-specific synthetic monitoring, which delivers automatic intervention when problems arise. The time from anomaly detection to resolution is measured in just minutes, but is generally restricted to re-routing within that vendor’s cloud infrastructure, resulting in shorter, but still meaningful, outages and slowdowns.

With the release of Sonar, there is now a third, and more effective, option:

  • Implement Sonar connected synthetic monitoring, which delivers an automatic intervention when problems arise. The time from anomaly detection to resolution is measured in just seconds, and traffic can be re-routed across and between substantially any infrastructure (from data center to hosting facility to cloud provider), resulting in the elimination of most outages and slowdowns entirely.

Sonar is able to reduce the MTTR (mean time to repair) – and thus prevent consumer-visible outages and slowdowns – automatically owing to two key characteristics:

  1. Sonar is connected to the broader Cedexis application delivery platform. As such, the data that is collected automatically flows into the Openmix global traffic manager, which is able to adjust its traffic routing decisions in just seconds.
  2. Sonar is configurable to run endpoint tests as frequently as every two seconds, providing up-to-date telemetry moving at the speed of the Internet. By contrast, cloud vendor-specific solutions often limit their testing frequency to 30 – 120 second intervals – far from sufficient to contend with rapidly-evolving global network conditions.
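As a back-of-the-envelope illustration of why check frequency matters (the decision latencies below are illustrative assumptions, not measured figures), the worst-case time from failure onset to reroute is roughly the probe interval plus the time the decisioning layer needs to act:

```typescript
// Worst-case seconds from failure onset to reroute: the failure can occur just
// after a probe, so detection takes up to one full probe interval, and the
// decisioning layer then needs some time to act. Latencies here are assumptions.
function worstCaseRerouteSec(probeIntervalSec: number, decisionLatencySec: number): number {
  return probeIntervalSec + decisionLatencySec;
}

console.log(worstCaseRerouteSec(2, 5));   // connected, 2 s checks: ~7 s worst case
console.log(worstCaseRerouteSec(60, 60)); // 60 s checks + slower decisioning: ~120 s
```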

“Synthetic monitoring data, as a core input to traffic routing decisions, must be accurate, frequent, and rapidly integrated into algorithms,” said Josh Gray, Chief Architect at Cedexis. “However, data is only as valuable as the actions that it can automatically activate. Updating traffic routing in just seconds is the key to making outages a thing of the past and ensuring unparalleled user experience.”

The updated Sonar synthetic monitoring service enhances the industry-leading actionable intelligence that already powers the Openmix global traffic management engine. The Cedexis application delivery platform (ADP) uniquely uses three different sources of actionable data to ensure the smoothest internet traffic logistics:

  • Radar: the world’s largest community of instantaneous user experience data
  • Fusion: the powerful 3rd party data ingestion tool that makes APM, Local Load Balancer, Cloud metrics, and any other dataset actionable in delivery logic
  • Sonar: a massively scalable and architecture-agnostic synthetic testing tool that is immune to the latency issues of proprietary cloud tools

“No global traffic management platform can provide reliable, real-time traffic shaping decisions without access to accurate, actionable data,” noted Ryan Windham, Cedexis CEO. “The evolution of Sonar to provide industry-leading latency levels confirms our commitment to delivering an end to avoidable outages.”

Cloud-First + DevOps-First = 81% Competitive Advantage

We recently ran across a fascinating article by Jason Bloomberg, a recognized expert on agile digital transformation, that examines the interplay between Cloud-First and DevOps-First models. That article led us, in turn, to an infographic centered on some remarkable findings from a CA Technologies survey of 900-plus IT pros from around the world. The survey set out to explore the synergies between Cloud and DevOps, specifically with regard to software delivery. You can probably guess why we snapped to attention.

The study found that 20 percent of the organizations represented identified themselves as being strongly committed to both Cloud and DevOps, and their software delivery outperformed other categories (Cloud only, DevOps only, and slow adopters) by 81 percent. This group earned the label “Delivery Disruptors” for their outlying success at maximizing agility and velocity on software projects. On factors of predictability, quality, user experience, and cost control, the Disruptor organizations soared above those employing traditional methods, as well as Cloud-only and DevOps-only methods, by large percentages. For example, Delivery Disruptors were 117 percent better at cost control than Slow Movers, and 75 percent better in this category than the DevOps-only companies.

These findings, among others, got us to thinking about the potential benefits and advantages such Delivery Disruptors can gain from adding Cedexis solutions into their powerful mix. Say, for example, you have agile dev teams working on new products and apps and you want to shorten the execution time for new cloud projects. To let your developers focus on writing code, you need an app delivery control layer that supports multiple teams and architectures. With the Cedexis application delivery platform, you can support agile processes, deliver frequent releases, control cloud and CDN costs, guarantee uptime and performance, and optimize hybrid infrastructure. Your teams get to work their way, in their specific environment, without worrying about delivery issues looming around every corner.

Application development is constantly changing thanks to advances like containerization and microservice architecture — not to mention escalating consumer demand for seamless functionality and instant rich media content. And in a hybrid, multi-cloud era, infrastructure is so complex and abstracted that delivery intelligence has to be embedded in the application (you can read more about what our Chief Architect, Josh Gray, has to say about delivery-as-code here).

To ensure that an app performs as designed, and end users have a high quality experience, agile teams need to automate and optimize with software-defined delivery. Agile teams can achieve new levels of delivery disruption by bringing together global and local traffic management data (for instance, RUM, synthetic monitoring results, and local load balancer health), application performance management, and cost considerations to ensure the optimal path through datacenters, clouds, and CDNs.

Imagine the agility and speed a team can contribute to digital transformation initiatives with fully automated app delivery based on business rules, actionable data from real user and synthetic testing, and self-healing network optimizations. Incorporating these capabilities with a maturing Cloud-first and DevOps-first approach will likely put the top performers so far ahead of the rest of the pack, they’ll barely be on the same racetrack.


The Cloud Is Coming

Still think the cloud (or should that be The Cloud?) is a possible-but-not-definite trend? Take a look at IDC’s projection of IT deployment types:

[Chart: IDC projection of IT deployment types. Credit: Forbes]

So much to unpack! What really jumps out is that

  • Traditional data centers drop in share, but hang in there around 50%: self-managed hardware will be a fact of life as far out as we can see
  • Public cloud will double by 2021, but it isn’t devouring everything, because in the final analysis no Operations team wants to give up all control
  • Private cloud expands rapidly, as the skills to use the technology become more widespread
  • But most importantly…in the very near future, almost every shop will likely be running a hybrid network, which combines traditional data centers, private cloud deployments, public clouds for storage and computation, and CDNs for delivery (don’t forget that Cisco famously predicted over half of all Internet traffic would traverse a CDN by the year after next)

It’s a brave new world, indeed, that has so many options in it.

If it is true, though, that cloud computing will be a $162B a year business by 2020 (per Gartner), and that 74% of technology CFOs say cloud computing will have the most measurable impact on their business in 2017, that means this year will end up having been one of upheaval, and of transformation. As ever more complex permutations of public/private infrastructure hit the market, the challenges of keeping everything straight will rapidly multiply: can one truly be said to be optimizing if one cannot centralize the tracking and traffic management for all resources, regardless of whether they’re in your own NOC, under Amazon’s tender care in Virginia, or located at some unidentified POP somewhere in Western Europe?

The truth is that, as with all transformations, this move to hybrid networks will be marked by the classic Hype Cycle:

We are fast approaching the Peak of Inflated Expectations; the sudden fall into the Trough of Disillusionment will be precipitated by the realization that there are now so many different sources of computation in the mix that nobody is quite sure where the savings are. Perhaps we’re saving money by using different CDNs in different geographies – but it’s hard to tell if we’re balancing for economic benefit; perhaps we’re making the right move by storing all our images on a global cloud, but it’s hard to tell whether adding a second (with the inevitable growth in storage fees) would result in faster audience growth; perhaps we’re right to avoid sending content requests back to origin, but at the same time, that seems like a lot of resources to not use.

The Slope of Enlightenment will hit when the tools come along to put all the metrics of all the elements of the hybrid network onto a single pane: balancing between nodes that are, at an abstract level at least, equally measurable, configurable, and tunable will start us down the path to the Plateau of Productivity.

The Cloud is coming; how long we spend in the Trough of Disillusionment trying to figure out how to make it hum like a well-oiled machine is assuredly on us.

Don’t Be Afraid of Microservices!

Architectural trends are to be expected in technology. From the original all-in-one-place COBOL behemoths half the world just learned existed because of Hidden Figures, to three-tiered architecture, to hyper-tier architecture, to Service Oriented Architecture….really, it’s enough to give anyone a headache.

And now we’re in a time of what Gartner very snappily calls Mesh App and Service Architecture (or MASA). Whether everyone else is going for that particular nomenclature is less relevant than the reality that we’ve moved on from web services and SOA toward containerization, de-coupling, and the broadest possible use of microservices.

Microservices sound slightly disturbing, as though they’re very, very small components, of which one would need dozens if not hundreds to do anything. Chris Richardson of Eventuate, though, recently begged us not to assume that just because of the name these units are tiny. In fact, it makes more sense to think of them as ‘hyper-targeted’ or ‘self-contained’ services: their purpose should be to execute a discrete set of logic, which can exist in isolation, and simply provide easily-accessed public interfaces. So, for instance, one could imagine a microservice whose sole purpose was to find the best match from a video library for a given user: requesting code would provide details on the user, the service would return the recommendation. Enormous amounts of sophistication may go into ingesting the user-identifying data, relating it to metadata, analyzing past results, and coming up with that one shining, perfect recommendation…but from the perspective of the team using the service, they just need to send a properly-formed request, and receive a properly-formed response.
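To make the recommendation example concrete, here is a minimal sketch of what such a service’s contract might look like. Every name, URL, and field here is hypothetical; the point is that callers see only a narrow request/response interface, while the sophistication stays behind the endpoint:

```typescript
// Hypothetical request/response contract for a video recommendation microservice.
interface RecommendationRequest {
  userId: string;
  recentlyWatched: string[];   // video IDs
  locale: string;
}

interface RecommendationResponse {
  videoId: string;
  title: string;
  confidence: number;          // 0..1
}

// From the caller's perspective, the service is one well-formed request away;
// the matching logic behind the endpoint can evolve independently.
export async function recommend(req: RecommendationRequest): Promise<RecommendationResponse> {
  const res = await fetch("https://recs.example.com/v1/recommendation", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`recommendation service returned ${res.status}`);
  return (await res.json()) as RecommendationResponse;
}
```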

The apps we all rely upon on those tiny little computers we carry around in our pocketbooks or pockets (i.e. smart phones) fundamentally rely on microservices, whether or not their developers thought to describe them that way. That’s why they sometimes wake up and spring to life with goodness…and sometimes seem to drag, or even fail to get going. They rely upon a variety of microservices – not always based at their own home location – and it’s the availability of all those microservices that dictates the user experience. If one microservice fails, and is not dealt with elegantly by the code, the experience becomes unsatisfactory.

If that feels daunting, it shouldn’t – one company managed to build the whole back end of a bank on this architecture.

Clearly, the one point of greatest risk is the link to the microservice – the API call, if you will. If the code calls a static endpoint, the risk is that that endpoint isn’t available for some reason, or at least isn’t responding at an acceptable speed. This is why there are any number of solutions for trying to ensure the microservice is available, often spread between authoritative DNS services (which essentially take all the calls for a given location and then assign them to backend resources based on availability) and application delivery controllers (generally physical devices that perform the same service). Of course, if either is down, life gets tricky quickly.

In fact, the trick to planning for highly available microservices is to call them through locations that are managed by a cloud-based application delivery service. In other words, as the microservice is required, a call goes out to a location that combines both synthetic and real-user measurements to determine the most performant source and re-direct the traffic there. This compounds the benefits of the microservice architecture: not only can the microservice itself be maintained and updated independently of the apps that use it, but the network and infrastructure necessary to its smooth and efficient delivery can also be tweaked without affecting existing users.
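A rough sketch of that “plan for failure” pattern from the caller’s side (endpoints, path, and timeout are invented; in practice the cloud-based delivery service does the steering, and this client-side fallback is just the safety net):

```typescript
// Try a list of candidate endpoints in order, giving each a short time budget.
// In practice the application delivery service picks the best-performing source;
// this client-side fallback is just the safety net the text describes.
export async function callWithFallback<T>(
  endpoints: string[],
  path: string,
  timeoutMs = 1000,
): Promise<T> {
  let lastError: unknown;
  for (const base of endpoints) {
    try {
      const res = await fetch(`${base}${path}`, { signal: AbortSignal.timeout(timeoutMs) });
      if (res.ok) return (await res.json()) as T;
      lastError = new Error(`status ${res.status} from ${base}`);
    } catch (err) {
      lastError = err; // timeout or network failure: try the next endpoint
    }
  }
  throw lastError;
}

// Usage: the first hostname is traffic-managed, the second is a static backup.
// await callWithFallback(["https://api.example.com", "https://backup.example.net"], "/v1/recs");
```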

Microservices are the future. To make the most of them, first ensure that they independently address discrete purposes; then make sure that their delivery is similarly well-defined and flexible, without recourse to updating the apps that use them; then settle back and watch performance meet innovation.

Live and Generally Available: Impact Resource Timing

We are very excited to be officially launching Impact Resource Timing (IRT) for general availability.

IRT is Impact’s powerful window into the performance of different sources of content for the pages in your website. For instance, you may want to distinguish the performance of your origin servers relative to cloud sources, or advertising partners; and by doing so, establish with confidence where any delays stem from. From here, you can dive into Resource Timing data sliced by various measurements over time, as well as through a statistical distribution view.

What is Resource Timing? Broadly speaking, resource timing measures latency within an application (i.e. the browser). It uses JavaScript as the primary mechanism to instrument various time-based metrics for all the resources requested and downloaded for a single website page by an end user. Individual resources are objects such as JS, CSS, images, and other files that the website page requests. The faster the resources are requested and loaded on the page, the better the quality of experience (QoE) for users. By contrast, resources that cause longer latency can produce a negative QoE for users. By analyzing resource timing measurements, you can isolate the resources that may be causing degradation issues for your organization to fix.
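Under the hood, browsers expose these timings through the standard Resource Timing API. The snippet below is a generic browser-side example (not the Impact tag itself) that pulls the entries for the current page and derives a couple of the metrics discussed here:

```typescript
// Collect Resource Timing entries for everything the current page has loaded so far.
const entries = performance.getEntriesByType("resource") as PerformanceResourceTiming[];

for (const entry of entries) {
  // TCP connection time is 0 for reused connections, and for cross-origin
  // resources that don't send a Timing-Allow-Origin header.
  const tcpConnectMs = entry.connectEnd - entry.connectStart;
  console.log({
    resource: entry.name,             // full URL of the JS/CSS/image/etc.
    initiator: entry.initiatorType,   // "script", "img", "css", ...
    durationMs: entry.duration,       // startTime through responseEnd
    tcpConnectMs,
  });
}
```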

Resource Timing Process:

Cedexis IRT makes it easy for you to track resources from identified sources, normally grouped by domain (*.myDomain.com), by sub-domain (e.g. images.myDomain.com), or by the provider serving your content. In this way, you can quickly group together types of content and identify the source of any latency. For instance, you might find that origin-located content is being delivered swiftly, while cloud-hosted images are slowing down the load time of your page; in such a situation, you would now be in a position to consider a range of solutions, including adding a secondary cloud provider and a global server load balancer to protect QoE for your users.
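Continuing the generic browser example above (again, not the Impact implementation), grouping Resource Timing entries by hostname is enough to see which class of content is dragging down the page:

```typescript
// Median resource duration per hostname, to spot which content source is slow.
export function medianDurationByHost(): Map<string, number> {
  const byHost = new Map<string, number[]>();
  const entries = performance.getEntriesByType("resource") as PerformanceResourceTiming[];

  for (const entry of entries) {
    const host = new URL(entry.name).hostname;   // e.g. "images.myDomain.com"
    const durations = byHost.get(host) ?? [];
    durations.push(entry.duration);
    byHost.set(host, durations);
  }

  const medians = new Map<string, number>();
  for (const [host, durations] of byHost) {
    durations.sort((a, b) => a - b);
    medians.set(host, durations[Math.floor(durations.length / 2)]);
  }
  return medians;
}
```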

Some benefits of tracking Resource Timing:

  • See which hostnames  – and thus which classes of content – are slowing down your site.
  • Determine which resources impact your overall user experience.
  • Correlate resource performance with user experience.

Impact Resource Timing from Cedexis allows you to see how content sources are performing across various measurement types such as Duration, TCP Connection Time, and Round Trip Time. IRT reports also give you the ability to drill down further by Service Providers, Locations, ISPs, User Agent (device, browsers, OS) and other filters.

Check out our User Guide to learn more about our Measurement Type calculations.

There are two primary reports in this release of Impact Resource Timing: the Performance report, which gives you a trending view of resource timing over time, and the Statistical Distribution report, which presents Resource Timing data through a statistical distribution view. Both reports have highly dynamic reporting capabilities that allow you to easily pinpoint resource-related issues for further analysis.


Using the Performance report, you can isolate which grouped resources are causing potential end user experience issues, by hostname, page, or service provider, and see when the issue happened. Drill down even further to see whether an issue was global or localized to a specific location, or whether it affected only certain user devices or browsers.

IRT is now available for all in the Radar portal – take it for a spin and let us know your experiences!

Why The Web Is So Congested

If you live in a major city like London, Tokyo, or San Francisco, you learn one thing early: driving your car through the city center is about the slowest possible way to get around. Which is ironic, when you think about it, as cars only became popular because they made it possible to get around more quickly. There is, it seems, an inverse relationship between efficiency and popularity, at least when it comes to goods that pass through a public commons like roads.

Or like the Internet.

Think about all that lovely 4K video you could be consuming if there was nothing between you and your favorite VOD provider but a totally clear fiber optic cable. But unless you live in a highly over-provisioned location, that’s exactly not what’s going on; rather, you’re lucky to get a full HD picture, and even luckier if it stays at 1080p, without buffering, all the way through. Why? Because you’re sharing a public commons – the Internet – and its efficiency is being chewed away by popularity.

Let’s do some math to illustrate this:

  • Between 2013 and January 2017, the number of web users increased by 1.4 billion people to just over 3.7 billion. Today Internet penetration is at 50% (or put another way, half the world isn’t online yet)
  • In 2013, the average amount of Internet data per person was 7.9GB per month; by 2015 it was 9.9GB, with Cisco expecting it to reach over 25GB by 2020 – so assume something in the range of 15-17GB by 2017
  • Logically, then, web traffic in 2013 would have been around 2.3B users × 7.9GB per month (about 18.2 exabytes); by 2017 it would have been 3.7B × 17GB per month (about 62.9 exabytes)
  • If we assume another billion Internet users by 2020, we’re looking at 4.7B users × 25GB per month – or a full 117.5 exabytes

In just seven years, then, monthly web traffic will have grown more than sixfold (based on this math, anyway: Cisco estimates closer to 200 exabytes monthly by 2020).
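Spelled out as back-of-the-envelope arithmetic (the per-user figures are the rough estimates above, so treat these as order-of-magnitude numbers only):

```typescript
// Monthly traffic estimate: users (billions) x data per user (GB/month),
// expressed in exabytes (1 EB = 1e9 GB).
function monthlyExabytes(usersBillions: number, gbPerUserPerMonth: number): number {
  return (usersBillions * 1e9 * gbPerUserPerMonth) / 1e9;
}

console.log(monthlyExabytes(2.3, 7.9)); // 2013: ~18.2 EB/month
console.log(monthlyExabytes(3.7, 17));  // 2017: ~62.9 EB/month
console.log(monthlyExabytes(4.7, 25));  // 2020: ~117.5 EB/month
```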

And that is why the web is so busy.

But it doesn’t describe why the web is congested. Congestion happens when there is more traffic than transit space – which is why, as cities get larger and more populous, governments add lanes to major thoroughfares, meeting the automobile demand with road supply.

Unfortunately, unlike cars on roads, Internet traffic doesn’t travel in straight lines from point to point. So even though infrastructure providers have been building out capacity at a madcap pace, it’s not always connected in a way that makes transit efficient. And, unlike roads, digital connections are not built out of concrete, and often become unavailable – sometimes for long stretches that cause consternation and PR challenges, and sometimes just for a minute or so, stymying a relative handful of customers.

For information to get from A to B, it has to traverse any number of interconnected infrastructures, from ISPs to the backbone to CDNs, and beyond. Each is independently managed, meaning that no individual network administrator can guarantee smooth passage from beginning to end. And with all the traffic that has been – and will continue to be – added to the Internet, it has become essentially a guarantee that some portion of content requests will bump into transit problems along the way.

Let’s also note that the modern Internet is characterized less by cat memes, and more by the delivery of information, functionality, and ultimately, knowledge. Put another way, the Internet today is all about applications: whether represented as a tile on a smart phone home screen, or as a web interface, applications deliver the intelligence to take the sum total of all human knowledge that is somewhere on the web and turn it into something we can use. When you open social media, the app knows who you want to know about; when you consult your sports app, it knows which teams you want to know about first; when you check your financial app, it knows how to log you in from a fingerprint and which account details to show first. Every time that every app is asked to deliver any piece of knowledge, it is making requests across the Internet – and often multiple requests of multiple sources. Traffic congestion doesn’t just endanger the bitrate of your favorite sci fi series – it threatens the value of every app you use.

Which is why real-time predictive traffic routing is becoming a topic that web native businesses are digging deeper into. Think of it as Application Delivery for the web – a traffic cop that spots congestion and directs content around it, so that it’s as though it never happened. This is the only way to solve for efficient routing around a network of networks without a central administrator: assume that there will be periodic roadblocks, and simply prepare to take a different route.

The Internet is increasingly congested. But by re-directing traffic to the pathways that are fully available, it is possible to get around all those traffic jams. And, actually, it’s possible to do today.

Find out more by reading the story of how Rosetta Stone improved performance for over 60% of their worldwide customers.


Better OTT Quality At Lower Cost? That Would Be Video Voodoo

According to the CTA, streaming video now claims as many subscribers as traditional Pay TV. Another study, from the Leichtman Research Group, proposed that more households have streaming video than have a DVR. However accurate – or wonkily constructed – these statistics, what’s not up for grabs is that more people than ever are getting a big chunk of their video entertainment over the Web. Given the infamous AWS outage, this means that providers are constantly at risk of seeing their best-laid plans laid low by someone else’s poor typing skills.

Resiliency isn’t a nice-to-have, it’s a necessity. Services that were knocked out last week owing to AWS’ challenges were, to some degree, lucky: they may have lost out on direct revenue, but their reputations took no real hit, because the core outage was so broadly reported. In other words, everyone knew the culprit was AWS. But it turns out that outages happen all the time – smaller, shorter, more localized ones, which don’t draw the attention of the global media, and which don’t supply a scapegoat. In those circumstances, a CDN glitch is invisible to the consumer, and is therefore not considered: when the consumer’s video doesn’t work, only the publisher is available to take the blame.

It’s for this reason that many video publishers that are Cedexis customers first start to look at breaking from the one-CDN-to-rule-them-all strategy and diversifying their delivery infrastructure. As often as not, this starts with simply adding a second provider: not so much as an equal partner, but as a safety outlet and backup. Openmix intelligently directs traffic, using a combination of community data (the 6 billion measurements we collect from web users around the world each day) and synthetic data (e.g. New Relic and CDN records). All of a sudden, even though outages don’t stop happening, they do stop being noticeable, because they are simply routed around. Ops teams stop getting woken up in the middle of the night, Support teams stop getting sudden call spikes that overload the circuits, and PR teams stop having to work damage control.

But a funny thing happens once the outage distractions stop: there’s time to catch a breath and realize there’s more to this multi-CDN strategy than just solving a pain. When a video publisher can seamlessly route between more than one CDN, based on each CDN’s ability to serve customers at an acceptable quality level, there is a natural economic opportunity to choose the best-cost option – in real time. Publishers can balance traffic based simply on per-Gig pricing; ensure that commits are met, but not exceeded until every bit of pre-paid bandwidth throughout the network is exhausted; and distribute sudden spikes to avoid surge pricing. Openmix users have reported cost savings that reach low to mid double-digit percentages – while delivering a superior, more consistent, more reliable service to their users.
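As an illustration of that economic balancing act (the provider fields, quality thresholds, and prices below are made up, and this is not Openmix routing code), a simple selector might admit only CDNs that clear a quality bar, prefer those with unused commit, and then take the cheapest remaining option:

```typescript
interface CdnCandidate {
  name: string;
  availabilityPct: number;    // from RUM / synthetic measurements
  throughputKbps: number;
  pricePerGb: number;         // overage price
  commitRemainingGb: number;  // prepaid bandwidth not yet used this period
}

// Admit only CDNs that clear the quality bar, prefer those with unused commit,
// then take the cheapest remaining option.
export function pickCdn(candidates: CdnCandidate[]): CdnCandidate | undefined {
  const healthy = candidates.filter(
    (c) => c.availabilityPct >= 99.5 && c.throughputKbps >= 3000,
  );
  const withCommit = healthy.filter((c) => c.commitRemainingGb > 0);
  const pool = withCommit.length > 0 ? withCommit : healthy;
  return pool.sort((a, b) => a.pricePerGb - b.pricePerGb)[0];
}
```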

Call it Video Voodoo: it shouldn’t be possible to improve service reliability and reduce the cost of delivery…and yet, there it is. It turns out that eliminating a single point of failure introduces multiple points of efficiency. And, indeed, we’ve seen great results for companies that already have multiple CDN providers: simply avoiding overages on each CDN until all the commits are met can deliver returns that fundamentally change the economics of a streaming video service.

And changing the economics of streaming is fundamental to the next round of evolution in the industry. Netflix, the 800 pound gorilla, has turned over more than $20 billion in revenue the last three years, and generated less than half a billion in net margin, a 5% rate; Hulu (privately- and closely-held) is rumored to have racked up $1.8B in losses so far and still be generating red ink on some $2B in revenues. The bottom line is that delivering streaming video is expensive, for any number of reasons. Any engine that can measurably, predictably, and reliably eliminate cost is not just intriguing for streaming publishers – it is mandatory to at least explore.

Amazon Outage: The Aftermath

Amazon AWS S3 Storage Service had a major, widely reported, multi-hour outage yesterday in their US-East-1 region. The S3 service in this particular region was one of the very first services Amazon launched when it introduced cloud computing to the world more than 10 years ago. It has grown exponentially since, storing over a trillion objects and servicing a million requests per second in support of thousands of web properties (this article alone lists over 100 well-known properties that were impacted by this outage).

Amazon has today published a description of what happened. The summary is that this was caused by human error: one operator, following a published run book procedure, mis-typed a command parameter, setting a sequence of failure events in motion. The outage started at 9:37 am PST. The nearly complete S3 service outage lasted more than three hours, and full recovery of other S3-dependent AWS services took several hours more.

A few months ago, Dyn taught the industry that single-sourcing your authoritative DNS creates the risk the military describes as “two is one, one is none.” This S3 incident underscores the same lesson for object storage. No service tier is immune. If a website, content, service, or application is important, redundant alternative capability at all layers is essential. And this requires appropriate capabilities to monitor and manage that redundancy: after all, failover capacity is only as good as the system’s ability to detect the need to fail over, and then to actually do it. This has been at the heart of Cedexis’ vision since the beginning, and as we continue to expand our focus in streaming/video content and application delivery, this will continue to be an important and valuable theme as we seek to improve the Internet experience of every user around the world.

Even the very best, most experienced services can fail. And with increasing deconstruction of service-oriented architectures, the deeply nested dependencies between services may not always be apparent. (In this case, for example, the AWS status website had an underlying dependency on S3 and thus incorrectly reported the service at 100% health during most of the outage.)

We are dedicated to delivering data-driven, intelligent traffic management for redundant infrastructure of any type. Incidents like this should continue to remind the digital world that redundancy, automated failover, and a focus on the customer experience are fundamental to the task of delivering on the continued promise of the Internet.