More Than Science Fiction: Why We Need AI for Global Data Traffic Management

Originally published on the Product Design and Development website

by Josh Gray, Chief Architect, Cedexis


Blade Runner 2049 struck a deep chord with science fiction fans. Maybe it’s because there’s so much talk these days of artificial intelligence and automation — some of it doom and gloom, some of it utopian, with every shade of promise and peril in between. Many of us, having witnessed the Internet revolution first hand — and still in awe of the wholesale transformation of commerce, industry, and daily life — find ourselves pondering the shape of the future. How can the current pace of change and expansion be sustainable? Will there be a breaking point? What will it be: cyber (in)security, the death of net neutrality, or intractable bandwidth saturation?

Only one thing is certain: there will never be enough bandwidth. Our collectively insatiable need for streaming video, digital music and gaming, social media connectivity, plus all the cool stuff we haven’t even invented yet, will fill up whatever additional capacity we create. The reality is that there will always be buffering on video — we could run fiber everywhere, and we’d still find a way to fill it up with HD, then 4K, then 8K, and whatever comes next.

Just like we need smart traffic signals and smart cars in smart cities to handle the debilitating and dangerous growth of automobile traffic, we need intelligent apps and networks and management platforms to address the unrelenting surge of global Internet traffic. To keep up, global traffic management has to get smarter, even as capacity keeps growing.

Fortunately, we have Big Data metrics, crowd-sourced telemetry, algorithms, and machine learning to save us from breaking the Internet with our binge watching habits. But, as Isaac Asimov pointed out in his story Runaround, robots must be governed. Otherwise, we end up with a rogue operator like HAL, an overlord like Skynet, or (more realistically) the gibberish intelligence of the experimental Facebook chatbots. In the case of the chatbots, the researchers learned a valuable lesson about the importance of guiding and limiting parameters: they had neglected to specify use of recognizable language, so the independent bots invented their own.

In other words, AI is exciting and brimming over with possibilities, but needs guardrails if it is to maximize returns and minimize risks. We want it to work out all the best ways to improve our world (short of realizing that removing the human race could be the most effective pathway to extending the life expectancy of the rest of Nature).

It’s easy to get carried away by grand futuristic visions when we talk about AI. After all, some of our greatest innovators are actively debating the enormous dangers and possibilities. But let’s come back down to earth and talk about how AI can at least make the Internet work for the betterment of viewers and publishers alike, now and in the future.

We are already using basic AI to bring more control to the increasingly abstract and complex world of hybrid IT, multi-cloud, and advanced app and content delivery. What we need to focus on now is building better guardrails and establishing meaningful parameters that will reliably get our applications, content, and data where we want them to go without outages, slowdowns, or unexpected costs. Remember, AI doesn’t run in glorious isolation, unerring in the absence of continual course adjustment: this is a common misconception that leads to wasted effort and disappointing or possibly disastrous results. Even Amazon seems to have fallen prey to the set-it-and-forget-it mentality: ask yourself, how often does their shopping algorithm suggest the exact same item you purchased yesterday? Their AI parameters may need periodic adjustment to reliably suggest related or supplementary items instead.

For AI to be practically applied, we have to be sure we understand the intended consequences. This is essential from many perspectives: marketing, operations, finance, compliance, and business strategy. For instance, we almost certainly don’t want automated load balancing to always route traffic for the best user experience possible — that could be prohibitively expensive. Similarly, sometimes we need to route traffic from or through certain geographic regions in order to stay compliant with regulations. And we don’t want to simply send all the traffic to the closest, most available servers when users are already reporting that quality of experience (QoE) there is poor.

When it comes right down to it, the thing that makes global traffic management work is our ability to program the parameters and rules for decision-making — as it were, to build the guardrails that force the right outcomes. And those rules are entirely reliant upon the data that flows in. To get this right, systems need access to a troika of guardrails: real-time comprehensive metrics for server health, user experience health, and business health.
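
To make this concrete, here is a minimal, hedged sketch of how the three guardrails might combine into a single routing decision. The data shapes, thresholds, and function names are invented for illustration; this is not Cedexis product code.

    // Illustrative only: combine the three guardrails into one routing decision.
    // Each candidate is assumed to look like:
    //   { name, healthy, capacityUsed, rumLatencyMs, costPerGB }
    function chooseProvider(candidates, maxCostPerGB) {
      var eligible = candidates
        .filter(function (c) { return c.healthy; })                    // system guardrail: the server is up
        .filter(function (c) { return c.capacityUsed < 0.85; })        // system guardrail: not near its limits
        .filter(function (c) { return c.costPerGB <= maxCostPerGB; }); // business guardrail: within budget

      if (eligible.length === 0) { return null; } // nothing passes the guardrails: alert a human

      // experience guardrail: prefer the best measured real-user latency
      eligible.sort(function (a, b) { return a.rumLatencyMs - b.rumLatencyMs; });
      return eligible[0].name;
    }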

System Guardrails

Real-time systems health checks are the first element of the guardrail troika for intelligent traffic routing. Accurate, low-latency, geographically-dispersed synthetic monitoring answers the essential server availability question reliably and in real-time: is the server up and running at all?

Going beyond ‘On/Off’ confidence, we need to know the current health of those available servers. A system that is working fine right now may be approaching resource limits, and a simple On/Off measurement won’t know this. Without knowing the current state of resource usage, a system can cause so much traffic to flow to this near-capacity resource that it goes down, potentially setting off a cascading effect that takes down other working resources.

Without scriptable load balancing, you have to dedicate significant time to shifting resources around in the event of DDoS attacks, unexpected surges, launches, repairs, etc. — and problems mount quickly if someone takes down a resource for maintenance but forgets to make the proper notifications and preparations ahead of time. Dynamic global server load balancers (GSLBs) use real-time system health checks to detect potential problems, route around them, and send an alert before failure occurs so that you can address the root cause before it gets messy.
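
As a hedged illustration of the point above, the sketch below flags a server that still answers its health check but is close to its resource limits, so a load balancer can shift traffic away and raise an alert before an actual failure. The thresholds and field names are assumptions for the example, not product defaults.

    // Illustrative: go beyond 'On/Off' by weighing current resource usage.
    function classifyServer(server) {
      if (!server.respondingToHealthCheck) { return 'down'; }      // the simple On/Off check
      var load = Math.max(server.cpuUtilization, server.memoryUtilization);
      if (load > 0.9) { return 'draining'; }   // still up, but shed traffic and page someone
      if (load > 0.75) { return 'warning'; }   // keep serving, raise a pre-failure alert
      return 'healthy';
    }

    // A server at 92% CPU is routed around before it can trigger a cascading failure.
    classifyServer({ respondingToHealthCheck: true, cpuUtilization: 0.92, memoryUtilization: 0.4 }); // -> 'draining'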

Experience Guardrails

The next input to the guardrail troika is Real User Measurements (RUM), which provide information about Internet performance at every step between the client and the clouds, data centers, or CDNs hosting applications and content. Simply put, RUM is the critical measurement of the experience each user is having. As they say, the customer is always right, even when Ops says the server is working just fine. To develop true traffic intelligence, you have to go beyond your own system. This data should be crowd-sourced by collecting metrics from thousands of Autonomous System Numbers, delivering billions of RUM data points each day.

Community-sourced intelligence is necessary to see what’s really happening both at the edges of the network and in the big messy pools of users where your own visibility may be limited (e.g. countries with thousands of ISPs like Brazil, Russia, Canada, and Australia). Granular, timely, real user experience data is particularly important at a time when there are so many individual peering agreements and technical relationships, all of which could be the source of unpredictable performance and quality.

Business Guardrails

Together, system and experience data inform intelligent, automated decisions so that traffic is routed to servers that are up and running, demonstrably providing great service to end users, and not in danger of maxing out or failing. As long as everything is up and running and users are happy, we’re at least halfway home.

We’re also at the critical divide where careful planning to avoid unintended consequences comes into play. We absolutely must have the third element of the troika: business guardrails.

After all, we are running businesses. We have to consider more than bandwidth and raw performance: we need to optimize the AI parameters to take care of our bottom line and other obligations as well. If you can’t feed cost and resource usage data into your global load balancer, you won’t get traffic routing decisions that are as good for profit margins as they are for QoE. As happy as your customers may be today, their joy is likely to be short-lived if your business exhausts its capital reserves and resorts to cutting corners.

Beyond cost control, automated intelligence is increasingly being leveraged in business decisions around product life cycle optimization, resource planning, responsible energy use, and cloud vendor management. It’s time to put all your Big Data streams (e.g., software platforms, APM, NGINX, cloud monitoring, SLAs, and CDN APIs) to work producing stronger business results. Third party data, when combined with real-time systems and user measurements, creates boundless possibilities for delivering a powerful decisioning tool that can achieve almost any goal.

Conclusion

Decisions made out of context rarely produce optimal results, and then only by sheer luck. Most companies have developed their own special blend of business and performance priorities (and anyone who hasn’t, probably should). Automating an added control layer provides comprehensive, up-to-the-minute visibility and control, which helps any Ops team achieve cloud agility, performance, and scale, while staying in line with business objectives and budget constraints.

Simply find the GSLB with the right decisioning capabilities, as well as the capacity to ingest and use System, Experience, and Business data in real-time, then build the guardrails that optimize your environment for your unique needs.

When it comes to practical applications of AI, global traffic management is a great place to start. We have the data, we have the DevOps expertise, and we are developing the ability to set and fine-tune the parameters. Without it, we might break the Internet. That’s a doomsday scenario we all want to avoid, even those of us who love the darkest of dystopian science fiction.

About Josh Gray: Josh Gray has worked as a leader at various startups as well as in large enterprise settings such as Microsoft, where he was awarded multiple patents. As VP of Engineering for Home Comfort Zone, his team designed and developed systems that were featured in Popular Science, HGTV, and Ask This Old House, and won #1 Cool Product at introduction at the Pacific Coast Builders Show. Josh has been part of many other startups and built on his success by becoming an angel investor in the Portland community. He continues his run of success as Chief Architect at Cedexis.

 

Announcing Cedexis Netscope: Advanced Network Performance and Benchmarking Analysis

The Cedexis Radar community collects tens of billions of real user monitoring data points each day, giving Cedexis users unparalleled insight into how applications, videos, websites, and large file downloads are actually being experienced by their users. We’re excited to announce a product that offers a new lens into the Radar community dynamic data set: Cedexis Netscope.

Know how your service stacks up, down to the IP subnet
Metrics like network throughput, availability, and latency don’t tell the whole story of how your service is performing, because they are network-centric, not user-centric: however comprehensively you track network operations, what matters is the experience at the point of consumption. Cedexis Netscope provides you with additional user-centric context to assess your service, namely the ability to compare your service’s performance to the results of the “best” provider in your market. With up-to-date Anonymous Best comparative data, you’ll have a data-driven benchmark to use for network planning, marketing, and competitive analysis.

Highlight your Service Performance:

  • Relative to peers in your markets
  • In specific geographies
  • Compared with specific ISPs
  • Down to the IP subnet
  • Including both IPv4 and IPv6 addresses
  • Comprehensive data on latency or throughput
  • Covering both static and dynamic delivery

Actionable insights
Netscope provides detailed performance data that can be used to improve your service for end users. IT Ops teams can use automated or custom reports to view performance from your ASN versus peer groups in the geographies you serve. This lets you fully understand how you stack up versus the “best” service provider, using the same criteria. Real-time logs organized by ASN can be used to inform instant service repairs or for longer-term planning.

Powered by: the world’s largest user experience community
Real User Monitoring (RUM) means fully understanding how internet performance impacts customer satisfaction and engagement. Cedexis gathers RUM data from each step between the client and any of the clouds, data centers, and CDNs hosting your applications to build a holistic picture of internet health. Every request creates more data, continuously updating this unique real-time virtual map of the web.

Data and alerts, your way
To effectively evaluate your service and enable real-time troubleshooting, Netscope lets you roll up data by the ASN, country, region, or state level. You can zoom in within a specific ASN at the IP subnet level, to dissect the data in any way your business requires. This data will be stored in the cloud on an ongoing basis. Netscope also allows users to easily set up flexible network alerts for performance and latency deviations.

Netscope helps ISP Product Managers and Marketers better understand:

  • How well users connect to the major content distributors
  • How well users/businesses connect to public clouds (AWS, Google Cloud, Azure, etc.)
  • When, where, and how often outages and throughput issues happen
  • What happens during different times of day
  • Where the risks are during big events (FIFA World Cup, live events, video/content releases)
  • How service on mobile looks versus web
  • How the ISP stacks up vs. "the best" ISP in the region

Bring Advanced Network analysis to your network
Netscope provides a critical data set you need for your network planning and enhancement. With its real-time understanding of worldwide network health, Netscope gives you the context and actionable data you need to delight customers and increase your market share.

Ready to use this data with your team?

Set up a demo today

 

Software-Defined Application Delivery

The modern network isn’t all based in a static data center. Rather, it combines self-managed data centers, clouds for storage and computation, and content delivery networks for delivery. Where once a monolithic application delivery controller acted as the master decision-maker for all load balancing, today organizations use software-based ADC solutions, caching servers, and open-source local traffic managers like NGINX.

Which is all to say: love it or hate it, today we live in a world of hybrid infrastructures. However hard we may try, we cannot get away without combining the best of private and public infrastructure. Few, if any, modern technology services are served through all self-managed hardware (and even the behemoths building de facto private clouds and CDNs must often manage hardware that spans numerous co-locations and POPs, where the actual hardware is outside their physical control).

This creates a pair of challenges, one inward-facing, the other outward:

  • Outward: how can a provider ensure that each user receives a high quality of experience (QOE)? With a heterogeneous computation, storage, and delivery mechanism, the only way to find out whether they are doing so is to measure at the point of consumption – but the very complexity of the infrastructure makes it challenging to know what to do with that monitoring data
  • Inward: with so many locations (many of them virtual and essentially placeless), how can a provider ensure that each piece of the puzzle (from computation to content retrieval to delivery to the user) is working efficiently – or that each request for resources is efficiently and cost-effectively selected from the myriad possibilities?

At Cedexis, today we are announcing an approach that solves for both sides of this fundamental coin. We call it the Application Delivery Platform, and it is the culmination of millions of person-hours’ work creating both the world’s largest and most trusted real user measurement (RUM) community, and the only user-configurable, cloud-native global traffic management engine. Because when you combine the best of private and public infrastructure, you have to have a trusted third-party global traffic manager, which can objectively assess the right pathways for every bit that passes between your system and your users.

The key is turning Big Data into Real Action.

To operate a smooth hybrid infrastructure, three things are needed: comprehensive data; adaptable, dynamic, user-configurable algorithms; and a real-time decisions engine that turns the first two into automated adjustments to the flow of bits. The Cedexis Application Delivery Platform:

  • Combines any kind of data: we start, of course, with the RUM data collected at a rate of some 14 billion measurements per day in Radar. We then layer in synthetic monitoring data from our Sonar product. And then we’ll take into account any other data source you want to bring into the mix: just connect it to our Fusion ingestion engine, which comes with dozens of pre-built integrations. The outcome is the very definition of Big Data, brought together, and made available for use in seconds.
  • Delivers user-configurable algorithms: Traffic management is handled by what we call Openmix Applications, which are written in straight-up JavaScript, committed through the always-on Cedexis portal, and therefore live in minutes. Many customers start with one of our pre-built items (take a look at a few here in GitHub), but pretty quickly they get to building algorithms that make sense for their environment; a minimal illustrative sketch follows this list. Balancing QoE against cost, for instance, is a popular approach, as is integrating a private CDN (perhaps based on Varnish cache servers strategically located near audience clusters) with public infrastructure. The ability to get in and change things creates the control and flexibility to experiment, to add and remove providers, and to tune the network without having to constantly spin up complex IT projects.
  • Automates traffic management in real time: it’s all very well to have a comprehensive view of the Internet and its users, and to have an algorithm that works out how to circumvent congestion and select the cheapest resources – but it’s only useful insofar as it makes adjustments automatically, and in real time. The Openmix global traffic manager starts making decisions based on the latest Big Data it receives in around 7 seconds, and never stops iterating and reacting to the very real fluctuations that are part of communicating through the Internet.
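
For a flavor of what such an application can look like, here is a minimal sketch written in the style of the pre-built examples mentioned above. The handler and helper names are patterned on those examples but should be read as illustrative assumptions rather than a definitive reference for the Openmix API, and the provider aliases and hostnames are made up.

    // Illustrative sketch of an Openmix-style traffic management application.
    var providers = { 'cdn_a': 'a.example-cdn.com', 'cdn_b': 'b.example-cdn.com' };

    function init(config) {
      // declare the platforms this application is allowed to route to
      Object.keys(providers).forEach(function (alias) { config.requireProvider(alias); });
    }

    function onRequest(request, response) {
      // assumed shape of the real-user latency probe: { alias: { http_rtt: milliseconds } }
      var rtt = request.getProbe('http_rtt');
      var best = Object.keys(providers).sort(function (a, b) {
        return rtt[a].http_rtt - rtt[b].http_rtt;
      })[0];

      response.respond(best, providers[best]); // answer with the chosen platform's CNAME
      response.setTTL(20);                     // keep the DNS answer short-lived so decisions stay fresh
    }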

Everything you just read is real, proven, and available to you today: we’ve spent the last year working with some wonderful, innovative customers who have pushed us to build the cloud-native, real-time, automated global traffic manager they need to make hybrid infrastructure work for their businesses.

If you’d like to take a closer look, please let us know – we’d sure love to show you a demo.

 

Re-Writing Streaming Video Economics


The majority of Americans – make that the vast majority of American Millennials – stream video. Every day, in every way. From Netflix to Hulu, YouTube to Twitch, CBS to HBO, there is no TV experience that isn’t being accessed on a mobile phone, a tablet, a PC, or some kind of streaming device attached to a genuine, honest-to-goodness television.

The trouble is, we aren’t really paying for it: just 9% of a household’s video budget goes to streaming services, while the rest goes to all the usual suspects – cable companies, satellite providers, DVD distributors, and so forth. This can make turning a profit a tricky proposition – Netflix just started to churn out ‘material profits’, Hulu is suspected to be losing money, and Amazon is unlikely ever to break out profitability of its Prime video service from the other benefits of the program.

The challenge is there are really only so many levers that can be pulled to make streaming video profitable:

  1. Charge (more) for subscriptions: except that when the cost goes up, adoption goes down, and decelerating growth is anathema to a start-up business
  2. Spend less on (licensing/making/acquiring) content: except that if the content quality misses, audience growth will follow it
  3. Spend less on delivering the content: except that if the quality goes down, audiences will depart, never to be seen again

One and two are tricky, and rely upon the subjective skills of pricing and content acquisition experts. Number three though…maybe there’s something there that is available to everyone.

And indeed, there is. Most video traffic these days travels across Content Delivery Networks (CDNs), which do yeoman’s work caching popular content around the globe and do much of the heavy lifting in working out the quickest way to get content from publisher to consumer. Over the years, these vital members of the infrastructure have gradually improved and refined their craft, to the point where they are about as reliable as they can be.

That said, no Ops team ever likes to have a single point of failure, which is why almost all large-scale internet outfits contract with at least two – if not more – CDNs. And that’s where the opportunity arises: it’s almost a guarantee that with two contracts, there will be differences in pricing for particular circumstances. Perhaps there is a pre-commit with one, or a time-of-day discount on the other; perhaps they simply offer different per-Gb pricing in return for varying feature sets.

With Openmix, you can actually build an algorithm that doesn’t just eliminate outages, and doesn’t just ensure consistent quality: you can make decisions about where to send the traffic based on financial parameters, once you ensure that the quality isn’t going to drop.
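
A rough illustration of that kind of algorithm follows; the numbers and field names are invented for the example, and a real deployment would feed them from live RUM and contract data.

    // Illustrative: pick the cheapest CDN whose measured quality clears a floor.
    function pickCdn(cdns, maxLatencyMs) {
      var acceptable = cdns.filter(function (c) { return c.rumLatencyMs <= maxLatencyMs; });
      if (acceptable.length === 0) {
        // nobody clears the quality floor: fall back to the best experience regardless of price
        return cdns.sort(function (a, b) { return a.rumLatencyMs - b.rumLatencyMs; })[0];
      }
      return acceptable.sort(function (a, b) { return a.costPerGB - b.costPerGB; })[0];
    }

    pickCdn([
      { name: 'cdn_a', rumLatencyMs: 80, costPerGB: 0.010 },
      { name: 'cdn_b', rumLatencyMs: 95, costPerGB: 0.004 }  // e.g. a time-of-day discount
    ], 120); // -> cdn_b: quality is acceptable, so the cheaper contract wins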

All of a sudden you have access to pull one of the three levers – without triggering the nasty side effects that make each one a mixed blessing. You can reduce your cost, without putting your quality at risk – it’s a win/win.

We’d love to show you more about this, so if you’re at NAB this week, do stop by.

Live and Generally Available: Impact Resource Timing

We are very excited to be officially launching Impact Resource Timing (IRT) for general availability.

IRT is Impact’s powerful window into the performance of different sources of content for the pages in your website. For instance, you may want to distinguish the performance of your origin servers relative to cloud sources, or advertising partners; and by doing so, establish with confidence where any delays stem from. From here, you can dive into Resource Timing data sliced by various measurements over time, as well as through a statistical distribution view.

What is Resource Timing? Broadly speaking, resource timing measures latency within an application (i.e. the browser). It uses JavaScript as the primary mechanism to instrument various time-based metrics for all the resources requested and downloaded for a single website page by an end user. Individual resources are objects such as JS, CSS, images and other files that the website page requests. The faster the resources are requested and loaded on the page, the better the quality of experience (QoE) for users. By contrast, resources that cause longer latency can produce a negative QoE for users. By analyzing resource timing measurements, you can isolate the resources that may be causing degradation, so your organization can fix them.
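
For readers who want to see the raw data this kind of analysis builds on, the standard Resource Timing API can be queried directly in any modern browser; this is generic browser JavaScript, not Cedexis-specific instrumentation.

    // Generic browser example: inspect Resource Timing entries for the current page.
    performance.getEntriesByType('resource').forEach(function (entry) {
      console.log(
        entry.name,                          // URL of the resource (JS, CSS, image, ...)
        entry.initiatorType,                 // how it was requested: 'script', 'img', 'css', ...
        Math.round(entry.duration) + ' ms'   // time from start of the request to responseEnd
      );
    });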

Resource Timing Process:

Cedexis IRT makes it easy for you to track resources from identified sources – by domain (*.myDomain.com), by sub-domain (e.g. images.myDomain.com), or by the provider serving your content. In this way, you can quickly group together types of content, and identify the source of any latency. For instance, you might find that origin-located content is being delivered swiftly, while cloud-hosted images are slowing down the load time of your page; in such a situation, you would now be in a position to consider a range of solutions, including adding a secondary cloud provider and a global server load balancer to protect QoE for your users.

Some benefits of tracking Resource Timing:

  • See which hostnames – and thus which classes of content – are slowing down your site.
  • Determine which resources impact your overall user experience.
  • Correlate resource performance with user experience.

Impact Resource Timing from Cedexis allows you to see how content sources are performing across various measurement types such as Duration, TCP Connection Time, and Round Trip Time. IRT reports also give you the ability to drill down further by Service Providers, Locations, ISPs, User Agent (device, browsers, OS) and other filters.

Check out our User Guide to learn more about our Measurement Type calculations.

There are two primary reports in this release of Impact Resource Timing: the Performance report, which gives you a trending view of resource timing over time, and the Statistical Distribution report, which presents Resource Timing data through a statistical distribution view. Both reports have dynamic reporting capabilities that allow you to easily pinpoint resource-related issues for further analysis.


Using the Performance report, you can isolate which grouped resources are causing potential end-user experience issues – by hostname, page, or service provider – and when the issue happened. Drill down even further to see whether an issue was global or localized to a specific location, or whether it affected only certain user devices or browsers.

IRT is now available for all in the Radar portal – take it for a spin and let us know your experiences!

Better OTT Quality At Lower Cost? That Would Be Video Voodoo

According to the CTA, streaming video now claims as many subscribers as traditional Pay TV. Another study, from the Leichtman Research Group, proposed that more households have streaming video than have a DVR. However accurate – or wonkily constructed – these statistics, what’s not up for grabs is that more people than ever are getting a big chunk of their video entertainment over the Web. As the infamous AWS outage demonstrated, this means that providers are constantly at risk of seeing their best-laid plans laid low by someone else’s poor typing skills.

Resiliency isn’t a nice-to-have, it’s a necessity. Services that were knocked out last week owing to AWS’ challenges were, to some degree, lucky: they may have lost out on direct revenue, but their reputations took no real hit, because the core outage was so broadly reported. In other words, everyone knew the culprit was AWS. But it turns out that outages happen all the time – smaller, shorter, more localized ones, which don’t draw the attention of the global media, and which don’t supply a scapegoat. In those circumstances, a CDN glitch is invisible to the consumer, and is therefore not considered: when the consumer’s video doesn’t work, only the publisher is available to take the blame.

It’s for this reason that many video publishers that are Cedexis customers first start to look at breaking from the one-CDN-to-rule-them-all strategy, and look to diversify their delivery infrastructure. As often as not, this starts as simply adding a second provider: not so much as an equal partner, but as a safety outlet and backup. Openmix intelligently directs traffic, using a combination of community data (the 6 billion measurements we collect from web users around the world each day) and synthetic data (e.g. New Relic and CDN records). All of a sudden, even though outages don’t stop happening, they do stop being noticeable because they are simply routed around. Ops teams stop getting woken up in the middle of the night, Support teams stop getting sudden call spikes that overload the circuits, and PR teams stop having to work damage control.

But a funny thing happens once the outage distractions stop: there’s time to catch a breath, and realize there’s more to this multi-CDN strategy than just solving a pain. When a video publisher can seamlessly route between more than one CDN, based on its ability to serve customers at an acceptable quality level, there is a natural economic opportunity to choose the best-cost option – in real time. Publishers can balance traffic based simply on per-Gig pricing; ensure that commits are met, but not exceeded until every bit of pre-paid bandwidth throughout the network is exhausted; and distribute sudden spikes to avoid surge pricing. Openmix users have reported seeing cost savings that reach low to mid double-digit percentages – while they are delivering a superior, more consistent, more reliable service to their users.
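
A hedged sketch of the commit-aware part of that logic is below; the contract figures and field names are made up for illustration.

    // Illustrative: prefer CDNs that still have pre-paid commit left, to avoid overage pricing.
    function pickByCommit(cdns) {
      var withCommitLeft = cdns.filter(function (c) { return c.deliveredGB < c.committedGB; });
      if (withCommitLeft.length === 0) {
        // every commit is exhausted: minimize overage charges instead
        return cdns.sort(function (a, b) { return a.overageCostPerGB - b.overageCostPerGB; })[0];
      }
      // otherwise burn down the commit with the most headroom first
      return withCommitLeft.sort(function (a, b) {
        return (b.committedGB - b.deliveredGB) - (a.committedGB - a.deliveredGB);
      })[0];
    }

    pickByCommit([
      { name: 'cdn_a', committedGB: 500000, deliveredGB: 498000, overageCostPerGB: 0.012 },
      { name: 'cdn_b', committedGB: 300000, deliveredGB: 120000, overageCostPerGB: 0.008 }
    ]); // -> cdn_b: its commit still has plenty of headroom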

Call it Video Voodoo: it shouldn’t be possible to improve service reliability and reduce the cost of delivery…and yet, there it is. It turns out that eliminating a single point of failure introduces multiple points of efficiency. And, indeed, we’ve seen great results for companies that already have multiple CDN providers: simply avoiding overages on each CDN until all the commits are met can deliver returns that fundamentally change the economics of a streaming video service.

And changing the economics of streaming is fundamental to the next round of evolution in the industry. Netflix, the 800-pound gorilla, has turned over more than $20 billion in revenue over the last three years, and generated less than half a billion in net margin, a 5% rate; Hulu (privately and closely held) is rumored to have racked up $1.8B in losses so far and still be generating red ink on some $2B in revenues. The bottom line is that delivering streaming video is expensive, for any number of reasons. Any engine that can measurably, predictably, and reliably eliminate cost is not just intriguing for streaming publishers – it is mandatory to at least explore.

Amazon Outage: The Aftermath

Amazon’s AWS S3 storage service had a major, widely reported, multi-hour outage yesterday in its US-East-1 data center. The S3 service in this particular data center was one of the very first services Amazon launched when it introduced cloud computing to the world more than 10 years ago. It has grown exponentially since, storing over a trillion objects and servicing a million requests per second, supporting thousands of web properties (this article alone lists over 100 well-known properties that were impacted by this outage).

Amazon has today published a description of what happened. The summary is that this was caused by human error. One operator, following a published runbook procedure, mistyped a command parameter, setting a sequence of failure events in motion. The outage started at 9:37 am PST. A nearly complete S3 service outage lasted more than three hours, and full recovery of other S3-dependent AWS services took several hours more.

A few months ago, Dyn taught the industry that single-sourcing your authoritative DNS creates the risk the military describes as "two is one, one is none." This S3 incident underscores the same lesson for object storage. No service tier is immune. If a website, content, service, or application is important, redundant alternative capability at all layers is essential. And this requires appropriate capabilities to monitor and manage that redundancy. After all, fail-over capacity is only as good as the system’s ability to detect the need to fail over – and then to actually do it. This has been at the heart of Cedexis’ vision since the beginning, and as we continue to expand our focus in streaming/video content and application delivery, this will continue to be an important and valuable theme as we seek to improve the Internet experience of every user around the world.

Even the very best, most experienced services can fail. And with increasing deconstruction of service-oriented architectures, the deeply nested dependencies between services may not always be apparent. (In this case, for example, the AWS status website had an underlying dependency on S3 and thus incorrectly reported the service at 100% health during most of the outage.)

We are dedicated to delivering data-driven, intelligent traffic management for redundant infrastructure of any type. Incidents like this should continue to remind the digital world that redundancy, automated failover, and a focus on the customer experience are fundamental to the task of delivering on the continued promise of the Internet.

How To Deliver Content for Free!

OK, fine, not for free per se, but using bandwidth that you’ve already paid for.

Now, the uninitiated might ask what’s the big deal – isn’t bandwidth essentially free at this point? And they’d have a point – the cost per Gigabyte of traffic moved across the Internet has dropped like a rock, consistently, for as long as anyone can remember. In fact, Dan Rayburn reported in 2016 seeing prices as low as ¼ of a penny per gigabyte. Sounds like a negligible cost, right?

As it turns out, no. As time has passed, the amount of traffic passing through the Internet has grown. This is particularly true for those delivering streaming video: consumers now turn up their noses at sub-broadcast-quality resolutions, and expect at least an HD stream. To put this into context, moving from HD as a standard to 4K (which keeps threatening to take over) would result in the amount of traffic quadrupling. So while CDN prices per gigabyte might drop 25% or so each year, a publisher delivering 400% of the traffic is still looking at an increasingly large delivery bill.
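
A quick back-of-the-envelope calculation, using the illustrative figures from the paragraph above, shows why the bill keeps growing even as unit prices fall.

    // Back-of-the-envelope: HD -> 4K roughly quadruples traffic, while per-GB pricing drops ~25%.
    var currentBill = 1.0;                  // normalize today's delivery bill to 1
    var nextBill = currentBill * 4 * 0.75;  // 4x the gigabytes at 75% of the old unit price
    console.log(nextBill);                  // -> 3: the delivery bill still triples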

It’s also worth pointing out that the cost of delivery, relative to delivering video through a traditional network such as cable or satellite, is surprisingly high. An analysis by Redshift for the BBC clearly identifies the likely reality that, regardless of the ongoing reduction in per-terabyte pricing, “IP service development spend is likely to increase as [the BBC] faces pressure to innovate”, meaning that online viewers will be consuming more than their fair share of the pie.

Take back control of your content…and your costs

So, the price of delivery is out of alignment with viewership, and is increasing in practical terms. What’s a streaming video provider to do?

Allow us to introduce Varnish Extend, a solution combining the powerful Varnish caching engine, which is already part of delivering 25% of the world’s websites, with Openmix, the real-time, user-driven predictive load balancing system that uses billions of user measurements a day to direct traffic to the best pathway.

Cedexis and Varnish have both found that the move to the cloud left a lot of broadcasters as well as OTT providers with unused bandwidth available on premise. By making it easy to transform an existing data center into a private CDN point of presence (PoP), Varnish Extend empowers companies to make the most of all the bandwidth they have already paid for, by setting up Varnish nodes on premise, or on cloud instances that offer lower operational costs than CDN bandwidth.

This is especially valuable for broadcasters/service providers whose service is limited to one country: the global coverage of a CDN may be overkill when the same quality of experience can be delivered by simply establishing PoPs in strategic locations in-country.

Unlike committing to an all-CDN environment, using a private CDN infrastructure like Varnish Extend supports scaling to meet business needs – costs are based on server instances and decisions, not on the amount of traffic delivered. So as consumer demands grow, pushing for greater quality, the additional traffic doesn’t push delivery costs over the edge of sanity.

A global server load balancer like Openmix automatically checks available bandwidth on each Varnish node as well as each CDN, along with each platform’s performance in real-time. Openmix also uses information from the Radar real user measurement community to understand the state of the Internet worldwide and make smart routing decisions.

Your own private CDN – in a matter of hours

Understanding the health of both the private CDN and the broader Internet makes it a snap to dynamically switch end-users between Varnish nodes and CDNs, ensuring that cost containment doesn’t come at the expense of customer experience – simply establish a baseline of acceptable quality, then allow Openmix to direct traffic to the most cost-effective route that will still deliver on quality.
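
A hedged sketch of that decision logic is below, assuming simple fields for bandwidth headroom and measured quality; the names and thresholds are invented for illustration.

    // Illustrative: use the bandwidth you have already paid for first, overflow to a CDN,
    // and never send traffic to a node that falls below the quality baseline.
    function routeRequest(varnishNodes, cdns, minQualityScore) {
      var usableNodes = varnishNodes.filter(function (n) {
        return n.qualityScore >= minQualityScore &&
               n.bandwidthUsed < n.bandwidthCapacity * 0.8; // keep some headroom on each PoP
      });
      if (usableNodes.length > 0) {
        // owned bandwidth is available: keep the traffic on the private CDN
        return usableNodes.sort(function (a, b) { return b.qualityScore - a.qualityScore; })[0];
      }
      // private PoPs are saturated or degraded: overflow to the best-performing public CDN
      return cdns.sort(function (a, b) { return b.qualityScore - a.qualityScore; })[0];
    }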

Implementing Varnish Extend is surprisingly simple (some customers have implemented their private CDN in as little as four hours):

  1. Deploy Varnish Plus nodes within an existing data center or on a public cloud.
  2. Configure Cedexis Openmix to leverage these nodes as well as existing CDNs.
  3. Result: End-users are automatically routed to the best delivery node based on performance, costs, etc.

Learn in detail how to implement Varnish Extend

Sign up for Varnish Software – Cedexis Summit in NYC


Mobile Video is Devouring the Internet

In late 2009 – fully two years after the introduction of the extraordinary Apple iPhone – mobile was barely discernible on any measurement of total Internet traffic. By late 2016, it finally exceeded desktop traffic volume. In a terrifyingly short period of time, mobile Internet consumption moved from an also-ran to a behemoth, leaving behind the husks of marketing recommendations to “move to Web 2.0” and to “design for Mobile First”. And along the way, Apple encouraged us to buy into the concept that the future (of TV at least) is apps.

Unsurprisingly, the key driver of all this traffic is – as it always is – video. One in every three mobile device owners watches videos of at least 5 minutes’ duration, which is generally considered the point at which the user has moved from short-form, likely user-generated, content to premium video (think: TV shows and movies). And once viewers pass the 5-minute mark, it’s a tiny step to full-length, studio-developed content, which is a crazy bandwidth hog. Consider that video is expected to represent fully 75% of all mobile traffic by 2020 – when it was just 55% in 2015.


As consumers get more interested in video, producers aren’t slowing down. By 2020, it is estimated that it would take an individual fully 5 million years to watch the video being published and made available in just a month. And while consumer demand varies around the world – 72% of Thailand’s mobile traffic is video, for instance, versus just 41% in the United States – the reality is that, without some help, the mobile Web is going to be straining under the weight of near-unlimited video consumption.

What we know is that, hungry as they are for content, streaming video consumers are fickle and impatient. Akamai demonstrated years ago the 2-second rule: if a requested piece of content isn’t available in under 2 seconds, Internet users simply move on to the next thing. And numerous studies have shown definitively that when re-buffering (the dreaded pause in playback while the viewing device downloads the next section of the video) exceeds just 1% of viewing time, audience engagement collapses, resulting in dwindling opportunities to monetize content that was expensive to acquire, and can be equally costly to deliver.
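
For reference, one common way to express that rebuffering figure is the share of total watch time spent stalled; a minimal version of the calculation:

    // Rebuffering ratio: time spent stalled as a fraction of total viewing time.
    function rebufferingRatio(stalledSeconds, playingSeconds) {
      return stalledSeconds / (stalledSeconds + playingSeconds);
    }

    rebufferingRatio(12, 1800); // ~0.0066, about 0.7%, still under the 1% danger line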

How big of a problem is network congestion? It’s true that big, public, embarrassing outages across CDNs or ISPs are now quite rare. However, when we studied the network patterns of one of our customers, we found that what we call micro-outages (outages lasting 5 minutes or less) happen literally hundreds to thousands of times a day. That single customer was looking at some 600,000 minutes of direct lost viewing time per month – and when you consider how long each customer might have stayed, and their decreased inclination to return in the future, that number likely translates to several million minutes of indirect losses.

While mobile viewers are more likely to watch their content through an app (48% of all mobile Internet users) than a browser (18%), they still receive the content through the chaotic maelstrom of a network that is the Internet. As such, providers have to work out the best pathways to use to get the content there, and to ensure that the stream will have consistency over time so that it doesn’t fall prey to the buffering bug.

Most providers use stats and analysis to work out the right pathways – so they can look at how various CDN/ISP combos are working, and pick the one that is delivering the best experience. Strikingly, though, they often have to make routing decisions for audience members who are in geographical locations that aren’t currently in play, which means choosing a pathway without any recent input on which is going to be the best pathway – this is literally gambling with the experience of each viewer. What is needed is something predictive: something that will help the provider to know the right pathway the first time they have to choose.

This is where the Radar Community comes in: by monitoring, tracking, and analyzing the activity of billions of Internet interactions every day, the community knows which pathways are at peak health, and which need a bit of a breather before getting back to full speed. So, when using Openmix to intelligently route traffic, the Radar community data provides the confidence that every decision is based on real-time, real-user data – even when, for a given provider, they are delivering to a location that has been sitting dormant.

Mobile video is devouring the Web, and will continue to do so, as consumers prefer their content to move, dance, and sing. Predictively re-routing traffic in real-time so that it circumvents the thousands of micro-outages that plague the Internet every day means never gambling with the experience of users, staying ahead of the challenges that congestion can bring, and building the sustainable businesses that will dominate the new world of streaming video.