Posts

Mobile Video is Devouring the Internet

In late 2009 – fully two years after the introduction of the extraordinary Apple iPhone – mobile was barely discernible on any measurement of total Internet traffic. By late 2016, it finally exceeded desktop traffic volume. In a terrifyingly short period of time, mobile Internet consumption moved from an also-ran to a behemoth, leaving behind the husks of marketing recommendations to “move to Web 2.0” and to “design for Mobile First”. And along the way, Apple encouraged us to buy into the concept that the future (of TV at least) is apps.

Unsurprisingly, the key driver of all this traffic is – as it always is – video. One in every three mobile device owners watches videos of at least 5 minutes’ duration, which is generally considered the point at which the user has moved from short-form, likely user-generated, content, to premium video (think: TV shows and movies). And once viewers pass the 5-minute mark, it’s a tiny step to full-length, studio-developed content, which is a crazy bandwidth hog. Consider that video is expected to represent fully 75% of all mobile traffic by 2020 – when it was just 55% in 2015.


As consumers get more interested in video, producers aren’t slowing down. By 2020, it is estimated that it would take an individual fully 5 million years to watch the video being published and made available in just a month. And while consumer demand varies around the world – 72% of Thailand’s mobile traffic is video, for instance, versus just 41% in the United States – the reality is that, without some help, the mobile Web is going to be straining under the weight of near-unlimited video consumption.

What we know is that, hungry as they are for content, streaming video consumers are fickle and impatient. Akamai demonstrated years ago the 2-second rule: if a requested piece of content isn’t available in under 2 seconds, Internet users simply move on to the next thing. And numerous studies have shown definitively that when re-buffering (the dreaded pause in playback while the viewing device downloads the next section of the video) exceeds just 1% of viewing time, audience engagement collapses, resulting in dwindling opportunities to monetize content that was expensive to acquire, and can be equally costly to deliver.
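To make the 1% figure concrete, here is a minimal sketch of how a re-buffering ratio might be computed for one session. The function name and the session numbers are hypothetical, and the sketch assumes that "viewing time" means total session time including stalls; other definitions are possible.

```python
def rebuffer_ratio(stall_seconds: float, viewing_seconds: float) -> float:
    """Fraction of viewing time spent stalled waiting for the next chunk."""
    if viewing_seconds <= 0:
        raise ValueError("viewing_seconds must be positive")
    return stall_seconds / viewing_seconds


# Hypothetical session: 30 minutes of viewing with 27 seconds of stalls.
ratio = rebuffer_ratio(stall_seconds=27, viewing_seconds=30 * 60)
exceeds_threshold = ratio > 0.01  # the 1% level cited above
```

In this made-up session, less than half a minute of stalling in a half-hour show is already enough to cross the threshold.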

How big of a problem is network congestion? It’s true that big, public, embarrassing outages across CDNs or ISPs are now quite rare. However, when we studied the network patterns of one of our customers, we found that what we call micro-outages (outages lasting 5 minutes or less) happen literally hundreds to thousands of times a day. That single customer was looking at some 600,000 minutes of direct lost viewing time per month – and when you consider how long each customer might have stayed, and their decreased inclination to return in the future, that number likely translates to several million minutes of indirectly lost viewing time.

While mobile viewers are more likely to watch their content through an app (48% of all mobile Internet users) than a browser (18%), they still receive the content through the chaotic maelstrom of a network that is the Internet. As such, providers have to work out the best pathways to use to get the content there, and to ensure that the stream will have consistency over time so that it doesn’t fall prey to the buffering bug.

Most providers use stats and analysis to work out the right pathways – so they can look at how various CDN/ISP combos are working, and pick the one that is delivering the best experience. Strikingly, though, they often have to make routing decisions for audience members who are in geographical locations that aren’t currently in play, which means choosing a pathway without any recent input on which is going to be the best pathway – this is literally gambling with the experience of each viewer. What is needed is something predictive: something that will help the provider to know the right pathway the first time they have to choose.

This is where the Radar Community comes in: by monitoring, tracking, and analyzing the activity of billions of Internet interactions every day, the community knows which pathways are at peak health, and which need a bit of a breather before getting back to full speed. So, when using Openmix to intelligently route traffic, the Radar community data provides the confidence that every decision is based on real-time, real-user data – even when, for a given provider, they are delivering to a location that has been sitting dormant.
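The idea of falling back to community data when a provider has no recent measurements of its own can be sketched roughly as follows. This is an illustration only, not how Openmix is implemented; the function name, the pathway names, and the latency values are all assumptions.

```python
from typing import Optional


def pick_pathway(local_scores: dict[str, float],
                 community_scores: dict[str, float]) -> Optional[str]:
    """Return the pathway with the lowest latency score.

    Prefer the provider's own recent measurements; if there are none for
    this location, fall back to community-wide measurements.
    """
    scores = local_scores or community_scores
    if not scores:
        return None
    return min(scores, key=scores.get)
```

For a dormant location, `pick_pathway({}, {"cdn_a": 80.0, "cdn_b": 65.0})` would select `cdn_b` from community data alone, instead of guessing.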

Mobile video is devouring the Web, and will continue to do so, as consumers prefer their content to move, dance, and sing. Predictively re-routing traffic in real-time so that it circumvents the thousands of micro-outages that plague the Internet every day means never gambling with the experience of users, staying ahead of the challenges that congestion can bring, and building the sustainable businesses that will dominate the new world of streaming video.

How to Make Cloud Pay Its Own Way

Rightscale came out with a wonderful report on the state of the cloud industry, and we learned some important new things:

  • 77% of organizations are at least exploring private cloud implementations
  • 82% of enterprises are executing a hybrid cloud strategy
  • 26% of respondents are now listing cost as a significant challenge – ironically, given the importance of cost-cutting in the early growth of cloud services

The growth in hybrid cloud adoption is particularly striking: by Rightscale’s count, only 6% of companies are exclusively looking at private cloud, 18% are exclusively looking at public cloud, while a full 71% have a toe dipped into each pool.

Meanwhile, Cisco estimates that two thirds of all Internet traffic will traverse at least one content delivery network by 2020 – which tends to imply that most organizations are, right now, invested in getting the most out of some combination of private cloud, public cloud, CDN, and, presumably, physically-managed data center.

Fundamentally, there are a few core ways that we see organizations using this market basket of delivery pathways – and, naturally, our Openmix global server load balancer – to better serve their customers, and to protect their economics as apparently insatiable demand grows. The core strategies are:

  1. Balance CDNs, offload to origin. For web-centric businesses, delivering content across the Internet is fundamental to their success (possibly their survival), so they tend to rely upon one or more CDNs to get content to their users effectively. Over time, they tend to expand the number of CDN relationships, in order to improve quality across geographies, and to make the most of pricing differences between providers. Once they get this set to equilibrium, they discover that there is unused capacity at origin (or within a private or public cloud instance) to which they can offload traffic, maximizing the return they get on committed capacity, and minimizing unnecessary spend.
  2. Balance clouds, offload to CDN. For businesses that are highly geographically-focused, it is often more effective to create what is essentially a self-managed CDN, establishing PoPs through cloud providers in population centers where their customers actually originate. Even the most robust internally-managed system, however, is subject to traffic spikes that are way beyond expectations (and committed throughput limits), and so these companies build relationships with CDNs in which excess traffic is offloaded at peak times.
  3. Balance Hybrid Cloud. Organizations at the far right of Rightscale’s cloud maturity scale (in their words, the Cloud Explorers and Cloud Focused) are starting to view each of the delivery options not as wildly distinct options, but merely as similar-if-different-looking cogs in the machine. As such, they look at load and cost balancing through a pragmatic prism, in which each user is simply served through the lowest cost provider, so long as it can pass a pre-defined quality bar (a specified latency rate, for instance, or a throughput level). By shifting the mindset away from ‘primary’ and ‘offload’ networks, organizations are able to build strategies that optimize for both cost and quality.
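The third strategy, serving each user through the lowest-cost provider that clears a quality bar, can be sketched in a few lines. The `Provider` class, the provider names, and every latency and price below are hypothetical, and falling back to the fastest provider when nothing clears the bar is one possible design choice, not a prescribed one.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    latency_ms: float   # recent measured latency (hypothetical figure)
    cost_per_gb: float  # current unit price (hypothetical figure)


def cheapest_acceptable(providers, max_latency_ms):
    """Pick the lowest-cost provider whose latency passes the quality bar."""
    acceptable = [p for p in providers if p.latency_ms <= max_latency_ms]
    if not acceptable:
        # Nothing passes the bar: fall back to the best-performing provider.
        return min(providers, key=lambda p: p.latency_ms)
    return min(acceptable, key=lambda p: p.cost_per_gb)


providers = [
    Provider("cdn", 45.0, 0.040),
    Provider("public_cloud", 70.0, 0.025),
    Provider("origin", 120.0, 0.010),
]
choice = cheapest_acceptable(providers, max_latency_ms=80.0)
```

With an 80 ms bar, the origin is cheapest but too slow, so the public cloud instance is chosen over the more expensive CDN.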

Of course, to balance traffic across a heterogeneous set of delivery networks (and provider types), while adjusting for a combination of both economic and quality of service metrics, requires three things:

  1. Real-time visibility of the state of the Internet beyond the view of the individual publisher, in order to be able to evaluate Quality of Service levels prior to selecting a delivery provider
  2. Real-time visibility into the current economic situation with each contracted provider: which offers the lowest cost option, based on unit pricing, contract commitments, and so forth
  3. Real-time traffic routing, which takes the data inputs, compares them to the unique requirements of the requesting publisher, and seamlessly directs traffic along the right pathway

Not an easy recipe, perhaps, but when found, it results in the opportunity to apply sophisticated algorithms to delivery – in effect to exercise a Wall Street-level arbitrage approach, which results in a combination of delighted customers, and reduced infrastructure costs.

Or, put another way, the opportunity to make your hybrid cloud strategy pay for itself – and more.

To find out more about real-time predictive traffic routing, please take a look around our Openmix pages,  read about how to deliver 100% availability with a Hybrid CDN architecture, and visit our Github repository to see how easy it is to build your own real-time load balancing algorithm.

Make Mobile Video Stunning with Smart Load Balancing

If there’s one thing about which there is never an argument it’s this: streaming video consumers never want to be reminded that they’re on the Internet. They want their content to start quickly, play smoothly and uninterrupted, and be visually indistinguishable from traditional TV and movies. Meanwhile, the majority of consumers in the USA (and likely a similar proportion worldwide) prefer to consume their video on mobile devices. And as if that wasn’t challenging enough, there are now suggestions that live video consumption will grow – according to Variety by as much as 39 times! That seems crazy until you consider that Cisco predicted video would represent 82% of all consumer Internet traffic by 2020.

It’s no surprise that congestion can result in diminished viewing quality, leading over 50% of all consumers to, at some point, experience buffer rage from the frustration of not being able to play their show.

Here’s what’s crazy: there’s tons of bandwidth out there – but it’s stunningly hard to control.

The Internet is a best-efforts environment, over which even the most effective Ops teams can wield only so much control, because so much of it is either resident with another team, or is simply somewhere in the amorphous ‘cloud’.  While many savvy teams have sought to solve the problem by working with a Content Delivery Network (CDN), the sheer growth in traffic has meant that some CDNs are now dealing with as much traffic as the whole Internet transferred just a few years ago…and are themselves now subject to their own congestion and outage challenges. For this reason, plenty of organizations now contract with multiple CDNs, as well as placing their own virtual caching servers in public clouds, and even deploying their own bare-metal CDNs in data centers where their audiences are centered.

With all these great options for delivering content, Ops teams must make real-time decisions on how to balance the traffic across them all. The classic approaches to load balancing have been (with many thanks to Nginx):

  • Availability – Any servers that cannot be reached are automatically removed from the list of options (this prevents total link failure).
  • Round Robin – Requests are distributed across the group of servers sequentially.
  • Least Connections – A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections.
  • IP Hash – The IP address of the client is used to determine which server receives the request.
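For readers who have not seen these approaches written down, here is a minimal sketch of three of them. The server names are placeholders, and the least-connections version ignores the capacity weighting mentioned above, so treat it as a simplification.

```python
import hashlib
from itertools import cycle

servers = ["s1", "s2", "s3"]  # hypothetical server names

# Round robin: hand out servers sequentially, wrapping around.
_rr = cycle(servers)


def round_robin() -> str:
    return next(_rr)


# Least connections: pick the server with the fewest active connections
# (unweighted; server capacity is not taken into account here).
def least_connections(active: dict[str, int]) -> str:
    return min(active, key=active.get)


# IP hash: the same client IP always maps to the same server.
def ip_hash(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

None of these functions takes any measurement of what the end user actually experienced as an input.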

You might notice something each of those has in common: they all focus on the health of the system, not on the quality of the experience actually being had by the end user. Anything that balances based on availability tends to be driven by what is known as synthetic monitoring, which is essentially one computer checking another computer is available.

But we all know that just because a service is available doesn’t mean that it is performing to consumer expectations.

That’s why the new generation of Global Server Load Balancer (GSLB) solutions goes a step further. Today’s GSLB uses a range of inputs, including:

  • Synthetic monitoring – to ensure servers are still up and running
  • Community Real User Measurements – a range of inputs from actual customers of a broad range of providers, aggregated, and used to create a virtual map of the Internet
  • Local Real User Measurements – inputs from actual customers of the provider’s own service
  • Integrated 3rd party measurements – including cost bases and total traffic delivered for individual delivery partners, used to balance traffic based not just on quality, but also on cost

Combined, these data sources allow video streaming companies not only to guarantee availability, but also to tune their total network for quality, and to optimize within that for cost. Or put another way – streaming video providers can now confidently deliver the quality of experience consumers expect and demand, without breaking the bank to do it.

When you know that you are running across the delivery pathway with the highest quality metrics, at the lowest cost, based on the actual experience of your users – that’s a stunning result. And it’s only possible with smart load balancing, combining traditional synthetic monitoring with the real-time feedback of users around the world, and the 3rd party data you use to run your business.

If you’d like to find out more about smart load balancing, keep looking around our site. And if you’re going to be at Mobile World Congress at the end of the month, make an appointment to meet with us there so we can show you smart load balancing in real life.

How To Prevent Network Fails in The Gaming Space

When two Las Vegas Strip casinos lost power in early January owing to high winds, it represented a perfect metaphor for how much gaming businesses rely on something that is out of their direct control: the Internet.

Gaming continues to rely heavily on both large file downloads (for games sent to gaming consoles, for instance), and synchronous or near-synchronous communications (to enable multi-player action). When some element of the Internet goes down, or becomes so congested as to feel like it’s not working, the whole gaming experience can fall flat on its face – despite the provider having done everything in their power to guarantee a great experience. Meanwhile, the cost to provide great service continues to rise.

Well, nearly everything.

The Internet has evolved, and, while CDNs offer great value in improving the consumer experience, it simply isn’t possible to serve a global audience with just a single CDN partner. Many Cedexis customers have as many as six to ten CDNs, serving specific customer segments. By contrast, in our first conversations with customers, they often tell us that they are experiencing frequent outages and slowdowns, despite working with some of the best CDN providers in the world.

Here’s a number for you: 303. That’s the number of extra hours of downtime customers in Russia suffer using the 10th highest ranked CDN versus using a combination of providers, balanced with Cedexis Openmix (you can take a look at this by heading to our CDN and Cloud Performance Reports page).

Here are five specific hints for avoiding network fails:

  1. Know Your Experience: it’s easy to get caught up in server load, packet loss, and other technical terms – but it’s the human experience your players receive that defines their allegiance to your service. Your best indication that you’re meeting and/or exceeding customer expectations is by using Real User Measurements (RUM). Knowing what your players expect is a necessary data point for building something better.
  2. Know Your Calendar: every app that needs downloads will have scheduled updates. On those days, bandwidth needs will inevitably be higher, and even a distributed infrastructure that has been working fine up until now will be put to the test. If you have an upcoming release, this is the perfect time to bring a new CDN or two into the fold, and validate the impact of having extra partners to share the load.
  3. Know Your Location: every business needs to expand geographically – but every CDN isn’t equally robust in every location. Use a tool like Radar to evaluate your current partners’ results in new geographies – and take the opportunity to work with a local partner, who may be able to deliver better results at lower prices for a defined audience set.
  4. Know Your Capacity: many companies overprovision their datacenters, and actually have computing power and bandwidth to spare. If yours is one of those companies, consider introducing your own modest DIY CDN – that way you can get the most out of the technology you already have.
  5. Know Your Numbers: every penny spent on delivery is a penny unavailable for other purposes. Look at your delivery costs, and ask whether there aren’t economic efficiencies to be found by working with more providers – lower base prices, say, or the option to offload peak traffic to avoid the always-maddening burst charges.

For more hints, explore the Cedexis website, or drop us a line.

And don’t forget to meet us at ICE Totally Gaming in London February 7 – 9.

What Can Metrics Tell Us About Internet Video Delivery?

Over the last year or so, we’ve been working with some innovative streaming video leaders to collect and analyze the Quality of Experience (QoE) their consumers have been receiving. Using the results of several billion streams, we can start to see some fascinating trends emerge.

This data was collected through an updated (and still free!) Radar Community tag, which collected video-specific QoE metrics from HTML5 player elements, across 10 video service providers in Q4 of 2016, who served both live and video-on-demand (VOD) assets to audiences all around the world.

Let’s start with a thoroughly unsurprising result: higher throughput is distinctly correlated with higher bitrates:

[Figure: throughput versus bitrate]

That said, we can also say that the return for getting from below 10K kbps to above that line is significantly greater than getting from below 30K to above. Importantly, we can also see that the largest clusters of chunks occur below and around 10K, so focusing on improvement here will have the most significant impact on customer viewing.

We see a not-dissimilar result when we compare throughput with video start failures (VSF). More throughput is very highly correlated with low video start failures:

[Figure: throughput versus video start failures]

Once again, getting to above 10K kbps brings the greatest benefit, dropping VSF from a peak of 9% to a more manageable 4%. Doubling the throughput roughly halves the VSF, though the benefits are more modest as speeds exceed 30K.

Less obvious is the degree to which using multiple CDNs can measurably impact the QoE of users. Take a look at the following graph, which compares the Latency of two CDNs across a 7-day period:

[Figure: latency of two CDNs over 7 days]

CDN1 (in red) shows a very consistent series of results, with only a couple of spikes that really catch the eye. By contrast, CDN2 (in green) shows way more spikes, a couple of which are quite striking, and a clear pattern of higher latency. Based on this very high level view, one might conclude that the incremental benefit of distributing traffic across the two providers would be relatively low. However, look what happens when we double-click and look at a single day:

[Figure: latency of two CDNs over one day]

From midnight to around 5am, CDN2 is by far the superior option – and, tantalizingly, appears to become so again right around 11pm. This might be the perfect example of a situation in which some time-based traffic distribution could deliver QoE improvements. And, assuming the CDNs bear different cost structures, there may very well be an opportunity here to arbitrage some costs and improve margins.  Finally, let’s dig into what happens during a single, rather troublesome hour:

[Figure: latency of two CDNs over one hour]

Note that for this particular hour, CDN2 is outperforming CDN1 for around about 50 minutes, meaning that from a pure QoE perspective, we would probably prefer traffic to be sent via CDN2 than CDN1. This is something that would be effectively impossible to spot at the 7-day level, but by digging in deeply, it becomes clear that distributing our traffic across these two CDNs would result in detectable differences for users.
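A time-based distribution of the kind suggested above could start from something as simple as picking, for each hour of the day, the CDN with the lowest median latency. The sketch below assumes measurements arrive as `(hour, cdn, latency_ms)` tuples; the sample values are invented for illustration and are not read off the graphs.

```python
from statistics import median


def best_cdn_by_hour(samples):
    """samples: iterable of (hour, cdn, latency_ms) tuples.

    Returns {hour: cdn}, choosing the CDN with the lowest median latency
    in each hour of the day.
    """
    by_hour = {}
    for hour, cdn, latency_ms in samples:
        by_hour.setdefault(hour, {}).setdefault(cdn, []).append(latency_ms)
    return {
        hour: min(cdns, key=lambda c: median(cdns[c]))
        for hour, cdns in by_hour.items()
    }


# Hypothetical measurements, not taken from the graphs above.
samples = [
    (2, "CDN1", 95), (2, "CDN1", 105), (2, "CDN2", 60), (2, "CDN2", 70),
    (14, "CDN1", 50), (14, "CDN1", 55), (14, "CDN2", 90), (14, "CDN2", 85),
]
schedule = best_cdn_by_hour(samples)
```

A static schedule like this captures the overnight pattern, but not the troublesome single hour, which is why decisions made on live measurements go further than a timetable can.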

And what would that bring us? Using one more graph, we can see the relationship between latency and video start time (VST):

[Figure: latency versus video start time]

Unsurprisingly, higher latency results in higher VST – which, you can be sure, will in turn contribute to higher VSF. Or, in more direct terms, will mean fewer people consuming video, and therefore seeing fewer ads, or becoming less likely to renew a subscription.

Real User Measurements (RUM) that are tracked through the Cedexis Radar Community provide a powerful set of signposts for how to deliver traffic most effectively and efficiently through the Internet. Adding video-specific metrics helps ensure that the right decisions are being made for a sparkling video experience.

To find out more about adding video metrics for free to your Radar account, drop us a line at <sales@cedexis.com>.

Cedexis Predictions for 2017

Content & Application Delivery Predictions for 2017

It’s that time of the year, already, when we look back and evaluate how the year went – and look forward to prognosticate about what’s right around the corner.

It’s been a huge year for Internet operators, between hacking scandals (and hints of hacking scandals), DDOS takedowns, and the continued mammoth uptake of cloud services and streaming video. We’re back over a billion websites worldwide (the number was first reached in 2014, but then saw a dip, for reasons that remain opaque); the US e-commerce economy seems on pace to exceed $400B; and streaming video is so popular that DirecTV Now is about to launch – not only will you not need a satellite, you’ll simply stream DirecTV over the Internet.

All these things have conspired to boost the amount of traffic on the Internet – one projection says that more traffic will flow through the Internet in 2017 than all prior years combined. And more traffic means more challenges in making sure your content reaches your customers properly. So here are some bold predictions on what will impact your choices as you plan to do battle for bandwidth in the next 12 months:

  • SSL / TLS will be adopted at scale by website publishers: as HTTP/2 sees widespread adoption, there is a wide-open window for a broad upgrade to the more secure TLS protocol (seriously, if you’re still on SSL it’s only a matter of time…). Global Content Distribution Networks (CDNs) are investing heavily in expanding and optimizing their SSL/TLS services – and, believe it or not, more than half of all the measurements taken by the Radar community are now over SSL/TLS.
  • Real User Monitoring (RUM) will emerge as a critical Application Performance Management (APM) metric: serverless architectures are making existing APM solutions less valuable, as they simply can’t ‘see’ everything that is happening across the cloud. Instead, companies will turn to RUM: measures of what is actually happening at the point of consumption. Only a clear understanding of the experience being enjoyed (or not!) by the consumer will permit meaningful tuning for all forms of content and application delivery, from file downloads to streaming video to API access – and beyond.
  • Hybrid CDN architectures will gather momentum: cloud ubiquity, scalability and cost effectiveness will drive ‘CDN offload’ scenarios: moving some traffic back to publisher delivery from the CDNs. Increasingly popular off-the-shelf content and application caching solutions like Varnish Software will continue to decrease the complexity of deploying private networks. And as large scale web publishers adopt do-it-yourself (DIY) content and application delivery strategies, enhancing experience by relying less on CDNs will become an industry-wide trend.
  • Content Delivery budgets will shift to Quality of Experience (QoE) spending: consumers don’t care how their content gets to them – but they do care about the experience they receive, and also vote with their feet when dissatisfied. With CDN pricing in decline, and differentiation harder to establish, publishers will feel the pressure to invest in solutions that optimize QoE in order to attract, and retain, their audiences. And as consumers come to expect ever more dynamic and personalized applications, efficient delivery will need to be balanced against QoE. The winners will not be those with the most complex applications, but rather those who can deliver captivating and seamless experiences.
  • Application Delivery becomes QoE driven: call 2017 the Year of the Human, because it will be the year it becomes clear that ‘application health checks’, typically synthetically generated by probes, don’t reflect real-world user experience. They will be replaced by RUM measurements that accurately reflect the QoE reality. This will lead to significant investments in APM and Big Data solutions, which will be used to sort through the voluminous data to deliver experiences that delight audiences.

2017 will continue to be an exciting year for everyone involved in content and application delivery performance.  We at Cedexis look forward to sharing our thoughts on the evolving space, and the data around what works and what does not in the coming year.

How Much Money Is Your Website Performance Costing You?

Hello friends. We are pleased to announce that Cedexis will be partnering with SOASTA to bring you a roundtable discussion covering recent research into the costs of poor website performance and how the expectations of the modern web visitor have evolved.

If you’re interested in learning about:

  • How even slight web delays can lead to lost revenue
  • How to use Real User Measurements to understand what’s not working & make that data actionable
  • What leading enterprises are doing to build high-performance sites

…then this is the event for you. Pete Mastin, Cedexis Product Evangelist, and Tammy Everts, SOASTA Director of Content, will review strategies and technologies to boost website performance, and show you how business KPIs can be used to ensure your site is contributing as much as possible to the bottom line.

This online event takes place Tuesday, Sept 20th and will be hosted by Aberdeen – the research group, not the Scottish city (though any region housing Balgownie Links is a place I’d like to visit).  Jim Rapoza, Sr Research Analyst from Aberdeen will be moderating.  Presentations will last approximately 30 minutes and there will be plenty of time for Q&A at the end.

This webinar is free as always.  For more detail & how to register, click here.  And if you want to learn more about Cedexis Openmix, which uses real-time data for global traffic management, click here.  Thanks for reading! Enjoy the roundtable.

 

 

Matt Radochonski is Cedexis’ Director of Demand Generation & Marketing Operations. He can be reached on twitter @MattAtCedexis or via email, and feel free to leave a comment below.

RUM vs Synthetic – why people matter

“Cedexis Leads the Pack”.

It is nice to hear – especially when it comes from a prestigious analyst firm. If you have not seen this report, it’s worth the read. Cedexis was recognized for innovations in the monitoring space. While we are clearly honored, it’s somewhat ironic because we give our Real User Monitoring (RUM) away for FREE to all Radar community members. What this report clearly shows is why RUM is significantly better than synthetic monitoring for certain kinds of things. This is not to say that synthetic monitoring does not have a place – but for real-time traffic routing RUM is the best solution. Let me give you an example of why this is true.

As an experiment, let’s take 6 global CDNs and point synthetic monitoring agents at them. The 6 CDNs are Akamai, Limelight, Level3, Edgecast, ChinaCache and Bitgravity. I am not going to list their results by name as we are not trying to call anyone out. Rather I mention them here just so the reader knows we are talking about true global CDNs. I am also not going to mention the synthetic monitoring company by name – but suffice it to say they are a major player in the space.

We point 88 agents, located all over the world, at the small test object on each of the 6 CDNs we benchmark. Now we can compare the synthetic agents’ measurements to the Cedexis Radar measurements for the same network from the same country, each downloading the same object. The only differences are the volume of measurements and the location of the agent. The synthetic agent measures about every 5 minutes, whereas Radar can exceed 100 measurements per second from a single AS. Of course, the synthetic agents are sitting in big data centers, whereas Radar runs in real users’ browsers.

One more point on the methodology: since we are focused on HTTP response, we decided to take out DNS resolution time and TCP setup time and focus on pure wire time, that is, First Byte + Connect time. DNS resolution and TCP setup happen once for each domain or TCP stream, whereas response time is going to impact every object on the page.

We will look at a single network in the US. The network is ASN 701: “UUNET – MCI Communications Services Inc. d/b/a Verizon Business” (USA). This is a backbone network and captures major metropolitan areas all over the US. Cedexis Radar received billions of measurements from browsers sitting on this network within the US.

[Figure: synthetic versus RUM response times and rankings for the 6 CDNs on ASN 701]

Clearly, CDNs are much faster inside a big data center than they are in our homes! More interesting are the changes in rank: notice how CDN1 moves from #5 to #1 under RUM! The scale also changes dramatically: the synthetic agents’ data would have you believe CDN6 is nearly 6X slower than the fastest CDNs, yet when measured from the last mile it is only about 20% slower.

So if you had these 6 CDNs in your multi-CDN federation and were doing latency-based load balancing based on these synthetic measurements, the people on this network would be poorly served. CDN1 would be getting very little (if any) of the traffic from this network even though it’s actually the fastest. RUM matters because that’s where the people are! By measuring from the datacenter you obscure this important point.
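The rank flip described here is easy to reproduce with a toy comparison. The latencies below are hypothetical values chosen only to show the effect; they are not the figures from the study, and only three placeholder CDNs are used.

```python
def rank(latencies_ms):
    """Return {cdn: rank}, where rank 1 is the lowest latency."""
    ordered = sorted(latencies_ms, key=latencies_ms.get)
    return {cdn: i + 1 for i, cdn in enumerate(ordered)}


# Hypothetical values chosen only to illustrate a rank flip;
# they are not the figures from the study above.
synthetic = {"CDN1": 30, "CDN2": 12, "CDN3": 15}
rum = {"CDN1": 110, "CDN2": 125, "CDN3": 140}

synthetic_rank = rank(synthetic)
rum_rank = rank(rum)
# (synthetic rank, RUM rank) per CDN.
moved = {cdn: (synthetic_rank[cdn], rum_rank[cdn]) for cdn in synthetic}
```

A latency-based balancer fed the synthetic numbers would send this network's traffic to `CDN2`, while the RUM numbers point to `CDN1`.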

Synthetic agents can do many wonderful things but measuring actual Web Performance (from actual real People) is not among them; performance isn’t about being the fastest on a specific backbone network from a datacenter, it is about being fastest on the networks which provide service to the subscribers of your service. The actual people.

RUM based monitoring provides a much truer view of the actual performance of a web property than does synthetic, agent based monitoring. We urge you to go deploy our Radar tag and see for yourself who is performing best right now. Our real-time RUM measurements provide the best possible view into how global CDNs compare with each other in every region of the world.

Accor: Building with BRICs while ensuring performance

Paris-based Accor is the sixth-largest hotel group in the world, operating in 92 countries. They wanted to ensure their website would be so fast worldwide that guests would not want to sleep elsewhere.

Cedexis Openmix was implemented to balance AccorHotels.com traffic across multiple federated CDNs, both regional and global, based on Radar measurements. Website response times sped up dramatically. See how Accor uses Cedexis to improve PLT and ultimately improve the P&L!


Why Geo-Routing misses the point

Internet-Logical-

Cool picture, huh? It’s a diagram of the Internet by Barrett Lyon of The Opte Project. Know what you do not see here? You do not see states, countries or continents. That is because the Internet only loosely correlates to geography. To really understand the Internet, networks and their peering relationships must be taken into consideration. In a previous blog post we discussed why round-robin load balancing does not cut the mustard. Today we will discuss why geo-based load balancing misses the point.

In many ways round-robin is pretty obviously flawed. The conclusion that many performance engineers jump to next is: “let’s move to geo, that will offer better performance by considering the distances that packets have to travel”. Unfortunately, this leap misses the mark. The Cedexis Radar community generates billions of Real User Measurements a day, and they show definitively that geo-based routing masks significant network-level differences in CDN and cloud performance. Performance measured by geographic region (let alone country) without considering ASN/ISP paints a very inaccurate picture. We also see that for CDNs, footprint and peering have a dramatic impact on performance. We will go into this in more detail below.

To understand some of what I stated above, let’s establish a basic truth: the ‘best’ CDN providers often flip-flop depending upon which network they are talking to, and sometimes the time of day.

The southwest US, for example, is one of the highest-traffic regions that Cedexis measures in the world.
Here’s the performance of five randomly selected CDNs from the Cedexis Radar community across all networks (50th percentile) on a random day in early December.

Pic1

(These are latency measurements – so lower is better in the chart)

There is a consistent ‘winner’: MaxCDN takes the crown. There is (seemingly) limited opportunity to improve by selecting alternatives. The spread from lowest to highest is probably ~10 ms or less, or maybe 20% or so of round-trip time (RTT). The difference between #1 and #2 looks to be maybe 5 ms or so. It appears that there is little room for improvement here. However, let’s consider how the traffic looks on some specific networks that are the largest sources of traffic in this region, rather than just blending the measurements across all networks as the chart above does.

First, let’s limit the measurements to only those from ASN 7018, AT&T (source of 7.4% of Radar traffic in this region):

Pic2

MaxCDN still performs well, but there are other competitive choices (except for 1.5 hours in the middle), and Akamai, which was middle of the pack above, is a stand-out poor decision. Going with Akamai globally costs you 20+ ms. Going with Edgecast universally would have cost you 15-20 ms for 1+ hours (note, this is NOT an availability lapse, just a performance lapse).

Let’s have a look at one of the largest wireless networks, AT&T wireless (delivering 2.4% of all data points in the region):

Pic3

In this case Edgecast, Akamai and CloudFlare all outperform MaxCDN by ~20 ms or more.
If we had gone with MaxCDN as the consistent regional winner, we’d be imposing a 20 ms tax on all of this network’s traffic. (Also, note that latencies on mobile networks are 2-3x those on wired networks, and the spreads are often very different.)

Importantly, when you wash these networks together, the differences get muted.
A win by MaxCDN on AT&T wired cancels out some of the loss on AT&T wireless.

The maximum theoretical performance opportunity is the most favorable combination of both the ideal decision for each time period and for each network.
Josh Gray – Chief Architect, Cedexis
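The muting effect is easy to see with a little arithmetic. The sketch below uses the traffic shares quoted above for AT&T wired (7.4%) and AT&T wireless (2.4%), but the latencies are hypothetical, and a traffic-weighted mean stands in for the 50th-percentile blend in the charts, so treat it as an illustration rather than a reproduction of the data.

```python
# Hypothetical median latencies (ms) per network and CDN; shares are from the text.
latency = {
    "att_wired": {"MaxCDN": 40, "Edgecast": 48},
    "att_wireless": {"MaxCDN": 120, "Edgecast": 100},
}
share = {"att_wired": 7.4, "att_wireless": 2.4}
cdns = ["MaxCDN", "Edgecast"]


def blended(latency, share, pick):
    """Traffic-weighted mean latency when pick(network) chooses the CDN."""
    total = sum(share.values())
    return sum(share[n] * latency[n][pick(n)] for n in latency) / total


# Regional view: one CDN for every network, chosen on the blended number.
single_winner = min(cdns, key=lambda c: blended(latency, share, lambda n: c))
one_cdn = blended(latency, share, lambda n: single_winner)

# Per-network view: the best CDN on each network.
per_network = blended(latency, share, lambda n: min(latency[n], key=latency[n].get))

if __name__ == "__main__":
    print("regional winner:", single_winner, round(one_cdn, 1), "ms")
    print("per-network choice:", round(per_network, 1), "ms")
    print("opportunity:", round(one_cdn - per_network, 1), "ms")
```

With these made-up numbers the blended gap between the two CDNs is only about 1 ms, yet choosing per network saves roughly 5 ms on average, because the 8 ms win on wired and the 20 ms loss on wireless partly cancel in the regional view.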

As illustrated above, the ‘best’ providers often flip-flop depending upon which network they are talking to and sometimes time of day.

So sometimes there are pretty consistent winners, but there are also transient opportunities.
Here’s another example in the SW US among these five CDNs in the same 24-hour period over Verizon Business, delivering 5.8% of traffic.

Pic4

CloudFront ran ~20 ms slower than the best choice for almost five hours.

And on Comcast (12.7% of all traffic), latency to Akamai rose by ~10 ms on a ~45 ms baseline for over an hour during the same time period.

Pic5

To summarize:
• At the country/region level, across networks, it often appears there is a clear, consistent winner.
• This winner is very often not consistent on a per-network basis; choosing the best CDN on each network is an opportunity.
• There are frequently opportunities in time to further capitalize on performance differentials.
For these reasons, latency-based load balancing using RUM data is the best choice.
NOTE: The results for the specific CDNs above change daily. As peering relationships are modified and as new Points of Presence (POPs) are enabled or disabled, the performance of the CDNs in question changes. These graphs were taken in early December 2014, and it is almost a certainty that the performance of these CDNs is different today. We urge you to sign up for a free account today and look at the performance of all the CDNs in the Radar community. The only sure way to ensure the best possible performance is to use latency-based load balancing across multiple CDNs. That, after all, is the point.
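As a rough illustration of what deciding per network and per time period means, here is a minimal Python sketch. It is not how Openmix works internally; the function name, the tuple layout and the sample RTT values are all assumptions made for the example. It groups RUM samples by (ASN, hour, CDN), takes the median of each group, and picks the lowest-latency CDN for each (ASN, hour).

```python
from collections import defaultdict
from statistics import median


def best_cdn_by_network_and_hour(samples):
    """samples: iterable of (asn, hour, cdn, rtt_ms) tuples.

    Returns {(asn, hour): cdn} with the lowest median RTT in each bucket.
    """
    buckets = defaultdict(list)
    for asn, hour, cdn, rtt_ms in samples:
        buckets[(asn, hour, cdn)].append(rtt_ms)

    medians = defaultdict(dict)
    for (asn, hour, cdn), rtts in buckets.items():
        medians[(asn, hour)][cdn] = median(rtts)

    return {key: min(by_cdn, key=by_cdn.get) for key, by_cdn in medians.items()}


# Hypothetical samples for ASN 7018: MaxCDN has a performance lapse in hour 10.
samples = [
    (7018, 9, "MaxCDN", 40), (7018, 9, "MaxCDN", 44), (7018, 9, "MaxCDN", 42),
    (7018, 9, "Akamai", 60), (7018, 9, "Akamai", 62), (7018, 9, "Akamai", 64),
    (7018, 10, "MaxCDN", 70), (7018, 10, "MaxCDN", 72), (7018, 10, "MaxCDN", 68),
    (7018, 10, "Akamai", 61), (7018, 10, "Akamai", 63), (7018, 10, "Akamai", 62),
]

if __name__ == "__main__":
    decisions = best_cdn_by_network_and_hour(samples)
    for key in sorted(decisions):
        print(key, "->", decisions[key])
```

A real system would work on much shorter windows and would also weigh availability, but the shape of the decision is the same: the winner is looked up per network and per period, not per region.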