
Announcing Cedexis Netscope: Advanced Network Performance and Benchmarking Analysis

The Cedexis Radar community collects tens of billions of real user monitoring data points each day, giving Cedexis users unparalleled insight into how applications, videos, websites, and large file downloads are actually being experienced by their users. We’re excited to announce a product that offers a new lens into the Radar community’s dynamic data set: Cedexis Netscope.

Know how your service stacks up, down to the IP subnet
Metrics like network throughput, availability, and latency don’t tell the whole story of how your service is performing, because they are network-centric, not user-centric: however comprehensively you track network operations, what matters is the experience at the point of consumption. Cedexis Netscope provides you with additional user-centric context to assess your service, namely the ability to compare your service’s performance to the results of the “best” provider in your market. With up-to-date Anonymous Best comparative data, you’ll have a data-driven benchmark to use for network planning, marketing, and competitive analysis.

Highlight your Service Performance:

  • Relative to peers in your markets
  • In specific geographies
  • Compared with specific ISPs
  • Down to the IP subnet
  • Including both IPv4 and IPv6 addresses
  • With comprehensive data on latency and throughput
  • Covering both static and dynamic delivery

Actionable insights
Netscope provides detailed performance data that can be used to improve your service for end users. IT Ops teams can use automated or custom reports to view performance for your ASN versus peer groups in the geographies you serve. This lets you fully understand how you stack up against the “best” service provider, using the same criteria. Real-time logs organized by ASN can be used to inform instant service repairs or longer-term planning.

Powered by: the world’s largest user experience community
Real User Monitoring (RUM) is the key to fully understanding how internet performance impacts customer satisfaction and engagement. Cedexis gathers RUM data from each step between the client and any of the clouds, data centers, and CDNs hosting your applications to build a holistic picture of internet health. Every request creates more data, continuously updating this unique real-time virtual map of the web.

Data and alerts, your way
To effectively evaluate your service and enable real-time troubleshooting, Netscope lets you roll up data by ASN, country, region, or state. You can also zoom in on a specific ASN down to the IP subnet level, dissecting the data in any way your business requires. This data is stored in the cloud on an ongoing basis. Netscope also lets you easily set up flexible network alerts for performance and latency deviations.
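
For teams that like to script this kind of alerting themselves, the underlying check is simple: compare the latest latency for an ASN or subnet against a rolling baseline and flag anything that deviates too far. The sketch below is purely illustrative; the data shapes and the 25% threshold are assumptions, not Netscope’s API.

```typescript
// Illustrative sketch only: flag latency deviations against a rolling baseline.
// The interfaces and the 25% threshold are assumptions, not part of Netscope.
interface LatencySample {
  asn: number;        // autonomous system number the measurement came from
  subnet: string;     // IP subnet within that ASN
  latencyMs: number;  // observed latency in milliseconds
}

function shouldAlert(
  baselineMs: number,       // rolling average latency for this ASN/subnet
  current: LatencySample,
  thresholdPct = 25         // deviation (in percent) that triggers an alert
): boolean {
  const deviationPct = ((current.latencyMs - baselineMs) / baselineMs) * 100;
  return deviationPct >= thresholdPct;
}

// Example: an 80 ms baseline and a 120 ms sample is a 50% deviation, so alert.
const sample: LatencySample = { asn: 7922, subnet: "73.95.0.0/16", latencyMs: 120 };
console.log(shouldAlert(80, sample)); // true
```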

Netscope helps ISP Product Managers and Marketers better understand:

  • How well users connect to the major content distributors
  • How well users/businesses connect to public clouds (AWS, Google Cloud, Azure, etc.)
  • When, where, and how often outages and throughput issues happen
  • What happens during different times of day
  • Where the risks lie during big events (FIFA World Cup, live events, video/content releases)
  • How service on mobile looks versus web
  • How the ISP stacks up vs. “the best” ISP in the region

Bring advanced network analysis to your network
Netscope provides a critical data set you need for your network planning and enhancement. With its real-time understanding of worldwide network health, Netscope gives you the context and actionable data you need to delight customers and increase your market share.

Ready to use this data with your team?

Set up a demo today

 

With a Multi-cloud Infrastructure, Control is Key

By Andrew Marshall, Cedexis Director of Product Marketing

Ask any developer or DevOps manager about their first experiences with the public cloud and it’s likely they’ll happily share some memories of quickly provisioning some compute instances for a small project or new app. For (seemingly) a few pennies, you could take advantage of the full suite of public cloud services—as well as the scalability, elasticity, security, and pay-as-you-go pricing model. All of this made it easy for teams to get started with the cloud, saving on both IT budgets and infrastructure setup time. Public cloud providers AWS, Azure, Google Cloud, Rackspace, and others made it easy to innovate.

Fast forward several years and the early promise of the cloud is still relevant: services have expanded, costs have (in some cases) been reduced, and DevOps teams have adapted to the needs of their teams by spinning up compute instances whenever they’re needed. But for many companies, the realities of their hybrid-IT infrastructure necessitate support for more than one public cloud provider. To make this work, IT Ops needs a control layer that sits on top of their infrastructure and can deliver applications to customers over any architecture, including multi-cloud. This is true no matter why teams need to support multi-cloud environments.

Prepare for the Worst

As any IT Ops manager, or anyone who has lost access to their web app, knows, outages and cloud service degradation happen. Modern Ops teams need a plan in place for when they do. Many companies choose to utilize multiple public cloud providers to ensure their application is always available to worldwide customers, even during an outage. The process of manually re-routing traffic to a second public cloud in the event of an outage is cumbersome, to say the least. Adding an app delivery control plane on top of your infrastructure allows companies to seamlessly and automatically deliver applications over multiple clouds, factoring in real-time availability and performance.
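
To make the contrast with manual re-routing concrete, here is a minimal sketch of the kind of decision an automated control plane makes: prefer the primary cloud while it is healthy and responsive, and shift traffic to the secondary the moment it is not. The endpoint names, health fields, and 250 ms threshold are hypothetical, not Cedexis APIs.

```typescript
// Hypothetical sketch of an automated cloud failover decision.
// Endpoints, thresholds, and health data are illustrative only.
interface CloudEndpoint {
  name: string;
  available: boolean;   // result of the latest availability probe
  p95LatencyMs: number; // recent 95th-percentile latency seen by real users
}

function chooseCloud(primary: CloudEndpoint, secondary: CloudEndpoint): CloudEndpoint {
  const primaryHealthy = primary.available && primary.p95LatencyMs < 250;
  // Fall back to the secondary cloud when the primary is down or degraded.
  return primaryHealthy ? primary : secondary;
}

const aws: CloudEndpoint = { name: "aws-us-east-1", available: false, p95LatencyMs: 0 };
const azure: CloudEndpoint = { name: "azure-eastus", available: true, p95LatencyMs: 180 };
console.log(chooseCloud(aws, azure).name); // "azure-eastus"
```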

Support Cloud-driven Innovation

Ops teams often support many different agile development teams, frequently in multiple countries or from new acquisitions. When this is the case, it’s likely that the various teams are using many architectures on more than one public cloud. Asking some dev teams to switch cloud vendors is not very DevOps-y. A better option is to control app delivery automation with a cloud-agnostic control plane that sits on top of any cloud, data center, or CDN architecture. This allows dev teams to work in their preferred cloud environment, without worrying about delivery.

Avoid Cloud Vendor Lock-in

Public cloud vendors such as Amazon Web Services or Microsoft Azure aren’t just infrastructure-as-a-service (IaaS) vendors; they sell (or resell) products and services that could very well compete with your company’s offering. In the beginning, using a few cloud instances didn’t seem like such a big deal. But now that you’re in full production and depend on one of these cloud providers for your mission-critical app, this no longer feels like a great strategy. Adding a second public cloud to your infrastructure lessens your dependence on a single cloud vendor with whom you may be in “coopetition”.

Multiple-vendor sourcing is a proven business strategy in many other areas of IT, giving you more options during price and SLA negotiations. The same is true for IaaS. Cloud services change often, as new services are added or removed, and price structures change. Taking control over these changes in public cloud service offerings, pricing models and SLAs is another powerful motivator for Ops teams to move to a multi-cloud architecture. An application delivery automation platform that can ingest and act on cloud service pricing data is essential.

Apps (and How They’re Delivered) Have Changed

Monolithic apps are out. Modern, distributed apps that are powered by microservices are in. Similarly, older application delivery controllers (ADCs) were built for a static infrastructure world, before the cloud (and SaaS) were commonly used by businesses. Using an ADC for application delivery requires a significant upfront capital expense, limits your ability to rapidly scale and hinders the flexibility to support dynamic (i.e. cloud) infrastructure. Using ADCs for multiple cloud environments compounds these issues exponentially. A software-defined application delivery control layer eliminates the need for older ADC technology and scales directly with your business and infrastructure.

Regain Control

Full support for multi-cloud in production may sound daunting. After all, Ops teams already have plenty to worry about daily. Adding a second cloud vendor requires a significant ramp-up period to get ready for production-level delivery, and brings new protocols, alerts, competencies, and other things to think about. You can’t be knee-deep in the details of each cloud and still manage infrastructure. Adding in the complexity of application delivery over multiple clouds can be a challenge, but much less so if you use a SaaS-based application delivery platform. With multi-cloud infrastructure, control is key.

Learn more about our solutions for multi-cloud architectures and discover our application delivery platform.

You can also download our latest ebook, “Hybrid Cloud, the New Normal”, for free here.

Cloud-First + DevOps-First = 81% Competitive Advantage

We recently ran across a fascinating article by Jason Bloomberg, a recognized expert on agile digital transformation, that examines the interplay between Cloud-First and DevOps-First models. That article led us, in turn, to an infographic centered on some remarkable findings from a CA Technologies survey of 900-plus IT pros from around the world. The survey set out to explore the synergies between Cloud and DevOps, specifically with regard to software delivery. You can probably guess why we snapped to attention.

The study found that 20 percent of the organizations represented identified themselves as being strongly committed to both Cloud and DevOps, and their software delivery outperformed the other categories (Cloud only, DevOps only, and slow movers) by 81 percent. This group earned the label “Delivery Disruptors” for their outsized success at maximizing agility and velocity on software projects. On factors of predictability, quality, user experience, and cost control, the Disruptor organizations soared above those employing traditional methods, as well as Cloud-only and DevOps-only methods, by large percentages. For example, Delivery Disruptors were 117 percent better at cost control than Slow Movers, and 75 percent better in this category than the DevOps-only companies.

These findings, among others, got us to thinking about the potential benefits and advantages such Delivery Disruptors can gain from adding Cedexis solutions into their powerful mix. Say, for example, you have agile dev teams working on new products and apps and you want to shorten the execution time for new cloud projects. To let your developers focus on writing code, you need an app delivery control layer that supports multiple teams and architectures. With the Cedexis application delivery platform, you can support agile processes, deliver frequent releases, control cloud and CDN costs, guarantee uptime and performance, and optimize hybrid infrastructure. Your teams get to work their way, in their specific environment, without worrying about delivery issues looming around every corner.

Application development is constantly changing thanks to advances like containerization and microservice architecture — not to mention escalating consumer demand for seamless functionality and instant rich media content. And in a hybrid, multi-cloud era, infrastructure is so complex and abstracted that delivery intelligence has to be embedded in the application (you can read more about what our Architect, Josh Gray, has to say about delivery-as-code here).

To ensure that an app performs as designed, and end users have a high quality experience, agile teams need to automate and optimize with software-defined delivery. Agile teams can achieve new levels of delivery disruption by bringing together global and local traffic management data (for instance, RUM, synthetic monitoring results, and local load balancer health), application performance management, and cost considerations to ensure the optimal path through datacenters, clouds, and CDNs.

Imagine the agility and speed a team can contribute to digital transformation initiatives with fully automated app delivery based on business rules, actionable data from real user and synthetic testing, and self-healing network optimizations. Incorporating these capabilities with a maturing Cloud-first and DevOps-first approach will likely put the top performers so far ahead of the rest of the pack, they’ll barely be on the same racetrack.

 

 

Why CapEx Is Making A Comeback

The meteoric rise of both the public cloud and SaaS has brought along a strong preference for OpEx over CapEx. To recap: OpEx means you stop paying for a thing up front, and instead just pay as you go. If you’ve bought almost any business software lately you know the drill: you walk away with a monthly or annual subscription, rather than a DVD-ROM and a permanent or volume license.

But the funny thing about business trends is the frequency with which they simply turn upside down and make the conventional wisdom obsolete.

Recently, we have started seeing interest in getting out of pay-as-you-go (rather unimaginatively shortened to PAYGO) as a model, and moving back toward making upfront purchases, then holding on for the ride as capital items are amortized.

Why? It’s all about economies of scale.

Imagine, if you will, that you are able to rent an office building for $10 a square foot, then rent out the space for $15 a square foot. Seems like a decent deal at a 50% markup; but of course you’re also on the hook for servicing the customers, the space, and so forth. You’ll get a certain amount of relief as you share janitorial services across the space, of course, but your economic ceiling is stuck at 50%.

Now imagine that you purchase that whole building for $10M and rent out the space for $15M. Your debt payment may cut into profits for a few years, but at some point you’re paid off – and every year’s worth of rent thereafter is essentially all profit.

The first scenario puts an artificial boundary on both risk and reward: you’re on the hook for a fixed amount of rental cost, and can generate revenues only up to 150% of your outlay. You know how much you can lose, and how much you can gain. By contrast, in the second scenario, neither risk nor reward is bounded: with ownership comes risk (finding asbestos in the walls, say), as well as unlimited potential (raise rental prices and increase the profit curve).
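
A rough break-even calculation makes the same point with numbers. The purchase price below comes from the example; the annual rental income and operating cost are assumptions added purely for illustration.

```typescript
// Rough sketch of the buy-vs-rent break-even point described above.
// The purchase price is from the example; the yearly figures are assumptions.
const purchasePrice = 10_000_000;     // buy the building outright
const annualRentalIncome = 1_500_000; // assumed yearly rent collected from tenants
const annualOperatingCost = 300_000;  // assumed upkeep, service, and financing

const annualNet = annualRentalIncome - annualOperatingCost;
const yearsToBreakEven = purchasePrice / annualNet; // ≈ 8.3 years

// Once the purchase is paid back, every further year of rent is essentially
// all profit: the unbounded-reward side of taking on ownership risk.
console.log(yearsToBreakEven.toFixed(1));
```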

This basic model applies to many cloud services – and to no small degree explains why so many companies are able to pop up: their growth scales with provisioned services.

If you were to decide to fire up a new streaming video service that showed only the oeuvre of, say, Nicolas Cage, you’d want to have a fairly clear limit on your risk: maybe millions of people will sign up, but then again maybe they won’t. In order to be sure you’ve maximized the opportunity, though, you’ll need a rock-solid infrastructure to ensure your early adopters get everything they expect: quick video start times, low re-buffering ratios, and excellent picture resolution. It doesn’t make sense to build all that out anew: you’re best off popping storage onto a cloud, maybe outsourcing CMS and encoding to an Online Video Platform (OVP), and delegating delivery to a global content delivery network (CDN). In this way you can have a world-class service, without having to pony up for servers, encoders, points of presence (POPs), load balancers, and all the other myriad elements necessary to compete.

In the first few months, this would be great – your financial risk is relatively low as you target your demand generation at the self-proclaimed “total Cage-heads”. But as you reach a wider and wider audience, and start to build a real revenue stream, you realize: the ongoing cost of all those outsourced, opex-based services is flattening the curve that could bring you to profitability. By contrast, spinning up a set of machines to store, compute, and deliver your content could set a relatively fixed cost that, as you add viewers, would allow you to realize economies of scale and unbounded profit.

We know that this is a real business consideration because Netflix already did it. Actually, they did it some time ago: while they do much (if not most) of their computation through cloud services, they decided in 2012 to move away from commercial CDNs in favor of their own Open Connect, and announced in 2016 that all of their content delivery needs were covered by their own network. Not only did this reduce their monthly opex bill, it also gave them control over the technology they used to guarantee an excellent quality of experience (QoE) for their users.

So for businesses nearing this opex vs. capex inflection point, the time really has arrived to put pencil to paper and calculate the cost of going it alone. The technology is relatively easy to acquire and manage, from server machines, to local load balancers and cache servers, and on up to global server load balancers. You can see a little bit more about how to actually build your own CDN here.

Opex solutions are absolutely indispensable in getting new services off the starting line; but it’s always worth keeping an eye on the economics, because with a large enough audience, going it alone becomes the smarter bet.

Live and Generally Available: Impact Resource Timing

We are very excited to be officially launching Impact Resource Timing (IRT) for general availability.

IRT is Impact’s powerful window into the performance of different sources of content for the pages in your website. For instance, you may want to distinguish the performance of your origin servers relative to cloud sources, or advertising partners; and by doing so, establish with confidence where any delays stem from. From here, you can dive into Resource Timing data sliced by various measurements over time, as well as through a statistical distribution view.

What is Resource Timing? Broadly speaking, resource timing measures latency within an application (i.e., the browser). It uses JavaScript as the primary mechanism to instrument various time-based metrics for all the resources requested and downloaded for a single website page by an end user. Individual resources are objects such as JS, CSS, images, and other files that the website page requests. The faster the resources are requested and loaded on the page, the better the quality of experience (QoE) for users. By contrast, resources that cause longer latency can produce a negative QoE. By analyzing resource timing measurements, you can isolate the resources that may be causing degradation issues for your organization to fix.
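
For readers who want to see the underlying browser mechanism, the sketch below reads the standard W3C Resource Timing API and derives two of the metrics discussed here (overall duration and TCP connection time), grouped by hostname. It illustrates the browser API itself, not the Cedexis Radar tag.

```typescript
// Sketch of the browser's Resource Timing API, which IRT builds on.
// Runs in any modern browser; this is not the Cedexis instrumentation itself.
const entries = performance.getEntriesByType("resource") as PerformanceResourceTiming[];

const byHostname = new Map<string, { count: number; totalDurationMs: number }>();

for (const entry of entries) {
  const hostname = new URL(entry.name).hostname;               // e.g. images.myDomain.com
  const durationMs = entry.duration;                           // full fetch time for the resource
  const tcpConnectMs = entry.connectEnd - entry.connectStart;  // TCP connection time
  // Note: cross-origin resources report zeroed timing details unless the
  // server sends a Timing-Allow-Origin header.

  const bucket = byHostname.get(hostname) ?? { count: 0, totalDurationMs: 0 };
  bucket.count += 1;
  bucket.totalDurationMs += durationMs;
  byHostname.set(hostname, bucket);

  console.log(`${hostname}: duration=${durationMs.toFixed(0)}ms, tcpConnect=${tcpConnectMs.toFixed(0)}ms`);
}

// Average duration per hostname hints at which content source is slowing the page.
for (const [hostname, { count, totalDurationMs }] of byHostname) {
  console.log(`${hostname}: ${(totalDurationMs / count).toFixed(0)}ms average over ${count} resources`);
}
```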

Resource Timing Process:

Cedexis IRT makes it easy for you to track resources from identified sources: by domain (*.myDomain.com), by sub-domain (e.g., images.myDomain.com), and by the provider serving your content. In this way, you can quickly group together types of content, and identify the source of any latency. For instance, you might find that origin-located content is being delivered swiftly, while cloud-hosted images are slowing down the load time of your page; in such a situation, you would now be in a position to consider a range of solutions, including adding a secondary cloud provider and a global server load balancer to protect QoE for your users.

Some benefits of tracking Resource Timing:

  • See which hostnames – and thus which classes of content – are slowing down your site.
  • Determine which resources impact your overall user experience.
  • Correlate resource performance with user experience.

Impact Resource Timing from Cedexis allows you to see how content sources are performing across various measurement types such as Duration, TCP Connection Time, and Round Trip Time. IRT reports also give you the ability to drill down further by Service Providers, Locations, ISPs, User Agent (device, browsers, OS) and other filters.

Check out our User Guide to learn more about our Measurement Type calculations.

There are two primary reports in this release of Impact Resource Timing: the Performance report, which gives you a trending view of resource timing over time, and the Statistical Distribution report, which presents Resource Timing data through a statistical distribution view. Both reports have very dynamic reporting capabilities that allow you to easily pinpoint resource-related issues for further analysis.


Using the Performance report, you can isolate which grouped resources are causing potential end-user experience issues, by hostname, page, or service provider, and see when the issue happened. Drill down even further to determine whether an issue was global or localized to a specific location, or whether it affected only certain user devices or browsers.

IRT is now available for all in the Radar portal – take it for a spin and let us know your experiences!

How To Deliver Content for Free!

OK, fine, not for free per se, but using bandwidth that you’ve already paid for.

Now, the uninitiated might ask what’s the big deal – isn’t bandwidth essentially free at this point? And they’d have a point – the cost per Gigabyte of traffic moved across the Internet has dropped like a rock, consistently, for as long as anyone can remember. In fact, Dan Rayburn reported in 2016 seeing prices as low as ¼ of a penny per gigabyte. Sounds like a negligible cost, right?

As it turns out, no. As time has passed, the amount of traffic passing through the Internet has grown. This is particularly true for those delivering streaming video: consumers now turn up their noses at sub-broadcast quality resolutions, and expect at least an HD stream. To put this into context, moving from HD as a standard to 4K (which keeps threatening to take over) would result in the amount of traffic quadrupling. So while CDN prices per gigabyte might drop 25% or so each year, a publisher delivering 400% of the traffic is still looking at an increasingly large delivery bill.
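
The arithmetic is easy to check: if per-gigabyte pricing falls 25% while traffic quadruples, the bill still roughly triples. The starting figures below are invented for illustration.

```typescript
// Back-of-the-envelope check on the "prices fall, bills rise" point above.
// The starting price and traffic volume are assumptions for illustration.
const pricePerGB = 0.0025;        // $/GB, in line with the quarter-penny figure cited
const monthlyTrafficGB = 1_000_000;

const currentBill = pricePerGB * monthlyTrafficGB;               // $2,500/month
const futureBill = (pricePerGB * 0.75) * (monthlyTrafficGB * 4); // 25% cheaper, 4x traffic

console.log(futureBill / currentBill); // 3 -- the delivery bill roughly triples
```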

It’s also worth pointing out that the cost of delivery, relative to delivering video through a traditional network such as cable or satellite, is surprisingly high. An analysis by Redshift for the BBC clearly identifies the likely reality that, regardless of the ongoing reduction in per-terabyte pricing, “IP service development spend is likely to increase as [the BBC] faces pressure to innovate”, meaning that online viewers will be consuming more than their fair share of the pie.

Take back control of your content…and your costs

So, the price of delivery is out of alignment with viewership, and is increasing in practical terms. What’s a streaming video provider to do?

Allow us to introduce Varnish Extend, a solution combining the powerful Varnish caching engine that is already part of delivering 25% of the world’s websites; and Openmix, the real-time user-driven predictive load balancing system that uses billions of user measurements a day to direct traffic to the best pathway.

Cedexis and Varnish have both found that the move to the cloud left a lot of broadcasters, as well as OTT providers, with unused bandwidth available on premises. By making it easy to transform an existing data center into a private CDN Point of Presence (PoP), Varnish Extend empowers companies to make the most of all the bandwidth they have paid for, by setting up Varnish nodes on premises, or on cloud instances that offer lower operational costs than using CDN bandwidth.

This is especially valuable for broadcasters/service providers whose service is limited to one country: the global coverage of a CDN may be overkill, when the same quality of experience can be delivered by simply establishing POPs in strategic locations in-country.

Unlike committing to an all-CDN environment, using a private CDN infrastructure like Varnish Extend supports scaling to meet business needs – costs are based on server instances and decisions, not on the amount of traffic delivered. So as consumer demands grow, pushing for greater quality, the additional traffic doesn’t push delivery costs over the edge of sanity.

A global server load balancer like Openmix automatically checks available bandwidth on each Varnish node as well as each CDN, along with each platform’s performance in real-time. Openmix also uses information from the Radar real user measurement community to understand the state of the Internet worldwide and make smart routing decisions.

Your own private CDN – in a matter of hours

Understanding the health of both the private CDN and the broader Internet makes it a snap to dynamically switch end-users between Varnish nodes and CDNs, ensuring that cost containment doesn’t come at the expense of customer experience – simply establish a baseline of acceptable quality, then allow Openmix to direct traffic to the most cost-effective route that will still deliver on quality.
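
A minimal sketch of that routing decision, assuming a single on-premises Varnish node with known bandwidth headroom and a CDN as overflow; the capacity numbers, quality floor, and costs below are hypothetical, and this is not Openmix code.

```typescript
// Hypothetical routing decision: prefer the on-premises Varnish node while it
// has bandwidth headroom and meets the quality floor; otherwise spill to a CDN.
interface DeliveryNode {
  name: string;
  availableMbps: number; // unused, already-paid-for bandwidth right now
  p95LatencyMs: number;  // recent real-user latency for this platform
  costPerGB: number;     // delivery cost
}

const QUALITY_FLOOR_MS = 200; // assumed acceptable latency baseline

function route(privateNode: DeliveryNode, cdn: DeliveryNode, requestMbps: number): DeliveryNode {
  const privateOk =
    privateNode.availableMbps >= requestMbps &&
    privateNode.p95LatencyMs <= QUALITY_FLOOR_MS;
  return privateOk ? privateNode : cdn;
}

const varnishPop: DeliveryNode = { name: "varnish-onprem", availableMbps: 400, p95LatencyMs: 120, costPerGB: 0.0005 };
const cdn: DeliveryNode = { name: "cdn-a", availableMbps: 10_000, p95LatencyMs: 95, costPerGB: 0.0030 };
console.log(route(varnishPop, cdn, 50).name); // "varnish-onprem" -- cheaper, and within the quality floor
```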

Implementing Varnish Extend is surprisingly simple (some customers have implemented their private CDN in as little as four hours):

  1. Deploy Varnish Plus nodes within an existing data center or on a public cloud.
  2. Configure Cedexis Openmix to leverage these nodes as well as existing CDNs.
  3. Result: end-users are automatically routed to the best delivery node based on performance, costs, etc.

Learn in detail how to implement Varnish Extend

Sign up for Varnish Software – Cedexis Summit in NYC


Mobile Video is Devouring the Internet

In late 2009 – fully two years after the introduction of the extraordinary Apple iPhone – mobile was barely discernible on any measurement of total Internet traffic. By late 2016, it finally exceeded desktop traffic volume. In a terrifyingly short period of time, mobile Internet consumption moved from an also-ran to a behemoth, leaving behind the husks of marketing recommendations to “move to Web 2.0” and to “design for Mobile First”. And along the way, Apple encouraged us to buy into the concept that the future (of TV at least) is apps.

Unsurprisingly, the key driver of all this traffic is – as it always is – video. One in every three mobile device owners watches videos of at least 5 minutes’ duration, which is generally considered the point at which the user has moved from short-form, likely user-generated, content to premium video (think: TV shows and movies). And once viewers pass the 5-minute mark, it’s a tiny step to full-length, studio-developed content, which is a crazy bandwidth hog. Consider that video is expected to represent fully 75% of all mobile traffic by 2020 – when it was just 55% in 2015.


As consumers get more interested in video, producers aren’t slowing down. By 2020, it is estimated that it would take an individual fully 5 million years to watch the video being published and made available in just a month. And while consumer demand varies around the world – 72% of Thailand’s mobile traffic is video, for instance, versus just 41% in the United States – the reality is that, without some help, the mobile Web is going to be straining under the weight of near-unlimited video consumption.

What we know is that, hungry as they are for content, streaming video consumers are fickle and impatient. Akamai demonstrated years ago the 2-second rule: if a requested piece of content isn’t available in under 2 seconds, Internet users simply move on to the next thing. And numerous studies have shown definitively that when re-buffering (the dreaded pause in playback while the viewing device downloads the next section of the video) exceeds just 1% of viewing time, audience engagement collapses, resulting in dwindling opportunities to monetize content that was expensive to acquire, and can be equally costly to deliver.

How big of a problem is network congestion? It’s true that big, public, embarrassing outages across CDNs or ISPs are now quite rare. However, when we studied the network patterns of one of our customers, we found that what we call micro-outages (outages lasting 5 minutes or less) happen literally hundreds to thousands of times a day. That single customer was looking at some 600,000 minutes of direct lost viewing time per month – and when you consider how long each customer might have stayed, and their decreased inclination to return in the future, that number likely translates to several million minutes of indirect losses.

While mobile viewers are more likely to watch their content through an app (48% of all mobile Internet users) than a browser (18%), they still receive the content through the chaotic maelstrom of a network that is the Internet. As such, providers have to work out the best pathways to use to get the content there, and to ensure that the stream will have consistency over time so that it doesn’t fall prey to the buffering bug.

Most providers use stats and analysis to work out the right pathways – so they can look at how various CDN/ISP combos are working, and pick the one that is delivering the best experience. Strikingly, though, they often have to make routing decisions for audience members who are in geographical locations that aren’t currently in play, which means choosing a pathway without any recent input on which one will perform best – this is effectively gambling with the experience of each viewer. What is needed is something predictive: something that will help the provider to know the right pathway the first time they have to choose.

This is where the Radar Community comes in: by monitoring, tracking, and analyzing the activity of billions of Internet interactions every day, the community knows which pathways are at peak health, and which need a bit of a breather before getting back to full speed. So, when using Openmix to intelligently route traffic, the Radar community data provides the confidence that every decision is based on real-time, real-user data – even when, for a given provider, they are delivering to a location that has been sitting dormant.

Mobile video is devouring the Web, and will continue to do so, as consumers prefer their content to move, dance, and sing. Predictively re-routing traffic in real-time so that it circumvents the thousands of micro-outages that plague the Internet every day means never gambling with the experience of users, staying ahead of the challenges that congestion can bring, and building the sustainable businesses that will dominate the new world of streaming video.

How to Make Cloud Pay Its Own Way

Rightscale came out with a wonderful report on the state of the cloud industry, and we learned some important new things:

  • 77% of organizations are at least exploring private cloud implementations
  • 82% of enterprises are executing a hybrid cloud strategy
  • 26% of respondents are now listing cost as a significant challenge – ironically, given the importance of cost-cutting in the early growth of cloud services

The growth in hybrid cloud adoption is particularly striking: by Rightscale’s count, only 6% of companies are exclusively looking at private cloud and 18% are exclusively looking at public cloud, while a full 71% have a toe dipped into each pool.

Meanwhile, Cisco estimates that two thirds of all Internet traffic will traverse at least one content delivery network by 2020 – which tends to imply that most organizations are, right now, invested in getting the most out of some combination of private cloud, public cloud, CDN, and, presumably, physically-managed data center.

Fundamentally, there are a few core ways that we see organizations using this market basket of delivery pathways – and, naturally, our Openmix global server load balancer – to better serve their customers, and to protect their economics as demand grows, apparently insatiable. The core strategies are:

  1. Balance CDNs, offload to origin. For web-centric businesses, delivering content across the Internet is fundamental to their success (possibly their survival), so they tend to rely upon one or more CDNs to get content to their users effectively. Over time, they tend to expand the number of CDN relationships, in order to improve quality across geographies, and to make the most of pricing differences between providers. Once they get this set to equilibrium, they discover that there is unused capacity at origin (or within a private or public cloud instance) to which they can offload traffic, maximizing the return they get on committed capacity, and minimizing unnecessary spend.
  2. Balance clouds, offload to CDN. For businesses that are highly geographically-focused, it is often more effective to create what is essentially a self-managed CDN, establishing PoPs through cloud providers in population centers where their customers actually originate. Even the most robust internally-managed system, however, is subject to traffic spikes that are way beyond expectations (and committed throughput limits), and so these companies build relationships with CDNs in which excess traffic is offloaded at peak times.
  3. Balance Hybrid Cloud. Organizations at the far right of Rightscale’s cloud maturity scale (in their words, the Cloud Explorers and Cloud Focused) are starting to view each of the delivery options not as wildly distinct options, but merely as similar-if-different-looking cogs in the machine. As such, they look at load and cost balancing through a pragmatic prism, in which each user is simply served through the lowest cost provider, so long as it can pass a pre-defined quality bar (a specified latency rate, for instance, or a throughput level). By shifting the mindset away from ‘primary’ and ‘offload’ networks, organizations are able to build strategies that optimize for both cost and quality.

Of course, to balance traffic across a heterogeneous set of delivery networks (and provider types), while adjusting for a combination of both economic and quality of service metrics, requires three things:

  1. Real-time visibility of the state of the Internet beyond the view of the individual publisher, in order to be able to evaluate Quality of Service levels prior to selecting a delivery provider
  2. Real-time visibility into the current economic situation with each contracted provider: which offers the lowest cost option, based on unit pricing, contract commitments, and so forth
  3. Real-time traffic routing, which takes the data inputs, compares them to the unique requirements of the requesting publisher, and seamlessly directs traffic along the right pathway

Not an easy recipe, perhaps, but when you get it right, it results in the opportunity to apply sophisticated algorithms to delivery – in effect to exercise a Wall Street-level arbitrage approach, which yields a combination of delighted customers and reduced infrastructure costs.
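
As a sketch of what that arbitrage looks like in practice, the selection logic boils down to: keep only the providers that clear the quality bar, then pick the cheapest. The provider names, prices, and 200 ms bar below are invented for illustration.

```typescript
// Illustrative "lowest cost that clears the quality bar" selection.
// Provider names, prices, and the quality bar are assumptions for the sketch.
interface Provider {
  name: string;
  medianLatencyMs: number; // real-user latency measured for this provider
  costPerGB: number;       // effective unit price under current commitments
}

function pickProvider(providers: Provider[], qualityBarMs: number): Provider | undefined {
  return providers
    .filter(p => p.medianLatencyMs <= qualityBarMs) // must pass the quality bar
    .sort((a, b) => a.costPerGB - b.costPerGB)[0];  // then the cheapest wins
}

const candidates: Provider[] = [
  { name: "cdn-a", medianLatencyMs: 90, costPerGB: 0.004 },
  { name: "public-cloud", medianLatencyMs: 150, costPerGB: 0.002 },
  { name: "origin", medianLatencyMs: 240, costPerGB: 0.001 }, // cheapest, but too slow
];
console.log(pickProvider(candidates, 200)?.name); // "public-cloud"
```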

Or, put another way, the opportunity to make your hybrid cloud strategy pay for itself – and more.

To find out more about real-time predictive traffic routing, please take a look around our Openmix pages, read about how to deliver 100% availability with a Hybrid CDN architecture, and visit our Github repository to see how easy it is to build your own real-time load balancing algorithm.

Make Mobile Video Stunning with Smart Load Balancing

If there’s one thing about which there is never an argument it’s this: streaming video consumers never want to be reminded that they’re on the Internet. They want their content to start quickly, play smoothly and uninterrupted, and be visually indistinguishable from traditional TV and movies. Meanwhile, the majority of consumers in the USA (and likely a similar proportion worldwide) prefer to consume their video on mobile devices. And as if that wasn’t challenging enough, there are now suggestions that live video consumption will grow – according to Variety by as much as 39 times! That seems crazy until you consider that Cisco predicted video would represent 82% of all consumer Internet traffic by 2020.

It’s no surprise that congestion can result in diminished viewing quality, leading over 50% of all consumers to, at some point, experience buffer rage from the frustration of not being able to play their show.

Here’s what’s crazy: there’s tons of bandwidth out there – but it’s stunningly hard to control.

The Internet is a best-effort environment, over which even the most effective Ops teams can wield only so much control, because so much of it is either resident with another team, or is simply somewhere in the amorphous ‘cloud’. While many savvy teams have sought to solve the problem by working with a Content Delivery Network (CDN), the sheer growth in traffic has meant that some CDNs are now dealing with as much traffic as the whole Internet transferred just a few years ago…and are themselves now subject to their own congestion and outage challenges. For this reason, plenty of organizations now contract with multiple CDNs, as well as placing their own virtual caching servers in public clouds, and even deploying their own bare-metal CDNs in data centers where their audiences are centered.

With all these great options for delivering content, Ops teams must make real-time decisions on how to balance the traffic across them all. The classic approaches to load balancing have been (with many thanks to Nginx):

  • Availability – Any servers that cannot be reached are automatically removed from the list of options (this prevents total link failure).
  • Round Robin – Requests are distributed across the group of servers sequentially.
  • Least Connections – A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections (see the sketch after this list).
  • IP Hash – The IP address of the client is used to determine which server receives the request.
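
To make the contrast with what follows concrete, here is a minimal sketch of two of the classic algorithms above (round robin and least connections); the server pool is invented for illustration.

```typescript
// Minimal sketches of two classic load-balancing algorithms described above.
interface Server {
  name: string;
  activeConnections: number;
}

// Round robin: hand requests to servers in strict rotation.
function makeRoundRobin(servers: Server[]) {
  let next = 0;
  return () => servers[next++ % servers.length];
}

// Least connections: pick the server currently handling the fewest clients.
function leastConnections(servers: Server[]): Server {
  return servers.reduce((best, s) =>
    s.activeConnections < best.activeConnections ? s : best
  );
}

const pool: Server[] = [
  { name: "server-1", activeConnections: 12 },
  { name: "server-2", activeConnections: 3 },
];
const nextServer = makeRoundRobin(pool);
console.log(nextServer().name);           // "server-1"
console.log(leastConnections(pool).name); // "server-2"
```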

You might notice something each of those has in common: they all focus on the health of the system, not on the quality of the experience actually being had by the end user. Anything that balances based on availability tends to be driven by what is known as synthetic monitoring, which is essentially one computer checking that another computer is available.

But we all know that just because a service is available doesn’t mean that it is performing to consumer expectations.

That’s why the new generation of Global Server Load Balancer (GSLB) solutions goes a step further. Today’s GSLB uses a range of inputs, including:

  • Synthetic monitoring – to ensure servers are still up and running
  • Community Real User Measurements – a range of inputs from actual customers of a broad range of providers, aggregated, and used to create a virtual map of the Internet
  • Local Real User Measurements – inputs from actual customers of the provider’s own service
  • Integrated 3rd party measurements – including cost bases and total traffic delivered for individual delivery partners, used to balance traffic based not just on quality, but also on cost

Combined, these data sources allow video streaming companies not only to guarantee availability, but also to tune their total network for quality, and to optimize within that for cost. Or put another way – streaming video providers can now confidently deliver the quality of experience consumers expect and demand, without breaking the bank to do it.

When you know that you are running across the delivery pathway with the highest quality metrics, at the lowest cost, based on the actual experience of your users – that’s a stunning result. And it’s only possible with smart load balancing, combining traditional synthetic monitoring with the real-time feedback of users around the world, and the 3rd party data you use to run your business.

If you’d like to find out more about smart load balancing, keep looking around our site. And if you’re going to be at Mobile World Congress at the end of the month, make an appointment to meet with us there so we can show you smart load balancing in real life.

Network Resilience for the Cloud

If we’ve learned nothing else over the last few weeks, it has been that the Internet is an unruly, inherently insecure network. The ups and downs of Dyn – taken offline by hackers, yet subsequently purchased for what appears to be north of half a billion dollars – remind us that we are still a generation or two away from comfortable consistency. More importantly, they remind cloud businesses that they are relying upon a network over which they have only limited control.

Peter Deutsch and others at Sun Microsystems proposed, a number of years ago, the Fallacies of Distributed Computing. They are:

  1. The network is reliable.
  2. Latency is zero.
  3. Bandwidth is infinite.
  4. The network is secure.
  5. Topology doesn’t change.
  6. There is one administrator.
  7. Transport cost is zero.
  8. The network is homogeneous.

The briefest look through this list tells you that these are brilliantly conceived, and as valid today as they were when first introduced in 1994 (in fairness, number 8 was added in 1997). Indeed, turn them upside down and you can already see the poster to go on every Operations team’s wall, reminding them that:

  1. The Internet is inherently unreliable
  2. Internet latency is a fact of life, and must be anticipated
  3. Bandwidth is shared, precious, and limited
  4. No Internet-connected system is 100% secure
  5. Internet topology changes quicker than the staircases at Hogwarts
  6. There are so many administrators of the Internet there may as well be none
  7. Transport always has a cost – your job is to keep it low
  8. The Internet consists of an infinite number of misfitting pieces

In a recent paper commissioned by Cedexis from The FactPoint Group, a new paradigm is proposed: stop building fault-tolerant systems, and start building failure-tolerant systems. Simply stated, an internally-managed network can be constructed with redundancy and failover capabilities, with a reasonable goal of near-100% consistent service. Cloud architectures, however, have so many moving parts and interdependencies that no amount of planning can eliminate failures. Cloud architecture, therefore, requires a design that assumes failures will happen and plans for them.


(Click here if you’d like to read the paper in full)

This means there’s more to this than load balancing – we’re really talking about resource optimization. Take, for example, caching within a private cloud. First, we can use a Global Traffic Manager (GTM) to maximize Quality of Experience (QoE) by routing traffic along the pathways that will deliver content the most quickly and efficiently. Second, we can use intelligent caching to protect against catastrophic failure: a well-tuned Varnish server, for instance, can continue to serve cached content while an unavailable origin server is repaired and put back into service. In a situation where DNS services are down, the GTM can use Real User Measurements (RUM) to spot the problem and direct requests to the right Varnish server (a well-constructed decision set can contain IP addresses for just such emergencies). The Varnish server can check for the availability of its origin and, if DNS problems prevent it from sourcing fresh content, can serve cached content.
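
In Varnish itself this behavior is configured through VCL, but the underlying idea can be sketched in a few lines: keep the last good copy of each object and serve it whenever the origin cannot be reached. The code below is a conceptual illustration, not Varnish configuration.

```typescript
// Conceptual sketch of "serve stale content when the origin is unreachable".
// In production this is what a tuned Varnish server does via VCL; this code
// only illustrates the idea.
const cache = new Map<string, string>(); // last known-good response per URL

async function fetchWithStaleFallback(url: string): Promise<string> {
  try {
    const response = await fetch(url);
    if (!response.ok) throw new Error(`origin returned ${response.status}`);
    const body = await response.text();
    cache.set(url, body);                  // refresh the cached copy on success
    return body;
  } catch {
    const stale = cache.get(url);          // origin (or its DNS) is unavailable
    if (stale !== undefined) return stale; // keep serving the last good copy
    throw new Error(`no cached copy available for ${url}`);
  }
}
```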

Will this solve for every challenge? Assuredly not – but the multi-layered preparation for failure greatly improves our chances of protecting against extended outages. Meanwhile, the agility that is adopted by Operations teams as they prepare for failure means a more subtle, sophisticated set of network architectures, which lend themselves to way greater resiliency.

As applications increasingly become a tightly-knit conglomeration of web-connected services and resources, planning for failure is not a choice, it is an imperative. Protecting against the variety of threats to the shared Internet requires agility, forethought, and the zen-like acceptance that failure is inevitable.