Cedexis 2017 Highlights

2017 has been a thrilling year for our whole company.

Read our infographic “2017 in a Nutshell” to get a 360 degree view of the year’s biggest highlights.

Click inside the infographic on the text to get direct access to dedicated contents.

Quarter 1:

Q2:

Q3:

Q4:

New Feature: Reason Code reporting

Cedexis’ Global Load Balancing solution Openmix makes over 2.5 billion real-time delivery decisions every day. These routing decisions are based on a combination of Radar community’s 14 billion daily real user measurements and our customers’ defined business logic.

One thing we hear time and time again is “It’s great that you are making all these decisions, but it would be very valuable into why you are switching pathways.”  The “why” is hugely valuable in understanding the “what” (Decisions) and “when” (Time) of the Openmix decision-routing engine.

And so, we bring you: Reason Codes.

Reason Codes in Openmix applications are used to log and identify decisions being made, so you can easily establish why users were routed, such as to certain providers or geographical locations.  Reason Codes reflect factors such as Geo overrides, Best Round Trip Time, Data Problems, Preferred Provider Availability and or whatever other logic is built into your Openmix applications. Having the ability to see which Reason Codes (the “why”) impacted what decisions were made allows you to see clearly where problems are arising in your delivery network, and make adjustments where necessary.

Providing these types of insights is core to Cedexis’ DNA, so we are pleased to announce the general availability of Reason Codes as part of the Decision Report.  You can now view Reason Codes as both Primary and Secondary Dimensions, as well as through a specific filter for Reason Codes.

As a Cedexis Openmix user, you’ll want to get in on this right away. Being able to see what caused Openmix to route users from your preferred Cloud or CDN provider to another one because of a certain event (perhaps a data outage in the UK) allows you to understand what transpired over a specific time period. No second guessing of why decisions spiked in a certain country or network. Using Reason Codes, you can now easily see which applications are over- and under-performing and why.

Here is an example of how you can start gaining insights.

You will notice in the first screenshot below that for a period of time, there was a spike in the number of decisions that Openmix made for two of the applications.

Now all you have to do is switch the view from looking at the Application as your primary dimension to Reason Code and you can quickly see that “Routed based [on] Availability data” was the main reason for Openmix re-routing users

Drilling down further, you can add Country as your Secondary Dimension and you can see that this was happening primarily in the United States.

All of a sudden, you’re in the know: there wasn’t just ‘something going on’ – there was a major Availability event in the US. Now it’s time to hunt down your rep from that provider and find out what happened, what the plan is to prevent it in the future, and how you can adjust your network to ensure continued excellent service for all your users.

Live and Generally Available: Impact Resource Timing

We are very excited to be officially launching Impact Resource Timing (IRT) for general availability.

IRT is Impact’s powerful window into the performance of different sources of content for the pages in your website. For instance, you may want to distinguish the performance of your origin servers relative to cloud sources, or advertising partners; and by doing so, establish with confidence where any delays stem from. From here, you can dive into Resource Timing data sliced by various measurements over time, as well as through a statistical distribution view.

What is Resource Timing? Broadly speaking, resource timing measures latency within an application (i.e. browser). It uses JavaScript as the primary mechanism to instrument various time-based metrics of all the resources requested and downloaded for a single website page by an end user. Individual resources are objects such as JS, CSS, images and other files that the website pages requests. The faster the resources are requested and loaded on the page, the better quality user experience (QoE) for users.  By contrast, resources that cause longer latency can produce a negative QoE for users.  By analyzing resourcing timing measurements, you can isolate the resources that may be causing degradation issues for your organization to fix.  

Resource Timing Process:

Cedexis IRT makes it easy for you to track resources from identified sources, normally identified through domain (*.myDomain.com), by sub-domain(e.g. images.myDomain.com), and by the provider serving your content. In this way, you can quickly group together types of content, and identify the source of any latency. For instance, you might find that origin-located content is being delivered swiftly, while cloud-hosted images are slowing down the load time of your page; in such a situation, you would now be in a position to consider a range of solutions, including adding a secondary cloud provider and a global server load balancer to protect QoE for your users.

Some benefits of tracking Resource Timing.

  • See which hostnames  – and thus which classes of content – are slowing down your site.
  • Determine which resources impact your overall user experience.
  • Correlate resource performance with user experience.

Impact Resource Timing from Cedexis allows you to see how content sources are performing across various measurement types such as Duration, TCP Connection Time, and Round Trip Time. IRT reports also give you the ability to drill down further by Service Providers, Locations, ISPs, User Agent (device, browsers, OS) and other filters.

Check out our User Guide to learn more about our Measurement Type calculations.

There are two primary reports in this release of Impact Resource Timing. The Performance report, which gives you a trending view of resource timing over time and the Statistical Distribution report, which reports Resource Timing data through a statistical distribution view.  Both reports have very dynamic reporting capabilities that allow you to easily pinpoint resource-related issues for further analysis.  


Using the Performance report, you can isolate which grouped resources are causing potential end user experience issues by hostname, page or service provider and when the issue happened. Drill down even further to see if this was a global issue or localized to a specific location or if it was by certain user devices or browsers.  

IRT is now available for all in the Radar portal – take it for a spin and let us know your experiences!

Amazon Outage: The Aftermath

Amazon AWS S3 Storage Service had a major, widely reported, multi-hour outage yesterday in their US-East-1 data center. The S3 service in this particular data center was one of the very first services Amazon launched when it introduced cloud computing to the world more than 10 years ago. It’s grown exponentially since–storing over a trillion objects and servicing a million requests/second supporting thousands of web properties (this article alone lists over 100 well-known properties that were impacted by this outage).

Amazon has today published a description of what happened. The summary is that this was caused by human error. One operator, following a published run book procedure, mis-typed a command parameter setting a sequence of failure events in motion. The outage started at 9:37 am PST.  A nearly complete S3 service outage lasted more than three hours and full recovery of other AWS S3-dependent services lasted several hours more.

A few months ago, Dyn taught the industry that single-sourcing your authoritative DNS creates the risk the military described as two is one, one is none. This S3 incident underscores the same lesson for object storage. No service tier is immune. If a website, content, service or application is important, redundant alternative capability at all layers is essential. And this requires appropriate capabilities to monitor and manage this redundancy. After all, fail-over capacity is only as good as the system’s ability to detect the need to, and to actually, failover. This has been at the heart of Cedexis’ vision since the beginning, and as we continue to expand our focus in streaming/video content and application delivery, this will continue to be an important and valuable theme as we seek to improve the Internet experience of every user around the world.

Even the very best, most experienced services can fail. And with increasing deconstruction of service-oriented architectures, the deeply nested dependencies between services may not always be apparent. (In this case, for example, the AWS status website had an underlying dependency on S3 and thus incorrectly reported the service at 100% health during most of the outage.)

We are dedicated to delivering data-driven, intelligent traffic management for redundant infrastructure of any type. Incidents like this should continue to remind the digital world that redundancy, automated failover, and a focus on the customer experience are fundamental to the task of delivering on the continued promise of the Internet.

Mobile Video is Devouring the Internet

In late 2009 – fully two years after the introduction of the extraordinary Apple iPhone – mobile was barely discernible on any measurement of total Internet traffic. By late 2016, it finally exceeded desktop traffic volume. In a terrifyingly short period of time, mobile Internet consumption moved from an also-ran to a behemoth, leaving behind the husks of marketing recommendations to “move to Web 2.0” and to “design for Mobile First”. And along the way, Apple encouraged us to buy into the concept that the future (of TV at least) is apps.

Unsurprisingly, the key driver of all this traffic is – as it always is – video. One in every three mobile device owners watches videos of at least 5 minutes’ duration, which is generally considered the point at which the user has moved from short-form, likely user-generated, content, to premium video (think: TV shows and movies). And once viewers pass the 5minute mark, it’s a tiny step to full-length, studio-developed content, which is a crazy bandwidth hog.  Consider that video is expected to represent fully 75% of all mobile traffic by 2020 – when it was just 55% in 2015.


As consumers get more interested in video, producers aren’t slowing down. By 2020, it is estimated that it would take an individual fully 5 million years to watch the video being published and made available in just a month. And while consumer demand varies around the world – 72% of Thailand’s mobile traffic is video, for instance, versus just 41% in the United States – the reality is that, without some help, the mobile Web is going to be straining under the weight of near-unlimited video consumption.

What we know is that, hungry as they are for content, streaming video consumers are fickle and impatient. Akamai demonstrated years ago the 2-second rule: if a requested piece of content isn’t available in under 2 seconds, Internet users simply move on to the next thing. And numerous studies have shown definitively that when re-buffering (the dreaded pause in playback while the viewing device downloads the next section of the video) exceeds just 1% of viewing time, audience engagement collapses, resulting in dwindling opportunities to monetize content that was expensive to acquire, and can be equally costly to deliver.

How big of a problem is network congestion? It’s true that big, public, embarrassing outages across CDNs or ISPs are now quite rare. However, when we studied the network patterns of one of our customers, we found that what we call micro-outages (outages lasting 5 minutes or less) happen literally hundreds to thousands of times a day. That single customer was looking at some 600,000 minutes of direct lost viewing time per month – and when you consider how long each customer might have stayed, and their decreased inclination to return in the future, that number likely translates to several million minutes of indirectly lost minutes.

While mobile viewers are more likely to watch their content through an app (48% of all mobile Internet users) than a browser (18%), they still receive the content through the chaotic maelstrom of a network that is the Internet. As such, providers have to work out the best pathways to use to get the content there, and to ensure that the stream will have consistency over time so that it doesn’t fall prey to the buffering bug.

Most providers use stats and analysis to work out the right pathways – so they can look at how various CDN/ISP combos are working, and pick the one that is delivering the best experience. Strikingly, though, they often have to make routing decisions for audience members who are in geographical locations that aren’t currently in play, which means choosing a pathway without any recent input on which is going to be the best pathway – this is literally gambling with the experience of each viewer. What is needed is something predictive: something that will help the provider to know the right pathway the first time they have to choose.

This is where the Radar Community comes in: by monitoring, tracking, and analyzing the activity of billions of Internet interactions every day, the community knows which pathways are at peak health, and which need a bit of a breather before getting back to full speed. So, when using Openmix to intelligently route traffic, the Radar community data provides the confidence that every decision is based on real-time, real-user data – even when, for a given provider, they are delivering to a location that has been sitting dormant.

Mobile video is devouring the Web, and will continue to do so, as consumers prefer their content to move, dance, and sing. Predictively re-routing traffic in real-time so that it circumvents the thousands of micro-outages that plague the Internet every day means never gambling with the experience of users, staying ahead of the challenges that congestion can bring, and building the sustainable businesses that will dominate the new world of streaming video.

How to Make Cloud Pay Its Own Way

Rightscale came out with a wonderful report on the state of the cloud industry, and we learned some important new things:

  • 77% of organizations are at least exploring private cloud implementations
  • 82% of enterprises are executing a hybrid cloud strategy
  • 26% of respondents are now listing cost as significant challenge – ironically, given the importance of cost-cutting in the early growth of cloud services

The growth in hybrid cloud adoption is particularly striking: by Rightscale’s count, only 6% of companies are exclusively looking at private cloud,  18% are exclusively looking at public cloud , while a full 71% have a toe dipped into each pool.

Meanwhile, Cisco estimates that two thirds of all Internet traffic will traverse at least one content delivery network by 2020 – which tends to imply that most organizations are, right now, invested in getting the most out of some combination of private cloud, public cloud, CDN, and, presumably, physically-managed data center.

Fundamentally, there are a few core ways that we see organizations using this market basket of delivery pathways – and, naturally, our Openmix global server load balancer – to better serve their customers, and to protect their economics as demand grows, apparently insatiable. The core strategies are:

  1. Balance CDNs, offload to origin. For web-centric businesses, delivering content across the Internet is fundamental to their success (possibly their survival), so they tend to rely upon one or more CDNs to get content to their users effectively. Over time, they tend to expand the number of CDN relationships, in order to improve quality across geographies, and to make the most of pricing differences between providers. Once they get this set to equilibrium, they discover that there is unused capacity at origin (or within a private or public cloud instance) to which they can offload traffic, maximizing the return they get on committed capacity, and minimizing unnecessary spend.
  2. Balance clouds, offload to CDN. For businesses that are highly geographically-focused, it is often more effective to create what is essentially a self-managed CDN, establishing PoPs through cloud providers in population centers where their customers actually originate. Even the most robust internally-managed system, however, is subject to traffic spikes that are way beyond expectations (and committed throughput limits), and so these companies build relationships with CDNs in which excess traffic is offloaded at peak times.
  3. Balance Hybrid Cloud. Organizations at the far right of Rightscale’s cloud maturity scale (in their words, the Cloud Explorers and Cloud Focused) are starting to view each of the delivery options not as wildly distinct options, but merely as similar-if-different-looking cogs in the machine. As such, they look at load and cost balancing through a pragmatic prism, in which each user is simply served through the lowest cost provider, so long as it can pass a pre-defined quality bar (a specified latency rate, for instance, or a throughput level). By shifting the mindset away from ‘primary’ and ‘offload’ networks, organizations are able to build strategies that optimize for both cost and quality.

Of course, to balance traffic across a heterogeneous set of delivery networks (and provider types), while adjusting for a combination of both economic and quality of service metrics, requires three things:

  1. Real-time visibility of the state of the Internet beyond the view of the individual publisher, in order to be able to evaluate Quality of Service levels prior to selecting a delivery provider
  2. Real-time visibility into the current economic situation with each contracted provider: which offers the lowest cost option, based on unit pricing, contract commitments, and so forth
  3. Real-time traffic routing, which takes the data inputs, compares them to the unique requirements of the requesting publisher, and seamlessly directs traffic along the right pathway

Not an easy recipe, perhaps, but when found, it results in the opportunity to apply sophisticated algorithms to delivery – in effect to exercise a Wall Street-level arbitrage approach, which results in a combination of delighted customers, and reduced infrastructure costs.

Or, put another way, the opportunity to make your hybrid cloud strategy pay for itself – and more.

To find out more about real-time predictive traffic routing, please take a look around our Openmix pages,  read about how to deliver 100% availability with a Hybrid CDN architecture, and visit our Github repository to see how easy it is to build your own real-time load balancing algorithm.

Make Mobile Video Stunning with Smart Load Balancing

If there’s one thing about which there is never an argument it’s this: streaming video consumers never want to be reminded that they’re on the Internet. They want their content to start quickly, play smoothly and uninterrupted, and be visually indistinguishable from traditional TV and movies. Meanwhile, the majority of consumers in the USA (and likely a similar proportion worldwide) prefer to consume their video on mobile devices. And as if that wasn’t challenging enough, there are now suggestions that live video consumption will grow – according to Variety by as much as 39 times! That seems crazy until you consider that Cisco predicted video would represent 82% of all consumer Internet traffic by 2020.

It’s no surprise that congestion can result in diminished viewing quality, leading over 50% of all consumers to, at some point, experience buffer rage from the frustration of not being able to play their show.

Here’s what’s crazy: there’s tons of bandwidth out there – but it’s stunningly hard to control.

The Internet is a best-efforts environment, over which even the most effective Ops teams can wield only so much control, because so much of it is either resident with another team, or is simply somewhere in the amorphous ‘cloud’.  While many savvy teams have sought to solve the problem by working with a Content Delivery Network (CDN), the sheer growth in traffic has meant that some CDNs are now dealing with as much traffic as the whole Internet transferred just a few years ago…and are themselves now subject to their own congestion and outage challenges. For this reason, plenty of organizations now contract with multiple CDNs, as well as placing their own virtual caching servers in public clouds, and even deploying their own bare-metal CDNs in data centers where their audiences are centered.

With all these great options for delivering content, Ops teams must make real-time decisions on how to balance the traffic across them all. The classic approaches to load balancing have been (with many thanks to Nginx):

  • Availability – Any servers that cannot be reached are automatically removed from the list of options (this prevents total link failure).
  • Round Robin – Requests are distributed across the group of servers sequentially.
  • Least Connections – A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the least connections.
  • IP Hash – The IP address of the client is used to determine which server receives the request.

You might notice something each of those has in common: they all focus on the health of the system, not on the quality of the experience actually being had by the end user. Anything that balances based on availability tends to be driven by what is known as synthetic monitoring, which is essentially one computer checking another computer is available.

But we all know that just because a service is available doesn’t mean that it is performing to consumer expectations.

That’s why the new generation of Global Server Load Balancer(GSLB) solutions goes a step further. Today’s GSLB uses a range of inputs, including

  • Synthetic monitoring – to ensure servers are still up and running
  • Community Real User Measurements – a range of inputs from actual customers of a broad range of providers, aggregated, and used to create a virtual map of the Internet
  • Local Real User Measurements – inputs from actual customers of the provider’s own service
  • Integrated 3rd party measurements – including cost bases and total traffic delivered for individual delivery partners, used to balance traffic based not just on quality, but also on cost

Combined, these data sources allow video streaming companies not only to guarantee availability, but also to tune their total network for quality, and to optimize within that for cost. Or put another way – streaming video providers can now confidently deliver the quality of experience consumers expect and demand, without breaking the bank to do it.

When you know that you are running across the delivery pathway with the highest quality metrics, at the lowest cost, based on the actual experience of your users – that’s a stunning result. And it’s only possible with smart load balancing, combining traditional synthetic monitoring with the real-time feedback of users around the world, and the 3rd party data you use to run your business.

If you’d like to find out more about smart load balancing, keep looking around our site. And if you’re going to be at Mobile World Congress at the end of the month, make an appointment to meet with us there so we can show you smart load balancing in real life.

How Much Money Is Your Website Performance Costing You?

Hello friends. We are pleased to announce that Cedexis will be partnering with SOASTA to bring you a roundtable discussion covering recent research into the costs of poor website performance and the expectation of the modern web visitor has evolved.

If you’re interested in learning about:

  • How even slight web delays can lead to lost revenue
  • Understand how to use Real User Measurements to understand what’s not working & make that data actionable
  • What leading enterprises are doing to build high-performance sites

…then this is the event for you. Pete Mastin, Cedexis Product Evangelist, and Tammy Everts , SOASTA Director of Content, will be reviewing strategies and technologies to boost website performance, and show you how business KPIs can be used to ensure your site is contributing as much as possible to the bottom line.

This online event takes place Tuesday, Sept 20th and will be hosted by Aberdeen – the research group, not the Scottish city (though any region housing Balgownie Links is a place I’d like to visit).  Jim Rapoza, Sr Research Analyst from Aberdeen will be moderating.  Presentations will last approximately 30 minutes and there will be plenty of time for Q&A at the end.

This webinar is free as always.  For more detail & how to register, click here.  And if you want to learn more about Cedexis Openmix, which uses real-time data for global traffic management, click here.  Thanks for reading! Enjoy the roundtable.

 

 

Matt Radochonski is Cedexis’ Director of Demand Generation & Marketing Operations. He can be reached on twitter @MattAtCedexis or via email, and feel free to leave a comment below.

Cedexis Portal Update: New Menus and Reports

As part of our ongoing efforts to surface powerful data to help our customers discover, analyze and improve performance and availability of their digital assets, we’ve made an update to our award winning Cedexis Portal to consolidate our powerful analytics reporting under our Impact product umbrella.

Our Impact menu now exposes Navigation Timing Data (previously called “Page Load Time” in the Portal), our brand new Impact Resource Timing reports (currently in Beta), and our Impact Business Analytics data, which provides incredible details of how site performance impacts business metrics and KPIs.

If you aren’t currently collecting data for any of these reports, we now provide a “sample report” showing what the report looks like, and a description of the report features.  The additional reporting data generally comes from advanced features within the Radar tag.  Check it out and contact us if you have any questions, or would like to see more about these powerful features.

For those customers who have been using the Cedexis Portal already, the previous menu “Page Load Time” can now be found under the Impact “Navigation Timing Data” menu as seen below:

PLTMenu

If you are new to Cedexis, and haven’t seen the powerful analytics available in our Portal, or how easy it is to configure our Openmix solution to deliver optimal RUM-based, global load balancing across CDNs, Clouds or your data centers, please sign up for a free Portal account now and dive in!

More Impact! The value of the funnel.

With the release last week of the Cedexis Impact product, an enterprise is now able to understand the performance of its site within the context of customer experience. Understanding the customer journey (where the visitors are dropping off unexpectedly for instance) helps to improve the site and the experience. Impact gives you a view of the whole journey and where to focus the efforts. Impact provides visibility into the before and after perspective, and allows the web architect to have a continuous feedback loop of improvement.impact-funnel1

Using the funnel in the picture above, customers can gain a window into their data that no one has seen before. Specifically, as the user moves from the Catalog to the Product page and then on to the Checkout, you can see exactly how many of the users bailed and what the average Page Load Time (PLT) was for those users. You can see that on average the users that moved on to the next page were the ones who were having a responsive experience. The users who bailed by either going elsewhere in the site or (worse case) leaving the site altogether were generally having less performant experiences. The smart web architect can then do what he or she needs to do to improve the performance of the specific user journey. This may include adding a CDN or Cloud for performance – or it may involve other activities. In either case, these types of Web Performance Analytics are extremely important in increasing the value of the site.


impact_400ms
Everybody knows that customers stay on site and buy more stuff if the site performs well. The biggest factor in performance tuning is understanding where a site is failing. Impact can help an enterprise understand where people are leaving and why. Understanding this causal relationship is the first step in improving your sites performance.

For more information on this important new tool from Cedexis, read our Impact Fact Sheet.