Amazon vs. Azure messaging face-off

This week I had the opportunity to attend a face-off sponsored by the Seattle Web Technology Meetup Group.  Rather than a pitch from vendors, the face-off featured a company using each technology stack to educate the audience on the respective merits and gotchas of the stacks.  I thought it would be interesting to examine the messages from each technology provider by creating wordles for AWS and Azure, from their respective product home pages.

 

 

 

From these views, a few interesting points emerge.  Amazon’s brand is in the forefront for AWS, while Microsoft takes a backseat to Windows.  While “service”, “elastic”, and “AWS” occupy a predominant place in the Amazon message, “applications”, “cloud”, and “sign” (and “service”) take the forefront for Azure.

Here at Cedexis, we’re tracking the public clouds, private clouds (data centers), and CDNs for our clients the world over.  Come help yourself to some free data and answer critical cloud migration questions for your enterprise.

 

To Tweet or Not to Tweet : that is the question

So you have a witty set of 140 characters with a shortened URL linked to some brilliant content.  What’s next?  Well, you’re in your favorite app or web site and you tweet it of course.  What are the chances that the tweet will go through?  How long will it take?  We collected data from real user probes over the course of the year to answer these questions.

(The following data set consisted of nearly 700,000 summary data points with the originally collected data points numbering over 150 billion. It’s big data.)

1) Over a year after Twitter moved to its own data center, has the new Twitter-managed data center improved network availability and latency from the perspective of real users worldwide?  The trend looks good, but there’s certainly room for improvement.

2)  Across the G20 countries, a comparison of  Twitter availability and connect time to the other social media powerhouse, Facebook, represents another interesting data slice.  If you wanted to make a status update, which would you choose? It’s a toss up.

Twitter is usually more available (threshold for green is set low at 90%) but connect time is consistently longer.  The longer the bar, the longer the wait.  The redder the bar, the higher the chances that your tweet or status update won’t even get through.

 

Finally, if I were running the Twitter or Facebook engineering teams, I’d use the data above to improve the experience for users in G20 and beyond… @cedexis is just a simple tweet away!

Going BRIC? Go in with the right data.

As more and more online traffic comes from countries outside of North America and Western Europe, online companies are looking to Brazil, Russia, India, and China as sources for growth.  When entering a new market, your business would consider the economics of the local markets and examine environmental factors: political stability, economic conditions, social acceptance of your product, etc.  You might read the news to understand the political situation, look for economic indicators such as GDP, and learn about local conditions through market research or partnerships.

For an online presence in new markets, such as the BRIC economies, consider a similar set of “Cloudonomics” metrics:  Availability and Response Time.  These metrics can be sliced across clouds and markets, as we’ve done below over the course of the year.  The variability across clouds in these markets is clear especially in certain markets like China.  No one cloud serves each and every market well across all form factors and content types.


In future posts we’ll slice this data by form factors (including smartphones and tablets) and tease out insights about which clouds offer the best services for your strategy.  We’re also planning a webinar session to present strategies on entering BRIC markets with a data-driven cloud strategy. If entering BRIC markets is important to your online business and you’re looking for the right data across clouds to make important decisions, you don’t want to miss this.  If you’re a new cloud provider specializing in serving a specific geography with better service quality for specific content (such as video) or specific form factors (such as mobile), we can help you certify your cloud and include you in our rankings.

Contact us below if you’re interested in our upcoming free webinars for online businesses and cloud providers.

http://cedexis.wpengine.com/i/lose-the-wait/

Sample Country Report : Example of France

Philosophically, our goal is to show companies how fast they could be by leveraging an effective multi-platform strategy. We’ve been playing around with the idea of replacing our public charts with a set of Country Reports which answer a specific set of questions.

Here’s an example for France…

ISP Marketshare:
Where are my end-users (most likely to be) coming from within this country?

ISP Performance:
What are average page load times for end-users coming from these ISPs?

Web Benchmarks:
On average, how do the biggest sites in the world compare for end-users in this country?

Cloud Performance & Availability:
Where should I deploy my applications in order to deliver the best results to this country?

CDN Performance & Availability:
Features aside (although these are often the most important consideration), what can I do about my static content to achieve the best results in this country?

Dynamic Content Acceleration (coming soon):
Which technologies can have the biggest impact on end-user perceived performance of my dynamic content?

Cloud Performance : Measure It and Take Action

Having spent nearly a decade with the traditional set of site monitoring tools entrenched in large-scale web sites including MSN, Bing, and Windows Live (you know where I worked), I’ve formed a few opinions on what’s important to measure and monitor.  First and foremost, I chose to focus my energy on the customer experience.  If you’re engaged in an activity which does not clearly impact the customer experience in a positive way, you should ask yourself whether you should be doing the activity at all.  Impacting the customer experience requires measuring the customer experience.  Recently some esteemed industry colleagues at Cloudsleuth measured cloud performance using a backbone & synthetic agent approach.   Our point of view and cloud benchmark results are markedly different and here’s why.

Let’s start with the user.  Does this scene sound remotely familiar? You’re sitting in a room full of Engineering leads/architects who build and run your live site.  You’re talking about site performance and the big, bad “S” word comes up:  Customers are saying the site is “SLOW”. So like a good engineer you search for actionable data.   You ask yourself, “what can I do to make the site experience better?”  You pull up the usual suspects of browser plug-ins and the developer tool du jour, and go to work optimizing JavaScript loading and static file sizes with the many site optimization tools out there.  Here’s a waterfall snapshot from Chrome that should look familiar.

Then someone notices some content that is seemingly out of your control in a particular browser session on a developer’s machine.  It’s a third-party ad image or rich content coming from a cloud or CDN that is slowing down the site, presumably for many end-users.   At this point, you either say it’s someone else’s problem or you direct your attention to more “front-end” issues in the page or “back-end” issues in the mysterious data center.  I’ve personally sat in many meetings in which there was no further thought beyond the page and the servers.  Are all users affected this way?  Which users (in which geographies) are experiencing the worst latency?  What happened to the network in the equation?  It turns out, most folks lump the network into front-end analysis.  Is that the right approach?  Is the network (broadly speaking, the “cloud”) that sits between your end-user and service bits REALLY completely out of your control?

Real user monitoring (RUM) is too passive.  In a pure sense, passive monitoring means that the measurement does not interfere with the user experience of the page load.  (In some cases, RUM measurements have been known to add latency to the page load.)  Typical RUM data focuses on the page load, which allows the front-end developer to inspect each page component and optimize page delivery at the browser.  RUM page load data is a great way to determine whether you’re doing what YSlow says you should be doing, such as minimizing requests, optimizing CSS and JS, image & cookie usage, etc.  But a large piece of the problem (the public network between the browser and your service) is not actively monitored.

Cedexis Radar provides actionable network insights into real user experience.  In addition to the page load, a remote probe measures the networks used to deliver the content to the user.  This experience is measured from the browser (passively) to inform real business decisions such as where to intelligently route user traffic in real-time across the entire world of public networks.  We consider this a real user experience expressed in terms of the network latency, not page load, because this data is actionable.   Once content is loaded into the user’s browser, your code behaves as the browser chooses to let it behave– from AJAX to HTML5.  A remote probe, however, detects a network anomaly (slow or unreliable public cloud/CDN) and intelligently decides on the site owner’s behalf to improve the user experience.  Voila! Dynamic user experience improvement at your fingertips.

The public network is no longer out of your control.  Imagine a world where the engineering discussion of “front-end and back-end” suddenly transformed into “front-end, network, and back-end.”  With a comprehensive view from a community of remote probes, you can make an informed decision about a particular cloud or delivery platform.  The historical network probe data is useful in planning decisions to choose specific platforms in specific regions.  In real-time, the network probe data can be used to make decisions to route traffic according to your business rules.

We can not only tell you how fast you are, but how fast you *could be* if you used other CDN or Cloud Providers.

Know the improvement to your user (and your business) before you make any changes.  We provide data on page load times, but we also specialize in giving you meaningful network latency measurements.  Then we go one better and give you a comparison, from the perspective of your own web visitors, of how your existing content delivery or hybrid cloud strategy would look if you used different vendors or data centers.
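To make the “how fast you could be” comparison concrete, here is a minimal sketch in Python.  It assumes you already have latency samples per provider from your own visitors; the provider names, the sample values, and the choice of the median as the summary statistic are all illustrative assumptions, not a description of how our reports are built.

```python
from statistics import median

# Hypothetical latency samples (ms) from real visitors, keyed by provider.
# Names and numbers are made up for illustration.
samples_ms = {
    "current-cdn": [180, 210, 250, 190, 220],
    "alt-cdn-a": [150, 170, 160, 200, 155],
    "alt-cloud-b": [230, 260, 240, 250, 245],
}


def could_be_report(samples, current):
    """Compare the current provider's median latency with each alternative.

    Returns the current median and, per alternative, the percentage gain.
    A positive gain means the alternative is faster than the current provider.
    """
    baseline = median(samples[current])
    report = {}
    for provider, values in samples.items():
        if provider == current:
            continue
        candidate = median(values)
        report[provider] = round((baseline - candidate) / baseline * 100, 1)
    return baseline, report


baseline, report = could_be_report(samples_ms, "current-cdn")
print(f"current median: {baseline} ms")
for provider, gain in sorted(report.items(), key=lambda kv: -kv[1]):
    print(f"{provider}: {gain:+.1f}% vs current")
```

The point of the sketch is the shape of the question: the baseline and the alternatives are measured from the same visitors, so the gain is known before anything is changed.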

So here’s an example of actionable network latency data.  Remember that slow-loading image you saw from Chrome?  It turns out users in some parts of the world see an even poorer experience and others see a completely different experience even when using a CDN.  But I thought one of the most basic rules in web performance was “use a CDN”.  It isn’t.  It’s “use multiple CDNs (and clouds).”  Over the course of 2011, here’s a sample of our Response Time data showing Rackspace, EC2, and Azure as the Top 3 clouds.  The height of each bar shows the measured Response Time (shorter is better) and the shade of the bar’s color shows the standard deviation of the measured Response Time (darker means more variance).
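For readers who want to build a similar chart from their own measurements, the two quantities behind each bar can be sketched as follows.  The cloud names and samples are hypothetical, and the use of the population standard deviation is an assumption; the chart itself does not say which variant was used.

```python
from statistics import mean, pstdev

# Hypothetical Response Time samples (ms) per cloud.
response_ms = {
    "cloud-a": [120, 130, 125, 135, 140],
    "cloud-b": [100, 180, 90, 200, 130],
}


def summarize(samples):
    """Return {cloud: (mean_ms, stddev_ms)}: bar height and bar shade."""
    return {name: (mean(values), pstdev(values)) for name, values in samples.items()}


summary = summarize(response_ms)
# Rank by mean Response Time, fastest first.
for name, (avg, spread) in sorted(summary.items(), key=lambda kv: kv[1][0]):
    print(f"{name}: mean {avg:.1f} ms, std dev {spread:.1f} ms")
```

Two clouds with similar means can behave very differently for users if one of them has a much larger spread, which is why the chart encodes both.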

So what’s the take-away here?  Measure network performance from the perspective of real web visitors, not from the network backbone or from data centers, because your real users do not visit your site from there.   Different cloud platforms yield varying performance characteristics, so hedge your bets with multiple providers.  Finally, keep an eye on Cedexis, since we’ll be keeping an eye on the cloud for you!

 

 

Which cloud is the right cloud? It depends.

This year has been a rough year for the cloud. We’ve seen the biggest names including Amazon AWS, Google AppEngine and Microsoft Azure demonstrate the challenges in delivering on the promise of “always on” utility computing.  Imagine the world’s brightest engineers with massive budgets empowered to capture a piece of the $148 billion cloud market.  It’s definitely not for lack of effort and resources.  Nonetheless, we expect more.

In their defense, it’s not a trivial problem to run huge data centers worldwide running cloud services at the scale demanded by cloud customers worldwide.  What’s a CIO to do in evaluating her move to the cloud?  Should she wait while her competitors become more agile and deliver greater value to the business?  Any executive worth her Harvard MBA would be looking for data to increase clarity and confidence in moving forward.  Enter the Cloud Availability Grid (activate vertical scrolling muscles now).

 

We calculated the monthly availability year-to-date from our worldwide Radar measurements.  Every day, some 800 million data points are collected from real users across 32,000 networks. The “fail” threshold for falling below the acceptable green range is 97%. Here are a few “a-ha”s that we quickly noticed (I’m sure you’ll find more):

BRIC markets are struggling for good cloud coverage.  We recently detected that Amazon CloudFront has likely launched in Brazil.  But if you’re delivering a cloud-based service to Brazil, Russia, India or China, buyer beware.  The number of fails your users will experience will be noticeably higher than they’re used to in other markets such as the USA, Great Britain, and Japan.

Australia, but not all of Oceania, is well served by specific clouds. Amazon EC2 California, Rackspace, and GoGrid appear to provide consistently high availability to real users in the land down under.  However, other nearby markets such as Indonesia and the Philippines do not enjoy nearly the same level of availability.

Change is the only constant.  Interdependencies across optimizers, CDNs, Clouds, and Data Centers necessarily make it harder to manage change while still guaranteeing a quality end-user experience.  Some markets in the Availability Grid show widely varying availability month-to-month even within the same cloud provider.
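The coloring of the Availability Grid can be sketched in a few lines of Python.  The 97% threshold comes from the description above; the counts, the country and cloud keys, and the choice to treat exactly 97% as green are assumptions made for illustration.

```python
FAIL_THRESHOLD_PCT = 97.0  # threshold quoted in this post


def availability_pct(successes, attempts):
    """Percentage of measurements that succeeded, or None when there is no data."""
    if attempts == 0:
        return None
    return 100 * successes / attempts


def grid_cell(successes, attempts, threshold=FAIL_THRESHOLD_PCT):
    """Classify one (country, cloud, month) cell of the grid."""
    pct = availability_pct(successes, attempts)
    if pct is None:
        return "no data"
    # Assumption: a cell sitting exactly on the threshold still counts as green.
    return "green" if pct >= threshold else "fail"


# Hypothetical monthly counts: (successful measurements, total measurements).
monthly_counts = {
    ("AU", "cloud-a", "2011-09"): (9850, 10000),
    ("BR", "cloud-a", "2011-09"): (9400, 10000),
    ("ID", "cloud-b", "2011-09"): (0, 0),
}

grid = {key: grid_cell(ok, total) for key, (ok, total) in monthly_counts.items()}
for key, status in grid.items():
    print(key, status)
```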

Cloud computing is evolving and new services are being released at an amazing clip. It’s impossible to expect any cloud to be the best everywhere at all times. The challenge is that your customers expect nothing but the best quality of service at all times wherever they are from their device of choice. With the right data in hand, choose the right cloud for the job and realize the benefits of the cloud before your competitor, and more importantly, before your users choose another service.  To a faster and more reliable cloud!

Is Amazon CloudFront in Brazil?

Here at Cedexis, we like to think we have a pulse on the internet.  Today we found something very juicy that got us very excited, so we had to share it.  In a nutshell, it looks like Amazon’s CloudFront is now in Brazil.  Across the 9 in-country networks where we collect Radar measurements, we see a 15.2% improvement in HTTP Connect Time and a 12.8% improvement in HTTP Response Time.  So if you’re doing business online in Brazil and you’re using Amazon CloudFront, the internet just got faster for your users.  That’s a great thing!

The Pretty Graphs

Here’s what connect time looked like over the past month.  Notice the sharp dip at the end!

Here’s what response time looked like over the past month.  Isn’t that dip the prettiest thing you’ve ever seen?

The Details

  • In-country Networks – we collect data from 32,000 networks worldwide from real users in-country and use that data to drive decisions in our global load balancing platform.
  • HTTP Connect Time— how long it takes for the browser to establish a connection with the server, including DNS time and the establishment of the initial TCP and HTTP sessions to the provider.
  • HTTP Response Time— how long it takes for the server to respond to a subsequent request, once all of the noise of establishing a connection is completed. This is a relatively close approximation of TCP round-trip time (RTT) from the browser to the provider.
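To get a feel for the difference between the two timings defined above, here is a rough approximation in Python using only the standard library.  This is not how Radar measures (Radar runs in the visitor’s browser); the function name `measure_http_timings` and its parameters are hypothetical, and a single sample from one machine says nothing about real users.

```python
import http.client
import time


def measure_http_timings(host, port=80, path="/", timeout=5.0):
    """Return (connect_ms, response_ms) for one host.

    connect_ms:  first request on a fresh connection, so it includes name
                 resolution, TCP setup, and the first HTTP exchange.
    response_ms: a second request issued right after the first one.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.request("GET", path)  # opens the connection on first use
        conn.getresponse().read()  # drain the body before the next request
        connect_ms = (time.perf_counter() - start) * 1000.0

        start = time.perf_counter()
        conn.request("GET", path)  # reuses the socket only if the server kept it open
        conn.getresponse().read()
        response_ms = (time.perf_counter() - start) * 1000.0
    finally:
        conn.close()
    return connect_ms, response_ms


# Example call (the host is a placeholder):
# connect_ms, response_ms = measure_http_timings("www.example.com")
```

One caveat: if the server closes the connection after the first response, `http.client` silently reconnects, and the second number then includes TCP setup again.  For the second timing to approximate round-trip time, the server has to support keep-alive.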

 

This just might have me singing Rio! all day long today.

Need to know what your real users are experiencing anywhere in the world?  Need to improve your user experience and reduce IT spend across clouds and delivery networks?  Contact us today.

 

 


GeoLocation: Does Distance Matter?

Here at Cedexis, we receive more than 800 million measurements a day via the Cedexis Radar tag from browsers located all over the world. These data can answer all sorts of questions, including “how effective is geo-location as part of a global performance strategy?” If we read the various white papers, it would appear that geo-location is, in fact, an excellent way to ensure optimal page load speeds. For example, here’s an excerpt from a white paper for a major hardware-based load balancer:

"(it is) necessary to determine the client’s location with as much accuracy as possible in order to intelligently route application requests to the nearest data center for optimal user performance..."

Nearly all of the major Content Delivery Networks (CDNs) like Cotendo or Akamai market geo-location as one of the strategies they employ to optimize content delivery. And why not? Surely the physically closest location is going to be the best place to direct your web visitors.

Let’s drill into our data and see if this theory holds up by asking the question: “Is geo-location the best strategy for routing users based in Australia to one of Amazon AWS’ 5 data centers?”

Amazon’s public cloud compute product, EC2, is available in 5 different data centers worldwide (Singapore, Tokyo, California, Virginia and Ireland). Moreover, Amazon provides tools to simplify building applications which are “active-active” across multiple locations (meaning visitors should receive the same content whether they interact with an application hosted in Ireland or in Tokyo). As a result, many companies are designing their web presence to be available in multiple places around the globe, and they utilize geo-location as the principal strategy for working out which Amazon location a specific user request should be routed to. In our example, a geo-location strategy would route Australian users to Singapore, as Amazon’s Singapore data center is the closest location from a physical distance perspective. Surely, in most cases, Singapore will deliver a faster response time from Australia than any of Amazon’s other locations.

To find out, we looked at HTTP Response time measurements, in milliseconds, across all 5 Amazon locations from 13:00 – 14:00 local time in Sydney, on Monday September 19th. Each minute we average the measurements from real-user browsers based in Australia to each Amazon location:

Avg Response (ms) by Amazon location

Date-time          Singapore   Tokyo   California   Virginia   Ireland
9/19/2011 13:00    284         311     320          351        430
9/19/2011 13:01    276         216     432          314        443
9/19/2011 13:02    411         448     372          501        523

 

As you can see, looking at the first three minutes’ worth of data, Singapore is the fastest only 1 time in 3. Looking at all 60 minutes, EC2 Singapore is, unsurprisingly, the fastest most often. What is surprising is that 59% of the time, other Amazon locations were faster.
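The per-minute comparison is easy to reproduce.  The sketch below uses only the three minutes of data shown in this post (the full 60-minute data set is not included here) and counts how often the geo-location choice, Singapore, was actually the fastest.  The function names are ours, for illustration.

```python
# Per-minute average HTTP response times (ms) from the three rows in this post.
rows = [
    ("9/19/2011 13:00", {"Singapore": 284, "Tokyo": 311, "California": 320, "Virginia": 351, "Ireland": 430}),
    ("9/19/2011 13:01", {"Singapore": 276, "Tokyo": 216, "California": 432, "Virginia": 314, "Ireland": 443}),
    ("9/19/2011 13:02", {"Singapore": 411, "Tokyo": 448, "California": 372, "Virginia": 501, "Ireland": 523}),
]


def fastest_location(averages):
    """Return the location with the lowest average response time."""
    return min(averages, key=averages.get)


def geo_hit_rate(rows, geo_choice):
    """Fraction of minutes in which the geo-location choice was the fastest."""
    wins = sum(1 for _, averages in rows if fastest_location(averages) == geo_choice)
    return wins / len(rows)


winners = [fastest_location(averages) for _, averages in rows]
print(winners)  # ['Singapore', 'Tokyo', 'California']
print(f"Singapore fastest in {geo_hit_rate(rows, 'Singapore'):.0%} of minutes")
```

Run over all 60 minutes, the same calculation is what produces the 59% figure for minutes in which some other location beat Singapore.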

These data suggest that using a pure geo-location strategy may result in sub-optimal performance more than half the time. What if there were a way to make these data actionable when routing users to your content? There is. To learn more, go to http://cedexis.wpengine.com/i/lose-the-wait/

Types of Openmix Applications

Last week we took a look at the types of Openmix applications that are currently in use to inform some development priorities. It turns out that the vast majority of the applications, at 82%, are coded to return the platform with the best response time.

Openmix Application Types


When we drilled into these response time-optimized applications we saw that over half of them, at 64%, perform no adjustments based on cost, time, or geography. However, nearly a third of the applications do some type of adjustment based on these factors.

Response Time Sub-types


It is great to see a clear pattern of Openmix application types emerging.

Term Definitions – Openmix Application Types

Availability – Choose a platform based on availability. In some cases a primary platform is used unless it is not available.
Best Response Time – Optimize for response time, sometimes taking other factors into consideration.
Complex – Complex logic based on multiple criteria.
Geographic – Direct traffic based on user’s location.
Static – Always route to the same platform.
Weighted Round Robin – Direct traffic across a set of platforms in some proportion per platform.
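Two of these types are easy to picture in code.  The following is a plain Python sketch of the decision logic only; it is not Openmix application code and does not use the Openmix API.  The platform names and weights are hypothetical, and the weighted round robin shown is the “smooth” variant, chosen here because it spreads picks evenly rather than in bursts.

```python
from collections import Counter
from itertools import islice


def availability_failover(primary, fallbacks, is_available):
    """Availability type: use the primary platform unless it is unavailable."""
    for platform in [primary, *fallbacks]:
        if is_available.get(platform, False):
            return platform
    return primary  # nothing reports as available; fall back to the primary


def weighted_round_robin(weights):
    """Weighted Round Robin type: yield platforms in proportion to their weights."""
    current = {platform: 0 for platform in weights}
    total = sum(weights.values())
    while True:
        for platform, weight in weights.items():
            current[platform] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen


choice = availability_failover("cdn-a", ["cloud-b"], {"cdn-a": False, "cloud-b": True})
picks = list(islice(weighted_round_robin({"cdn-a": 3, "cloud-b": 1}), 8))
print(choice)
print(Counter(picks))
```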

Term Definitions – Response Time Sub-types

Cost-weighted – Adjust the response time by some amount proportional to platform cost.
Cost-handicapped – Handicap the response time based on cost. (Handicapping is adding some fixed amount to values whereas weighting is multiplying by some factor.)
Geo-adjusted – Adjust the response time by geography.
Standard – No adjustment for cost, geography, or time.
Static – Coded to optimize response time but only returning a single platform since only one platform is available.
Time-adjusted – Alter the platforms to return based on the time of day.
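The difference between weighting and handicapping is easiest to see with numbers.  Here is a minimal Python sketch; the platforms, response times, cost factors, and penalties are invented for illustration and this is not Openmix application code.

```python
# Hypothetical platforms with a measured response time and two cost knobs.
platforms = {
    "cdn-a": {"response_ms": 100, "cost_factor": 1.5, "cost_penalty_ms": 20},
    "cdn-b": {"response_ms": 130, "cost_factor": 1.0, "cost_penalty_ms": 0},
}


def standard(p):
    """Standard: no adjustment."""
    return p["response_ms"]


def cost_weighted(p):
    """Cost-weighted: multiply the response time by a cost factor."""
    return p["response_ms"] * p["cost_factor"]


def cost_handicapped(p):
    """Cost-handicapped: add a fixed penalty to the response time."""
    return p["response_ms"] + p["cost_penalty_ms"]


def best(platforms, score):
    """Return the platform name with the lowest adjusted response time."""
    return min(platforms, key=lambda name: score(platforms[name]))


for score in (standard, cost_weighted, cost_handicapped):
    print(score.__name__, "->", best(platforms, score))
```

With these numbers the standard and handicapped scores both pick cdn-a (100 vs 130, and 120 vs 130), while the weighted score picks cdn-b (150 vs 130).  A multiplier grows with the response time itself, whereas a handicap stays the same size however fast or slow the platform is.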

 

Akamai outage

based on 4m measurements

Lots of inquiries today from customers, partners, and analysts looking for quantitative data to support rumors that Akamai was experiencing availability problems this afternoon.