Chinese New Year falls on the 8th of February! This year is the year of the Fire Monkey. The last time that happened was 1956. The Fire Monkey is the most aggressive monkey sign, and if this is your sign you are probably one who takes matters into your own hands! We at Cedexis tip our hats to the Fire Monkeys of the world!
To celebrate this momentous occasion, let’s take a look at how some of the clouds being deployed in China are performing, both inside the country and outside. For this experiment we selected 4 Cloud instances: 2 inside mainland China and 2 outside (one in Hong Kong and one in Singapore).
The 4 Clouds that we are evaluating (using the free Cedexis Radar RUM community data) are:
Rackspace Cloud – HKG
Cloud Santé – Netplus – CN
Ecritel E2C – Shanghai
AWS EC2 – APAC Singapore
So how would the average Chinese consumer perceive performance to these 4 clouds? To answer this question let’s look at the 3 most heavily trafficked networks within China – they are:
Chinanet (AS 4134 – with roughly 44% of the traffic on this day)
CNCGroup China (AS 4837 – with roughly 25% of the traffic on this day)
China Telecom Group (AS 4812 – with roughly 8% of the traffic on this day)
Note that this is RUM data collected solely from within China. The RUM data we show here covers 1 month. This represented around 7 million measurements from each of the 3 networks to each of the 4 Clouds. To net it out, over a 30-day month that is about 2.7 measurements per second per network/Cloud combination, or about 11 per second from each network across all 4 Clouds.
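As a quick sanity check on those rates, here is a minimal sketch of the arithmetic. It assumes a 30-day month and exactly 7 million measurements per network/Cloud combination; the function name is ours, for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def measurements_per_second(total_measurements, days=30):
    """Average measurement rate over a window of `days` days (assumed 30)."""
    return total_measurements / (days * SECONDS_PER_DAY)


# One network measuring one Cloud for the month.
per_pair = measurements_per_second(7_000_000)
# One network measuring all 4 Clouds for the month.
per_network = measurements_per_second(7_000_000 * 4)

print(f"{per_pair:.1f}/s per pair, {per_network:.1f}/s per network")
```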
So let us take a look at the 4 clouds from the perspective of the median (50th percentile) user on these networks. Note that this is latency, so lower numbers are better! The numbers represent the number of milliseconds it takes to make a round trip from the user’s computer to the Cloud in question. We sometimes call this Round Trip Time, or RTT.
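For readers who want to see what a 50th-percentile figure means in practice, here is a minimal sketch of computing it per network/Cloud pair. The sample tuples and the "Cloud A" label are made up for illustration. This is not how the Radar platform computes its reports, just the general idea.

```python
from collections import defaultdict
from statistics import median


def median_rtt_by_pair(samples):
    """Group RTT samples by (network, cloud) and return the median of each group."""
    groups = defaultdict(list)
    for network, cloud, rtt_ms in samples:
        groups[(network, cloud)].append(rtt_ms)
    return {pair: median(rtts) for pair, rtts in groups.items()}


# Hypothetical samples: (network, cloud, round trip time in milliseconds).
samples = [
    ("AS 4134", "Cloud A", 40),
    ("AS 4134", "Cloud A", 55),
    ("AS 4134", "Cloud A", 300),  # one slow outlier barely moves the median
    ("AS 4837", "Cloud A", 90),
    ("AS 4837", "Cloud A", 110),
]

medians = median_rtt_by_pair(samples)
for pair, rtt in medians.items():
    print(pair, rtt)
```

The median is a useful view of the typical user because a handful of very slow measurements does not drag it upward the way they would a mean.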
Some very interesting results here. Note that one of the Clouds “external” to China (AWS Singapore) has higher latency from all three networks within China, while the other Cloud external to China (Rackspace HK) has performance metrics closer to the Clouds in mainland China – but still not great. Let’s look at this same data a slightly different way – as a heat map:
This view makes it very clear – the ISP CNCGroup China performs the worst to all of the Clouds that we are measuring. With 25% of the China traffic, this is a significant issue for someone within China on this network.
But we know that Latency is not the only story.
We also know that Availability is extremely important. Let’s see how these Cloud/ISP combinations fared with regard to Availability! Availability is represented as the percentage of connection attempts that succeed, and higher is better.
Note that unlike some of the other studies we have done on Availability, in China the highest Availability ISP/Cloud pair (Cloud Santé/CNCGroup China) is only 98.3% available. Note that CNCGroup China also has the worst availability (with AWS) at 97% available for the measurement timeframe. That means 3 out of every 100 connection attempts on that pair failed over the entire month!
Let’s take a look at this same data in bar chart format.
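To put a percentage like that in concrete terms, here is a minimal sketch of the availability arithmetic. It assumes availability is simply successful attempts divided by total attempts, and it borrows the roughly 7 million monthly measurements per pair as an illustrative attempt count. The function names are ours.

```python
def availability_pct(successes, attempts):
    """Availability as the percentage of connection attempts that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


def failed_attempts(pct_available, attempts):
    """Approximate number of failed attempts implied by an availability figure."""
    return round(attempts * (100.0 - pct_available) / 100.0)


print(availability_pct(97, 100))          # 3 failures in 100 attempts
print(failed_attempts(97.0, 7_000_000))   # failures implied over an assumed 7M attempts
```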
More on Availability
The above Availability numbers are important to understand averages – but to really understand availability you have to look at it over time. Within the Cedexis Portal this is easily done. If you have not created a free Radar account do so here. The report below is directly from that interface.
The plunging blue line is a failing Cloud – it happens to be one of the two clouds within China! What you are seeing is this cloud’s availability dropping dramatically for around 2 days (of the 30). This is referred to as a microOutage.
As you can see – I have blotted out the names of the Clouds in this case, as we are not looking to embarrass our Cloud partners.
But it’s important to note that Clouds DO have these microOutages. This one lasted almost the entirety of 2 days – with Availability dropping to 85% from every network within China. And the unfortunate thing is – this is entirely normal.
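Here is a minimal sketch of why a monthly average can hide a dip like this, and how looking at the series day by day exposes it. The daily values and the 95% threshold are hypothetical, chosen only to resemble the shape described above. They are not the actual Radar data.

```python
def find_outages(daily_availability, threshold=95.0):
    """Return (first_day, last_day) index pairs for runs of days below `threshold`."""
    outages = []
    start = None
    for day, pct in enumerate(daily_availability):
        if pct < threshold and start is None:
            start = day  # a dip begins
        elif pct >= threshold and start is not None:
            outages.append((start, day - 1))  # the dip ended on the previous day
            start = None
    if start is not None:  # series ended while still below the threshold
        outages.append((start, len(daily_availability) - 1))
    return outages


# Hypothetical 30-day series: healthy except for a 2-day dip to about 85%.
series = [98.0] * 10 + [85.0, 86.0] + [98.0] * 18

monthly_average = sum(series) / len(series)
print(f"monthly average: {monthly_average:.1f}%")
print("outage windows (day indexes):", find_outages(series))
```

In this made-up series the monthly average still looks respectable, while the day-by-day scan flags the two bad days immediately.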
Any application that is single-homed on any one Cloud has a high likelihood of user abandonment. This is what the evidence shows us. Page load times over 3 seconds increase abandonment rates DRAMATICALLY.
Talk about turning your customers into Fire Monkeys! They will be firing you!
The smart application architect or DevOps professional will avoid this fate. They will multi-home their web app on multiple regional instances and load balance traffic between them. The smarter DevOps professional will load balance their traffic using performance-based measurements to determine where the traffic goes. As you can see – sometimes clouds in a region go down – and when they do you need a performance-based backup.
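To make the idea concrete, here is a minimal sketch of a performance-based routing decision: send traffic to the lowest-latency Cloud among those that currently meet an availability floor. The Cloud names, metric values and 95% floor are all hypothetical. This is an illustration of the general technique, not a description of how Cedexis implements it.

```python
def pick_cloud(metrics, min_availability=95.0):
    """Pick the lowest-RTT cloud among those meeting the availability floor.

    `metrics` maps a cloud name to {"rtt_ms": ..., "availability": ...}.
    If no cloud meets the floor, fall back to the most available one.
    """
    healthy = {
        name: m for name, m in metrics.items()
        if m["availability"] >= min_availability
    }
    if not healthy:
        return max(metrics, key=lambda name: metrics[name]["availability"])
    return min(healthy, key=lambda name: healthy[name]["rtt_ms"])


# Hypothetical measurements during a microOutage on the faster cloud.
during_outage = {
    "cloud-a": {"rtt_ms": 60, "availability": 85.0},
    "cloud-b": {"rtt_ms": 120, "availability": 98.0},
}
# The same clouds once cloud-a has recovered.
after_recovery = {
    "cloud-a": {"rtt_ms": 60, "availability": 98.0},
    "cloud-b": {"rtt_ms": 120, "availability": 98.0},
}

print(pick_cloud(during_outage))
print(pick_cloud(after_recovery))
```

The design choice here is to treat availability as a gate and latency as the tiebreaker, so a fast but failing Cloud never wins over a slower one that is actually answering.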
To learn more about Performance based Traffic Management go check out how we do it here at Cedexis. And Happy New Year to all you Fire Monkeys!