Self-Sourced Data Won’t Get You Where You Want to Go

Crowdsourced intelligence is a powerful concept, brought to reality and fueled by the Internet and mobile devices. Without the communications technology and the vast but addressable audience of smartphone addicts, marketing projects dependent on gathering data or pooling microtasks from an undefined, unrelated audience would remain mostly theoretical. Of course, companies outsourced tasks and polled customers for data before crowdsourcing came along, but soliciting a thousand survey responses takes a surprising amount of work – and, importantly, time – without the mobile Internet. Massive data projects like the Census or NIH-funded longitudinal studies were carried out long before the Internet, but they required complex coordination, significant resources, and a long timeline. Ever since the term “crowdsourcing” was coined more than a decade ago, though, the rapid, large-scale assembling of ideas, opinions, funds, collaborators, computing power, and labor online has become commonplace. In its myriad, still-evolving forms, crowdsourcing has proven to be one of the most groundbreaking tools of the mobile era.

Crowdsourcing technology has moved well beyond market research and fundraising. One of the most popular examples is Waze, where driver experience reports, construction plans, emergency dispatches, and traffic metrics are combined with algorithms to produce real-time guidance for drivers. You may drive around the Bay Area a lot and have a strong sense of traffic patterns based on experience. But what happens if there’s an unexpected event, say a tornado in Palo Alto, or you have two hours to get to a meeting near Sacramento, where you’ve never driven before? Do you want to rely on your individual experience, or do you want to consult the Waze app and figure out how to avoid gridlock? (Bonus points for having enough time for a side trip to Starbucks.)

Amazingly, we’re now able to apply the power of crowdsourcing to networks, endpoints, and bytes just as we do to people and cars.

In the Zettabyte era, community-powered intelligence is a fundamental need if we’re really going to move 20,000 GBps of global Internet traffic without constant outages, slowdowns, and the inefficient yet miraculous workarounds we too often demand from Ops teams, all while controlling costs. Unless you are in the very highest echelon of traffic delivery, your service simply won’t create enough data to truly map the Internet. Heck, even if you got 100 million hits a day, the chances are good they’d all be clustered in a handful of major regions. If you expand into a new market, or your app suddenly becomes popular in Saskatchewan, or traffic from a previously quiet region surges for any number of reasons (consider the unpredictable fallout of natural disasters, widespread cyber attacks, and political and celebrity happenings), you won’t have enough visibility or intelligence on hand to manage your traffic in that particular – suddenly vital! – corner of the Internet. You need to crowdsource the data that will give you a comprehensive view of the Internet, so you can avoid outages, ensure a high quality of experience (QoE) for users, and make efficient use of your resources.

Let’s use Cedexis’ real user monitoring community, Radar, to illustrate the challenge you’re up against. Our data sets are based on Real User Monitoring (RUM) of Internet performance at every step between the client and the clouds, data centers, or CDNs hosting your application. To source this data, we have created the world’s largest user experience community, in which publishers large and small deposit network observations from their own services and, in return, draw on the aggregate total, so each benefits from the others’ presence.
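
To make that concrete, here is a minimal sketch of what a browser-side RUM probe can look like, written in TypeScript. The collector URL, payload shape, and provider endpoints are hypothetical placeholders, not Cedexis’ actual client or API; the idea is simply to time a tiny test fetch against candidate providers from inside a real user’s session and contribute the samples to a shared pool.

```typescript
// Hypothetical RUM probe: times a small object fetch from a candidate
// provider and reports the measurement to a community collector.
// All URLs and the payload shape below are illustrative placeholders.

interface RumSample {
  providerId: string; // which CDN/cloud endpoint was probed
  latencyMs: number;  // wall-clock time for the probe fetch
  timestamp: number;  // epoch millis when the sample was taken
}

async function probeProvider(providerId: string, probeUrl: string): Promise<RumSample> {
  const start = performance.now();
  // A tiny, cache-busted test object keeps the probe cheap for the user.
  await fetch(`${probeUrl}?r=${Math.random()}`, { cache: "no-store" });
  return {
    providerId,
    latencyMs: performance.now() - start,
    timestamp: Date.now(),
  };
}

function reportSamples(samples: RumSample[]): void {
  // sendBeacon survives page unload and doesn't block the UI thread.
  navigator.sendBeacon(
    "https://collector.example.com/rum", // placeholder collector URL
    JSON.stringify(samples),
  );
}

// After the page is interactive, probe a few candidate providers and
// contribute the results to the shared pool.
window.addEventListener("load", async () => {
  const samples = await Promise.all([
    probeProvider("cdn-a", "https://probe.cdn-a.example.com/tiny.gif"),
    probeProvider("cdn-b", "https://probe.cdn-b.example.com/tiny.gif"),
  ]);
  reportSamples(samples);
});
```

Measuring from inside real sessions, rather than from synthetic test nodes, is what makes the resulting data reflect the paths your actual users traverse.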

Banding these services together makes it possible to see essentially all the networks in the world each day. We’re talking about all the “little Internets” – the autonomous systems (ASNs) that make up the global network. Radar collects data from more than 50,000 of these networks daily, and more than 130 major service providers feed metrics into the system each day. All told, hundreds of millions of clients generate over 14 billion RUM data points every day. That’s quite a crowd – and one that basically no service could pull together on its own.

Community doesn’t just give you breadth, of course, but also depth: we get at least 10,000 individual data points each day from over 13,000 of those ASNs. You simply can’t glean that kind of traffic intelligence from your own system. Do the math: even if you have 100 million visitors each day, likely less than half are coming from outside the major pockets (the most concentrated ASNs), leaving you very few data points spread across the thousands of remaining ASNs. So when the first visitor of the day turns up from Des Moines, Iowa, or Ulan Bator, or Paris, France, you have no data handy to make intelligent traffic decisions.
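
To see why the numbers fall off so fast, here is a toy back-of-envelope model. It assumes a Zipf-like (heavy-tailed) split of visitors across ASNs; the visitor count, ASN count, and skew exponent are illustrative assumptions, not measured values.

```typescript
// Toy model: how 100M daily visitors might spread across ASNs under a
// Zipf-like (heavy-tailed) popularity distribution. The visitor count,
// ASN count, and exponent are illustrative assumptions, not measurements.

function zipfShares(n: number, s: number): number[] {
  // Share of rank i is proportional to 1 / i^s, normalized to sum to 1.
  const weights = Array.from({ length: n }, (_, i) => 1 / Math.pow(i + 1, s));
  const total = weights.reduce((a, b) => a + b, 0);
  return weights.map((w) => w / total);
}

const DAILY_VISITORS = 100_000_000;
const ASN_COUNT = 50_000; // roughly the number of networks Radar sees daily
const EXPONENT = 1.2;     // assumed skew; real traffic skew varies

const visitorsPerAsn = zipfShares(ASN_COUNT, EXPONENT).map(
  (share) => share * DAILY_VISITORS,
);

// Shares are already in rank order, so the head and tail are easy to read off.
const top10Share =
  visitorsPerAsn.slice(0, 10).reduce((a, b) => a + b, 0) / DAILY_VISITORS;
const thinAsns = visitorsPerAsn.filter((v) => v < 100).length;

console.log(`Top 10 ASNs absorb ~${(top10Share * 100).toFixed(0)}% of all visits`);
console.log(`${thinAsns} ASNs see fewer than 100 of your visitors per day`);
```

Under any heavy-tailed split like this, a handful of giant ASNs absorb most of your traffic, while a long tail of networks sees a trickle far too thin to measure on your own – and that tail is exactly where your next surprise surge will come from.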

Everyone needs community intelligence, and not just for the newest, least understood pockets of users on the edges of your network. Crowdsourced community intelligence from those big, messy pools of shared data provides the early warning system every Operations team needs to keep the wheels turning and the users using.

Many countries have thousands of ISPs. In Brazil, the Radar community supplies 10,000 daily data points from 1,595 different ASNs. Russia, Canada, Australia, and Argentina also have enormous diversity of ISPs, especially in relation to their populations. These locations are likely to be central to your business success and global content delivery needs. Having user experience data of this breadth and depth is particularly important where there are so many individual peering agreements and technical relationships, representing countless causes for variable performance. When there are countless ways for application delivery to go wrong, you need granular data to feed into intelligent delivery algorithms to ensure that, in the end, everything goes right.
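
As a sketch of what such an algorithm might look like, consider the TypeScript fragment below. The data shapes, threshold, and provider names are hypothetical, not Cedexis’ actual implementation; it steers each client to the provider with the best tail latency for that client’s ASN, falling back to community-wide aggregates when the ASN is too thinly sampled.

```typescript
// Illustrative traffic steering: pick a provider per client ASN using
// community latency aggregates, falling back to the global view when an
// ASN has too few samples. Names and shapes here are hypothetical.

interface ProviderStats {
  provider: string;
  p95LatencyMs: number; // 95th-percentile latency from RUM samples
  sampleCount: number;  // how many community measurements back this number
}

const MIN_SAMPLES = 50; // below this, per-ASN data is too noisy to trust

function pickProvider(
  asnStats: ProviderStats[],    // aggregates for the client's ASN
  globalStats: ProviderStats[], // community-wide aggregates (assumed non-empty)
): string {
  const trusted = asnStats.filter((s) => s.sampleCount >= MIN_SAMPLES);
  const pool = trusted.length > 0 ? trusted : globalStats;
  // Steer to the provider with the best (lowest) tail latency.
  return pool.reduce((best, s) =>
    s.p95LatencyMs < best.p95LatencyMs ? s : best,
  ).provider;
}

// Example: a first-of-the-day visitor from a thinly sampled ASN falls
// back to the community-wide aggregates.
const choice = pickProvider(
  [{ provider: "cdn-a", p95LatencyMs: 210, sampleCount: 12 }],
  [
    { provider: "cdn-a", p95LatencyMs: 180, sampleCount: 2_000_000 },
    { provider: "cdn-b", p95LatencyMs: 140, sampleCount: 1_500_000 },
  ],
);
console.log(choice); // "cdn-b"
```

The fallback is the crucial design choice: a thinly sampled ASN borrows the community’s depth instead of forcing you to guess.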

When you’re trying to manage application and content delivery globally, you need continuous visibility into thousands of ASNs. Unless you’re Google, you’re not going to touch most of these networks at the depth you need, so short of a really cool crystal ball, you have no idea how they are performing. You’d have to carry an enormous amount of traffic to gain a comprehensive, real-time view on your own.

Instead, rely on community-based telemetry to produce the actionable intelligence and predictive power you need to serve your end-users, no matter where they pop up.

With crowdsourced data sets from the global Internet community, you already have instant access to intelligence about QoE, bottlenecks, and slowdowns — before you even know you need it.