What happens when the internet grows 100x?
How edge networks will change the landscape … again
Today, approximately 51% of the world’s population is connected to the internet. Said another way: even if no existing internet user connects a single additional device (mobile phone, tablet, smart TV, thermostat, etc.), the internet will likely still double in size by 2022. But the rise of the connected-everything world means there is a real possibility we will move from tens of billions of devices to trillions of devices within the next 5 years. Andreessen Horowitz general partner Peter Levine looked at this recently in his “Return to the Edge and the End of Cloud Computing” presentation. His conclusion was simple: in order to support the next stage of growth, the model for the internet has to change to an edge-enabled, peer-to-peer delivery model.
But before we get to the future, let’s take a few moments to review how the internet works today and why we’re already experiencing growing pains.
The Internet backbone is made up of many large networks which interconnect with each other. At the top sit the largest providers, called Tier 1 Networks (also Network Service Providers, or NSPs). These are typically large telecom companies. These networks peer (directly connect) with each other to exchange traffic and enable cross-network connectivity (so a user in Brazil can reach a site hosted in Seattle, for example).
These large providers also connect to multiple Internet Exchange Points (IXPs) where they can interconnect with a larger customer base — including smaller networks, such as regional (Tier 2) and metropolitan (Tier 3) internet service providers.
Below is a picture showing this hierarchical infrastructure.
So, in order to reach your favorite website, your request travels from your home network up to a local and then a regional internet service provider, where it meets a larger backbone, arrives at an internet exchange point, crosses a private connection (a peering link), and finally reaches the web server that houses your website’s data (hopefully in that same location/datacenter).
The web server then assembles the content you’ve requested, and the data travels back to you the same way it came.
If that sounds like it takes a long time… well, it does. But traffic moves along fiber optic cables (light inside glass), and light is pretty fast, so the whole one-way trip takes ~50 milliseconds (1/20th of a second). To help alleviate some of the problem, large Content Delivery Networks place caches (i.e. copies of the data you request) all over the globe, in as many metropolitan area networks as economically feasible.
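To see why ~50 milliseconds is plausible, here is a back-of-envelope propagation calculation. The distance and refractive index below are illustrative assumptions, not figures from the article:

```python
SPEED_OF_LIGHT_KM_S = 299_792            # in a vacuum
FIBER_SPEED_KM_S = SPEED_OF_LIGHT_KM_S / 1.47  # glass slows light by ~1/3

def one_way_delay_ms(distance_km: float) -> float:
    """Propagation-only delay; real trips add routing and queuing overhead."""
    return distance_km / FIBER_SPEED_KM_S * 1000

# A coast-to-coast US fiber path is roughly 4,500 km (assumed figure):
print(round(one_way_delay_ms(4500), 1))  # ~22 ms of pure propagation
```

Pure propagation only accounts for about half the trip; the rest is spent hopping through routers, exchanges and peering links along the way.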
There are a few things worth calling out here:
- Smaller networks at the edge connect to many users
- Smaller networks consolidate into larger networks
- Money flows upstream: i.e. smaller networks pay larger networks for capacity
- Content is ‘originated’ on larger networks and consumed on smaller networks
When the internet breaks : congestion
The current delivery model depends on the movement of traffic into larger and larger centralized backbones, similar to onramps to a highway. But as anyone who has spent more than 5 minutes in Los Angeles can tell you, that model breaks down pretty quickly when everyone wants to go to the same place at the same time. Even if you only need to go one block, if it’s rush hour it’s going to take a while.
When too much demand is placed on a single backbone choke point, that is called contention. Luckily there is a traffic cop built into the design of the network (the Transmission Control Protocol), so that when the internet gets jammed up, the client (e.g. your browser) is told to stop and ask for the content again later (what’s called a retransmission request). I’ve said for many years that retransmission requests are the silent killer of performance. When a ‘retrans’ occurs, the client will back off (wait) and then retry. If the second attempt also hits a retransmission, the client will wait exponentially longer (exponential back-off).
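A toy sketch of the exponential back-off idea. The base delay here is illustrative, not TCP’s actual timer value, and real stacks add jitter and caps:

```python
def backoff_delays(base_ms: int = 200, retries: int = 5) -> list:
    """Each successive retry waits twice as long as the last."""
    return [base_ms * (2 ** attempt) for attempt in range(retries)]

print(backoff_delays())  # [200, 400, 800, 1600, 3200]
```

Notice how quickly the waits compound: after just five retries a client that started with a 200 ms delay is sitting idle for multiple seconds, which is why retransmissions show up to users as ‘slow pages’.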
When the internet breaks: chokepoints and correlated risk
Now that we have a good feel for the data flow, let’s take a look at ‘the internet’. Durairajan, Barford and Willinger’s 2015 paper “InterTubes: A Study of the US Long-haul Fiber-optic Infrastructure” provides an exhaustive survey of the US internet backbone, including the map of the US backbone (shown below).
The primary aim of the study was to analyze whether there were more effective routes for internet communication, and to what extent the US backbone was prone to failure at key chokepoints.
What they discovered was that despite multiple paths, there was an extremely high concentration of the internet along a few major routing points. Specifically, more than 89% of routing points serve as major choke points for at least two ISPs, and half of all routing points are chokepoints for four or more ISPs. There are also major points that affect entire backbone provider segments.
From the paper:
For example, our physical map establishes that the conduit between Portland, OR and Seattle, WA is shared by 18 ISPs. Upon analysis of the traceroute data, we inferred the presence of an additional 13 ISPs that also share that conduit.
Section 4.3 of http://pages.cs.wisc.edu/~pb/tubes_final.pdf
So maybe Senator Ted Stevens wasn’t all that far off when he famously said “the internet is a series of tubes” in 2006.
Does the internet break today?
The short answer is yes, all the time. Luckily the built-in mechanisms of the net allow for graceful failover, retransmission and congestion avoidance, so we most commonly experience these issues as ‘slow pages’. Sometimes, though, we see a complete drop of availability. To a large extent, Content Delivery Networks provide some level of resiliency, as they bring the data closer to the end user. But their design is fundamentally limited by the economic reality that they cannot place server caches everywhere. As a result, even CDN providers experience frequent and highly correlated failures. Shown below is a 30-day sample of CDN availability in the eastern United States from Cedexis. We can see the drops in availability amongst all providers, and these drops tend to correlate with increased user demand (i.e. exactly when you don’t want a failure 😞).
The above graph shows complete drops in availability, but we can also look at latency (the time required to access content) over the same time period to get a glimpse into the effect of retransmissions and congestion.
Finally, let’s look at how often these congestion and outage events occur and what their effects are.
For a truly terrifying glimpse into the internet, check out the Cedexis live portal, available here: https://live.cedexis.com/ .
It’s amazing it works at all!
Yes, yes it is! Of course, the internet is still growing, and the near-term future of tens of billions to hundreds of billions to trillions of devices seems like a recipe for disaster.
Bandwidth (the total capacity of a channel) is one thing, but connections (the number of open sockets a device can support) is an entirely different problem. For example, it is much simpler to design a server that can stream 1 GB of data to 1 device than it is to build a server that can stream 1 KB of data to 1,000,000 devices. So in the (incredibly) unlikely event that our infrastructure upgrades stay ahead of the future, it’s the connection limit that will kill you. In many ways, this is akin to the problem of handing 1 person a 1-pound box versus handing 1 million people a feather.
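The arithmetic behind the box-versus-feathers analogy is worth spelling out. The numbers below are illustrative:

```python
GB = 1_000_000_000
KB = 1_000

# One big stream vs. a million tiny ones:
streaming = {"clients": 1, "bytes_each": GB}
fan_out = {"clients": 1_000_000, "bytes_each": KB}

for case in (streaming, fan_out):
    case["total_bytes"] = case["clients"] * case["bytes_each"]

# The total payload is identical...
assert streaming["total_bytes"] == fan_out["total_bytes"]
# ...but the fan-out case needs a million concurrent sockets, each with
# its own buffers, file descriptor and TCP state. That is the real limit.
print(fan_out["clients"])  # 1000000 open connections for the same 1 GB
```

Bandwidth measures bytes; the per-connection overhead (memory, file descriptors, kernel state) is what caps the client count.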
For an exhaustive review of this, please see “from C[onnection] 10k to C[onnection] 100k”.
The new (old) model: Peer to Peer
Peer-to-peer (P2P) delivery models date back to the metazoic period of the internet (1999) with the introduction of Napster. Unfortunately, due to their torrid history, they often get a bad rap. But P2P as a design was always viewed as a more scalable option for the internet.
In a P2P network, edge devices (peers) communicate directly with each other, thereby avoiding the centralized congestion and bottlenecks of the traditional north-south design.
Unfortunately for the internet, implementing large-scale peer-to-peer networks has been difficult, although there are many commercial examples (Spotify, Microsoft). It wasn’t really until 2011 that peer-to-peer as a first-class Web citizen came into existence with WebRTC (Web Real-Time Communications). Led by Google, WebRTC was viewed as a new and scalable protocol for video, audio and other data transmission (think Skype, WhatsApp, WeChat, etc.).
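As a toy illustration of the peer-to-peer idea: fetch an asset from the nearest peer that already holds a copy, and fall back to the origin server only when no peer has it. The peer names, latencies and function here are hypothetical, not Edgemesh’s actual algorithm:

```python
def choose_source(asset: str, peers: list, origin_latency_ms: float) -> tuple:
    """Pick the lowest-latency peer caching the asset, else the origin."""
    candidates = [(p["latency_ms"], p["id"]) for p in peers if asset in p["cache"]]
    if candidates:
        latency, peer_id = min(candidates)
        return peer_id, latency
    return "origin", origin_latency_ms

peers = [
    {"id": "peer-a", "latency_ms": 8,  "cache": {"logo.png"}},
    {"id": "peer-b", "latency_ms": 15, "cache": {"logo.png", "app.js"}},
]
print(choose_source("logo.png", peers, origin_latency_ms=50))   # ('peer-a', 8)
print(choose_source("style.css", peers, origin_latency_ms=50))  # ('origin', 50)
```

The key property is that cached assets never touch the backbone at all, and every additional peer adds cache capacity rather than load.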
In order to facilitate this, Edgemesh utilizes the WebRTC functionality built into the modern browser to create a fast path between devices. This takes strain off the internet backbone and dramatically reduces latency. How much? Here’s an example:
Best of all, Edgemesh-enabled sites exhibit reverse scaling, meaning the more devices that come online … the faster and more resilient the overall network becomes!
PS: We’ve detailed the overall design over at the Association for Computing Machinery.
Until Next time!
Like it, Hate it? Let us know @Edgemesh