Geography-based analysis of the Internet infrastructure
Abstract
In this paper, we study some geographic aspects of the Internet. We base our analysis on a large set of geolocated IP hop-level session data (including about 300, 000 backbone routers, 130 million end hosts, and one billion sessions) that we synthesized from a variety of different input sources such as US census data, computer usage statistics, Internet market share data, IP geolocation data sets, CAIDA's Skitter data set for backbone connectivity, and BGP routing tables. We use this model to perform a nationwide and statewide geographic analysis of the Internet. Our main observations are: (1) There is a dominant coast-to-coast pattern in the US Internet traffic. In fact, in many instances even if the end-devices are not near either coast, still the traffic between them takes a long detour through the coasts. (2) More than half of the Internet paths are inflated by 100% or more compared to their corresponding geometric straight-line distance. This circuitousness makes the average ratio between the routing distance and geometric distance big (around 10). (3) The weighted mean hop count is around 5, but the hop counts are very loosely correlated with the distances. The weighted mean AS count (number of ASes traversed) is around 3. © 2011 IEEE.