Self-similarity in the web
Abstract
Algorithmic tools for searching and mining the web are becoming increasingly sophisticated and vital. In this context, algorithms which use and exploit structural information about the web perform better than generic methods in both efficiency and reliability. We present an extensive characterization of the graph structure of the web, with a view to enabling high-performance applications that make use of this structure. In particular, we show that the web emerges as the outcome of a number of essentially independent stochastic processes that evolve at various scales. A striking consequence of this scale invariance is that the structure of the web is "fractal" - cohesive sub-regions display the same characteristics as the web at large. An understanding of this underlying fractal nature is therefore applicable to designing data services across multiple domains and scales. We describe potential applications of this line of research to optimized algorithm design for web-scale data analysis.