In this chapter, we address the problem of building a web crawler. Web crawlers are a core building block for indexing and searching the web.
Problem statement
The graph structure of the web
The web can be modeled as a graph of pages. Every page contains HTML markup and content, and most pages link to other pages as part of that content. Since a link takes you from one page to another, we can treat each page as a node and each link as a directed edge from one node to another.
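To make the graph model concrete, here is a minimal sketch in Python. It assumes a handful of hypothetical pages held in memory (the `example.com` URLs and their HTML are invented for illustration, standing in for real HTTP responses) and uses the standard library's `html.parser` to extract links. The names `LinkExtractor` and `build_graph` are illustrative, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the target of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def build_graph(pages):
    """Map each page URL (a node) to the URLs it links to (its outgoing edges)."""
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor(url)
        parser.feed(html)
        graph[url] = parser.links
    return graph


# Hypothetical in-memory pages; a real crawler would fetch these over HTTP.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": '<a href="/">Home</a>',
}

graph = build_graph(PAGES)
for node, edges in graph.items():
    print(node, "->", edges)
```

The adjacency-list dictionary is one reasonable representation among several: it keeps each node's outgoing edges together, which is the lookup a crawler needs when deciding which pages to visit next.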
Given such a model of the web, we can start to address the problem of searching it for information.
We are talking about a problem that...