Given the nature of hyperlink pages, starting from a known place and following links to other pages is a very important tool in the arsenal when scraping the web.
To do so, we crawl a page looking for a small phrase, and will print any paragraph that contains it. We will search only in pages that belong to the same site. I.e. only URLs starting with www.somesite.com. We won't follow links to external sites.