As you can see, there are several ways of extracting data from HTML pages, each suited to different tools. Basic string searches and regular expressions can collect information using very simple techniques, but they treat the page as flat text, so some cases call for a more structured query language. XPath offers powerful searching by treating the document as XML, which also makes it suitable for generic searches over any XML-like markup. CSS selectors are usually the simplest way to search for and extract data from HTML documents, and they provide many helpful features that are HTML-specific.
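To make the comparison concrete, here is a minimal sketch of the first three approaches applied to one page, written in Python using only the standard library. The sample page, its class names, and the function names are hypothetical and exist only for illustration. Note also that `xml.etree.ElementTree` supports only a limited subset of XPath and requires well-formed markup, so real-world HTML will usually need a more forgiving parser. CSS selectors are left out because the standard library has no CSS selector engine.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page used only for illustration.
PAGE = """<html>
  <body>
    <h1>Latest Posts</h1>
    <ul>
      <li class="post"><a href="/posts/1">First post</a></li>
      <li class="post"><a href="/posts/2">Second post</a></li>
      <li class="ad"><a href="/sponsor">Sponsored link</a></li>
    </ul>
  </body>
</html>"""


def count_by_string_search(page):
    """Count a literal substring; the search knows nothing about structure."""
    return page.count('class="post"')


def links_by_regex(page):
    """Collect every href value, regardless of where the link appears."""
    return re.findall(r'href="([^"]+)"', page)


def post_links_by_xpath(page):
    """Collect only the links that sit inside <li class="post"> elements."""
    root = ET.fromstring(page)
    return [a.get("href") for a in root.findall(".//li[@class='post']/a")]


if __name__ == "__main__":
    print(count_by_string_search(PAGE))
    print(links_by_regex(PAGE))
    print(post_links_by_xpath(PAGE))
```

The regular expression returns all three links, including the sponsored one, because it has no notion of which element a link belongs to. The XPath query can restrict the match to post items, which is the kind of structural filtering that motivates a query language.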
In Chapter 5, Web Scraping Navigation, we will look at the best ways to crawl the internet efficiently and safely.