Web scraping is the process of extracting data from web documents. To collect that data, we first need to identify the elements (of HTML or XML) that hold it and traverse through them. Web documents are built from various types of elements, which can exist individually or be nested inside one another.
Parsing is the activity of breaking down web content to expose and identify its components along with their contents. A parsed document supports operations such as searching for, and collecting content from, the desired element or elements. Obtaining a web document, parsing it, and traversing it to find the required data or content is the basic scraping task.
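The parse, traverse, and collect steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree so that it runs without extra packages, the sample markup is invented and deliberately well-formed, and the helper names (parse, walk, collect_text) are hypothetical rather than part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (invented for illustration).
SAMPLE = """
<html>
  <body>
    <div id="books">
      <ul>
        <li class="title">Book A</li>
        <li class="title">Book B</li>
      </ul>
    </div>
    <p>Standalone paragraph</p>
  </body>
</html>
"""


def parse(markup):
    """Parse the markup into an element tree and return its root element."""
    return ET.fromstring(markup.strip())


def walk(element, depth=0):
    """Traverse the tree depth-first, yielding (depth, tag) for each element."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)


def collect_text(root, tag, class_name):
    """Collect the text of every <tag> element whose class attribute matches."""
    return [el.text for el in root.iter(tag) if el.get("class") == class_name]


root = parse(SAMPLE)
structure = list(walk(root))
titles = collect_text(root, "li", "title")

# Print the nesting of elements, indented by depth, then the collected text.
for depth, tag in structure:
    print("  " * depth + tag)
print(titles)
```

Real-world HTML is often not well-formed, so a strict XML parser such as this one would reject many pages; a lenient HTML parser is usually needed in practice.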
In Chapter 3, Using LXML, XPath, and CSS Selectors, we explored lxml for a similar task and used XPath and CSS Selectors for data-extraction purposes. lxml is also used for scraping...