In this section, we will learn how to extract the content of a web page automatically.
Introduction to web scraping
Web content extraction
Among the techniques available to extract content from the web, we can highlight the following:
- Screen scraping: A technique that allows you to obtain information by navigating the screen and recording the user's keystrokes.
- Web scraping: A technique that retrieves a resource, such as a web page in HTML, and processes it to extract relevant data.
- Report mining: A technique that also tries to obtain information, but in this case from a file (HTML, RDF, CSV, and so on). So, with this approach, we can create a simple and fast mechanism...
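To make the web scraping item in the list above more concrete, here is a minimal sketch that processes an HTML document and extracts the page title and link targets. It uses only the standard library's `html.parser` module. The sample markup, the `LinkExtractor` class, and the `extract_title_and_links` function are hypothetical names introduced for illustration, and the HTML is inlined so that the example runs without network access.

```python
from html.parser import HTMLParser

# Hypothetical sample page, inlined so the sketch runs without network access.
SAMPLE_HTML = """
<html>
  <head><title>Example catalog</title></head>
  <body>
    <a href="/books/1">First book</a>
    <a href="/books/2">Second book</a>
    <a name="top">Anchor without href</a>
  </body>
</html>
"""


class LinkExtractor(HTMLParser):
    """Collect the page title and the href of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only accumulate text that appears inside <title>...</title>
        if self._in_title:
            self.title += data


def extract_title_and_links(html):
    """Parse an HTML string and return (title, list_of_hrefs)."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


title, links = extract_title_and_links(SAMPLE_HTML)
print(title)
print(links)
```

In a real scraper, the HTML string would typically come from an HTTP response rather than a constant, for example by reading and decoding the result of `urllib.request.urlopen(url)`. The parsing step would stay the same.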