When we want to extract information from a web page automatically, we often find that the website offers no API for the data we need, so we must resort to scraping techniques to recover it. Some of the most powerful tools for this are available for Python 3.7, among which we shall highlight BeautifulSoup and Scrapy.
Scrapy is a framework written in Python for automated data extraction that can be used for a wide range of applications, such as data mining.
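To make the idea of scraping concrete, here is a minimal sketch that pulls the links out of a small HTML fragment. It deliberately uses only the standard library's `html.parser` module rather than BeautifulSoup or Scrapy, so it runs without extra dependencies. The sample HTML and the names `SAMPLE_HTML`, `LinkExtractor`, and `extract_links` are hypothetical and serve only as illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page; a real scraper would first download the HTML.
SAMPLE_HTML = """
<html><body>
  <h1>Books</h1>
  <a href="/book/1">First book</a>
  <a href="/book/2">Second book</a>
</body></html>
"""


class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, text)
```

Libraries such as BeautifulSoup wrap this kind of low-level parsing in a friendlier search interface, and Scrapy adds the downloading and crawling around it.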
The following topics will be covered in this chapter:
- Introduction to web scraping
- Extracting information from web pages and parsing HTML with BeautifulSoup
- Introduction to Scrapy components and architecture
- Scrapy as a framework for performing web crawling processes and data analysis
- Working...