The Basics of Web Scraping and the Beautiful Soup Library
In today's connected world, one of the most valued and widely used skills for a data wrangling professional is the ability to extract and read data from web pages and databases hosted on the web. Most organizations host data in the cloud (public or private), and most web microservices these days provide some kind of API that lets external users access data.
As a data wrangling engineer, you need to know the structure of web pages and the relevant Python libraries so that you can extract data from a web page. The World Wide Web is an ever-growing, ever-changing universe in which different data exchange protocols and formats are used. A few of these are widely used and have become standard.
Libraries in Python
Python comes equipped with built-in modules, such as urllib, that can place HTTP requests over the internet and receive data from the...
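As a minimal sketch of placing a request with nothing but the standard library, the snippet below uses urllib.request to fetch a URL and decode the response body into text. The helper name fetch_text, the User-Agent string, and the default timeout are assumptions made for illustration; they do not come from the text above.

```python
from urllib.request import Request, urlopen


def fetch_text(url, timeout=10.0):
    """Request `url` and return the response body decoded to a string.

    `fetch_text` is a hypothetical helper name used only for this sketch.
    """
    # Some servers reject requests that carry no User-Agent header;
    # the value below is an arbitrary placeholder.
    request = Request(url, headers={"User-Agent": "data-wrangling-example/0.1"})

    # urlopen returns a file-like response object; the with-block closes it.
    with urlopen(request, timeout=timeout) as response:
        raw_bytes = response.read()
        # Use the charset declared in the Content-Type header if there is one,
        # and fall back to UTF-8 otherwise.
        charset = response.headers.get_content_charset() or "utf-8"

    return raw_bytes.decode(charset, errors="replace")
```

Calling fetch_text("https://example.com") would return that page's HTML as a string, which can then be passed to a parser such as Beautiful Soup. Network errors surface as urllib.error.URLError, so production code would normally wrap the call in a try/except block.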