Introduction
We learned about databases in the previous chapter, so now it is time to combine our knowledge of data wrangling and Python in a real-world scenario. In the real world, data from a single source is often insufficient for analysis. Generally, a data wrangler has to distinguish between relevant and irrelevant data and combine data from different sources.
The primary job of a data wrangling expert is to pull data from multiple sources, format and clean it (impute the data if it is missing), and finally combine it in a coherent manner to prepare a dataset for further analysis by data scientists or machine learning engineers.
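The workflow just described (pull, format and clean, impute, combine) can be sketched in a few lines. This is only a minimal illustration using the Python standard library, not the datasets or method used later in this chapter: the two CSV snippets, the column names, and the helper functions are all made up for the example, and mean imputation is just one possible choice.

```python
import csv
import io
from statistics import mean

# Two tiny in-memory "sources" standing in for downloaded files.
# The data and column names are hypothetical.
POPULATION_CSV = """country,population_millions
A,10
B,
C,30
"""

GDP_CSV = """country,gdp_billions
A,100
B,250
D,400
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def to_float_or_none(value):
    """Convert a CSV field to float; treat blank fields as missing."""
    value = (value or "").strip()
    return float(value) if value else None


def impute_mean(rows, column):
    """Replace missing values in `column` with the mean of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return rows


# 1. Pull and format: parse each source and convert numeric fields.
population = [
    {
        "country": r["country"],
        "population_millions": to_float_or_none(r["population_millions"]),
    }
    for r in read_rows(POPULATION_CSV)
]
gdp = {r["country"]: to_float_or_none(r["gdp_billions"]) for r in read_rows(GDP_CSV)}

# 2. Clean: impute the missing population value for country B.
population = impute_mean(population, "population_millions")

# 3. Combine: keep only countries present in both sources (an inner join).
combined = [
    {**row, "gdp_billions": gdp[row["country"]]}
    for row in population
    if row["country"] in gdp
]

for row in combined:
    print(row)
```

Countries C and D each appear in only one source, so the inner join drops them. Whether to drop, keep, or impute such rows is a judgment call that depends on the question being asked.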
In this topic, we will try to mimic such a typical task flow by downloading and using two different datasets from reputable web portals. Each dataset contains only part of the data needed to answer the key question being asked. Let's examine it more closely.