Introduction
In the previous chapter, we learned about databases. It is now time to combine our knowledge of data wrangling and Python in a realistic scenario. Data from a single source is often inadequate for analysis, so a data wrangler has to distinguish relevant from irrelevant data and combine data from different sources.
The primary job of a data wrangling expert is to pull data from multiple sources, format and clean it (imputing values where data is missing), and finally combine it in a coherent manner to produce a dataset ready for further analysis by data scientists or machine learning engineers.
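To make this pull-clean-combine pattern concrete, here is a minimal sketch using pandas. The file names, the country and population columns, and the mean-imputation strategy are illustrative assumptions, not the datasets used later in this chapter.

```python
import pandas as pd

# Hypothetical files -- substitute the datasets you actually download.
population = pd.read_csv("population_by_country.csv")
gdp = pd.read_csv("gdp_by_country.csv")

# Clean: impute missing population values with a simple column mean
# (one of several possible imputation strategies).
population["population"] = population["population"].fillna(
    population["population"].mean()
)

# Combine: merge the two partial datasets on a shared key column.
combined = population.merge(gdp, on="country", how="inner")

print(combined.head())
```

The merge step is where the two partial datasets become a single coherent table; the choice of join key and join type depends entirely on the question being asked of the data.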
In this chapter, we will mimic a typical task flow by downloading and using two different datasets from reputable web portals. Each dataset contains only partial data pertaining to the key question being asked. Let's examine this more closely.