Cleaning up and combining the data
Before merging all the data into one dataset, some cleanup needs to take place to ensure the merge is viable. In order to merge datasets, there needs to be a column or multiple columns that match in both data frames. In this case, the merge will occur on the year and the country name columns.
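As a minimal sketch of a merge on two key columns, assuming pandas and two hypothetical frames whose key columns are named "country" and "year" (the frame names, column names, and values here are placeholders, not the actual data):

```python
import pandas as pd

# Hypothetical frames with placeholder values; only the key columns matter here.
df_a = pd.DataFrame({
    "country": ["France", "France", "Japan"],
    "year": [2019, 2020, 2020],
    "a": [1.0, 2.0, 3.0],
})
df_b = pd.DataFrame({
    "country": ["France", "Japan", "Japan"],
    "year": [2020, 2019, 2020],
    "b": [10.0, 20.0, 30.0],
})

# An inner merge keeps only rows whose (country, year) pair appears in both frames.
merged = df_a.merge(df_b, on=["country", "year"], how="inner")
print(merged)
```

Only exact matches on both keys survive an inner merge, which is why differences in country spelling have to be resolved first.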
The year columns are consistent across the datasets, but the country name columns may differ slightly in spelling or in whether abbreviations are used. The data frame from the World Bank data, df_wb, only has an abbreviation to represent each country and contains rows for regions in addition to countries. The actual country names need to be added, and the rows containing region data need to be removed.
Luckily, the world_bank_data API has a readily available dataset containing all the IDs, country names, and information about region data; specifically, the column 'region', which specifies whether an entry is a combination of countries.
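The cleanup described above might look roughly like the following sketch. It uses small stand-in frames so it runs offline. The column name "country_id", the lookup table's layout, and the marker value "Aggregates" in the 'region' column are assumptions to check against the real data. In practice the lookup table would come from the world_bank_data package rather than being built by hand.

```python
import pandas as pd

# Stand-in for df_wb; "country_id" is a hypothetical name for the abbreviation column.
df_wb = pd.DataFrame({
    "country_id": ["FRA", "JPN", "WLD", "FRA"],
    "year": [2020, 2020, 2020, 2019],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# Stand-in for the country lookup table (assumed layout: indexed by ID,
# with "name" and "region" columns; "Aggregates" assumed to mark region rows).
countries = pd.DataFrame(
    {
        "name": ["France", "Japan", "World"],
        "region": ["Europe & Central Asia", "East Asia & Pacific", "Aggregates"],
    },
    index=pd.Index(["FRA", "JPN", "WLD"], name="id"),
)

# Attach the country name and region to each row by matching on the ID.
lookup = countries.reset_index()[["id", "name", "region"]]
df_named = df_wb.merge(lookup, left_on="country_id", right_on="id", how="left")

# Drop rows that describe a combination of countries, then tidy the columns.
df_clean = (
    df_named[df_named["region"] != "Aggregates"]
    .drop(columns=["id", "region"])
    .rename(columns={"name": "country"})
    .reset_index(drop=True)
)
print(df_clean)
```

A left merge is used so that any abbreviation missing from the lookup table shows up as a missing name instead of silently disappearing.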
-
...