Merging/joining datasets
Merging or joining is a mission-critical step for predictive modelling, and more often than not an analyst working on real problems will be required to do it. Readers who are familiar with relational databases know that the required columns are often scattered across multiple tables connected by a common key column. There can be instances where two tables are joined by more than one key column. Merges and joins in Python are very similar to a table merge/join in a relational database, except that they happen on the local computer rather than in a database, and that they operate on pandas data frames rather than tables. Readers familiar with Excel will find a similarity with the VLOOKUP function, in the sense that both are used to get an extra column of information from a sheet/table joined by a key column.
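As a minimal sketch of the idea, the snippet below uses pandas' merge function to pull an extra column from one data frame into another through a key column, first with a single key and then with two keys. The data frames and column names (customers, orders, sales, targets and so on) are made up purely for illustration.

```python
import pandas as pd

# Hypothetical data: customer details and orders live in separate data frames
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Pune", "Delhi", "Mumbai"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103],
    "customer_id": [1, 1, 3],
    "amount": [250.0, 100.0, 75.0],
})

# Fetch each order's city through the common key column,
# much like a VLOOKUP on customer_id
orders_with_city = pd.merge(orders, customers, on="customer_id")
print(orders_with_city)

# Hypothetical data where a row is identified by two key columns
sales = pd.DataFrame({
    "store_id": [1, 1, 2],
    "year": [2020, 2021, 2020],
    "revenue": [10, 12, 8],
})
targets = pd.DataFrame({
    "store_id": [1, 1, 2],
    "year": [2020, 2021, 2021],
    "target": [9, 11, 7],
})

# Pass a list of column names to join on more than one key
sales_vs_target = pd.merge(sales, targets, on=["store_id", "year"])
print(sales_vs_target)
```

When no join type is specified, merge keeps only the rows whose key values appear in both data frames, so the store 2 rows drop out of the second result because their years do not match.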
There are various ways in which two tables/data frames can be merged/joined. The most commonly used ones are Inner Join, Left Join...
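The difference between the two join types named above can be sketched with the how parameter of merge. The data frames below (employees, departments) are hypothetical and exist only to show which rows survive each kind of join.

```python
import pandas as pd

# Hypothetical data: one employee belongs to a department (30)
# that is missing from the departments data frame
employees = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "dept_id": [10, 20, 30],
})
departments = pd.DataFrame({
    "dept_id": [10, 20, 40],
    "dept_name": ["Sales", "Finance", "Legal"],
})

# Inner join: keep only the keys present in both data frames
inner = pd.merge(employees, departments, how="inner", on="dept_id")

# Left join: keep every row of the left data frame;
# unmatched rows get missing values in the right-hand columns
left_join = pd.merge(employees, departments, how="left", on="dept_id")

print(inner)
print(left_join)
```

Here the inner join returns two rows, while the left join returns all three employees, with a missing dept_name for the employee whose department has no match.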