Joining datasets
Datasets can come from different sources, or from different tables within the same database or data lake. Often those tables are related to each other by key columns: a column A in table 1 and a column A in table 2 hold the same kind of information (the same identifiers, for example), so rows from the two tables can be matched on that common key.
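To make the idea of a common key concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the key column a, and the sample rows are all made up for illustration; any database or dataframe library with a join operation would behave along the same lines.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# Two hypothetical tables that share the key column "a".
conn.executescript("""
    CREATE TABLE table_1 (a INTEGER, name TEXT);
    CREATE TABLE table_2 (a INTEGER, city TEXT);
    INSERT INTO table_1 VALUES (1, 'Ana'), (2, 'Bob'), (3, 'Cleo');
    INSERT INTO table_2 VALUES (1, 'Lisbon'), (2, 'Oslo'), (4, 'Quito');
""")

# Inner join: keep only the rows whose key appears in both tables.
rows = conn.execute("""
    SELECT t1.a, t1.name, t2.city
    FROM table_1 AS t1
    JOIN table_2 AS t2 ON t1.a = t2.a
    ORDER BY t1.a
""").fetchall()

print(rows)  # [(1, 'Ana', 'Lisbon'), (2, 'Bob', 'Oslo')]
```

Keys 3 and 4 each appear in only one table, so an inner join drops them; only the keys present on both sides produce a combined row.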
To better explain the join concept, imagine we are engineers at a retail company. Our goal is to store data about the transactions from each store, including date, product, description, quantity, and amount. We could put everything in the same table, but that would result in a big, heavy file that the database has to deal with every time we query it. Think about that for a moment: we won't always need to pull the product description or the store address, for example. Consequently, a better solution to that problem is to split the information into smaller tables...
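One way the retail scenario might be laid out is sketched below, again with Python's built-in sqlite3 module. The schema (tables transactions, products, and stores, and their column names) and the sample rows are assumptions made for illustration, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: descriptive details live in their own small tables,
# and the transactions table only stores the keys that point to them.
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE stores (store_id INTEGER PRIMARY KEY, address TEXT);
    CREATE TABLE transactions (
        transaction_date TEXT,
        store_id INTEGER REFERENCES stores(store_id),
        product_id INTEGER REFERENCES products(product_id),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO products VALUES (10, 'Coffee beans 1kg'), (11, 'Paper filters');
    INSERT INTO stores VALUES (1, '12 Main St'), (2, '98 Oak Ave');
    INSERT INTO transactions VALUES
        ('2024-01-05', 1, 10, 2, 30.0),
        ('2024-01-05', 2, 11, 1, 4.5),
        ('2024-01-06', 1, 11, 3, 13.5);
""")

# Sales per store: answered from the transactions table alone, no join needed.
totals = conn.execute("""
    SELECT store_id, SUM(amount)
    FROM transactions
    GROUP BY store_id
    ORDER BY store_id
""").fetchall()
print(totals)  # [(1, 43.5), (2, 4.5)]

# Only when the description is needed do we join on the product_id key.
detailed = conn.execute("""
    SELECT t.transaction_date, p.description, t.quantity
    FROM transactions AS t
    JOIN products AS p ON p.product_id = t.product_id
    ORDER BY t.transaction_date, p.description
""").fetchall()
print(detailed)
```

The first query never touches the product descriptions or store addresses. The second pays the cost of the join only when the description is actually requested.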