You often need to combine data from multiple sources. For example, you might have customer information such as personal characteristics and purchase history in a customer database. Then, you learn that the marketing department has conducted an attitudinal survey on a subset of your customers, giving rise to new measures on some of your customers. Combining these two data sources enables you to analyze all of the variables together, which can lead to new insights and better predictions of customer behavior.
The preceding scenario describing customer data and survey data is an example of relational data, which means that there are relationship between pairs of datasets. In these datasets, there exist one or more variables called keys that are used to connect each pair of datasets. A key is a variable (or variables) that uniquely identifies an observation...