Extracting Features from Relational Data with Featuretools
In previous chapters, we worked with data organized in rows and columns, where the columns are the variables, the rows are the observations, and each observation is independent. In this chapter, we will focus on creating features from relational datasets. In relational datasets, data is spread across several tables that can be joined through unique identifiers. These identifiers define the relationships between the tables.
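To make the idea of joining tables through a unique identifier concrete, here is a minimal sketch using pandas rather than Featuretools. The `authors` and `books` tables, their column names, and their values are all hypothetical and exist only for illustration. The sketch joins the two tables on the shared `author_id` column and then aggregates the child table to obtain one new feature per row of the parent table.

```python
import pandas as pd

# Hypothetical parent table: one row per author, identified by author_id.
authors = pd.DataFrame({
    "author_id": [1, 2, 3],
    "name": ["Ada", "Grace", "Alan"],
})

# Hypothetical child table: one row per book.
# The author_id column links each book back to a row in authors.
books = pd.DataFrame({
    "book_id": [10, 11, 12, 13],
    "author_id": [1, 1, 2, 3],
    "pages": [200, 350, 120, 410],
})

# Join the two tables on the shared unique identifier.
joined = books.merge(authors, on="author_id", how="left")

# Aggregate the child table to create one feature per parent row:
# the total number of pages written by each author.
pages_per_author = (
    books.groupby("author_id")["pages"]
    .sum()
    .rename("total_pages")
    .reset_index()
)
author_features = authors.merge(pages_per_author, on="author_id", how="left")

print(joined)
print(author_features)
```

Aggregating a child table up to its parent, as in the last step, is the kind of operation that becomes repetitive to write by hand once there are many tables and many candidate aggregations.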
A classic example of relational data is the data held by retail companies. One table can contain information about customers, such as names and addresses. A second table can contain information about the purchases made by those customers, such as the type and number of items bought per purchase. A third table can contain information about the customers’ interactions with the company’s website, with variables such as session duration, the mobile device used...