Extracting Features from Relational Data with Featuretools
In previous chapters, we worked with data organized in rows and columns, where the columns are the variables, the rows are the observations, and each observation is independent. In this chapter, we will focus on creating features from relational datasets. In relational datasets, the data is spread across multiple tables, which can be joined together through unique identifiers. These identifiers define the relationships between the tables.
A classic example of relational data is that held by retail companies. One table contains information about customers, such as names and addresses. A second table holds information about the customers’ purchases, such as the type and number of items bought in each purchase. A third table contains information about the customers’ interactions with the company’s website, with variables such as session duration, the mobile device used, and pages visited....
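To make this layout concrete, here is a minimal sketch in pandas, assuming three small, made-up tables; all table names, column names, and values are hypothetical. It shows how a shared identifier (here `customer_id`) links the tables, and how summarizing the tables that hold many rows per customer yields one row of features per customer. This aggregation is done by hand here; Featuretools can generate similar aggregations automatically.

```python
import pandas as pd

# Hypothetical customers table: one row per customer, keyed by customer_id.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Chloe"],
    "city": ["Madrid", "London", "Paris"],
})

# Hypothetical purchases table: several rows per customer (one-to-many).
purchases = pd.DataFrame({
    "purchase_id": [101, 102, 103, 104],
    "customer_id": [1, 1, 2, 3],
    "item_type": ["book", "pen", "book", "lamp"],
    "n_items": [2, 5, 1, 3],
})

# Hypothetical website sessions table: also several rows per customer.
sessions = pd.DataFrame({
    "session_id": [1001, 1002, 1003, 1004, 1005],
    "customer_id": [1, 2, 2, 3, 3],
    "duration_s": [120, 30, 45, 300, 60],
    "device": ["mobile", "desktop", "mobile", "tablet", "mobile"],
})

# Join each purchase to its customer's details through the shared identifier.
purchases_with_customer = purchases.merge(customers, on="customer_id", how="left")

# Summarize the one-to-many tables so each customer gets a single row.
purchase_feats = purchases.groupby("customer_id").agg(
    n_purchases=("purchase_id", "count"),
    total_items=("n_items", "sum"),
)
session_feats = sessions.groupby("customer_id").agg(
    n_sessions=("session_id", "count"),
    mean_duration_s=("duration_s", "mean"),
)

# Attach the aggregated features to the customers table (aligned on customer_id).
customer_features = (
    customers.set_index("customer_id")
    .join(purchase_feats)
    .join(session_feats)
)
print(customer_features)
```

The result has one row per customer and combines columns from all three tables, which is the flat, row-per-observation format used in the previous chapters.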