Setting up an entity set and creating features automatically
Relational datasets or databases contain data spread across multiple tables, and the relationships between those tables are defined by unique identifiers (keys) that tell us how to join them. To automate feature creation with featuretools, we first need to add the different data tables and establish their relationships within what is called an entity set. The entity set then tells featuretools how these tables are connected so that the library can automatically create features based on those relationships.
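To make the idea of tables linked by keys concrete, here is a minimal, library-free sketch of what an entity set records: the tables, the unique index of each table, and the parent-child relationships between them. This is not the featuretools API. The toy `customers` and `invoices` tables, the `entity_set` dictionary layout, and the helper `children_of` are all hypothetical names chosen for illustration.

```python
# Hypothetical toy tables (not the recipe's dataset): each row is a dict.
customers = [
    {"customer_id": 1, "country": "UK"},
    {"customer_id": 2, "country": "France"},
]
invoices = [
    {"invoice_id": "A1", "customer_id": 1},
    {"invoice_id": "A2", "customer_id": 1},
    {"invoice_id": "B1", "customer_id": 2},
]

# A hypothetical, minimal "entity set": named tables, the unique index of
# each table, and relationships of the form (parent, child, foreign key in child).
entity_set = {
    "tables": {"customers": customers, "invoices": invoices},
    "index": {"customers": "customer_id", "invoices": "invoice_id"},
    "relationships": [("customers", "invoices", "customer_id")],
}


def children_of(es, parent, child, foreign_key):
    """Group the child rows under the index value of their parent row."""
    parent_index = es["index"][parent]
    parent_keys = [row[parent_index] for row in es["tables"][parent]]
    # The parent's index must be unique, otherwise the join is ambiguous.
    if len(parent_keys) != len(set(parent_keys)):
        raise ValueError(f"index {parent_index!r} of {parent!r} is not unique")
    grouped = {key: [] for key in parent_keys}
    for row in es["tables"][child]:
        key = row[foreign_key]
        # Every child row must point to an existing parent row.
        if key not in grouped:
            raise KeyError(f"{child} row refers to unknown {parent} key {key!r}")
        grouped[key].append(row)
    return grouped


groups = children_of(entity_set, *entity_set["relationships"][0])
print({k: [r["invoice_id"] for r in v] for k, v in groups.items()})
# {1: ['A1', 'A2'], 2: ['B1']}
```

The relationship is what allows child rows to be grouped per parent row; automatic aggregation features are computed over exactly those groups.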
We will work with a dataset containing information about customers, invoices, and products. First, we will set up an entity set highlighting the relationships between these three items. This entity set will be the starting point for the remaining recipes in this chapter. Next, we will create features automatically by aggregating the data at the customer, invoice, and product levels, utilizing the default parameters...
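Aggregating child data at a parent level can be sketched without any library: group the child rows by the foreign key, then apply a fixed set of aggregation functions to every numeric column. This is a simplified, stdlib-only illustration of the idea, not featuretools' implementation or its default settings. The `invoices` rows, the `amount` column, the `AGGREGATIONS` table, and the helper `aggregate_to_parent` are hypothetical, and the feature names only loosely mimic featuretools' naming style.

```python
from statistics import mean

# Hypothetical invoice rows; customer_id links each invoice to a customer.
invoices = [
    {"invoice_id": "A1", "customer_id": 1, "amount": 20.0},
    {"invoice_id": "A2", "customer_id": 1, "amount": 10.0},
    {"invoice_id": "B1", "customer_id": 2, "amount": 5.5},
]

# Aggregations applied to every numeric column (a hypothetical selection).
AGGREGATIONS = {"SUM": sum, "MEAN": mean, "MAX": max}


def aggregate_to_parent(rows, foreign_key, child_name):
    """Return {parent key: {feature name: value}} built from the child rows."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[foreign_key], []).append(row)
    # Numeric columns, excluding the key itself (bool is excluded explicitly).
    numeric_cols = [
        col for col, value in rows[0].items()
        if col != foreign_key
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
    ]
    features = {}
    for key, group in grouped.items():
        feats = {f"COUNT({child_name})": len(group)}
        for col in numeric_cols:
            values = [r[col] for r in group]
            for name, fn in AGGREGATIONS.items():
                feats[f"{name}({child_name}.{col})"] = fn(values)
        features[key] = feats
    return features


features = aggregate_to_parent(invoices, "customer_id", "invoices")
print(features[1])
# {'COUNT(invoices)': 2, 'SUM(invoices.amount)': 30.0, 'MEAN(invoices.amount)': 15.0, 'MAX(invoices.amount)': 20.0}
```

Because the same functions are applied to every numeric column of every related table, the number of features grows quickly; this is why automating the process with an entity set is useful.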