Feature Engineering for Regression
Raw data is data as you obtain it from the source, before you have manipulated it in any way. A raw dataset can rarely be used directly for modeling. More often, you perform several manipulations on the data first, and this process is called feature engineering. In simple terms, feature engineering is the process of taking data and transforming it into features for use in predictions. Feature engineering can have several motivations:
- Creating features that capture what matters for the outcome of interest (for example, computing average order value, which may be more useful for predicting a customer's revenue than the number of orders and total revenue used separately)
- Applying your domain knowledge (for example, flagging certain high-value indicators for predicting a customer's revenue)
- Aggregating variables to the required level (for example, creating customer...
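The motivations above can be illustrated with a minimal sketch that uses only the Python standard library. It assumes hypothetical order-level records of the form `(customer_id, order_value)`, aggregates them to one row per customer, derives an average order value, and adds a high-value flag. The data, the function name `build_customer_features`, and the `HIGH_VALUE_THRESHOLD` of 100.0 are all illustrative assumptions, not values from any real dataset.

```python
from collections import defaultdict

# Hypothetical order-level records: (customer_id, order_value).
orders = [
    ("C1", 120.0),
    ("C1", 80.0),
    ("C2", 500.0),
    ("C3", 40.0),
    ("C3", 60.0),
    ("C3", 20.0),
]

# Assumed threshold for the illustrative "high-value" flag.
HIGH_VALUE_THRESHOLD = 100.0


def build_customer_features(orders, threshold=HIGH_VALUE_THRESHOLD):
    """Aggregate order rows to one feature row per customer (hypothetical helper)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Aggregation step: roll order-level rows up to the customer level.
    for customer_id, value in orders:
        totals[customer_id] += value
        counts[customer_id] += 1

    features = {}
    for customer_id in totals:
        avg = totals[customer_id] / counts[customer_id]
        features[customer_id] = {
            "num_orders": counts[customer_id],
            "total_revenue": totals[customer_id],
            # Derived feature: average order value per customer.
            "avg_order_value": avg,
            # Domain-style flag: 1 if the average order value exceeds the threshold.
            "high_value": int(avg > threshold),
        }
    return features


features = build_customer_features(orders)
for customer_id, row in sorted(features.items()):
    print(customer_id, row)
```

A dataframe library such as pandas would usually express the same aggregation with a group-by; plain Python is used here only to keep the sketch dependency-free.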