Once our data is cleaned, we are ready to extract the actual features with which our machine learning model can be trained.
Features refer to the variables that we use to train our model. Each row of data contains information that we would like to extract into a training example.
Almost all machine learning models ultimately work on numerical representations in the form of a vector; hence, we need to convert raw data into numbers.
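As a minimal sketch of what converting raw data into a numeric vector can look like, the Python snippet below maps one record to a list of floats. The record, its field names (`age`, `gender`), the `GENDERS` list, and the `to_vector` helper are all hypothetical and are not taken from the dataset itself. The age is used as a number directly, while the gender is expanded into one indicator value per possible state, which is one common way to encode it.

```python
# Hypothetical raw record; the field names and values are illustrative only.
record = {"age": 24, "gender": "M"}

# Assumed set of possible states for the gender field.
GENDERS = ["F", "M"]


def to_vector(rec):
    """Map a raw record to a list of floats (one training example)."""
    # A numerical field can be used directly as a number.
    age = float(rec["age"])
    # A field with a fixed set of states becomes one 0/1 indicator per state.
    gender_indicators = [1.0 if rec["gender"] == g else 0.0 for g in GENDERS]
    return [age] + gender_indicators


vector = to_vector(record)
print(vector)
```

Each position in the resulting list has a fixed meaning (age, then one slot per gender state), so every record maps to a vector of the same length.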
Features broadly fall into a few categories:
- Numerical features: These features are typically real numbers or integers, for example, the user age that we used in an earlier example.
- Categorical features: These features refer to variables that can take one of a set of possible states at any given time. Examples from our dataset might include a user's gender...