As a simplified representation of reality, a model also includes a set of variables that hold the relevant information describing the different parts of the problem we are representing. These variables can be as concrete as 1 kg of ice cream, as we saw in our previous example, or as abstract as a numerical value that represents how similar in meaning two words in a text document are.
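To make the contrast between concrete and abstract variables tangible, here is a minimal sketch in Python. It assumes that each word is already represented by a small numeric vector and uses cosine similarity as one possible measure of how close two meanings are. The vectors, the word list, and the names `cosine_similarity`, `word_vectors`, and `ice_cream_kg` are invented for illustration and do not come from any real dataset or library.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional word vectors, made up for this example only.
word_vectors = {
    "ice_cream": [0.9, 0.1, 0.3],
    "gelato": [0.8, 0.2, 0.4],
    "invoice": [0.1, 0.9, 0.2],
}

# A concrete variable: a directly measured quantity.
ice_cream_kg = 1.0

# An abstract variable: a number summarizing how similar two words are.
sim_close = cosine_similarity(word_vectors["ice_cream"], word_vectors["gelato"])
sim_far = cosine_similarity(word_vectors["ice_cream"], word_vectors["invoice"])

print(f"ice_cream vs gelato:  {sim_close:.2f}")
print(f"ice_cream vs invoice: {sim_far:.2f}")
```

With these toy vectors, the pair with related meanings scores higher than the unrelated pair. Both the measured weight and the similarity score are simply numbers that a model can take as input.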
In the particular case of a machine learning model, these variables are called features. Choosing features that carry relevant information about the phenomenon we are trying to explain or predict is of paramount importance. In unsupervised learning, the relevant features are those that best represent the clustering or association of information in the dataset. For supervised learning, the most important features are those...