Machine learning models are based on matrix computations: our observations are organised into the rows of a table, while the features form its columns. Representing complex objects such as text or graphs as matrices of a reasonable size can be a challenge. This is the issue that embedding techniques are designed to address.
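As a minimal illustration of this row/column convention (the dataset here is made up), a tabular dataset can be stored as a NumPy matrix in which each row is one observation and each column one feature:

```python
import numpy as np

# Hypothetical dataset: 3 observations (rows) x 2 features (columns),
# e.g. age in years and height in centimetres for three people
X = np.array([
    [25, 170.0],
    [32, 165.5],
    [41, 180.2],
])

print(X.shape)  # (3, 2): 3 observations, 2 features
```

Most model APIs, such as those in scikit-learn, expect exactly this shape: one row per observation, one column per feature.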
Why is embedding needed?
In Chapter 8, Using Graph-Based Features in Machine Learning, we drew the following schema:
The Feature engineering step involves extracting features from our dataset. When the dataset consists of observations that already have numerical or categorical characteristics, it is straightforward to see how to build features from them.
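For instance, a categorical characteristic can be turned into numerical features via one-hot encoding. The sketch below does this by hand for a hypothetical "colour" characteristic (the values and the variable names are illustrative, not from the original dataset):

```python
# One-hot encode a hypothetical categorical characteristic:
# each category becomes its own binary feature column
colours = ["red", "green", "red", "blue"]
categories = sorted(set(colours))  # ['blue', 'green', 'red']

# Each observation becomes a row of 0s with a single 1
# in the column of its category
features = [[1 if c == cat else 0 for cat in categories] for c in colours]
print(features)
# [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

In practice a library helper such as scikit-learn's OneHotEncoder or pandas' get_dummies would do the same job; the point is simply that such characteristics map naturally onto feature columns.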
However, some datasets do not have that tabular structure. In such cases, we need to create that structure before feeding the dataset into a machine learning model.
Take, for example, a text such as a book, containing thousands of words. Now imagine that your task is to predict...