Introducing graph embedding algorithms
This section will introduce the principles of graph embeddings and explain the idea behind two of the most famous algorithms: Node2Vec and GraphSAGE. In the following sections, we will use the GDS implementation of these two algorithms to extract embeddings for nodes stored in a Neo4j database.
Defining embeddings
Machine learning (ML) algorithms—classification or regression—require an input matrix made of observations (rows) and features (columns). While this is trivial for tabular datasets (for example, the Iris or Titanic datasets), this is already a challenge for datasets made of more complex objects such as texts, images, or graphs. The question is: how can we build a matrix from these objects while preserving their nature? By nature, we mean here the key characteristics that will not kill the predictive power of our data. In the case of texts, this is the meaning of the sentences or words. We will see later in this chapter...