In all the graph analytics problems we have studied so far, our observations were the nodes of the graph. Now, however, we are moving on to a different concept where the observations are the edges. Each row of the dataset should contain information about one edge of the graph. Since our goal is to predict whether a link will appear in the future or is missing from our current knowledge, we can turn the problem into a binary classification one, that is, the edge can either have:
- the class True, the link exists or is likely to be created, or
- the class False, the link is very unlikely to appear.
Since we are about to build a classification model, our dataset must include both existing and non-existing edges (the two classes of the binary classifier).
Importing the data into Neo4j
The data we are going to use in the rest of this chapter is a randomly generated geometric graph. This kind of graph has many interesting features, one of them...