Building a GDS Pipeline for Node Classification Model Training
Classifying observations into categories is a classical machine learning (ML) task. As we learned in the preceding chapters, we can use existing ML models, such as decision trees, to classify a graph’s nodes. The graph structure is used to compute extra features, bringing more knowledge into the model. In this chapter, we will discover another key feature of the Neo4j GDS library: pipelines. Pipelines let you configure and train an ML model, and then use it to make predictions on unseen nodes. You can do all of this from within Neo4j, without having to add another library such as scikit-learn to the tech stack.
We are going to work on the Netflix dataset we created earlier in this book (the code is available on GitHub if you don’t have it yet). We will try to make predictions by building a node classification pipeline, focusing on the how rather than the why.
In this chapter, we’re going to cover the...