Now that we can use Neo4j and graph algorithms to answer many questions, it is time to look at how Neo4j handles data of different sizes. Versions of Neo4j prior to 4.0 could already deal with huge amounts of data (billions of nodes and relationships), limited only by disk size. Neo4j 4.0 has (almost) overcome this limitation by introducing sharding, a technique that spreads a graph across several machines. This chapter provides an introduction to sharding, covering both how shards are defined and the new Cypher statement that was introduced specifically to query such graphs. We will also investigate the performance of the GDS on large graphs.
The following topics will be covered in this chapter:
- Measuring GDS performance
- Configuring Neo4j 4.0 for big data
Let's get started!