Neo4j is a highly scalable graph database written in Java, and its source code is publicly available on GitHub at github.com/neo4j/neo4j. This section describes its building blocks and its main use cases, as well as the cases where it does not perform well, because no system is perfect.
Building blocks
As we discussed with graph theory, a graph database such as Neo4j is made of at least two essential building blocks:
- Nodes
- Relationships between nodes
Let's look at each of these in detail.
Nodes
In Neo4j, vertices are called nodes. Nodes can be of different types, such as Question and Answer in our earlier example. To differentiate those entities, nodes can have a label. If we continue the parallel with SQL, all nodes with a given label would be in the same table. But the analogy ends here, because nodes can have multiple labels. For instance, Clark Kent is a journalist, but he is also a superhero; the node representing this person can therefore have two labels: Journalist and SuperHero.
Labels define the kind of entity the node belongs to. It is also important to store the characteristics of that entity. In Neo4j, this is done by attaching properties to nodes.
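To make the idea of labels and properties concrete, here is a minimal Python sketch of a node. It is only a mental model, not Neo4j's API: the Node class, the has_label helper, and the second person in the sample data are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """Illustrative stand-in for a graph node (hypothetical, not a Neo4j class)."""
    labels: Set[str] = field(default_factory=set)                # zero, one, or many labels
    properties: Dict[str, Any] = field(default_factory=dict)    # characteristics of the entity

    def has_label(self, label: str) -> bool:
        return label in self.labels


# One node carrying two labels, as in the Clark Kent example.
clark = Node(labels={"Journalist", "SuperHero"}, properties={"name": "Clark Kent"})
# A second, made-up node with a single label.
lois = Node(labels={"Journalist"}, properties={"name": "Lois Lane"})

nodes = [clark, lois]
journalists = [n.properties["name"] for n in nodes if n.has_label("Journalist")]
superheroes = [n.properties["name"] for n in nodes if n.has_label("SuperHero")]
```

Filtering by label plays the role of selecting from a table, except that the same node can appear in both the journalists and the superheroes results.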
Relationships
Like nodes, relationships carry different pieces of information. A relationship between two persons can be of type MARRIED_TO, FRIEND_WITH, or many others. Unlike nodes, which can have several labels, a Neo4j relationship must have one and only one type.
One of the main powers of Neo4j is that relationships also have properties. For instance, when adding the relationship MARRIED_TO between two persons, we could add the wedding date, place, whether they signed a prenuptial agreement, and so on, as relationship properties.
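The following Python sketch illustrates the two rules above: a relationship has exactly one type, and it can carry its own properties. The Relationship class, its field names, the node identifiers, and the wedding details are all hypothetical values chosen for the example, not part of Neo4j's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Relationship:
    """Illustrative stand-in for a relationship (hypothetical, not a Neo4j class)."""
    start: str       # identifier of the start node
    end: str         # identifier of the end node
    rel_type: str    # a single type, never a collection of types
    properties: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Enforce "one and only one type": a single non-empty string.
        if not isinstance(self.rel_type, str) or not self.rel_type:
            raise ValueError("a relationship needs exactly one non-empty type")


# Made-up wedding details stored on the relationship itself.
wedding = Relationship(
    start="person_a",
    end="person_b",
    rel_type="MARRIED_TO",
    properties={"date": "2015-06-20", "place": "Paris", "prenuptial_agreement": False},
)
```

Keeping the wedding date on the relationship, rather than on either person, means the information stays attached to the fact it describes.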
Properties
Properties are saved as key-value pairs where the key is a string capturing the property name. Each value can then be of any of the following types:
- Number: Integer or Float
- Text: String
- Boolean: Boolean
- Time properties: Date, Time, DateTime, LocalTime, LocalDateTime, or Duration
- Spatial: Point
Internally, properties are stored as a linked list, each element containing a key-value pair. The node or relationship is then linked to the first element of its property list.
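As a rough mental model of this storage layout, the Python sketch below chains property records together and lets the owning entity point at the first one. It is a conceptual illustration only and does not reproduce Neo4j's actual record format; PropertyRecord, Entity, and their methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Tuple


@dataclass
class PropertyRecord:
    """One element of the property list: a key-value pair plus a link to the next element."""
    key: str
    value: Any
    next: Optional["PropertyRecord"] = None


@dataclass
class Entity:
    """A node or relationship that only knows the first element of its property list."""
    first_property: Optional[PropertyRecord] = None

    def set_property(self, key: str, value: Any) -> None:
        record = self.first_property
        while record is not None:          # overwrite the value if the key already exists
            if record.key == key:
                record.value = value
                return
            record = record.next
        # Otherwise prepend a new record at the head of the list.
        self.first_property = PropertyRecord(key, value, self.first_property)

    def get_property(self, key: str, default: Any = None) -> Any:
        record = self.first_property
        while record is not None:          # walk the chain until the key is found
            if record.key == key:
                return record.value
            record = record.next
        return default

    def items(self) -> Iterator[Tuple[str, Any]]:
        record = self.first_property
        while record is not None:
            yield record.key, record.value
            record = record.next


person = Entity()
person.set_property("name", "Clark Kent")
person.set_property("age", 35)       # made-up value
person.set_property("age", 36)       # updates the existing record in place
```

In this sketch, reading a property means walking the chain from its first element, so the cost of a lookup grows with the number of properties attached to the entity.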
SQL to Neo4j translator
Here are a few guidelines for translating a relational model into a graph model:
| SQL world | Neo4j world |
| --- | --- |
| Table | Node label |
| Row | Node |
| Column | Node property |
| Foreign key | Relationship |
| Join table | Relationship |
| NULL | Do not store null values inside properties; just omit the property |
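The guidelines in this table can be sketched in a few lines of Python. The example below assumes two relational tables, Question and Answer, with made-up column names, and a hypothetical relationship type ANSWERS; none of these names come from the original schema, and the dictionaries are plain Python structures rather than Neo4j objects.

```python
from typing import Any, Dict, Iterable, List, Tuple


def row_to_node(table: str, row: Dict[str, Any], foreign_keys: Iterable[str] = ()) -> Dict[str, Any]:
    """Turn one row into one node: table name -> label, columns -> properties."""
    skipped = set(foreign_keys)
    properties = {
        column: value
        for column, value in row.items()
        if value is not None          # NULL: omit the property instead of storing it
        and column not in skipped     # foreign keys become relationships, not properties
    }
    return {"labels": {table}, "properties": properties}


# Hypothetical rows from two relational tables.
questions = [{"id": 1, "title": "What is a graph?", "closed_at": None}]
answers = [{"id": 10, "question_id": 1, "body": "Nodes and relationships."}]

question_nodes = {row["id"]: row_to_node("Question", row) for row in questions}
answer_nodes = {
    row["id"]: row_to_node("Answer", row, foreign_keys=("question_id",)) for row in answers
}

# Each foreign key value becomes a relationship: (start node, type, end node).
relationships: List[Tuple[Tuple[str, int], str, Tuple[str, int]]] = [
    (("Answer", row["id"]), "ANSWERS", ("Question", row["question_id"]))
    for row in answers
    if row["question_id"] is not None
]
```

A join table would be handled like the foreign key here, with each of its rows becoming one relationship and any extra columns becoming relationship properties.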
Applying those guidelines to the SQL model in the question and answer table diagram, we built up the graph model displayed in the whiteboard model for our simple Q&A website, earlier in the chapter. The full graph model can be seen as follows:
[Figure: full graph model of the Q&A website; diagram drawn with the Arrows tool, http://www.apcjones.com/arrows]
Neo4j use cases
Like any other tool, Neo4j is very good in some situations, but not well suited to others. The basic principle is that Neo4j is optimized for graph traversal: any query that works by jumping from one node to another along relationships is fast.
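To illustrate what a traversal looks like, here is a small Python sketch of a breadth-first walk limited to a number of hops. It is not how Neo4j's engine is implemented; the adjacency dictionary, the neighbors_within function, and the node names are assumptions made for the example. The point it shows is that the walk only visits the neighborhood of the starting node, never the rest of the graph.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def neighbors_within(adjacency: Dict[str, List[str]], start: str, max_hops: int) -> Set[str]:
    """Return every node reachable from `start` in at most `max_hops` jumps."""
    seen: Set[str] = {start}
    reached: Set[str] = set()
    frontier: Deque[Tuple[str, int]] = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue                      # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return reached


# Tiny made-up graph: a - b - c - d, plus an isolated node e.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"], "e": []}
two_hops = neighbors_within(adjacency, "a", 2)
```

With this graph, the two-hop walk from a touches only b and c; nodes d and e are never examined.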
On the other hand, Neo4j is probably not the best tool if you want to do the following:
- Perform full DB scans, for instance, answering the question "What is?"
- Do full table aggregates
- Store large documents: the list of key-value properties needs to be kept small (let's say no more than around 10 properties)
Some of those pain points can be addressed with a proper graph model. For instance, instead of saving all information as node properties, consider moving some of it to another node, with a relationship between the two. Depending on the kind of requests you are interested in, the graph schema most suited to your application may differ. Before going into the details of graph modeling, we need to stop briefly and talk about the different kinds of graph properties, which will also influence our choice.