Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Free Learning
Arrow right icon
Learning Cypher
Learning Cypher

Learning Cypher: Write powerful and efficient queries for Neo4j with Cypher, its official query language

eBook
$9.99 $18.99
Paperback
$29.99
Subscription
Free Trial
Renews at $19.99p/m

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Table of content icon View table of contents Preview book icon Preview Book

Learning Cypher

Chapter 1. Querying Neo4j Effectively with Pattern Matching

Querying a graph database using the Java API can be very tedious; you would need to visit the whole graph and skip nodes that don't match what you are searching for. Any changes to the query will result in rethinking the code, changing it, and building it all over again. Why? The reason is that we are using an imperative language to do pattern matching, and traditional imperative languages don't work well in this task. Cypher is the declarative query language used to query a Neo4j database. Declarative means that it focuses on the aspects of the result rather than on methods or ways to get the result so that it is human-readable and expressive.

In this chapter, we will cover the following topics:

  • Setting up a Neo4j database

  • Querying the database in a simpler way than using the Java API

Setting up a new Neo4j database


If you already have experience in creating a Neo4j database, you can skip this and jump to the next section.

Neo4j is a graph database, which means that it does not use tables and rows to represent data logically; instead, it uses nodes and relationships. Both nodes and relationships can have a number of properties. While relationships must have one direction and one type, nodes can have a number of labels. For example, the following diagram shows three nodes and their relationships, where every node has a label (language or graph database), while relationships have a type (QUERY_LANGUAGE_OF and WRITTEN_IN).

The properties used in the graph shown in the following diagram are name, type, and from. Note that every relation must have exactly one type and one direction, whereas labels for nodes are optional and can be multiple.

Neo4j running modes

Neo4j can be run in two modes:

  • An embedded database in a Java application

  • A standalone server via REST

In any case, this choice does not affect the way you query and work with the database. It's an architectural choice driven by the nature of the application (whether a standalone server or a client server), performance, monitoring, and safety of data.

Neo4j Server

Neo4j Server is the best choice for interoperability, safety, and monitoring. In fact, the REST interface allows all modern platforms and programming languages to interoperate with it. Also, being a standalone application, it is safer than the embedded configuration (a potential crash in the client wouldn't affect the server), and it is easier to monitor. If we choose to use this mode, our application will act as a client of the Neo4j server.

To start Neo4j Server on Windows, download the package from the official website (http://www.neo4j.org/download/windows), install it, and launch it from the command line using the following command:

C:\Neo4jHome\bin\Neo4j.bat

You can also use the frontend, which is bundled with the Neo4j package, as shown in the following screenshot:

To start the server on Linux, you can either install the package using the Debian package management system, or you can download the appropriate package from the official website (http://www.neo4j.org/download) and unpack it with the following command:

# tar -cf  <package>

After this, you can go to the new directory and run the following command:

# ./bin/neo4j console

Anyway, when we deploy the application, we will install the server as a Windows service or as a daemon on Linux. This can be done easily using the Neo4j installer tool.

On the Windows command launch interface, use the following command:

# bin\Neo4jInstaller.bat install

When installing it from the Linux console, use the following command:

# neo4j-installer install

To connect to Neo4j Server, you have to use the REST API so that you can use any REST library of any programming language to access the database. Though any programming language that can send HTTP requests can be used, you can also use online libraries written in many languages and platforms that wrap REST calls, for example, Python, .NET, PHP, Ruby, Node.js, and others.

An embedded database

An embedded Neo4j database is the best choice for performance. It runs in the same process of the client application that hosts it and stores data in the given path. Thus, an embedded database must be created programmatically. We choose an embedded database for the following reasons:

  • When we use Java as the programming language for our project

  • When our application is standalone

For testing purposes, all Java code examples provided with this book are made using an embedded database.

Preparing the development environment

The fastest way to prepare the IDE for Neo4j is using Maven. Maven is a dependency management as well as an automated building tool. In the following procedure, we will use NetBeans 7.4, but it works in a very similar way with the other IDEs (for Eclipse, you will need the m2eclipse plugin). The procedure is described as follows:

  1. Create a new Maven project as shown in the following screenshot:

  2. In the next page of the wizard, name the project, set a valid project location, and then click on Finish.

  3. After NetBeans has created the project, expand Project Files in the project tree and open the pom.xml file. In the <dependencies> tag, insert the following XML code:

    <dependencies>
      <dependency>
       <groupId>org.neo4j</groupId>
       <artifactId>neo4j</artifactId>
       <version>2.0.1</version>
      </dependency>
    </dependencies>
    
    <repositories>
      <repository>
        <id>neo4j</id>
    <url>http://m2.neo4j.org/content/repositories/releases/</url>
        <releases>
          <enabled>true</enabled>
        </releases>
      </repository>
    </repositories>

This code informs Maven about the dependency we are using on our project, that is, Neo4j. The version we have used here is 2.0.1. Of course, you can specify the latest available version.

If you are going to use Java 7, and the following section is not present in the file, then you'll need to add the following code to instruct Maven to compile Java 7:

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.1</version>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>
  </plugins>
</build>

Once saved, the Maven file resolves the dependency, downloads the JAR files needed, and updates the Java build path. Now, the project is ready to use Neo4j and Cypher.

Creating an embedded database

Creating an embedded database is straightforward. First of all, to create a database, we need a GraphDatabaseFactory class, which can be done with the following code:

GraphDatabaseFactory graphDbFactory = new GraphDatabaseFactory();

Then, we can invoke the newEmbeddedDatabase method with the following code:

GraphDatabaseService graphDb = graphDbFactory
      .newEmbeddedDatabase("data/dbName");

Now, with the GraphDatabaseService class, we can fully interact with the database, create nodes, create relationships, and set properties and indexes.

Configuration

Neo4j allows you to pass a set of configuration options for performance tuning, caching, logging, file system usage, and other low-level behaviors. The following code sets the size of the memory allocated for mapping the node store to 20 MB:

import org.neo4j.graphdb.factory.GraphDatabaseSettings;
// ...
GraphDatabaseService db = graphDbFactory
                .newEmbeddedDatabaseBuilder(DB_PATH)
                .setConfig(GraphDatabaseSettings
                   .nodestore_mapped_memory_size, "20M")
                .newGraphDatabase();

You will find all the available configuration settings in the GraphDatabaseSettings class (they are all static final members).

Note that the same result can be achieved using the properties file. Clearly, reading the configuration settings from a properties file comes in handy when the application is deployed because any modification to the configuration won't require a new build. To replace the preceding code, create a file and name it, for example, neo4j.properties. Open it with a text editor and write the following code in it:

neostore.nodestore.db.mapped_memory=20M

Then, create the database service with the following code:

GraphDatabaseService db = graphDbFactory
                    .newEmbeddedDatabaseBuilder(DB_PATH)
                    .loadPropertiesFromFile("neo4j.properties")
                    .newGraphDatabase();

HR management tool – an example


For the first example in this book, I chose an enterprise application, such as human resource (HR) management, because I think Neo4j is a great persistence tool for enterprise applications. In fact, they are famous for having very complex schemas with a lot of relationships and entities and requirements that often change during the life of the software; therefore, the queries are also complicated and prone to change frequently.

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

In our human resources tool, we have two kinds of nodes: employees and cost centers. So, we can define the two labels with the following code:

public enum HrLabels implements Label {
  Employee,
  CostCenter
}

Labels are usually defined using an enumeration, but Neo4j just requires those labels that implement the Label interface.

Tip

Labels are a very useful feature introduced in Neo4j 2.0 that allow us to label a node and make it easier to find them later. A node can have one or more labels, so you can express complex domain concepts. Labels can be indexed to improve the performance of a search as well.

We have three types of relationships:

  • Employees that belong to cost centers

  • Employees that report to other employees

  • Employees that can be managers of a cost center

So, we have to define the relationships. This is usually done using the enum function, as shown in the following code snippet:

public enum EmployeeRelationship implements RelationshipType {
   REPORTS_TO,
   BELONGS_TO,
   MANAGER_OF;

   public static final String FROM = "from";
}

The FROM constant represents the name of a property. We will use it to store the start date of the validity of the relationship. Clearly, a real-world HR application would have a lot of relationships and properties; here we have just a subset.

Creating nodes and relationships using the Java API

The next step is to fill in the database. First of all, to work with Neo4j using the Java API, we always need a transaction created from the GraphDatabaseService class. While building with Java 7, you can use the following syntax:

import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.GraphDatabaseService;

// ...
try (Transaction tx = graphDb.beginTx()) {

   // work with the graph...
   tx.success();
}

The first line in the preceding code creates a transaction named tx. The call to success marks the transaction successful; every change will be committed once the transaction is closed. If an exception is thrown from inside the try statement, the transaction automatically ends with a rollback. When you use Java 6, the code is a little longer because you have to close the transaction explicitly within a finally clause, as shown in the following code:

Transaction tx = graphDb.beginTx();
try {
   // work with the graph...
   tx.success();
} finally {
  tx.close();
}

Now, in our application, cost centers are identified only by their code, while employees can have the following properties:

  • Name

  • Surname

  • Middle name

Our relationships (REPORTS_TO, BELONGS_TO, and MANAGER_OF) can have a property (From) that specifies the dates of validity. The following code creates some examples of nodes and the relationships between them, and then sets the property values of nodes and some relationships:

import java.util.GregorianCalendar;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class DatabaseSetup {

/**
* Properties of a cost center
*/
public static class CostCenter {
    public static final String CODE = "code";
}

/**
* Properties of an employee
*/
public static class Employee {
    public static final String NAME = "name";
    public static final String MIDDLE_NAME = "middleName";
    public static final String SURNAME = "surname";
}

Public static void setup(GraphDatabaseService graphDb) {

        try (Transaction tx = graphDb.beginTx()) {
            // set up of center costs
            Node cc1 = graphDb.createNode(HrLabels.CostCenter);
            cc1.setProperty(CostCenter.CODE, "CC1");

            Node cc2 = graphDb.createNode(HrLabels.CostCenter);
            cc2.setProperty(CostCenter. CODE, "CC2");

            Node davies = graphDb.createNode(HrLabels.Employee);
            davies.setProperty(Employee.NAME, "Nathan");
            davies.setProperty(Employee.SURNAME, "Davies");

            Node taylor = graphDb.createNode(HrLabels.Employee);
            taylor.setProperty(Employee.NAME, "Rose");
            taylor.setProperty(Employee.SURNAME, "Taylor");

            Node underwood = graphDb.createNode(HrLabels.Employee);
            underwood.setProperty(Employee.NAME, "Heather");
            underwood.setProperty(Employee.MIDDLE_NAME, "Mary");
            underwood.setProperty(Employee.SURNAME, "Underwood");

            Node smith = graphDb.createNode(HrLabels.Employee);
            smith.setProperty(Employee.NAME, "John");
            smith.setProperty(Employee.SURNAME, "Smith");

            // There is a vacant post in the company
            Node vacantPost = graphDb.createNode();

            // davies belongs to CC1
            davies.createRelationshipTo(cc1, EmployeeRelationship.BELONGS_TO)
                    .setProperty(EmployeeRelationship.FROM,
                            new GregorianCalendar(2011, 1, 10).getTimeInMillis());

            // .. and reports to Taylor
            davies.createRelationshipTo(taylor, EmployeeRelationship.REPORTS_TO);

            // Taylor is the manager of CC1
            taylor.createRelationshipTo(cc1, EmployeeRelationship.MANAGER_OF)
                    .setProperty(EmployeeRelationship.FROM,
                            new GregorianCalendar(2010, 2, 8).getTimeInMillis());

            // Smith belongs to CC2 from 2008
            smith.createRelationshipTo(cc2, EmployeeRelationship.BELONGS_TO)
                    .setProperty(EmployeeRelationship.FROM,
                            new GregorianCalendar(2008, 9, 20).getTimeInMillis());

            // Smith reports to underwood
            smith.createRelationshipTo(underwood, EmployeeRelationship.REPORTS_TO);

            // Underwood belongs to CC2
            underwood.createRelationshipTo(cc2, EmployeeRelationship.BELONGS_TO);

            // Underwood will report to an employee not yet hired
            underwood.createRelationshipTo(vacantPost, EmployeeRelationship.REPORTS_TO);

            // But the vacant post will belong to CC2
            vacantPost.createRelationshipTo(cc2, EmployeeRelationship.BELONGS_TO);

            tx.success();
        }
    }
}

In the preceding code, we used the following functions of the GraphDatabaseService class:

  • createNode: This creates a node and then returns it as result. The node will be created with a long, unique ID.

    Note

    Unlike relational databases, node IDs in Neo4j are not guaranteed to remain fixed forever. In fact, IDs are recomputed upon node deletion, so don't trust IDs, especially for long operations.

  • createRelationshipTo: This creates a relationship between two nodes and returns that relationship in a relationship instance. This one too will have a long, unique ID.

  • setProperty: This sets the value of a property of a node or a relationship.

We put the time in milliseconds in the property because Neo4j supports only the following types or an array of one of the following types:

  • boolean

  • byte

  • short

  • int

  • long

  • float

  • double

  • String

To store complex types of arrays, we can code them using the primitive types, as seen in the preceding list, but more often than not, the best approach is to create nodes. For example, if we have to store a property such as the entire address of a person, we can convert the address in JSON and store it as a string.

This way of storing data in a JSON format is common in document-oriented DBs, such as MongoDB, but since Neo4j isn't a document database, it won't build indexes on the properties of the document. So, for example, it would be difficult or very slow to query people by filtering on any field of the address, such as the ZIP code or the country. In other words, you should use this approach only for raw data that won't be filtered or processed with Cypher; in other cases, creating nodes is a better approach.

A querying database

A typical report of our application is a list of all the employees. In our database, an employee is a node labeled Employee, so we have to find all nodes that match with the label Employee pattern. In Cypher, this can be expressed with the following query:

MATCH (e:Employee)
RETURN e

The MATCH clause introduces the pattern we are looking for. The e:Employee expression matches all e nodes that have the label Employee; this expression is within round brackets because e is a node. So, we have the first rule of matching expressions—node expressions must be within round brackets.

With the RETURN clause, we can specify what we want; for example, we can write a query to return the whole node with all its properties. In this clause, we can use any variable used in the MATCH clause. In the preceding query, we have specified that we want the whole node (with all its properties). If we are interested only in the name and the surname of the employees, we can make changes only in the RETURN clause:

MATCH (e:Employee)
RETURN e.name,e.surname

If any node does not have either of the properties, a null value is returned. This is a general rule for properties from version 2 of Cypher; missing properties are evaluated as null values.

The next question is how to invoke Cypher from Java.

Invoking Cypher from Java

To execute Cypher queries on a Neo4j database, you need an instance of ExecutionEngine; this class is responsible for parsing and running Cypher queries, returning results in a ExecutionResult instance:

import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
// ...
ExecutionEngine engine = 
  new ExecutionEngine(graphDb);
ExecutionResult result = 
  engine.execute("MATCH (e:Employee) RETURN e");

Note that we use the org.neo4j.cypher.javacompat package and not the org.neo4j.cypher package even though they are almost the same. The reason is that Cypher is written in Scala, and Cypher authors provide us with the former package for better Java compatibility.

Now with the results, we can do one of the following options:

  • Dumping to a string value

  • Converting to a single column iterator

  • Iterating over the full row

Dumping to a string is useful for testing purposes:

String dumped = result.dumpToString();

If we print the dumped string to the standard output stream, we will get the following result:

Here, we have a single column (e) that contains the nodes. Each node is dumped with all its properties. The numbers between the square brackets are the node IDs, which are the long and unique values assigned by Neo4j on the creation of the node.

When the result is a single column, or we need only one column of our result, we can get an iterator over one column with the following code:

import org.neo4j.graphdb.ResourceIterator;
// ...
ResourceIterator<Node> nodes = result.columnAs("e");

Then, we can iterate that column in the usual way, as shown in the following code:

while(nodes.hasNext()) {
   Node node = nodes.next();
   // do something with node
}

However, Neo4j provides a syntax-sugar utility to shorten the code that is to be iterated:

import org.neo4j.helpers.collection.IteratorUtil;
// ...
for (Node node : IteratorUtil.asIterable(nodes)) {
   // do something with node
}

If we need to iterate over a multiple-column result, we will write this code in the following way:

ResourceIterator<Map<String, Object>> rows = result.iterator();
for(Map<String,Object> row : IteratorUtil.asIterable(rows)) {
   Node n = (Node) row.get("e");
   try(Transaction t = n.getGraphDatabase().beginTx()) {
       // do something with node
   }
}

The iterator function returns an iterator of maps, where keys are the names of the columns. Note that when we have to work with nodes, even if they are returned by a Cypher query, we have to work in transaction. In fact, Neo4j requires that every time we work with the database, either reading or writing to the database, we must be in a transaction. The only exception is when we launch a Cypher query. If we launch the query within an existing transaction, Cypher will work as any other operation. No change will be persisted on the database until we commit the transaction, but if we run the query outside any transaction, Cypher will open a transaction for us and will commit changes at the end of the query.

Finding nodes by relationships

If you have ever used the Neo4j Java API, you might wonder why we should write the following code:

ExecutionEngine engine = 
  new ExecutionEngine(graphDb, StringLogger.SYSTEM);
ExecutionResult result = 
  engine.execute("MATCH (e:Employee) RETURN e");
ResourceIterator<Node> nodes = result.columnAs("e");

You can get the same result with the Java API with a single line of code:

import org.neo4j.tooling.GlobalGraphOperations;
// ...
ResourceIterable<Node> empl = GlobalGraphOperations.at(graphDb)
                    .getAllNodesWithLabel(HrLabels.Employee);

However, pattern matching is much more powerful. By making slight changes to the query, we can get very important and different results; for example, we can find nodes that have relationships with other nodes. The query is as follows:

MATCH (n:Employee) --> (cc:CostCenter)
RETURN cc,n

The preceding query returns all employees that have a relation with any cost center:

Again, as you can see, both n and cc are within round brackets. Here, the RETURN clause specifies both n and cc, which are the two columns returned. The result would be the same if we specified an asterisk instead of n and cc in the RETURN clause:

MATCH (n:Employee) --> (cc:CostCenter)
RETURN *

In fact, similar to SQL, the asterisk implies all the variables referenced in the patterns, but unlike SQL, not all properties of the entities are involved, just those of the referenced ones. In the previous query, relationships were not returned because we didn't put a variable in square brackets.

Filtering properties

By making another slight change to the query, we can get all the employees that have a relation with a specific cost center, for example CC1. We have to filter the code property as shown in the following code:

MATCH (n:Employee) --> (:CostCenter { code: 'CC1' })
RETURN n

If we compare this query with the previous one, we can note three differences, which are listed as follows:

  • The query returns only the employee node n because we don't care about the center cost here.

  • Here, we omitted the cc variable. This is possible because we don't need to give a name to the cost center that matches the expression.

  • In the second query, we added curly brackets in the cost center node to specify the property we are looking for. So, this is another rule of pattern-matching expressions: properties are expressed within curly brackets.

The --> symbol specifies the direction of the relation; in this case, outgoing from n. In the case of MATCH expressions, we can also use the <-- symbol for inverse direction. The following expression is exactly equivalent to the previous expression:

MATCH (:CostCenter { code: 'CC1' } ) <-- (n:Employee)
RETURN n

The preceding expression will give the same result:

+-----------------------------------------+
| n                                       |
+-----------------------------------------+
| Node[2]{name:"Nathan",surname:"Davies"} |
| Node[3]{name:"Rose",surname:"Taylor"}   |
+-----------------------------------------+

If we don't have a preferred direction, we will use the -- symbol:

MATCH (n:Employee) -- (:CostCenter { code: 'CC1' } )
RETURN n

In our example, the latter query will return the same result as the previous one because in our model, relationships go from employees to cost centers.

Filtering relationships

If we wish to know the existing relationships between the employees and cost centers, we will have to introduce another variable:

MATCH (n:Employee) -[r]- (:CostCenter { code: 'CC1' })
RETURN n.surname,n.name,r

The variable r matches any relationship that exists between the employees and cost center CC1 and is returned in a new column:

+-----------------------------------------------------------+
| n.surname | n.name   | r                                  |
+-----------------------------------------------------------+
| "Davies"  | "Nathan" | :BELONGS_TO[0]{from:1297292400000} |
| "Taylor"  | "Rose"   | :MANAGER_OF[2]{from:1268002800000} |
+-----------------------------------------------------------+

So, here we have the last rule: relationship expressions must be specified in square brackets.

To filter the employees who belong to a specific cost centre, we have to specify the relationship type:

MATCH (n) -[:BELONGS_TO]-> (:CostCenter { code: 'CC1' } )
RETURN n

This query matches any node n, which has a relation of the BELONGS_TO type with any node cc that has the value CC1 as a property code:

+-----------------------------------------+
| n                                       |
+-----------------------------------------+
| Node[2]{name:"Nathan",surname:"Davies"} |
+-----------------------------------------+

We can specify multiple relationships using the | operator. The following query will search for all employees who belong to or are managers of the cost center CC1:

MATCH (n) -[r:BELONGS_TO|MANAGER_OF]-> (:CostCenter{code: 'CC1'})
RETURN n.name,n.surname,r

This time we returned only the name and surname, while the relationship is returned in the second column:

+-----------------------------------------------------------+
| n.name   | n.surname | r                                  |
+-----------------------------------------------------------+
| "Nathan" | "Davies"  | :BELONGS_TO[0]{from:1297292400000} |
| "Rose"   | "Taylor"  | :MANAGER_OF[2]{from:1268002800000} |
+-----------------------------------------------------------+

By making a slight change to the query in the preceding code, we can return the manager as well as the employees of the cost center as the result. This can be implemented as shown in the following query:

MATCH (n) -[:BELONGS_TO]-> (cc:CostCenter) <-[:MANAGER_OF]- (m)
RETURN n.surname,m.surname,cc.code

In this query, we can see the expressivity of Cypher—a very intuitive syntax to translate the "node n belonging to the cost center having a manager m" pattern. The result is the following code:

+---------------------------------+
| n.surname | m.surname | cc.code |
+---------------------------------+
| "Davies"  | "Taylor"  | "CC1"   |
+---------------------------------+

Of course, we can chain an increasing number of relationship expressions to describe very complex patterns:

MATCH (n) -[:BELONGS_TO]->
      (cc:CostCenter) <-[:MANAGER_OF]- (m) <-[:REPORTS_TO]- (k)
RETURN n.surname,m.surname,cc.code, k.surname

Another query that is very useful in real-world applications is finding nodes reachable from one node with a certain number of steps and a certain depth. The ability to execute this kind of query, and search the neighborhood, is one of the strong points of graph databases:

MATCH (:Employee {surname: 'Smith'}) -[*2]- (neighborhood)
RETURN neighborhood

This query returns the nodes that you can reach, starting from the Davies node, by visiting exactly two relationships of the graph. The result contains duplicated nodes because we have several paths to reach each of them:

+---------------------------------------------------------------+
| neighborhood                                                  |
+---------------------------------------------------------------+
| Node[4]{name:"Heather",surname:"Underwood",middleName:"Mary"} |
| Node[6]{}                                                     |
| Node[1]{code:"CC2"}                                           |
| Node[6]{}                                                     |
+---------------------------------------------------------------+
4 rows

Tip

To get different values, we can use the DISTINCT keyword:

MATCH (:Employee {surname: 'Smith'}) -[ *2]- (neighborhood)
RETURN DISTINCT neighborhood

This time, we haven't specified any relationship type in the square brackets, so it matches any type. The expression *2 means exactly two steps. With a little change, we can also ask for the relationships we visited:

MATCH (:Employee {surname: 'Davies'}) -[r*2]- (neighborhood)
RETURN neighborhood, r

Of course, by changing the number in the expression, we can get the query to navigate any number of relationships. However, we could also want all the nodes that are reachable from a number of relationships in a range of step numbers, for example, from two to three:

MATCH (:Employee {surname: 'Smith'}) -[r*2..3]- (neighborhood)
RETURN neighborhood,r

This is very useful in real-world applications such as social networks because it can be used to build lists, for example, a list of people you may know.

If we also want the starting node in the result, we can modify the range to start from 0:

MATCH (:Employee{surname: 'Smith'}) -[r*0..2]- (neighborhood)
RETURN neighborhood,r

Dealing with missing parts

In our applications, we often need to get some information related to something that could be missing. For example, if we want to get a list of all employees who have a specific number of employees reporting to them, then we must deal with those employees too who have no employees reporting to them. In fact, we can write:

MATCH (e:Employee) <-[:REPORTS_TO]- (m:Employee)
RETURN e.surname,m.surname

From this, the following result is obtained:

+-------------------------+
| e.surname   | m.surname |
+-------------------------+
| "Taylor"    | "Davies"  |
| "Underwood" | "Smith"   |
+-------------------------+

However, this is not what we are looking for. In fact, we want all the employees, with all the employees that report to them as an option. This type of relation is similar to the OUTER JOIN clause of SQL and can be done in Cypher using OPTIONAL MATCH. This keyword allows us to use any pattern expression that can be used in the MATCH clause, but it describes only a pattern that could match. If the pattern does not match, the OPTIONAL MATCH clause sets any variable to null variable:

MATCH (e:Employee)
OPTIONAL MATCH (e) <-[:REPORTS_TO]- (m:Employee)
RETURN e.surname,m.surname, c.code

In this query, we slightly changed the previous one; we just inserted OPTIONAL MATCH (e). The effect is that the first part (e:Employee) must match, but the pattern following OPTIONAL MATCH may or may not match. So, this query returns any employee e, and if e has a relationship of the REPORTS_TO type with any other employee, this query is returned in m; otherwise, m will be a null value. The result is as follows:

 +-------------------------+
| e.surname   | m.surname |
+-------------------------+
| "Davies"    | <null>    |
| "Taylor"    | "Davies"  |
| "Underwood" | "Smith"   |
| "Smith"     | <null>    |
+-------------------------+

Note

Unlike object-oriented languages where referencing any property of a null object will result in a null-reference exception, in Cypher referencing, which is a property of the null node, we get a null value again.

Now, let's say that we also want to know whether the employee is the manager of any center cost, and if so, which one. Also, we want to know the cost center of any employee. For this, we can write the following code:

MATCH (e:Employee)
OPTIONAL MATCH (c:CostCenter) <–[:MANAGER_OF]- (e) <-[:REPORTS_TO]- (m:Employee)
RETURN e.surname,m.surname

The preceding code returns the following result:

+----------------------------------+
| e.surname   | m.surname | c.code |
+----------------------------------+
| "Davies"    | <null>    | <null> |
| "Taylor"    | "Davies"  | "CC1"  |
| "Underwood" | <null>    | <null> |
| "Smith"     | <null>    | <null> |
+----------------------------------+

What happened? Does it look like Smith does not report to Underwood anymore? This weird result is due to the fact that the whole pattern in OPTIONAL MATCH must match. We can't have partially matched patterns. Since we can add as many OPTIONAL MATCH expressions as we want to, we have to write the following code to get the result we are looking for:

MATCH (e:Employee)
OPTIONAL MATCH (e) <-[:REPORTS_TO]- (m:Employee)
OPTIONAL MATCH (e) -[:MANAGER_OF]-> (c:CostCenter)
RETURN e.surname, m.surname, c.code

In fact, the result is the following code:

+----------------------------------+
| e.surname   | m.surname | c.code |
+----------------------------------+
| "Davies"    | <null>    | <null> |
| "Taylor"    | "Davies"  | "CC1"  |
| "Underwood" | "Smith"   | <null> |
| "Smith"     | <null>    | <null> |
+----------------------------------+

This query works because we have two OPTIONAL MATCH clauses that can independently generate a successful match.

Working with paths

As we have seen earlier, graph databases are useful to find paths between two nodes:

MATCH path = (a{surname:'Davies'}) -[*]- (b{surname:'Taylor'})
RETURN path

This query uses a construct which we have not used so far—the path assignment, path =. The assignment of variables can be done only with paths. Note that the query in the preceding code returns all the possible paths from two nodes. Here, the result is two paths in our database:

[Node[2]{name:"Nathan",surname:"Davies"},:BELONGS_TO[0]{from:1297292400000},Node[0]{code:"CC1"},:MANAGER_OF[2]{from:1268002800000},Node[3]{name:"Rose",surname:"Taylor"}] |
 [Node[2]{name:"Nathan",surname:"Davies"},:REPORTS_TO[1]{},Node[3]{name:"Rose",surname:"Taylor"}]

However, what if we need the shortest path between them? The shortest path is the path with the least number of nodes visited. Clearly, we could iterate over all the paths and take the shortest, but Cypher provides a function that does the work for us:

MATCH (a{surname:'Davies'}), (b{surname:'Taylor'})
RETURN allShortestPaths((a)-[*]-(b)) as path

Let's see what is new in this query:

  • MATCH: In this clause, we have two node expressions (in round brackets) separated by a comma. These expressions, a and b, match any node independently, just like a Cartesian product.

  • RETURN: In this clause, we have to call the allShortestPath function that takes an expression as a parameter. The expression is a variable length relation (this is the asterisk between the square brackets). Here, we don't care about relationship types and the direction, but we can filter properties, relation types involved, and so on, if necessary.

  • RETURN: In this clause, we have an alias. An alias must be defined using the keyword AS. It just specifies the name of the column returned.

Node IDs as starting points

When we execute a query like the previous code, Cypher must find the nodes and relationships that match the pattern. However, to do so, it must start to search from a set of nodes or relationships. We can let Cypher find the starting points of a query on its own, but we can also specify them because we want to search a pattern that starts from a specific node, or a specific relation resulting in an important improvement in the performances of the query.

We can assign starting points to variables in the query using the START keyword. The previous query, for example, could be rewritten in the following way:

START a=node(2), b=node(3)
RETURN allShortestPaths((a)-[*]-(b)) AS path

If we execute this query, and compare the time elapsed in executing this query and the previous one, we can easily prove that the latter is dramatically faster. The drawback is that we need to know the ID of the node.

Query parameters

In real-world applications, you often need to execute a query multiple times, changing a value in the query every time. For example, you need to find an employee by the surname, but the surname is typed by the application user from the keyboard. Cypher allows us to use parameters, just like in SQL. The names of the parameters must be between curly brackets:

MATCH (n:Employee {surname: {inputSurname} })
RETURN n

In this query, we have a parameter (inputSurname), whose value must be provided while executing the query.

Passing parameters with Java

The Cypher Java API wants us to pass all the parameters in the map. The following code is a class example that has a public method to find all employees by their surname:

import java.util.Map;
import java.util.HashMap;
import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
import org.neo4j.graphdb.Node;
import org.neo4j.helpers.collection.IteratorUtil;

public class EmployeeRepository {

   public Iterator<Node> bySurname(String surname) {
      Map<String, Object> params = new HashMap<>();
      params.put("inputSurname", surname);
      ExecutionResult result = engine
         .execute("MATCH (n:Employee {surname: {inputSurname}})" +       
                  "RETURN n",
                  params);
        Iterator<Node> nodes = result.columnAs("n");
        return nodes;
    }
}

The bySurname method takes the surname of the employees as a parameter to search, and it creates a new HashMap and puts the parameter in the map. Finally, the map is passed to the execute method of ExecutionEngine, and the result is treated in the usual way.

Tip

Since parameters are referenced by name, you can reference the same variable several times in the query.

Summary


In this chapter, you first created a Neo4j database using the Java API. It details how Neo4j works with nodes and relationships.

Then, you queried that database to learn Cypher pattern matching. You learned about the MATCH keyword. You also learned that node expressions must go between round brackets, while relationships must be expressed in square brackets, and property expressions must be written between curly brackets.

You also learned how to use the RETURN clause to select which matched values we want the query to return. We wrote the Java code needed to use the result, even using query parameters. The OPTIONAL MATCH keyword allows us to match parts of the graph that could be missing. You also learned how to find paths from one node to another, and the shortest path between them, using the allShortestPaths function.

In the next chapter, you will learn how to filter and aggregate data and how to page through a query result.

Left arrow icon Right arrow icon

What you will learn

  • Design and create flexible and fast graph databases using the Cypher declarative syntax
  • Write powerful, readable, and reusable queries with pattern matching and parameters
  • Develop fast applications using best practices to improve the performance of your Cypher queries
  • Transition smoothly from SQL to Neo4j
  • Migrate relational databases to the graph model, getting rid of O/R mismatch
  • Avoid the common mistakes and pitfalls in programming with Neo4j
Estimated delivery fee Deliver to United States

Economy delivery 10 - 13 business days

Free $6.95

Premium delivery 6 - 9 business days

$21.95
(Includes tracking information)

Product Details

Country selected
Publication date, Length, Edition, Language, ISBN-13
Publication date : May 14, 2014
Length: 162 pages
Edition :
Language : English
ISBN-13 : 9781783287758
Category :

What do you get with Print?

Product feature icon Instant access to your digital eBook copy whilst your Print order is Shipped
Product feature icon Paperback book shipped to your preferred address
Product feature icon Download this book in EPUB and PDF formats
Product feature icon Access this title in our online reader with advanced features
Product feature icon DRM FREE - Read whenever, wherever and however you want
OR
Modal Close icon
Payment Processing...
tick Completed

Shipping Address

Billing Address

Shipping Methods
Estimated delivery fee Deliver to United States

Economy delivery 10 - 13 business days

Free $6.95

Premium delivery 6 - 9 business days

$21.95
(Includes tracking information)

Product Details

Publication date : May 14, 2014
Length: 162 pages
Edition :
Language : English
ISBN-13 : 9781783287758
Category :

Packt Subscriptions

See our plans and pricing
Modal Close icon
$19.99 billed monthly
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Simple pricing, no contract
$199.99 billed annually
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts
$279.99 billed in 18 months
Feature tick icon Unlimited access to Packt's library of 7,000+ practical books and videos
Feature tick icon Constantly refreshed with 50+ new titles a month
Feature tick icon Exclusive Early access to books as they're written
Feature tick icon Solve problems while you work with advanced search and reference features
Feature tick icon Offline reading on the mobile app
Feature tick icon Choose a DRM-free eBook or Video every month to keep
Feature tick icon PLUS own as many other DRM-free eBooks or Videos as you like for just $5 each
Feature tick icon Exclusive print discounts

Frequently bought together


Stars icon
Total $ 117.97
Learning Cypher
$29.99
Neo4j Graph Data Modelling
$32.99
Learning Neo4j
$54.99
Total $ 117.97 Stars icon
Banner background image

Table of Contents

5 Chapters
Querying Neo4j Effectively with Pattern Matching Chevron down icon Chevron up icon
Filter, Aggregate, and Combine Results Chevron down icon Chevron up icon
Manipulating the Database Chevron down icon Chevron up icon
Improving Performance Chevron down icon Chevron up icon
Migrating from SQL Chevron down icon Chevron up icon

Customer reviews

Top Reviews
Rating distribution
Full star icon Full star icon Full star icon Half star icon Empty star icon 3.7
(12 Ratings)
5 star 33.3%
4 star 16.7%
3 star 33.3%
2 star 16.7%
1 star 0%
Filter icon Filter
Top Reviews

Filter reviews by




TAL Oct 12, 2017
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Very methodology learning
Amazon Verified review Amazon
Kozhanov Eugeny Aug 31, 2014
Full star icon Full star icon Full star icon Full star icon Full star icon 5
This is great book! It's easy to read and learning Cypher. It's have examples of queries and examples code using Cypher's queries in Java. This book very useful for SQL users, because it's easy to learn if you use SQL in your work. There are examples and comments for each query in this book. I use neo4j and Java, and this book help me make my queries more shorter and useful for other developers in our team.
Amazon Verified review Amazon
Deven Kalra Sep 23, 2014
Full star icon Full star icon Full star icon Full star icon Full star icon 5
Seamlessly upgrade from SQL to Cypher.. Author has in depth knowledge of the subjects and he explains about graph theory also.. So need to purchase additional book for that.. Basically all the code is in java so if you are trying to connect to Graph database like Neo4J you are in luck with this! I know I was. Hitting wall after wall.. till I found this gem. Very cheap and good.
Amazon Verified review Amazon
Daniele Ticchiarelli Jun 25, 2014
Full star icon Full star icon Full star icon Full star icon Full star icon 5
The authors have done an outstanding job of extracting the essence of Cypher and describing a core set of features of the tool features as well as how to take advantage from them.He describes very clearly his capability, constructs and above all how to manage with the relationships among nodes, their creation, modification and deleting.This text has served as a helpful guide among other more recently online docs and tutorials.Yes, I would recommend it.
Amazon Verified review Amazon
online shopper Feb 25, 2015
Full star icon Full star icon Full star icon Full star icon Empty star icon 4
I've been playing with Neo4j off and on for more than a year, and I knew that Cypher was maturing as a query language.I agree with the other reviewer who said there's a bunch of java stuff at the beginning, which may be useful. I don't know because I'm still learning java.But I was surprised to find that since Cypher is so intuitive, that if you just jump into Chapter 1, around page 20, if you're familiar with graphs and databases, it should be pretty obvious what is going on from the syntax alone.Having said that, I found the comparisons with SQL really helpful. I was surprised to note how many of the things in this book were already familiar to me, and how many new features have been added to Cypher.I hope it will continue growing as a language, since there are still a lot of graph applications that require outside wrappers (I use py2neo), and not all of those wrappers are fully developed, yet. So java is the fallback for anything sophisticated you might want to do.This book is really for those who are newer to Neo4j and want to get up and running quickly with a good overview of what it can do out of the box, which is quite a lot.
Amazon Verified review Amazon
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela
What is custom duty/charge? Chevron down icon Chevron up icon

Customs duty are charges levied on goods when they cross international borders. It is a tax that is imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order? Chevron down icon Chevron up icon

The orders shipped to the countries that are listed under EU27 will not bear custom charges. They are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea:

A custom duty or localized taxes may be applicable on the shipment and would be charged by the recipient country outside of the EU27 which should be paid by the customer and these duties are not included in the shipping charges been charged on the order.

How do I know my custom duty charges? Chevron down icon Chevron up icon

The amount of duty payable varies greatly depending on the imported goods, the country of origin and several other factors like the total invoice amount or dimensions like weight, and other such criteria applicable in your country.

For example:

  • If you live in Mexico, and the declared value of your ordered items is over $ 50, for you to receive a package, you will have to pay additional import tax of 19% which will be $ 9.50 to the courier service.
  • Whereas if you live in Turkey, and the declared value of your ordered items is over € 22, for you to receive a package, you will have to pay additional import tax of 18% which will be € 3.96 to the courier service.
How can I cancel my order? Chevron down icon Chevron up icon

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing the order. Simply contact customercare@packt.com with your order details or payment transaction id. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on the way to you then when you receive it, you can contact us at customercare@packt.com using the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except for the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrives damaged or material defect in book), Packt Publishing will not accept returns.

What is your returns and refunds policy? Chevron down icon Chevron up icon

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work or is unacceptably late, please contact Customer Relations Team on customercare@packt.com with the order number and issue details as explained below:

  1. If you ordered (eBook, Video or Print Book) incorrectly or accidentally, please contact Customer Relations Team on customercare@packt.com within one hour of placing the order and we will replace/refund you the item cost.
  2. Sadly, if your eBook or Video file is faulty or a fault occurs during the eBook or Video being made available to you, i.e. during download then you should contact Customer Relations Team within 14 days of purchase on customercare@packt.com who will be able to resolve this issue for you.
  3. You will have a choice of replacement or refund of the problem items.(damaged, defective or incorrect)
  4. Once Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are only requesting a refund of one book from a multiple order, then we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged, with book material defect, contact our Customer Relation Team on customercare@packt.com within 14 days of receipt of the book with appropriate evidence of damage and we will work with you to secure a replacement copy, if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner which is on a print-on-demand basis.

What tax is charged? Chevron down icon Chevron up icon

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use? Chevron down icon Chevron up icon

You can pay with the following card types:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal
What is the delivery time and cost of print books? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-bissau
  9. Iran
  10. Lebanon
  11. Libiya Arab Jamahriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela