Functions
Functions are grouped by usage. Except for the COALESCE
function, all of these functions return a null
value if any argument has a null
value.
COALESCE
The COALESCE
function takes any number of parameters (at least one) and returns the first non-null value. It returns a null
value only in case all the parameters have null
values.
Usually, this function is used every time you have to deal with null
values and want a fallback value in case of a null
value. Consider the following query:
MATCH (aa:Book) WHERE 'novel' IN COALESCE(aa.tags, ['novel']) RETURN aa.title
This query returns all books that have novel in their tags, but will return all the books that do not have no tags set as well. In fact, if a book doesn't have the tags
property, the COALESCE
function will return the collection ['novel']
. Therefore, the IN
predicate will always succeed.
TIMESTAMP
The TIMESTAMP
function takes no arguments and returns the number of milliseconds passed from the midnight of January 1, 1970. This function is used when you create or update a node to mark the change in the date. It is especially useful to create versions. Consider the following query:
CREATE (a:Book {title: "Learning Cypher", version: TIMESTAMP() })
This query creates a book in the database setting the version number as a timestamp. Another query could check whether the version of the book has not changed before updating, and fail if a new version of the book is in the database (an optimistic lock).
ID
The ID
function returns the identification number, managed by Neo4j, of a node or a relationship. Consider the following query:
MATCH (a) RETURN a ORDER BY ID(a) LIMIT 1
The preceding query returns the node with the minimum ID value. In fact, it sorts the dataset by the ID, then limits the results to one row.
Working with nodes
The following functions are specific to nodes. They are commonly used when you work with paths or collections of nodes and you want to inspect nodes and labels.
NODES
The NODES
function returns a collection of all the nodes in a path. It is used when you want all the nodes found traversing a path between two nodes. Consider the following query:
MATCH p = (:Employee {surname: "Davies"})-[:REPORTS_TO*]- (:Employee {surname: "Taylor"}) RETURN NODES(p)
The preceding query first matches all the paths between two employees, visiting all the relations of the REPORTS_TO
type and then returns all the nodes visited. In other words, it returns all the employees found in a hierarchy between two employees.
Note that, being a collection, the return value can be further inspected with the collection functions, which is explained later.
LABELS
To get all the labels assigned to a node, use the LABELS
function. It returns a collection of strings. Consider the following query:
MATCH (e {surname: "Davies"}) RETURN LABELS(e)
This query returns all the labels assigned to the nodes that have the surname
property set to Davies
. This function is useful when your database has nodes that have multiple labels.
Working with paths and relationships
When you work with paths or with variable length relationships, you may need to inspect the relationships contained. The following functions are used for this purpose:
TYPE
ENDNODE
andSTARTNODE
SHORTESTPATH
andALLSHORTESTPATHS
RELATIONSHIPS
TYPE
The string returned by this function is the type of the relationship passed. Consider the following query:
MATCH ()-[r]-() RETURN TYPE(r), COUNT(*)
This query computes the number of relationships grouped by their type.
ENDNODE and STARTNODE
The following functions inspect the nodes at the endpoints of a relationship:
STARTNODE
: This function returns the start node of the relationship passed as an argumentENDNODE
: This function returns the end node of the relationship passed as an argument
These functions are useful when you haven't specified the direction of the relationship in the MATCH
clause of a query, but you want all the related nodes and wish to know the direction of the relationship.
For example, in the HR management tool, the following query can be used:
MATCH (e:Employee {surname: "Davies"})-[r:REPORTS_TO]-(a:Employee) RETURN a, ENDNODE(r)
This query looks for all the neighborhood nodes of the node with the surname
property Davies
(depth of one) along the relationship of REPORTS_TO
. It returns the neighborhood and the employee who is at the end node of the relationship. In other words, it says who the boss is.
SHORTESTPATH and ALLSHORTESTPATHS
The SHORTESTPATH
and ALLSHORTESTPATHS
functions search the shortest path among a set of paths. The difference is that while the SHORTESTPATH
function returns at most one only path, the ALLSHORTESTPATHS
function returns all the paths with the smallest depth in a collection of paths. Consider the following query:
MATCH p=SHORTESTPATH((a{surname:'Davies'})--(b{surname:'Doe'})) RETURN p
The preceding query returns the shortest path between two employee nodes (Davies
and Doe
). Now, consider the following query:
MATCH p=SHORTESTPATH( (a {surname:'Davies'})--(b) ) RETURN p ORDER BY LENGTH(p) DESC LIMIT 1
Instead, the preceding query returns the path to the farthest node from a given employee node (Davies
). In fact, once stated that the distance from a node is the shortest path from it, this query first computes all distances from all nodes, then selects the longest distance.
RELATIONSHIPS
The RELATIONSHIPS
function returns the collection of the relationships contained in a path. It is the equivalent of the NODES
function. Consider the following query:
MATCH p = (a{surname:'Davies'})-[*..4]-(b{surname:'Taylor'}) RETURN RELATIONSHIPS(p)
This example takes all the paths with maximum depth of four between two nodes, then for each path, it returns the relationships visited.
Working with collections
The following functions are used to manipulate or inspect a collection:
HEAD
TAIL
LAST
HEAD, TAIL, and LAST
The HEAD
, TAIL
, and LAST
functions are used to work with collections as if they were lists. The HEAD
function returns the first item of the collection, the TAIL
function returns the rest of the list, while the LAST
function returns the last item of the collection. Note that when you add the head of a collection to its tail (HEAD(c) + TAIL(c)
), you get the full collection again.
If the collection is empty, the HEAD
and the LAST
functions will return a null
value, while the TAIL
function will return an empty collection. Consider the following query:
MATCH (a:Book) RETURN HEAD(a.tags) LIMIT 30
This query returns the first tag of the first 30 books found in the database.
LENGTH
The LENGTH
function returns the size of a collection of any type or of a string. Consider the following query:
MATCH (a:Book) WHERE LENGTH(a.tags) >= 2 RETURN a
This example query returns all the books that have at least two tags. Note that the books without the tags
property will be ignored because LENGTH(NULL)
is a null
value. Now, consider the following query:
MATCH (a:Book) WHERE LENGTH(a.title) < 7 RETURN a
This example, instead, shows the function in action with a string. It returns all the books with a title shorter than seven characters in length.
EXTRACT
The EXTRACT
function is equivalent to the map
function in the MapReduce paradigm. A similar function is included in most programming languages nowadays; for example, the map
function in Scala, Java 8, JavaScript, and Ruby, the Select
function in C#, and the list comprehensions in Python (although the map
function is provided as well). It takes two parameters: a collection and an expression, which can be evaluated on each item of the collection. It returns a new collection, which has the values returned by the expression applied on every item of the collection. The syntax is similar to the syntax of the collection predicates (refer to Chapter 2, Filter, Aggregate, and Combine Results). Consider the following query:
RETURN EXTRACT(i IN [1,2,3] | i*2)
The result of this query is the collection [2,4,6]
because it applies the expression i*2
on every item i
of the collection [1,2,3]
. Now, consider the following query:
MATCH (a:Book) RETURN EXTRACT(tag in a.tags | "Tag: " + tag) LIMIT 30
For every book in the first 30 books found in the database, this query returns a list of strings with the prefix Tag:
.
This function is mostly used with collections of nodes or relationships. Consider the following query:
MATCH p=(e:Employee {surname: "Davies"})-[r:REPORTS_TO*] -(a:Employee) RETURN EXTRACT(node IN NODES(p) | LABELS(node))
The preceding query, for each node found in a path, extracts the labels of that node.
FILTER
The FILTER
function returns a collection created by adding values that satisfy a certain condition from a source collection. It is similar to the WHERE
clause, but it works on collections. Consider the following query:
RETURN FILTER (x IN [1,2,3,4] WHERE x%2 = 0)
This simple query returns the even values from a list of integers. Yet, this query is mostly used while working with paths too. Now, consider the following query:
MATCH p=(e)-[r]->(a) RETURN FILTER (node IN NODES(p) WHERE node:Book)
This query, for each path that matches the pattern, returns the list of nodes found in the path, but filters only the nodes with the label Book
.
REDUCE
The REDUCE
function has the same purpose as the Reduce
function of the well-known MapReduce paradigm. Nowadays, almost all languages have this function, even if sometimes with different names; the reduce
function in Java 8 (the Stream API) and JavaScript, the collect
function in Ruby and Python, the Aggregate
function in C#, and the foldLeft
function in Scala. It is used to aggregate the items of a collection, scanning every item and accumulating them according to a given expression. For example, the following query returns the sum of all items of a collection:
RETURN REDUCE (total = 0, i in [1,2,3] | total + i)
The variable total
is the accumulator initialized to 0. Then for each item in the collection, this variable is increased by the value of the item. Of course, you can apply the same pattern to strings. Consider the following query:
RETURN REDUCE (text = "", i in ["Hello ","world","!"] | text + i)
This query concatenates the strings in the collections and returns Hello world!
. Let's see a more realistic usage. Consider the following query:
MATCH p=(e:Book)-[r*]-(a) WITH REDUCE (text = "", node IN NODES(p) | text + COALESCE(node.title, " * ") + "-") as TEXT RETURN LEFT(text, LENGTH(text)-1)
This query, for each path found, returns a collection with the titles of the books found separated by a hyphen. In case a title is not found, thanks to the COALESCE
function, an asterisk is returned instead of a null
value. The last line of the query is needed to remove the last dash from the string.
RANGE
The RANGE
function is very simple; it returns a collection of integers contained in a numerical range, which is computed with the help of a given step. Consider the following query:
RETURN RANGE (1, 10, 2)
The preceding query returns [1,3,5,7,9]
. The third parameter in the function is the step parameter, and is optional. By default, it is 1. Now, consider the following query:
RETURN RANGE (1, 5)
The preceding query returns [1,2,3,4,5]
as the ranges are inclusive. Note that if you try to pass the step parameter as 0
, the query will fail with an error.
Working with strings
Cypher provides a set of functions to work with strings. They are the typical string functions provided by most SQL databases. They are as follows:
SUBSTRING
,LEFT
, andRIGHT
STR
REPLACE
Trimming functions
LOWER
andUPPER
SUBSTRING, LEFT, and RIGHT
The SUBSTRING
, LEFT
, and RIGHT
functions are used when you want the substring of a given string. The following are their uses:
SUBSTRING(source, index, length)
: This is a general function. It returns a string obtained by taking the characters of a given source string, from a given index, for a given length. If the length is not specified, it returns the string until the end.LEFT(source, length)
: This function returns a string of a given length, taking the first characters of the source string.RIGHT(source, length)
: This function returns a string of a given length, taking the last characters of the source string.
All of these functions will cause an exception if any index is negative, but won't cause any exception if the index is greater than the length of the string. In this case, they will just return an empty string.
The following query cuts the book where the titles are longer than four characters in length and append an ellipsis symbol at the end:
MATCH (a:Book) RETURN LEFT(a.title, 4) + "..."
STR
The STR
function converts its unique argument to a string. It works on numbers, nodes, relationships, and paths. Consider the following query:
MATCH p=(a:Book{title:"Ficciones"})-[*]-() RETURN STR(p)
This query returns the string representation of the path found with the MATCH
clause. The result obtained is follows:
[Node[1027]{title:"Ficciones"},:AuthorOf[277]{},Node[1026]{name:"Jorge Luis Borges"}]
Note
Conversion to integer values or to float values is not yet supported in Version 2.0.1, but two new functions (toInt
and toFloat
) were released along with the Version 2.0.2 to allow this conversion.
REPLACE
The REPLACE
function returns a string obtained by replacing all occurrences of a given string inside another string with a specified string. Consider the following query:
RETURN REPLACE("Cypher 1.9", "1.9", "2.0")
The preceding query will replace the occurrences of 1.9
in Cypher 1.9
with 2.0
, returning Cypher 2.0
. The REPLACE
function can be used to remove all occurrences of a string as well. Consider the following query:
MATCH (a:Book) RETURN REPLACE(a.title, "The ", "")
This query will return all the book titles after removing the The
string.
Trimming functions
Cypher provides three trimming functions. They are as follows:
LTRIM
: This function is used to remove whitespace characters from a given string from the leftRTRIM
: This function is used to trim a given string from the rightTRIM
: This function is used to trim a given string from both the left and the right
These three functions accept one string parameter and return the trimmed value. Consider the following query:
MATCH (a:Book) RETURN TRIM(a.title)
The rows returned by this query are the book titles in the database without whitespaces at the beginning or at the end of the string.
LOWER and UPPER
The LOWER
function returns the lowercase version of a given string, while the UPPER
function returns the uppercase version. They are often used to make a case-insensitive string comparison without using regular expressions. Consider the following query:
MATCH (a:Book) WHERE UPPER(a.title) = UPPER("Learning Cypher") RETURN a
The preceding query returns the book node that has the title Learning Cypher
without caring for the case of the test string.
Aggregation functions
We learned how to aggregate datasets and explored some aggregation functions in detail in Chapter 2, Filter, Aggregate, and Combine Results. Here, we summarize the aggregation functions and their usage. You can also find an extensive guide on the usage of groups and functions in this chapter.
COUNT
The COUNT
function is invoked with an argument, which can be an identifier or the *
symbol. If this symbol is specified, the function will count all the rows in the dataset; otherwise, it will count the number of non-null values of the identifier. Consider the following query:
MATCH (aa: Person) RETURN COUNT(a.surname)
This query returns the number of nodes with the label Person
that have the property surname
set. The COUNT
function can be invoked with the DISTINCT
clause to let it count only distinct values, as shown in the following query:
MATCH (a: Person) RETURN COUNT(DISTINCT aa.surname)
SUM
To sum the numerical values in a dataset, you can use the SUM
function. Consider the following query:
MATCH (b:Book) <-[r:Votes]- (:User) RETURN b, SUM(r.score)
This query returns the sum of the scores of the votes received by each book. The null
values are ignored.
AVG
The AVG
function computes the average of a group of number values in the dataset. Consider the following query:
MATCH (b:Book) <-[r:Votes]- (:User) RETURN b, AVG(r.score)
The rows returned by this query contain the average score of the votes received by a book. The null
values are ignored in the computation of the average.
PERCENTILEDISC and PERCENTILECONT
The PERCENTILEDISC
and PERCENTILECONT
functions compute the percentile of a value in a dataset. The difference between them is in the formula used. The PERCENTILEDISC
function uses a discrete model for interpolation, while the PERCENTILECONT
function uses a continuous model. A percentile measures the value below which a given percentage of items fall. Consider the following query:
MATCH (b:Book) <-[r:Votes]- (:User) RETURN b, PERCENTILEDISC(r.score, 0.3)
This query returns, for each book, the thirtieth percentile of the score of votes for each book. It is a measure of the score below which 30 percent of the votes fall.
Note that there is no standard definition of percentile, but three possible definitions are available. An introduction to percentile definitions can be found at http://onlinestatbook.com/2/introduction/percentiles.html.
STDEV and STDEVP
Standard deviation gives an estimate of the dispersion from the average. You should use the STDEVP
function when you are computing the standard deviation for the whole population; otherwise, the STDEV
function must be used. Consider the following query:
MATCH (b:Book) <-[r:Votes]- (:User) RETURN STDEVP(r.score)
The return value of this query is the standard deviation of the score of the votes of all users.
A simple explanation of standard deviation can be found at http://www.mathplanet.com/education/algebra-2/quadratic-functions-and-inequalities/standard-deviation-and-normal-distribution.
MIN and MAX
The MIN
and MAX
functions return the minimum and the maximum of a value in the dataset. Consider the following query:
MATCH (b:Book) <-[r:Votes]- (:User) RETURN b, MIN(r.score), MAX(r.score)
The query returns all books voted by users, each with their minimum and maximum score received.
Mathematical functions
Cypher provides a number of mathematical functions. Explaining all of them in detail is beyond the scope of this book. The following is a list of the most commonly used functions:
Function |
Return value |
---|---|
|
|
|
The absolute value of |
|
The natural exponential function of |
|
The natural logarithm of |
|
The common logarithm (base 10) of |
|
The value of e (Euler's number), the base of natural logarithm |
|
The value of pi |
|
The value of |
|
A random double value between zero and one |
|
The square root of |
|
The sine of |
|
The cosine of |
|
The tangent of |