Getting the correct results

Example graph

First we create some data to use for our examples:

CREATE (matrix:Movie {title: 'The Matrix', released: 1997})
CREATE (cloudAtlas:Movie {title: 'Cloud Atlas', released: 2012})
CREATE (forrestGump:Movie {title: 'Forrest Gump', released: 1994})
CREATE (keanu:Person {name: 'Keanu Reeves', born: 1964})
CREATE (robert:Person {name: 'Robert Zemeckis', born: 1951})
CREATE (tom:Person {name: 'Tom Hanks', born: 1956})
CREATE (tom)-[:ACTED_IN {roles: ['Forrest']}]->(forrestGump)
CREATE (tom)-[:ACTED_IN {roles: ['Zachry']}]->(cloudAtlas)
CREATE (robert)-[:DIRECTED]->(forrestGump)

This is the resulting graph:

cypher intro results01

Filtering results

So far we have matched patterns in the graph and always returned all results we found. Now we will look into options for filtering the results and only return the subset of data that we are interested in. Those filter conditions are expressed using the WHERE clause. This clause allows to use any number of boolean expressions, predicates, combined with AND, OR, XOR and NOT. The simplest predicates are comparisons; especially equality.

MATCH (m:Movie)
WHERE m.title = 'The Matrix'
Rows: 1

| m                                              |
| (:Movie {title: 'The Matrix', released: 1997}) |

The query above, using the WHERE clause, is equivalent to this query which includes the condition in the pattern matching:

MATCH (m:Movie {title: 'The Matrix'})

Cypher is designed to be flexible, so there is often more than one way to write a query.

Other options are numeric comparisons, matching regular expressions, and checking the existence of values within a list.

The WHERE clause in the following example includes a regular expression match, a greater-than comparison, and a test to see if a value exists in a list:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE =~ 'K.+' OR m.released > 2000 OR 'Neo' IN r.roles
RETURN p, r, m
Rows: 1

| p                                         | r                               | m                                               |
| (:Person {name: 'Tom Hanks', born: 1956}) | [:ACTED_IN {roles: ['Zachry']}] | (:Movie {title: 'Cloud Atlas', released: 2012}) |

An advanced aspect is that patterns can be used as predicates. Where MATCH expands the number and shape of patterns matched, a pattern predicate restricts the current result set. It only allows the paths to pass that satisfy the specified pattern. As we can expect, the use of NOT only allows the paths to pass that do not satisfy the specified pattern.

MATCH (p:Person)-[:ACTED_IN]->(m)
Rows: 2

| p                                         | m                                                |
| (:Person {name: 'Tom Hanks', born: 1956}) | (:Movie {title: 'Cloud Atlas', released: 2012})  |
| (:Person {name: 'Tom Hanks', born: 1956}) | (:Movie {title: 'Forrest Gump', released: 1994}) |

Here we find actors, because they sport an ACTED_IN relationship but then skip those that ever DIRECTED any movie.

There are more advanced ways of filtering, for example list predicates, which we will discuss later in this section.

Querying ranges of values

There are frequently queries where you want to look for data within a certain range. Date or number ranges can be used to check for events within a certain timeline, age values, or other uses.

The syntax for this criteria is very similar to SQL and other programming language logic structures for checking ranges of values. Let’s look at an example below.

MATCH (p:Person)
WHERE 3 <= p.yearsExp <= 7
Rows: 4

| p                              |
| (:Person {name: 'John'})       |
| (:Person {name: 'Sally'})      |
| (:Person {name: 'Dan'})        |
| (:Person {name: 'Jennifer'})   |

Testing if a property exists

You may only be interested if a property exists on a node or relationship. For instance, you might want to check which customers in your system have Twitter handles, so you can show relevant content. Or, you could check if the all of your employees have a start date property to verify which entities might need updated.

Remember: in Neo4j, a property only exists (is stored) if it has a value. A null property is not stored. This ensures that only valuable, necessary information is retained for your nodes and relationships.

To write this type of existence check, you simply need to use the WHERE clause and the exists() method for that property. The Cypher code is written in the block below.

//Query1: find all users who have a birthdate property
MATCH (p:Person)
WHERE exists(p.birthdate)

//Query2: find all WORKS_FOR relationships that have a startYear property
MATCH (p:Person)-[rel:WORKS_FOR]->(c:Company)
WHERE exists(rel.startYear)
RETURN p, rel, c;
Query1 Results:

cypher filter exists nodeProp

Query2 Results:

cypher filter exists relProp

Checking strings - partial values, fuzzy searches, and more

Some scenarios require query syntax that matches on partial values or broad categories within a string. To do this kind of query, you need some flexibility and options for string matching and searches. Whether you are looking for a string that starts with, ends with, or includes a certain value, Cypher offers the ability to handle it performantly and easily.

There are a few keywords in Cypher used with the WHERE clause to test string property values. The STARTS WITH keyword allows you check the value of a property that begins with the string you specify. With the CONTAINS keyword, you can check if a specified string is part of a property value. The ENDS_WITH keyword checks the end of the property string for the value you specify.

An example of each is in the Cypher block below.

//check if a property starts with 'M'
MATCH (p:Person)

//check if a property contains 'a'
MATCH (p:Person)

//check if a property ends with 'n'
MATCH (p:Person)

You can also use regular expressions to test the value of strings. For example, you could look for all the Person nodes that share a first name or you could find all the classes with a certain department code.

Let’s look at an example.

MATCH (p:Person)
WHERE =~ 'Jo.*'

Just like in SQL and other languages, you can check if a property value is a value in a list. The IN keyword allows you to specify an array of values and validate a property’s contents against each one in the list.

Here is an example:

MATCH (p:Person)
WHERE p.yearsExp IN [1, 5, 6]
RETURN, p.yearsExp

Filtering on patterns

One thing that makes graph unique is its focus on relationships. Just as you can filter queries based on node labels or properties, you can also filter results based on relationships or patterns. This allows you to test if a pattern also has a certain relationship or doesn’t, or if another pattern exists.

The Cypher code below shows how this is done.

//Query1: find which people are friends of someone who works for Neo4j
MATCH (p:Person)-[r:IS_FRIENDS_WITH]->(friend:Person)
WHERE exists((p)-[:WORKS_FOR]->(:Company {name: 'Neo4j'}))
RETURN p, r, friend;

//Query2: find Jennifer's friends who do not work for a company
MATCH (p:Person)-[r:IS_FRIENDS_WITH]->(friend:Person)
WHERE = 'Jennifer'
AND NOT exists((friend)-[:WORKS_FOR]->(:Company))
Query1 Results:

cypher filter exists ptrn

Query2 Results:

cypher filter notExists ptrn

Optional patterns

There are cases where you might want to retrieve results from patterns, even if they do not match the entire pattern or all of the criteria. This is how an outer join in SQL functions. In Cypher, you can use an OPTIONAL MATCH pattern to try to match it, but if it doesn’t find results, those rows will return null for those values.

We can see how this would look in Cypher by querying for people whose name starts with a letter and who may work for a company.

//find all people whose name starts with J and who may work for a company.
MATCH (p:Person)
OPTIONAL MATCH (p)-[:WORKS_FOR]-(other:Company)

Notice that Joe is returned because his name starts with the letter 'J', but his company name is null. That is because he does not have a WORKS_FOR relationship to a company node. Since we used optional match, his Person node is still returned from the first match, but the second match is not found, so returns null.

To see the difference, try running the query without the OPTIONAL in front of the second match. You can see that Joe’s row is no longer returned. That is because Cypher reads the statement with an AND match, so that the person must match the first criteria (name starts with 'J') and the second criteria (person works for a company).

More complex patterns

We are able to handle many simple graph queries, even at this point, but what happens when we want to extend our patterns past a single relationship? What if we wanted to know who else likes graphs besides Jennifer?

We handle this functionality and many others by simply adding on to our first pattern or matching additional patterns. Let us look at a couple of examples.

//Query1: find who likes graphs besides Jennifer
MATCH (j:Person {name: 'Jennifer'})-[r:LIKES]-(graph:Technology {type: 'Graphs'})-[r2:LIKES]-(p:Person)

//Query2: find who likes graphs besides Jennifer that she is also friends with
MATCH (j:Person {name: 'Jennifer'})-[:LIKES]->(:Technology {type: 'Graphs'})<-[:LIKES]-(p:Person),
Query1 Results:

cypher filter extPattern

Query2 Results:

cypher filter twoPattern

Notice that on the second query, we used a comma after the first MATCH line and added another pattern to match on the next line. This allows us to chain together patterns, similar to when we used the WHERE exists(<pattern>) syntax above. With this structure, though, we can add multiple different patterns and link them together, allowing us to traverse various pieces of the graph with certain patterns.

Returning results

So far, we have returned nodes, relationships and paths directly via their variables. However, the RETURN clause can return any number of expressions. But what are expressions in Cypher?

The simplest expressions are literal values. Examples of literal values are: numbers, strings, arrays (for example: [1,2,3]), and maps (for example: {name: 'Tom Hanks', born:1964, movies: ['Forrest Gump', ...], count: 13}). Individual properties of any node, relationship or map can be accessed using the dot syntax, for example: Individual elements or slices of arrays can be retrieved with subscripts, for example: names[0] and movies[1..-1]. Each function evaluation, for example: length(array), toInteger('12'), substring('2014-07-01', 0, 4) and coalesce(p.nickname, 'n/a'), is also an expression.

Predicates used in WHERE clauses count as boolean expressions.

Simple expressions can be composed and concatenated to form more complex expressions.

By default the expression itself will be used as label for the column, in many cases you want to alias that with a more understandable name using expression AS alias. The alias can be used subsequently to refer to that column.

MATCH (p:Person)
  p, AS name,
  coalesce(p.nickname, 'n/a') AS nickname,
  {name:, label: head(labels(p))} AS person
Rows: 3

| p                                               | name              | toUpper(   | nickname | person                                     |
| (:Person {name: 'Keanu Reeves', born: 1964})    | 'Keanu Reeves'    | 'KEANU REEVES'    | 'n/a'    | {name: 'Keanu Reeves', label: 'Person'}    |
| (:Person {name: 'Robert Zemeckis', born: 1951}) | 'Robert Zemeckis' | 'ROBERT ZEMECKIS' | 'n/a'    | {name: 'Robert Zemeckis', label: 'Person'} |
| (:Person {name: 'Tom Hanks', born: 1956})       | 'Tom Hanks'       | 'TOM HANKS'       | 'n/a'    | {name: 'Tom Hanks', label: 'Person'}       |

If you wish to display only unique results you can use the DISTINCT keyword after RETURN:

RETURN DISTINCT labels(n) AS Labels
Rows: 2

| Labels     |
| ['Movie']  |
| ['Person'] |

Returning unique results

Let us speak in detail on how to return unique results using DISTINCT keyword in Cypher. Some of your queries may return duplicate results due to multiple paths to the node or a node that meets multiple criteria. This redundancy can clutter results and make sifting through a long list difficult to find what you need.

To trim out duplicate entities, you can use the DISTINCT keyword.

//Query13: find people who have a twitter or like graphs or query languages
MATCH (user:Person)
WHERE user.twitter IS NOT null
WITH user
MATCH (user)-[:LIKES]-(t:Technology)
WHERE t.type IN ['Graphs','Query Languages']
Query13 Results:

cypher results distinct user

For Query13, our use case is that we are launching a new Twitter account for tips and tricks on Cypher, and we want to notify users who have a Twitter account and who like graphs or query languages. The first two lines of the query look for Person nodes who have a Twitter handle. Then, we use WITH to pass those users over to the next MATCH, where we find out if the person likes graphs or query languages. Notice that running this statement without the DISTINCT keyword results in "Melissa" shown twice. This is because she likes graphs, and she also likes query languages. When we use DISTINCT, we only retrieve unique users.

Limiting number of results

There are times where you want a sampling set or you only want to pull so many results to update or process at a time. The LIMIT keyword takes the output of the query and limits the volume returned based on the number you specify.

For instance, we can find each person’s number of friends in our graph. If our graph were thousands or millions of nodes and relationships, the number of results returned would be massive. What if we only cared about the top 3 people who had the most friends? Let’s write a query for that!

//Query14: find the top 3 people who have the most friends
MATCH (p:Person)-[r:IS_FRIENDS_WITH]-(other:Person)
RETURN, count( AS numberOfFriends
ORDER BY numberOfFriends DESC

Our query pulls persons and the friends they are connected to and returns the person name and count of their friends. We could run just that much of the query and return a messy list of names and friend counts, but we probably want to order the list based on the number of friends each person has starting with the biggest number at the top (DESC). You could also run that much of the query to see the friends and counts all in order, but we only want to pull the top 3 people with the most friends. The LIMIT pulls the top results from our ordered list.

Try mixing up the query by removing the ORDER BY and LIMIT lines and then add each one separately. Notice that only removing the ORDER BY line pulls the starting 3 values from the list, getting a random sampling of the return results.

Aggregating information

In many cases, we wish to aggregate or group the data encountered while traversing patterns in our graph. In Cypher, aggregation happens in the RETURN clause while computing the final results. Many common aggregation functions are supported, e.g. count, sum, avg, min, and max, but there are several more.

Counting the number of people in your database could be achieved by this:

MATCH (:Person)
RETURN count(*) AS people
Rows: 1

| people |
| 3      |

Note that NULL values are skipped during aggregation. For aggregating only unique values use DISTINCT, for example: count(DISTINCT role).

Aggregation works implicitly in Cypher. We specify which result columns we wish to aggregate. Cypher uses all non-aggregated columns as grouping keys.

Aggregation affects which data is still visible in ordering or later query parts.

The following statement finds out how often an actor and director have worked together:

MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie)<-[:DIRECTED]-(director:Person)
RETURN actor, director, count(*) AS collaborations
Rows: 1

| actor                                     | director                                        | collaborations |
| (:Person {name: 'Tom Hanks', born: 1956}) | (:Person {name: 'Robert Zemeckis', born: 1951}) | 1              |

The count() function in Cypher allows you to count the number of occurences of entities, relationships, or results returned.

There are two different ways you can count return results from your query.

  • The first is by using count(n) to count the number of occurences of n and does not include null values. You can specify nodes, relationships, or properties within the parentheses for Cypher to count.

  • The second way to count results is with count(*), which counts the number of result rows returned (including those with null values).

In our dataset, some of our Person nodes have a Twitter handle, but others do not. If we run the first example query below, you will see that we have the twitter property has a value for four people and is null for the other five people. The second and third queries show how to use the different count options.

//Query1: see the list of Twitter handle values for Person nodes
MATCH (p:Person)
RETURN p.twitter;
Query1 Results:

cypher agg count

//Query2: count of the non-null `twitter` property of the Person nodes
MATCH (p:Person)
RETURN count(p.twitter);
Query2 Results:

cypher agg count prop

//Query3: count on the Person nodes
MATCH (p:Person)
RETURN count(*);
Query3 Results:

cypher agg count results

Collecting aggregation

A very helpful aggregation function is collect(), which collects all the aggregated values into a list. This is very useful in many situations, since no information of details is lost while aggregating.

collect() is well-suited for retrieving typical parent-child structures, where one core entity (parent, root or head) is returned per row with all its dependent information in associated lists created with collect(). This means that there is no need to repeat the parent information for each child row, or running n+1 statements to retrieve the parent and its children individually.

The following statement could be used to retrieve the cast of each movie in our database:

MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
RETURN m.title AS movie, collect( AS cast, count(*) AS actors
Rows: 2

| movie          | cast          | actors |
| 'Forrest Gump' | ['Tom Hanks'] | 1      |
| 'Cloud Atlas'  | ['Tom Hanks'] | 1      |

The lists created by collect() can either be used from the client consuming the Cypher results, or directly within a statement with any of the list functions or predicates.

Ordering and pagination

It is common to sort and paginate after aggregating using count(x).

Ordering is done using the ORDER BY expression [ASC|DESC] clause. The expression can be any expression, as long as it is computable from the returned information.

For instance, if we return we can still ORDER BY person.age since both are accessible from the person reference. We cannot order by things that are not returned. This is especially important with aggregation and DISTINCT return values, since both remove the visibility of data that is aggregated.

Pagination is done using the SKIP {offset} and LIMIT {count} clauses.

A common pattern is to aggregate for a count (score or frequency), order by it, and only return the top-n entries.

For instance to find the most prolific actors we could do:

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
RETURN a, count(*) AS appearances
ORDER BY appearances DESC LIMIT 10
Rows: 1

| a                                         | appearances |
| (:Person {name: 'Tom Hanks', born: 1956}) | 2           |

Ordering results

Our list of potential hiring candidates from our last example might be more useful if we could order the candidates by most or least experience. Or perhaps we want to rank all of our people by age.

The ORDER BY keyword will sort the results based on the value you specify and in ascending or descending order (ascending is default). Let’s use the same queries from our example with UNWIND and see how we can order our candidates.

//Query11: for a list of techRequirements, look for people who have each skill - ordered Query9
WITH ['Graphs','Query Languages'] AS techRequirements
UNWIND techRequirements AS technology
MATCH (p:Person)-[r:LIKES]-(t:Technology {type: technology})
WITH t.type AS technology, AS personName
ORDER BY technology, personName
RETURN technology, collect(personName) AS potentialCandidates;
Query11 Results:

cypher results order names

//Query12: for numbers in a list, find candidates who have that many years of experience - ordered Query10
WITH [4, 5, 6, 7] AS experienceRange
UNWIND experienceRange AS number
MATCH (p:Person)
WHERE p.yearsExp = number
RETURN, p.yearsExp ORDER BY p.yearsExp DESC;
Query12 Results:

cypher results order experience

Notice that our first query has to order by Person name before collecting the values into a list. If you do not sort first (put the ORDER BY after the RETURN), you will sort based on the size of the list and not by the first letter of the values in the list. We also sort on two values - technology, then person. This allows us to sort our technology so that all the persons that like a technology are listed together.

You can try out the difference in sorting by both values or one value by running these queries:

//only sorted by person's name in alphabetical order
WITH ['Graphs','Query Languages'] AS techRequirements
UNWIND techRequirements AS technology
MATCH (p:Person)-[r:LIKES]-(t:Technology {type: technology})
WITH t.type AS technology, AS personName
ORDER BY personName
RETURN technology, personName;
//only sorted by technology (person names are out of order)
WITH ['Graphs','Query Languages'] AS techRequirements
UNWIND techRequirements AS technology
MATCH (p:Person)-[r:LIKES]-(t:Technology {type: technology})
WITH t.type AS technology, AS personName
ORDER BY technology
RETURN technology, personName;
//sorted by technology, then by person's name
WITH ['Graphs','Query Languages'] AS techRequirements
UNWIND techRequirements AS technology
MATCH (p:Person)-[r:LIKES]-(t:Technology {type: technology})
WITH t.type AS technology, AS personName
ORDER BY technology, personName
RETURN technology, personName;

Counting values in a list

If you have a list of values, you can also find the number of items in that list or calculate the size of an expression using the size() function. The examples below return the number of items or patterns found.

//Query5: find number of items in collected list
MATCH (p:Person)-[:IS_FRIENDS_WITH]->(friend:Person)
RETURN, size(collect( AS numberOfFriends;
Query5 Results:

cypher agg size

//Query6: find number of friends who have other friends
MATCH (p:Person)-[:IS_FRIENDS_WITH]->(friend:Person)
WHERE size((friend)-[:IS_FRIENDS_WITH]-(:Person)) > 1
RETURN, collect( AS friends, size((friend)-[:IS_FRIENDS_WITH]-(:Person)) AS numberOfFoFs;
Query6 Results:

cypher agg sizePath

Looping through list values

If you have a list that you want to inspect or separate the values, Cypher offers the UNWIND clause. This does the opposite of collect() and separates a list into individual values on separate rows.

Using UNWIND is frequently used for looping through JSON and XML objects when importing data, as well as everyday arrays and other types of lists. Let us look at a couple of examples where we assume that the technologies someone likes also mean they have some experience with each one. We are interested in hiring people who are familiar with Graphs or Query Languages, so we can write a query to find people to interview.

//Query9: for a list of techRequirements, look for people who have each skill
WITH ['Graphs','Query Languages'] AS techRequirements
UNWIND techRequirements AS technology
MATCH (p:Person)-[r:LIKES]-(t:Technology {type: technology})
RETURN t.type, collect( AS potentialCandidates;
Query9 Results:

cypher results unwind strList

//Query10: for numbers in a list, find candidates who have that many years of experience
WITH [4, 5, 6, 7] AS experienceRange
UNWIND experienceRange AS number
MATCH (p:Person)
WHERE p.yearsExp = number
RETURN, p.yearsExp;
Query10 Results:

cypher results unwind numList

Next steps

This guide has shown how to do more with Cypher by combining clauses and keywords for aggregating and returning data. We have seen how to use functions in Cypher and some of the operations offered. In the next section, we will learn how to maintain data integrity by using constraints and increase query performance with indexes.