Controlling Results Returned

About this module

You have learned how to query the database using both simple and complex patterns and you have learned how to control query processing by chaining queries using WITH. In this module, we focus on controlling how results are processed in the RETURN and WITH clauses.

At the end of this module, you will write Cypher statements to:

  • Eliminate duplication in results.

  • Order results.

  • Limit the number of results.

Example with duplicate results

You have seen a number of query results where there is duplication in the results returned. In most cases, you want to eliminate duplicated results. You do so by using the DISTINCT keyword.

Here is a simple example where duplicate data is returned. Tom Hanks both acted in and directed the movie, That Thing You Do, so the movie is returned twice in the result stream:

MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN m.title, m.released

Here is the result returned:

Duplication

Eliminating duplication

We can eliminate the duplication by specifying the DISTINCT keyword as follows:

MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN DISTINCT m.title, m.released

Here is the result returned:

NoDuplication

Using DISTINCT in the RETURN clause here means that rows with identical values will not be returned.

Duplication in lists

You can also specify DISTINCT when collecting elements for a list. Here is another query where we collect the names of people who acted in, directed, or wrote movies released in 2003.

MATCH (p:Person)-[:ACTED_IN | DIRECTED | WROTE]->(m:Movie)
WHERE m.released = 2003
RETURN m.title, collect(p.name) AS credits

Here is the result returned:

DuplicationInCollection

Eliminating duplication in lists

We can eliminate the duplication by specifying the DISTINCT keyword when collecting the results:

MATCH (p:Person)-[:ACTED_IN | DIRECTED | WROTE]->(m:Movie)
WHERE m.released = 2003
RETURN m.title, collect(DISTINCT p.name) AS credits

Here is the result returned:

NoDuplicationInCollection

Using WITH and DISTINCT to eliminate duplication

Another way that you can avoid duplication is to use WITH and DISTINCT together as follows:

MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
WITH DISTINCT m
RETURN m.released, m.title

Here is the result returned:

NoDuplication2

Ordering results

If you want the results to be sorted, you specify the expression to use for the sort using the ORDER BY keyword and whether you want the order to be descending using the DESC keyword. Ascending order is the default.

In this example, we specify that the release date of the movies for Tom Hanks will be returned in descending order.

MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks' OR p.name = 'Keanu Reeves'
RETURN DISTINCT m.title, m.released ORDER BY m.released DESC

Here is the result returned:

Ordering

Ordering multiple results

Note that you can provide multiple sort expressions and the result will be sorted in that order. Here we want the rows to be sorted by the release date, descending, and then by title:

MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks' OR p.name = 'Keanu Reeves'
RETURN DISTINCT m.title, m.released ORDER BY m.released DESC , m.title
Ordering2

There is no limit to the number of properties you can order by.

Limiting the number of results

Although you can filter queries to reduce the number of results returned, you may also want to limit the number of results returned. This is useful if you have very large result sets and you only need to see the beginning or end of a set of ordered results. You can use the LIMIT keyword to specify the number of results returned.

Suppose you want to see the titles of the ten most recently released movies. You could do so as follows where you limit the number of results using the LIMIT keyword as follows:

MATCH (m:Movie)
RETURN m.title as title, m.released as year ORDER BY m.released DESC LIMIT 10

Here is the result returned:

Limit

Limiting number of intermediate results

Furthermore, you can use the LIMIT keyword with the WITH clause to limit intermediate results. A best practice is to limit the number of rows processed before they are collected. Here is an example where we want to limit the number of actors collected in this query by the number 6:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH m, p LIMIT 6
RETURN collect(p.name), m.title

Here is the result returned:

Limit2

At most 6 nodes that are in the pattern of the MATCH clause are retrieved. Two rows are returned. Notice that the Keanu Reeves value is returned in the second row because that node represents the upper limit of the number of Person nodes retrieved in this query pattern.

Another example using LIMIT

Here is another example of limiting results. Suppose we want to retrieve five movies and for each movie, return the :ACTED_IN path to at most two actors. Here is one way to perform this query:

MATCH (m:Movie)
WITH m LIMIT 5
MATCH path = (m)<-[:ACTED_IN]-(:Person)
WITH m, collect(path) AS paths
RETURN m, paths[0..2]
Limit3

Note that this display in Neo4j Browser is with Connect result nodes unchecked.

Alternative to LIMIT

Another way that you can limit results is to collect or count them during the query and use the count to end the query processing. In this example, we count the number of movies during the query and we return the results once we have reached 5 movies. That is, the question we are asking is, "What actors acted in exactly five movies?".

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(*) AS numMovies, collect(m.title) AS movies
WHERE numMovies = 5
RETURN a.name, numMovies, movies

An alternative to the above code is:

MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, collect(m.title) AS movies
WHERE size(movies) = 5
RETURN a.name, movies

Here is the result returned:

Count5Movies

Exercise 8: Controlling results returned

In the query edit pane of Neo4j Browser, execute the browser command:

:play 4.0-intro-neo4j-exercises

and follow the instructions for Exercise 8.

This exercise has 5 steps. Estimated time to complete: 15 minutes.

Check your understanding

Question 1

This code returns the titles of all movies that have been reviewed. Multiple people can review a movie. How can you change this code so that a movie title will only be returned once?

MATCH (m:Movie)<-[:REVIEWED]-()
RETURN  m.title

Select the correct answers.

  • MATCH (m:Movie)←[:REVIEWED]-() RETURN DISTINCT m.title

  • MATCH (m:Movie)←[:REVIEWED]-() RETURN UNIQUE m.title

  • MATCH (m:Movie)←[:REVIEWED]-() WITH DISTINCT m RETURN m.title

  • MATCH (m:Movie)←[:REVIEWED]-() WITH UNIQUE m RETURN m.title

Question 2

How many property values can you order in the returned result?

Select the correct answer.

  • One

  • As many as you need to

  • Two

  • Three

Question 3

We want to retrieve the names of the five oldest persons in our dataset. What code will do this?

Select the correct answer.

  • MATCH (p:Person)-[:ACTED_IN]→() WITH p LIMIT 5 RETURN DISTINCT p.name, p.born ORDER BY p.born

  • MATCH (p:Person) WITH p LIMIT 5 RETURN DISTINCT p.name, p.born ORDER BY p.born

  • MATCH (p:Person)-[:ACTED_IN]→() RETURN DISTINCT p.name, p.born ORDER BY p.born LIMIT 5

  • MATCH (p:Person) RETURN DISTINCT p.name, p.born ORDER BY p.born LIMIT 5

Summary

You can now write Cypher statements to:

  • Eliminate duplication in results returned.

  • Order results returned.

  • Limit the number of results returned.