Controlling Results Returned
About this module
You have learned how to query the database using both simple and complex patterns and you have learned how to control query processing by chaining queries using WITH
.
In this module, we focus on controlling how results are processed in the RETURN
and WITH
clauses.
At the end of this module, you will write Cypher statements to:
-
Eliminate duplication in results.
-
Order results.
-
Limit the number of results.
Example with duplicate results
You have seen a number of query results where there is duplication in the results returned.
In most cases, you want to eliminate duplicated results.
You do so by using the DISTINCT
keyword.
Here is a simple example where duplicate data is returned. Tom Hanks both acted in and directed the movie, That Thing You Do, so the movie is returned twice in the result stream:
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN m.title, m.released
Here is the result returned:
Eliminating duplication
We can eliminate the duplication by specifying the DISTINCT
keyword as follows:
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks'
RETURN DISTINCT m.title, m.released
Here is the result returned:
Using DISTINCT
in the RETURN
clause here means that rows with identical values will not be returned.
Duplication in lists
You can also specify DISTINCT
when collecting elements for a list.
Here is another query where we collect the names of people who acted in, directed, or wrote movies released in 2003.
MATCH (p:Person)-[:ACTED_IN | DIRECTED | WROTE]->(m:Movie)
WHERE m.released = 2003
RETURN m.title, collect(p.name) AS credits
Here is the result returned:
Eliminating duplication in lists
We can eliminate the duplication by specifying the DISTINCT
keyword when collecting the results:
MATCH (p:Person)-[:ACTED_IN | DIRECTED | WROTE]->(m:Movie)
WHERE m.released = 2003
RETURN m.title, collect(DISTINCT p.name) AS credits
Here is the result returned:
Ordering results
If you want the results to be sorted, you specify the expression to use for the sort using the ORDER BY
keyword and whether you want the order to be descending using the DESC
keyword.
Ascending order is the default.
In this example, we specify that the release date of the movies for Tom Hanks will be returned in descending order.
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks' OR p.name = 'Keanu Reeves'
RETURN DISTINCT m.title, m.released ORDER BY m.released DESC
Here is the result returned:
Ordering multiple results
Note that you can provide multiple sort expressions and the result will be sorted in that order. Here we want the rows to be sorted by the release date, descending, and then by title:
MATCH (p:Person)-[:DIRECTED | ACTED_IN]->(m:Movie)
WHERE p.name = 'Tom Hanks' OR p.name = 'Keanu Reeves'
RETURN DISTINCT m.title, m.released ORDER BY m.released DESC , m.title
There is no limit to the number of properties you can order by.
Limiting the number of results
Although you can filter queries to reduce the number of results returned, you may also want to limit the number of results returned.
This is useful if you have very large result sets and you only need to see the beginning or end of a set of ordered results.
You can use the LIMIT
keyword to specify the number of results returned.
Suppose you want to see the titles of the ten most recently released movies.
You could do so as follows where you limit the number of results using the LIMIT
keyword as follows:
MATCH (m:Movie)
RETURN m.title as title, m.released as year ORDER BY m.released DESC LIMIT 10
Here is the result returned:
Limiting number of intermediate results
Furthermore, you can use the LIMIT
keyword with the WITH
clause to limit intermediate results.
A best practice is to limit the number of rows processed before they are collected.
Here is an example where we want to limit the number of actors collected in this query by the number 6:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH m, p LIMIT 6
RETURN collect(p.name), m.title
Here is the result returned:
At most 6 nodes that are in the pattern of the MATCH
clause are retrieved. Two rows are returned.
Notice that the Keanu Reeves value is returned in the second row because that node represents the upper limit of the number of Person nodes retrieved in this query pattern.
Another example using LIMIT
Here is another example of limiting results. Suppose we want to retrieve five movies and for each movie, return the :ACTED_IN path to at most two actors. Here is one way to perform this query:
MATCH (m:Movie)
WITH m LIMIT 5
MATCH path = (m)<-[:ACTED_IN]-(:Person)
WITH m, collect(path) AS paths
RETURN m, paths[0..2]
Note that this display in Neo4j Browser is with Connect result nodes unchecked.
Alternative to LIMIT
Another way that you can limit results is to collect or count them during the query and use the count to end the query processing. In this example, we count the number of movies during the query and we return the results once we have reached 5 movies. That is, the question we are asking is, "What actors acted in exactly five movies?".
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, count(*) AS numMovies, collect(m.title) AS movies
WHERE numMovies = 5
RETURN a.name, numMovies, movies
An alternative to the above code is:
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WITH a, collect(m.title) AS movies
WHERE size(movies) = 5
RETURN a.name, movies
Here is the result returned:
Exercise 8: Controlling results returned
In the query edit pane of Neo4j Browser, execute the browser command:
:play 4.0-intro-neo4j-exercises
and follow the instructions for Exercise 8.
This exercise has 5 steps. Estimated time to complete: 15 minutes. |
Check your understanding
Question 1
This code returns the titles of all movies that have been reviewed. Multiple people can review a movie. How can you change this code so that a movie title will only be returned once?
MATCH (m:Movie)<-[:REVIEWED]-()
RETURN m.title
Select the correct answers.
-
MATCH (m:Movie)←[:REVIEWED]-()
RETURN DISTINCT m.title
-
MATCH (m:Movie)←[:REVIEWED]-()
RETURN UNIQUE m.title
-
MATCH (m:Movie)←[:REVIEWED]-()
WITH DISTINCT m
RETURN m.title
-
MATCH (m:Movie)←[:REVIEWED]-()
WITH UNIQUE m
RETURN m.title
Question 2
How many property values can you order in the returned result?
Select the correct answer.
-
One
-
As many as you need to
-
Two
-
Three
Question 3
We want to retrieve the names of the five oldest persons in our dataset. What code will do this?
Select the correct answer.
-
MATCH (p:Person)-[:ACTED_IN]→()
WITH p LIMIT 5
RETURN DISTINCT p.name, p.born ORDER BY p.born
-
MATCH (p:Person)
WITH p LIMIT 5
RETURN DISTINCT p.name, p.born ORDER BY p.born
-
MATCH (p:Person)-[:ACTED_IN]→()
RETURN DISTINCT p.name, p.born ORDER BY p.born LIMIT 5
-
MATCH (p:Person)
RETURN DISTINCT p.name, p.born ORDER BY p.born LIMIT 5
Summary
You can now write Cypher statements to:
-
Eliminate duplication in results returned.
-
Order results returned.
-
Limit the number of results returned.
Need help? Ask in the Neo4j Community