Using WHERE to Filter Queries
About this module
You have learned how to query nodes and relationships in a graph using simple patterns.
You learned how to use node labels, relationship types, and properties to filter your queries.
Cypher provides a rich set of MATCH
clauses and keywords you can use to get more out of your queries.
At the end of this module, you will be able to write Cypher WHERE
clauses for testing:
-
Equality.
-
Multiple values.
-
Ranges.
-
Labels.
-
Existence of a property.
-
String values.
-
Regular expressions.
-
Patterns in the graph.
-
Inclusion in a list.
Testing equality
You have learned how to specify values for properties of nodes and relationships to filter what data is returned from the MATCH
and RETURN
clauses.
The format for filtering you have learned thus far only tests equality, where you must specify values for the properties to test with.
What if you wanted more flexibility about how the query is filtered?
For example, you want to retrieve all movies released after 2000, or retrieve all actors born after 1970 who acted in movies released before 1995.
Most applications need more flexibility in how data is filtered.
The most common clause you use to filter queries is the WHERE
clause that typically follows a MATCH
clause.
In the WHERE
clause, you can place conditions that are evaluated at runtime to filter the query.
Previously, you learned to write simple query as follows:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie {released: 2008})
RETURN p, m
Here is a way you specify the same query using the WHERE
clause:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2008
RETURN p, m
Testing multiple values
In this example, you can only refer to named nodes or relationships in a WHERE
clause so remember that you must specify a variable for any node or relationship you are testing in the WHERE
clause.
The benefit of using a WHERE
clause is that you can specify potentially complex conditions for the query.
For example:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2008 OR m.released = 2009
RETURN p, m
Example: Testing ranges
Not only can the equality =
be tested, but you can test ranges, existence, strings, as well as specify logical operations during the query.
Here is an example of specifying a range for filtering the query:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released >= 2003 AND m.released <= 2004
RETURN p.name, m.title, m.released
Here is the result:
Example: Specifying a range differently
You can also specify the same query as:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 2003 <= m.released <= 2004
RETURN p.name, m.title, m.released
You can specify conditions in a WHERE
clause that return a value of true
or false
(for example predicates).
For testing numeric values, you use the standard numeric comparison operators. Each condition can be combined for runtime evaluation using the boolean operators AND
, OR
, XOR
, and NOT
.
There are a number of numeric functions you can use in your conditions.
See the Neo4j Cypher Manual’s section Mathematical Functions for more information.
A special condition in a query is when the retrieval returns an unknown value called null
.
Read the Neo4j Cypher Manual’s section Working with null to understand how null
values are used at runtime.
Testing labels
Thus far, you have used the node labels to filter queries in a MATCH
clause.
You can filter node labels in the WHERE
clause also:
For example, these two Cypher queries:
MATCH (p:Person)
RETURN p.name
MATCH (p:Person)-[:ACTED_IN]->(:Movie {title: 'The Matrix'})
RETURN p.name
can be rewritten using WHERE
clauses as follows:
MATCH (p)
WHERE p:Person
RETURN p.name
MATCH (p)-[:ACTED_IN]->(m)
WHERE p:Person AND m:Movie AND m.title='The Matrix'
RETURN p.name
Here is the result of running this second query:
Not all node labels need to be tested during a query. If your graph has multiple labels for the same node, filtering it by the node label will provide better query performance.
Testing existence of a property
Recall that a property is associated with a particular node or relationship. A property is not associated with a node with a particular label or relationship type. In one of our queries earlier, we saw that the movie "Something’s Gotta Give" is the only movie in the Movie database that does not have a tagline property. Suppose we only want to return the movies that the actor, Jack Nicholson acted in with the condition that they must all have a tagline.
Here is the query to retrieve the specified movies where we test the existence of the tagline property:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name='Jack Nicholson' AND exists(m.tagline)
RETURN m.title, m.tagline
Here is the result:
In Neo4j 4.3, exists() for property values has been deprecated and may not work in future releases, but it still works in 4.3. In Neo4j 4.3 and later you can use AND m.tagline IS NOT NULL
|
Testing strings
Cypher has a set of string-related keywords that you can use in your WHERE
clauses to test string property values.
You can specify STARTS WITH
, ENDS WITH
, and CONTAINS
.
For example, to find all actors in the Movie database whose first name is Michael, you would write:
MATCH (p:Person)-[:ACTED_IN]->()
WHERE p.name STARTS WITH 'Michael'
RETURN p.name
Here is the result:
String comparisons are case-sensitive
Note that the comparison of strings is case-sensitive.
There are a number of string-related functions you can use to further test strings.
For example, if you want to test a value, regardless of its case, you could call the toLower()
function to convert the string to lower case before it is compared.
MATCH (p:Person)-[:ACTED_IN]->()
WHERE toLower(p.name) STARTS WITH 'michael'
RETURN p.name
In this example where we are converting a property to lower case, if an index has been created for this property, it will not be used at runtime. |
See the String functions section of the Neo4j Cypher Manual for more information. It is sometimes useful to use the built-in string functions to modify the data that is returned in the query in the RETURN clause.
Testing with regular expressions
If you prefer, you can test property values using regular expressions.
You use the syntax =~
to specify the regular expression you are testing with.
Here is an example where we test the name of the Person using a regular expression to retrieve all Person nodes with a name property that begins with 'Tom':
MATCH (p:Person)
WHERE p.name =~'Tom.*'
RETURN p.name
Here is the result:
If you specify a regular expression. The index will never be used. In addition, the property value must fully match the regular expression. |
Example: Testing with patterns - 1
Sometimes during a query, you may want to perform additional filtering using the relationships between nodes being visited during the query. For example, during retrieval, you may want to exclude certain paths traversed.
You can specify a NOT
specifier on a pattern in a WHERE
clause.
Here is an example where we want to return all Person nodes of people who wrote movies:
MATCH (p:Person)-[:WROTE]->(m:Movie)
RETURN p.name, m.title
Here is the result:
Example: Testing with patterns - 2
Next, we modify this query to exclude people who directed that particular movie:
MATCH (p:Person)-[:WROTE]->(m:Movie)
WHERE NOT exists( (p)-[:DIRECTED]->(m) )
RETURN p.name, m.title
Here is the result:
exists() for patterns has not been deprecated in 4.3. Another way that you can specify this clause is WHERE NOT ( (p)-[:DIRECTED]→(m) ) .
|
Example: Testing with patterns - 3
Here is another example where we want to find Gene Hackman and the movies that he acted in with another person who also directed the movie.
MATCH (gene:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(other:Person)
WHERE gene.name= 'Gene Hackman'
AND exists( (other)-[:DIRECTED]->(m) )
RETURN gene, other, m
Here is the result:
Testing with list values
If you have a set of values you want to test with, you can place them in a list or you can test with an existing list in the graph. A Cypher list is a comma-separated set of values within square brackets.
You can define the list in the WHERE
clause.
During the query, the graph engine will compare each property with the values IN
the list.
You can place either numeric or string values in the list, but typically, elements of the list are of the same type of data.
If you are testing with a property of a string type, then all the elements of the list will be strings.
In this example, we only want to retrieve Person nodes of people born in 1965 or 1970:
MATCH (p:Person)
WHERE p.born IN [1965, 1970]
RETURN p.name as name, p.born as yearBorn
Here is the result:
Testing list values in the graph
You can also compare a value to an existing list in the graph.
We know that the :ACTED_IN relationship has a property, roles that contains the list of roles an actor had in a particular movie they acted in. Here is the query we write to return the name of the actor who played Neo in the movie The Matrix:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE 'Neo' IN r.roles AND m.title='The Matrix'
RETURN p.name
Here is the result:
In the course Creating Nodes and Relationships in Neo4j 4.x, you will learn how to create lists from your queries by aggregating data in the graph.
There are a number of syntax elements of Cypher that we have not covered in this training. For example, you can specify CASE logic in your conditional testing for your WHERE clauses.
You can learn more about these syntax elements in the Neo4j Cypher Manual and the Cypher Refcard.
|
Exercise 4: Filtering queries using the WHERE
clause
In the query edit pane of Neo4j Browser, execute the browser command:
:play 4.0-intro-neo4j-exercises
and follow the instructions for Exercise 4.
This exercise has 13 steps. Estimated time to complete: 30 minutes. |
Check your understanding
Question 1
Suppose you want to add a WHERE
clause at the end of this statement to filter the results retrieved.
MATCH (p:Person)-[rel]->(m:Movie)<-[:PRODUCED]-(:Person)
What variables, can you test in the WHERE
clause?
Select the correct answers.
-
p
-
rel
-
m
-
PRODUCED
Question 2
Suppose you want to retrieve all movies that have a released property value that is 2000, 2002, 2004, 2006, or 2008. Here is an incomplete Cypher example to return the title property values of all movies released in these years. What keyword do you specify for XX?
MATCH (m:Movie)
WHERE m.released XX [2000, 2002, 2004, 2006, 2008]
RETURN m.title
Select the correct answer.
-
CONTAINS
-
IN
-
IS
-
EQUALS
Question 3
We want a query that returns the names of any people who both acted in and wrote the same movie. What query will retrieve this data?
Select the correct answer.
-
MATCH (p:Person) WHERE (p)-[:WROTE]-(m) AND (p)-[:ACTED_IN]-(m) RETURN p.name, m.title
-
MATCH (p:Person)-[:ACTED_IN]→(m:Movie) WHERE (p)-[:WROTE]-(m) RETURN p.name, m.title
-
MATCH (p:Person)-[:ACTED_IN | WROTE]→(m:Movie) RETURN p.name, m.title
-
MATCH (p:Person)-[:ACTED_IN]→(m:Movie)←[WROTE]-(p) RETURN p.name, m.title
Summary
You can now write Cypher WHERE
clauses to test:
-
Equality.
-
Multiple values.
-
Ranges.
-
Labels.
-
Existence of a property.
-
String values.
-
Regular expressions.
-
Patterns in the graph.
-
Inclusion in a list.
Need help? Ask in the Neo4j Community