Using WHERE to Filter Queries

About this module

You have learned how to query nodes and relationships in a graph using simple patterns. You learned how to use node labels, relationship types, and properties to filter your queries. Cypher provides a rich set of MATCH clauses and keywords you can use to get more out of your queries.

At the end of this module, you will be able to write Cypher WHERE clauses for testing:

  • Equality.

  • Multiple values.

  • Ranges.

  • Labels.

  • Existence of a property.

  • String values.

  • Regular expressions.

  • Patterns in the graph.

  • Inclusion in a list.

Testing equality

You have learned how to specify values for properties of nodes and relationships to filter what data is returned from the MATCH and RETURN clauses. The format for filtering you have learned thus far only tests equality, where you must specify values for the properties to test with. What if you wanted more flexibility about how the query is filtered? For example, you want to retrieve all movies released after 2000, or retrieve all actors born after 1970 who acted in movies released before 1995. Most applications need more flexibility in how data is filtered.

The most common clause you use to filter queries is the WHERE clause that typically follows a MATCH clause. In the WHERE clause, you can place conditions that are evaluated at runtime to filter the query.

Previously, you learned to write simple query as follows:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie {released: 2008})
RETURN p, m

Here is a way you specify the same query using the WHERE clause:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2008
RETURN p, m

Testing multiple values

In this example, you can only refer to named nodes or relationships in a WHERE clause so remember that you must specify a variable for any node or relationship you are testing in the WHERE clause. The benefit of using a WHERE clause is that you can specify potentially complex conditions for the query.

For example:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released = 2008 OR m.released = 2009
RETURN p, m

Example: Testing ranges

Not only can the equality = be tested, but you can test ranges, existence, strings, as well as specify logical operations during the query.

Here is an example of specifying a range for filtering the query:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released >= 2003 AND m.released <= 2004
RETURN p.name, m.title, m.released

Here is the result:

MoviesBetween2003And2004

Example: Specifying a range differently

You can also specify the same query as:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE 2003 <= m.released <= 2004
RETURN p.name, m.title, m.released

You can specify conditions in a WHERE clause that return a value of true or false (for example predicates). For testing numeric values, you use the standard numeric comparison operators. Each condition can be combined for runtime evaluation using the boolean operators AND, OR, XOR, and NOT. There are a number of numeric functions you can use in your conditions. See the Neo4j Cypher Manual’s section Mathematical Functions for more information.

A special condition in a query is when the retrieval returns an unknown value called null. Read the Neo4j Cypher Manual’s section Working with null to understand how null values are used at runtime.

Testing labels

Thus far, you have used the node labels to filter queries in a MATCH clause. You can filter node labels in the WHERE clause also:

For example, these two Cypher queries:

MATCH (p:Person)
RETURN p.name
MATCH (p:Person)-[:ACTED_IN]->(:Movie {title: 'The Matrix'})
RETURN p.name

can be rewritten using WHERE clauses as follows:

MATCH (p)
WHERE p:Person
RETURN p.name
MATCH (p)-[:ACTED_IN]->(m)
WHERE p:Person AND m:Movie AND m.title='The Matrix'
RETURN p.name

Here is the result of running this second query:

TestLabels

Not all node labels need to be tested during a query. If your graph has multiple labels for the same node, filtering it by the node label will provide better query performance.

Testing existence of a property

Recall that a property is associated with a particular node or relationship. A property is not associated with a node with a particular label or relationship type. In one of our queries earlier, we saw that the movie "Something’s Gotta Give" is the only movie in the Movie database that does not have a tagline property. Suppose we only want to return the movies that the actor, Jack Nicholson acted in with the condition that they must all have a tagline.

Here is the query to retrieve the specified movies where we test the existence of the tagline property:

MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE p.name='Jack Nicholson' AND exists(m.tagline)
RETURN m.title, m.tagline

Here is the result:

JackNicholsonMoviesWithTaglines
In Neo4j 4.3, exists() for property values has been deprecated and may not work in future releases, but it still works in 4.3. In Neo4j 4.3 and later you can use AND m.tagline IS NOT NULL

Testing strings

Cypher has a set of string-related keywords that you can use in your WHERE clauses to test string property values. You can specify STARTS WITH, ENDS WITH, and CONTAINS.

For example, to find all actors in the Movie database whose first name is Michael, you would write:

MATCH (p:Person)-[:ACTED_IN]->()
WHERE p.name STARTS WITH 'Michael'
RETURN p.name

Here is the result:

ActorsNamedMichael

String comparisons are case-sensitive

Note that the comparison of strings is case-sensitive. There are a number of string-related functions you can use to further test strings. For example, if you want to test a value, regardless of its case, you could call the toLower() function to convert the string to lower case before it is compared.

MATCH (p:Person)-[:ACTED_IN]->()
WHERE toLower(p.name) STARTS WITH 'michael'
RETURN p.name
In this example where we are converting a property to lower case, if an index has been created for this property, it will not be used at runtime.

See the String functions section of the Neo4j Cypher Manual for more information. It is sometimes useful to use the built-in string functions to modify the data that is returned in the query in the RETURN clause.

Testing with regular expressions

If you prefer, you can test property values using regular expressions. You use the syntax =~ to specify the regular expression you are testing with. Here is an example where we test the name of the Person using a regular expression to retrieve all Person nodes with a name property that begins with 'Tom':

MATCH (p:Person)
WHERE p.name =~'Tom.*'
RETURN p.name

Here is the result:

MatchTomsRegex
If you specify a regular expression. The index will never be used. In addition, the property value must fully match the regular expression.

Example: Testing with patterns - 1

Sometimes during a query, you may want to perform additional filtering using the relationships between nodes being visited during the query. For example, during retrieval, you may want to exclude certain paths traversed. You can specify a NOT specifier on a pattern in a WHERE clause.

Here is an example where we want to return all Person nodes of people who wrote movies:

MATCH (p:Person)-[:WROTE]->(m:Movie)
RETURN p.name, m.title

Here is the result:

WroteMovies

Example: Testing with patterns - 2

Next, we modify this query to exclude people who directed that particular movie:

MATCH (p:Person)-[:WROTE]->(m:Movie)
WHERE NOT exists( (p)-[:DIRECTED]->(m) )
RETURN p.name, m.title

Here is the result:

WroteMoviesNotDirected
exists() for patterns has not been deprecated in 4.3. Another way that you can specify this clause is WHERE NOT ( (p)-[:DIRECTED]→(m) ).

Example: Testing with patterns - 3

Here is another example where we want to find Gene Hackman and the movies that he acted in with another person who also directed the movie.

MATCH (gene:Person)-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(other:Person)
WHERE gene.name= 'Gene Hackman'
AND exists( (other)-[:DIRECTED]->(m) )
RETURN  gene, other, m

Here is the result:

ExistsPattern

Testing with list values

If you have a set of values you want to test with, you can place them in a list or you can test with an existing list in the graph. A Cypher list is a comma-separated set of values within square brackets.

You can define the list in the WHERE clause. During the query, the graph engine will compare each property with the values IN the list. You can place either numeric or string values in the list, but typically, elements of the list are of the same type of data. If you are testing with a property of a string type, then all the elements of the list will be strings.

In this example, we only want to retrieve Person nodes of people born in 1965 or 1970:

MATCH (p:Person)
WHERE p.born IN [1965, 1970]
RETURN p.name as name, p.born as yearBorn

Here is the result:

UsingIN

Testing list values in the graph

You can also compare a value to an existing list in the graph.

We know that the :ACTED_IN relationship has a property, roles that contains the list of roles an actor had in a particular movie they acted in. Here is the query we write to return the name of the actor who played Neo in the movie The Matrix:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE  'Neo' IN r.roles AND m.title='The Matrix'
RETURN p.name

Here is the result:

UsingINRoles

In the course Creating Nodes and Relationships in Neo4j 4.x, you will learn how to create lists from your queries by aggregating data in the graph.

There are a number of syntax elements of Cypher that we have not covered in this training. For example, you can specify CASE logic in your conditional testing for your WHERE clauses. You can learn more about these syntax elements in the Neo4j Cypher Manual and the Cypher Refcard.

Exercise 4: Filtering queries using the WHERE clause

In the query edit pane of Neo4j Browser, execute the browser command:

:play 4.0-intro-neo4j-exercises

and follow the instructions for Exercise 4.

This exercise has 13 steps. Estimated time to complete: 30 minutes.

Check your understanding

Question 1

Suppose you want to add a WHERE clause at the end of this statement to filter the results retrieved.

MATCH (p:Person)-[rel]->(m:Movie)<-[:PRODUCED]-(:Person)

What variables, can you test in the WHERE clause?

Select the correct answers.

  • p

  • rel

  • m

  • PRODUCED

Question 2

Suppose you want to retrieve all movies that have a released property value that is 2000, 2002, 2004, 2006, or 2008. Here is an incomplete Cypher example to return the title property values of all movies released in these years. What keyword do you specify for XX?

MATCH (m:Movie)
WHERE m.released XX [2000, 2002, 2004, 2006, 2008]
RETURN m.title

Select the correct answer.

  • CONTAINS

  • IN

  • IS

  • EQUALS

Question 3

We want a query that returns the names of any people who both acted in and wrote the same movie. What query will retrieve this data?

Select the correct answer.

  • MATCH (p:Person) WHERE (p)-[:WROTE]-(m) AND (p)-[:ACTED_IN]-(m) RETURN p.name, m.title

  • MATCH (p:Person)-[:ACTED_IN]→(m:Movie) WHERE (p)-[:WROTE]-(m) RETURN p.name, m.title

  • MATCH (p:Person)-[:ACTED_IN | WROTE]→(m:Movie) RETURN p.name, m.title

  • MATCH (p:Person)-[:ACTED_IN]→(m:Movie)←[WROTE]-(p) RETURN p.name, m.title

Summary

You can now write Cypher WHERE clauses to test:

  • Equality.

  • Multiple values.

  • Ranges.

  • Labels.

  • Existence of a property.

  • String values.

  • Regular expressions.

  • Patterns in the graph.

  • Inclusion in a list.