
# Working with Patterns in Queries

You have learned how to query nodes and relationships in a graph using simple patterns and also how to use the `WHERE` clause for filtering queries.

At the end of this module, you should be able to write Cypher statements to:

• Specify multiple `MATCH` patterns.
• Specify multiple `MATCH` clauses.
• Specify varying length paths.
• Return a subgraph.
• Specify `OPTIONAL` in a query.

### Traversal in a MATCH clause

Suppose we want to find all of the followers of people who reviewed the movie, The Replacements.

Here is the query to do this:

``````MATCH (follower:Person)-[:FOLLOWS]->(reviewer:Person)-[:REVIEWED]->(m:Movie)
WHERE m.title = 'The Replacements'
RETURN follower.name, reviewer.name``````

Here is the result:

Here is the traversal that the graph engine performed. It first found the movie, The Replacements. Then it found all Person nodes that reviewed that movie, Angela, Jessica, and James. Then it found all Person nodes who follow the people who reviewed the movie, Paul, Angela, and James. In all, six relationships were traversed.
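One way to check the traversal described above is to prefix the same query with `PROFILE`, which runs the query and displays the execution plan. This is a minimal sketch that assumes the Movie graph used in this training is loaded. The operators and their order in the plan depend on your Neo4j version, indexes, and database statistics, so the plan you see may not follow exactly the steps listed above.

```cypher
// PROFILE executes the query and shows the plan the engine actually used,
// including the rows and database hits for each operator.
PROFILE
MATCH (follower:Person)-[:FOLLOWS]->(reviewer:Person)-[:REVIEWED]->(m:Movie)
WHERE m.title = 'The Replacements'
RETURN follower.name, reviewer.name
```

Read the plan from the bottom up to see which node the engine used as its starting point.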

### Specifying multiple patterns in a `MATCH`

Up until now, you have specified a single `MATCH` pattern in a query with filtering in a `WHERE` clause. You can specify multiple patterns in a `MATCH` clause.

Suppose we want to write queries that focus on movies released in the year 2000. Here are the nodes and relationships for these movies:

This `MATCH` clause includes a pattern specified by two paths separated by a comma:

``````MATCH (a:Person)-[:ACTED_IN]->(m:Movie),
(m)<-[:DIRECTED]-(d:Person)
WHERE m.released = 2000
RETURN a.name, m.title, d.name``````

It retrieves the Person nodes for people who acted in these three movies and, using that same movie node, m, it retrieves the Person node for the director of that movie.

Here is the result of executing this query:

It returns 15 rows, one for each actor with the associated movie title and name of the director for that particular movie. When multiple patterns are specified in a `MATCH` clause, no relationship is traversed more than one time.

### Specifying a single pattern

However, a better way to write this same query would be:

``````MATCH (a:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person)
WHERE m.released = 2000
RETURN a.name, m.title, d.name``````

There are, however, some queries where you will need to specify two or more patterns. Multiple patterns are used when a query is complex and cannot be satisfied with a single pattern. This is useful when you are looking for a specific node in the graph and want to connect it to a different node. You will learn about creating nodes and relationships later in this training.

### Example: Using two patterns in a `MATCH`

Here are some examples of specifying two paths in a `MATCH` clause.

In the first example, we want the actors that worked with Keanu Reeves to meet Hugo Weaving, who has worked with Keanu Reeves. Here we retrieve the actors who acted in the same movies as Keanu Reeves, but not when Hugo Weaving acted in the same movie. To do this, we specify two paths for the `MATCH`:

``````MATCH (keanu:Person)-[:ACTED_IN]->(movie:Movie)<-[:ACTED_IN]-(n:Person),
(hugo:Person)
WHERE keanu.name='Keanu Reeves' AND
hugo.name='Hugo Weaving'
AND NOT (hugo)-[:ACTED_IN]->(movie)
RETURN n.name``````

When you perform this type of query, you may see a warning in the query edit pane stating that the pattern represents a cartesian product and may require a lot of resources to perform the query. You should only perform these types of queries if you know the data well and the implications of doing the query.

If you click the warning symbol in the top left corner, it produces an explanation result pane.

Here is the result of executing this query:
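The cartesian product warning comes from the second pattern, `(hugo:Person)`, which is not connected to the first pattern. As a hedged alternative, the sketch below moves the Hugo Weaving check into a pattern predicate in the `WHERE` clause, so the `MATCH` clause contains a single connected pattern. It assumes the Movie graph contains exactly one Person node named Hugo Weaving; with that assumption it should return the same names as the query above.

```cypher
// Single connected pattern: no disconnected (hugo:Person) pattern in the MATCH.
MATCH (keanu:Person)-[:ACTED_IN]->(movie:Movie)<-[:ACTED_IN]-(n:Person)
WHERE keanu.name = 'Keanu Reeves'
  // Exclude movies in which a Person named Hugo Weaving also acted.
  AND NOT (movie)<-[:ACTED_IN]-(:Person {name: 'Hugo Weaving'})
RETURN n.name
```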

### Example: Two patterns in a `MATCH` required

Here is another example where two patterns are necessary.

Suppose we want to retrieve the movies that Meg Ryan acted in and their respective directors, as well as the other actors that acted in these movies. Here is the query to do this:

``````MATCH (meg:Person)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person),
(other:Person)-[:ACTED_IN]->(m)
WHERE meg.name = 'Meg Ryan'
RETURN m.title AS movie, d.name AS director, other.name AS `co-actors```````

Here is the result returned:

### Traversal with patterns

During a query, you want to minimize the number of paths traversed. In some cases, however, you can only retrieve the nodes, relationships, or paths of interest using multiple patterns or even multiple `MATCH` clauses.

Here is an example query where multiple `MATCH` clauses are used:

``````MATCH (valKilmer:Person)-[:ACTED_IN]->(m:Movie)
MATCH (actor:Person)-[:ACTED_IN]->(m)
WHERE valKilmer.name = 'Val Kilmer'
RETURN m.title AS movie, actor.name``````

The first `MATCH` clause retrieves the Val Kilmer node and follows the ACTED_IN relationship to the movie, Top Gun. The anchor of this `MATCH` clause is the Val Kilmer Person node. The second `MATCH` clause retrieves all Person nodes that have the ACTED_IN relationship with the movie, Top Gun. The anchor of this `MATCH` clause is the Top Gun Movie node.

When the query engine traverses the graph for these `MATCH` clauses, we see that the ACTED_IN relationship is traversed twice.

Here is the result returned:

### Traversal: Multiple patterns in a `MATCH` clause

Here is the same example where multiple patterns are specified in a single `MATCH` clause:

``````MATCH (valKilmer:Person)-[:ACTED_IN]->(m:Movie),
(actor:Person)-[:ACTED_IN]->(m)
WHERE valKilmer.name = 'Val Kilmer'
RETURN m.title AS movie, actor.name``````

The `MATCH` clause retrieves the Val Kilmer node and uses the ACTED_IN relationship to retrieve the Top Gun node, then it uses the movie node to retrieve all actors. With this scenario, the ACTED_IN relationship is only traversed once. We already know the Person node for Val Kilmer so we need not return it.

The result returned is smaller because it does not include the Val Kilmer node.

A best practice is to traverse as few nodes as possible, so in this example, using multiple `MATCH` patterns is best.
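If you do need two `MATCH` clauses but still want to leave Val Kilmer out of the result, you can exclude him explicitly. The following is a sketch of that idea, assuming the same Movie graph. The `actor <> valKilmer` comparison is an addition for illustration and is not part of the original queries.

```cypher
// First MATCH: anchor on Val Kilmer and find his movies.
MATCH (valKilmer:Person)-[:ACTED_IN]->(m:Movie)
WHERE valKilmer.name = 'Val Kilmer'
// Second MATCH: find everyone who acted in those movies.
MATCH (actor:Person)-[:ACTED_IN]->(m)
// Relationships may be reused across separate MATCH clauses,
// so Val Kilmer is filtered out by comparing the nodes directly.
WHERE actor <> valKilmer
RETURN m.title AS movie, actor.name
```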

### Specifying varying length paths

Any graph that represents social networking, trees, or hierarchies will most likely have multiple paths of varying lengths. Think of the connected relationship in LinkedIn and how connections are made by people connected to more people. The Movie database for this training does not have much depth of relationships, but it does have the :FOLLOWS relationship that you learned about earlier:

You write a `MATCH` clause where you want to find all of the followers of the followers of a Person by specifying a numeric value for the number of hops in the path. Here is an example where we want to retrieve all Person nodes that are exactly two hops away:

``````MATCH (follower:Person)-[:FOLLOWS*2]->(p:Person)
WHERE follower.name = 'Paul Blythe'
RETURN p.name``````

Here is the result returned:

If we had specified `[:FOLLOWS*]` rather than `[:FOLLOWS*2]`, the query would return all Person nodes that are in the `:FOLLOWS` path from Paul Blythe.
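To see how far away each person is when you use `[:FOLLOWS*]`, you can assign the path to a variable and return its length. This sketch assumes the Movie graph used in this training; the variable name `path` and the alias `hops` are chosen here for illustration.

```cypher
// Follow :FOLLOWS relationships for any number of hops from Paul Blythe.
MATCH path = (follower:Person)-[:FOLLOWS*]->(p:Person)
WHERE follower.name = 'Paul Blythe'
// length(path) is the number of relationships in the matched path.
RETURN p.name, length(path) AS hops
```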

### Syntax: Varying length patterns – 1

Here are simplified syntax examples for how varying length patterns are specified in Cypher:

Retrieve all paths of any length with the relationship, :RELTYPE from nodeA to nodeB and beyond:

``(nodeA)-[:RELTYPE*]->(nodeB)``

Retrieve all paths of any length with the relationship, :RELTYPE from nodeA to nodeB or from nodeB to nodeA and beyond. This is usually a very expensive query so you should place limits on how many nodes are retrieved:

``(nodeA)-[:RELTYPE*]-(nodeB)``
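As a hedged illustration of placing limits on an undirected varying length query, the sketch below applies the syntax to the `:FOLLOWS` relationship in the Movie graph. The upper bound of 3 hops and the `LIMIT` of 10 rows are arbitrary values chosen for this example.

```cypher
// Undirected traversal, restricted to at most 3 hops.
MATCH (nodeA:Person)-[:FOLLOWS*..3]-(nodeB:Person)
WHERE nodeA.name = 'Paul Blythe'
// DISTINCT removes people reached by more than one path;
// LIMIT caps the number of rows returned.
RETURN DISTINCT nodeB.name
LIMIT 10
```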

### Syntax: Varying length patterns – 2

Retrieve the paths of length 3 with the relationship, :RELTYPE from nodeA to nodeB:

``(nodeA)-[:RELTYPE*3]->(nodeB)``

Retrieve the paths of lengths 1, 2, or 3 with the relationship, :RELTYPE from nodeA to nodeB. Each path contains one, two, or three :RELTYPE relationships (up to three hops):

``(nodeA)-[:RELTYPE*1..3]->(nodeB)``

### Finding the shortest path

A built-in function that you may find useful in a graph that has many ways of traversing the graph to get to the same node is the `shortestPath()` function. Asking for only a shortest path between two nodes, rather than every path between them, can improve the performance of the query.

In this example, we want to discover a shortest path between the movies The Matrix and A Few Good Men. In our `MATCH` clause, we set the variable p to the result of calling `shortestPath()`, and then return p. In the call to `shortestPath()`, notice that we specify `[*]` for the relationship, with no relationship type. This means that relationships of any type, for any number of hops, can be used for the traversal.

``````MATCH p = shortestPath((m1:Movie)-[*]-(m2:Movie))
WHERE m1.title = 'A Few Good Men' AND
m2.title = 'The Matrix'
RETURN p``````

When you specify this `MATCH` clause to use the `shortestPath()` function as shown here with an unbounded varying length, you will see this warning:

You should heed the warning, especially for large graphs. You can also read the Graph Data Science documentation about the shortest path algorithm, which performs even better than the one that is built into Cypher.

Here is the result returned:

Notice that the graph engine has traversed many types of relationships to get to the end node.

When you use `shortestPath()`, you can specify an upper limit for the length of the shortest path. In addition, you should aim to provide patterns for the from and to nodes that execute efficiently. For example, use labels and indexes.
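Here is a sketch of the earlier query with an upper limit on the path length. The bound of 10 hops is an arbitrary value chosen for illustration; pick a bound that suits your own graph.

```cypher
// Same shortest path query, but the path may contain at most 10 relationships.
MATCH p = shortestPath((m1:Movie)-[*..10]-(m2:Movie))
WHERE m1.title = 'A Few Good Men' AND
      m2.title = 'The Matrix'
// length(p) reports how many relationships the shortest path contains.
RETURN p, length(p) AS hops
```

If no path exists within the bound, the query returns no rows.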

### Returning a subgraph

In using `shortestPath()`, the return type is a path. A subgraph is essentially a set of paths derived from your `MATCH` clause.

For example, here is a query where we want a subgraph of all nodes connected to the movie, The Replacements:

``````MATCH paths = (m:Movie)-[rel]-(p:Person)
WHERE m.title = 'The Replacements'
RETURN paths``````

If you have unset Connect result nodes in Neo4j Browser, the result is still visualized as a graph because the query has returned a set of paths, which form a subgraph.

Here is the result of this query:

If you view the result as text, you will see that it is simply a set of rows where a movie is connected to a person:

Some actor relationships have data for the roles property or summary property of the relationship. Note that in this text, the name of the relationship is not shown, but it is in the graph visualization. Later in this course, you will learn more about working with lists, which is what this data represents.
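If you want the relationship name to appear in the text result, one option is to return it explicitly with the `type()` function instead of returning the whole path. This is a minimal sketch against the same Movie graph; the alias `relationship` is chosen here for illustration.

```cypher
// Return one row per connection, including the relationship type as text.
MATCH (m:Movie)-[rel]-(p:Person)
WHERE m.title = 'The Replacements'
RETURN p.name, type(rel) AS relationship, m.title
```

Because this query returns property values rather than paths, the result is tabular data rather than a subgraph.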

The APOC library is very useful if you want to query the graph to obtain subgraphs.
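As a hedged example, the sketch below uses the APOC procedure `apoc.path.subgraphAll` to collect the nodes and relationships within one hop of The Replacements. It assumes that the APOC library is installed in your Neo4j instance, which is not the case by default, and the `maxLevel` value of 1 is chosen here for illustration. Calling procedures has not been covered yet in this training, so treat this as a preview.

```cypher
// Find the starting node for the subgraph.
MATCH (m:Movie)
WHERE m.title = 'The Replacements'
// Expand from m up to maxLevel hops and collect what was reached.
CALL apoc.path.subgraphAll(m, {maxLevel: 1})
YIELD nodes, relationships
RETURN nodes, relationships
```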

### Specifying optional pattern matching

`OPTIONAL MATCH` matches patterns with your graph, just like `MATCH` does. The difference is that if no matches are found, `OPTIONAL MATCH` will use nulls for missing parts of the pattern. `OPTIONAL MATCH` could be considered the Cypher equivalent of the outer join in SQL.

Here is a subgraph of our movies graph with all people named James and their relationships:

Here is an example where we query the graph for all people whose name starts with James. The `OPTIONAL MATCH` is specified to include people who have reviewed movies:

``````MATCH (p:Person)
WHERE p.name STARTS WITH 'James'
OPTIONAL MATCH (p)-[r:REVIEWED]->(m:Movie)
RETURN p.name, type(r), m.title``````

Here is the result returned:

Notice that for all rows that do not have the :REVIEWED relationship, a null value is returned for the movie part of the query, as well as the relationship.
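For comparison, here is a sketch of the same query with a plain `MATCH` in place of `OPTIONAL MATCH`, assuming the same Movie graph. Because every part of a `MATCH` pattern must be found, people named James who have not reviewed a movie are dropped from the result instead of being returned with null values, much like an inner join in SQL.

```cypher
MATCH (p:Person)
WHERE p.name STARTS WITH 'James'
// Plain MATCH: rows without a :REVIEWED relationship are removed.
MATCH (p)-[r:REVIEWED]->(m:Movie)
RETURN p.name, type(r), m.title
```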

### Exercise 5: Working with patterns in queries

In the query edit pane of Neo4j Browser, execute the browser command:

:play 4.0-intro-neo4j-exercises

and follow the instructions for Exercise 5.

Note: This exercise has 6 steps. Estimated time to complete: 30 minutes.

### Question 1

Given this Cypher query:

``````MATCH (follower:Person)-[:FOLLOWS]->(reviewer:Person)-[:REVIEWED]->(m:Movie)
WHERE m.title = 'The Replacements' RETURN follower.name, reviewer.name``````

What is the first node that is retrieved by the query engine?

Select the correct answer.

• The first Person node with a FOLLOWS relationship
• The first Person node with a REVIEWED relationship
• The Movie node for the movie, The Replacements
• The first Movie node in the alphabetical list of movies in the graph

### Question 2

We want a query that returns a list of people who acted in movies released later than 2005 and for those movies, also return title and released year of the movie, as well as the name of the writer. How can you correct this query?

``````MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
(m)<-[:WROTE]-(w:Person)
WHERE m.released > 2005
RETURN a.name, m.title, m.released, w.name``````

Select the correct answer.

• The second line should be: `(m2:Movie)<-[:WROTE]-(w:Person)`.
• Add a comma after the first pattern in the `MATCH` clause.
• The second line should be: `(m2:Movie)<-[:WROTE]-(a)`.
• Add a `MATCH` clause at the beginning of the second line.

### Question 3

Suppose you have a graph of Person nodes representing a social network graph. A Person node can have an IS_FRIENDS_WITH relationship with any other Person node. Like in Facebook, there can be a long path of connections between people. What Cypher `MATCH` clause would you use to find all people in this graph that are two to four hops away from each other?

Select the correct answer.

• `MATCH (p:Person)-[:IS_FRIENDS_WITH*2..4]->(p2:Person)`
• `MATCH (p:Person)-[:IS_FRIENDS_WITH*2-4]->(p2:Person)`
• `MATCH (p:Person)-[:IS_FRIENDS_WITH,2-4]->(p2:Person)`
• `MATCH (p:Person)-[:IS_FRIENDS_WITH,2,4]->(p2:Person)`

### Summary

You should now be able to write Cypher statements to:

• Specify multiple `MATCH` patterns.
• Specify multiple `MATCH` clauses.
• Specify varying length paths.
• Return a subgraph.
• Specify `OPTIONAL` in a query.
