Concepts

This is a step-by-step guide to the concepts behind graph pattern matching. It starts with the simple building blocks of graph patterns: node patterns and relationship patterns. It then shows how those are composed into path patterns that match fixed-length paths, variable-length paths and paths that have cycles in them.

The model data in the examples below are based on the UK national rail network, using publicly available datasets.

Node patterns

Every graph pattern contains at least one node pattern. The simplest graph pattern is a single, empty node pattern:

MATCH ()

The empty node pattern matches every node in a property graph. In order to obtain a reference to the nodes matched, a variable needs to be declared in the node pattern:

MATCH (n)

With this reference, node properties can be accessed:

MATCH (n)
RETURN n.name

Adding a label expression to the node pattern means only nodes with labels that match will be returned. The following matches nodes that have the Stop label:

MATCH (n:Stop)

The following more complex label expression matches all nodes that are either a TrainStation and a BusStation or StationGroup:

MATCH (n:(TrainStation&BusStation)|StationGroup)

A map of property names and values can be used to match on node properties based on equality with the specified values. The following matches nodes that have their mode property equal to Rail:

MATCH (n { mode: 'Rail' })

More general predicates can be expressed with a WHERE clause. The following matches nodes whose name property starts with Preston:

MATCH (n:Station WHERE n.name STARTS WITH 'Preston')

See the node patterns reference section for more details.

Relationship patterns

The simplest possible relationship pattern is a pair of dashes:

--

This pattern matches a relationship with any direction and does not filter on any relationship type or property. Unlike a node pattern, a relationship pattern cannot be used in a MATCH clause without node patterns at both ends. See path patterns for more details.

In order to obtain a reference to the relationships matched by the pattern, a relationship variable needs to be declared in the pattern by adding the variable name in square brackets in between the dashes:

-[r]-

To match a specific direction, add < or > to the left or right hand side respectively:

-[r]->

To match on a relationship type, add the type name after a colon:

-[:CALLS_AT]->

Similar to node patterns, a map of property names and values can be added to filter on properties of the relationship based on equality with the specified values:

-[{ distance: 0.24, duration: 'PT4M' }]->

A WHERE clause can be used for more general predicates:

-[r WHERE time() + duration(r.duration) < time('22:00') ]->

See the relationship patterns reference section for more details.

Path patterns

Any valid path starts and ends with a node, with relationships between each node (if there is more than one node). Path patterns have the same restrictions, and for all valid path patterns the following are true:

  • They have at least one node pattern.

  • They begin and end with a node pattern.

  • They alternate between nodes and relationships.

These are all valid path patterns:

()
(s)--(e)
(:Station)--()<--(m WHERE m.departs > time('12:00'))-->()-[:NEXT]->(n)

These are invalid path patterns:

-->
()-->
()-->-->()

Path pattern matching

This section contains an example of matching a path pattern to paths in a property graph.

It uses the following graph:

path pattern example graph

To recreate the graph, run the following query against an empty Neo4j database:

CREATE (pmr:Station {name: 'Peckham Rye'}),
  (dmk:Station {name: 'Denmark Hill'}),
  (vic:Station {name: 'London Victoria'}),
  (clp:Station {name: 'Clapham High Street'}),
  (eph:Station {name: 'Elephant & Castle'}),
  (vic)<-[:CALLS_AT]-(s1:Stop {departs: time('11:55')}),
  (dmk)<-[:CALLS_AT]-(s2:Stop {departs: time('11:44')})-[:NEXT]->(s1),
  (pmr)<-[:CALLS_AT]-(s3:Stop {departs: time('11:40')})-[:NEXT]->(s2),
  (clp)<-[:CALLS_AT]-(s4:Stop {departs: time('11:41')}),
  (dmk)<-[:CALLS_AT]-(s5:Stop {departs: time('11:37')})-[:NEXT]->(s4),
  (pmr)<-[:CALLS_AT]-(s6:Stop {departs: time('11:33')})-[:NEXT]->(s5),
  (eph)<-[:CALLS_AT]-(s7:Stop {departs: time('11:54')}),
  (dmk)<-[:CALLS_AT]-(s8:Stop {departs: time('11:47')})-[:NEXT]->(s7),
  (pmr)<-[:CALLS_AT]-(s9:Stop {departs: time('11:44')})-[:NEXT]->(s8)

The graph contains a number of train Stations and Stops. A Stop represents the arrival and departure of a train that CALLS_AT a Station. Each Stop forms part of a sequence of Stops connected by relationships with the type NEXT, representing the order of calling points made by a train service.

The graph shows three chains of Stops that represent different train services. Each of these services calls at the Station with the name Denmark Hill.

To return all Stops that call at the Station Denmark Hill, the following motif is used (the term motif is used to describe the pattern looked for in the graph):

path pattern motif

In this case, three paths in the graph match the structure of the motif (plus the predicate anchoring to the Station Denmark Hill):

path pattern solutions

In order to return the name of each Stop that calls at a Station, declare a variable in the Stop node pattern. The results will then have a row containing the departs value of each Stop for each match shown above:

Query
MATCH (s:Stop)-[:CALLS_AT]->(:Station {name: 'Denmark Hill'})
RETURN s.departs AS departureTime
Table 1. Result
departureTime

"11:44:00Z"

"11:47:00Z"

"11:37:00Z"

Rows: 3

Equijoins

An equijoin is an operation on paths that requires more than one of the nodes or relationships of the paths to be the same. The equality between the nodes or relationships is specified by declaring the same variable in multiple node patterns or relationship patterns. An equijoin allows cycles to be specified in a path pattern. See graph patterns for more complex patterns.

This section uses the following graph:

patterns equijoins

To recreate the graph, run the following query against an empty Neo4j database:

CREATE (bhi:Station {name: 'Birmingham International'}),
  (cov:Station {name: 'Coventry'}),
  (eus:Station  {name: 'London Euston'}),
  (bhi)<-[:CALLS_AT]-(s1:Stop {departs: time('12:03')}),
  (cov)<-[:CALLS_AT]-(s2:Stop {departs: time('11:33')})-[:NEXT]->(s1),
  (eus)<-[:CALLS_AT]-(s3:Stop {departs: time('15:54')}),
  (cov)<-[:CALLS_AT]-(s4:Stop {departs: time('14:45')})-[:NEXT]->(s3),
  (cov)<-[:CALLS_AT]-(s5:Stop {departs: time('09:34')}),
  (eus)<-[:CALLS_AT]-(s6:Stop {departs: time('08:40')})-[:NEXT]->(s5)

To illustrate how equijoins work, we will use the problem of finding a round trip between two Stations.

In this example scenario, a passenger starts their outbound journey at London Euston Station and ends at Coventry Station. The return journey will be the reverse order of those Stations.

The graph has three different services, two of which would compose the desired round trip, and a third which would send the passenger to Birmingham International.

The solution is the following path with a cycle:

patterns equijoins solution2

If unique properties exist on the node where the cycle "join" occurs in the path, then it is possible to repeat the node pattern with a predicate matching on the unique property. The following motif demonstrates how that can be achieved, repeating a Station node pattern with the name London Euston:

patterns equijoins motif

The path pattern equivalent is:

(:Station {name: 'London Euston'})<-[:CALLS_AT]-(:Stop)-[:NEXT]->(:Stop)
  -[:CALLS_AT]->(:Station {name: 'Coventry'})<-[:CALLS_AT]-(:Stop)
  -[:NEXT]->(:Stop)-[:CALLS_AT]->(:Station {name: 'London Euston'})

There may be cases where a unique predicate is not available. In this case, an equijoin can be used to define the desired cycle by using a repeated node variable. In the current example, if you declare the same node variable n in both the first and last node patterns, then the node patterns must match the same node:

patterns equijoins motif2

Putting this path pattern with an equijoin in a query, the times of the outbound and return journeys can be returned:

Query
MATCH (n:Station {name: 'London Euston'})<-[:CALLS_AT]-(s1:Stop)
  -[:NEXT]->(s2:Stop)-[:CALLS_AT]->(:Station {name: 'Coventry'})
  <-[:CALLS_AT]-(s3:Stop)-[:NEXT]->(s4:Stop)-[:CALLS_AT]->(n)
RETURN s1.departs+'-'+s2.departs AS outbound,
  s3.departs+'-'+s4.departs AS `return`
Table 2. Result
outbound return

"08:40:00Z-09:34:00Z"

"14:45:00Z-15:54:00Z"

Rows: 1

Quantified path patterns

This feature was introduced in Neo4j 5.9.

All the path patterns discussed so far have had a fixed length. This section considers how to match paths of varying length by using quantified path patterns, allowing you to search for paths whose lengths are unknown or within a specific range.

Quantified path patterns can be useful when, for example, searching for all nodes that can be reached from an anchor node, finding all paths connecting two nodes, or when traversing a hierarchy that may have differing depths.

This example uses a new graph:

patterns qpp calling points

To recreate the graph, run the following query against an empty Neo4j database:

Query
CREATE (pmr:Station {name: 'Peckham Rye'}),
  (dmk:Station {name: 'Denmark Hill'}),
  (clp:Station {name: 'Clapham High Street'}),
  (wwr:Station {name: 'Wandsworth Road'}),
  (clj:Station {name: 'Clapham Junction'}),
  (s1:Stop {arrives: time('17:19'), departs: time('17:20')}),
  (s2:Stop {arrives: time('17:12'), departs: time('17:13')}),
  (s3:Stop {arrives: time('17:10'), departs: time('17:11')}),
  (s4:Stop {arrives: time('17:06'), departs: time('17:07')}),
  (s5:Stop {arrives: time('16:58'), departs: time('17:01')}),
  (s6:Stop {arrives: time('17:17'), departs: time('17:20')}),
  (s7:Stop {arrives: time('17:08'), departs: time('17:10')}),
  (clj)<-[:CALLS_AT]-(s1), (wwr)<-[:CALLS_AT]-(s2),
  (clp)<-[:CALLS_AT]-(s3), (dmk)<-[:CALLS_AT]-(s4),
  (pmr)<-[:CALLS_AT]-(s5), (clj)<-[:CALLS_AT]-(s6),
  (dmk)<-[:CALLS_AT]-(s7),
  (s5)-[:NEXT {distance: 1.2}]->(s4),(s4)-[:NEXT {distance: 0.34}]->(s3),
  (s3)-[:NEXT {distance: 0.76}]->(s2), (s2)-[:NEXT {distance: 0.3}]->(s1),
  (s7)-[:NEXT {distance: 1.4}]->(s6)

Each Stop on a service CALLS_AT one Station. Each Stop has the properties arrives and departs that give the times the train is at the Station. Following the NEXT relationship of a Stop will give the next Stop of the service.

For this example, a path pattern is constructed to match each of the services that allow passengers to travel from Denmark Hill to Clapham Junction. The following shows the two paths that the path pattern should match:

patterns qpp solutions

The following motif represents a fixed-length path pattern that matches the service that departs from Denmark Hill station at 17:07:

patterns qpp motif1

To match the second train service, leaving Denmark Hill at 17:10, a shorter path pattern is needed:

patterns qpp motif2

Translating the motifs into Cypher®, and adding predicates to match the origin and destination Stations, yields the following two path patterns respectively:

(:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(:Stop)
  -[:NEXT]->(:Stop)
  -[:NEXT]->(:Stop)
  -[:NEXT]->(:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })
(:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(:Stop)
  -[:NEXT]->(:Stop)-[:CALLS_AT]->
(:Station { name: 'Clapham Junction' })

To return both solutions in the same query using these fixed-length path patterns, a UNION of two MATCH statements would be needed. For example, the following query returns the departure of the two services:

Query
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
        -[:NEXT]->(:Stop)
        -[:NEXT]->(:Stop)
        -[:NEXT]->(a:Stop)-[:CALLS_AT]->
      (:Station { name: 'Clapham Junction' })
RETURN d.departs AS departureTime, a.arrives AS arrivalTime
UNION
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
        -[:NEXT]->(a:Stop)-[:CALLS_AT]->
      (:Station { name: 'Clapham Junction' })
RETURN d.departs AS departureTime, a.arrives AS arrivalTime
Table 3. Result
departureTime arrivalTime

"17:07:00Z"

"17:19:00Z"

"17:10:00Z"

"17:17:00Z"

Rows: 2

The problem with this solution is that not only is it verbose, it can only be used where the lengths of the target paths are known in advance. Quantified path patterns solve this problem by extracting repeating parts of a path pattern into parentheses and applying a quantifier. That quantifier specifies a range of possible repetitions of the extracted pattern to match on. For the current example, the first step is identifying the repeating pattern, which in this case is the sequence of alternating Stop nodes and NEXT relationships, representing one segment of a Service:

(:Stop)-[:NEXT]->(:Stop)

The shortest path has one instance of this pattern, the longest three. So the quantifier applied to the wrapper parentheses is the range one to three, expressed as {1,3}:

((:Stop)-[:NEXT]->(:Stop)){1,3}

This also includes repetitions of two, but in this case this repetition will not return matches. To understand the semantics of this pattern, it helps to work through the expansion of the repetitions. Here are the three repetitions specified by the quantifier, combined into a union of path patterns:

(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)(:Stop)-[:NEXT]->(:Stop)(:Stop)-[:NEXT]->(:Stop)

The union operator (|) here is used for illustration only; using it this way is not part of Cypher syntax. Where two node patterns are next to each other in the expansion above, they must necessarily match the same node: the next segment of a Service starts where the previous segment ends. As such they can be rewritten as a single node pattern with any filtering condition combined conjunctively. In this example this is trivial, because the filtering applied to those nodes is just the label Stop:

patterns qpp illustration

With this, the union of path patterns simplifies to:

(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)-[:NEXT]->(:Stop) |
(:Stop)-[:NEXT]->(:Stop)-[:NEXT]->(:Stop)-[:NEXT]->(:Stop)

The segments of the original path pattern that connect the Stations to the Stops can also be rewritten. Here is what those segments look like when concatenated with the first repetition:

(:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(:Stop)
(:Stop)-[:NEXT]->(:Stop)
(:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })

The original MATCH clause now has the following three parts:

patterns qpp query breakdown

Translating the union of fixed-length path patterns into a quantified path pattern results in a pattern that will return the correct paths. The following query adds a RETURN clause that yields the departure and arrival times of the two services:

Query
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
      ((:Stop)-[:NEXT]->(:Stop)){1,3}
      (a:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
RETURN d.departs AS departureTime, a.arrives AS arrivalTime
Table 4. Result
departureTime arrivalTime

"17:10Z"

"17:17Z"

"17:07Z"

"17:19Z"

Rows: 2

Quantified relationships

Quantified relationships allow some simple quantified path patterns to be re-written in a more succinct way. Continuing with the example of Stations and Stops from the previous section, consider the following query:

Query
MATCH (d:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(n:Stop)
      ((:Stop)-[:NEXT]->(:Stop)){1,10}
      (m:Stop)-[:CALLS_AT]->(a:Station { name: 'Clapham Junction' })
WHERE m.arrives < time('17:18')
RETURN n.departs AS departureTime

If the relationship NEXT only connects Stop nodes, the :Stop label expressions can be removed:

Query
MATCH (d:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(n:Stop)
      (()-[:NEXT]->()){1,10}
      (m:Stop)-[:CALLS_AT]->(a:Station { name: 'Clapham Junction' })
WHERE m.arrives < time('17:18')
RETURN n.departs AS departureTime

When the quantified path pattern has one relationship pattern, it can be abbreviated to a quantified relationship. A quantified relationship is a relationship pattern with a postfix quantifier. Below is the previous query rewritten with a quantified relationship:

Query
MATCH (d:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-
        (n:Stop)-[:NEXT]->{1,10}(m:Stop)-[:CALLS_AT]->
        (a:Station { name: 'Clapham Junction' })
WHERE m.arrives < time('17:18')
RETURN n.departs AS departureTime

The scope of the quantifier {1,10} is the relationship pattern -[:NEXT]-> and not the node patterns abutting it. More generally, where a path pattern contained in a quantified path pattern has the following form:

(() <relationship pattern> ()) <quantifier>

then it can be re-written as follows:

<relationship pattern> <quantifier>

Prior to the introduction of quantified path patterns and quantified relationships in Neo4j 5.9, the only method in Cypher to match paths of a variable length was through variable-length relationships. This syntax is still available. It is very similar to the syntax for quantified relationships, with the following differences:

  • Position and syntax of quantifier.

  • Semantics of the asterisk symbol.

  • Type expressions are limited to the disjunction operator.

  • The WHERE clause is not allowed.

For more information, see the reference section on variable-length relationships.

Group variables

This section uses the example of Stations and Stops used in the previous section, but with an additional property distance added to the NEXT relationships:

patterns group variables graph

As the name suggests, this property represents the distance between two Stops. To return the total distance for each service connecting a pair of Stations, a variable referencing each of the relationships traversed is needed. Similarly, to extract the departs and arrives properties of each Stop, variables referencing each of the nodes traversed is required. In this example of matching services between Denmark Hill and Clapham Junction, the variables l and m are declared to match the Stops and r is declared to match the relationships. The variable origin only matches the first Stop in the path:

MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(origin)
      ((l)-[r:NEXT]->(m)){1,3}
      ()-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })

Variables that are declared inside quantified path patterns are known as group variables. They are so called because, when referred outside of the quantified path pattern, they are lists of the nodes or relationships they are bound to in the match. To understand how to think about the way group variables are bound to nodes or relationships, it helps to expand the quantified path pattern, and observe how the different variables match to the elements of the overall matched path. Here the three different expansions for each value in the range given by the quantifier {1,3}:

(l1)-[r1:NEXT]->(m1) |
(l1)-[r1:NEXT]->(m1)(l2)-[r2:NEXT]->(m2) |
(l1)-[r1:NEXT]->(m1)(l2)-[r2:NEXT]->(m2)(l3)-[r3:NEXT]->(m3)

The subscript of each variable indicates which instance of the path pattern repetition they belong to. The following diagram shows the variable bindings of the path pattern with three repetitions, which matches the service that departs Denmark Hill at 17:07. It traces the node or relationship that each indexed variable is bound to. Note that the index increases from right to left as the path starts at Denmark Hill:

patterns group variables graph2

For this matched path, the group variables have the following bindings:

l => [n2, n3, n4]
r => [r2, r3, r4]
m => [n3, n4, n5]

The second solution is the following path:

patterns group variables graph3

The following table shows the bindings for both matches, including the variable origin. In contrast to the group variables, origin is a singleton variable due to being declared outside the quantification. Singleton variables bind at most to one node or relationship.

origin l r m

n2

[n2, n3, n4]

[r2, r3, r4]

[n3, n4, n5]

n7

[n7]

[r8]

[n8]

Returning to the original goal, which was to return the sequence of depart times for the Stops and the total distance of each service, the final query exploits the compatibility of group variables with list comprehensions and list functions such as reduce():

Query
MATCH (:Station {name: 'Denmark Hill'})<-[:CALLS_AT]-(origin)
      ((l)-[r:NEXT]->(m)){1,3}
      ()-[:CALLS_AT]->(:Station {name: 'Clapham Junction'})
RETURN origin.departs + [stop in m | stop.departs] AS departureTimes,
       reduce(acc = 0.0, next in r | round(acc + next.distance, 2)) AS totalDistance
Table 5. Result
departureTimes totalDistance

["17:10:00Z", "17:20:00Z"]

1.4

["17:07:00Z", "17:11:00Z", "17:13:00Z", "17:20:00Z"]

1.4

Rows: 2

Shortest path

This section uses the following graph:

patterns shortestpath graph

To recreate it, run the following query against an empty Neo4j database:

CREATE (asc:Station {name:'Ashchurch'}),
  (bmv:Station {name:'Bromsgrove'}),
  (cnm:Station {name:'Cheltenham Spa'}),
  (dtw:Station {name:'Droitwich Spa'}),
  (hby:Station {name:'Hartlebury'}),
  (psh:Station {name:'Pershore'}),
  (wop:Station {name:'Worcestershire Parkway Ll'}),
  (wof:Station {name:'Worcester Foregate Street'}),
  (wos:Station {name:'Worcester Shrub Hill'})
SET asc.location = point({longitude: -2.10876, latitude: 51.9989}),
  bmv.location = point({longitude: -2.04978, latitude: 52.3206}),
  cnm.location = point({longitude: -2.09962, latitude: 51.8974}),
  dtw.location = point({longitude: -2.15836, latitude: 52.2682}),
  hby.location = point({longitude: -2.22112, latitude: 52.33449}),
  psh.location = point({longitude: -2.07154, latitude: 52.13055}),
  wop.location = point({longitude: -2.16003, latitude: 52.15605}),
  wof.location = point({longitude: -2.2216, latitude: 52.19514}),
  wos.location = point({longitude: -2.20941, latitude: 52.19473})
CREATE (asc)-[:LINK {distance: 7.25}]->(cnm),
  (asc)-[:LINK {distance: 11.29}]->(wop),
  (asc)-[:LINK {distance: 14.75}]->(wos),
  (bmv)-[:LINK {distance: 31.14}]->(cnm),
  (bmv)-[:LINK {distance: 6.16}]->(dtw),
  (bmv)-[:LINK {distance: 12.6}]->(wop),
  (dtw)-[:LINK {distance: 5.64}]->(hby),
  (dtw)-[:LINK {distance: 6.03}]->(wof),
  (dtw)-[:LINK {distance: 5.76}]->(wos),
  (psh)-[:LINK {distance: 4.16}]->(wop),
  (wop)-[:LINK {distance: 3.71}]->(wos),
  (wof)-[:LINK {distance: 0.65}]->(wos)

Single shortest path

The shortestPath function returns the path between two nodes with the fewest number of relationships. If more than one shortest path exists, then one is picked non-deterministically.

For example, the following returns the shortest path between Hartlebury and Cheltenham Spa:

Query
MATCH p = shortestPath((hby)-[:LINK*]-(cnm))
WHERE hby.name = 'Hartlebury' AND cnm.name = 'Cheltenham Spa'
RETURN [n in nodes(p) | n.name] AS stops
Table 6. Result
stops

["Hartlebury", "Droitwich Spa", "Bromsgrove", "Cheltenham Spa"]

Rows: 1

The path pattern passed to the shortestPath function defines the pattern that the shortest path must conform to. It needs to be a variable-length relationship with a single relationship pattern. For more information, see the reference section on variable-length relationships.

Single shortest path with predicates

If the MATCH clause of the shortestPath function includes a WHERE clause, a shortest path that satisfies the WHERE clause conditions is returned if one exists. This is different to first finding the shortest path using the path pattern and then applying the WHERE clause condition, which could potentially return no results.

For example, the following query returns the shortest path, with the condition that the distance between stations is never 20 miles or more:

Query
MATCH p = shortestPath((hby)-[:LINK*]-(cnm))
WHERE all(link in relationships(p) WHERE link.distance < 20) AND
      hby.name = 'Hartlebury' AND cnm.name = 'Cheltenham Spa'
RETURN [n in nodes(p) | n.name] AS stops
Table 7. Result
stops

["Hartlebury", "Droitwich Spa", "Worcester Shrub Hill", "Ashchurch", "Cheltenham Spa"]

Rows: 1

If the evaluation of the WHERE clause conditions is forced to happen after the shortestPath function returns a solution, for example by moving the WHERE clause so it comes after a WITH clause, the shortest path found will include LINK relationships with distance greater than 20, which will subsequently get filtered out:

Query
MATCH p = shortestPath((hby)-[:LINK*]-(cnm))
WHERE hby.name = 'Hartlebury' AND cnm.name = 'Cheltenham Spa'
WITH p
WHERE all(link in relationships(p) WHERE link.distance < 20)
RETURN [n in nodes(p) | n.name] AS stops
Result
(no changes, no records)

When the predicate of the WHERE clause can be checked during the search for the shortest path as in the previous example, then solutions can be found efficiently. If, however, the predicate requires evaluation of the whole path before being checked, then a more exhaustive search may need to be done first. This can have a significant impact on performance. For example, the following query requires the whole path to determine the total distance between the endpoints:

Query
MATCH p = shortestPath((hby)-[link:LINK*]-(cnm))
WHERE reduce(acc = 0, l in link | acc + l.distance) > 50 AND
      hby.name = 'Hartlebury' AND cnm.name = 'Cheltenham Spa'
RETURN [n in nodes(p) | n.name] AS stops
Table 8. Result
stops

["Hartlebury", "Droitwich Spa", "Worcester Shrub Hill", "Worcestershire Parkway Ll", "Bromsgrove", "Cheltenham Spa"]

Rows: 1

On a large, highly connected graph, this can be very time consuming.

All shortest paths

The allShortestPaths function finds all paths between two nodes that have the minimum number of relationships. For example, the following returns the two shortest paths between Hartlebury and Pershore:

Query
MATCH p = allShortestPaths((hby)-[link:LINK*]-(psh))
WHERE hby.name = 'Hartlebury' AND psh.name = 'Pershore'
RETURN [n in nodes(p) | n.name] AS stops
Table 9. Result
stops

["Hartlebury", "Droitwich Spa", "Bromsgrove", "Worcestershire Parkway Ll", "Pershore"]

["Hartlebury", "Droitwich Spa", "Worcester Shrub Hill", "Worcestershire Parkway Ll", "Pershore"]

Rows: 2

Predicates in quantified path patterns

One of the pitfalls of quantified path patterns is that, depending on the graph, they can end up matching very large numbers of paths, resulting in a slow query performance. This is especially true when searching for paths with a large maximum length or when the pattern is too general. However, by using inline predicates that specify precisely which nodes and relationships should be included in the results, unwanted results will be pruned as the graph is traversed.

Here are some examples of the types of constraints you can impose on quantified path pattern traversals:

  • Nodes must have certain combinations of labels. For example, all nodes must be an Employee, but not a Contractor.

  • Relationships must have certain types. For example, all relationships in the path must be of type EMPLOYED_BY.

  • Nodes or relationships must have properties satisfying some condition. For example, all relationships must have property distance > 10.

The same example used in the shortest path section above is used here to illustrate the use of inline predicates. In that section, the shortest path in terms of number of hops was found. Here the example is developed to find the shortest path by physical distance and compared to the result from the shortestPath function.

The total distance from Hartlebury to Cheltenham Spa following the path yielded by shortestPath is given by the following query:

Query
MATCH (hby:Station {name: 'Hartlebury'}),
      (cnm:Station {name: 'Cheltenham Spa'})
MATCH p = shortestPath((hby)-[:LINK*]-(cnm))
RETURN reduce(acc = 0, r in relationships(p) | acc + r.distance)
  AS distance
Table 10. Result
distance

42.94

Rows: 1

Whether this is the shortest path by distance can be checked by looking at each path between the two end Stations and returning the first result after ordering by distance:

Query
MATCH (hby:Station {name: 'Hartlebury'}),
      (cnm:Station {name: 'Cheltenham Spa'})
MATCH p = (hby)-[:LINK]-+(cnm)
RETURN reduce(acc = 0, r in relationships(p) | acc + r.distance)
  AS distance
ORDER BY distance LIMIT 1
Table 11. Result
distance

33.4

Rows: 1

This shows that there is a route with a shorter distance than the one with fewer Stations. For a small dataset, this query will be fast, but the execution time will increase exponentially with the graph size. For a real dataset, such as the entire rail network of the UK, it will be unacceptably long.

One approach to avoid the exponential explosion in paths is to put a finite upper bound to the quantified path pattern. This works fine where the solution is known to lie within some range of hops. But in cases where this is not known, one alternative would be to make the pattern more specific by, for example, adding node labels, or by specifying a relationship direction. Another alternative would be to add an inline predicate to the quantified path pattern.

In this example, an inline predicate can be added that exploits the geospatial property location of the Stations: for each pair of Stations on the path, the second Station will be closer to the endpoint (not always true, but is assumed here to keep the example simple). To compose the predicate, the point.distance() function is used to compare the distance of the left-hand and the right-hand Station to the destination Cheltenham Spa:

Query
MATCH (hby:Station {name: 'Hartlebury'}),
      (cnm:Station {name: 'Cheltenham Spa'})
MATCH p = (hby)
          ((a)-[:LINK]-(b) WHERE point.distance(a.location, cnm.location) >
            point.distance(b.location, cnm.location))+ (cnm)
RETURN reduce(acc = 0, r in relationships(p) | acc + r.distance)
  AS distance
ORDER BY distance
Table 12. Result
distance

33.4

33.65

34.32

34.57

Rows: 4

This query shows that there are only four paths solving the query (a number that remains constant even if the data from the rest of the UK railway network was included). Using inline predicates or making quantified path patterns more specific where possible can greatly improve query performance.

Graph patterns

In addition to the single path patterns discussed so far, multiple path patterns can be combined in a comma-separated list to form a graph pattern. In a graph pattern, each path pattern is matched separately, and where node variables are repeated in the separate path patterns, the solutions are reduced via equijoins. If there are no equijoins between the path patterns, the result is a Cartesian product between the separate solutions.

The benefit of joining multiple path patterns in this way is that it allows the specification of more complex patterns than the linear paths allowed by a single path pattern. To illustrate this, another example drawn from the railway model will be used. In this example, a passenger is traveling from Starbeck to Huddersfield, changing trains at Leeds. To get to Leeds from Starbeck, the passenger can take a direct service that stops at all stations on the way. However, there is an opportunity to change at one of the stations (Harrogate) on the way to Leeds, and catch an express train, which may enable the passenger to catch an earlier train leaving from Leeds, reducing the overall journey time.

This section uses the following graph, showing the train services the passenger can use:

patterns graph patterns graph1

To recreate the graph, run the following query against an empty Neo4j database:

CREATE (hgt:Station {name: 'Harrogate'}), (lds:Station {name: 'Leeds'}),
(sbe:Station {name: 'Starbeck'}), (hbp:Station {name: 'Hornbeam Park'}),
(wet:Station {name: 'Weeton'}), (hrs:Station {name: 'Horsforth'}),
(hdy:Station {name: 'Headingley'}), (buy:Station {name: 'Burley Park'}),
(pnl:Station {name: 'Pannal'}), (hud:Station {name: 'Huddersfield'}),
(s9:Stop {arrives: time('11:53')}),
(s8:Stop {arrives: time('11:44'), departs: time('11:45')}),
(s7:Stop {arrives: time('11:40'), departs: time('11:43')}),
(s6:Stop {arrives: time('11:38'), departs: time('11:39')}),
(s5:Stop {arrives: time('11:29'), departs: time('11:30')}),
(s4:Stop {arrives: time('11:24'), departs: time('11:25')}),
(s3:Stop {arrives: time('11:19'), departs: time('11:20')}),
(s2:Stop {arrives: time('11:16'), departs: time('11:17')}),
(s1:Stop {departs: time('11:11')}), (s21:Stop {arrives: time('11:25')}),
(s211:Stop {departs: time('11:00')}), (s10:Stop {arrives: time('11:45')}),
(s101:Stop {departs: time('11:20')}), (s11:Stop {arrives: time('12:05')}),
(s111:Stop {departs: time('11:40')}), (s12:Stop {arrives: time('12:07')}),
(s121:Stop {departs: time('11:50')}), (s13:Stop {arrives: time('12:37')}),
(s131:Stop {departs: time('12:20')}),
(lds)<-[:CALLS_AT]-(s9), (buy)<-[:CALLS_AT]-(s8)-[:NEXT]->(s9),
(hdy)<-[:CALLS_AT]-(s7)-[:NEXT]->(s8), (hrs)<-[:CALLS_AT]-(s6)-[:NEXT]->(s7),
(wet)<-[:CALLS_AT]-(s5)-[:NEXT]->(s6), (pnl)<-[:CALLS_AT]-(s4)-[:NEXT]->(s5),
(hbp)<-[:CALLS_AT]-(s3)-[:NEXT]->(s4), (hgt)<-[:CALLS_AT]-(s2)-[:NEXT]->(s3),
(sbe)<-[:CALLS_AT]-(s1)-[:NEXT]->(s2), (lds)<-[:CALLS_AT]-(s21), (hgt)<-[:CALLS_AT]-(s211)-[:NEXT]->(s21), (lds)<-[:CALLS_AT]-(s10), (hgt)<-[:CALLS_AT]-(s101)-[:NEXT]->(s10), (lds)<-[:CALLS_AT]-(s11), (hgt)<-[:CALLS_AT]-(s111)-[:NEXT]->(s11), (hud)<-[:CALLS_AT]-(s12), (lds)<-[:CALLS_AT]-(s121)-[:NEXT]->(s12), (hud)<-[:CALLS_AT]-(s13), (lds)<-[:CALLS_AT]-(s131)-[:NEXT]->(s13)

The solution to the problem assembles a set of path patterns matching the following three parts: the stopping service; the express service; and the final leg of the journey from Leeds to Huddersfield. Each changeover, from stopping to express service and from express to onward service, has to respect the fact that the arrival time of a previous leg has to be before the departure time of the next leg. This will be encoded in a single WHERE clause.

The following visualizes the three legs with different colors, and identifies the node variables used to create the equijoins and anchoring:

patterns graph patterns graph2

For the stopping service, it is assumed that the station the passenger needs to change at is unknown. As a result, the pattern needs to match a variable number of stops before and after the Stop b, the Stop that calls at the changeover station l. This is achieved by placing the quantified relationship -[:NEXT]->* on each side of node b. The ends of the path also needs to be anchored at a Stop departing from Starbeck at 11:11, as well as at a Stop calling at Leeds:

(:Station {name: 'Starbeck'})<-[:CALLS_AT]-
  (a:Stop {departs: time('11:11')})-[:NEXT]->*(b)-[:NEXT]->*
  (c:Stop)-[:CALLS_AT]->(lds:Station {name: 'Leeds'})

For the express service, the ends of the path are anchored at the stop b and Leeds station, which lds will be bound to by the first leg. Although in this particular case there are only two stops on the service, a more general pattern that can match any number of stops is used:

(b)-[:CALLS_AT]->(l:Station)<-[:CALLS_AT]-(m:Stop)-[:NEXT]->*
  (n:Stop)-[:CALLS_AT]->(lds)

Note that as Cypher only allows a relationship to be traversed once in a given match for a graph pattern, the first and second legs are guaranteed to be different train services. (See relationship uniqueness for more details.) Similarly, another quantified relationship that bridges the stops calling at Leeds station and Huddersfield station is added:

(lds)<-[:CALLS_AT]-(x:Stop)-[:NEXT]->*(y:Stop)-[:CALLS_AT]->
  (:Station {name: 'Huddersfield'})

The other node variables are for the WHERE clause or for returning data. Putting this together, the resulting query returns the earliest arrival time achieved by switching to an express service:

Query
MATCH (:Station {name: 'Starbeck'})<-[:CALLS_AT]-
        (a:Stop {departs: time('11:11')})-[:NEXT]->*(b)-[:NEXT]->*
        (c:Stop)-[:CALLS_AT]->(lds:Station {name: 'Leeds'}),
      (b)-[:CALLS_AT]->(l:Station)<-[:CALLS_AT]-(m:Stop)-[:NEXT]->*
        (n:Stop)-[:CALLS_AT]->(lds),
      (lds)<-[:CALLS_AT]-(x:Stop)-[:NEXT]->*(y:Stop)-[:CALLS_AT]->
        (:Station {name: 'Huddersfield'})
WHERE b.arrives < m.departs AND n.arrives < x.departs
RETURN a.departs AS departs,
       l.name AS changeAt,
       m.departs AS changeDeparts,
       y.arrives AS arrives
ORDER BY y.arrives LIMIT 1
Table 13. Result
departs changeAt changeDeparts arrives

"11:11:00Z"

"Harrogate"

"11:20:00Z"

"12:07:00Z"

Rows: 1