Primer

This primer covers some fundamental features of graph pattern matching with Cypher® queries.

Example graph

The example graph used in this tutorial models train Stations and train services, where each service is a chain of Stops that call at the Stations.

To recreate the graph, run the following query against an empty Neo4j database:

Query
CREATE (n1:Station {name: 'Denmark Hill'}),
(n5:Station {name: 'Battersea Park'}),
(n6:Station {name: 'Wandsworth Road'}),
(n15:Station {name: 'Clapham High Street'}),
(n16:Station {name: 'Peckham Rye'}),
(n17:Station {name: 'Brixton'}),
(n14:Station {name: 'London Victoria'}),
(n18:Station {name: 'Clapham Junction'}),
(p10:Stop {departs: time('22:37'), arrives: time('22:36')}),
(p0:Stop {departs: time('22:41'), arrives: time('22:41')}),
(p2:Stop {departs: time('22:43'), arrives: time('22:43')}),
(p17:Stop {arrives: time('22:50'), departs: time('22:50')}),
(p18:Stop {arrives: time('22:46'), departs: time('22:46')}),
(p19:Stop {departs: time('22:33'), arrives: time('22:31')}),
(p21:Stop {arrives: time('22:55')}),
(p20:Stop {departs: time('22:44'), arrives: time('22:43')}),
(p22:Stop {arrives: time('22:55')}),
(p23:Stop {arrives: time('22:48')}),
(n15)-[:LINK {distance: 1.96}]->(n1)-[:LINK {distance: 0.86}]->(n16),
(n15)-[:LINK {distance: 0.39}]->(n6)<-[:LINK {distance: 0.7}]-(n5)-[:LINK {distance: 1.24}]->(n14), (n5)-[:LINK {distance: 1.45}]->(n18),
(n14)<-[:LINK {distance: 3.18}]-(n17)-[:LINK {distance: 1.11}]->(n1),
(p2)-[:CALLS_AT]->(n6), (p17)-[:CALLS_AT]->(n5), (p19)-[:CALLS_AT]->(n16),
(p22)-[:CALLS_AT]->(n14), (p18)-[:CALLS_AT]->(n18), (p0)-[:CALLS_AT]->(n15), (p23)-[:CALLS_AT]->(n5), (p20)-[:CALLS_AT]->(n1),
(p21)-[:CALLS_AT]->(n14), (p10)-[:CALLS_AT]->(n1), (p19)-[:NEXT]->(p10)-[:NEXT]->(p0)-[:NEXT]->(p2)-[:NEXT]->(p23),
(p22)<-[:NEXT]-(p17)<-[:NEXT]-(p18), (p21)<-[:NEXT]-(p20)
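
As a quick check that the graph loaded as intended, the following sketch counts nodes per label and relationships per type. It assumes the database contained nothing else before the CREATE statement ran; the column names kind and total are illustrative choices, not part of the example data.

```cypher
// Count nodes per label (each node in this graph has exactly one label).
MATCH (n)
RETURN labels(n)[0] AS kind, count(*) AS total
UNION ALL
// Count relationships per type.
MATCH ()-[r]->()
RETURN type(r) AS kind, count(*) AS total
```

Against the graph above this should report 8 Station and 10 Stop nodes, and 8 LINK, 10 CALLS_AT and 7 NEXT relationships.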

Matching fixed-length paths

An empty pair of parentheses is a node pattern that will match any node. This example gets a count of all the nodes in the graph:

MATCH ()
RETURN count(*) AS numNodes
Table 1. Result
numNodes

18

Rows: 1

Adding a label to the node pattern will filter on nodes with that label (see label expressions). The following query gets a count of all the nodes with the label Stop:

MATCH (:Stop)
RETURN count(*) AS numStops
Table 2. Result
numStops

10

Rows: 1

Path patterns can match relationships and the nodes they connect. The following query gets the arrival time of all trains calling at Denmark Hill:

MATCH (s:Stop)-[:CALLS_AT]->(:Station {name: 'Denmark Hill'})
RETURN s.arrives AS arrivalTime
Table 3. Result
arrivalTime

"22:36:00Z"

"22:43:00Z"

Rows: 2
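
The property map in the node pattern is shorthand for an equality test. As a sketch, the same rows can be obtained by binding the Station to a variable (named st here purely for illustration) and filtering in a WHERE clause:

```cypher
// Equivalent to (:Station {name: 'Denmark Hill'}) in the pattern above.
MATCH (s:Stop)-[:CALLS_AT]->(st:Station)
WHERE st.name = 'Denmark Hill'
RETURN s.arrives AS arrivalTime
```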

Path patterns can include inline WHERE clauses. The following query gets the next calling point of the service that departs Denmark Hill at 22:37:

MATCH (n:Station {name: 'Denmark Hill'})<-[:CALLS_AT]-
        (s:Stop WHERE s.departs = time('22:37'))-[:NEXT]->
        (:Stop)-[:CALLS_AT]->(d:Station)
RETURN d.name AS nextCallingPoint
Table 4. Result
nextCallingPoint

"Clapham High Street"

Rows: 1
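
Reversing the direction of the NEXT relationship walks the service backwards instead. The following sketch, which reuses the pattern above and introduces only the illustrative alias previousCallingPoint, looks up the calling point before Denmark Hill on the same service:

```cypher
// Same anchor Stop as above, but follow NEXT against its direction.
MATCH (:Station {name: 'Denmark Hill'})<-[:CALLS_AT]-
        (s:Stop WHERE s.departs = time('22:37'))<-[:NEXT]-
        (:Stop)-[:CALLS_AT]->(d:Station)
RETURN d.name AS previousCallingPoint
```

With the example graph this should return a single row, "Peckham Rye".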

For more information, see Fixed length patterns.

Matching variable-length paths

Variable-length paths that only traverse relationships with a specified type can be matched with quantified relationships. Any variable declared in the relationship pattern is bound to a list of the relationships traversed. The following query returns the total distance traveled along each path of LINK relationships connecting the stations Peckham Rye and Clapham Junction:

Query
MATCH (:Station {name: 'Peckham Rye'})-[link:LINK]-+
        (:Station {name: 'Clapham Junction'})
RETURN reduce(acc = 0.0, l IN link | round(acc + l.distance, 2)) AS
         totalDistance
Table 5. Result
totalDistance

7.84

5.36

Rows: 2

-[:LINK]-+ is a quantified relationship. It is composed of a relationship pattern -[:LINK]- that matches relationships going in either direction, and a quantifier + that means it will match one or more relationships. As no node patterns are included with quantified relationships, they will match any intermediate nodes.
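
The + quantifier is shorthand for {1,}, and explicit bounds can be given in the same position. As a sketch, the following query limits the traversal to one or two LINK relationships in order to list the Stations close to Peckham Rye; the alias nearbyStation is an illustrative choice.

```cypher
// {1,2} matches at least one and at most two LINK relationships.
MATCH (:Station {name: 'Peckham Rye'})-[:LINK]-{1,2}(s:Station)
RETURN DISTINCT s.name AS nearbyStation
```

With the example graph this should return Denmark Hill, Clapham High Street and Brixton, in no guaranteed order.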

Variable-length paths can also be matched with quantified path patterns, which allow both WHERE clauses and accessing the nodes traversed by the path. The following query returns a list of calling points on routes from Peckham Rye to London Victoria, where no distance between stations is greater than two miles:

Query
MATCH (:Station {name: 'Peckham Rye'})
      (()-[link:LINK]-(s) WHERE link.distance <= 2)+
      (:Station {name: 'London Victoria'})
UNWIND s AS station
RETURN station.name AS callingPoint
Table 6. Result
callingPoint

"Denmark Hill"

"Clapham High Street"

"Wandsworth Road"

"Battersea Park"

"London Victoria"

Rows: 5
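
Because s and link are declared inside the quantified path pattern, outside it they are lists, so they can also be consumed directly without UNWIND. The following sketch returns one row per matched path rather than one row per Station; the aliases callingPoints and hops are illustrative.

```cypher
MATCH (:Station {name: 'Peckham Rye'})
      (()-[link:LINK]-(s) WHERE link.distance <= 2)+
      (:Station {name: 'London Victoria'})
// s is a list of nodes and link a list of relationships at this point.
RETURN [station IN s | station.name] AS callingPoints,
       size(link) AS hops
```

Only one route in the example graph satisfies the distance condition, so this should return a single row with the five calling points listed above and a hops value of 5.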

WHERE clauses inside node patterns can themselves include path patterns. The following query uses an EXISTS subquery to anchor on the last Stop in a sequence of Stops, and returns the departure time from Denmark Hill, the arrival time at the final destination, and the final destination of each service calling at Denmark Hill:

Query
MATCH (:Station {name: 'Denmark Hill'})<-[:CALLS_AT]-(s1:Stop)-[:NEXT]->+
        (sN:Stop WHERE NOT EXISTS { (sN)-[:NEXT]->(:Stop) })-[:CALLS_AT]->
        (d:Station)
RETURN s1.departs AS departure, sN.arrives AS arrival,
       d.name AS finalDestination
Table 7. Result
departure    arrival      finalDestination

"22:37:00Z"  "22:48:00Z"  "Battersea Park"

"22:44:00Z"  "22:55:00Z"  "London Victoria"

Rows: 2

Node variables declared inside quantified path patterns become bound to lists of nodes, which can be unwound and used in subsequent MATCH clauses. The following query lists the calling points of the Peckham Rye to Battersea Park train service:

Query
MATCH (:Station {name: 'Peckham Rye'})<-[:CALLS_AT]-(:Stop)
      (()-[:NEXT]->(s:Stop))+
      ()-[:CALLS_AT]->(:Station {name: 'Battersea Park'})
UNWIND s AS stop
MATCH (stop)-[:CALLS_AT]->(station:Station)
RETURN stop.arrives AS arrival, station.name AS callingPoint
Table 8. Result
arrival      callingPoint

"22:36:00Z"  "Denmark Hill"

"22:41:00Z"  "Clapham High Street"

"22:43:00Z"  "Wandsworth Road"

"22:48:00Z"  "Battersea Park"

Rows: 4

Repeating a node variable in a path pattern enables the same node to be bound more than once in a path (see equijoins). The following query finds all Stations that lie on a cycle, that is, a path of LINK relationships that starts and ends at the same Station:

Query
MATCH (n:Station)-[:LINK]-+(n)
RETURN DISTINCT n.name AS station
Table 9. Result
station

"Denmark Hill"

"Battersea Park"

"Wandsworth Road"

"Clapham High Street"

"Brixton"

"London Victoria"

Rows: 6

Complex, non-linear paths can be matched using graph patterns: comma-separated lists of path patterns that are connected via repeated node variables (equijoins). For example, a passenger is traveling from Denmark Hill and wants to join the train service to London Victoria that leaves Clapham Junction at 22:46. The following query finds the departure time from Denmark Hill as well as the changeover Station and the time of arrival there:

Query
MATCH (:Station {name: 'Denmark Hill'})<-[:CALLS_AT]-
        (s1:Stop)-[:NEXT]->+(s2:Stop)-[:CALLS_AT]->
        (c:Station)<-[:CALLS_AT]-(x:Stop),
       (:Station {name: 'Clapham Junction'})<-[:CALLS_AT]-
         (t1:Stop)-[:NEXT]->+(x)-[:NEXT]->+(:Stop)-[:CALLS_AT]->
         (:Station {name: 'London Victoria'})
WHERE t1.departs = time('22:46')
      AND s2.arrives < x.departs
RETURN s1.departs AS departure, s2.arrives AS changeArrival,
       c.name AS changeAt
Table 10. Result
departure    changeArrival  changeAt

"22:37:00Z"  "22:48:00Z"    "Battersea Park"

Rows: 1

For more information, see Variable length patterns.

Matching shortest paths

The shortest path between two nodes, measured by the number of relationships traversed, can be found using the SHORTEST keyword:

Query
MATCH p = SHORTEST 1
  (:Station {name: "Brixton"})
  (()-[:LINK]-(:Station))+
  (:Station {name: "Clapham Junction"})
RETURN [station IN nodes(p) | station.name] AS route
Table 11. Result
route

["Brixton", "London Victoria", "Battersea Park", "Clapham Junction"]

Rows: 1
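
SHORTEST ranks paths by hop count and does not consult the distance property. The following sketch makes that visible by returning the number of relationships alongside the summed LINK distances for the same path; the aliases hops and totalDistance are illustrative.

```cypher
MATCH p = SHORTEST 1
  (:Station {name: "Brixton"})
  (()-[:LINK]-(:Station))+
  (:Station {name: "Clapham Junction"})
// length(p) is the number of relationships in the path.
RETURN length(p) AS hops,
       round(reduce(acc = 0.0, r IN relationships(p) | acc + r.distance), 2)
         AS totalDistance
```

With the example graph this should return 3 hops and a total distance of 5.87. Going by the distances in the CREATE statement, the five-hop route via Denmark Hill adds up to 5.61, so the path with the fewest hops is not necessarily the one with the smallest total distance.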

To find all paths that share the minimum length, use the ALL SHORTEST keywords:

Query
MATCH p = ALL SHORTEST
  (:Station {name: "Denmark Hill"})
  (()-[:LINK]-(:Station))+
  (:Station {name: "Clapham Junction"})
RETURN [station IN nodes(p) | station.name] AS route
Table 12. Result
route

["Denmark Hill", "Clapham High Street", "Wandsworth Road", "Battersea Park", "Clapham Junction"]

["Denmark Hill", "Brixton", "London Victoria", "Battersea Park", "Clapham Junction"]

Rows: 2

In general, SHORTEST k can be used to return the k shortest paths. The following returns the two shortest paths:

Query
MATCH p = SHORTEST 2
  (:Station {name: "Denmark Hill"})
  (()-[:LINK]-(:Station))+
  (:Station {name: "Clapham High Street"})
RETURN [station IN nodes(p) | station.name] AS route
Table 13. Result
route

["Denmark Hill", "Clapham High Street"]

["Denmark Hill", "Brixton", "London Victoria", "Battersea Park", "Wandsworth Road", "Clapham High Street"]

Rows: 2
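
The k paths returned by SHORTEST k need not be equally short. As a sketch, adding length(p) to the previous query (with hops as an illustrative alias) shows how far apart the two results are:

```cypher
MATCH p = SHORTEST 2
  (:Station {name: "Denmark Hill"})
  (()-[:LINK]-(:Station))+
  (:Station {name: "Clapham High Street"})
// Order by hop count so the direct route is listed first.
RETURN length(p) AS hops, [station IN nodes(p) | station.name] AS route
ORDER BY hops
```

With the example graph the direct route should come back with 1 hop and the route via Brixton with 5.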

For more information, see Shortest paths.