Patterns
This section contains an overview of data patterns in Cypher.
Introduction
Patterns and pattern-matching are at the very heart of Cypher, so being effective with Cypher requires a good understanding of patterns.
Using patterns, you describe the shape of the data you are looking for.
For example, in the MATCH clause you describe the shape with a pattern, and Cypher will figure out how to get that data for you.
The pattern describes the data using a form that is very similar to how one typically draws the shape of property graph data on a whiteboard: usually as circles (representing nodes) and arrows between them to represent relationships.
Patterns appear in multiple places in Cypher: in MATCH, CREATE, and MERGE clauses, and in pattern expressions.
Each of these is described in more detail in its own section of the manual.
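As a rough illustration of the places listed above, the sketch below uses one small pattern in each of them. The Person label, the KNOWS relationship type and the names are invented for this example, and the statements assume an otherwise empty database.

```cypher
// Hypothetical data: Person, KNOWS and the names are illustrative only.

// CREATE: the pattern describes data to be written.
CREATE (a:Person {name: 'Ana'})-[:KNOWS]->(b:Person {name: 'Ben'});

// MATCH: the same shape describes data to be found.
MATCH (a:Person)-[:KNOWS]->(b:Person)
RETURN a.name, b.name;

// MERGE: match the described node if it exists, otherwise create it.
MERGE (c:Person {name: 'Cy'})
RETURN c.name;

// Pattern expression: the pattern is used as a predicate in WHERE.
MATCH (a:Person)
WHERE (a)-[:KNOWS]->(:Person)
RETURN a.name;
```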
Patterns for nodes
The very simplest 'shape' that can be described in a pattern is a node. A node is described using a pair of parentheses, and is typically given a name.
For example:
(a)
This simple pattern describes a single node, and names that node using the variable a.
You can specify additional constraints by introducing node pattern predicates.
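A minimal sketch of a node pattern with a predicate follows. It assumes a Neo4j version that accepts an inline WHERE inside the node parentheses; the Person label and the age property are made up for the example.

```cypher
// Hypothetical data, assuming an otherwise empty database.
CREATE (:Person {name: 'Ana', age: 41}), (:Person {name: 'Ben', age: 25});

// (a) on its own would match every node;
// the label and the inline WHERE narrow the match.
MATCH (a:Person WHERE a.age > 30)
RETURN a.name;
```

With the two nodes created above, only the node named 'Ana' should satisfy the predicate.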
Patterns for related nodes
A more powerful construct is a pattern that describes multiple nodes and relationships between them. Cypher patterns describe relationships by employing an arrow between two nodes. For example:
(a)-->(b)
This pattern describes a very simple data shape: two nodes, and a single relationship from one to the other.
In this example, the two nodes are named a and b respectively, and the relationship is 'directed': it goes from a to b.
This manner of describing nodes and relationships can be extended to cover an arbitrary number of nodes and the relationships between them, for example:
(a)-->(b)<--(c)
Such a series of connected nodes and relationships is called a "path".
Note that the naming of the nodes in these patterns is only necessary should one need to refer to the same node again, either later in the pattern or elsewhere in the Cypher query. If this is not necessary, then the name may be omitted, as follows:
(a)-->()<--(c)
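The following sketch shows such a pattern in a complete query. The User and Topic labels and the FOLLOWS type are assumptions chosen for illustration, and the database is assumed to be otherwise empty.

```cypher
// Hypothetical data: two users who both follow the same topic.
CREATE (u1:User {name: 'Ana'})-[:FOLLOWS]->(t:Topic {name: 'Graphs'}),
       (u2:User {name: 'Ben'})-[:FOLLOWS]->(t);

// The middle node is never referred to again, so it is left unnamed.
MATCH (a)-->()<--(c)
RETURN a.name, c.name;
```

Each pair of users is found twice, once in each order, because either user can play the role of a.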
Equijoins
If a node variable is declared more than once in the query, this is called an equijoin.
This is an operation that requires each node pattern with the same node variable to be bound to the same node.
For example, the following pattern refers to the same node twice with the variable a, forming a cycle:
(a)-->(b)-->(c)-->(a)
The following pattern refers twice to the same node with the variable b, forming a T-shaped pattern:
(a)-->(b)-->(c), (b)-->(e)
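To make the cycle case concrete, here is a small sketch against invented data. The Stop label and NEXT type are hypothetical, and the database is assumed to be otherwise empty.

```cypher
// Hypothetical data: three nodes connected in a directed triangle.
CREATE (x:Stop {name: 'X'})-[:NEXT]->(y:Stop {name: 'Y'}),
       (y)-[:NEXT]->(z:Stop {name: 'Z'}),
       (z)-[:NEXT]->(x);

// The first and last node patterns share the variable a,
// so only paths that end on their starting node can match.
MATCH (a)-->(b)-->(c)-->(a)
RETURN a.name, b.name, c.name;
```

The triangle is reported once per starting node, since each of the three nodes can be bound to a.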
Patterns for labels
In addition to simply describing the shape of a node in the pattern, one can also describe attributes. The simplest attribute that can be described in the pattern is a label that the node must have. For example:
(a:User)-->(b)
One can also describe a node that has multiple labels:
(a:User&Admin)-->(b)
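The difference between the two label patterns can be sketched as follows. The Doc label, the OWNS type and the property values are assumptions, and the & form assumes a Neo4j version that supports label expressions.

```cypher
// Hypothetical data: one plain user and one user who is also an admin.
CREATE (:User {name: 'Ana'})-[:OWNS]->(:Doc {title: 'Notes'}),
       (:User:Admin {name: 'Ben'})-[:OWNS]->(:Doc {title: 'Policy'});

// Any node that has the User label (both users qualify).
MATCH (a:User)-->(b)
RETURN a.name, b.title;

// Only nodes that carry both the User and the Admin label.
MATCH (a:User&Admin)-->(b)
RETURN a.name, b.title;
```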
Specifying properties
Nodes and relationships are the fundamental structures in a graph. Neo4j uses properties on both of these to allow for far richer models.
Properties can be expressed in patterns using a map-construct: curly brackets surrounding a number of key-expression pairs, separated by commas. E.g. a node with two properties on it would look like:
(a {name: 'Andy', sport: 'Brazilian Ju-Jitsu'})
A relationship with a property constraint on it is given by:
(a)-[{blocked: false}]->(b)
When properties appear in patterns, they add an additional constraint to the shape of the data.
In the case of a CREATE clause, the properties will be set in the newly-created nodes and relationships.
In the case of a MERGE clause, the properties will be used as additional constraints on the shape any existing data must have (the specified properties must exactly match any existing data in the graph).
If no matching data is found, then MERGE behaves like CREATE and the properties will be set in the newly created nodes and relationships.
Note that patterns supplied to CREATE may use a single parameter to specify properties, e.g. CREATE (node $paramName).
This is not possible with patterns used in other clauses, as Cypher needs to know the property names at the time the query is compiled, so that matching can be done effectively.
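The sketch below walks through how the same property map behaves in CREATE and MERGE, and how a parameter can stand in for the map in CREATE. The Person label and the 'Chess' value are invented, and the parameter name props is an assumption: the caller is expected to supply it (for example as {name: 'Dana'}) before running the last statement.

```cypher
// CREATE always writes a new node carrying these properties.
CREATE (a:Person {name: 'Andy', sport: 'Brazilian Ju-Jitsu'});

// MERGE finds the node created above, because every listed property matches.
MERGE (b:Person {name: 'Andy', sport: 'Brazilian Ju-Jitsu'})
RETURN b.name;

// A different value does not match, so MERGE creates a second node.
MERGE (c:Person {name: 'Andy', sport: 'Chess'})
RETURN c.sport;

// A whole property map supplied as a parameter (possible with CREATE only).
// Assumes $props has been set by the caller, e.g. {name: 'Dana'}.
CREATE (d:Person $props)
RETURN d.name;
```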
Patterns for relationships
The simplest way to describe a relationship is by using the arrow between two nodes, as in the previous examples. Using this technique, you can state that the relationship must exist and specify its direction. If you don’t care about the direction of the relationship, the arrow head can be omitted, as exemplified by:
(a)--(b)
As with nodes, relationships may also be given names. In this case, a pair of square brackets is used to break up the arrow and the variable is placed between. For example:
(a)-[r]->(b)
Much like labels on nodes, relationships can have types. To describe a relationship with a specific type, you can specify this as follows:
(a)-[r:REL_TYPE]->(b)
Unlike labels, relationships can only have one type.
But if we’d like to describe some data such that the relationship could have any one of a set of types, then they can all be listed in the pattern, separating them with the pipe symbol | like this:
(a)-[r:TYPE1|TYPE2]->(b)
Note that this form of pattern can only be used to describe existing data (i.e. when using a pattern with MATCH or as an expression).
It will not work with CREATE or MERGE, since it’s not possible to create a relationship with multiple types.
For more information, see the section on relationship type expressions.
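As a small sketch of matching on a set of types, the query below uses invented LIKES, RATED and OWNS types between one person and one movie, and assumes an otherwise empty database.

```cypher
// Hypothetical data: three relationships of different types
// between the same pair of nodes.
CREATE (a:Person {name: 'Ana'})-[:LIKES]->(m:Movie {title: 'Heat'}),
       (a)-[:RATED]->(m),
       (a)-[:OWNS]->(m);

// Matches the LIKES and the RATED relationship, but not OWNS.
MATCH (a)-[r:LIKES|RATED]->(b)
RETURN type(r);
```

Returning type(r) shows which of the listed types each matched relationship actually has.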
As with nodes, the name of the relationship can always be omitted, as exemplified by:
(a)-[:REL_TYPE]->(b)
To specify additional constraints, introduce a relationship pattern predicate.
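A relationship pattern predicate might look like the sketch below. It assumes a Neo4j version that accepts an inline WHERE inside the square brackets; the KNOWS type and the since property are hypothetical.

```cypher
// Hypothetical data, assuming an otherwise empty database.
CREATE (a:Person {name: 'Ana'})-[:KNOWS {since: 2010}]->(:Person {name: 'Ben'}),
       (a)-[:KNOWS {since: 2020}]->(:Person {name: 'Cy'});

// The predicate is evaluated on each candidate relationship r.
MATCH (a)-[r:KNOWS WHERE r.since < 2015]->(b)
RETURN b.name;
```

With the data above, only the relationship to 'Ben' should satisfy the predicate.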
Equijoins
Similar to nodes, it is possible to use the same name for a relationship multiple times within a pattern.
However, due to relationship isomorphism, this will not yield any results if done in the same pattern.
It can be useful across separate MATCH statements, though.
Using the same variable name for relationships multiple times in a query would require that particular relationship to be traversed several times.
Therefore, the following example will lead to no results:
()-[r:REL_TYPE]-()-[r:REL_TYPE]-()
If, on the other hand, the two relationships are spread over two MATCH clauses, then the relationship isomorphism does not pose an obstacle any longer.
The following query would, therefore, return results:
MATCH ()-[r:REL_TYPE]-()
MATCH ()-[r:REL_TYPE]-()
Note that the two instances of r refer to the same relationship.
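The effect of re-using a relationship variable across two MATCH clauses can be sketched with a single relationship. The Person label, the KNOWS type and the names are invented, and directed patterns are used here only to keep the output small.

```cypher
// Hypothetical data: one relationship in an otherwise empty database.
CREATE (:Person {name: 'Ana'})-[:KNOWS]->(:Person {name: 'Ben'});

// The second MATCH must re-use the relationship already bound to r,
// so c is forced to be the same node as a, and d the same node as b.
MATCH (a)-[r:KNOWS]->(b)
MATCH (c)-[r:KNOWS]->(d)
RETURN a.name, c.name, b.name, d.name;
```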
Variable-length pattern matching
Rather than describing a long path using a sequence of many node and relationship descriptions in a pattern, many relationships (and the intermediate nodes) can be described by specifying a length in the relationship description of a pattern. For example:
(a)-[*2]->(b)
This describes a graph of three nodes and two relationships, all in one path (a path of length 2). This is equivalent to:
(a)-->()-->(b)
A range of lengths can also be specified: such relationship patterns are called 'variable length relationships'. For example:
(a)-[*3..5]->(b)
This specifies a minimum length of 3 and a maximum of 5. It describes a graph of either 4 nodes and 3 relationships, 5 nodes and 4 relationships, or 6 nodes and 5 relationships, all connected together in a single path.
Either bound can be omitted. For example, to describe paths of length 3 or more, use:
(a)-[*3..]->(b)
To describe paths of length 5 or less, use:
(a)-[*..5]->(b)
Omitting both bounds is equivalent to specifying a minimum of 1, allowing paths of any positive length to be described:
(a)-[*]->(b)
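The length forms above can be tried on a short chain. In this sketch the Step label, the NEXT type and the id property are assumptions, and the database is assumed to be otherwise empty.

```cypher
// Hypothetical chain of four nodes: 1 -> 2 -> 3 -> 4
CREATE (:Step {id: 1})-[:NEXT]->(:Step {id: 2})-[:NEXT]->(:Step {id: 3})-[:NEXT]->(:Step {id: 4});

// Exactly two hops from node 1: reaches node 3.
MATCH (a {id: 1})-[*2]->(b)
RETURN b.id;

// One to three hops from node 1: reaches nodes 2, 3 and 4.
// r is bound to the list of relationships that were traversed.
MATCH (a {id: 1})-[r*1..3]->(b)
RETURN b.id, size(r) AS hops;
```

Because a variable-length relationship variable holds a list, size(r) gives the number of hops for each result row.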
As a simple example, let’s take the graph and query below:
MATCH (me)-[:KNOWS*1..2]-(remote_friend)
WHERE me.name = 'Filipa'
RETURN remote_friend.name
remote_friend.name |
---|
|
|
Rows: 2 |
This query finds data in the graph with a shape that fits the pattern: specifically a node (with the name property 'Filipa') and then the nodes related to it by KNOWS, one or two hops away.
This is a typical example of finding first and second degree friends.
Note that variable length relationships cannot be used with CREATE and MERGE.
Under certain circumstances, variable-length relationships can be planned with an optimisation; see the VarLength Expand Pruning query plan.
Equijoins
Like simple relationships, the variable of variable-length relationships can be used more than once to refer to the same variable-length relationship. Just as with simple relationships, this yields no results if used in the same MATCH statement.
In addition, the results must observe the bounds of both declarations of that variable.
In the following example, the bounds of r require it to be of length 2 and therefore only 'Anders' is returned:
MATCH (me {name: 'Filipa'})-[r:KNOWS*1..2]-(remote_friend)
MATCH ()-[r:KNOWS*2..3]-()
RETURN remote_friend.name
remote_friend.name |
---|
"Anders" |
Rows: 1 |
Also note that for variable-length relationships of length 2 or greater, the direction of the variable-length relationship must match between the different occurrences.
Otherwise, these two occurrences are considered different.
For example, the following query will not return any results, even though the first MATCH would find several matching paths.
This is because the variable-length relationship in the second MATCH goes in the opposite direction.
MATCH (me {name: 'Filipa'})<-[r:KNOWS*2..3]-(remote_friend)
MATCH (remote_friend)-[r:KNOWS*2..3]->(me)
RETURN remote_friend.name
remote_friend.name |
---|
Rows: 0 |
This can be mitigated by using the reverse() function:
MATCH (me {name: 'Filipa'})<-[r:KNOWS*2..3]-(remote_friend)
WITH reverse(r) AS r2
MATCH (remote_friend)-[r2:KNOWS*2..3]->(me)
RETURN remote_friend.name
remote_friend.name |
---|
|
|
|
Rows: 3 |
Assigning to path variables
As described above, a series of connected nodes and relationships is called a "path". Cypher allows paths to be named using an identifier, as exemplified by:
p = (a)-[*3..5]->(b)
You can do this in MATCH, CREATE and MERGE, but not when using patterns as expressions.
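Once a path has a name, it can be passed to path functions. The sketch below assumes a hypothetical chain of Step nodes connected by NEXT relationships in an otherwise empty database, and uses the built-in length() and nodes() functions.

```cypher
// Hypothetical chain of three nodes: 1 -> 2 -> 3
CREATE (:Step {id: 1})-[:NEXT]->(:Step {id: 2})-[:NEXT]->(:Step {id: 3});

// p is bound to each matching path in turn.
MATCH p = (a {id: 1})-[*1..2]->(b)
RETURN length(p) AS hops, [n IN nodes(p) | n.id] AS ids;
```

For this chain there are two matching paths starting at node 1: one of length 1 through ids [1, 2] and one of length 2 through ids [1, 2, 3].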