2.2. Get started with Cypher

This guide will introduce you to Cypher, Neo4j’s query language. It will help you:

2.2.1. Patterns

Neo4j’s Property Graphs are composed of nodes and relationships, either of which may have properties. Nodes represent entities, for example concepts, events, places and things. Relationships connect pairs of nodes.

However, nodes and relationships are simply low-level building blocks. The real strength of the property graph lies in its ability to encode patterns of connected nodes and relationships. A single node or relationship typically encodes very little information, but a pattern of nodes and relationships can encode arbitrarily complex ideas.

Cypher, Neo4j’s query language, is strongly based on patterns. Specifically, patterns are used to match desired graph structures. Once a matching structure has been found or created, Neo4j can use it for further processing.

A simple pattern, which has only a single relationship, connects a pair of nodes (or, occasionally, a node to itself). For example, a Person LIVES_IN a City or a City is PART_OF a Country.

Complex patterns, using multiple relationships, can express arbitrarily complex concepts and support a variety of interesting use cases. For example, we might want to match instances where a Person LIVES_IN a Country. The following Cypher code combines two simple patterns into a (mildly) complex pattern which performs this match:

(:Person) -[:LIVES_IN]-> (:City) -[:PART_OF]-> (:Country)

Pattern recognition is fundamental to the way that the brain works. Consequently, humans are very good at working with patterns. When patterns are presented visually, for example in a diagram or map, humans can use them to recognize, specify, and understand concepts. As a pattern-based language, Cypher takes advantage of this capability.

Like SQL, used in relational databases, Cypher is a textual declarative query language. It uses a form of ASCII art to represent graph-related patterns. SQL-like clauses and keywords, for example MATCH, WHERE and DELETE are used to combine these patterns and specify desired actions.

This combination tells Neo4j which patterns to match and what to do with the matching items, for example nodes, relationships, paths and lists. However, Cypher does not tell Neo4j how to find nodes, traverse relationships etc.

Diagrams made up of icons and arrows are commonly used to visualize graphs. Textual annotations provide labels, define properties etc.

2.2.1.1. Node syntax

Cypher uses a pair of parentheses (usually containing a text string) to represent a node, eg: (), (foo). This is reminiscent of a circle or a rectangle with rounded end caps. Here are some ASCII-art encodings for example Neo4j nodes, providing varying types and amounts of detail:

()
(matrix)
(:Movie)
(matrix:Movie)
(matrix:Movie {title: "The Matrix"})
(matrix:Movie {title: "The Matrix", released: 1997})

The simplest form, (), represents an anonymous, uncharacterized node. If we want to refer to the node elsewhere, we can add a variable, for example: (matrix). A variable is restricted to a single statement. It may have different (or no) meaning in another statement.

The Movie label (prefixed in use with a colon) declares the node’s type. This restricts the pattern, keeping it from matching (say) a structure with an Actor node in this position. Neo4j’s node indexes also use labels: each index is specific to the combination of a label and a property.

The node’s properties, for example title, are represented as a list of key/value pairs, enclosed within a pair of braces, for example: {name: "Keanu Reeves"}. Properties can be used to store information and/or restrict patterns.

2.2.1.2. Relationship syntax

Cypher uses a pair of dashes (--) to represent an undirected relationship. Directed relationships have an arrowhead at one end (<--, -->). Bracketed expressions ([…​]) can be used to add details. This may include variables, properties, and/or type information:

-->
-[role]->
-[:ACTED_IN]->
-[role:ACTED_IN]->
-[role:ACTED_IN {roles: ["Neo"]}]->

The syntax and semantics found within a relationship’s bracket pair are very similar to those used between a node’s parentheses. A variable (eg, role) can be defined, to be used elsewhere in the statement. The relationship’s type (eg, ACTED_IN) is analogous to the node’s label. The properties (eg, roles) are entirely equivalent to node properties. (Note that the value of a property may be an array.)

2.2.1.3. Pattern syntax

Combining the syntax for nodes and relationships, we can express patterns. The following could be a simple pattern (or fact) in this domain:

(keanu:Person:Actor {name:  "Keanu Reeves"} )
-[role:ACTED_IN     {roles: ["Neo"] } ]->
(matrix:Movie       {title: "The Matrix"} )

Like with node labels, the relationship type ACTED_IN is added as a symbol, prefixed with a colon: :ACTED_IN. Variables (eg, role) can be used elsewhere in the statement to refer to the relationship. Node and relationship properties use the same notation. In this case, we used an array property for the roles, allowing multiple roles to be specified.

Pattern Nodes vs. Database Nodes

When a node is used in a pattern, it describes zero or more nodes in the database. Similarly, each pattern describes zero or more paths of nodes and relationships.

2.2.1.4. Pattern variables

To increase modularity and reduce repetition, Cypher allows patterns to be assigned to variables. This allows the matching paths to be inspected, used in other expressions, etc.

acted_in = (:Person)-[:ACTED_IN]->(:Movie)

The acted_in variable would contain two nodes and the connecting relationship for each path that was found or created. There are a number of functions to access details of a path, including nodes(path), rels(path) (same as relationships(path)), and length(path).

2.2.1.5. Clauses

Cypher statements typically have multiple clauses, each of which performs a specific task, for example:

  • create and match patterns in the graph
  • filter, project, sort, or paginate results
  • compose partial statements

By combining Cypher clauses, we can compose more complex statements that express what we want to know or create. Neo4j then figures out how to achieve the desired goal in an efficient manner.