Patterns in practice

Creating and returning data

Let’s start by looking into the clauses that allow you to create data.

To add data, you just use the patterns you already know. By providing patterns, you can specify what graph structures, labels, and properties you would like to make part of your graph.

The simplest clause is CREATE, which creates the patterns that you specify.

For the patterns you have looked at so far, this could look like the following:

CREATE (:Movie {title: 'The Matrix', released: 1997})

If you run this statement, Cypher® returns the number of changes: in this case, one node, one label, and two properties were added.

Created Nodes: 1
Added Labels: 1
Set Properties: 2
Rows: 0

Because you started with an empty database, it now contains a single node.

If you also want to return the created data, you can add a RETURN clause, which refers to the variable you have assigned to your pattern elements.
The RETURN keyword in Cypher specifies which values or results a query should return.
You can tell Cypher to return nodes, relationships, node and relationship properties, or patterns in your query results.
RETURN is optional in queries that write data, but it is required in queries that only read data.
The node and relationship variables discussed earlier become important when using RETURN.

CREATE (p:Person {name: 'Keanu Reeves', born: 1964})
RETURN p

This is what gets returned:

Created Nodes: 1
Added Labels: 1
Set Properties: 2
Rows: 1

+----------------------------------------------+
| p                                            |
+----------------------------------------------+
| (:Person {name: 'Keanu Reeves', born: 1964}) |
+----------------------------------------------+

If you want to create more than one element, you can separate the patterns with commas or use multiple CREATE clauses.
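As a sketch of the comma-separated form, the statement below adds two more people in a single CREATE clause. The two Person nodes are illustrative additions, not part of the data used elsewhere on this page. Writing each pattern in its own CREATE clause has the same effect.

```cypher
// One CREATE clause, two comma-separated node patterns.
// These Person nodes are extra example data, not used by later queries.
CREATE (:Person {name: 'Carrie-Anne Moss', born: 1967}),
       (:Person {name: 'Laurence Fishburne', born: 1961})
```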

You can, of course, also create more complex structures, such as an ACTED_IN relationship that stores information about the character, or a DIRECTED relationship for the director.

CREATE (a:Person {name: 'Tom Hanks', born: 1956})-[r:ACTED_IN {roles: ['Forrest']}]->(m:Movie {title: 'Forrest Gump', released: 1994})
CREATE (d:Person {name: 'Robert Zemeckis', born: 1951})-[:DIRECTED]->(m)
RETURN a, d, r, m

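This adds two Person nodes (Tom Hanks and Robert Zemeckis), a Forrest Gump Movie node, and the ACTED_IN and DIRECTED relationships between them.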

In most cases, you want to add new data to existing structures. This requires knowing how to find existing patterns in your graph data, which is covered in the next section.

Matching patterns

Matching patterns is a task for the MATCH clause. You pass MATCH the same kind of patterns you have used so far to describe what you are looking for. This is similar to query by example, except that your examples also include the structures. To bring back nodes, relationships, properties, or patterns, you need to specify variables in your MATCH clause for the data you want to return.

A MATCH clause searches for the patterns you specify and returns one row per successful pattern match.

To find the data you have created so far, you can start by looking for all nodes with the Movie label.

MATCH (m:Movie)
RETURN m

The result should contain two Movie nodes: The Matrix and Forrest Gump.

You can also look for a specific person, like Keanu Reeves.

MATCH (p:Person {name: 'Keanu Reeves'})
RETURN p

This query returns the matching Keanu Reeves node.

Note that you only need to provide enough information to find the nodes; not all properties are required. In most cases, you look nodes up by key properties such as SSNs, ISBNs, email addresses, logins, geolocations, or product codes.
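For example, the title alone is enough to find the movie created at the start of this page. The query below is a small sketch: it matches on title only and then reads the released property from the matched node.

```cypher
// Only the key property (title) is used to find the node;
// released is not part of the pattern but is still returned.
MATCH (m:Movie {title: 'The Matrix'})
RETURN m.title, m.released
```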

You can also find more interesting connections, such as the titles of the movies Tom Hanks acted in and the roles he played.

MATCH (p:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(m:Movie)
RETURN m.title, r.roles

Rows: 1

+------------------------------+
| m.title        | r.roles     |
+------------------------------+
| 'Forrest Gump' | ['Forrest'] |
+------------------------------+

In this case, you returned only the properties of the nodes and relationships that you are interested in. You can access properties anywhere using dot notation: identifier.property.

Of course, this only lists Tom Hanks' role as Forrest in Forrest Gump, because that is the only data you have added so far.

Now you know enough to add new nodes to existing ones and can combine MATCH and CREATE to attach structures to the graph.

Cypher examples

Let us look at some examples of the MATCH and RETURN keywords. Each example is more complex than the previous one, and the last two start with an explanation of what the query is trying to achieve. Run each snippet to see its results as a graph or a table. With only the data created on this page, Examples 3 and 4 return no rows, because no DIRECTED relationship starts at the Tom Hanks node; they return results only in a graph where Tom Hanks has directed a movie.

Example 1: Find nodes labeled Person. Note that you must assign a variable, such as p, to the Person node if you want to return it in the RETURN clause. LIMIT 1 restricts the result to a single node.

MATCH (p:Person)
RETURN p
LIMIT 1

Example 2: Find Person nodes in the graph that have a name of 'Tom Hanks'. Remember that you can name your variable anything you want, as long as you reference that same name later.

MATCH (tom:Person {name: 'Tom Hanks'})
RETURN tom

Example 3: Find which Movie Tom Hanks has directed.

Explanation: first find Tom Hanks' Person node, then the Movie nodes connected to it. To do that, follow the DIRECTED relationship from Tom Hanks' Person node to the Movie node. The query also specifies the Movie label, so it only considers nodes with that label. Since you only want to return the movie, you need a variable for that node (movie), but not for the Person node or the DIRECTED relationship.

MATCH (:Person {name: 'Tom Hanks'})-[:DIRECTED]->(movie:Movie)
RETURN movie

Example 4: Find which Movie Tom Hanks has directed, but this time, return only the title of the movie.

Explanation: this query is similar to the previous one. Example 3 returned the entire Movie node with all its properties. This time you still need to find Tom's movies, but you only care about their titles, so access the node's title property with the syntax variable.property to return the title value.

MATCH (:Person {name: 'Tom Hanks'})-[:DIRECTED]->(movie:Movie)
RETURN movie.title

Aliasing return values

Not all properties are simple like movie.title in the example above. Some properties have poor names due to property length, multi-word descriptions, developer jargon, and other shortcuts. These naming conventions can be difficult to read, especially if they end up on reports and other user-facing interfaces.

Poorly-named properties

//poorly-named property
MATCH (tom:Person {name:'Tom Hanks'})-[rel:DIRECTED]-(movie:Movie)
RETURN tom.name, tom.born, movie.title, movie.released

Just like with SQL, you can rename return results by using the AS keyword and aliasing the property with a cleaner name.

Cleaner results with aliasing

//cleaner printed results with aliasing
MATCH (tom:Person {name:'Tom Hanks'})-[rel:DIRECTED]-(movie:Movie)
RETURN tom.name AS name, tom.born AS `Year Born`, movie.title AS title, movie.released AS `Year Released`

To use a return alias that contains spaces, put a backtick character before and after it, as in movie.released AS `Year Released`. Aliases without spaces do not need backticks.

Attaching structures

To extend the graph with new information, you first match the existing connection points and then attach the newly created nodes to them with relationships. Adding Cloud Atlas as a new movie for Tom Hanks could be achieved like this:

MATCH (p:Person {name: 'Tom Hanks'})
CREATE (m:Movie {title: 'Cloud Atlas', released: 2012})
CREATE (p)-[r:ACTED_IN {roles: ['Zachry']}]->(m)
RETURN p, r, m

In the database, this adds a Cloud Atlas Movie node connected to the existing Tom Hanks node by an ACTED_IN relationship with the role Zachry.

It is important to remember that you can assign variables to both nodes and relationships and use them later, whether they were created or matched.

It is possible to create both the new node and the relationship in a single CREATE clause, but splitting them up helps readability.
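For comparison, here is a sketch of the single-clause form of the Cloud Atlas statement above. It is an alternative to that statement, not something to run after it: running both would create two Cloud Atlas nodes.

```cypher
// Alternative to the two-CREATE version, not an addition to it:
// the new Movie node and the ACTED_IN relationship in one CREATE clause.
MATCH (p:Person {name: 'Tom Hanks'})
CREATE (p)-[r:ACTED_IN {roles: ['Zachry']}]->(m:Movie {title: 'Cloud Atlas', released: 2012})
RETURN p, r, m
```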

A tricky aspect of combining MATCH and CREATE is that MATCH produces one row per matched pattern, and each subsequent CREATE clause runs once for each of those rows. In many cases, this is what you want. If it is not, move the CREATE clause before the MATCH, reduce the number of rows with techniques discussed later, or use the get-or-create semantics of the next clause, MERGE.
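As an illustration, suppose you want a single node that every movie points to. The Watchlist label and ON_WATCHLIST relationship type below are hypothetical. Writing MATCH (m:Movie) CREATE (m)-[:ON_WATCHLIST]->(:Watchlist {name: 'Favorites'}) would create one Watchlist node per matched movie. The sketch below moves the CREATE for the shared node before the MATCH, so that node is created once and only the relationships are created per row.

```cypher
// Hypothetical label and relationship type, for illustration only.
// Runs once: the query has a single row at this point.
CREATE (w:Watchlist {name: 'Favorites'})
// WITH passes w on to the reading part of the query.
WITH w
// From here on there is one row per Movie node...
MATCH (m:Movie)
// ...so this CREATE runs per movie, but every relationship
// points at the same Watchlist node.
CREATE (m)-[:ON_WATCHLIST]->(w)
```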

Completing patterns

Whenever you get data from external systems or are not sure whether certain information already exists in the graph, you want to be able to express a repeatable (idempotent) update operation. In Cypher, the MERGE clause serves this purpose. It acts like a combination of MATCH and CREATE: it checks whether the data exists before creating it. With MERGE, you define a pattern to be found or created. Usually, as with MATCH, you include only the key property to look for in your core pattern. MERGE then lets you provide additional properties to set ON CREATE.

If you do not know whether your graph already contains Cloud Atlas, you can merge it again. The output below shows the case where the node did not exist yet, so MERGE created it.

MERGE (m:Movie {title: 'Cloud Atlas'})
ON CREATE SET m.released = 2012
RETURN m

Created Nodes: 1
Added Labels: 1
Set Properties: 2
Rows: 1

+-------------------------------------------------+
| m                                               |
+-------------------------------------------------+
| (:Movie {title: 'Cloud Atlas', released: 2012}) |
+-------------------------------------------------+

You get a result in both cases: either the data (potentially more than one row) that was already in the graph or a single, newly created Movie node.

A MERGE clause without any previously assigned variables in it either matches the full pattern or creates the full pattern. It never produces a partial mix of matching and creating within a pattern. To achieve a partial match/create, make sure to use already defined variables for the parts that shouldn’t be affected.
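For example, MERGE (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(:Movie {title: 'Cloud Atlas'}) would, if that whole path were missing, create a new Person node and a new Movie node even when nodes with those properties already exist. A common sketch of the partial approach is to merge each node on its key property first and then merge the relationship between the bound variables.

```cypher
// Each node is found or created on its own key property.
MERGE (p:Person {name: 'Tom Hanks'})
MERGE (m:Movie {title: 'Cloud Atlas'})
// Both endpoints are already bound, so this MERGE can only
// create the relationship, never new Person or Movie nodes.
MERGE (p)-[:ACTED_IN]->(m)
RETURN p, m
```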

Above all, MERGE makes sure that you cannot create duplicate information or structures, but this comes at the cost of checking for existing matches first. Especially on large graphs, scanning a large set of labeled nodes for a specific property can be costly. Supporting indexes or constraints, which are discussed in upcoming sections, alleviate some of that cost, but it is still not free. Whenever you are sure that you will not create duplicate data, use CREATE rather than MERGE.
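A sketch of such supporting indexes, assuming the index syntax of Neo4j 4.x and 5.x; the index names movie_title and person_name are hypothetical. They cover the title and name properties that the MERGE examples in this section look up. If your client accepts only one statement at a time, run the two lines separately.

```cypher
// Indexes on the key properties used by MERGE in this section.
// IF NOT EXISTS makes the statements safe to re-run.
CREATE INDEX movie_title IF NOT EXISTS FOR (m:Movie) ON (m.title);
CREATE INDEX person_name IF NOT EXISTS FOR (p:Person) ON (p.name);
```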

MERGE can also ensure that a relationship is created only once. For that to work, you have to pass in both nodes from a previous pattern match.

MATCH (m:Movie {title: 'Cloud Atlas'})
MATCH (p:Person {name: 'Tom Hanks'})
MERGE (p)-[r:ACTED_IN]->(m)
ON CREATE SET r.roles = ['Zachry']
RETURN p, r, m

If the direction of a relationship is arbitrary, you can leave off the arrowhead. MERGE checks for the relationship in either direction and creates a new directed relationship if there is no matching relationship.
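A sketch of an undirected MERGE, assuming the Tom Hanks and Robert Zemeckis nodes created earlier on this page. KNOWS is a hypothetical relationship type that does not appear elsewhere in this data.

```cypher
// KNOWS is a hypothetical relationship type, used only for illustration.
MATCH (a:Person {name: 'Tom Hanks'})
MATCH (b:Person {name: 'Robert Zemeckis'})
// No arrowhead: an existing KNOWS relationship in either direction
// counts as a match. Otherwise a new relationship is created, and its
// direction is chosen by Cypher, so do not rely on it.
MERGE (a)-[r:KNOWS]-(b)
RETURN a, r, b
```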

If you pass in only one node from a preceding clause, MERGE offers an interesting behavior: it matches the given pattern only within the direct neighborhood of that node, and creates the pattern if it is not found there. This comes in very handy for creating tree structures, for example.

CREATE (y:Year {year: 2014})
MERGE (y)<-[:IN_YEAR]-(m10:Month {month: 10})
MERGE (y)<-[:IN_YEAR]-(m11:Month {month: 11})
RETURN y, m10, m11

This creates a 2014 Year node with two Month nodes, for months 10 and 11, each attached to it by an IN_YEAR relationship.

There is no global search for the two Month nodes; they are only searched for in the context of the 2014 Year node.
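To see the neighborhood-scoped matching in action, a sketch that binds the existing 2014 Year node and merges month 10 again finds the Month node already attached to it and creates nothing new. A month 10 pattern merged against a different Year node would be searched for only around that node, so it would get its own Month node.

```cypher
// Bind the existing Year node instead of creating a new one.
MATCH (y:Year {year: 2014})
// The pattern already exists around y, so MERGE matches it
// and creates no new Month node or IN_YEAR relationship.
MERGE (y)<-[:IN_YEAR]-(m10:Month {month: 10})
RETURN y, m10
```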

Code challenge

Now that you know the basics, build a Cypher statement that finds the title and release year of every :Movie that Tom Hanks has :DIRECTED. One possible solution follows. With only the data created on this page, it returns no rows, because Tom Hanks has no DIRECTED relationship yet.

MATCH (p:Person {name: 'Tom Hanks'})-[:DIRECTED]->(m:Movie)
RETURN m.title, m.released