Merging Data in the Graph

About this module

You have learned how to create nodes and relationships in the graph. When you create a node, you want to ensure that there is only one instance of that node in the graph. You also want to avoid creating unnecessary or duplicate relationships. Next, you will learn how to merge data in the graph.

At the end of this module, you will write Cypher statements to:

Merge data in a graph by:
- Creating nodes.
- Creating relationships.

Because the code examples in this lesson modify the database, it is recommended that you do not execute them against your database. You will do so in the hands-on exercises.

Creating data in the graph

Thus far, you have learned how to create nodes, labels, properties, and relationships in the graph. You can use MERGE to either create new nodes and relationships or to make changes to existing nodes and relationships.

How the graph engine behaves when a duplicate element is created depends on the type of element. You have not yet learned about defining constraints in the graph. Constraints can be used to prevent duplicate nodes, and you will learn about them in the course, Using Indexes and Query Best Practices in Neo4j 4.x.

Here are some facts about creating nodes and relationships in the graph when no constraints have been defined.

If you use `CREATE` or `SET`:	The result is:
Node	If a node with the same property values exists, a duplicate node is created.
Label	If the label already exists for the node, the node is not updated.
Property	If the node or relationship property already exists, it is updated with the new value. Note: If you specify a set of properties to be created using `=` rather than `+=`, existing properties are removed if they are not included in the set.
Relationship	If the relationship exists, a duplicate relationship is created.
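
The behavior in the table can be seen with a short sketch. It assumes a scratch database with no constraints and uses a hypothetical Person named Jane Doe who is not part of the course data. Each statement ends with a semicolon and is meant to run as a separate statement.

```cypher
// Statements 1 and 2: CREATE does not check for existing data, so running it
// twice produces two Person nodes with identical property values.
CREATE (:Person {name: 'Jane Doe'});
CREATE (:Person {name: 'Jane Doe'});

// Statement 3: count the matching nodes to see the duplicates.
MATCH (p:Person {name: 'Jane Doe'})
RETURN count(p) AS janeDoeNodes;

// Statement 4: += adds or updates the listed properties and keeps the rest.
MATCH (p:Person {name: 'Jane Doe'})
SET p += {born: 1990};

// Statement 5: = replaces the whole property set, so born is removed here.
MATCH (p:Person {name: 'Jane Doe'})
SET p = {name: 'Jane Doe'};
```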

Using `MERGE`

The MERGE clause is used to find nodes or patterns in the graph. If the node or pattern is not found, by default, it is created.

You use the simple MERGE clause to:

- Create a unique node based on label and key information for a property or set of properties.
- Update a node based on label and key information for a property or set of properties.
- Create a unique relationship between two nodes.
- Create a unique node and relationship in the context of another node.

Syntax: Using `MERGE` to create nodes

Here is the simplified syntax for the MERGE clause for creating a node:

MERGE (variable:Label{nodeProperties})
RETURN variable

If a node with Label and nodeProperties already exists in the graph, no node is created. If, however, the node is not found in the graph, then the node is created.

When you specify nodeProperties for MERGE, only use properties that satisfy some sort of uniqueness constraint. You will learn about uniqueness constraints in the course, Using Indexes and Query Best Practices in Neo4j 4.x.

It is also extremely important, if you are using MERGE on a large dataset, that the property you use to uniquely identify the node has a uniqueness constraint defined, because a uniqueness constraint is backed by an index. If there is no index, the MERGE needs to scan all nodes with that label to find the matching node, which is very expensive for large datasets.
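
As a rough sketch of what such a constraint looks like, the statement below defines a uniqueness constraint on the name property of Person nodes. The constraint name person_name_unique is made up for illustration, and the syntax shown is the Neo4j 4.0-era form. Newer releases use FOR ... REQUIRE in place of ON ... ASSERT, so check the documentation for your version.

```cypher
// Assumption: no two existing Person nodes share the same name;
// otherwise the constraint cannot be created.
CREATE CONSTRAINT person_name_unique
ON (p:Person) ASSERT p.name IS UNIQUE;

// With the constraint (and its backing index) in place, this MERGE can
// look the node up through the index instead of scanning Person nodes.
MERGE (p:Person {name: 'Michael Caine'})
RETURN p;
```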

Here is what we currently have in the graph for the Person, Michael Caine. This node has values for name and born. Notice also that the label for the node is Person.

MATCH (p:Person)
WHERE p.name = 'Michael Caine'
RETURN p

Example: Using `MERGE`

In this example, we use MERGE to find a node with the Actor label and the key property name set to Michael Caine, and we set the born property to 1933. Our data model has never used the Actor label, so this is a new entity type in our graph.

MERGE (a:Actor {name: 'Michael Caine'})
SET a.born = 1933
RETURN a

Here is the result of running this Cypher example. We do not find a node with the label Actor so the graph engine creates one.

A best practice when using MERGE is to specify only the label and the properties whose values uniquely identify the node.

Repeating the same `MERGE`

If we were to repeat this MERGE clause, no additional Actor nodes would be created in the graph.

At this point, however, we have two Michael Caine nodes in the graph, one with the Person label and one with the Actor label.

Be mindful that node labels and the properties for a node are significant when merging nodes.

If we were to run the MERGE code again:

MERGE (a:Actor {name: 'Michael Caine'})
SET a.born = 1933
WITH a
MATCH (p)
WHERE p.name = 'Michael Caine'
RETURN p

We would find that no additional Michael Caine node with the Actor label is created. The MERGE found the existing node in the graph and did not create a new one.

Notice that we have a WITH a in this code. The WITH clause can be used to separate the writing part of a query from a reading part of a query. Here we see that the variable a is not used later in the query, but is simply used to separate writing and reading parts of the code.
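
To see why the two Michael Caine nodes count as different nodes for MERGE, it can help to list their labels side by side. The query below is a minimal read-only sketch that uses the built-in labels() function; the column aliases are arbitrary.

```cypher
// labels(n) returns the list of labels on each node, which makes it easy
// to tell the Person node and the Actor node apart.
MATCH (n)
WHERE n.name = 'Michael Caine'
RETURN labels(n) AS labels, n.name AS name, n.born AS born
```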

Syntax: Using `MERGE` to create relationships

Here is the syntax for the MERGE clause for creating relationships:

MERGE (variable1:Label1 {nodeProperties1})-[:REL_TYPE]->
(variable2:Label2 {nodeProperties2})
RETURN variable1, variable2

If there is an existing node with Label1 and nodeProperties1 that has the :REL_TYPE relationship to an existing node with Label2 and nodeProperties2, nothing is created. If this full pattern is not found, the entire pattern is created, including any node in the pattern that is not already bound to a variable.

Example: Finding existing relationships

Here is an example. We currently have the Person node with the :ACTED_IN relationship, but we do not have this relationship with the Actor node.

MATCH (p {name: 'Michael Caine'})-[*0..1]-(m)
RETURN p, m

Here is the result:

Example: Using `MERGE` to create a relationship

Here is code where we want to create the :ACTED_IN relationship between Michael Caine and the movie Batman Begins.

MATCH (p {name: 'Michael Caine'}),(m:Movie {title:'Batman Begins'})
MERGE (p)-[:ACTED_IN]->(m)
RETURN p,m

Here is the result of running this code:

Since the relationship between the Person node and the Movie node already exists, it is not created. The relationship between the Actor node and the Movie node is created with this merge.

Although you can leave out the direction of the relationship being created with MERGE, in which case a left-to-right arrow will be assumed, a best practice is to always specify the direction. However, if a relationship between two nodes may exist in either direction and you want to avoid creating duplicates, you must leave off the arrow.
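
A minimal sketch of the undirected case follows. The two Person nodes and the KNOWS relationship type are hypothetical and are not part of the course data.

```cypher
// Hypothetical data: two Person nodes that may already be connected.
MATCH (a:Person {name: 'Jane Doe'}), (b:Person {name: 'John Doe'})
// No arrow: an existing KNOWS relationship in either direction is a match,
// so a second relationship is created only if none exists at all.
MERGE (a)-[:KNOWS]-(b)
RETURN a, b
```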

Specifying creation behavior when merging

You can use the MERGE clause along with ON CREATE to assign specific values to a node that is created as a result of an attempt to merge.

Here is an example where we create a new node, specifying property values for the new node:

MERGE (a:Person {name: 'Sir Michael Caine'})
ON CREATE SET a.birthPlace = 'London',
              a.born = 1934
RETURN a

We know that there are no existing Sir Michael Caine Person nodes. When the MERGE executes, it will not find any matching nodes, so it will create one and execute the ON CREATE clause, where we set the birthPlace and born property values.

Here is the result of executing this code:

Example: Verifying the merge

Here is the code to display the nodes that have anything to do with Michael Caine.

MATCH (p)-[*0..1]-(m)
WHERE p.name CONTAINS 'Caine'
RETURN p, m

The most recently created node has the name value of Sir Michael Caine.

Specifying update behavior when merging

You can also specify an ON MATCH clause during merge processing. If the exact node is found, you can update its properties or labels. Here is an example:

MERGE (a:Person {name: 'Sir Michael Caine'})
ON CREATE SET a.born = 1934,
              a.birthPlace = 'UK'
ON MATCH SET a.birthPlace = 'UK'

Here we see that only the existing node with the name Sir Michael Caine is updated with the new birthPlace. Furthermore, no new node is created for Sir Michael Caine.
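
One common use of the two clauses together is bookkeeping about when a node was created and how often it has been merged since. The sketch below illustrates this; the property names createdAt, lastMerged, and mergeCount are made up for the example.

```cypher
MERGE (a:Person {name: 'Sir Michael Caine'})
// Runs only when this MERGE had to create the node.
ON CREATE SET a.createdAt = timestamp()
// Runs only when an existing node was found.
ON MATCH SET a.lastMerged = timestamp(),
             a.mergeCount = coalesce(a.mergeCount, 0) + 1
RETURN a.name, a.createdAt, a.lastMerged, a.mergeCount
```

Running the statement repeatedly leaves createdAt unchanged and increments mergeCount each time the node is matched.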

Using `MERGE` to create relationships

Using MERGE to create relationships is expensive. Only do it when you need to ensure that a relationship is unique and you are not sure whether it already exists.

In this example, we use the MATCH clause to find all Person nodes that represent Michael Caine, and we find the movie Batman Begins, which we want to connect to all of these nodes. We already have a connection between one of the Person nodes and the Movie node. We do not want this relationship to be duplicated. This is where we can use MERGE as follows:

MATCH (p:Person), (m:Movie)
WHERE m.title = 'Batman Begins' AND p.name ENDS WITH 'Caine'
MERGE (p)-[:ACTED_IN]->(m)
RETURN p, m

Here is the result of executing this Cypher statement. It went through all the matched Person nodes and added the relationship to those that didn’t already have it.

You must be aware of the behavior of the MERGE clause and how it will automatically create nodes and relationships. MERGE tries to find a full pattern and if it doesn’t find it, it creates that full pattern. That’s why in most cases you will first MERGE your nodes and then your relationship afterwards.
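
The difference can be sketched with the hypothetical Jane Doe and The Good One nodes. Merging each node on its own first means that only the missing pieces are created.

```cypher
// Each MERGE matches or creates exactly one element of the pattern.
MERGE (p:Person {name: 'Jane Doe'})
MERGE (m:Movie {title: 'The Good One'})
MERGE (p)-[:ACTED_IN]->(m)
RETURN p, m
```

By contrast, a single MERGE (p:Person {name: 'Jane Doe'})-[:ACTED_IN]->(m:Movie {title: 'The Good One'}) creates two new nodes and a relationship whenever the full pattern is not found, even if both nodes already exist separately.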

Use MERGE carefully

A MERGE clause with one bound and one unbound node makes sense only if you intentionally want to create a node within the context of another, like a month within a year.

For example:

MATCH (fromDate:Date {year: 2018})
MERGE (toDate:Date {month: 'January'})-[:IN_YEAR]->(fromDate)
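
The same idea extends to several months at once. The sketch below assumes the 2018 Date node already exists and uses UNWIND over an illustrative list of month names; the variable names are arbitrary.

```cypher
// The year node is bound by MATCH; each month node is unbound in the MERGE,
// so a month is created only if it is not already connected to this year.
MATCH (year:Date {year: 2018})
UNWIND ['January', 'February', 'March'] AS monthName
MERGE (month:Date {month: monthName})-[:IN_YEAR]->(year)
RETURN year.year AS year, collect(month.month) AS months
```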

Exercise 12: Merging data in the graph

In the query edit pane of Neo4j Browser, execute the browser command:

:play 4.0-intro-neo4j-exercises

and follow the instructions for Exercise 12.

This exercise has 16 steps. Estimated time to complete: 45 minutes.

Check your understanding

Question 1

Given this MERGE clause, what is the most important thing you must make sure of?

MERGE (p:Person {name: 'Jane Doe'})
SET p.born = 1990
RETURN p

Select the correct answer.

- The Person label exists in the graph.
- The Person label does not exist in the graph.
- The value for name is unique.
- The value for born is unique.

Question 2

Given this MERGE clause, suppose that the p and m nodes exist in the graph. What does this code do?

MATCH (p {name: 'Jane Doe'}),(m:Movie {title:'The Good One'})
MERGE (p)-[:ACTED_IN]->(m)

Select the correct answers.

- If the :ACTED_IN relationship exists, it deletes it and recreates it.
- If the :ACTED_IN relationship exists, it does nothing.
- If the :ACTED_IN relationship does not exist, it creates it.
- If the :ACTED_IN relationship does not exist, it creates another one.

Question 3

Given this MERGE clause, suppose that the p and m nodes exist in the graph, but you do not know whether the relationship exists. What are your options to process this MERGE clause?

MATCH (p {name: 'Jane Doe'}),(m:Movie {title:'The Good One'})
MERGE (p)-[rel:ACTED_IN]->(m)
SET rel.role=['role']

Select the correct answers.

- Use the default behavior. The relationship will be created if it doesn’t exist.
- Specify ON CREATE to perform additional processing when the relationship is created.
- Specify ON MATCH to perform additional processing when the relationship is not created.
- Specify ON DELETE to perform additional processing when the relationship is deleted.

Summary

You can now write Cypher statements to:

Merge data in a graph by:
- Creating nodes.
- Creating relationships.
