# Merging Data

You have learned how to create nodes and relationships in the graph. When you create a node, you usually want to ensure that there is only one instance of that node in the graph. You also want to avoid creating unnecessary or duplicate relationships. Next, you will learn how to merge data in the graph.

At the end of this module, you should be able to write Cypher statements to:

• Merge data in a graph by:

• Creating nodes.

• Creating relationships.

Because the code examples in this lesson modify the database, we recommend that you do not execute them against your database. You will do so in the hands-on exercises.

## Creating data in the graph

Thus far, you have learned how to create nodes, labels, properties, and relationships in the graph. You can use `MERGE` to either create new nodes and relationships or to make changes to existing nodes and relationships.

How the graph engine behaves when you create a duplicate element depends on the type of element. You have not yet learned about defining constraints in the graph. Constraints can be used to prevent duplicate nodes, and you will learn about them later in this course.

Here are some facts about creating nodes and relationships in the graph when no constraints have been defined.

| If you use `CREATE`: | The result is: |
| --- | --- |
| Node | If a node with the same property values exists, a duplicate node is created. |
| Label | If the label already exists for the node, the node is not updated. |
| Property | If the node or relationship property already exists, it is updated with the new value. Note: If you specify a set of properties to be created using `=` rather than `+=`, existing properties are removed if they are not included in the set. |
| Relationship | If the relationship exists, a duplicate relationship is created. |
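The difference between `=` and `+=` noted for properties can be sketched as follows. This is a minimal illustration, not part of the course dataset: the Jane Doe node and its property values are hypothetical, and because the statement uses `CREATE`, each run would add another such node.

```cypher
// Hypothetical node, used only to illustrate property assignment.
CREATE (p:Person {name: 'Jane Doe', born: 1990, birthPlace: 'London'})
// '=' replaces the whole property map, so birthPlace is removed here.
SET p = {name: 'Jane Doe', born: 1990}
// '+=' only adds or updates the listed keys, so name and born are kept.
SET p += {birthPlace: 'Leeds'}
RETURN properties(p) AS props
```

After the first `SET`, the node should hold only name and born. After the second, it should hold all three properties again, with the new birthPlace value.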

## Using `MERGE`

The `MERGE` clause is used to find nodes or patterns in the graph. If the node or pattern is not found, by default, it is created.

You use the simple `MERGE` clause to:

• Create a unique node based on label and key information for a property or set of properties.

• Update a node based on label and key information for a property or set of properties.

• Create a unique relationship between two nodes.

• Create a unique node and relationship in the context of another node.
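The contrast between `CREATE` and `MERGE` can be sketched in a single statement. The Demo label and the name values are hypothetical and chosen so that the example does not touch the course data.

```cypher
// CREATE always adds a node, even if an identical one exists.
CREATE (:Demo {name: 'created'})
CREATE (:Demo {name: 'created'})
// MERGE adds a node only if no matching node is found.
MERGE (:Demo {name: 'merged'})
MERGE (:Demo {name: 'merged'})
// WITH separates the updating clauses from the following read.
WITH 1 AS ignored
MATCH (d:Demo)
RETURN d.name AS name, count(*) AS nodes
```

In a graph that contains no Demo nodes beforehand, this would be expected to report two nodes named created and a single node named merged.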

## Syntax: Using `MERGE` to create nodes

Here is the simplified syntax for the `MERGE` clause for creating a node:

``````
MERGE (variable:Label {nodeProperties})
RETURN variable
``````

If a node with Label and nodeProperties already exists in the graph, no node is created. If, however, no such node is found, the node is created.

When you specify nodeProperties for `MERGE`, you should only use properties that satisfy some sort of uniqueness constraint. You will learn about uniqueness constraints later in this course.
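To see why the choice of properties matters, consider this sketch of a pitfall. It assumes that the existing Person node for Michael Caine stores born as 1933. The value 1934 is deliberately wrong for illustration, and running the statement would add a node to the graph.

```cypher
// Every property in the pattern must match for MERGE to reuse a node.
// Assumption: the existing Person node has born = 1933, so this pattern
// matches nothing and MERGE creates a second Michael Caine node.
MERGE (p:Person {name: 'Michael Caine', born: 1934})
RETURN p
```

Merging on the identifying property alone (here, name) and assigning the remaining properties afterwards with `SET` avoids this kind of accidental duplicate.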

Here is what we currently have in the graph for the Person, Michael Caine. This node has values for name and born. Notice also that the label for the node is Person.

``````
MATCH (p:Person)
WHERE p.name = 'Michael Caine'
RETURN p
``````

### Example: Using `MERGE`

In this example, we use `MERGE` to find a node with the Actor label and the key property name set to Michael Caine, and we set the born property to 1933. Our data model has never used the label Actor, so this is a new entity type in our graph.

``````
MERGE (a:Actor {name: 'Michael Caine'})
SET a.born = 1933
RETURN a
``````

When this Cypher example runs, no node with the label Actor is found, so the graph engine creates one.

A best practice when using `MERGE` is to specify only the label and the properties that uniquely identify the node.

### Repeating the same `MERGE`

If we were to repeat this `MERGE` clause, no additional Actor nodes would be created in the graph.

At this point, however, we have two Michael Caine nodes in the graph: one with the label Person and one with the label Actor.

Be mindful that node labels and the properties for a node are significant when merging nodes.

If we were to run the `MERGE` again and then retrieve every node named Michael Caine:

``````
MERGE (a:Actor {name: 'Michael Caine'})
SET a.born = 1933
WITH a
MATCH (p)
WHERE p.name = 'Michael Caine'
RETURN p
``````

We would find that no additional Michael Caine node with the label Actor is created. The `MERGE` found the existing node in the graph and did not create a new one.
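One way to confirm how many Michael Caine nodes exist for each label is a read-only query like the following sketch. It modifies nothing, and the column aliases are arbitrary.

```cypher
// Count the nodes named Michael Caine, grouped by their labels.
MATCH (n {name: 'Michael Caine'})
RETURN labels(n) AS labels, count(*) AS nodes
```

If the merges above behaved as described, each label combination should appear with a count of one, no matter how often the `MERGE` was repeated.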

## Syntax: Using `MERGE` to create relationships

Here is the syntax for the `MERGE` clause for creating relationships:

``````
MERGE (variable1:Label1 {nodeProperties1})-[:REL_TYPE]->(variable2:Label2 {nodeProperties2})
RETURN variable1, variable2
``````

If there is an existing node with Label1 and nodeProperties1 that has the :REL_TYPE relationship to an existing node with Label2 and nodeProperties2, nothing is created. If this full pattern is not found, `MERGE` creates the entire pattern, including new nodes for any part of the pattern that is not already bound to a variable.

### Example: Finding existing relationships

Here is an example. We currently have the Person node with the :ACTED_IN relationship, but we do not have this relationship with the Actor node.

``````
MATCH (p {name: 'Michael Caine'})-[*0..1]-(m)
RETURN p, m
``````


### Example: Using `MERGE` to create a relationship

Here is code where we want to create the :ACTED_IN relationship between Michael Caine and the movie Batman Begins.

``````
MATCH (p {name: 'Michael Caine'}), (m:Movie {title: 'Batman Begins'})
MERGE (p)-[:ACTED_IN]->(m)
RETURN p, m
``````

Because the `MATCH` does not specify a label for p, it finds both the Person node and the Actor node for Michael Caine. Since the relationship between the Person node and the Movie node already exists, it is not created again. The relationship between the Actor node and the Movie node is created with this merge.

Although you can leave out the direction of the relationship being created with `MERGE`, in which case a left-to-right direction is assumed, a best practice is to always specify the direction of the relationship. However, if you treat a relationship as bidirectional and you want to avoid creating a duplicate relationship in the opposite direction, you must leave off the arrow.
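As a sketch of the undirected case, the following merges a relationship without an arrow. The KNOWS relationship type and both person names are hypothetical and are not part of the course dataset.

```cypher
// Merge each node on its key first so that the nodes are bound.
MERGE (a:Person {name: 'Jane Doe'})
MERGE (b:Person {name: 'John Doe'})
// No arrow: an existing KNOWS relationship in either direction matches,
// so a second relationship in the opposite direction is not created.
MERGE (a)-[:KNOWS]-(b)
RETURN a, b
```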

## Specifying creation behavior when merging

You can use the `MERGE` clause along with `ON CREATE` to assign specific values to a node that is created as a result of an attempt to merge.

Here is an example where we create a new node, specifying property values for the new node:

``````
MERGE (a:Person {name: 'Sir Michael Caine'})
ON CREATE SET a.birthPlace = 'London',
              a.born = 1933
RETURN a
``````

We know that there are no existing Person nodes named Sir Michael Caine. When the `MERGE` executes, it will not find any matching node, so it will create one and execute the `ON CREATE` clause, where we set the birthPlace and born property values.


### Example: Verifying the merge

Here is the code to display the nodes that have anything to do with Michael Caine.

``````
MATCH (p)-[*0..1]-(m)
WHERE p.name CONTAINS 'Caine'
RETURN p, m
``````

The most recently created node has the name value of Sir Michael Caine.

## Specifying update behavior when merging

You can also specify an `ON MATCH` clause during merge processing. If the exact node is found, you can update its properties or labels. Here is an example:

``````
MERGE (a:Person {name: 'Sir Michael Caine'})
ON CREATE SET a.born = 1933,
              a.birthPlace = 'UK'
ON MATCH SET a.birthPlace = 'UK'
``````

Here, only the existing node with the name Sir Michael Caine is updated with the new birthPlace. No new node is created for Sir Michael Caine.
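A common use of the two clauses together is to record when a node was first created and when it was last matched. The sketch below assumes a Neo4j version that provides the `datetime()` function. The createdAt and updatedAt property names are hypothetical.

```cypher
MERGE (a:Person {name: 'Sir Michael Caine'})
// Runs only when the node did not exist and had to be created.
ON CREATE SET a.createdAt = datetime()
// Runs only when an existing node was found.
ON MATCH SET a.updatedAt = datetime()
RETURN a.name, a.createdAt, a.updatedAt
```

Only one of the two branches runs for each merged node, so a node that has just been created has no updatedAt value until a later merge matches it.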

## Using `MERGE` to create relationships

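Using `MERGE` to create relationships can be expensive because the graph engine must first search for an existing relationship before it creates one. You should only do it when you need to ensure that a relationship is unique and you are not sure whether it already exists.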

In this example, we use the `MATCH` clause to find all Person nodes that represent Michael Caine and we find the movie, Batman Begins that we want to connect to all of these nodes. We already have a connection between one of the Person nodes and the Movie node. We do not want this relationship to be duplicated. This is where we can use `MERGE` as follows:

``````
MATCH (p:Person), (m:Movie)
WHERE m.title = 'Batman Begins' AND p.name ENDS WITH 'Caine'
MERGE (p)-[:ACTED_IN]->(m)
RETURN p, m
``````

When this Cypher statement executes, the `MERGE` is evaluated for each matched Person node and adds the relationship only to the nodes that did not already have it.

You must be aware of the behavior of the `MERGE` clause and how it will automatically create nodes and relationships. `MERGE` tries to find a full pattern and if it doesn’t find it, it creates that full pattern. That’s why in most cases you should first `MERGE` your nodes and then your relationship afterwards.
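The recommended order can be sketched as follows, using the Jane Doe and The Good One values from the questions at the end of this lesson as placeholder data.

```cypher
// Merge each node on its own key first...
MERGE (p:Person {name: 'Jane Doe'})
MERGE (m:Movie {title: 'The Good One'})
// ...then merge the relationship between the two bound nodes.
MERGE (p)-[:ACTED_IN]->(m)
RETURN p, m

// By contrast, merging the full pattern in one clause, for example
// MERGE (p:Person {name: 'Jane Doe'})-[:ACTED_IN]->(m:Movie {title: 'The Good One'}),
// creates both nodes again whenever the complete pattern is not found,
// even if each node already exists on its own.
```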

### Use MERGE carefully

A `MERGE` clause with one bound and one unbound node only makes sense if you intentionally want to create a node within the context of another node, like a month within a year.

For example:

``````
MATCH (fromDate:Date {year: 2018})
MERGE (toDate:Date {month: 'January'})-[:IN_YEAR]->(fromDate)
``````
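To illustrate what "in the context of another node" means, the following sketch merges a January node for two different years. It assumes that Date nodes for both years already exist. The 2019 node is hypothetical and does not appear elsewhere in this lesson.

```cypher
// Assumption: Date nodes with year 2018 and year 2019 already exist.
MATCH (y:Date)
WHERE y.year IN [2018, 2019]
// y is bound and m is unbound, so MERGE looks for a January node
// attached to this particular year and creates one if there is none.
MERGE (m:Date {month: 'January'})-[:IN_YEAR]->(y)
RETURN y.year AS year, m.month AS month
```

Each year ends up with its own January node, and repeating the statement does not add further ones.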

## Exercise 12: Merging data in the graph

In the query edit pane of Neo4j Browser, execute the browser command:

`:play 4.0-intro-neo4j-exercises`

and follow the instructions for Exercise 12.

 This exercise has 16 steps. Estimated time to complete: 45 minutes.

### Question 1

Given this `MERGE` clause, what is the most important thing you should make sure of?

``````
MERGE (p:Person {name: 'Jane Doe'})
SET p.born = 1990
RETURN p
``````

• The Person label exists in the graph.

• The Person label does not exist in the graph.

• The value for name is unique.

• The value for born is unique.

### Question 2

Given this `MERGE` clause, suppose that the p and m nodes exist in the graph. What does this code do?

``````
MATCH (p {name: 'Jane Doe'}), (m:Movie {title: 'The Good One'})
MERGE (p)-[:ACTED_IN]->(m)
``````

• If the :ACTED_IN relationship exists, it deletes it and recreates it.

• If the :ACTED_IN relationship exists, it does nothing.

• If the :ACTED_IN relationship does not exist, it creates it.

• If the :ACTED_IN relationship does not exist, it creates another one.

### Question 3

Given this `MERGE` clause, suppose that the p and m nodes exist in the graph, but you do not know whether the relationship exists. What are your options to process this `MERGE` clause?

``````
MATCH (p {name: 'Jane Doe'}), (m:Movie {title: 'The Good One'})
MERGE (p)-[rel:ACTED_IN]->(m)
SET rel.role = ['role']
``````

• Use the default behavior. The relationship will be created if it doesn’t exist.

• Specify `ON CREATE` to perform additional processing when the relationship is created.

• Specify `ON MATCH` to perform additional processing when the relationship is not created.

• Specify `ON DELETE` to perform additional processing when the relationship is deleted.

## Summary

You should now be able to write Cypher statements to:

• Merge data in a graph by:

• Creating nodes.

• Creating relationships.