Tutorial: Import data

This tutorial demonstrates how to import data from CSV files using LOAD CSV.

With the combination of the Cypher^® clauses LOAD CSV, MERGE, and CREATE you can import data into Neo4j. LOAD CSV allows you to access the data values and perform actions on them.

For a full description of LOAD CSV , see Cypher Manual → LOAD CSV.
For a full list of Cypher clauses, see Cypher Manual → Clauses.

The data files

In this tutorial, you import data from the following CSV files:

persons.csv
movies.csv
roles.csv

The content of the persons.csv file:

persons.csv

id,name
1,Charlie Sheen
2,Michael Douglas
3,Martin Sheen
4,Morgan Freeman

The persons.csv file contains two columns id and name. Each row represents one person that has a unique id and a name.

The content of the movies.csv file:

movies.csv

id,title,country,year
1,Wall Street,USA,1987
2,The American President,USA,1995
3,The Shawshank Redemption,USA,1994

The movies.csv file contains the columns id, title, country, and year. Each row represents one movie that has a unique id, a title, a country of origin, and a release year.

The content of the roles.csv file:

roles.csv

personId,movieId,role
1,1,Bud Fox
3,1,Carl Fox
2,1,Gordon Gekko
3,2,A.J. MacInerney
2,2,President Andrew Shepherd
4,3,Ellis Boyd 'Red' Redding

The roles.csv file contains the columns personId, movieId, and role. Each row represents one role with relationship data about the person id (from the persons.csv file) and the movie id (from the movies.csv file).

The graph model

The following data model shows what a graph model for this dataset could look like:

This is the resulting graph, based on the data from the CSV files:

Prerequisites

This tutorial uses the Linux or macOS tarball installation.

It assumes that your current work directory is the <neo4j-home> directory of the tarball installation, and the CSV files are placed in the default import directory.

For the default directory of other installations see, Operations Manual → File locations.
The import location can be configured with Operations Manual → server.directories.import.

Prepare the database

Before importing the data, you should prepare the database you want to use by creating indexes and constraints.

You should ensure that the Person and Movie nodes have unique id properties by creating constraints on them.

Creating a unique constraint also implicitly creates an index. By indexing the id property, node lookup (e.g. by MATCH) will be much faster.

Additionally, it is a good idea to index the country name for a fast lookup.

1. Start neo4j.

Run the command:

bin/neo4j start

The default user name is neo4j and password neo4j.

2. Create a constraint so that each Person node has a unique id property.

You create a constraint on the id property of Person nodes to ensure that nodes with the Person label will have a unique id property.

Using Neo4j Browser, run the following Cypher:

CREATE CONSTRAINT personIdConstraint FOR (person:Person) REQUIRE person.id IS UNIQUE

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j "CREATE CONSTRAINT personIdConstraint FOR (person:Person) REQUIRE person.id IS UNIQUE"

3. Create a constraint so that each Movie node has a unique id property.

You create a constraint on the id property of Movie nodes to ensure that nodes with the Movie label will have a unique id property.

Using Neo4j Browser, run the following Cypher:

CREATE CONSTRAINT movieIdConstraint FOR (movie:Movie) REQUIRE movie.id IS UNIQUE

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j "CREATE CONSTRAINT movieIdConstraint FOR (movie:Movie) REQUIRE movie.id IS UNIQUE"

4. Create an index for Country node for the name property.

Create an index on the name property of Country nodes to ensure fast lookups.

When using MERGE or MATCH with LOAD CSV, make sure you have an index or a unique constraint on the property that you are merging on. This will ensure that the query executes in a performant way.

Using Neo4j Browser, run the following Cypher:

CREATE INDEX FOR (c:Country) ON (c.name)

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j "CREATE INDEX FOR (c:Country) ON (c.name)"

Import data using `LOAD CSV`

1. Load the data from the persons.csv file.

You create nodes with the Person label and the properties id and name.

Using Neo4j Browser, run the following Cypher:

LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine
CREATE (p:Person {id: toInteger(csvLine.id), name: csvLine.name})

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine CREATE (p:Person {id:toInteger(csvLine.id), name:csvLine.name})'

Output:

Added 4 nodes, Set 8 properties, Added 4 labels

LOAD CSV also supports accessing CSV files via HTTPS, HTTP, and FTP, see Cypher Manual → LOAD CSV.

2. Load the data from the movies.csv file.

You create nodes with the Movie label and the properties id, title, and year.

Also you create nodes with the Country label. Using MERGE avoids creating duplicate Country nodes in the case where multiple movies have the same country of origin.

The relationship with the type ORIGIN will connect the Country node and the Movie node.

Using Neo4j Browser, run the following Cypher:

LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine
MERGE (country:Country {name: csvLine.country})
CREATE (movie:Movie {id: toInteger(csvLine.id), title: csvLine.title, year:toInteger(csvLine.year)})
CREATE (movie)-[:ORIGIN]->(country)

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine MERGE (country:Country {name:csvLine.country}) CREATE (movie:Movie {id:toInteger(csvLine.id), title:csvLine.title, year:toInteger(csvLine.year)}) CREATE (movie)-[:ORIGIN]->(country)'

Output:

Added 4 nodes, Created 3 relationships, Set 10 properties, Added 4 labels

3. Load the data from the roles.csv file

Importing the data from the roles.csv file is a matter of finding the Person node and the Movie node and then creating relationships between them.

For larger data files, good practice is to use the subquery CALL {…} IN TRANSACTIONS. This subquery tells Neo4j that the query might build up inordinate amounts of transaction state, and thus needs to be committed in batches. For more information, see Cypher Manual → CALL {} (subquery).

Using Neo4j Browser, run the following Cypher:

:auto LOAD CSV WITH HEADERS FROM 'file:///roles.csv' AS csvLine
CALL {
 WITH csvLine
 MATCH (person:Person {id: toInteger(csvLine.personId)}), (movie:Movie {id: toInteger(csvLine.movieId)})
CREATE (person)-[:ACTED_IN {role: csvLine.role}]->(movie)
} IN TRANSACTIONS OF 2 ROWS

Using Cypher Shell, you can run the command:

bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///roles.csv" AS csvLine CALL {WITH csvLine MATCH (person:Person {id: toInteger(csvLine.personId)}), (movie:Movie {id: toInteger(csvLine.movieId)}) CREATE (person)-[:ACTED_IN {role: csvLine.role}]->(movie)} IN TRANSACTIONS OF 2 ROWS'

Output:

Created 5 relationships, Set 5 properties

Note, CALL {…} IN TRANSACTIONS is only allowed in implicit transactions. Cypher Shell supports both explicit and implicit transactions. However, in Browser you need to prepend CALL {…} IN TRANSACTIONS with :auto. For more details, see Cypher manual → Subqueries in transactions and Neo4j Browser Guide → Command reference.

Validate the imported data

Check the resulting data set by finding all the nodes that have a relationship.

Using Neo4j Browser, run the following Cypher:

MATCH (n)-[r]->(m) RETURN n, r, m

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j 'MATCH (n)-[r]->(m) RETURN n, r, m'

Output:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                               | r                                               | m                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Movie {id: 3, title: "The Shawshank Redemption", year: 1994}) | [:ORIGIN]                                       | (:Country {name: "USA"})                                        |
| (:Movie {id: 2, title: "The American President", year: 1995})   | [:ORIGIN]                                       | (:Country {name: "USA"})                                        |
| (:Movie {id: 1, title: "Wall Street", year: 1987})              | [:ORIGIN]                                       | (:Country {name: "USA"})                                        |
| (:Person {name: "Martin Sheen", id: 3})                         | [:ACTED_IN {role: "Carl Fox"}]                  | (:Movie {id: 1, title: "Wall Street", year: 1987})              |
| (:Person {name: "Charlie Sheen", id: 1})                        | [:ACTED_IN {role: "Bud Fox"}]                   | (:Movie {id: 1, title: "Wall Street", year: 1987})              |
| (:Person {name: "Michael Douglas", id: 2})                      | [:ACTED_IN {role: "Gordon Gekko"}]              | (:Movie {id: 1, title: "Wall Street", year: 1987})              |
| (:Person {name: "Michael Douglas", id: 2})                      | [:ACTED_IN {role: "President Andrew Shepherd"}] | (:Movie {id: 2, title: "The American President", year: 1995})   |
| (:Person {name: "Martin Sheen", id: 3})                         | [:ACTED_IN {role: "A.J. MacInerney"}]           | (:Movie {id: 2, title: "The American President", year: 1995})   |
| (:Person {name: "Morgan Freeman", id: 4})                       | [:ACTED_IN {role: "Ellis Boyd 'Red' Redding"}]  | (:Movie {id: 3, title: "The Shawshank Redemption", year: 1994}) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Tutorial: Import data

The data files

The graph model

Prerequisites

Prepare the database

Import data using LOAD CSV

Validate the imported data

Import data using `LOAD CSV`