Import data

This tutorial demonstrates how to import data from CSV files using LOAD CSV.

With the combination of the Cypher clauses LOAD CSV, MERGE, and CREATE you can conveniently import data into Neo4j. LOAD CSV allows you to access the data values and perform actions on them.

1. The data files

In this tutorial, you import data from the following CSV files:

  • persons.csv

  • movies.csv

  • roles.csv

The content of the persons.csv file:

persons.csv
id,name
1,Charlie Sheen
2,Michael Douglas
3,Martin Sheen
4,Morgan Freeman

The persons.csv file contains two columns id and name. Each row represents one person that has a unique id and a name.

The content of the movies.csv file:

movies.csv
id,title,country,year
1,Wall Street,USA,1987
2,The American President,USA,1995
3,The Shawshank Redemption,USA,1994

The movies.csv file contains the columns id, title, country, and year. Each row represents one movie that has a unique id, a title, a country of origin, and a release year.

The content of the roles.csv file:

roles.csv
personId,movieId,role
1,1,Bud Fox
4,1,Carl Fox
3,1,Gordon Gekko
4,2,A.J. MacInerney
3,2,President Andrew Shepherd
5,3,Ellis Boyd 'Red' Redding

The roles.csv file contains the columns personId, movieId, and role. Each row represents one role with relationship data about the person id (from the persons.csv file) and the movie id (from the movies.csv file).

2. The graph model

The following simple data model shows what a graph model for this data set could look like:

Diagram

This is the resulting graph, based on the data from the CSV files:

Diagram

3. Prerequisites

This tutorial uses the Linux or macOS tarball installation.

It assumes that your current work directory is the <neo4j-home> directory of the tarball installation, and the CSV files are placed in the default import directory.

4. Prepare the database

Before importing the data, you should prepare the database you want to use by creating indexes and constraints.

You should ensure that the Person and Movie nodes have unique id properties by creating constraints on them.

Creating a unique constraint also implicitly creates an index. By indexing the id property, node lookup (e.g. by MATCH) will be much faster.

Additionally, it is a good idea to index the country name for a fast lookup.

1. Start neo4j.

Run the command:

bin/neo4j start

The default user name is neo4j and password neo4j.

2. Create a constraint so that each Person node has a unique id property.

You create a constraint on the id property of Person nodes to ensure that nodes with the Person label will have a unique id property.

Using Neo4j Browser, run the following Cypher:

CREATE CONSTRAINT personIdConstraint ON (person:Person) ASSERT person.id IS UNIQUE

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j "CREATE CONSTRAINT personIdConstraint ON (person:Person) ASSERT person.id IS UNIQUE"

3. Create a constraint so that each Movie node has a unique id propery.

You create a constraint on the id property of Movie nodes to ensure that nodes with the Movie label will have a unique id property.

Using Neo4j Browser, run the following Cypher:

CREATE CONSTRAINT movieIdConstraint ON (movie:Movie) ASSERT movie.id IS UNIQUE

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j "CREATE CONSTRAINT movieIdConstraint ON (movie:Movie) ASSERT movie.id IS UNIQUE"

4. Create an index for Country node for the name property.

Create an index on the name property of Country nodes to ensure fast lookups.

When using MERGE or MATCH with LOAD CSV, make sure you have an index or a unique constraint on the property that you are merging on. This will ensure that the query executes in a performant way.

Using Neo4j Browser, run the following Cypher:

CREATE INDEX FOR (c:Country) ON (c.name)

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j "CREATE INDEX FOR (c:Country) ON (c.name)"

5. Import data using LOAD CSV

1. Load the data from the persons.csv file.

You create nodes with the Person label and the the the properties id and name.

Using Neo4j Browser, run the following Cypher:

LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine
CREATE (p:Person {id: toInteger(csvLine.id), name: csvLine.name})

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine CREATE (p:Person {id:toInteger(csvLine.id), name:csvLine.name})'

Output:

Added 4 nodes, Set 8 properties, Added 4 labels

LOAD CSV also supports accessing CSV files via HTTPS, HTTP, and FTP, see Cypher Manual → LOAD CSV.

2. Load the data from the movies.csv file.

You create nodes with the Movie label and the the the properties id, title, and year.

Also you create nodes with the Country label. Using MERGE avoids creating duplicate Country nodes in the case where multiple movies have the same country of origin.

The relationship with the type ORIGIN will connect the Country node and the Movie node.

Using Neo4j Browser, run the following Cypher:

LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine
MERGE (country:Country {name: csvLine.country})
CREATE (movie:Movie {id: toInteger(csvLine.id), title: csvLine.title, year:toInteger(csvLine.year)})
CREATE (movie)-[:ORIGIN]->(country)

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine MERGE (country:Country {name:csvLine.country}) CREATE (movie:Movie {id:toInteger(csvLine.id), title:csvLine.title, year:toInteger(csvLine.year)}) CREATE (movie)-[:ORIGIN]->(country)'

Output:

Added 4 nodes, Created 3 relationships, Set 10 properties, Added 4 labels

3. Load the data from the roles.csv file

Importing the data from the roles.csv file is a matter of finding the Person node and Movie node and then creating relationships between them.

For larger data files, it is useful to use the hint USING PERIODIC COMMIT clause of LOAD CSV. This hint tells Neo4j that the query might build up inordinate amounts of transaction state, and thus needs to be periodically committed. For more information, see PERIODIC COMMIT query hint.

Using Neo4j Browser, run the following Cypher:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///roles.csv" AS csvLine
MATCH (person:Person {id: toInteger(csvLine.personId)}), (movie:Movie {id: toInteger(csvLine.movieId)})
CREATE (person)-[:ACTED_IN {role: csvLine.role}]->(movie)

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j 'USING PERIODIC COMMIT 500 LOAD CSV WITH HEADERS FROM "file:///roles.csv" AS csvLine MATCH (person:Person {id:toInteger(csvLine.personId)}), (movie:Movie {id:toInteger(csvLine.movieId)}) CREATE (person)-[:ACTED_IND {role:csvLine.role}]->(movie)'

Output:

Created 5 relationships, Set 5 properties

6. Validate the imported data

Check the resulting data set by finding all the nodes that have a relationship.

Using Neo4j Browser, run the following Cypher:

MATCH (n)-[r]->(m) RETURN n, r, m

Or using Neo4j Cypher Shell, run the command:

bin/cypher-shell --database=neo4j 'MATCH (n)-[r]->(m) RETURN n, r, m'

Output:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                               | r                                               | m                                                             |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Movie {id: 3, title: "The Shawshank Redemption", year: 1994}) | [:ORIGIN]                                       | (:Country {name: "USA"})                                      |
| (:Movie {id: 2, title: "The American President", year: 1995})   | [:ORIGIN]                                       | (:Country {name: "USA"})                                      |
| (:Movie {id: 1, title: "Wall Street", year: 1987})              | [:ORIGIN]                                       | (:Country {name: "USA"})                                      |
| (:Person {name: "Morgan Freeman", id: 4})                       | [:ACTED_IN {role: "Carl Fox"}]                  | (:Movie {id: 1, title: "Wall Street", year: 1987})            |
| (:Person {name: "Charlie Sheen", id: 1})                        | [:ACTED_IN {role: "Bud Fox"}]                   | (:Movie {id: 1, title: "Wall Street", year: 1987})            |
| (:Person {name: "Martin Sheen", id: 3})                         | [:ACTED_IN {role: "Gordon Gekko"}]              | (:Movie {id: 1, title: "Wall Street", year: 1987})            |
| (:Person {name: "Martin Sheen", id: 3})                         | [:ACTED_IN {role: "President Andrew Shepherd"}] | (:Movie {id: 2, title: "The American President", year: 1995}) |
| (:Person {name: "Morgan Freeman", id: 4})                       | [:ACTED_IN {role: "A.J. MacInerney"}]           | (:Movie {id: 2, title: "The American President", year: 1995}) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+