Tutorial: Import data
This tutorial demonstrates how to import data from CSV files using LOAD CSV.
With the combination of the Cypher® clauses LOAD CSV, MERGE, and CREATE you can import data into Neo4j.
LOAD CSV allows you to access the data values and perform actions on them.
|
The data files
In this tutorial, you import data from the following CSV files:
-
persons.csv
-
movies.csv
-
roles.csv
The content of the persons.csv file:
id,name
1,Charlie Sheen
2,Michael Douglas
3,Martin Sheen
4,Morgan Freeman
The persons.csv file contains two columns id and name.
Each row represents one person that has a unique id and a name.
The content of the movies.csv file:
id,title,country,year
1,Wall Street,USA,1987
2,The American President,USA,1995
3,The Shawshank Redemption,USA,1994
The movies.csv file contains the columns id, title, country, and year.
Each row represents one movie that has a unique id, a title, a country of origin, and a release year.
The content of the roles.csv file:
personId,movieId,role
1,1,Bud Fox
3,1,Carl Fox
2,1,Gordon Gekko
3,2,A.J. MacInerney
2,2,President Andrew Shepherd
4,3,Ellis Boyd 'Red' Redding
The roles.csv file contains the columns personId, movieId, and role.
Each row represents one role with relationship data about the person id (from the persons.csv file) and the movie id (from the movies.csv file).
The graph model
The following data model shows what a graph model for this dataset could look like:
This is the resulting graph, based on the data from the CSV files:
Prerequisites
This tutorial uses the Linux or macOS tarball installation.
It assumes that your current work directory is the <neo4j-home> directory of the tarball installation, and the CSV files are placed in the default import directory.
|
Prepare the database
Before importing the data, you should prepare the database you want to use by creating indexes and constraints.
You should ensure that the Person and Movie nodes have unique id properties by creating constraints on them.
Creating a unique constraint also implicitly creates an index.
By indexing the id property, node lookup (e.g. by MATCH) will be much faster.
Additionally, it is a good idea to index the country name for a fast lookup.
1. Start neo4j.
Run the command:
bin/neo4j start
|
The default user name is |
2. Create a constraint so that each Person node has a unique id property.
You create a constraint on the id property of Person nodes to ensure that nodes with the Person label will have a unique id property.
Using Neo4j Browser, run the following Cypher:
CREATE CONSTRAINT personIdConstraint FOR (person:Person) REQUIRE person.id IS UNIQUE
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j "CREATE CONSTRAINT personIdConstraint FOR (person:Person) REQUIRE person.id IS UNIQUE"
3. Create a constraint so that each Movie node has a unique id property.
You create a constraint on the id property of Movie nodes to ensure that nodes with the Movie label will have a unique id property.
Using Neo4j Browser, run the following Cypher:
CREATE CONSTRAINT movieIdConstraint FOR (movie:Movie) REQUIRE movie.id IS UNIQUE
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j "CREATE CONSTRAINT movieIdConstraint FOR (movie:Movie) REQUIRE movie.id IS UNIQUE"
4. Create an index for Country node for the name property.
Create an index on the name property of Country nodes to ensure fast lookups.
|
When using |
Using Neo4j Browser, run the following Cypher:
CREATE INDEX FOR (c:Country) ON (c.name)
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j "CREATE INDEX FOR (c:Country) ON (c.name)"
Import data using LOAD CSV
1. Load the data from the persons.csv file.
You create nodes with the Person label and the properties id and name.
Using Neo4j Browser, run the following Cypher:
LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine
CREATE (p:Person {id: toInteger(csvLine.id), name: csvLine.name})
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine CREATE (p:Person {id:toInteger(csvLine.id), name:csvLine.name})'
Output:
Added 4 nodes, Set 8 properties, Added 4 labels
|
|
2. Load the data from the movies.csv file.
You create nodes with the Movie label and the properties id, title, and year.
Also you create nodes with the Country label.
Using MERGE avoids creating duplicate Country nodes in the case where multiple movies have the same country of origin.
The relationship with the type ORIGIN will connect the Country node and the Movie node.
Using Neo4j Browser, run the following Cypher:
LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine
MERGE (country:Country {name: csvLine.country})
CREATE (movie:Movie {id: toInteger(csvLine.id), title: csvLine.title, year:toInteger(csvLine.year)})
CREATE (movie)-[:ORIGIN]->(country)
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine MERGE (country:Country {name:csvLine.country}) CREATE (movie:Movie {id:toInteger(csvLine.id), title:csvLine.title, year:toInteger(csvLine.year)}) CREATE (movie)-[:ORIGIN]->(country)'
Output:
Added 4 nodes, Created 3 relationships, Set 10 properties, Added 4 labels
3. Load the data from the roles.csv file
Importing the data from the roles.csv file is a matter of finding the Person node and the Movie node and then creating relationships between them.
|
For larger data files, good practice is to use the subquery |
Using Neo4j Browser, run the following Cypher:
:auto LOAD CSV WITH HEADERS FROM 'file:///roles.csv' AS csvLine
CALL {
WITH csvLine
MATCH (person:Person {id: toInteger(csvLine.personId)}), (movie:Movie {id: toInteger(csvLine.movieId)})
CREATE (person)-[:ACTED_IN {role: csvLine.role}]->(movie)
} IN TRANSACTIONS OF 2 ROWS
Using Cypher Shell, you can run the command:
bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///roles.csv" AS csvLine CALL {WITH csvLine MATCH (person:Person {id: toInteger(csvLine.personId)}), (movie:Movie {id: toInteger(csvLine.movieId)}) CREATE (person)-[:ACTED_IN {role: csvLine.role}]->(movie)} IN TRANSACTIONS OF 2 ROWS'
Output:
Created 5 relationships, Set 5 properties
|
Note, |
Validate the imported data
Check the resulting data set by finding all the nodes that have a relationship.
Using Neo4j Browser, run the following Cypher:
MATCH (n)-[r]->(m) RETURN n, r, m
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'MATCH (n)-[r]->(m) RETURN n, r, m'
Output:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | r | m |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Movie {id: 3, title: "The Shawshank Redemption", year: 1994}) | [:ORIGIN] | (:Country {name: "USA"}) |
| (:Movie {id: 2, title: "The American President", year: 1995}) | [:ORIGIN] | (:Country {name: "USA"}) |
| (:Movie {id: 1, title: "Wall Street", year: 1987}) | [:ORIGIN] | (:Country {name: "USA"}) |
| (:Person {name: "Martin Sheen", id: 3}) | [:ACTED_IN {role: "Carl Fox"}] | (:Movie {id: 1, title: "Wall Street", year: 1987}) |
| (:Person {name: "Charlie Sheen", id: 1}) | [:ACTED_IN {role: "Bud Fox"}] | (:Movie {id: 1, title: "Wall Street", year: 1987}) |
| (:Person {name: "Michael Douglas", id: 2}) | [:ACTED_IN {role: "Gordon Gekko"}] | (:Movie {id: 1, title: "Wall Street", year: 1987}) |
| (:Person {name: "Michael Douglas", id: 2}) | [:ACTED_IN {role: "President Andrew Shepherd"}] | (:Movie {id: 2, title: "The American President", year: 1995}) |
| (:Person {name: "Martin Sheen", id: 3}) | [:ACTED_IN {role: "A.J. MacInerney"}] | (:Movie {id: 2, title: "The American President", year: 1995}) |
| (:Person {name: "Morgan Freeman", id: 4}) | [:ACTED_IN {role: "Ellis Boyd 'Red' Redding"}] | (:Movie {id: 3, title: "The Shawshank Redemption", year: 1994}) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+