Import data
This tutorial demonstrates how to import data from CSV files using
LOAD CSV
.
With the combination of the Cypher clauses LOAD CSV
, MERGE
, and CREATE
you can conveniently import data into Neo4j.
LOAD CSV
allows you to access the data values and perform actions on them.
|
1. The data files
In this tutorial, you import data from the following CSV files:
-
persons.csv
-
movies.csv
-
roles.csv
The content of the persons.csv file:
id,name
1,Charlie Sheen
2,Michael Douglas
3,Martin Sheen
4,Morgan Freeman
The persons.csv file contains two columns id
and name
.
Each row represents one person that has a unique id
and a name
.
The content of the movies.csv file:
id,title,country,year
1,Wall Street,USA,1987
2,The American President,USA,1995
3,The Shawshank Redemption,USA,1994
The movies.csv file contains the columns id
, title
, country
, and year
.
Each row represents one movie that has a unique id
, a title
, a country
of origin, and a release year
.
The content of the roles.csv file:
personId,movieId,role
1,1,Bud Fox
4,1,Carl Fox
3,1,Gordon Gekko
4,2,A.J. MacInerney
3,2,President Andrew Shepherd
5,3,Ellis Boyd 'Red' Redding
The roles.csv file contains the columns personId
, movieId
, and role
.
Each row represents one role with relationship data about the person id
(from the persons.csv file) and the movie id
(from the movies.csv file).
2. The graph model
The following simple data model shows what a graph model for this data set could look like:
This is the resulting graph, based on the data from the CSV files:
3. Prerequisites
This tutorial uses the Linux or macOS tarball installation.
It assumes that your current work directory is the <neo4j-home> directory of the tarball installation, and the CSV files are placed in the default import directory.
|
4. Prepare the database
Before importing the data, you should prepare the database you want to use by creating indexes and constraints.
You should ensure that the Person
and Movie
nodes have unique id
properties by creating constraints on them.
Creating a unique constraint also implicitly creates an index.
By indexing the id
property, node lookup (e.g. by MATCH
) will be much faster.
Additionally, it is a good idea to index the country name
for a fast lookup.
1. Start neo4j.
Run the command:
bin/neo4j start
The default user name is |
2. Create a constraint so that each Person
node has a unique id
property.
You create a constraint on the id
property of Person
nodes to ensure that nodes with the Person
label will have a unique id
property.
Using Neo4j Browser, run the following Cypher:
CREATE CONSTRAINT personIdConstraint ON (person:Person) ASSERT person.id IS UNIQUE
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j "CREATE CONSTRAINT personIdConstraint ON (person:Person) ASSERT person.id IS UNIQUE"
3. Create a constraint so that each Movie
node has a unique id
propery.
You create a constraint on the id
property of Movie
nodes to ensure that nodes with the Movie
label will have a unique id
property.
Using Neo4j Browser, run the following Cypher:
CREATE CONSTRAINT movieIdConstraint ON (movie:Movie) ASSERT movie.id IS UNIQUE
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j "CREATE CONSTRAINT movieIdConstraint ON (movie:Movie) ASSERT movie.id IS UNIQUE"
4. Create an index for Country
node for the name
property.
Create an index on the name
property of Country
nodes to ensure fast lookups.
When using |
Using Neo4j Browser, run the following Cypher:
CREATE INDEX FOR (c:Country) ON (c.name)
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j "CREATE INDEX FOR (c:Country) ON (c.name)"
5. Import data using LOAD CSV
1. Load the data from the persons.csv file.
You create nodes with the Person
label and the the the properties id
and name
.
Using Neo4j Browser, run the following Cypher:
LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine
CREATE (p:Person {id: toInteger(csvLine.id), name: csvLine.name})
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///persons.csv" AS csvLine CREATE (p:Person {id:toInteger(csvLine.id), name:csvLine.name})'
Output:
Added 4 nodes, Set 8 properties, Added 4 labels
|
2. Load the data from the movies.csv file.
You create nodes with the Movie
label and the the the properties id
, title
, and year
.
Also you create nodes with the Country
label.
Using MERGE
avoids creating duplicate Country
nodes in the case where multiple movies have the same country of origin.
The relationship with the type ORIGIN
will connect the Country
node and the Movie
node.
Using Neo4j Browser, run the following Cypher:
LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine
MERGE (country:Country {name: csvLine.country})
CREATE (movie:Movie {id: toInteger(csvLine.id), title: csvLine.title, year:toInteger(csvLine.year)})
CREATE (movie)-[:ORIGIN]->(country)
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'LOAD CSV WITH HEADERS FROM "file:///movies.csv" AS csvLine MERGE (country:Country {name:csvLine.country}) CREATE (movie:Movie {id:toInteger(csvLine.id), title:csvLine.title, year:toInteger(csvLine.year)}) CREATE (movie)-[:ORIGIN]->(country)'
Output:
Added 4 nodes, Created 3 relationships, Set 10 properties, Added 4 labels
3. Load the data from the roles.csv file
Importing the data from the roles.csv file is a matter of finding the Person
node and Movie
node and then creating relationships between them.
For larger data files, it is useful to use the hint |
Using Neo4j Browser, run the following Cypher:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///roles.csv" AS csvLine
MATCH (person:Person {id: toInteger(csvLine.personId)}), (movie:Movie {id: toInteger(csvLine.movieId)})
CREATE (person)-[:ACTED_IN {role: csvLine.role}]->(movie)
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'USING PERIODIC COMMIT 500 LOAD CSV WITH HEADERS FROM "file:///roles.csv" AS csvLine MATCH (person:Person {id:toInteger(csvLine.personId)}), (movie:Movie {id:toInteger(csvLine.movieId)}) CREATE (person)-[:ACTED_IND {role:csvLine.role}]->(movie)'
Output:
Created 5 relationships, Set 5 properties
6. Validate the imported data
Check the resulting data set by finding all the nodes that have a relationship.
Using Neo4j Browser, run the following Cypher:
MATCH (n)-[r]->(m) RETURN n, r, m
Or using Neo4j Cypher Shell, run the command:
bin/cypher-shell --database=neo4j 'MATCH (n)-[r]->(m) RETURN n, r, m'
Output:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n | r | m |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| (:Movie {id: 3, title: "The Shawshank Redemption", year: 1994}) | [:ORIGIN] | (:Country {name: "USA"}) |
| (:Movie {id: 2, title: "The American President", year: 1995}) | [:ORIGIN] | (:Country {name: "USA"}) |
| (:Movie {id: 1, title: "Wall Street", year: 1987}) | [:ORIGIN] | (:Country {name: "USA"}) |
| (:Person {name: "Morgan Freeman", id: 4}) | [:ACTED_IN {role: "Carl Fox"}] | (:Movie {id: 1, title: "Wall Street", year: 1987}) |
| (:Person {name: "Charlie Sheen", id: 1}) | [:ACTED_IN {role: "Bud Fox"}] | (:Movie {id: 1, title: "Wall Street", year: 1987}) |
| (:Person {name: "Martin Sheen", id: 3}) | [:ACTED_IN {role: "Gordon Gekko"}] | (:Movie {id: 1, title: "Wall Street", year: 1987}) |
| (:Person {name: "Martin Sheen", id: 3}) | [:ACTED_IN {role: "President Andrew Shepherd"}] | (:Movie {id: 2, title: "The American President", year: 1995}) |
| (:Person {name: "Morgan Freeman", id: 4}) | [:ACTED_IN {role: "A.J. MacInerney"}] | (:Movie {id: 2, title: "The American President", year: 1995}) |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Was this page helpful?