As an administrator, you may need to populate data in a database from files that were extracted from an RDBMS.
In this module, you will learn how to import data from CSV files using the import functionality of the neo4j-admin tool.
There are many ways to import data into a Neo4j database when the source of the data is an RDBMS. Which option you choose depends on:
How much data you have.
What tools you are comfortable using.
How much time you have to perform the import.
The developers of your application may be responsible for importing the data. In other scenarios, you may be given a set of CSV files that will need to be imported into a database.
The neo4j-admin tool is by far the quickest way to import data into the database.
Each CSV file contains one of the following:
A single row with header names
Rows with data only
A header row followed by rows with data
You must understand how headers relate to the data, know what the field separator is, and make sure that the data is properly formatted. You must ensure that the data in the CSV files is clean and ready for import.
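One simple cleanliness check is to confirm that every row has the same number of fields as the first row. The following is a minimal sketch, not part of the import tool: the function name find_ragged_rows is hypothetical, and the sample text deliberately includes a made-up broken row.

```python
import csv
import io


def find_ragged_rows(csv_text, delimiter=","):
    """Return (line_number, field_count) for each row whose field count
    differs from that of the first non-blank row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    expected = None
    problems = []
    for line_number, row in enumerate(reader, start=1):
        if not row:  # ignore blank lines
            continue
        if expected is None:
            expected = len(row)  # the first row sets the expected width
        elif len(row) != expected:
            problems.append((line_number, len(row)))
    return problems


# Illustrative sample: the third line is missing its label column.
sample = "id:ID(beat-ref),:LABEL\n1132,Beat\n0813\n0513,Beat\n"
print(find_ragged_rows(sample))
```

A check like this only catches structural problems. It does not validate the values themselves, such as date formats or duplicate IDs.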
When using neo4j-admin for import:
The database to be imported into must not exist as it will be created as part of the import.
The header information has additional information used for creating the nodes and relationships.
Node CSV files are structured differently from relationship CSV files.
All CSV files must use the same field separator.
You create the constraints (and indexes) after the import.
Here is a portion of the beats.csv file with embedded header information for loading nodes of type Beat:
id:ID(beat-ref),:LABEL
1132,Beat
0813,Beat
0513,Beat
The beats.csv records represent data that will be loaded into a node with the label Beat. The id value is used to create the id property for the node. The ID(beat-ref) is used to store a reference to the node that is created so that it can be used later in the import.
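To make the header notation concrete, here is a small sketch that splits a header field such as id:ID(beat-ref) into its property name, its role, and its reference name. The function parse_header_field and its regular expression are illustrative assumptions, not how neo4j-admin itself parses headers.

```python
import re

# name (optional), then ":" and a role such as ID or LABEL (optional),
# then a reference name in parentheses (optional).
HEADER_FIELD = re.compile(
    r"^(?P<name>[^:]*)(?::(?P<kind>\w+)(?:\((?P<space>[^)]*)\))?)?$"
)


def parse_header_field(field):
    """Return (property_name, kind, reference_name); missing parts are None."""
    match = HEADER_FIELD.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognized header field: {field!r}")
    return (match["name"] or None, match["kind"], match["space"])


for field in "id:ID(beat-ref),:LABEL".split(","):
    print(field, "->", parse_header_field(field))
```

Read this way, the first column of beats.csv supplies both the id property and the beat-ref reference, while the second column supplies only the label.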
Here is an example of the crimes_header.csv header file for loading nodes of type Crime:
And here is a portion of the associated crimes.csv file for loading nodes of type Crime:
8920441,Crime,12/07/2012 07:50:00 AM,AUTOMOBILE
4730813,Crime,05/09/2006 08:20:00 AM,POCKET-PICKING
7150780,Crime,09/28/2009 01:00:00 AM,CHILD ABANDONMENT
4556970,Crime,12/16/2005 08:39:24 PM,POSS: CANNABIS 30GMS OR LESS
9442492,Crime,12/28/2013 12:15:00 PM,OVER $500
The id value is used to create the id property for the node. The label for the node will be Crime. The date and description values are used to create the respective properties for each node. The ID(crime-ref) is used to store a reference to the node that is created so that it can be used later in the import.
Here is a portion of the primaryTypes.csv file for loading nodes of type PrimaryType:
id:ID(primarytype-ref)
ARSON
OBSCENITY
ROBBERY
THEFT
CRIM SEXUAL ASSAULT
BURGLARY
The id value is used to create the id property for the node. The label for the node is not specified in the header or the data, so it must be specified as an argument to the import command.
CSV files for loading relationships contain a row for every relationship where the ID for the starting and ending node is specified, as well as the relationship type.
Here is a portion of the crimesBeats.csv file that will be used to create the :ON_BEAT relationships between Crime and Beat nodes:
:START_ID(crime-ref),:END_ID(beat-ref),:TYPE
6978096,0911,ON_BEAT
3170923,2511,ON_BEAT
3073515,1012,ON_BEAT
8157905,0113,ON_BEAT
When the import tool processes this file, it has already saved references to the Crime and Beat nodes previously created. We specify the relationship to be created between the Crime and Beat nodes using the :TYPE column, in this case, ON_BEAT.
Here is a portion of the crimesPrimaryTypes.csv file that will be used to create the relationships between the Crime nodes and the PrimaryType nodes:
:START_ID(crime-ref),:END_ID(primarytype-ref)
5221115,NARCOTICS
4522835,DECEPTIVE PRACTICE
3432518,BATTERY
6439993,CRIMINAL TRESPASS
When the import tool processes this file, it has already saved references to the Crime and PrimaryType nodes previously created. The file has no :TYPE column, so the relationship type must be specified in the arguments when you import the data from this file.
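Because relationship rows refer to nodes by ID, it can be worth checking before the import that every start and end ID actually appears in the corresponding node file. The sketch below assumes node files with an embedded header row and the ID in the first column. The helper names and the second relationship row (with beat 9999) are made up for illustration.

```python
import csv
import io


def read_ids(node_csv_text):
    """Collect first-column IDs from a node CSV whose first row is a header."""
    rows = csv.reader(io.StringIO(node_csv_text))
    next(rows)  # skip the header row
    return {row[0] for row in rows if row}


def dangling_relationships(rel_csv_text, start_ids, end_ids):
    """Return relationship rows whose start or end ID is not a known node ID."""
    rows = csv.reader(io.StringIO(rel_csv_text))
    next(rows)  # skip the header row
    return [
        row for row in rows
        if row and (row[0] not in start_ids or row[1] not in end_ids)
    ]


beats = "id:ID(beat-ref),:LABEL\n1132,Beat\n0813,Beat\n"
crime_ids = {"8920441", "4730813"}
rels = (
    ":START_ID(crime-ref),:END_ID(beat-ref),:TYPE\n"
    "8920441,1132,ON_BEAT\n"
    "4730813,9999,ON_BEAT\n"  # made-up row: beat 9999 does not exist
)
print(dangling_relationships(rels, crime_ids, read_ids(beats)))
```

IDs are compared as strings here, so a value such as 0813 keeps its leading zero.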
After you have created or obtained the CSV files for the data, you import the data. The data import creates a database so the database you specify must either be empty or not exist.
Here is the simplified syntax for creating a database from CSV files:
neo4j-admin import --database <database-name>
    --nodes [<rheader-csv-file-1>,]<csv-file-1>
    --nodes=<Label>=[<rheader-csv-file-2>,]<csv-file-2>
    --relationships [<jheader-csv-file-1>,]<join-csv-file-1>
    --relationships=<REL_TYPE>=[<jheader-csv-file-2>,]<join-csv-file-2>
    --trim-strings=true
This simplified syntax shows examples of specifying the Label for a node CSV file as well as a relationship type for a relationship CSV file. In most cases, you will want to use the trim-strings argument to ensure that leading or trailing spaces are not included in the imported data.
Note: You must not have a space after the "," when specifying a header file with the CSV file.
Refer to the Neo4j Operations Manual for details on using the import tool. Note that you can use regular expressions to specify the files to import.
Make sure you have a terminal window open to your Docker Neo4j instance for this course.
Ensure the instance is started.
Download the ZIP file containing the CSV files to the files directory:
On OS X and Linux:
curl -O http://data.neo4j.com/admin-neo4j/crimes-dataset.zip
On Windows, open the file in a Web browser to download it.
Unzip the contents of the crimes-dataset.zip.
The crimes-dataset directory contains six CSV files. Move these six files to the import folder for the Docker Neo4j instance.
Examine the CSV files. Make sure you understand the headers and how they will be used for the import.
Change directory to the import directory.
Use the neo4j-admin tool to import the CSV files using these guidelines:
--database crimes
--nodes /import/crimes_header.csv,/import/crimes.csv
--nodes /import/beats.csv
--nodes=PrimaryType=/import/primaryTypes.csv
--relationships /import/crimesBeats.csv
--relationships=PRIMARY_TYPE=/import/crimesPrimaryTypes.csv
--trim-strings=true
This is what you enter (ensure there are no newline characters):
[sudo] docker exec --interactive neo4j bin/neo4j-admin import --database crimes --nodes /import/crimes_header.csv,/import/crimes.csv --nodes /import/beats.csv --nodes=PrimaryType=/import/primaryTypes.csv --relationships /import/crimesBeats.csv --relationships=PRIMARY_TYPE=/import/crimesPrimaryTypes.csv --trim-strings=true
This import will take a couple of minutes. This is what you will see in the terminal window if all goes well:
In cypher-shell, create the crimes database.
Enter the following Cypher statements to view the schema of the database and return the number of nodes.
CALL db.schema.visualization();
MATCH (n) RETURN count(n);
The database information will now look as follows:
In this exercise, you created a database from a set of CSV files using the import functionality of the neo4j-admin tool.
Before you import data using neo4j-admin, what is one thing you must do?
Select the correct answer.
Create the database.
Ensure the database does not exist.
Create the constraints in the database.
Create the indexes in the database.
Suppose that part of the import command that you issue to neo4j-admin looks like this:
For this part of the import, where does the import process get information about the node labels?
Select the correct answers.
The products_header.csv file must have a field, USE_LABEL.
The products_header.csv file must have a field, :LABEL.
The products.csv file must have the label name in the corresponding USE_LABEL column.
The products.csv file must have the label name in the corresponding :LABEL column.