Online Course: Basic Neo4j 4.0 Administration

Importing Data

About this module

As an administrator, you may need to populate a database from files that were extracted from an RDBMS.

In this module, you will learn how to import data from CSV files using the import functionality of the neo4j-admin tool.

Importing data

There are many ways to import data into a Neo4j database when the source of the data is an RDBMS. Which option you choose depends on:

  • How much data you have.
  • What tools you are comfortable using.
  • How much time you have to perform the import.

Options for importing data into Neo4j


The developers of your application may be responsible for importing the data. In other scenarios, you may be given a set of CSV files that will need to be imported into a database.

Using the neo4j-admin tool is by far the quickest way to import data into the database.

CSV files for import

A CSV file contains one of the following:

  • A header row only.
  • Data rows only.
  • A header row followed by data rows.

You must understand how the headers relate to the data and what the field separator is, and you must make sure that the data in the CSV files is properly formatted, clean, and ready for import.
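Before you run an import, it can help to check some of these requirements programmatically. The following is a minimal Python sketch, not a Neo4j tool; the function name check_csv is hypothetical. It assumes that the first row of the file is a header and that the separator is a comma (the neo4j-admin default), and it reports rows whose field count differs from the header as well as values with leading or trailing spaces. The sample is a deliberately broken variant of the beats.csv file shown later in this module.

```python
import csv
import io


def check_csv(text: str, delimiter: str = ",") -> list[str]:
    """Return the problems found in CSV text whose first row is a header.

    Hypothetical pre-flight check: every data row must have as many fields
    as the header, and no value should carry leading or trailing spaces.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    width = len(rows[0])
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):  # line 1 is the header
        if len(row) != width:
            problems.append(f"line {line_no}: expected {width} fields, got {len(row)}")
        for value in row:
            if value != value.strip():
                problems.append(f"line {line_no}: untrimmed value {value!r}")
    return problems


# Deliberately broken sample: a stray space on line 3, a missing field on line 4.
sample = "id:ID(beat-ref),:LABEL\n1132,Beat\n0813, Beat\n0513\n"
for problem in check_csv(sample):
    print(problem)
```

For a header kept in a separate file, such as crimes_header.csv, you could prepend the header text to the data text before calling the function.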

Using neo4j-admin for import

When using neo4j-admin for import:

  • The database to be imported into must not already exist, because the import creates it.
  • The header row contains additional information that is used to create the nodes and relationships.
  • Node CSV files are structured differently from relationship CSV files.
  • All CSV files must use the same field separator.
  • You create the constraints (and indexes) after the import.
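To make the difference between node and relationship files concrete, here is a small Python sketch that classifies a header row using the headers from this module. The function header_kind is hypothetical and is not how neo4j-admin is implemented: it simply treats a header with :START_ID and :END_ID columns as a relationship header and a header with an ID column as a node header.

```python
def header_kind(header_line: str, delimiter: str = ",") -> str:
    """Classify a header row as 'node', 'relationship', or 'unknown' (simplified)."""
    types = set()
    for field in header_line.strip().split(delimiter):
        # The column type follows the first ':' and may carry an ID space,
        # e.g. 'id:ID(beat-ref)' -> 'ID', ':START_ID(crime-ref)' -> 'START_ID'.
        _, _, type_part = field.partition(":")
        types.add(type_part.split("(", 1)[0])
    if {"START_ID", "END_ID"} <= types:
        return "relationship"
    if "ID" in types:
        return "node"
    return "unknown"


print(header_kind("id:ID(crime-ref),:LABEL,date,description"))
print(header_kind(":START_ID(crime-ref),:END_ID(beat-ref),:TYPE"))
```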

Example: CSV file for nodes: Beat

Here is a portion of the beats.csv file, with an embedded header, for loading nodes with the label Beat:

id:ID(beat-ref),:LABEL
1132,Beat
0813,Beat
0513,Beat

Each record in beats.csv is loaded as a node with the label Beat, taken from the :LABEL column. The id value is used to create the id property for the node. The (beat-ref) part of ID(beat-ref) names an ID space: the import tool keeps a reference to each node under its id in that space so that the node can be referred to later in the import.
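The following Python sketch splits one header field into its property name, its column type, and its ID space. It is an illustration only: the names HeaderField and parse_field are hypothetical, and the pattern covers just the forms used in this module (it does not handle, for example, array types such as string[]).

```python
import re
from typing import NamedTuple, Optional


class HeaderField(NamedTuple):
    name: str                # property name; '' for columns such as :LABEL
    kind: Optional[str]      # e.g. 'ID', 'LABEL', 'START_ID'; None for a plain property
    id_space: Optional[str]  # e.g. 'beat-ref'; None when no ID space is given


# <name>, then optionally ':<TYPE>', then optionally '(<id-space>)'
FIELD_RE = re.compile(r"(?P<name>[^:]*)(?::(?P<kind>[A-Za-z_]+)(?:\((?P<space>[^)]*)\))?)?")


def parse_field(field: str) -> HeaderField:
    match = FIELD_RE.fullmatch(field.strip())
    if match is None:
        raise ValueError(f"unrecognised header field: {field!r}")
    return HeaderField(match.group("name"), match.group("kind"), match.group("space"))


for field in "id:ID(beat-ref),:LABEL".split(","):
    print(parse_field(field))
```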

Example: CSV files for nodes: Crime

Here is an example of the crimes_header.csv header file for loading nodes of type Crime:

id:ID(crime-ref),:LABEL,date,description

And here is a portion of the associated crimes.csv file for loading nodes of type Crime:

8920441,Crime,12/07/2012 07:50:00 AM,AUTOMOBILE
4730813,Crime,05/09/2006 08:20:00 AM,POCKET-PICKING
7150780,Crime,09/28/2009 01:00:00 AM,CHILD ABANDONMENT
4556970,Crime,12/16/2005 08:39:24 PM,POSS: CANNABIS 30GMS OR LESS
9442492,Crime,12/28/2013 12:15:00 PM,OVER $500

The id value is used to create the id property for the node. The label for the node will be Crime. The date and description values are used to create the respective properties for each node. The ID(crime-ref) is used to store a reference to the node that is created so that it can be used later in the import.
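Because crimes_header.csv holds only the header, it has to be read together with crimes.csv. The following Python sketch shows that pairing using two of the rows above; read_with_header is a hypothetical helper, not part of neo4j-admin. Note that zip silently drops extra fields, so rows with the wrong number of fields need a separate check.

```python
import csv
import io


def read_with_header(header_text: str, data_text: str) -> list[dict[str, str]]:
    """Pair each row of a header-less data file with the columns of a header-only file."""
    header = next(csv.reader(io.StringIO(header_text)))
    return [dict(zip(header, row)) for row in csv.reader(io.StringIO(data_text))]


crimes_header = "id:ID(crime-ref),:LABEL,date,description\n"
crimes = (
    "8920441,Crime,12/07/2012 07:50:00 AM,AUTOMOBILE\n"
    "4556970,Crime,12/16/2005 08:39:24 PM,POSS: CANNABIS 30GMS OR LESS\n"
)
for record in read_with_header(crimes_header, crimes):
    print(record["id:ID(crime-ref)"], record[":LABEL"], record["description"])
```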

Example: CSV file for nodes: PrimaryType

Here is a portion of the primaryTypes.csv file for loading PrimaryType nodes:

id:ID(primarytype-ref)
ARSON
OBSCENITY
ROBBERY
THEFT
CRIM SEXUAL ASSAULT
BURGLARY

The id value is used to create the id property for the node. Neither the header nor the data specifies a label for these nodes, so you must specify the label as an argument when you run the import.
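Here is a simplified Python model of how a node ends up with its labels. The function node_labels is hypothetical, and it rests on two assumptions: labels given on the command line are added to any labels in a :LABEL column, and multiple labels in that column are separated by ';' (the default array delimiter of neo4j-admin). Check the Neo4j Operations Manual for the exact rules.

```python
def node_labels(row: dict[str, str], cli_labels: list[str]) -> list[str]:
    """Labels for one node: command-line labels plus any values in a :LABEL column.

    Assumption: multiple labels in :LABEL are separated by ';'.
    """
    labels = list(cli_labels)
    for label in row.get(":LABEL", "").split(";"):
        if label and label not in labels:  # skip empty values and duplicates
            labels.append(label)
    return labels


# primaryTypes.csv has no :LABEL column, so the label comes from --nodes=PrimaryType=...
print(node_labels({"id:ID(primarytype-ref)": "ARSON"}, ["PrimaryType"]))
# beats.csv carries its label in the :LABEL column
print(node_labels({"id:ID(beat-ref)": "1132", ":LABEL": "Beat"}, []))
```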

CSV files for relationships

A CSV file for loading relationships contains one row per relationship. Each row specifies the IDs of the start and end nodes and, optionally, the relationship type.

Here is a portion of the crimesBeats.csv file that will be used to create the :ON_BEAT relationships between Crime and Beat nodes:

:START_ID(crime-ref),:END_ID(beat-ref),:TYPE
6978096,0911,ON_BEAT
3170923,2511,ON_BEAT
3073515,1012,ON_BEAT
8157905,0113,ON_BEAT

By the time the import tool processes this file, it has already stored references to the Crime and Beat nodes created earlier. The :TYPE column specifies the type of relationship to create between them, in this case ON_BEAT.
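The ID spaces are what make this lookup possible. The following Python sketch (find_dangling is a hypothetical helper) checks relationship rows against the IDs loaded into each ID space and reports references that do not match any node. The rows are based on crimesBeats.csv, with a deliberately reduced set of Beat IDs so that one reference is missing. IDs are compared as strings, so leading zeros such as the one in 0911 matter; this assumes the default string ID type.

```python
def find_dangling(rel_rows: list[dict[str, str]],
                  id_spaces: dict[str, set[str]]) -> list[str]:
    """Report relationship rows whose start or end ID is missing from its ID space.

    rel_rows: rows keyed by header field, e.g. ':START_ID(crime-ref)'.
    id_spaces: ID space name -> IDs already loaded from the node files.
    """
    problems = []
    for line_no, row in enumerate(rel_rows, start=2):  # line 1 is the header
        for field, value in row.items():
            if field.startswith((":START_ID(", ":END_ID(")):
                space = field[field.index("(") + 1:-1]  # text inside the parentheses
                if value not in id_spaces.get(space, set()):
                    problems.append(f"line {line_no}: {value} not found in {space}")
    return problems


id_spaces = {"crime-ref": {"6978096", "3170923"}, "beat-ref": {"0911"}}
rows = [
    {":START_ID(crime-ref)": "6978096", ":END_ID(beat-ref)": "0911", ":TYPE": "ON_BEAT"},
    {":START_ID(crime-ref)": "3170923", ":END_ID(beat-ref)": "2511", ":TYPE": "ON_BEAT"},
]
print(find_dangling(rows, id_spaces))
```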

Here is a portion of the crimesPrimaryTypes.csv file that will be used to create the relationships between the Crime nodes and the PrimaryType nodes:

:START_ID(crime-ref),:END_ID(primarytype-ref)
5221115,NARCOTICS
4522835,DECEPTIVE PRACTICE
3432518,BATTERY
6439993,CRIMINAL TRESPASS

When the import tool processes this file, it has already stored references to the Crime and PrimaryType nodes. This file has no :TYPE column, so you must specify the relationship type in the arguments when you import the data from this file.
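Here is a simplified Python model of how each relationship gets its type; relationship_type is a hypothetical helper. It assumes that a non-empty :TYPE value is used when present and that the type given on the command line, such as PRIMARY_TYPE in --relationships=PRIMARY_TYPE=..., is used otherwise. Check the Neo4j Operations Manual for the exact rules.

```python
from typing import Optional


def relationship_type(row: dict[str, str], cli_type: Optional[str] = None) -> str:
    """Pick the type for one relationship row (simplified, assumed precedence)."""
    row_type = row.get(":TYPE", "").strip()
    if row_type:
        return row_type
    if cli_type:
        return cli_type
    # Every relationship needs a type, so neither source being present is an error.
    raise ValueError("no :TYPE value and no relationship type on the command line")


row = {":START_ID(crime-ref)": "5221115", ":END_ID(primarytype-ref)": "NARCOTICS"}
print(relationship_type(row, "PRIMARY_TYPE"))  # PRIMARY_TYPE
```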

Importing the data

After you have created or obtained the CSV files, you import the data. Because the import creates the database, the database you specify must not exist or must be empty.

Here is the simplified syntax for creating a database from CSV files:

neo4j-admin import
  --database <database-name>
  --nodes [<node-header-csv-file-1>,]<csv-file-1>
  --nodes=<Label>=[<node-header-csv-file-2>,]<csv-file-2>
  --relationships [<rel-header-csv-file-1>,]<join-csv-file-1>
  --relationships=<REL_TYPE>=[<rel-header-csv-file-2>,]<join-csv-file-2>
  --trim-strings=true

This simplified syntax shows how to specify a label for a node CSV file and a relationship type for a relationship CSV file. In most cases, you will want to use the --trim-strings argument to ensure that leading and trailing spaces are not included in the imported data.
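The same syntax can be assembled programmatically. The following Python sketch builds the argument list for the crimes example used in the exercise later in this module; the names Spec, option, and build_import_command are hypothetical. Each header file and data file is joined with a comma and no space, and the launcher tuple assumes a Docker container named neo4j, as in the exercise.

```python
import shlex
from typing import Optional

# (label or relationship type, files); None means the CSV header supplies it
Spec = tuple[Optional[str], list[str]]


def option(name: str, prefix: Optional[str], files: list[str]) -> list[str]:
    value = ",".join(files)  # header file first, no space after the comma
    return [f"--{name}={prefix}={value}"] if prefix else [f"--{name}", value]


def build_import_command(database: str, nodes: list[Spec], relationships: list[Spec],
                         launcher: tuple[str, ...] = ("neo4j-admin",)) -> list[str]:
    cmd = [*launcher, "import", "--database", database]
    for prefix, files in nodes:
        cmd += option("nodes", prefix, files)
    for prefix, files in relationships:
        cmd += option("relationships", prefix, files)
    cmd.append("--trim-strings=true")  # drop leading/trailing spaces from values
    return cmd


cmd = build_import_command(
    "crimes",
    nodes=[(None, ["/import/crimes_header.csv", "/import/crimes.csv"]),
           (None, ["/import/beats.csv"]),
           ("PrimaryType", ["/import/primaryTypes.csv"])],
    relationships=[(None, ["/import/crimesBeats.csv"]),
                   ("PRIMARY_TYPE", ["/import/crimesPrimaryTypes.csv"])],
    launcher=("docker", "exec", "--interactive", "neo4j", "bin/neo4j-admin"),
)
print(shlex.join(cmd))
```

Because the result is a list of arguments, you could pass it to subprocess.run(cmd, check=True) on a machine where Docker and the course container are available; no shell is involved, so there are no line breaks to worry about.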

Note
Do not put a space after the comma that separates a header file from its CSV file.
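The following Python sketch splits a value written in the simplified syntax into its optional label or relationship type and its list of files. It rejects the two likely symptoms of a stray space: a file name with surrounding spaces, or an empty entry left behind when a shell splits the value at the space. The function split_option_value is hypothetical and handles only the simplified syntax shown above.

```python
from typing import Optional


def split_option_value(value: str) -> tuple[Optional[str], list[str]]:
    """Split a --nodes/--relationships value into (label or type, files)."""
    prefix, sep, files_part = value.partition("=")
    if not sep:  # no 'Label=' or 'REL_TYPE=' prefix: the whole value lists files
        prefix, files_part = "", value
    files = files_part.split(",")
    for name in files:
        # A space after ',' leaves a padded name here, or an empty name
        # if the shell split the value into two words at the space.
        if not name or name != name.strip():
            raise ValueError(f"bad file entry {name!r} in {value!r}")
    return (prefix or None), files


print(split_option_value("PrimaryType=/import/primaryTypes.csv"))
print(split_option_value("/import/crimes_header.csv,/import/crimes.csv"))
```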

Refer to the Neo4j Operations Manual for details on using the import tool. Note that you can use regular expressions to specify the files to import.

Example: Importing the data

Here is what you will do in the next exercise, where you use the import command of neo4j-admin to create a database from CSV files:

[Image: the neo4j-admin import command used in the next exercise]

Exercise #12: Importing data

In this exercise you will create a database from a set of CSV files using the import functionality of the neo4j-admin tool.

Before you begin

  1. Make sure you have a terminal window open to your Docker Neo4j instance for this course.
  2. Ensure the instance is started.

Exercise steps:

  1. Download the ZIP file containing the CSV files to the files directory:

On OS X and Linux:

cd /home/ubuntu/files
curl -O http://data.neo4j.com/admin-neo4j/crimes-dataset.zip

On Windows, open the URL in a web browser to download the file.

  2. Unzip the contents of crimes-dataset.zip.
  3. The crimes-dataset directory should contain six CSV files. Move these six files to the import folder for the Docker Neo4j instance.
  4. Examine the CSV files. Make sure you understand the headers and how they will be used for the import.
  5. Change directory to the import directory.
  6. Use the neo4j-admin tool to import the CSV files using these guidelines:
             --database crimes
             --nodes /import/crimes_header.csv,/import/crimes.csv
             --nodes /import/beats.csv
             --nodes=PrimaryType=/import/primaryTypes.csv
             --relationships /import/crimesBeats.csv
             --relationships=PRIMARY_TYPE=/import/crimesPrimaryTypes.csv
             --trim-strings=true

This is the complete command to enter, as a single line with no newline characters:

[sudo] docker exec --interactive neo4j bin/neo4j-admin import --database crimes --nodes /import/crimes_header.csv,/import/crimes.csv --nodes /import/beats.csv --nodes=PrimaryType=/import/primaryTypes.csv --relationships /import/crimesBeats.csv --relationships=PRIMARY_TYPE=/import/crimesPrimaryTypes.csv --trim-strings=true

This import will take a couple of minutes. This is what you should see in the terminal window if all goes well:

[Screenshot: terminal output from a successful import]

  7. In cypher-shell, switch to the system database with :use system and create the crimes database with CREATE DATABASE crimes;
  8. Enter the following Cypher statements to view the schema of the crimes database and return the number of nodes.
:use crimes
CALL db.schema.visualization();
MATCH (n) RETURN count(n);

The database information should now look as follows:

[Screenshot: schema visualization and node count for the crimes database]
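As an extra sanity check, you can compare the result of MATCH (n) RETURN count(n) with the number of data rows in the node CSV files. The Python sketch below counts those rows; data_rows and expected_node_count are hypothetical helpers. It takes into account that beats.csv and primaryTypes.csv embed their headers while crimes.csv does not, and it assumes every data row produces exactly one node, so if the import skipped any rows the numbers will differ. The sample uses short portions of the files shown earlier in this module.

```python
import csv
import io


def data_rows(text: str, has_header: bool) -> int:
    """Count non-empty CSV records, excluding the header row if the file has one."""
    records = [row for row in csv.reader(io.StringIO(text)) if row]
    return len(records) - 1 if has_header else len(records)


def expected_node_count(node_files: list[tuple[str, bool]]) -> int:
    """Sum data rows over all node files (assumes one node per data row)."""
    return sum(data_rows(text, has_header) for text, has_header in node_files)


beats = "id:ID(beat-ref),:LABEL\n1132,Beat\n0813,Beat\n0513,Beat\n"
primary_types = "id:ID(primarytype-ref)\nARSON\nOBSCENITY\nROBBERY\n"
crimes = "8920441,Crime,12/07/2012 07:50:00 AM,AUTOMOBILE\n"  # header is in crimes_header.csv
print(expected_node_count([(beats, True), (primary_types, True), (crimes, False)]))  # 7
```

With the real files, you could read each one with open(path, newline="").read() instead of using inline strings.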

Exercise summary

In this exercise you created a database from a set of CSV files using the import functionality of the neo4j-admin tool.

Check your understanding

Question 1

Before you import data using neo4j-admin, what is one thing you must do?

Select the correct answer.

  • Create the database.
  • Ensure the database does not exist.
  • Create the constraints in the database.
  • Create the indexes in the database.

Question 2

Suppose that part of the import command that you issue to neo4j-admin looks like this:

--nodes products_header.csv,products.csv

For this part of the import, where does the import process get information about the node labels?

Select the correct answers.

  • The products_header.csv file must have a field, USE_LABEL.
  • The products_header.csv file must have a field, :LABEL.
  • The products.csv file must have the label name in the corresponding USE_LABEL column.
  • The products.csv file must have the label name in the corresponding :LABEL column.

Question 3

Suppose you want to import data using six node CSV files and eight relationship CSV files. How many times must you execute the import process using neo4j-admin?

Select the correct answer.

  • 1
  • 6
  • 8
  • 14

Summary

You should now be able to import data from a set of CSV files using the neo4j-admin tool.
