Developer Guides Getting Started Getting Started What is a Graph Database? Intro to Graph DBs Video Series Concepts: RDBMS to Graph Concepts: NoSQL to Graph Getting Started Resources Neo4j Graph Platform Graph Platform Overview Neo4j Desktop Intro Neo4j Browser Intro… Read more →

Developer Guides

Want to Speak? Get $ back.

How-To: Neo4j ETL Tool

The ETL Tool is one of our Neo4j Labs projects. To learn more, visit the Neo4j ETL Tool Labs page.
Goals
Through this guide, you will learn how to take your data from a relational database or table format and import to Neo4j. You will understand how the data model is transformed behind-the-scenes and how to connect different types of data formats for loading.
Prerequisites
You should be familiar with Neo4j’s property-graph data model and relational databases. It is also helpful to have read the Concepts: Relational to Neo4j guide. We will focus on how to use this tool through the Neo4j Desktop application, so you should also know how to use the basic functionality of Neo4j Desktop.
Beginner

If you haven’t already, download Neo4j and use the provided instructions (shown when downloading) to get a project and database ready to run.

This import tool uses a command flag that requires Neo4j 3.4.0 and higher versions. If you are running an older version of Neo4j, you can upgrade the database or use our command line tool.

Once you have a project and a database instance created, you will need to go to the Graph Applications tab in Neo4j Desktop, copy and paste the https://r.neo4j.com/neo4j-etl-app url for the Neo4j ETL tool in the install box, and click the Install button.

Step 1

etl1a install graph app

Step 2

etl1b agree graph app install

Step 3: verify ETL tool is listed

300

Now you should have access, but you still need to add the application to your Neo4j Desktop. If you go back to your Projects icon along the left, and choose the project you want to work within, you should see an Applications bar along the top of the pane on the right. Within that section, you can click on the grey box with Add Application in it. A list of applications that you have access to add should appear in a popup. From there, you can click the Add button on the ETL app option and wait for it to install. You may have another popup to agree to terms before it installs.

Step 4

etl2 add app

Step 5

etl3 add project app

Once that completes, you can click on the new application pane to open the ETL tool.

Step 6: open ETL tool

etl4 load app

ETL Tool Steps and Process

Now that the application is all set up, there are only a few steps to import data.

  1. Set up a database connection for a relational database.
  2. Choose the relational database (from the step above) to import from and the Neo4j database to import to.
  3. Verify schema mapping and make adjustments to graph data model.
  4. Import the data with a chosen import method.

Pre-Steps: Loading ETL Tool and Choosing a Project

When you first load the ETL Tool, you will see a default project and the database instances in that project. If you do not have any graph database instances created, you will see an image like the one below. You will need to create a graph before the next steps with the ETL tool.

Once you have a database created, you will need to select a project to work within. This determines which Neo4j databases will populate as options for you to import the data in a later step. The screenshot below shows the steps, with the red highlighting and numbering to show where to click and in what order.

Set Up Connection to Relational Database

After choosing a project, we will need to set up a connection to a relational database. You can do this by clicking the Add Connection button in the lower left of the pane.

If you previously set up connections with the ETL tool, these will be remembered and listed. You can always create new connections and edit or remove existing connections.

When you click to add a new connection, you will see a form to enter the relational database connection details.

We recommend first choosing the type of database from the dropdown box in the middle of the page, as that will automatically fill in some of the other fields for you. The image below shows the list of relational databases included in the tool. If you are using a database that is not specified in the dropdown list, you can still connect by choosing the JDBC Driver option and filling in the details.

Both the MySQL and PostgreSQL connections pre-populate many of the configurations for you. For any of the other database options, you will also specify the driver file for that database.

Once you have filled in the form, click the Test and Save Connection button at the bottom. The results for whether it was successful or not will show in a blue (success) or red (error) message bar at the top of your window.

Choose the ‘From’ and ‘To’ Databases for Import

To begin the ETL process, we need to tell the application which relational database we want as the source and which Neo4j database we want as the target. In the screenshot below, you can see the list of relational databases on the left side of the pane, and the Neo4j database options on the right side of the pane.

Choose your relational database, then the Neo4j database and click the Start Mapping button in the lower righthand side. Just as before, a blue message bar will show at the top of your screen if the step was successful or a red one will show if step failed. In this case, it was successful, so our Start Mapping button inactivates, and the Next button activates to proceed to the next step!

Review data model transformation and make adjustments

This step is where the actual translation of the relational data into graph data happens. There are three rules the tool uses to convert from relational to graph.

  • A table with a foreign key is treated as a join and imported as a node with a relationship etl10 mapping rule1
  • A table with 2 foreign keys is treated as a join table and imported as a relationship etl10 mapping rule2
  • A table with >2 foreign keys is treated as n intermediate node and imported as a node with multiple relationships etl10 mapping rule3

Those rules create a graph data model like the one below.

This example is using the popular Northwind data set. You can download and test this data set using the links at the bottom of this page.

You can edit this mapping to change any of the translations, such as property names, data types, and relationships.

You can use the mouse to zoom in on the model or drag the image to focus on certain areas of the graph. If you do not see a component you are looking for, you can start typing the entity name in the search bar on the left side of the pane. Any matching results will show immediately.

Nodes and relationships are listed in respective tabs on the left side. To update, simply click the entity in the list. To edit the details, click on the pencil icon next to entity in the list or double click on the entity in the visualization on the right. A popup box will list the fields and offer options for any changes. You can click Save to apply your changes to the graph. The image below shows an example of some changes to relationship types.

Updated Graph Data Model (click to zoom)

etl11b update model

Once you have made any changes here, you can click Save Mapping. The status of the change will show in a blue (success) or red (error) message bar at the top of your window. If successful, you can click Next to go to the import step.

Choose an Import Method and Import the Data

There are 2 ways that the ETL Tool can import data to Neo4j. Each import method has certain requirements and advantages, which are listed below.

  • a. Bulk Import – fast loader for bulk import. Requires the graph database to be shutdown for loading.
  • b. Online Import – runs Cypher via BOLT connection for import. Graph database is online and active during the load.

After you choose your import method from the dropdown box, you can click the Import Data button in the lower right corner to start the load. If it is successful, you will see a screen similar to the one below (this example used the Bulk Import method).

Importing (click to zoom)

etl13 successful import

If it fails, you will see a red error message at the top of the screen, and you can troubleshoot the issue with the logs by clicking the See Logs button at the bottom.

This import tool uses a command flag that requires Neo4j 3.4.0 and higher versions. If you are running an older version of Neo4j, this step will fail. To continue, you can upgrade the database or use the command line tool.

Now, you can query the Neo4j database or use Neo4j Browser to verify the data loaded to properly. Your relational data has now been transformed to a graph!

How To Do This Example Yourself and Other ETL Options

If you want to test the ETL Tool, and you do not already have a data set, you can use the Northwind example, as we did here. We have included links to download both PostgreSQL and MySQL, if you don’t already have a relational database in mind.

  • Download db of choice – Postgresql, MySQL, or other option
  • Download JDBC driver (only if not using MySQL or PostgreSQL)
  • Insert data to relational db – PostgreSQL Northwind
  • Install ETL tool on Neo4j Desktop (or download GitHub command line tool), then follow import steps from this page.

There are also other options for ETL. Feel free to check out some partner integrations, the LOAD CSV functionality, and the APOC developer library.