Goals Through this guide, you will learn how to take your data from a relational database or table format and import to Neo4j. You will understand how the data model is transformed behind-the-scenes and how to connect different types of… Read more →

Goals
Through this guide, you will learn how to take your data from a relational database or table format and import to Neo4j. You will understand how the data model is transformed behind-the-scenes and how to connect different types of data formats for loading.
Prerequisites
You should be familiar with Neo4j’s property-graph data model and relational databases. It is also helpful to have read the Concepts: Relational to Neo4j guide. We will focus on how to use this tool through the Neo4j Desktop application, so you should also know how to use the basic functionality of Neo4j Desktop.
Beginner


Installing and Setting Up the ETL Tool

If you haven’t already, download Neo4j and use the provided instructions (shown when downloading) to get a project and database ready to run. Then, you will need to contact us for an activation key for the Neo4j ETL tool. Once you receive the key, go to your Neo4j Desktop under the User Profile icon along the left. Click on the + Add Activation Key link and copy/paste the ETL tool activation key into the white input box and click the Add button.

Now you should have access, but you still need to add the application to your Neo4j Desktop. If you go back to your Projects icon along the left, and choose the project you want to work within, you should see an Applications bar along the top of the pane on the right. Within that section, you can click on the grey box with Add Application in it. A list of applications that you have access to add should appear in a popup. From there, you can click the Add button on the ETL app option and wait for it to install. Note that you may have another popup to agree to terms before it installs.

The images below walk through each step.

Step 1

etl1 tool add license key

Step 2

etl2 tool add app

Step 3

300

Step 4

etl4 tool load app

ETL Tool Steps and Process

Now that the application is all set up, there are only a few steps to move through the process. We will discuss each of these in detail in the paragraphs below.

1) Set up a database connection for the relational database. 2) Choose the relational connection (from step above) to import from and the Neo4j database to import to. 3) Verify schema mapping and make adjustments to graph data model. 4) Import the data with using a chosen import method.

Pre-Steps: Loading ETL Tool and Choosing a Project

When you first load the ETL Tool, the main screen asks you to select a project to work within. This determines which Neo4j databases will populate as options for you to import the data in a later step. The image below shows the main ETL window and the dropdown box where you choose the project.

We recommend already having a Neo4j database created, so that it’s ready to for your import. If you don’t, we recommend completing that step now.

Set Up Connection to Relational Database

To begin the ETL process, we need to tell the application which relational database we want to import the data from. The screenshot below shows how to start this step, with the red highlighting and numbering to show where to click and in what order.

We recommend choosing the type of database from the dropdown box in the middle of the page, as that will determine some of the fields needed and will automatically fill in some of the values for you. Both the MySQL and PostgreSQL connections pre-populate much of the configurations for you. For any of the other database options, you will just need one extra step to specify the driver file for that database.

The image below shows the list of relational databases included in the tool. If you are using a database that is not specified in the dropdown list, you can still connect by choosing the JDBC Driver option and filling in the details.

Once you have filled in the necessary info, click the Test and Save Connection button at the bottom. The results for whether it was successful or not will show in a blue (success) or red (error) message bar at the top of your window.

Choose the ‘From’ and ‘To’ Databases for Import

The next step is to go to the next tab down on the left-side icon menu and choose which database you are importing from and which database you are importing to. In the screenshot below, you can see the list of relational databases at the top, and the Neo4j database options below them.

Choose your relational database, then the Neo4j database and verify the connection info on the right side for both databases. You can see in the screenshot that the Neo4j instance details are not populated because I hadn’t selected that instance yet. Once you click on each database instance, you should see the instance details on the right.

This example is for a PostgreSQL relational database, so your instance details will be different if you are using a different database type.

Now that you have chosen your databases, you can click the Start Mapping button in the lower righthand side.

Review data model transformation and make adjustments

This is where the actual translation of the relational data into graph data happens. There are three rules the tool uses to convert from relational to graph, as follows:

  • A table with a foreign key is treated as a join and imported as a node with a relationship etl9 mapping rule1
  • A table with 2 foreign keys is treated as a join table and imported as a relationship etl9 mapping rule2
  • A table with >2 foreign keys is treated as n intermediate node and imported as a node with multiple relationships etl9 mapping rule3

Those rules create a graph data model like the one below.

This example is using the popular Northwind data set. You can download and test this data set as well from links at the bottom of this page.

You can edit this mapping to clarify some of the weakly-named relationships. If you want to change anything from the relational model, such as property names and data types, you can change them here before the data is put into the graph. The image below shows an example of some changes.

Updated Graph Data Model (click to zoom)

etl10 update model

There are two ways you can also make edits behind-the-scenes. One is at this step, where the tool creates a mappings.json file under the database import directory.

In this file, you can change the data you want to load (remove or cut certain anything you do not want) or make the field/table changes mentioned above.

The other place to edit is right before import, where the tool creates csv files that it uses to import. These are created in a csv folder within the same import directory, and you can change those files to determine what gets loaded into the graph.

Those two edit points are shown more clearly in the process map below.

neo4j etl architecture

Choose an Import Method and Import the Data

There are 4 different ways that the ETL Tool can import data to Neo4j. Each import method has certain requirements and advantages, which are listed below.

  • a. Neo4j Import – fast loader for bulk import. Requires the graph database to be shutdown for loading.
  • b. Neo4j Shell – embedded importer. Also requires graph database to be shutdown for loading.
  • c. Cypher Shell – executes Cypher statements that are generated to an editable Cypher script. Graph database can be running.
  • d. Direct Cypher – has a direct BOLT connection for data import and also creates an editable Cypher script. Graph database can be running.

After you choose your import method from the dropdown box, you can click the Import button in the lower right corner to start the load. The lower part of the screen shows the logs of what the tool is actually executing for you, and the status returned. If it is successful, you will see a screen similar to the one below (this example used the standard Neo4j Import method).

Importing (click to zoom)

etl12 import success

Now, you can query the Neo4j database or use Neo4j Browser to verify the data loaded to properly. Your relational data has now been transformed to a graph, and you can start analyzing your data!

How To Do This Example Yourself and Other ETL Options

If you want to test the ETL Tool, and you don’t already have a data set, you can use the Northwind example, as we did here. We have included links to download both PostgreSQL and MySQL, if you don’t already have a relational database in mind.

  • Download db of choice – Postgresql, MySQL, or other option
  • Download JDBC driver (only if not using MySQL or PostgreSQL)
  • Insert data set to relational db – PostgreSQL Northwind
  • Install ETL tool on Neo4j Desktop (or download GitHub command line tool), then follow import steps from this page.

There are also other options for ETL. To add to the list, take a look at some partner integrations, the LOAD CSV functionality, and the APOC developer library.