Goals Through this guide, you will learn how to take your data from a relational database or table format and import to Neo4j. You will understand how the data model is transformed behind-the-scenes and how to connect different types of… Read more →
Through this guide, you will learn how to take your data from a relational database or table format and import to Neo4j. You will understand how the data model is transformed behind-the-scenes and how to connect different types of data formats for loading.
You should be familiar with Neo4j’s property-graph data model and relational databases. It is also helpful to have read the Concepts: Relational to Neo4j guide. We will focus on how to use this tool through the Neo4j Desktop application, so you should also know how to use the basic functionality of Neo4j Desktop.
- Installing and Setting Up the ETL Tool
- ETL Tool Steps and Process
- Pre-Steps: Loading ETL Tool and Choosing a Project
- Set Up Connection to Relational Database
- Choose the ‘From’ and ‘To’ Databases for Import
- Review data model transformation and make adjustments
- Choose an Import Method and Import the Data
- How To Do This Example Yourself and Other ETL Options
Installing and Setting Up the ETL Tool
If you haven’t already, download Neo4j and use the provided instructions (shown when downloading) to get a project and database ready to run.
Then, you will need to contact us for an activation key for the Neo4j ETL tool.
Once you receive the key, go to your Neo4j Desktop under the
User Profile icon along the left.
Click on the
+ Add Activation Key link and copy/paste the ETL tool activation key into the white input box and click the
Now you should have access, but you still need to add the application to your Neo4j Desktop.
If you go back to your
Projects icon along the left, and choose the project you want to work within, you should see an
Applications bar along the top of the pane on the right.
Within that section, you can click on the grey box with
Add Application in it.
A list of applications that you have access to add should appear in a popup.
From there, you can click the
Add button on the ETL app option and wait for it to install.
Note that you may have another popup to agree to terms before it installs.
The images below walk through each step.
ETL Tool Steps and Process
Now that the application is all set up, there are only a few steps to move through the process. We will discuss each of these in detail in the paragraphs below.
1) Set up a database connection for the relational database. 2) Choose the relational connection (from step above) to import from and the Neo4j database to import to. 3) Verify schema mapping and make adjustments to graph data model. 4) Import the data with using a chosen import method.
Pre-Steps: Loading ETL Tool and Choosing a Project
When you first load the ETL Tool, the main screen asks you to select a project to work within. This determines which Neo4j databases will populate as options for you to import the data in a later step. The image below shows the main ETL window and the dropdown box where you choose the project.
Set Up Connection to Relational Database
To begin the ETL process, we need to tell the application which relational database we want to import the data from. The screenshot below shows how to start this step, with the red highlighting and numbering to show where to click and in what order.
We recommend choosing the type of database from the dropdown box in the middle of the page, as that will determine some of the fields needed and will automatically fill in some of the values for you. Both the MySQL and PostgreSQL connections pre-populate much of the configurations for you. For any of the other database options, you will just need one extra step to specify the driver file for that database.
The image below shows the list of relational databases included in the tool.
If you are using a database that is not specified in the dropdown list, you can still connect by choosing the
JDBC Driver option and filling in the details.
Once you have filled in the necessary info, click the
Test and Save Connection button at the bottom.
The results for whether it was successful or not will show in a blue (success) or red (error) message bar at the top of your window.
Choose the ‘From’ and ‘To’ Databases for Import
The next step is to go to the next tab down on the left-side icon menu and choose which database you are importing from and which database you are importing to. In the screenshot below, you can see the list of relational databases at the top, and the Neo4j database options below them.
Choose your relational database, then the Neo4j database and verify the connection info on the right side for both databases. You can see in the screenshot that the Neo4j instance details are not populated because I hadn’t selected that instance yet. Once you click on each database instance, you should see the instance details on the right.
Now that you have chosen your databases, you can click the
Start Mapping button in the lower righthand side.
Review data model transformation and make adjustments
This is where the actual translation of the relational data into graph data happens. There are three rules the tool uses to convert from relational to graph, as follows:
- A table with a foreign key is treated as a join and imported as a node with a relationship
- A table with 2 foreign keys is treated as a join table and imported as a relationship
- A table with >2 foreign keys is treated as n intermediate node and imported as a node with multiple relationships
Those rules create a graph data model like the one below.
You can edit this mapping to clarify some of the weakly-named relationships. If you want to change anything from the relational model, such as property names and data types, you can change them here before the data is put into the graph. The image below shows an example of some changes.
There are two ways you can also make edits behind-the-scenes. One is at this step, where the tool creates a mappings.json file under the database import directory.
In this file, you can change the data you want to load (remove or cut certain anything you do not want) or make the field/table changes mentioned above.
The other place to edit is right before import, where the tool creates csv files that it uses to import. These are created in a csv folder within the same import directory, and you can change those files to determine what gets loaded into the graph.
Those two edit points are shown more clearly in the process map below.
Choose an Import Method and Import the Data
There are 4 different ways that the ETL Tool can import data to Neo4j. Each import method has certain requirements and advantages, which are listed below.
- a. Neo4j Import – fast loader for bulk import. Requires the graph database to be shutdown for loading.
- b. Neo4j Shell – embedded importer. Also requires graph database to be shutdown for loading.
- c. Cypher Shell – executes Cypher statements that are generated to an editable Cypher script. Graph database can be running.
- d. Direct Cypher – has a direct BOLT connection for data import and also creates an editable Cypher script. Graph database can be running.
After you choose your import method from the dropdown box, you can click the
Import button in the lower right corner to start the load.
The lower part of the screen shows the logs of what the tool is actually executing for you, and the status returned.
If it is successful, you will see a screen similar to the one below (this example used the standard Neo4j Import method).
Now, you can query the Neo4j database or use Neo4j Browser to verify the data loaded to properly. Your relational data has now been transformed to a graph, and you can start analyzing your data!
How To Do This Example Yourself and Other ETL Options
If you want to test the ETL Tool, and you don’t already have a data set, you can use the Northwind example, as we did here. We have included links to download both PostgreSQL and MySQL, if you don’t already have a relational database in mind.