Developer Guides Getting Started Getting Started What is a Graph Database? Intro to Graph DBs Video Series Concepts: RDBMS to Graph Concepts: NoSQL to Graph Getting Started Resources Neo4j Graph Platform Graph Platform Overview Neo4j Desktop Intro Neo4j Browser Intro… Read more →
How-To: Neo4j ETL Tool
Through this guide, you will learn how to take your data from a relational database or table format and import to Neo4j. You will understand how the data model is transformed behind-the-scenes and how to connect different types of data formats for loading.
You should be familiar with Neo4j’s property-graph data model and relational databases. It is also helpful to have read the Concepts: Relational to Neo4j guide. We will focus on how to use this tool through the Neo4j Desktop application, so you should also know how to use the basic functionality of Neo4j Desktop.
If you haven’t already, download Neo4j and use the provided instructions (shown when downloading) to get a project and database ready to run.
Once you have a project and a database instance created, you will need to go to the
Graph Applications tab in Neo4j Desktop, copy and paste the
https://r.neo4j.com/neo4j-etl-app url for the Neo4j ETL tool in the install box, and click the
Now you should have access, but you still need to add the application to your Neo4j Desktop.
If you go back to your
Projects icon along the left, and choose the project you want to work within, you should see an
Applications bar along the top of the pane on the right.
Within that section, you can click on the grey box with
Add Application in it.
A list of applications that you have access to add should appear in a popup.
From there, you can click the
Add button on the ETL app option and wait for it to install.
You may have another popup to agree to terms before it installs.
Once that completes, you can click on the new application pane to open the ETL tool.
Now that the application is all set up, there are only a few steps to import data.
- Set up a database connection for a relational database.
- Choose the relational database (from the step above) to import from and the Neo4j database to import to.
- Verify schema mapping and make adjustments to graph data model.
- Import the data with a chosen import method.
When you first load the ETL Tool, you will see a default project and the database instances in that project. If you do not have any graph database instances created, you will see an image like the one below. You will need to create a graph before the next steps with the ETL tool.
Once you have a database created, you will need to select a project to work within. This determines which Neo4j databases will populate as options for you to import the data in a later step. The screenshot below shows the steps, with the red highlighting and numbering to show where to click and in what order.
After choosing a project, we will need to set up a connection to a relational database.
You can do this by clicking the
Add Connection button in the lower left of the pane.
When you click to add a new connection, you will see a form to enter the relational database connection details.
We recommend first choosing the type of database from the dropdown box in the middle of the page, as that will automatically fill in some of the other fields for you.
The image below shows the list of relational databases included in the tool.
If you are using a database that is not specified in the dropdown list, you can still connect by choosing the
JDBC Driver option and filling in the details.
Both the MySQL and PostgreSQL connections pre-populate many of the configurations for you. For any of the other database options, you will also specify the driver file for that database.
Once you have filled in the form, click the
Test and Save Connection button at the bottom.
The results for whether it was successful or not will show in a blue (success) or red (error) message bar at the top of your window.
To begin the ETL process, we need to tell the application which relational database we want as the source and which Neo4j database we want as the target. In the screenshot below, you can see the list of relational databases on the left side of the pane, and the Neo4j database options on the right side of the pane.
Choose your relational database, then the Neo4j database and click the
Start Mapping button in the lower righthand side.
Just as before, a blue message bar will show at the top of your screen if the step was successful or a red one will show if step failed.
In this case, it was successful, so our
Start Mapping button inactivates, and the
Next button activates to proceed to the next step!
This step is where the actual translation of the relational data into graph data happens. There are three rules the tool uses to convert from relational to graph.
- A table with a foreign key is treated as a join and imported as a node with a relationship
- A table with 2 foreign keys is treated as a join table and imported as a relationship
- A table with >2 foreign keys is treated as n intermediate node and imported as a node with multiple relationships
Those rules create a graph data model like the one below.
You can edit this mapping to change any of the translations, such as property names, data types, and relationships.
You can use the mouse to zoom in on the model or drag the image to focus on certain areas of the graph. If you do not see a component you are looking for, you can start typing the entity name in the search bar on the left side of the pane. Any matching results will show immediately.
Nodes and relationships are listed in respective tabs on the left side.
To update, simply click the entity in the list.
To edit the details, click on the pencil icon next to entity in the list or double click on the entity in the visualization on the right.
A popup box will list the fields and offer options for any changes.
You can click
Save to apply your changes to the graph.
The image below shows an example of some changes to relationship types.
Once you have made any changes here, you can click
The status of the change will show in a blue (success) or red (error) message bar at the top of your window.
If successful, you can click
Next to go to the import step.
There are 2 ways that the ETL Tool can import data to Neo4j. Each import method has certain requirements and advantages, which are listed below.
- a. Bulk Import – fast loader for bulk import. Requires the graph database to be shutdown for loading.
- b. Online Import – runs Cypher via BOLT connection for import. Graph database is online and active during the load.
After you choose your import method from the dropdown box, you can click the
Import Data button in the lower right corner to start the load.
If it is successful, you will see a screen similar to the one below (this example used the Bulk Import method).
If it fails, you will see a red error message at the top of the screen, and you can troubleshoot the issue with the logs by clicking the
See Logs button at the bottom.
Now, you can query the Neo4j database or use Neo4j Browser to verify the data loaded to properly. Your relational data has now been transformed to a graph!
If you want to test the ETL Tool, and you do not already have a data set, you can use the Northwind example, as we did here. We have included links to download both PostgreSQL and MySQL, if you don’t already have a relational database in mind.