Exploring Data Migration Techniques with Neo4j

Written by Juan Pablo Albuja, originally posted on the Dustland blog


For the last two years, Dustland has worked with Cisco on a number of web-based applications. Recently, I was invited to attend Neo4j training at Cisco’s headquarters, as Cisco is starting to use Neo4j for a number of applications. I attended a group training session with other approved Cisco vendors. During the training we learned how Cisco wants to leverage the Neo4j platform and explored ways to import and migrate data. This article looks at a couple of the explorations conducted during that session.

Neo4j is the most popular graph database in the world. Built in Java, it comes in two versions: the Community version, which is ideal for smaller applications, and the Enterprise version, which suits the needs of large, cloud-based applications with diverse integration points. As a graph database, Neo4j stores data as nodes, relationships (also called edges), and properties rather than in traditional tables.
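To make the node, relationship, and property vocabulary concrete, here is a minimal Python sketch of a property graph held in memory. It only illustrates the data model and is not how Neo4j stores data internally. The class names, the labels, and the sample records (Alice, Acme, WORKS_AT) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node carries an identifier, labels, and key/value properties."""
    node_id: int
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship is directed, typed, and can carry properties too."""
    start: int
    end: int
    rel_type: str
    properties: dict = field(default_factory=dict)


# Hypothetical sample data: one person who works at one company.
alice = Node(1, {"Person"}, {"name": "Alice"})
acme = Node(2, {"Company"}, {"name": "Acme"})
works_at = Relationship(alice.node_id, acme.node_id, "WORKS_AT", {"since": 2014})
```

Note that the relationship itself holds a property (`since`), something a relational schema would typically model with an extra join table.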

A key advantage of using a graph database like Neo4j is the intuitive way it models information. It differs from relational databases in that you don’t need to think in terms of related tables. Instead, you think in terms of data relationships. Much as we do in everyday life, data is organized as nodes that are connected naturally and logically. This topic is covered at length in the following articles: https://en.wikipedia.org/wiki/Graph_theory and https://neo4j.com/docs/stable/data-modeling-examples.html.

Neo4j uses the Cypher query language to query and modify data. Even when there are many relationships between data entities, there is no need for joins as there would be with a relational database. This can improve query performance, because there is no need to compute Cartesian products over table indices as in a join operation. With Neo4j, a query follows simple paths to find the related information.
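The idea of following paths instead of joining tables can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Neo4j's actual storage or query engine. The sample relationships and the `follow` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical relationships as (start, type, end) triples.
relationships = [
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
    ("acme", "LOCATED_IN", "san_jose"),
]

# Each node keeps a direct list of its outgoing relationships,
# so finding neighbours is a lookup rather than a table join.
outgoing = defaultdict(list)
for start, rel_type, end in relationships:
    outgoing[start].append((rel_type, end))


def follow(start, *rel_types):
    """Follow a fixed sequence of relationship types from a start node."""
    frontier = [start]
    for rel_type in rel_types:
        frontier = [
            end
            for node in frontier
            for (found_type, end) in outgoing[node]
            if found_type == rel_type
        ]
    return frontier


# Roughly what a Cypher pattern like
#   MATCH (p {name: 'alice'})-[:WORKS_AT]->()-[:LOCATED_IN]->(city) RETURN city
# expresses: start at one node and walk two hops.
city = follow("alice", "WORKS_AT", "LOCATED_IN")
```

Each hop only touches the relationships attached to the nodes already reached, which is the intuition behind the performance claim above.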

Another big advantage of Neo4j is the speed at which it imports data. Importing data from one system to another is one of the most critical tasks when migrating between systems. Neo4j offers many techniques for importing data to build the graph. One of the easiest is to use Cypher statements: you can leverage simple CREATE and MERGE statements to import the data. In addition, there are many projects, such as JDBC drivers, that can be used to connect your application to Neo4j (https://neo4j.com/developer/language-guides/). This approach is ideal for smaller batches of data that need to be migrated and/or merged.
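As a rough sketch of what importing with MERGE can look like, the Python below builds parameterized Cypher statements for a small batch of records. In Cypher, CREATE always creates new data, while MERGE matches an existing pattern or creates it if it is missing, which is what makes MERGE convenient for merging batches. The labels, property names, and helper functions are hypothetical, and the `$name` parameter syntax is an assumption that depends on the Neo4j version (older releases used `{name}`). Sending the statements to the database through a driver is left out.

```python
def merge_person_statement(person):
    """Return a Cypher MERGE statement and its parameters for one record."""
    query = (
        "MERGE (p:Person {id: $id}) "
        "SET p.name = $name"
    )
    return query, {"id": person["id"], "name": person["name"]}


def merge_works_at_statement(person_id, company_id):
    """Return a statement that links an existing person to an existing company."""
    query = (
        "MATCH (p:Person {id: $person_id}), (c:Company {id: $company_id}) "
        "MERGE (p)-[:WORKS_AT]->(c)"
    )
    return query, {"person_id": person_id, "company_id": company_id}


# Hypothetical batch of source records to migrate.
people = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
statements = [merge_person_statement(person) for person in people]
```

Keeping values in parameters rather than concatenating them into the query text avoids quoting problems in the source data.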

When importing large volumes of data, the batch-import project (https://github.com/jexp/batch-import/tree/20) provides some key performance and data organization benefits. Data is imported from one tab-separated CSV file for the nodes and another for the relationships. The project generates the Neo4j data along with an index structure. This approach is powerful for a first-time import because it generates the index from scratch.
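Preparing the two tab-separated files might look like the following Python sketch, which uses only the standard library. The column names, the sample rows, and the convention that `start` and `end` refer to row positions in the node file are assumptions made for illustration. The batch-import README defines the exact header format the tool expects.

```python
import csv
import os
import tempfile


def write_tsv(path, header, rows):
    """Write a header and rows to a tab-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)


# Write into a temporary directory so the sketch has no side effects elsewhere.
out_dir = tempfile.mkdtemp()
nodes_path = os.path.join(out_dir, "nodes.csv")
rels_path = os.path.join(out_dir, "rels.csv")

# Hypothetical node file: one row per node.
write_tsv(nodes_path, ["name", "kind"], [["Alice", "person"], ["Acme", "company"]])

# Hypothetical relationship file: start node, end node, relationship type.
write_tsv(rels_path, ["start", "end", "type"], [[0, 1, "WORKS_AT"]])
```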

Finally, we experimented with the LOAD CSV technique described in this article: https://neo4j.com/docs/stable/cypherdoc-importing-csv-files-with-cypher.html. As is often the case, data is delivered in CSV files. You generate CSV files that represent your nodes and relationships and then execute Cypher statements over those files. This technique is incredibly fast, processing hundreds of thousands of records in a few minutes, and it is also really useful for incremental imports. A fast way to execute those Cypher statements is the Neo4j shell, which also comes with the Community Edition.
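A LOAD CSV statement of the kind described above could be generated with a small Python helper like this one. It is a sketch under assumptions: the label, key, property names, and file URL are hypothetical, and the accepted form of the `file:` URL varies between Neo4j versions. Using MERGE on a key column means that re-running the statement over an updated file matches existing nodes instead of duplicating them, which suits incremental imports.

```python
def load_csv_merge_statement(file_url, label, key, properties):
    """Build a LOAD CSV statement that merges one node per CSV row.

    The CSV file is assumed to have a header row whose column names
    match `key` and the entries in `properties`.
    """
    statement = (
        f"LOAD CSV WITH HEADERS FROM '{file_url}' AS row\n"
        f"MERGE (n:{label} {{{key}: row.{key}}})"
    )
    assignments = ", ".join(f"n.{prop} = row.{prop}" for prop in properties)
    if assignments:
        statement += f"\nSET {assignments}"
    return statement + ";"


# Hypothetical usage: a people.csv file with "id" and "name" columns.
statement = load_csv_merge_statement("file:///people.csv", "Person", "id", ["name"])
print(statement)
```

The generated text can be saved to a file and executed with the Neo4j shell mentioned above.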

Neo4j provides several improvements over the traditional data import and migration methodologies commonly used with relational databases. It is a very useful tool for organizing and retrieving relationships in a high-performance environment. Perhaps most importantly, the graph model lends itself to intuitive and logical data models built simply by connecting nodes.

See more at: https://www.dustland.com/blog/exploring-data-migration-techniques-with-neo4j
