Michael Hunger, Spring Data Neo4j and community leader at Neo Technology,  heads into the Neo4j labs and imports 2 billion nodes (2 properties) and 20 billion relationships (1 property) 1.4 TB of data.

As massive data insertion performance has bothered me for a while, I made it the subject of my last lab days (20% time) at Neo4j. The results of my work are available on GitHub and I explain the approach below.

Data Insertion issues

When getting started with a new database like the graph database Neo4j it is important to quickly get an initial data-set to work with. Either you write a data-generator to generate it or you have existing data in relational- or NOSQL-databases that you want to import. In both cases the import is unusual as oftentimes hundreds of millions or billions of nodes and relationships have to be imported in a short time. The normal write load of a graph database doesn’t cater for those insertion speeds. That’s why Neo4j has a BatchInserter that is able to import data quickly by loosening the transactional constraints but in doing so only working in a single thread. If for instance only nodes w/o properties are imported, the inserter reaches a speed of 1 million nodes per second which is nice. But as soon as relationships and properties come into the picture the insertion speed drops noticeably. The reason for that degradation is that the single-threaded approach doesn’t utilize all available resources in a modern system. Neither the plethora of CPU’s nor the high concurrent throughput (up to 200MB/s) of modern (SSD) even in multiple streams is used.

Approach to Parallelization

So the idea was to identify independent parts of the insertion process which can be run in parallel and possibly also parallelized in themselves. I used the LMAX Disruptor as a simple and very scalable framework to parallelize the tasks. It uses a large ring-buffer filled with preallocated, struct-like objects and a lock free implementation for synchronizing producer and consumer tasks. Disruptor achieves high throughput and low latency operations on modern hardware. The 7 distinct operations of the batch-insertion process are:
  1. node-id generation
  2. property encoding
  3. property-record creation
  4. relationship-id creation and forward handling of reverse relationship chains
  5. writing node-records
  6. writing relationship-records
  7. writing property-records
The node-id generation doesn’t have to performed as a separate task b/c disruptor already has a sequence id that increases linearly. Except for the property-encoding all other operations are currently executed in only one instance due to single shared state – either the generated id’s or the writing to the store-files. It is possible to stripe them but right now that was out of scope.
Read the full article.  

Keywords: