
Neo4j 2.1 – Graph ETL for everyone

Senior Developer Advocate, Neo4j

May 29, 2014

It’s an exciting time for Neo4j users and, of course, the Neo4j team, as we’re releasing Neo4j 2.1! You’ve probably already seen the strides we took with the 2.0 release at the start of the year. Neo4j 2.1 continues to improve the user experience while delivering some impressive under-the-hood improvements and some interesting work on boosting Cypher too.

Easy import with ETL features directly in Cypher

Graphs are everywhere, but sometimes they’re buried in other systems and legacy databases. You need to extract the data, then bring it into Neo4j to experience its true graph form. To help you do this, we’ve brought bulk load functionality directly into Cypher. The new LOAD CSV clause makes that a pleasant and simple task. It is optimized for graphs on the scale of millions of nodes and relationships, the kind of size that folks typically encounter when getting started with Neo4j.

To illustrate, consider this small set of fictional Twitter users and their followers:

user	follower
Charlie Sheen	Morgan Freeman
Charlie Sheen	Oliver Stone
Oliver Stone	Charlie Sheen
Michael Douglas	Oliver Stone
Michael Douglas	Morgan Freeman
Martin Sheen	Oliver Stone
Martin Sheen	Morgan Freeman
Martin Sheen	Charlie Sheen
Morgan Freeman	Charlie Sheen

We can easily represent this as a CSV file as follows:

user,follower
Charlie Sheen,Morgan Freeman
Charlie Sheen,Oliver Stone
Oliver Stone,Charlie Sheen
Michael Douglas,Oliver Stone
Michael Douglas,Morgan Freeman
Martin Sheen,Oliver Stone
Martin Sheen,Morgan Freeman
Martin Sheen,Charlie Sheen
Morgan Freeman,Charlie Sheen

(Note that the LOAD CSV separator is strictly a comma, not a comma followed by whitespace!)

The CSV file can then in turn be loaded into the Neo4j graph like so:

[cypher]
LOAD CSV WITH HEADERS FROM "file:./Twitter.csv" AS csvLine
MERGE (u:Person { name: csvLine.user })
MERGE (f:Person { name: csvLine.follower })
MERGE (u)<-[:FOLLOWS]-(f);
[/cypher]

That is, you simply point LOAD CSV to a file, then pair it with an update statement (like CREATE or MERGE). The statement is applied once for each row of the file, in sequence, with the row available as a map keyed by the header names (here bound to csvLine).

Which in turn creates a graph of five Person nodes connected by nine FOLLOWS relationships, one per row of the file.

More sophisticated graphs can easily be created, and the operation can be repeated with multiple CSV files to import any kind of data into Neo4j. There’s a full example in the Neo4j manual too; see Importing CSV Files with Cypher.
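
Once the import has run, a quick query makes a handy sanity check. As a minimal sketch, assuming the Person label, name property and FOLLOWS relationship type created by the LOAD CSV statement above, this lists everyone who follows Charlie Sheen:

[cypher]
// Find the people with an outgoing FOLLOWS relationship to Charlie Sheen
MATCH (f:Person)-[:FOLLOWS]->(u:Person { name: "Charlie Sheen" })
RETURN f.name AS follower
ORDER BY follower;
[/cypher]

With the sample data, that should return Morgan Freeman and Oliver Stone.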

Dense nodes support

Neo4j 2.1 brings together lots of great improvements into one package, and of particular interest are optimizations we’ve made around dense nodes. A dense node can occur in any domain, but it’s easily reasoned about when you think about social graphs. For example, Britney Spears may have many millions of incoming FAN relationships, but relatively few FRIEND relationships, and fewer still familial relationships like MOTHER or COUSIN.

This release marks the start of our dense node management features. It provides a transparent performance boost when accessing those relatively few relationships amongst the general mass, by separating a node’s relationships out (by relationship type and direction) in the database. Now when you want to surgically pick out Britney’s friends and family, you can do so without having to sift through her fans too.
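
To make that concrete, here is the kind of query that stands to benefit. It is only a sketch: the Person label, the name property and the exact relationship types are assumptions made for illustration, following the Britney example above.

[cypher]
// Traverse only the FRIEND relationships of a (hypothetical) dense node,
// ignoring the millions of incoming FAN relationships
MATCH (star:Person { name: "Britney Spears" })-[:FRIEND]-(friend)
RETURN friend.name;
[/cypher]

Because the query names the relationship type it cares about, only that group of relationships needs to be visited.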

New Cypher functionality and experimental query planner

Cypher has become the primary interface for much of the work that we do in the graph. The Cypher team has been tremendously productive during this release period, both adding new user-facing features (like LOAD CSV, which we saw above) and improving the internals.

New to Cypher in Neo4j 2.1 is the UNWIND clause, which converts a collection into individual rows, as exemplified by Mark Needham in this posting.
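
As a small illustration of UNWIND (a sketch using a literal collection, not taken from Mark’s post), each element of the collection becomes its own row:

[cypher]
// Turn a three-element collection into three rows
UNWIND [1, 2, 3] AS number
RETURN number, number * number AS square;
[/cypher]

This returns three rows, one per element, which can then be fed into further clauses just like rows from a MATCH.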

Under the covers, things are even more interesting. There’s a new experimental Cypher optimizer, which you invoke by specifying “CYPHER 2.1.EXPERIMENTAL” at the start of your Cypher query. Some queries get a substantial boost in performance, while others see no gain or may even run more slowly, so use it with care and measure before relying on it. If you try it out, please give us your feedback!
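
For example, opting a single query in to the experimental optimizer looks roughly like this. The query itself is just an illustration that reuses the Person and FOLLOWS names from the import example, and whether it actually runs faster is something you would need to measure:

[cypher]
// The prefix applies to this query only
CYPHER 2.1.EXPERIMENTAL
MATCH (f:Person)-[:FOLLOWS]->(u:Person { name: "Charlie Sheen" })
RETURN f.name;
[/cypher]

Running the same query with and without the prefix and comparing timings is a simple way to see whether it helps your workload.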

Other goodies

Other notable improvements included in Neo4j 2.1:

A new lock manager in Neo4j Enterprise Edition that improves performance on many-core machines
Official support for OpenJDK 7, adding to the ongoing support for Oracle Java 7

Available now!

Let the fun begin:

Download Neo4j 2.1
But first, check out the upgrade guide if moving from Neo4j 1.9
Join the conversation to let us know your experiences

Cheers,

Philip and the Neo4j Team
