The Neo4j 2.1.0 Milestone 1 release – import and dense nodes

Chief Scientist, Neo4j

February 24, 2014



We’re pleased to announce the release of Neo4j 2.1 Milestone 1, the first drop of the 2.1 release schedule whose dual goals are productivity and performance.

In this release we’ve improved the experience at both ends of the Neo4j learning curve. On the data import side, we now support CSV import directly in the Cypher query language. For large, densely connected graphs we’ve changed the way relationships are stored in Neo4j to make navigating densely connected nodes much quicker for common cases.

CSV import in Cypher

Getting started with Neo4j is pretty easy, especially since the 2.0 release with all its lovely Cypher and WebUI goodness. But once you’re past the stage of playing with smaller graphs, you typically want to load larger amounts of existing data into Neo4j.

Traditionally, at this point you’d have searched for Neo4j import tools and found a plethora of options for getting initial data ingested into the database. While there’s definitely a place for extremely high performance programmatic insert tools for huge amounts of data, for graphs of millions of items – the typical size for proving a concept – they’re probably overkill.

From Neo4j 2.1 M01, importing millions of data items is simply a matter of loading CSV directly into Cypher from a file or URL, like so:

Treating every line as a collection (faster):

LOAD CSV FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine[0]})
MERGE (m:Movie { title: csvLine[1]})
CREATE (p)-[:PLAYED { role: csvLine[2]}]->(m)

or a map:

LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine.name})
MERGE (m:Movie { title: csvLine.title})
CREATE (p)-[:PLAYED { role: csvLine.role}]->(m)

Note that, for now, this should only be used from a neo4j-shell connected to the running Neo4j server.
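
Before writing anything to the graph, it can help to preview how Cypher parses the file. The sketch below assumes the same file as above, with a header row naming the name, title and role columns, and simply returns the first few parsed lines without creating any data:

// Assumes /tmp/movies.csv starts with the header line: name,title,role
LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
RETURN csvLine.name, csvLine.title, csvLine.role
LIMIT 5

If the columns come back empty or shifted, the header names or delimiters in the file probably don’t match what the import statement expects.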

For the best performance, you can give Cypher a hint about transaction size so that a transaction is committed after every n updates. This both persists the updates in case of failure and releases resources, keeping the database engine running smoothly. The hint is easy to declare:

USING PERIODIC COMMIT

which commits after every 10000 updates by default. If the default doesn’t work well for you, you can change it, e.g.

USING PERIODIC COMMIT 500

to specify a particular number of updates (500 in this case) between commits.
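
Putting the pieces together, a complete import statement might look like the sketch below. It assumes the hint goes at the very start of the statement, ahead of LOAD CSV, and it reuses the file and property names from the earlier examples:

// Commit after every 500 updates while importing the CSV file
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine.name})
MERGE (m:Movie { title: csvLine.title})
CREATE (p)-[:PLAYED { role: csvLine.role}]->(m)

A smaller batch size means more frequent commits and less state held per transaction, at the cost of more commit overhead, so the right value depends on your data and hardware.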

If you’d like to read a fully-worked example, Chris Leishman has used the CSV import feature to create OpenDisclosure, a dataset that follows the contributions and expenditure of Oakland politicians, created from CSV data in the public domain.

New store format for densely connected nodes

Sometimes in a domain you find that some nodes serve a far more important role in the graph, leading to interesting topologies with highly localised dense knots. For example, in social graphs we find that celebrities have many more fans (dense) than friends (sparse by comparison), and we can take advantage of the way those relationships are differently scaled to improve query performance.

In this release we’ve added support to the Neo4j store to partition relationships incident on a node by type and direction. So if you’re searching for the handful of celebrity friends amongst the legion of fans, your queries will be much faster, since most relationships (those between the celebrity and their fans) won’t be touched.
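
As a sketch of the kind of query that should benefit, suppose fans and friends are modelled with two different relationship types. The FRIEND type, the Person label and the name value here are hypothetical, chosen only for illustration:

// Find the celebrity's friends; fan relationships use a different type
MATCH (celeb:Person { name: "Some Celebrity" })-[:FRIEND]->(friend:Person)
RETURN friend.name

Because the pattern names both a relationship type and a direction, the store should only need to look at the celebrity’s outgoing FRIEND relationships rather than walking every relationship on the node.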

You won’t need to make any changes to your code to take advantage of this feature, but if you’re copying over an existing database you’ll have to set the allow_store_upgrade flag to true in your neo4j.properties file.
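
For example, the relevant line would look like this, assuming the file sits in the conf directory of your Neo4j installation:

allow_store_upgrade=true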

If you’ve got a densely connected domain, then try it out and let us have your performance feedback over on the Neo4j Google group.

Remember, milestones are for early access…

We want to get new features into your hands as early as we responsibly can. Milestones are all about feature feedback, so please download the release and tell us what you think at our Neo4j Google group.

…but they’re not for Production

Milestone releases don’t represent a stable platform for production use. We don’t certify them for production, and it may even be the case that in order to get some features released quickly, others are incomplete (for example, you can’t perform a rolling cluster upgrade with this milestone). So have fun with the milestone, but be safe!

Also, upgrades between milestones (e.g. when they include store format changes) are not supported.

Start your engines

So if you’re ready to explore a little, head over to the Neo4j downloads page and get graphing!

Jim Webber for the Neo4j Team
