The Neo4j 2.1.0 Milestone 1 Release – Import and Dense Nodes

Chief Scientist, Neo4j

We’re pleased to announce the release of Neo4j 2.1 Milestone 1, the first drop of the 2.1 release schedule, whose dual goals are productivity and performance. In this release we’ve improved the experience at both ends of the Neo4j learning curve. On the data import side, we now support CSV import directly in the Cypher query language. For large, densely connected graphs, we’ve changed the way relationships are stored in Neo4j to make navigating densely connected nodes much quicker for common cases.

CSV Import in Cypher
Getting started with Neo4j is pretty easy, especially since the 2.0 release with all its lovely Cypher and WebUI goodness. But once you’re past the stage of playing with smaller graphs, you typically want to load larger amounts of existing data into Neo4j. Traditionally at this point you’d have searched for Neo4j import tools and found a plethora of options for getting initial data ingested into the database. While there’s definitely a place for extremely high-performance programmatic insert tools for huge amounts of data, for graphs of millions of items – the typical size for proving a concept – they’re probably overkill. From Neo4j 2.1 M01, importing millions of data items is simply a matter of loading CSV directly into Cypher from a file or URL, treating every line as a collection (faster):

LOAD CSV FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine[0]})
MERGE (m:Movie { title: csvLine[1]})
CREATE (p)-[:PLAYED { role: csvLine[2]}]->(m)
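For the sake of illustration, suppose movies.csv contains rows like these (the actual file contents are just an assumption here):

Keanu Reeves,The Matrix,Neo
Carrie-Anne Moss,The Matrix,Trinity

Each csvLine is then the collection of fields on one row, so csvLine[0] is the person, csvLine[1] the film and csvLine[2] the role.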
Or, with a header row, treating each line as a map:

LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine.name})
MERGE (m:Movie { title: csvLine.title})
CREATE (p)-[:PLAYED { role: csvLine.role}]->(m)

Note that this should currently only be used from a neo4j-shell connected to the running Neo4j server.
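Since LOAD CSV is just another Cypher clause, you can also sanity-check what’s being parsed before writing anything to the graph; something along these lines should do the trick (same hypothetical file as above):

LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
RETURN csvLine
LIMIT 5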
For the best performance, you can give Cypher a hint about transaction size so that after every n updates a transaction is committed, both persisting the updates in case of failure and releasing resources, which keeps the database engine running smoothly. This is easy to declare:

USING PERIODIC COMMIT

which commits using the default of every 10,000 updates. If the default doesn’t work well for you, you can change it, e.g.
USING PERIODIC COMMIT 500

to specify a particular number of updates (500 in this case) between commits. If you’d like to read a fully worked example, Chris Leishman has used the CSV import feature to create OpenDisclosure, a dataset that follows the contributions and expenditure of Oakland politicians, built from CSV data in the public domain.
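Putting those pieces together, a periodic-commit version of the header-based import above would look something like this:

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine.name})
MERGE (m:Movie { title: csvLine.title})
CREATE (p)-[:PLAYED { role: csvLine.role}]->(m)

As noted above, run this from a neo4j-shell connected to the running Neo4j server.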