The Neo4j 2.1.0 Milestone 1 Release – Import and Dense Nodes
We’re pleased to announce the
release of Neo4j 2.1 Milestone 1, the first drop in the 2.1 release schedule, whose dual goals are productivity and performance.
In this release we’ve improved the experience at both ends of the Neo4j learning curve. On the data import side, we now support CSV import directly in the Cypher query language. For large, densely connected graphs we’ve changed the way relationships are stored in Neo4j to make navigating densely connected nodes much quicker for common cases.
CSV Import in Cypher
Getting started with Neo4j is pretty easy, especially since the
2.0 release with all its lovely Cypher and WebUI goodness. But once you’re past the stage of playing with smaller graphs, you typically want to load larger amounts of existing data into Neo4j.
Traditionally at this point you’d have searched for Neo4j import tools and found
a plethora of options for getting initial data into the database. While there’s definitely a place for extremely high-performance programmatic insert tools for huge amounts of data, for graphs of millions of items – the typical size for proving a concept – they’re probably overkill.
From Neo4j 2.1 M01, importing millions of data items is simply a matter of
loading CSV directly into Cypher from a file or URL like so:
Treating every line as a collection (faster):
LOAD CSV FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine[0]})
MERGE (m:Movie { title: csvLine[1]})
CREATE (p)-[:PLAYED { role: csvLine[2]}]->(m)
or a map:
LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine.name})
MERGE (m:Movie { title: csvLine.title})
CREATE (p)-[:PLAYED { role: csvLine.role}]->(m)
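As a point of reference, the second variant above expects a header row; a hypothetical /tmp/movies.csv for it might look like the lines below (the data is made up purely to show the expected shape, and the first variant would read the same columns by position from a file without the header line):

name,title,role
Keanu Reeves,The Matrix,Neo
Carrie-Anne Moss,The Matrix,Trinity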
Note that this should currently only be used from a neo4j-shell connected to the running Neo4j server.
For the best performance, you can give Cypher a hint about transaction size so that after
n updates a transaction will be committed, which both persists the updates in case of failure and releases resources to keep the database engine running smoothly. This is easy to declare:
USING PERIODIC COMMIT
which commits after every 10,000 updates by default. If the default doesn’t work well for you, you can change it, e.g.
USING PERIODIC COMMIT 500
to specify a particular number of updates (500 in this case) between commits.
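Putting the pieces together, a sketch of the whole import with periodic commits would look something like this (the hint goes at the very start of the query, before LOAD CSV):

USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine.name})
MERGE (m:Movie { title: csvLine.title})
CREATE (p)-[:PLAYED { role: csvLine.role}]->(m)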
If you’d like to read a fully worked example,
Chris Leishman has used the CSV import feature to create
OpenDisclosure, a dataset that follows the contributions and expenditure of Oakland politicians, created from CSV data in the public domain.
New Store Format for Densely Connected Nodes
Sometimes in a domain you find that some nodes play a far more important role in the graph than others, leading to interesting topologies with highly localised dense knots. For example, in social graphs we find that celebrities have many more
fans (dense) than
friends (sparse by comparison), and we can take advantage of the way those relationships are scaled differently to improve query performance.
In this release we’ve
added support to the Neo4j store for partitioning the relationships incident on a node by type and direction. So if you’re searching for the handful of celebrity
friends amongst the legion of
fans, your queries will be much faster, since most relationships (between the celebrity and their fans) won’t be touched.
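For instance, a query along these lines (the :FRIEND and :FAN relationship types here are purely illustrative) only has to follow the celebrity’s handful of :FRIEND relationships and never loads the fan relationships at all:

// Only the outgoing :FRIEND relationships are traversed; the far larger
// set of :FAN relationships pointing at the celebrity is left untouched.
MATCH (celebrity:Person { name: "Lady Gaga" })-[:FRIEND]->(friend)
RETURN friend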
You won’t need to make any changes to your code to take advantage of this feature, but if you’re copying over an existing database you’ll have to set the
allow_store_upgrade flag to
true in your
neo4j.properties file.
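For example, in conf/neo4j.properties (the exact location depends on how you installed Neo4j):

# Allow the store files of an existing database to be upgraded
# to the new 2.1 format on startup.
allow_store_upgrade=true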
If you’ve got a densely connected domain, then try it out and let us have your performance feedback over on the
Neo4j Google group.
Remember, milestones are for early access…
We want to get new features into your hands as early as we responsibly can. Milestones are all about feature feedback, so please download the release and tell us what you think at our
Neo4j Google group.
…but they’re not for Production
Milestone releases don’t represent a stable platform for production use. We don’t certify them for production, and it may even be the case that in order to get some features released quickly, others are incomplete (for example, you can’t perform a rolling cluster upgrade with this milestone). So have fun with the milestone, but be safe!
Also, upgrades between milestones (e.g. when they include store format changes) are not supported.
Start your engines
So if you’re ready to explore a little, head over to the
Neo4j downloads page and get graphing!
Jim Webber for the Neo4j Team