The Neo4j 2.1.0 Milestone 1 Release – Import and Dense Nodes

We’re pleased to announce the release of Neo4j 2.1 Milestone 1, the first drop of the 2.1 release schedule whose dual goals are productivity and performance. In this release we’ve improved the experience at both ends of the Neo4j learning curve. On the data import side, we now support CSV import directly in the Cypher query language. For large, densely connected graphs we’ve changed the way relationships are stored in Neo4j to make navigating densely connected nodes much quicker for common cases.

CSV Import in Cypher

Getting started with Neo4j is pretty easy, especially since the 2.0 release with all its lovely Cypher and WebUI goodness. But once you’re past the stage of playing with smaller graphs, you typically want to load larger amounts of existing data into Neo4j. Traditionally at this point you’d have searched for Neo4j import tools and found a plethora of options for getting initial data ingested into the database. While there’s definitely a place for extremely high-performance programmatic insert tools for huge amounts of data, for graphs of millions of items – the typical size for proving a concept – they’re probably overkill. From Neo4j 2.1 M01, importing millions of data items is simply a matter of loading CSV directly into Cypher from a file or URL, like so.

Treating every line as a collection (faster):
LOAD CSV FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine[0]})
MERGE (m:Movie { title: csvLine[1]})
CREATE (p)-[:PLAYED { role: csvLine[2]}]->(m)
or a map:
LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine.name})
MERGE (m:Movie { title: csvLine.title})
CREATE (p)-[:PLAYED { role: csvLine.role}]->(m)
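For reference, the WITH HEADERS form above expects the first line of the file to name the columns. A hypothetical /tmp/movies.csv (the contents here are illustrative, not shipped with the release) might look like:

```csv
name,title,role
Keanu Reeves,The Matrix,Neo
Carrie-Anne Moss,The Matrix,Trinity
```

The plain LOAD CSV form would read the same data without the header line, addressing fields by position (csvLine[0], csvLine[1], csvLine[2]) instead of by name.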
Note that this should currently only be used from a neo4j-shell connected to the running Neo4j server. For the best performance, you can give Cypher a hint about transaction size so that after every n updates a transaction is committed, both persisting the updates in case of failure and releasing resources to keep the database engine running smoothly. This is easy to declare:
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
MERGE (p:Person { name: csvLine.name})
MERGE (m:Movie { title: csvLine.title})
CREATE (p)-[:PLAYED { role: csvLine.role}]->(m)
which commits using the default of every 10000 updates. If the default doesn’t work well for you, you can also change it, e.g.
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///tmp/movies.csv" AS csvLine
...
to specify a particular number of updates (500 in this case) between commits. If you’d like to read a fully-worked example, Chris Leishman has used the CSV import feature to create OpenDisclosure, an application that tracks the contributions and expenditure of Oakland politicians from CSV data in the public domain.

New Store Format for Densely Connected Nodes

Sometimes in a domain you find that some nodes play a far more important role in the graph than others, leading to interesting topologies with highly localised dense knots. For example, in social graphs we find that celebrities have many more fans (dense) than friends (sparse by comparison), and we can take advantage of the way those relationships are differently scaled to improve query performance. In this release we’ve added support to the Neo4j store to partition the relationships incident on a node by type and direction. So if you’re searching for the handful of celebrity friends amongst the legion of fans, your queries will be much faster, since most relationships (between the celebrity and their fans) won’t be touched. You won’t need to make any changes to your code to take advantage of this feature, but if you’re copying over an existing database you’ll have to set the allow_store_upgrade flag to true in your neo4j.properties file. If you’ve got a densely connected domain, then try it out and let us have your performance feedback over on the Neo4j Google group.
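As a sketch of the kind of query that benefits (the labels, relationship types, and property names here are hypothetical, not part of the release), expanding only the friend relationships of a celebrity no longer has to wade through the far larger set of fan relationships on the same node:

```cypher
// Hypothetical social graph: a celebrity node with millions of incoming
// [:FAN_OF] relationships but only a handful of outgoing [:FRIEND] ones.
// With the partitioned 2.1 store format, this expansion touches only the
// [:FRIEND] partition and never scans the fan relationships.
MATCH (c:Celebrity { name: "Famous Person" })-[:FRIEND]->(friend)
RETURN friend.name
```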

Remember, milestones are for early access…

We want to get new features into your hands as early as we responsibly can. Milestones are all about feature feedback, so please download the release and tell us what you think at our Neo4j Google group.

…but they’re not for Production

Milestone releases don’t represent a stable platform for production use. We don’t certify them for production, and it may even be the case that in order to get some features released quickly, others are incomplete (for example, you can’t perform a rolling cluster upgrade with this milestone). So have fun with the milestone, but be safe! Also, upgrades between milestones (e.g. when they include store format changes) are not supported.

Start your engines

So if you’re ready to explore a little, head over to the Neo4j downloads page and get graphing!

Jim Webber, for the Neo4j Team



About the Author

Jim Webber, Chief Scientist, Neo4j


Jim Webber is the Chief Scientist at Neo4j, working on next-generation solutions for massively scaling graph data. Prior to joining Neo4j, Jim was a Professional Services Director with ThoughtWorks, where he worked on large-scale computing systems in finance and telecoms. Jim has a Ph.D. in Computing Science from Newcastle University, UK.


Anonymous says:

Regarding dense supernodes, will Neo4j eventually support vertex-centric indices like Titan does? It seems like the current gains are only around edge direction.

Pierre says:

Thanks for that useful post to get started with loading from CSV files. You mention that one can treat lines as collections or as a map. Does the former drop the CSV header line? Would it make sense to allow a second header line (not exactly CSV format compliant, though) to let one specify or force the data column type (string, long, double)?

Julian Simpson says:

Hey Pierre, the Neo4j Google Group is probably a good place to ask that kind of question. Thanks for getting in touch!
