Neo4j 2.2.0 – Massive write & read scalability, faster Cypher query performance, improved developer productivity

March 25, 2015

The Neo4j team is very proud to announce the immediate availability of Neo4j 2.2, with major updates allowing organizations to derive maximum value from their data relationships.

Our latest release represents an astounding 20+ person-years of engineering effort on top of Neo4j 2.1, making version 2.2 a significant step forward for Neo4j, as well as for the graph database industry as a whole.

This release would not have been possible without a continuous stream of input from community members. Neo4j 2.2 also represented our largest beta to date, with participation from more than two thousand users.

With the release of Neo4j 2.2.0, we’re serving up both read and write performance at massive operational throughput – making it our fastest and most scalable graph database yet.

How do we achieve massive scale with Neo4j while improving read and write performance? We consider the size of the data, the shape of the data, and what you want to do with it.

Massive writes

Up to 100 times higher write throughput

Neo4j has always been fast. But what if you have hundreds or thousands of concurrent transactions all writing to the database at the same time? The new enhancements that we have made to the database engine, in particular Neo4j’s new fast write buffering architecture, improve write scaling dramatically, both for initial loading of the graph, and for highly concurrent server applications.

In Neo4j 2.2, concurrent writes are bundled together, optimizing throughput by minimizing the number of disk operations and amortizing transaction cost. The result is a huge improvement in concurrent transactions per second.
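
To see why bundling helps, here is a minimal Python sketch of the general idea, often called group commit. It is a toy model and not Neo4j's actual write path: the function names and the batch size of 50 are assumptions chosen purely for illustration.

```python
import math


def flushes_needed(transactions: int, batch_size: int) -> int:
    """Disk flushes required when up to `batch_size` commits share one flush."""
    if transactions < 0 or batch_size < 1:
        raise ValueError("transactions must be >= 0 and batch_size >= 1")
    return math.ceil(transactions / batch_size)


def group_commits(pending: list, batch_size: int) -> list:
    """Split pending transactions into batches; each batch is flushed once."""
    return [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]


# 1,000 concurrent transactions waiting to commit (hypothetical workload).
pending = [f"tx-{n}" for n in range(1000)]

unbatched = flushes_needed(len(pending), 1)   # 1000: one flush per transaction
batched = flushes_needed(len(pending), 50)    # 20: fifty commits share a flush
batches = group_commits(pending, 50)
```

In this model the cost of each flush is spread across every transaction in its batch, which is what amortizing the transaction cost means here.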

Before 2.2

[Figure: neo4j-pre-2-2]

With 2.2

[Figure: neo4j-post-2-2]

We have also eliminated the two-phase commit that occurred with every write operation, by unifying the transaction logs for graph data and related indexes, recognizing them as a single, mechanically sympathetic operational event. This immediately benefits all transactions.

For data import we went one step further, allowing bulk imports to be performed as one highly mechanically efficient operation. The new neo4j-import tool writes the graph directly to disk, bypassing the transactional database writer for offline databases. This results in sustained write throughput in the millions of records per second, for graphs of all sizes, even into the tens of billions of nodes and relationships.
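
As a rough illustration of preparing input for a bulk import, the Python sketch below builds one node file and one relationship file as CSV text. The header columns (`personId:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`), the sample data, and the file names in the comment reflect our understanding of the import tool's CSV conventions and should be treated as assumptions; check the import tool documentation for the exact format and flags.

```python
import csv
import io


def to_csv(header, rows):
    """Render a header plus rows as CSV text with Unix line endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Assumed header conventions: an ID column, plain properties, and a label column.
NODE_HEADER = ["personId:ID", "name", ":LABEL"]
# Assumed header conventions: start node ID, end node ID, relationship type.
REL_HEADER = [":START_ID", ":END_ID", ":TYPE"]

# Hypothetical sample data.
people = [("p1", "Alice", "Person"), ("p2", "Bob", "Person")]
knows = [("p1", "p2", "KNOWS")]

nodes_text = to_csv(NODE_HEADER, people)
rels_text = to_csv(REL_HEADER, knows)

# Once saved as people.csv and knows.csv, an invocation might look like this
# (assumed flags; verify against the manual for your version):
#   neo4j-import --into graph.db --nodes people.csv --relationships knows.csv
```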

In summary, Neo4j 2.2 delivers exceptional write performance:

Up to 100x higher transactional throughput with concurrent load
Vastly improved core scale-up, to more fully utilize modern hardware
Bulk data import at over 1M records/second, loading all of DBpedia (4.58M nodes and 20M+ relationships) in under 100 seconds

Massive read scalability

Up to 10 times higher read throughput

Let’s turn our attention to reads: when scaling to hundreds and thousands of read threads, our internal benchmarking revealed an interesting bottleneck. The memory subsystem relied on the OS (specifically memory mapped IO) for low-level caching operations. This works fine up to a point, but it breaks down under very high demand, since the OS is optimized for generalized caching and not graphs.

Neo4j 2.2 includes a brand new in-memory page cache designed to deliver extreme performance and scalability under highly concurrent workloads. Based on the long-established LRU-K algorithm, the in-memory page cache uses a statistically optimal strategy to populate the cache with frequently used data, which in turn minimizes the need for slower disk access. This results in vastly improved performance for highly concurrent workloads, up to 10 times faster than Neo4j 2.1, while providing uniform scaling in multi-core environments.
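
To make the eviction idea concrete, here is a minimal Python sketch of LRU-K with K = 2. It is a toy model and not Neo4j's page cache: the class name `LRUKCache`, the `load` callback, and the choice to forget an evicted page's access history are all assumptions made to keep the example short.

```python
from collections import deque
from itertools import count


class LRUKCache:
    """Evicts the page whose k-th most recent access is oldest."""

    def __init__(self, capacity, k=2):
        if capacity < 1 or k < 1:
            raise ValueError("capacity and k must be >= 1")
        self.capacity = capacity
        self.k = k
        self._clock = count()   # logical time: 0, 1, 2, ...
        self._pages = {}        # page_id -> page contents
        self._history = {}      # page_id -> last k access times, oldest first

    def __contains__(self, page_id):
        return page_id in self._pages

    def _touch(self, page_id):
        hist = self._history.setdefault(page_id, deque(maxlen=self.k))
        hist.append(next(self._clock))

    def _victim(self):
        def key(page_id):
            hist = self._history[page_id]
            # Fewer than k accesses: treat the k-th access as infinitely old
            # (-1 sorts before every real timestamp), then fall back to LRU.
            kth = hist[0] if len(hist) == self.k else -1
            return (kth, hist[-1])
        return min(self._pages, key=key)

    def get(self, page_id, load):
        """Return the page, calling load(page_id) on a cache miss."""
        if page_id not in self._pages:
            if len(self._pages) >= self.capacity:
                victim = self._victim()
                del self._pages[victim]
                del self._history[victim]   # simplification: history is dropped
            self._pages[page_id] = load(page_id)
        self._touch(page_id)
        return self._pages[page_id]


def load(page_id):
    # Hypothetical stand-in for reading a page from disk.
    return f"contents of {page_id}"


cache = LRUKCache(capacity=2, k=2)
cache.get("A", load)
cache.get("A", load)   # A now has two recorded accesses
cache.get("B", load)   # B has one
cache.get("C", load)   # cache is full: B is evicted, A stays
```

Plain LRU would have evicted A here, because B was touched more recently. LRU-K instead keeps A, since a page seen twice is a better bet for reuse than a page seen once.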

The new page cache provides the benefits of a purely in-memory database without the risk of sudden data loss, while supporting highly granular node-level and relationship-level locking.

Faster Cypher query performance

Up to 100 times faster query performance

Cypher has always been the most convenient way to write queries, but, like any new query language, the onus at times has been on the query developer to provide hints and structure for the most efficient queries. To make Cypher queries deliver optimal performance across the board, Neo4j 2.2 introduces a cost-based optimizer.

Cost-based optimizer

The new cost-based optimizer is much smarter at planning queries and more transparent about what it’s doing.

The new statistics-gathering capability tracks the scale and the shape of the graph. Neo4j utilizes that new information about data use to inform the cost-based query optimizer, so that it can pick the best execution plan.

The result is that many Cypher queries are now as much as 100 times faster than in previous versions, benefiting all Neo4j developers regardless of their Cypher skills.
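
The following Python sketch illustrates, in a deliberately simplified way, how statistics about the graph can drive plan selection. The cost model, the `Plan` class, the label counts, and the average degree are all hypothetical and are not how Neo4j's planner is implemented; the point is only that the same query can be far cheaper depending on where execution starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    start_label: str   # label scanned to find starting nodes
    expansions: int    # relationship hops performed after the scan


def estimated_cost(plan, label_counts, avg_degree):
    """Toy cost: rows touched by the scan plus rows produced by each hop."""
    rows = label_counts[plan.start_label]
    cost = rows
    for _ in range(plan.expansions):
        rows *= avg_degree
        cost += rows
    return cost


def cheapest(plans, label_counts, avg_degree):
    """Pick the plan with the lowest estimated cost."""
    return min(plans, key=lambda p: estimated_cost(p, label_counts, avg_degree))


# Hypothetical statistics gathered about the graph.
label_counts = {"Person": 1_000_000, "Movie": 50}
avg_degree = 3

plans = [
    Plan("start at Person", "Person", expansions=1),
    Plan("start at Movie", "Movie", expansions=1),
]

best = cheapest(plans, label_counts, avg_degree)
person_cost = estimated_cost(plans[0], label_counts, avg_degree)  # 4000000
movie_cost = estimated_cost(plans[1], label_counts, avg_degree)   # 200
```

With these made-up numbers, starting from the 50 `Movie` nodes is estimated to touch 20,000 times fewer rows than starting from `Person`, so a cost-based planner would prefer it.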

[Figure: neo4j-2-2-optimizer]

We realize that sometimes the user is in the best position to understand what a query is doing and how to make it better. For those users, the rules-based optimizer is still available via “hints”. We’ve also introduced profiling, which provides insight into query execution and helps you write faster queries. Just prepend PROFILE to any query to receive a visualization of the query plan. You can also view the query plan without running the query by using the EXPLAIN command.

This may help you realize that your query considers huge swathes of the graph, while needing only a small part of it. This can be really insightful, and allows you to spot opportunities to rephrase your query in a way that allows an optimization to occur.
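
As a small illustration of the two prefixes, the Python helpers below build the query text you would submit. The helper names and the sample query (with its `Person`, `Movie`, and `ACTED_IN` names) are assumptions for the example; only the EXPLAIN and PROFILE keywords come from Cypher itself.

```python
def explain(query: str) -> str:
    """Ask for the query plan only; the query itself is not executed."""
    return "EXPLAIN " + query.lstrip()


def profile(query: str) -> str:
    """Execute the query and report the plan alongside runtime statistics."""
    return "PROFILE " + query.lstrip()


# Hypothetical query against a movie graph.
QUERY = (
    "MATCH (p:Person)-[:ACTED_IN]->(m:Movie) "
    "WHERE m.title = 'The Matrix' "
    "RETURN p.name"
)

plan_only = explain(QUERY)
plan_with_stats = profile(QUERY)
```

A reasonable workflow is to start with EXPLAIN on an expensive query, since it returns a plan without running anything, and switch to PROFILE once you want to see how much of the graph each step actually touches.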

Query plan visualization

[Figure: neo4j-2-2queryplan]

Improved developer productivity

Neo4j 2.2 is our most developer-friendly release to date, with lots of new additions to speed up your development. Some key capabilities include:

Quick start with code examples built right into the browser
Guide on how to migrate from an RDBMS using the new Northwind Graph example
Graceful rendering of overlapping relationships in Neo4j browser with curving arrows
New command button in the Neo4j browser to kill a running Cypher query
Query plan visualization to help identify choke points
Export of graph model as embeddable graphics (SVG, PNG) for sharing visualizations
Integrated product feedback

We think you’ll love this release as much as we do. Please drop us a line at [email protected] to tell us what you think.

To download Neo4j 2.2 immediately, visit https://neo4j.com/download.
