Up to 100 times higher write throughput
Neo4j has always been fast. But what if you have hundreds or thousands of concurrent transactions all writing to the database at the same time? The new enhancements that we have made to the database engine, in particular Neo4j’s new fast write buffering architecture, improve write scaling dramatically, both for initial loading of the graph, and for highly concurrent server applications. In Neo4j 2.2, concurrent writes are bundled together, optimizing throughput by minimizing the number of disk operations and amortizing transaction cost. The result is a huge improvement in concurrent transactions per second.
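To make the bundling idea concrete, here is a minimal sketch of how grouping commits amortizes disk operations. It is not Neo4j's implementation: `GroupCommitLog`, `flushes_needed`, and the fixed `batch_size` are hypothetical names chosen for illustration, and a real engine would bundle whichever transactions happen to be waiting rather than a fixed count.

```python
class GroupCommitLog:
    """Toy transaction log that makes buffered commits durable in batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size  # assumption: fixed-size batches
        self.pending = []             # commits waiting for the next flush
        self.durable = []             # commits already "on disk"
        self.disk_flushes = 0         # stands in for expensive disk syncs

    def commit(self, record):
        # Buffer the commit; flush only once a full batch has accumulated.
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        # One simulated disk operation covers every pending commit.
        if not self.pending:
            return
        self.durable.extend(self.pending)
        self.pending.clear()
        self.disk_flushes += 1


def flushes_needed(transactions, batch_size):
    """Count simulated disk flushes for a given number of commits."""
    log = GroupCommitLog(batch_size)
    for tx in range(transactions):
        log.commit(tx)
    log.flush()  # make any partial final batch durable
    return log.disk_flushes


print(flushes_needed(1000, 1))   # one flush per transaction
print(flushes_needed(1000, 50))  # commits bundled 50 at a time
```

In this toy model, 1,000 commits cost 1,000 flushes when each is flushed alone and 20 flushes when bundled 50 at a time, which is the amortization effect the paragraph above describes.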
With 2.2, we have also eliminated the two-phase commit that occurred with every write operation, by unifying the transaction logs for graph data and related indexes, recognizing them as a single, mechanically sympathetic operational event. This immediately benefits all transactions. For data import we went one step further, allowing bulk imports to be performed as one highly mechanically efficient operation. The new neo4j-import tool writes the graph directly to disk, bypassing the transactional database writer for offline databases. This results in sustained write throughput in the millions of records per second, for graphs of all sizes, even into the tens of billions of nodes and relationships. In summary, Neo4j 2.2 delivers exceptional write performance:
- Up to 100x higher transactional throughput with concurrent load
- Vastly improved core scale-up, to more fully utilize modern hardware
- Bulk data import at over 1M records/second, loading all of DBpedia (4.58M nodes and 20M+ relationships) in under 100 seconds
Massive Read Scalability
Up to 10 times higher read throughput
Let’s turn our attention to reads: when scaling to hundreds or thousands of read threads, our internal benchmarking revealed an interesting bottleneck. The memory subsystem relied on the OS (specifically memory-mapped IO) for low-level caching operations. This works fine up to a point, but it breaks down under very high demand, since the OS is optimized for generalized caching and not for graphs. Neo4j 2.2 includes a brand new in-memory page cache designed to deliver extreme performance and scalability under highly concurrent workloads. Based on the long-established LRU-K algorithm, the in-memory page cache uses a statistically optimal strategy to populate the cache with frequently used data, which in turn minimizes the need for slower disk access. This results in vastly improved performance for highly concurrent workloads, up to 10 times faster than Neo4j 2.1, while providing uniform scaling in multi-core environments. The new page cache provides the benefits of a purely in-memory database without the downside of sudden data loss, while supporting highly granular node-level and relationship-level locking.
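The eviction rule behind LRU-K can be sketched in a few lines. This is an illustrative toy, not Neo4j's page cache: the class name `LRUKCache`, the `load` callback, and the choice of K = 2 are assumptions, and unlike the full algorithm this version forgets the access history of a page once it is evicted.

```python
from collections import deque


class LRUKCache:
    """Toy LRU-K cache: evicts the page whose K-th most recent access is oldest."""

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.clock = 0     # logical time, incremented on every access
        self.pages = {}    # page_id -> page contents
        self.history = {}  # page_id -> last k access times, oldest first

    def _touch(self, page_id):
        self.clock += 1
        times = self.history.setdefault(page_id, deque(maxlen=self.k))
        times.append(self.clock)

    def _victim(self):
        def rank(page_id):
            times = self.history[page_id]
            # A page seen fewer than k times has no K-th access yet, so it is
            # evicted before any page with a full history; ties fall back to
            # the least recently used page.
            kth = times[0] if len(times) == self.k else float("-inf")
            return (kth, times[-1])

        return min(self.pages, key=rank)

    def get(self, page_id, load):
        if page_id not in self.pages:
            if len(self.pages) >= self.capacity:
                victim = self._victim()
                del self.pages[victim]
                del self.history[victim]
            self.pages[page_id] = load(page_id)  # simulated slow disk read
        self._touch(page_id)
        return self.pages[page_id]


cache = LRUKCache(capacity=2, k=2)
for page in ["A", "A", "B", "C"]:
    cache.get(page, load=lambda pid: f"contents of {pid}")
print(sorted(cache.pages))  # "B" was evicted, although "A" is older
```

Plain LRU would have evicted "A" here because it was touched least recently. LRU-K keeps it because it has been used twice, which is how the strategy favors frequently used data over pages that were only scanned once.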
Faster Cypher Query Performance
Up to 100 times faster query performance
Cypher has always been the most convenient way to write queries, but, like any new query language, the onus at times has been on the query developer to provide hints and structure for the most efficient queries. To make Cypher queries deliver optimal performance across the board, Neo4j 2.2 introduces a cost-based optimizer.
Cost-Based Optimizer
The new cost-based optimizer is much smarter at planning queries and more transparent about what it’s doing. The new statistics-gathering capability tracks the scale and the shape of the graph. Neo4j uses that information to inform the cost-based query optimizer, so that it can pick the best execution plan. The result is that many Cypher queries are now as much as 100 times faster than in previous versions, benefiting all Neo4j developers regardless of their Cypher skills. We realize that sometimes the user is in the best position to understand what a query is doing and how to make it better. For those users, the rules-based optimizer is still available via “hints”.
We’ve also introduced profiling, which provides insight into query execution to help you write faster queries. Just prepend PROFILE to any query to receive a visualization of the query plan. You can also view the query plan without running the query by using the EXPLAIN command. This may help you realize that your query considers huge swathes of the graph while needing only a small part of it, and lets you spot opportunities to rephrase the query in a way that allows an optimization to occur.
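As a rough illustration of how graph statistics can drive plan selection, the sketch below picks the starting label with the lowest estimated cost for a pattern such as `(p:Person)-[:LIVES_IN]->(c:City)`. Everything here is hypothetical: the labels, the numbers, the `estimate_cost` formula, and the function names are assumptions for illustration and do not describe Neo4j's actual planner or cost model.

```python
def estimate_cost(start_label, stats):
    """Estimated work: nodes scanned for the label plus relationships expanded."""
    nodes = stats["label_counts"][start_label]
    return nodes + nodes * stats["avg_degree"][start_label]


def pick_plan(candidate_starts, stats):
    """Choose the candidate starting label with the lowest estimated cost."""
    return min(candidate_starts, key=lambda label: estimate_cost(label, stats))


# Hypothetical statistics about the scale and shape of a graph.
stats = {
    "label_counts": {"Person": 1_000_000, "City": 500},
    "avg_degree": {"Person": 1.0, "City": 2000.0},
}

print(pick_plan(["Person", "City"], stats))
```

With these made-up numbers, starting from `City` costs an estimated 1,000,500 units against 2,000,000 for `Person`, so the sketch chooses `City`. A rule-based planner without statistics has no basis for making that distinction.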
Query Plan Visualization
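Requesting a plan is a matter of prefixing the query text with PROFILE (runs the query) or EXPLAIN (plans without running it). The small helper below shows the resulting strings; the helper name `with_plan_keyword` and the sample `Person`/`KNOWS` query are hypothetical and only build text, without contacting a database.

```python
def with_plan_keyword(query, execute=True):
    """Prefix a Cypher query with PROFILE (execute) or EXPLAIN (plan only)."""
    keyword = "PROFILE" if execute else "EXPLAIN"
    return f"{keyword} {query.strip()}"


# Hypothetical query against an assumed Person/KNOWS graph.
query = "MATCH (p:Person)-[:KNOWS]->(friend) WHERE p.name = 'Alice' RETURN friend.name"

print(with_plan_keyword(query))                 # PROFILE: runs the query
print(with_plan_keyword(query, execute=False))  # EXPLAIN: plan only
```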
Improved Developer Productivity
Neo4j 2.2 is our most developer-friendly release to date, with lots of new additions to speed up your development. Some key capabilities include:
- Quick start with code examples built right into the browser
- Guide on how to migrate from an RDBMS using the new Northwind Graph example
- Graceful rendering of overlapping relationships in Neo4j browser with curving arrows
- New command button in the Neo4j browser to kill a running Cypher query
- Query plan visualization to help identify choke points
- Export of graph model as embeddable graphics (SVG, PNG) for sharing visualizations
- Integrated product feedback
We think you’ll love this release as much as we do. Please drop us a line at firstname.lastname@example.org to tell us what you think.
To download Neo4j 2.2 immediately, visit http://neo4j.com/download.