Neo4j 2.3: Enhanced Enterprise Applications at Scale

Chief Technology Officer, Neo4j

October 22, 2015

Discover the New Features and Improvements Now Available in Neo4j 2.3

If you didn’t hear the news from Emil Eifrem’s keynote presentation at GraphConnect San Francisco yesterday: The Neo4j team is pleased to announce the general availability of Neo4j 2.3.0.

Everyone from across Neo Technology has worked hard to make sure this is the best, fastest and most scalable release of Neo4j ever, and we’re all very excited to bring it to you.

Neo4j 2.3 is about building bigger and better applications. Its new features range from a fully off-heap cache and a faster query optimizer to schema improvements and (by popular demand) official support for Docker.

We’re excited to make Neo4j 2.3 available to our community and customers, and look forward to seeing how these new advancements will impact your graph applications.

So, what’s new in Neo4j 2.3? Let’s take a closer look:

Theme #1: Intelligent Applications at Scale

Today’s applications must be smarter and faster than ever, supporting analytic transactions in real time to solve tomorrow’s challenges. Intelligent applications seamlessly apply and follow business rules within a connected context, harnessing data relationships to extract insights that were previously thought impossible.

As more business takes place online, the performance and functionality of your applications demand a fast and scalable database that you can trust, and Neo4j 2.3 helps you meet those challenges by making significant improvements in performance and operability at scale.

Eliminating the JVM-Based Object Cache

With Neo4j 2.3, your application breaks free of JVM-imposed limitations by moving the graph cache fully off the Java Virtual Machine heap. This gives you:

Higher vertical scaling and improved across-the-board performance with large graphs
Higher levels of concurrency
Markedly improved read throughput for highly concurrent workloads
Improved operational characteristics at scale
Simplified tuning by reducing the number of configuration knobs

The New In-Memory Page Cache for Neo4j 2.3

Neo4j 2.3 moves the database cache off heap in order to improve concurrency and scale.

Neo4j 2.3 concludes a 2-year journey that started with Neo4j 2.1, to move the graph cache fully off of the JVM heap. Neo4j 2.2 was a major step along the way, introducing a new in-memory page cache that now (in 2.3) fully replaces the object cache.

The new page cache introduced in Neo4j 2.2 made it possible to more efficiently map small fragments of store-files into memory and lessened the need for a higher-level cache. This also provided the Neo4j DBMS with more granular control over how data is cached and locked than was possible in earlier versions, which used memory-mapped IO underneath a JVM-based object cache (both now gone in Neo4j 2.3).

One consequence of removing the object cache is that it frees up memory inside of your JVM and on your machine for other useful things. The off-heap cache also places far less strain on the garbage collector.

The off-heap cache also brings operational benefits, most noticeably when clusters are pushed to their limits: less GC activity and fewer long pauses.

The new page cache also provides telling improvements in concurrent read scaling on machines with multiple cores (we’ve seen improvements of up to 7x in the lab).

Smarter Cypher Query Planner

The Cypher query planner in Neo4j 2.3 improves significantly upon the cost-based optimizer introduced in February with Neo4j 2.2, offering better performance for many queries. The planner is now better at:

Finding cheaper execution paths where they exist
Running common queries, thanks to new algorithms such as triadic selection, which commonly shows up in recommendations queries
Auto-detecting and using indexes where they exist, in particular for queries combining graph pattern matching with numeric range queries
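To make the triadic case concrete, here is a minimal sketch of the "friends of my friends who are not yet my friends" shape that recommendation queries often take. The :Person label, the KNOWS relationship type and the name value are assumptions made up for illustration, not part of any shipped dataset.

```cypher
// Friend-of-friend recommendation: people my friends know whom I do not know yet.
// :Person, :KNOWS and "Alice" are hypothetical names used only for this sketch.
MATCH (me:Person {name: "Alice"})-[:KNOWS]-(friend)-[:KNOWS]-(candidate)
WHERE NOT (me)-[:KNOWS]-(candidate)
RETURN DISTINCT candidate.name
```

Whether the planner actually picks a triadic operator for a given query depends on your data and statistics, so it is worth confirming with PROFILE.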

Note: You can always force Cypher to revert to the previous rule-based planner by prefixing a query with CYPHER planner=rule. Also remember to check your query plans visually by prefixing slow queries with EXPLAIN or PROFILE.
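As a rough sketch, these prefixes go in front of an otherwise unchanged query. The Company label and name property are borrowed from the string-search example in this post; the literal values are placeholders.

```cypher
// Show the execution plan without running the query
EXPLAIN
MATCH (c:Company) WHERE c.name = "Neo Technology" RETURN c;

// Run the query and show rows and database hits per operator
PROFILE
MATCH (c:Company) WHERE c.name = "Neo Technology" RETURN c;

// Fall back to the rule-based planner for this one query only
CYPHER planner=rule
MATCH (c:Company) WHERE c.name = "Neo Technology" RETURN c;
```

Comparing the PROFILE output with and without the planner option is a quick way to see whether the cost-based planner is helping a particular query.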

Cypher Graph + Text String Search

With Neo4j 2.3, Cypher now includes new STARTS WITH, CONTAINS and ENDS WITH string operators. These help you build more powerful applications by more easily bringing search into your graph queries.

Here’s an example:

MATCH (c:Company)
WHERE c.name STARTS WITH "Neo"
RETURN c

This query would then return results like “Neo4j” and “Neo Technology”, using the schema index on Company.

Note: All three new operators use an exact (case-sensitive) match. Schema indexes are used for STARTS WITH. CONTAINS and ENDS WITH can be used as filters but will not (yet) use an index, because schema indexes index text properties leading edge first.
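For comparison, the other two operators and a case-insensitive variant might look like the sketch below. It assumes a schema index on Company.name has been created as shown; the search strings are arbitrary examples.

```cypher
// Schema index on Company.name, which STARTS WITH can use
CREATE INDEX ON :Company(name);

// Filters only: each candidate node is checked, with no index seek in 2.3
MATCH (c:Company) WHERE c.name CONTAINS "Tech" RETURN c;
MATCH (c:Company) WHERE c.name ENDS WITH "4j" RETURN c;

// Case-insensitive variant: lower() normalizes the stored value first,
// so expect this to run as a filter as well rather than as an index seek
MATCH (c:Company) WHERE lower(c.name) STARTS WITH "neo" RETURN c;
```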

Theme #2: Developer Enablement: Productivity and Governance

With every release of Neo4j, we strive to make things more convenient for the developer, and Neo4j 2.3 is no exception.

In this release, convenience takes several forms: for the project and for the individual.

As more projects come to rely on Neo4j, it’s become clear that having a schema definition that goes beyond the simpler “the schema is the data” perspective on the model is very useful for communication, productivity and governance.

For this reason, Neo4j 2.3 expands on the initial notions of schema / meta model provided by relationship types (1.x), and labels and unique constraints (2.0). This has uses both for the individual and the team, with benefits ranging from data quality and governance to collaboration to data integration.

Separately, for the individual developer, the accumulation of small features and improvements should make Neo4j 2.3 all the more pleasant to develop with.

Property Existence Constraints *

*Note: This is a Neo4j Enterprise feature.

Property Existence Constraints let you specify a set of mandatory properties for a given label or relationship type.

This feature is useful because it helps to specify and understand what data is in the database. It is also convenient because it lets you push rules down into the database that would otherwise live inside the application (and, for a large application or multiple applications, probably in several places).

Property existence constraints can also play a useful role when importing data, to validate that the incoming data matches a certain minimum bar and matches up with the rules in the source system.

Let’s say I have the following graph data model with the labels and relationship types specified, and a unique constraint in place for PersonID and AssetID (denoted by the underline). Now let’s say I want to add a Property Existence Constraint for some other fields:

The New Property Existence Constraints in Neo4j 2.3

A graph data model with property existence constraints.

To create property existence constraints, you would formulate CREATE CONSTRAINT statements in Cypher using the pattern below, for each of the items:

CREATE CONSTRAINT ON (p:Person) ASSERT exists(p.FirstName);
CREATE CONSTRAINT ON (a:Asset) ASSERT exists(a.AssetName);
CREATE CONSTRAINT ON ()-[r:OWNS]-() ASSERT exists(r.SinceDate);
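Once the constraints are in place, writes that would leave a mandatory property missing are rejected. The sketch below shows what that could look like for the Person constraint above; the property values are invented for illustration.

```cypher
// Rejected while the constraint exists: FirstName is missing
CREATE (p:Person {PersonID: 1});

// Accepted: the mandatory property is present
CREATE (p:Person {PersonID: 1, FirstName: "Emil"});

// Also rejected: removing a mandatory property from an existing node
MATCH (p:Person {PersonID: 1}) REMOVE p.FirstName;

// Dropping the constraint uses the same pattern as creating it
DROP CONSTRAINT ON (p:Person) ASSERT exists(p.FirstName);
```

Note that creating a constraint on a database that already holds violating data fails as well, which makes these constraints a handy check after an import.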

Spring Data Neo4j 4.0

If you haven’t already heard the news, we have launched a fully-supported Neo4j data integration library for the Spring Framework. This library is especially recommended for any new Spring projects interacting with Neo4j 2.3.

This is pretty big news: It’s a major upgrade of Spring Data Neo4j, representing over a calendar year’s worth of work, optimized for Neo4j Server deployments.

Prior to this release, Spring Data Neo4j assumed an embedded deployment: it performed well in an embedded scenario, but not so well when accessing your Neo4j database over the wire.

This latest version of Spring Data Neo4j provides Spring Data capabilities and is optimized for remote use through Cypher. Spring Data Neo4j 4.0 includes:

Object-Graph Mapping
Spring Data Repository support
Fast metadata scanning
Neo4j Template

Find out more information about the Spring Data Neo4j 4.0 library on the official Pivotal page or on our developer pages.

Windows PowerShell Support!

We would be remiss in making operability improvements without a nod to Windows users. Neo4j users on Windows can now use a full complement of PowerShell scripts for managing Neo4j, making it convenient to orchestrate the management of Neo4j.

PowerShell support means improved Windows automation and scripting as well as streamlined administration of Neo4j in a Windows environment.

See the figure below for new PowerShell commands available with Neo4j 2.3.

New Windows PowerShell Commands in Neo4j 2.3

New Windows PowerShell commands available with Neo4j 2.3.

Other Improvements

A few other improvements to look for:

First, removing a node along with any existing relationships is such a common operation that we’ve added a special variation on DELETE which concisely expresses just that. DETACH DELETE now eliminates a node and all of its relationships using a single command.

Here is a Cypher example where we remove a mistakenly added Person from the Movie Graph (sorry Emil!):

MATCH (p:Person) WHERE p.name = "Emil Eifrem" DETACH DELETE p
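For contrast, here is a sketch of how the same cleanup is typically written without DETACH DELETE, by matching the relationships explicitly and deleting them together with the node:

```cypher
// Pre-2.3 style: collect the node's relationships (if any) and delete them with it
MATCH (p:Person) WHERE p.name = "Emil Eifrem"
OPTIONAL MATCH (p)-[r]-()
DELETE r, p
```

A plain DELETE p on a node that still has relationships fails, which is why the extra OPTIONAL MATCH was needed.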

Second, there is more support for Cypher query warnings in the Neo4j Browser. To help you write the best queries possible, Cypher now reports warnings about queries which might be less-than-awesome.

In the Neo4j Browser, you may notice a little yellow warning sign pop up when something about your query could be improved. Click on the warning, and you’ll receive an explanation about the concern, such as this explanation about calculating a Cartesian product:

An example of a Cypher query warning in the Neo4j 2.3 Browser explaining the calculation of a Cartesian product.
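A minimal sketch of a query that would typically trigger this warning, followed by a connected rewrite. It assumes the Movie Graph's Person and Movie labels and its ACTED_IN relationship type.

```cypher
// Disconnected patterns: every Person is paired with every Movie
MATCH (p:Person), (m:Movie)
RETURN p.name, m.title;

// Connected pattern: only pairs that are actually related
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, m.title;
```

A Cartesian product is sometimes exactly what you want, so the warning is a hint rather than an error.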

A few other Cypher features worth mentioning, mostly around performance:

Range queries can now be solved using index seeks (!!)
Function exists() may use an index scan
LIMIT influences planning costs
Inequality predicates can now be chained in the form 0 < n.prop < 10
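The items above can be sketched against the Movie Graph as follows. The index on Person.born is an assumption added for this example, and whether a given query actually uses an index seek or scan is best confirmed with PROFILE.

```cypher
// Assumed index for this sketch
CREATE INDEX ON :Person(born);

// Chained inequality, a range predicate the planner may solve with an index seek
MATCH (p:Person) WHERE 1960 <= p.born < 1970 RETURN p.name, p.born;

// exists() on an indexed property may be answered with an index scan
MATCH (p:Person) WHERE exists(p.born) RETURN count(p);

// LIMIT is now taken into account when plan costs are estimated
MATCH (p:Person) WHERE p.born > 1980 RETURN p.name LIMIT 5;
```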

And finally, last and probably least… but nonetheless useful, the Neo4j 2.3 Browser now comes with three selectable themes (default, dark and outline), to provide easy viewing across a range of circumstances. (I personally like “Outline View” when using a projector.)

Check out the new looks below and look for more themes with future Neo4j releases.

The default Neo4j Browser theme.

The dark Neo4j Browser theme.

The outline Neo4j Browser theme.

Theme #3: DevOps Enablement for On-Premise and Cloud

As more and more environments embrace polyglot persistence, the databases that need to be managed continue to increase in complexity and number. We’re therefore proud to include a few features in Neo4j 2.3 that will make it easier for IT professionals to operate Neo4j environments on premise and in the cloud.

Neo4j 2.3 enables DevOps teams by flattening learning curves, getting projects going quickly and easily integrating into existing architectures.

Official Docker Support

The last year has seen a lot of community activity with Neo4j and Docker, including (as of this writing) nearly 150 community-contributed repositories on Docker Hub.

We get it! It’s time for an official repo, and (for customers) official Neo Technology support of Neo4j on Docker. It’s now here.

New Mac Installer & Launcher

For all of you who develop or tinker with Neo4j on your Mac, Neo4j 2.3 now includes an installer and launcher for Mac OS X machines.

The new Neo4j Mac Installer.

The new Neo4j Mac Launcher.

The Mac Installer allows for easy drag and drop installation of Neo4j 2.3. What’s actually most exciting about this is the bundled JVM, which avoids the need to download (the right version of) Java separately.

For more information on the Mac Installer and Launcher, see the screencast below (or just try it out!):

Neo4j Metrics*

*Note: This is a Neo4j Enterprise feature.

It’s always been possible to get at a variety of operational stats in Neo4j, with quite a few more stats available to Neo4j Enterprise users than with Neo4j Community Edition. A nice addition in Neo4j 2.3 is the ability to stream those stats directly to your monitoring system of choice, set up via configuration parameters.

Internally, we use Graphite to analyze the results of our own testing, which we find useful for comparing performance over time and inside of discrete instances across a cluster.

Speaking with customers, we realized this would make a useful addition inside of Neo4j Enterprise, so we integrated it into the product with Neo4j 2.3.

Here’s what the properties look like if you’re going to connect to Graphite (below). Ganglia is also supported, and we’re very open to adding support for other monitoring tools – just reach out via Support.

Neo4j Metrics allow you to push operational metrics and monitoring to third-party solutions such as Graphite or Ganglia. (Graphite example shown.)

Faster Backups, Upgrades and Bulk Loading of Data

There is a little-known background utility that’s key to keeping your database healthy through bulk operations: the Neo4j consistency checker.

When you’re inserting data using Neo4j’s transactional APIs, the database engine makes sure that data is consistent. For example, if you insert two nodes and a relationship, these will always succeed or fail together, ensuring the integrity of the graph.
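As a small illustration, a single Cypher statement sent on its own runs as one transaction, so the two nodes and the relationship below are either all written or not written at all. The label, relationship type and names are made up for this sketch.

```cypher
// One statement, one transaction: both nodes and the relationship commit together
CREATE (a:Person {name: "Alice"})-[:KNOWS]->(b:Person {name: "Bob"})
```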

(This incidentally is why writing graphs to a backend not built to handle graphs, such as a key-value store, document database or column-family database, can be a pernicious affair. You need extra checks to ensure your graph doesn’t get corrupted over time, which means supporting a connected – or graph – unit of work.)

However, when you’re carrying out a bulk activity in Neo4j, such as a “super fast” bulk load using the neo4j-import utility introduced in Neo4j 2.2, a backup (especially a hot backup, which is supported in Neo4j Enterprise), or a major upgrade (which sometimes involves a behind-the-scenes upgrade of the data store), you need to run a full health check on the store files after the fact to make sure the bulk activity occurred without any corruption.

Neo4j’s consistency checker utility is what does this for you. We’ve totally overhauled the consistency checker so that it runs significantly faster at scale, thanks to more efficient use of memory, parallelism and new multi-pass algorithms.

We’re excited about this, as it exemplifies the kind of work one does when building a database: background heavy lifting, so that you can build your cool app and not worry about things like database integrity checks.

Concretely, faster consistency checking translates into swifter upgrades from Neo4j 1.9, 2.1 and 2.2, making it easier to get your hands on Neo4j 2.3!

Conclusion

We hope you find this latest release as much a pleasure to use as it was to create. We look forward to your feedback as you start to use Neo4j 2.3 and as we head back into the development shop to work on the next set of amazing things.

On behalf of the entire Neo4j team: Enjoy.

–Philip Rathle

Ready to get started with Neo4j 2.3.0? Click below to download the latest release and start building your enterprise application today.

Download Neo4j 2.3