Neo4j 1.9 M02 - Under the Hood

We have been working hard over the last weeks to tune and improve many aspects of Neo4j's internals, to deliver an even faster, more stable and less resource-intensive graph database in this 1.9.M02 milestone release. Those efforts span many areas and benefit everyone from developers to sysops and most other Neo4j users.

We are thrilled about the feedback we got from customers and from our community via the Google Group, Stack Overflow and Twitter. Thanks for helping us improve.

While the new changes might not be visible at first glance, let’s look into Neo4j’s engine room to see what has changed.

Everyone’s most beloved query language, Cypher, has matured a lot thanks to Jake and Andres’ incredible work. They have made query execution much faster for most use cases, while using less memory. The lazy execution of queries had sneaked away lately, so Andres caught it and put it back in. That means you can run queries with potentially infinitely large result sets without exhausting memory. Especially when results are streamed (no aggregation or ordering), a query will use only a tiny fraction of your memory. The very frequent construct ORDER BY … LIMIT … now benefits from a better top-n-select algorithm. These latest improvements narrow the performance gap to the core API even further. We’ve also glimpsed a new internal SPI that will allow Cypher to run even faster in the future.
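
To illustrate why a top-n select helps ORDER BY … LIMIT …, here is a minimal sketch in Java. It is not Cypher's actual implementation; the class and method names are made up for this example. It consumes rows lazily from an iterator and keeps only the n best rows in a bounded heap, so memory stays proportional to the limit rather than to the size of the result set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only, not part of Neo4j.
public class TopNSelect {

    /** Returns the n first rows according to 'order', sorted by 'order'. */
    public static <T> List<T> topN(Iterator<T> rows, int n, Comparator<T> order) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Reversed comparator: the head of the heap is the worst row kept so far.
        PriorityQueue<T> heap = new PriorityQueue<>(n, Collections.reverseOrder(order));
        while (rows.hasNext()) {
            T row = rows.next();
            if (heap.size() < n) {
                heap.add(row);
            } else if (order.compare(row, heap.peek()) < 0) {
                // The new row beats the worst kept row, so swap it in.
                heap.poll();
                heap.add(row);
            }
        }
        // Only the n surviving rows need a full sort.
        List<T> result = new ArrayList<>(heap);
        Collections.sort(result, order);
        return result;
    }
}
```

With this approach each incoming row costs at most one heap operation on a heap of size n, instead of sorting the whole result before cutting it off.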

For top speed, please make sure to use query parameters everywhere. It helps a lot, even though the now-configurable query cache size (e.g. query_cache_size=1000) allows a larger number of queries to be cached.
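
Why do parameters matter even with a bigger cache? The following sketch shows the general idea with a small LRU cache keyed by query text. It is an assumption-laden toy, not Neo4j's real query cache: the class QueryPlanCache, its methods and the "plan" strings are invented for illustration. Queries with literal values each produce a different text and therefore a new cache entry, while a parameterized query such as START n=node({id}) RETURN n is planned once and then reused.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only, not Neo4j's query cache.
public class QueryPlanCache {
    private final Map<String, String> plans;
    private int misses = 0;

    public QueryPlanCache(final int capacity) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.plans = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached plan for the query text, "planning" it on a miss. */
    public String planFor(String queryText) {
        String plan = plans.get(queryText);
        if (plan == null) {
            misses++;
            plan = "plan for: " + queryText; // stand-in for parsing and planning
            plans.put(queryText, plan);
        }
        return plan;
    }

    /** Number of times a query had to be planned from scratch. */
    public int misses() {
        return misses;
    }
}
```

Running one hundred lookups with the node id embedded in the query text causes one hundred misses in this sketch, whereas the same hundred lookups with a {id} parameter cause exactly one.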

As for some eye-candy, we provide you with a slicker version of the Neo4j console which now features interactive jQuery data result tables to allow in-browser filtering, searching and paging.

Our shiny new High Availability Cluster can now be upgraded seamlessly from an existing ZooKeeper-based setup to the new infrastructure that runs on a Paxos (coordinator-free) implementation. So you can upgrade your test clusters without any downtime. If you want to know more, please check out the HA documentation.

For a better HA experience, we have added a new extension providing current cluster information that a load balancer can act on to update its routing. For the curious, there is also much more JMX monitoring information about the cluster available in the web interface.

In the depths of the Kernel, several optimizations have been applied, resulting in better overall performance of Neo4j.

A nice feature we created in response to a user request is the OrderByTypeExpander, which keeps the provided order of relationship types AND directions during the traversal.
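
To make the ordering idea concrete, here is a small self-contained sketch. It deliberately does not use the Neo4j API: Rel, Step and expand are hypothetical stand-ins written for this post, and the real expander may work differently. The sketch returns a node's relationships grouped by the (type, direction) pairs in exactly the order the caller supplied them, and skips relationships that match no pair.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only, not the Neo4j expander implementation.
public class OrderedExpansionSketch {
    public enum Direction { OUTGOING, INCOMING }

    /** Stand-in for a relationship as seen from the node being expanded. */
    public static final class Rel {
        public final String type;
        public final Direction direction;
        public final String otherNode;

        public Rel(String type, Direction direction, String otherNode) {
            this.type = type;
            this.direction = direction;
            this.otherNode = otherNode;
        }
    }

    /** One (type, direction) pair, in the position the caller gave it. */
    public static final class Step {
        public final String type;
        public final Direction direction;

        public Step(String type, Direction direction) {
            this.type = type;
            this.direction = direction;
        }
    }

    /** Returns matching relationships in the order of the supplied steps. */
    public static List<Rel> expand(List<Rel> relationships, List<Step> order) {
        List<Rel> result = new ArrayList<>();
        for (Step step : order) {
            for (Rel rel : relationships) {
                if (rel.type.equals(step.type) && rel.direction == step.direction) {
                    result.add(rel);
                }
            }
        }
        return result;
    }
}
```

For example, asking for outgoing LIKES first and incoming KNOWS second yields all outgoing LIKES relationships before any incoming KNOWS relationship, regardless of how they are stored.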

Better safe than sorry: we have sandboxed the JavaScript traversals that are exposed via the Server REST API to make them more secure.

Now, go ahead and give your project some Neo4j love. Test it with the latest and greatest Neo4j release so far and tell us how we did. Download it from the new Neo4j website that we created for your convenience and learning experience. And while you’re at it, please explore the new site and give us feedback on its helpfulness.

Happy hacking!

/Peter and Michael