2013: What’s Coming Next in Neo4j!
Following the recent 2012 retrospective, we’d like to share some of our plans for the coming year. If you’ve been following our latest progress, you already know that we have a 1.9 version in the works. Neo4j 1.9 makes great strides in HA cluster manageability by eliminating the need to run Zookeeper. We will release 1.9 when it’s ready. (We are readiness driven more so than we are date driven.) As of this writing, the latest Neo4j 1.9-M04 milestone has just rolled out. We are expecting a release announcement for Neo4j 1.9 GA some time in February.
Beyond Neo4j 1.9
Even though roadmaps can change, and it’s nice not to spoil all of the surprises, we do feel it’s important to discuss priorities within our community. We’ve spent a lot of time over the last year taking to heart all of the discussions we’ve had, publicly and privately, with our users, and looking closely at the various ways in which Neo4j is used. Our aim in 2013 is to build upon the strengths of today’s Neo4j database, and make a great product even better. The 2013 product plan breaks down into a few main themes. This post is dedicated to the top two, which are:
1. Ease of Use. Making the product easier to learn, use, and maintain, for new and existing users.
2. Big(ger) Data. Handling ever-bigger data and transaction volumes.
Ease of Use
Our goal is to make it as easy as possible to learn Neo4j, to build applications using Neo4j, and to maintain Neo4j. One observation we’ve made is that most Neo4j users develop their own way to classify nodes: either with type nodes, or with properties. We’ve decided it’s time that we build in support for this. So for the first time since Neo4j founder and CEO Emil Eifrem sketched out what today is called the property graph model on the back of a napkin on a flight to Bombay in the fall of 2000, we will be extending Neo4j’s data model. The key idea is that you will be able to tag nodes with labels, for example: “Person”, “Employee”, “Parcel”, “Gene”, etc. We will soon invite discussion on the Neo4j Google Group concerning the implementation details. Expect to hear more about this soon.
Another area where we are planning improvements is indexing. Auto indexing reduces the effort required to manage indexes. However, auto indexes are not suitable in all cases, and Cypher is likewise still lacking in its support for index creation and maintenance. We have a number of improvements in store here, but the most significant one is that as of the 2.0 release, you will be able to perform every index operation via Cypher. This means that many applications can be built with Cypher as the sole way to access Neo4j.
Both of these improvements are planned for the 2.0 release, which will be the next major release after 1.9, and is expected in the first half of the year. Designating 2.0 as a major release also means that we will be removing all features and functions deprecated in 1.x. Now is a good time to start making sure that you’re not using anything that’s been deprecated.
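To make the three ways of classifying nodes concrete, here is a minimal sketch in Python of a toy in-memory property graph. It is not Neo4j code and does not reflect the final label design, which is still open for discussion. The `Node` and `Graph` classes, the `IS_A` relationship type, the `type` property name, and the helper functions are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)
    # Hypothetical: models the planned label feature as a plain set of strings.
    labels: set = field(default_factory=set)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)  # (start_id, type, end_id)

    def add_node(self, properties=None, labels=()):
        node = Node(len(self.nodes), dict(properties or {}), set(labels))
        self.nodes[node.id] = node
        return node

    def relate(self, start, rel_type, end):
        self.relationships.append((start.id, rel_type, end.id))


def people_via_type_node(graph, type_node):
    """Workaround 1: nodes linked to a dedicated type node by an IS_A relationship."""
    return sorted(s for (s, t, e) in graph.relationships
                  if t == "IS_A" and e == type_node.id)


def people_via_property(graph):
    """Workaround 2: nodes that carry a conventional 'type' property."""
    return sorted(n.id for n in graph.nodes.values()
                  if n.properties.get("type") == "Person")


def people_via_label(graph):
    """Planned approach (sketched): nodes tagged with a 'Person' label."""
    return sorted(n.id for n in graph.nodes.values() if "Person" in n.labels)


graph = Graph()
person_type = graph.add_node({"name": "Person"})                       # type node
alice = graph.add_node({"name": "Alice", "type": "Person"},
                       labels={"Person", "Employee"})
parcel = graph.add_node({"name": "Box 42", "type": "Parcel"}, labels={"Parcel"})
graph.relate(alice, "IS_A", person_type)

print(people_via_type_node(graph, person_type))
print(people_via_property(graph))
print(people_via_label(graph))
```

All three lookups find the same node. The difference is that the first two rely on a convention each application has to invent and maintain, while a label is part of the data model itself, and a node can carry several at once.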
Big(ger) Data
Before we dive into futures, let’s revisit what Neo4j can do today. Presently, Neo4j’s clustering solution has excellent availability and read scaling characteristics, for up to dozens of servers. Features such as cache sharding (which allows cluster members to each keep a different portion of the graph in memory) and turbo cache (which provides up to 10x improvements in cache speed for very large cache sizes) are just a few of the technologies that allow Neo4j to scale extremely well. We have seen customers running globally-distributed 24×7 Neo4j clusters across EC2 regions on three continents, and replicating hundreds of thousands of write transactions between continents in seconds. And we have seen mission-critical use of graphs range from tens of thousands of nodes and relationships at the low end, to billions of nodes and relationships in a single (usually clustered) graph.
Tens of billions turns out to be enough for the vast majority of uses we see. It allows a social graph (for example) to scale to right about the size of the people and friendships in Facebook. Today, a single Neo4j instance can handle graphs into the tens of billions of nodes, relationships, and properties. Using the right design patterns and some application development, it is possible to partition a graph oneself, and scale even higher. Whilst we have some users doing this, our goal is to minimize the amount of effort required to scale, no matter how large.
Looking ahead, we do see a need to support even bigger graphs, particularly as data volumes, together with the demand for graph databases, continue to grow. For this reason, we have several initiatives in 2013 aimed at scaling into the 100B+ range and beyond (in a single graph), horizontally or vertically.
Let’s talk about vertical scalability first. We will be increasing the upper size limits of a single instance, to a number that’s high enough not to be a concern. These limits were never meant to be obstacles, but are rather storage optimizations intended to keep the on-disk footprint as modest as possible. Few users ever bump into these limits. But we want to raise them before anyone does.
Another project we have planned around “bigger data” is to add some specific optimizations to handle traversals across densely-connected nodes, that is, nodes with very large numbers (millions) of relationships. (This problem is sometimes referred to as the “supernodes” problem.)
The last item on the Big(ger) Data list is definitely not the least. It is a horizontally-scalable, distributed graph, designed to spread massive graphs across clusters of machines. This is a multi-year project that’s been going on in the background for some time, code-named Rassilon. Rassilon extends Neo4j’s existing horizontal scalability story to allow distributed read and write processing across clusters of machines for graphs of any size. Rassilon fully supports sharding of graphs. Together with the improvements mentioned above, it will ensure that whatever graph you have can be stored in Neo4j, no matter how large. We are continuing to work actively on this, and will share further details as the year progresses.
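For readers curious about what partitioning a graph oneself (mentioned earlier in this section) can look like at the application level, here is a rough sketch in Python. It is not how Rassilon or any part of Neo4j works. It only illustrates one common pattern, routing each node to a shard by hashing its key. `PartitionedGraph`, `shard_for`, and the dictionary-based shards are hypothetical stand-ins for separate database instances.

```python
import hashlib


def shard_for(node_key, shard_count):
    """Map a node key to a shard index using a stable hash (illustrative routing rule)."""
    digest = hashlib.sha1(node_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class PartitionedGraph:
    """Toy application-level partitioning: each shard stands in for one database instance."""

    def __init__(self, shard_count):
        self.shard_count = shard_count
        self.shards = [{"nodes": {}, "relationships": []} for _ in range(shard_count)]

    def add_node(self, key, **properties):
        self.shards[shard_for(key, self.shard_count)]["nodes"][key] = properties

    def add_relationship(self, start_key, rel_type, end_key):
        # Assumption: a relationship is stored on the shard that owns its start node.
        shard = self.shards[shard_for(start_key, self.shard_count)]
        shard["relationships"].append((start_key, rel_type, end_key))

    def neighbors(self, key, rel_type):
        # Outgoing neighbors only need the start node's shard.
        shard = self.shards[shard_for(key, self.shard_count)]
        return sorted(e for (s, t, e) in shard["relationships"]
                      if s == key and t == rel_type)

    def cross_shard_relationships(self):
        # Relationships whose endpoints live on different shards; traversing
        # these is where the extra application effort goes.
        return sum(
            1
            for shard in self.shards
            for (s, _t, e) in shard["relationships"]
            if shard_for(s, self.shard_count) != shard_for(e, self.shard_count)
        )


graph = PartitionedGraph(shard_count=4)
for name in ("alice", "bob", "carol"):
    graph.add_node(name, kind="Person")
graph.add_relationship("alice", "FRIEND", "bob")
graph.add_relationship("alice", "FRIEND", "carol")
print(graph.neighbors("alice", "FRIEND"))
```

Even in this toy version, multi-hop traversals have to hop between shards whenever a relationship crosses a partition boundary, which is the kind of effort the initiatives above aim to take off the application developer.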
Some Other Items
Other key items in this year’s plan include the following:
- Neo4j will become easier to run on EC2 and Azure, to cater to the increasing number of users who are running on these platforms.
- Cypher will continue to see improvements in both functionality and performance, including the indexing improvements mentioned above. Cypher has been popular because of its power, compactness, and readability.
- The remote API will continue to improve, as Server becomes the primary means of accessing Neo4j, over Embedded. We’re looking at a variety of ways to improve the experience, beyond simply making REST tweaks, though we will do that too. We look forward to opening this topic up for community discussion when we start the design. One important improvement is the ability for a transaction to span multiple REST calls. We expect to have this out by mid-year.
- Work is planned to make it easier for our community members to make and share contributions to the Neo4j ecosystem, such as visualization tools, language bindings, and adapters.
- We will improve the learning experience, via improvements to documentation, the web UI, etc.
- And finally, robustness. We will continue the steady stream of behind-the-scenes changes to ensure that Neo4j remains robust across a wide and growing range of applications.