Graph Data Platforms: From a Napkin Sketch to a Category Leader

A famous Chinese proverb says: “A journey of a thousand miles begins with a single step.”

On its path to being named a leading graph data platform by Forrester Research in its recent “The Forrester Wave™: Graph Data Platforms, Q4 2020,” Neo4j learned lessons at every step that are useful for developers.

Michael Hunger, Neo4j’s head of developer relations, recounts some of the experiences that forged Neo4j into an industry leader as it advanced from an idea to a sketch, to a new database type, to a category, and finally to leadership of a market that Amazon and Microsoft have now entered as challengers.

Blaise James: Michael, we often hear the Neo4j Napkin origin story. For those unfamiliar, could you relay what this is?

Michael Hunger: Neo4j was extracted from a production application in the DMS/CMS SaaS space in Sweden in the early 2000s. Originally, two use cases could not be satisfied by relational databases despite much effort: real-time permission resolution for a SaaS business and complex semantic networks of hierarchical, translated keywords. That led the engineers on the project to explore other means, one of which was an in-memory graph model.

To gather help building that, our co-founder Emil flew to Mumbai and on that flight sketched the basic building blocks of the pragmatic property graph model – nodes and relationships with properties. And yes, this is documented on a literal airplane napkin.
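As a rough illustration of that model and of the permission-resolution use case mentioned earlier, here is a minimal in-memory sketch in Python. It is not Neo4j code: the `PropertyGraph` class, the `can_read` helper and the `MEMBER_OF` / `CAN_READ` relationship types are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    id: int
    labels: set[str] = field(default_factory=set)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: int  # id of the start node
    end: int    # id of the end node
    type: str
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy store: nodes and typed, directed relationships, both with properties."""

    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.relationships: list[Relationship] = []

    def add_node(self, *labels: str, **properties: Any) -> Node:
        node = Node(len(self.nodes), set(labels), dict(properties))
        self.nodes[node.id] = node
        return node

    def relate(self, start: Node, rel_type: str, end: Node, **properties: Any) -> Relationship:
        rel = Relationship(start.id, end.id, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def neighbors(self, node: Node, rel_type: str) -> list[Node]:
        # Nodes reachable over one outgoing relationship of the given type.
        return [self.nodes[r.end] for r in self.relationships
                if r.start == node.id and r.type == rel_type]


def can_read(graph: PropertyGraph, user: Node, document: Node) -> bool:
    """Resolve a permission by walking MEMBER_OF chains until a CAN_READ edge is found."""
    seen: set[int] = set()
    frontier = [user]
    while frontier:
        current = frontier.pop()
        if current.id in seen:
            continue
        seen.add(current.id)
        if any(n.id == document.id for n in graph.neighbors(current, "CAN_READ")):
            return True
        frontier.extend(graph.neighbors(current, "MEMBER_OF"))
    return False


graph = PropertyGraph()
alice = graph.add_node("User", name="Alice")
bob = graph.add_node("User", name="Bob")
editors = graph.add_node("Group", name="Editors")
staff = graph.add_node("Group", name="Staff")
report = graph.add_node("Document", title="Q4 report")

graph.relate(alice, "MEMBER_OF", editors, since=2020)
graph.relate(editors, "MEMBER_OF", staff)
graph.relate(staff, "CAN_READ", report)

print(can_read(graph, alice, report))
print(can_read(graph, bob, report))
```

Alice reaches the document through two group memberships, while Bob has no path to it. The point of the sketch is that the permission check is a traversal over relationships rather than a series of table joins.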

Later on, that graph model approach was implemented successfully in Sweden and formed the kernel of the Neo4j graph database platform that we know today.


James: Can you share more about what the Neo4j technology journey has looked like between now and then?

Michael: At the starting point, we were a bit naive: “Building a database… how hard can that be?” It turned out – pretty hard. But we invested thousands of person-years into building Neo4j from nimble beginnings into the broad platform it is today. Since the beginning, Neo4j’s focus has always been on making developers’ lives easier; that’s why we decided to use the pragmatic property graph model and not more scientific models like RDF. Now we are doing the same for data scientists, by making graph algorithms approachable and easy to use.

For our customers and users, having a graph database at their disposal enables them to solve problems and gain insights they would otherwise not be able to.

Our technology has evolved significantly, growing from a core library with the graph model to handling transactions, memory and I/O like a proper database. Early on, the library was wrapped into a server with APIs, which then enabled the first graphical user interfaces. Shortly after, we started implementing the Cypher query language to make working with the database available to all kinds of programming languages and environments. Meanwhile, the clustered Neo4j solution took several iterations, from ZooKeeper to Paxos and now (Multi-)Raft, which powers our causal clusters.

Like many other successful data-intensive services (Cassandra, Kafka, Spark), we rely on the JVM for scalability and portability. To make it easier to build applications, we devised a binary protocol (Bolt) and, with that, official drivers for .NET, JavaScript, Python, Go and Java. To improve the usability of the platform, Neo4j Browser, Neo4j Desktop and Neo4j Bloom form our initial set of developer- and end-user-facing applications that bring a modern web-application feel (React and GraphQL) to database interactions. These are supplemented by an ever-growing list of extensions in the form of graph apps.

While Neo4j has been available as an official Docker image for a long time, our cloud offering – Neo4j AuraDB – utilizes Kubernetes operators. As part of our Neo4j Labs efforts, a large library of user-defined utility procedures (APOC) and the GRANDstack GraphQL integration make application development easier. Our first-class integrations for Kafka, Spark and JDBC let Neo4j fit into modern data architectures. A more recent addition to the graph platform is the Graph Data Science Library, which uses resource-efficient computation to enable large-scale graph analysis on complex connected data.

At the recent NODES conference, Emil spoke of making the impossible possible, then usable and then magical. As you can see from our journey so far, we have already made good progress on that path, and developers’ widespread use of and enthusiasm for our capabilities, features and tools confirm that.

Graphs are already magical and Neo4j puts that into your hands, which is why folks often declare their “love” for Neo4j or Cypher.

While developers have been excited about using graphs for a long time, making the case within an organization can be a bit more challenging. While our customers regularly demonstrate in presentations and articles how they could achieve new capabilities, speed up their development or just save money, it’s often not enough to convince your boss.

That’s why the Forrester Wave for Graph Data Platforms is a really important publication. It is the first major analyst report that covers what we have been working on for more than 10 years. For one, having a leading analyst firm take the time and effort to evaluate this segment of data platforms confirms the maturity of the graph space. Also, having an independent third party compare and evaluate the different offerings objectively lends credibility and trust to the results.

Many organizations try to reduce the risk in adopting new technologies so they either look to their peers or independent sources to confirm their choices. With the Forrester Wave, you can provide that information to your decision-makers to support the graph projects you are convinced are the right choice for your organization.

James: We’re excited that Neo4j outperformed 12 other vendors in the space. Can you share your perspectives on why that leadership is warranted?

Michael: There are several interesting areas that speak to the strength of our engineering team and the elegance of the core product architecture. Take performance, which is critical, especially for transaction-based use cases: our scalable core architecture and Cypher engine enable users to achieve the results they expect in their production environments, while enabling us to constantly improve many aspects of the platform.

Our scalability is top-rated in the report, both for transactional and data science workloads. From scaled-up single instances to clustered or sharded environments, you can choose the scale aspects that are the best fit for your needs.

That said, graph databases don’t need to be gigantic to deliver a lot of value – even small graphs of a few thousand nodes can be worth millions of dollars. But Neo4j is also used by some of the largest companies in the world to run real-time, customer-facing production applications, which fortunately take far fewer compute resources than comparable database solutions. Instead of running clusters of 100 or 60 or 12 machines, often a three-instance cluster of Neo4j is enough to serve the workload. At the same time, Neo4j is proven to scale out to many billions of elements in the graph, both for transactional and analytics workloads.

Since day one, Neo4j has been transactional and has never given up on that guarantee. The fine-grained graph model requires transactionality to safely handle complex network updates without compromising data consistency.
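To see why multi-step graph updates need all-or-nothing behavior, consider the following minimal Python sketch. It is only an illustration, under the assumption of a toy in-memory graph held in a dictionary; the `transaction` and `move_employee` helpers are hypothetical and say nothing about how Neo4j implements transactions internally.

```python
import copy
from contextlib import contextmanager


@contextmanager
def transaction(graph):
    """All-or-nothing update: restore a snapshot of the graph if the body raises."""
    snapshot = copy.deepcopy(graph)
    try:
        yield graph
    except Exception:
        graph.clear()
        graph.update(snapshot)
        raise


def move_employee(graph, employee, old_team, new_team):
    # Two dependent steps: drop the old relationship, then create the new one.
    graph["rels"].remove((employee, "WORKS_IN", old_team))
    if new_team not in graph["nodes"]:
        raise KeyError(f"unknown team: {new_team}")
    graph["rels"].append((employee, "WORKS_IN", new_team))


graph = {
    "nodes": {"alice", "core", "cloud"},
    "rels": [("alice", "WORKS_IN", "core")],
}

# Succeeds: both steps are applied.
with transaction(graph):
    move_employee(graph, "alice", "core", "cloud")

# Fails halfway: the removal is undone, so alice keeps her current team.
try:
    with transaction(graph):
        move_employee(graph, "alice", "cloud", "missing-team")
except KeyError:
    pass

print(graph["rels"])
```

Without the rollback, the failed second update would leave Alice with no `WORKS_IN` relationship at all, which is the kind of half-applied network change that transactional guarantees are meant to rule out.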

James: Another thing that struck me – as a relative newcomer to Neo4j – is the fact that leadership is about so much more than the actual technology.

Michael: Yes, that’s right. Our active, supportive community is often cited as one of the main reasons for choosing Neo4j. I’ve been involved in growing and supporting our community for the last 10 years and am proud to say it’s the best community I’ve ever been part of – extremely helpful, friendly and knowledgeable, and a place for both newcomers and experts alike.

I think our community is one of the main reasons why if you talk graph databases, you talk Neo4j. We created and grew the graph database category over the last 10 years and invested a lot of effort into educating developers, data scientists and other users. When searching for any topic related to graph databases or graph data science, our technology and resources are top of the list. This is also reflected in our leading position in the graph database category on the popular site, which takes 12 metrics into account for its scoring. But that’s a conversation for a future discussion!

Another thing I’ve observed is that we don’t compromise on quality. For instance, the Neo4j customer support team exceeds expectations every time. We have a very high satisfaction rating combined with a quick turnaround time on tickets. If need be, the team brings in core engineering for fast and thorough resolution of issues.

The last key contributor to our leadership is that we have a truly global perspective. Neo4j grew from its Swedish roots and is now headquartered in the Bay Area, with engineering in London (UK) and Malmö (Sweden) and everyone else distributed across the globe, including APAC. This means that we serve our customers both locally and globally, depending on their needs. Just as our use cases are versatile, our users work and operate all over the world.

James: What do you think that says about Neo4j and its approach to innovation?

Michael: Like every mature company, we need to balance customer commitment with pure innovation. In our case, we carefully pick the areas where we innovate a lot – like graph data science, Cypher runtime, cloud operations and developer tools – versus the parts where we are quite conservative – data safety and security, transactions, operations and core database features.

We can’t and don’t want to risk the trust of our users and customers. At the same time, there are a lot of exciting developments in many areas that we are participating in, for instance modern application development with GraphQL and integrations with Apache Spark or Kafka. We also try to push the envelope by providing the world with the best open graph query language, GQL, which was recently approved by the ISO committee as the first new query language effort since the inception of SQL.

As part of our engagement with users, we gather firsthand qualitative and quantitative feedback from discussions, training classes, customer engagements and community forums. This feeds into our product roadmap prioritization. With a renewed focus on developer experience (DX), especially in our cloud offering Neo4j AuraDB, we want to make the new user experience as friction-free as possible.

James: What does Neo4j make of behemoths like Amazon and Microsoft entering the graph market space?

Michael: Usually you would be concerned about such a development, given the deep pockets and engineering prowess of the giants. But in this case, we welcome their entry for two reasons: they validate our market segment, and they grow the visibility of graph technology – a “rising tide lifts all boats” situation. And given their size, graphs are not a focus for them; they are just one of the many topics they juggle.

For us, graph technology is at the heart of what we do, every hour of every day. So we can excel at it and provide the best service to our users and customers while the big vendors only provide a “checkbox” implementation.

James: Given that Neo4j earned the highest possible score on its roadmap, would you give some insight into what new features developers might see from Neo4j in the not-too-distant future?

Michael: As Emil also alluded to in his NODES keynote, we put a lot of focus on making Neo4j available on all cloud platforms for both transactional and data science uses, supporting needs from individual developers to large enterprises. Democratizing graph data science is another one of our big goals.

And last but not least, we will continue to make it easier to build applications, import data and integrate with cloud-native services. All these topics serve our ultimate goal – helping people make sense of data.

To learn more about new and emerging database technologies that allow enterprises to solve complex problems and create meaningful insights quickly, get your free copy of the report The Forrester Wave™: Graph Data Platforms, Q4 2020.
