Neo4j Research Center
Welcome to Neo4j Research, where we make science into technology.
At Neo4j, we perform systems research on all parts of the graph data stack. We are currently working on a diverse set of projects that target temporal graph use-cases, leaderless transaction processing methods, and novel query runtimes based on dynamic programming languages.
Our aim is to understand how to build graph processing systems for modern cloud environments that are more capable than the current state of the art and that depart from the classic (relational) approach.
Current graph database runtimes are built using the same techniques and principles as relational databases, which can inhibit their performance and functionality. The fundamental issue is that graph runtimes have to handle a lot of irregularity, stemming both from schema optionality and, from a machine point of view, from irregular workloads and topology.
To solve these problems we are building a next-generation query runtime that is inspired by dynamic programming language technology. It allows us to optimize schema-less graphs through dynamic code optimization and to scale processing by adopting new compute paradigms such as disaggregated compute or accelerated computing with specialized hardware.
Modern graph database management systems (DBMSs) allow users to model real-world interactions as a set of nodes and relationships at billions-to-trillions scale. However, existing systems ignore the temporal dimension of data: how a graph evolves over time. Lacking native temporal support, ad-hoc strategies are used instead, and each performs well only for workloads of a particular effective size, such as local pattern matching or global graph algorithms.
To tackle this problem, we designed Aion, a transactional temporal graph DBMS that generalizes previous approaches for labeled property graphs (LPGs). Aion is built directly atop Neo4j and adopts a hybrid temporal storage approach. For point lookups and small subgraph queries, it uses LineageStore, which indexes graph updates by entity identifier. For queries that require full graph reconstruction at arbitrary time points, it uses TimeStore, which indexes updates by time.
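The split between the two indexes can be illustrated with a small in-memory sketch. This is not Aion's implementation: the class `HybridTemporalStore`, its method names, and the choice of sorted Python lists are all assumptions made for illustration. The sketch keeps one log of updates per entity (the LineageStore idea) and one global log ordered by time (the TimeStore idea).

```python
from bisect import bisect_right


class HybridTemporalStore:
    """Toy model of a hybrid temporal store (hypothetical, not Aion's API)."""

    def __init__(self):
        # Lineage-style index: entity id -> (timestamps, values), both in time order.
        self._lineage = {}
        # Time-style index: one global log of (entity id, value) in time order,
        # with a parallel list of timestamps for binary search.
        self._log_times = []
        self._log = []

    def append(self, ts, entity_id, value):
        """Record an update; value=None marks a deletion.

        Assumption: updates arrive in non-decreasing timestamp order.
        """
        if self._log_times and ts < self._log_times[-1]:
            raise ValueError("updates must be appended in timestamp order")
        times, values = self._lineage.setdefault(entity_id, ([], []))
        times.append(ts)
        values.append(value)
        self._log_times.append(ts)
        self._log.append((entity_id, value))

    def entity_at(self, entity_id, ts):
        """Point lookup: latest value of one entity at time ts (entity-indexed)."""
        times, values = self._lineage.get(entity_id, ([], []))
        i = bisect_right(times, ts)
        return values[i - 1] if i else None

    def snapshot_at(self, ts):
        """Full reconstruction: replay the time-ordered log up to and including ts."""
        end = bisect_right(self._log_times, ts)
        graph = {}
        for entity_id, value in self._log[:end]:
            if value is None:
                graph.pop(entity_id, None)
            else:
                graph[entity_id] = value
        return graph
```

In this sketch a point lookup touches only the history of one entity, whereas a snapshot replays every update up to the requested time. That difference in cost is the motivation for keeping both indexes.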
To enable incremental graph computations for improved latency, Aion introduces a compute-efficient in-memory LPG representation. Our experiments so far show that Aion achieves up to 7x higher throughput than existing non-transactional temporal systems and provides up to an order of magnitude speedup over Neo4j with minimal storage overhead.
Transaction protocols have historically been decoupled from the data models they support. Consequently, graph databases offer one of two suboptimal choices: protocols that are too strict, which sacrifice performance to maintain correctness, or protocols that are too loose, which offer better performance but corrupt data in normal operation.
In the long term we need better options. We are investigating one such approach called “Conjunction of Majorities”, in which transaction messages carry metadata about their predecessors. Participants compare this metadata against their local state to determine compatibility. For single-shard transactions, if a majority of participants find that a transaction is compatible, it can proceed through a conventional two-phase consensus protocol. For multi-shard transactions, each shard must reach such a majority, hence “conjunction of majorities” in the general case.
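The decision rule can be sketched in a few lines. The compatibility test used here is an assumption made for illustration: a replica accepts a transaction if, for every key the transaction touches, the predecessor named in the message matches the last transaction the replica has applied to that key. The names `Replica`, `shard_has_majority`, and `conjunction_of_majorities` are hypothetical, and the sketch omits the consensus step that follows a successful vote.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Local state: key -> id of the last transaction applied to that key.
    last_writer: dict = field(default_factory=dict)

    def is_compatible(self, predecessors):
        # Assumed rule: every predecessor claimed by the transaction must
        # match what this replica has seen locally for that key.
        return all(self.last_writer.get(key) == tx_id
                   for key, tx_id in predecessors.items())


def shard_has_majority(replicas, predecessors):
    """True if a strict majority of a shard's replicas find the transaction compatible."""
    votes = sum(1 for r in replicas if r.is_compatible(predecessors))
    return votes > len(replicas) // 2


def conjunction_of_majorities(shards, tx_predecessors):
    """True only if every shard touched by the transaction has a majority.

    shards: shard name -> list of Replica
    tx_predecessors: shard name -> {key: predecessor transaction id}
    """
    return all(shard_has_majority(shards[name], preds)
               for name, preds in tx_predecessors.items())
```

A single-shard transaction is the special case where `tx_predecessors` has one entry, so the conjunction reduces to one majority check.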
We have undertaken a theoretical investigation to establish the limits of the approach with respect to correctness (specifically reciprocal consistency for graphs) and global constraints. We are also building a prototype system to evaluate the performance of the approach in real-world conditions.
Neo4j has a strong publication history, and often collaborates with universities and other industrial researchers.
Neo4j is built upon a solid research foundation. Research is a collaborative endeavor, and we work alongside colleagues in academia to push forward the boundaries of graph data. We offer research funding across a range of activities, from master's level through to project and program funding.
Prospective master's students in computing science or allied disciplines are invited to contact us about thesis-level project opportunities. Students will be supported by our R&D team to build and evaluate real-world database implementations for their thesis.
Building on our successful track record of collaboration with leading research-intensive universities, Neo4j is able to offer a limited number of bursaries for Ph.D. studentships to investigate areas of research interest in graph databases and related fields. Available bursaries are announced through partner universities.
Neo4j researchers collaborate with leading research institutions on the most challenging graph database research problems. Funding is made available to partner universities for post-doctoral staff to work on medium-term systems research in graph databases.
Examples of Neo4j’s current and past research collaborations can be found below.