Neo4j Research

Welcome to Neo4j Research, where we turn science into technology.

At Neo4j, we build a graph intelligence platform that delights our users and always strive to surpass customer expectations. Our products are built on a rich, collaborative history of computer science research and solid engineering.

Neo4j Research is where we explore future possibilities, both to enhance current products and to discover new opportunities. As multi-disciplinary computer scientists, we work with product teams at Neo4j as well as leading universities around the world with the common goal of accelerating graph technology.

Current projects

At Neo4j, we perform systems research on all parts of the graph data stack. We are currently working on a diverse set of projects that target temporal graph use cases, leaderless transaction processing methods, and novel query runtimes based on dynamic programming language technology.

Our aim is to understand how to build graph processing systems for modern cloud environments that are more capable than the current state of the art and that depart from the classic (relational) approach.

Current graph database runtimes are built using the same techniques and principles as relational databases, which can limit their performance and functionality. The fundamental issue is that graph runtimes must handle a great deal of irregularity: schemas are optional, and, from the machine's point of view, both workloads and graph topology are irregular.

To solve these problems, we are building a next-generation query runtime inspired by the implementation technology of dynamic programming languages. It allows us to optimize the processing of schema-less graphs through dynamic code optimization, and to scale processing by adopting new compute paradigms such as disaggregated compute or accelerated computing with specialized hardware.
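Speculative specialization is one technique that dynamic language runtimes use. The runtime observes the "shape" of the data actually reaching an operator, takes a fast path for that shape, and falls back to a generic path when a cheap guard fails. The Python sketch below illustrates the idea for property reads on schema-less nodes. The `Shape`, `Node`, and `PropertyReadNode` names and the slot-based node layout are hypothetical, chosen for illustration. They are not Aurendil's design.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Shape:
    """Hypothetical "hidden class": the sorted property keys a node carries."""
    keys: tuple[str, ...]

    def slot_of(self, key: str) -> int | None:
        return self.keys.index(key) if key in self.keys else None


_SHAPES: dict[tuple[str, ...], Shape] = {}  # intern shapes so guards can compare by identity


@dataclass
class Node:
    shape: Shape
    values: list[Any]  # property values stored positionally, in shape order


def make_node(props: dict[str, Any]) -> Node:
    keys = tuple(sorted(props))
    shape = _SHAPES.setdefault(keys, Shape(keys))
    return Node(shape, [props[k] for k in keys])


@dataclass
class PropertyReadNode:
    """Reads one property, specializing on the last shape seen (a monomorphic inline cache)."""
    key: str
    cached_shape: Shape | None = None
    cached_slot: int | None = None
    fast_hits: int = 0
    slow_hits: int = 0

    def read(self, node: Node) -> Any:
        if node.shape is self.cached_shape:
            # Fast path: the guard held, so index the cached slot without a key lookup.
            self.fast_hits += 1
        else:
            # Guard failed (or first call): do the generic lookup, then re-specialize.
            self.slow_hits += 1
            self.cached_shape = node.shape
            self.cached_slot = node.shape.slot_of(self.key)
        return None if self.cached_slot is None else node.values[self.cached_slot]


people = [make_node({"name": "Ann", "age": 31}),
          make_node({"name": "Bob", "age": 40}),
          make_node({"name": "Cy"})]  # a node without "age" has a different shape
reader = PropertyReadNode("age")
ages = [reader.read(p) for p in people]
```

Nodes that share a shape take the fast path. A node with a new shape costs one slow lookup and then becomes the new specialization. A production runtime would compile the fast path to machine code and might keep several shapes (a polymorphic cache). The sketch only shows the guard-and-fallback structure.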

Principal Investigator: Dr. James Clarkson

The Aurendil project is part-funded by UKRI.

Graph databases have had to make hard choices about transactions. They can choose a strict protocol, which sacrifices performance, or a loose one, which risks corrupting data even under normal (fault-free) operation.

Our work on RIOT (Replicated Independently-Ordered Transactions) is intended to provide both scalability and ACID guarantees (with strong isolation) for graph and other kinds of databases. In RIOT every server is a leader, so instead of a log of fully serialized operations there is a directed acyclic graph (DAG) of transactions that captures concurrency and dependencies. RIOT guarantees that all replicas maintain a logically identical DAG, preserving order where conflicts require it while allowing commutative operations to execute concurrently.
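The sketch below makes the idea of a transaction DAG concrete. It adds an edge between two transactions only when they conflict, and leaves non-conflicting transactions concurrent. It then checks that two replicas applying different topological orders of the same DAG end in the same state. The conflict rule (key-set overlap), the toy `Txn` effect, and the helper names are simplifying assumptions for illustration. The sketch does not model how RIOT agrees on the DAG across servers or tolerates failures.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Txn:
    tid: str
    reads: frozenset
    writes: frozenset
    delta: int

    def apply(self, state: dict[str, int]) -> None:
        # Toy effect: every written key becomes (sum of the read keys) + delta.
        total = sum(state.get(k, 0) for k in self.reads) + self.delta
        for k in self.writes:
            state[k] = total


def conflicts(a: Txn, b: Txn) -> bool:
    """Write-write or read-write overlap means the pair must be ordered."""
    return bool(a.writes & b.writes or a.writes & b.reads or a.reads & b.writes)


def build_dag(txns: list[Txn]) -> dict[str, set[str]]:
    """Map each txn id to the earlier txns it must follow; non-conflicting txns get no edge."""
    return {t.tid: {u.tid for u in txns[:i] if conflicts(u, t)} for i, t in enumerate(txns)}


def topo_order(deps: dict[str, set[str]], reverse: bool = False) -> list[str]:
    """One valid linearization; `reverse` picks a different order among ready txns."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    order: list[str] = []
    while ts.is_active():
        # Ready txns never conflict with each other: every conflicting pair has an edge.
        for tid in sorted(ts.get_ready(), reverse=reverse):
            order.append(tid)
            ts.done(tid)
    return order


def replay(txns: list[Txn], order: list[str]) -> dict[str, int]:
    by_id = {t.tid: t for t in txns}
    state: dict[str, int] = {}
    for tid in order:
        by_id[tid].apply(state)
    return state


txns = [
    Txn("t1", frozenset(), frozenset({"a"}), 1),
    Txn("t2", frozenset(), frozenset({"b"}), 2),
    Txn("t3", frozenset({"a"}), frozenset({"c"}), 10),  # reads what t1 writes
    Txn("t4", frozenset(), frozenset({"d"}), 5),
]
dag = build_dag(txns)
replica_1 = replay(txns, topo_order(dag))
replica_2 = replay(txns, topo_order(dag, reverse=True))
```

Both replicas reach the same state even though they apply t1, t2, and t4 in different orders. Running t3 before t1, which would ignore the DAG edge, produces a different value for `c`.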

RIOT ensures consistency by attaching leading edges (metadata representing a server’s recent DAG history) to protocol messages. Recipients validate this metadata against their own history to detect divergence before proceeding. This pervasive exchange allows participants to issue “qualified votes,” which condition their approval of an operation on the coordinator’s awareness of their specific history.
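The sketch below illustrates the shape of this exchange. It models a leading edge as the set of transaction ids at the tips of a server's DAG. A participant compares the coordinator's leading edge with its own DAG and then votes, attaching its own tips as the condition under which the vote counts. The tip-set representation, the reject-on-divergence policy, and all function names are hypothetical simplifications. RIOT's actual metadata and validation rules are defined in the SIGMOD 2026 paper.

```python
from dataclasses import dataclass


def leading_edge(dag: dict[str, set[str]]) -> frozenset[str]:
    """Tips of a DAG given as txn -> predecessors: txns nothing else depends on yet."""
    has_successor = set().union(*dag.values())
    return frozenset(set(dag) - has_successor)


@dataclass(frozen=True)
class Vote:
    approve: bool
    condition: frozenset[str]  # voter's leading edge that the coordinator must already know


def divergence(sender_edge: frozenset[str], local_dag: dict[str, set[str]]) -> frozenset[str]:
    """Txns on the sender's leading edge that this server has never seen."""
    return frozenset(sender_edge - set(local_dag))


def cast_vote(coordinator_edge: frozenset[str], local_dag: dict[str, set[str]]) -> Vote:
    # Hypothetical policy: reject if the coordinator knows history we lack;
    # otherwise approve, qualified by our own leading edge.
    approve = not divergence(coordinator_edge, local_dag)
    return Vote(approve, leading_edge(local_dag))


def vote_counts(vote: Vote, coordinator_known: set[str]) -> bool:
    """A qualified vote counts only if the coordinator already knows the voter's tips
    (assuming that knowing a tip implies knowing all of its ancestors)."""
    return vote.approve and vote.condition <= coordinator_known


coordinator_dag = {"t1": set(), "t2": {"t1"}}
participant_dag = {"t1": set(), "t2": {"t1"}, "t3": {"t1"}}  # t3 unseen by the coordinator
lagging_dag = {"t1": set()}  # has not yet seen t2

coord_edge = leading_edge(coordinator_dag)
vote = cast_vote(coord_edge, participant_dag)
lagging_vote = cast_vote(coord_edge, lagging_dag)
```

In this example the participant approves, but its vote only counts once the coordinator has learned about t3. The lagging server detects divergence (it has never seen t2) and declines to approve.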

We have demonstrated that RIOT upholds reciprocal consistency for graphs, even across shards. It ensures atomic agreement on entries and their ordering constraints, even in the presence of failures. We have built a prototype system to evaluate the performance of the approach under real-world conditions. Using this prototype as a testbed, we are now researching optimized recovery, lower network utilization, and other topics that make RIOT a practical foundation for many systems.
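Reciprocal consistency, informally, concerns relationships whose endpoints live on different shards. Each endpoint's shard stores its own half of the relationship, and the two halves must agree. The checker below is a minimal sketch of that invariant. It assumes a hypothetical toy layout in which each shard maps a node id to a set of (relationship id, other endpoint, direction) entries. This layout is not RIOT's storage format.

```python
from collections import defaultdict

# Hypothetical layout: shard -> node -> {(rel_id, other_node, "out" | "in")}
Shards = dict[str, dict[str, set[tuple[str, str, str]]]]


def half_edges(shards: Shards):
    """Flatten every stored half-edge into (rel_id, owner_node, other_node, direction)."""
    for nodes in shards.values():
        for owner, entries in nodes.items():
            for rel_id, other, direction in entries:
                yield rel_id, owner, other, direction


def reciprocal_violations(shards: Shards) -> list[str]:
    """Return ids of relationships whose two halves are missing or disagree."""
    halves = defaultdict(set)
    for rel_id, owner, other, direction in half_edges(shards):
        # Normalize each half to (start, end) so both copies of a healthy relationship match.
        start, end = (owner, other) if direction == "out" else (other, owner)
        halves[rel_id].add((start, end, direction))
    bad = []
    for rel_id, copies in halves.items():
        ends = {(s, e) for s, e, _ in copies}
        dirs = {d for _, _, d in copies}
        if len(copies) != 2 or len(ends) != 1 or dirs != {"out", "in"}:
            bad.append(rel_id)
    return sorted(bad)


shards: Shards = {
    "shard_a": {"alice": {("r1", "bob", "out")}},
    "shard_b": {"bob": {("r1", "alice", "in"), ("r2", "carol", "in")}},
    "shard_c": {"carol": set()},  # the outgoing half of r2 is missing
}
violations_before = reciprocal_violations(shards)
shards["shard_c"]["carol"].add(("r2", "bob", "out"))  # repair the missing half
violations_after = reciprocal_violations(shards)
```

A protocol that updates only one half of a cross-shard relationship, or that lets the two halves diverge, would show up here as a violation.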

Principal Investigator: Dr. Jim Webber

RIOT is joint research with UC Berkeley.

Previous projects

Neo4j Research has a successful recent history of projects, which have produced useful systems and significant results. You can read more about our

Modern graph database management systems (DBMSs) allow users to model real-world interactions as a set of nodes and relationships at a scale of billions to trillions. However, existing systems ignore the temporal dimension of data: how a graph evolved over time. Lacking native temporal support, users implement ad-hoc strategies that perform well only for particular sizes of effective graph workload, such as local pattern matching or global graph algorithms.

To tackle this problem, we designed Aion, a transactional temporal graph DBMS that generalizes previous approaches for labeled property graphs (LPGs). Aion is built directly atop Neo4j and adopts a hybrid temporal storage approach. For point lookups and small subgraph queries, it uses LineageStore, which indexes graph updates by entity identifier. For queries that require full graph reconstruction at arbitrary time points, it uses TimeStore, which indexes updates by time.
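The split between the two stores can be illustrated with a toy append-only update log. In the sketch below, a per-entity index answers "what was property p of entity e at time t?" with a binary search over that entity's history. A time-ordered index rebuilds the whole graph state at time t by replaying every update up to t. The class names mirror the paragraph, but the in-memory layout and the `value_at` and `snapshot_at` methods are illustrative assumptions, not Aion's actual structures or API.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Update:
    ts: int      # commit timestamp
    entity: str  # node or relationship id
    key: str     # property name
    value: Any


class LineageStore:
    """Per-entity history sorted by time: cheap point lookups."""

    def __init__(self) -> None:
        self._ts: dict[str, list[int]] = defaultdict(list)
        self._ups: dict[str, list[Update]] = defaultdict(list)

    def append(self, u: Update) -> None:
        # Assumes updates arrive in timestamp order, as they would from a commit log.
        self._ts[u.entity].append(u.ts)
        self._ups[u.entity].append(u)

    def value_at(self, entity: str, key: str, t: int) -> Any:
        # Find this entity's updates at or before t, then take the latest one for `key`.
        i = bisect.bisect_right(self._ts.get(entity, []), t)
        for u in reversed(self._ups.get(entity, [])[:i]):
            if u.key == key:
                return u.value
        return None


class TimeStore:
    """Global log sorted by time: efficient full-graph reconstruction."""

    def __init__(self) -> None:
        self._ts: list[int] = []
        self._ups: list[Update] = []

    def append(self, u: Update) -> None:
        self._ts.append(u.ts)
        self._ups.append(u)

    def snapshot_at(self, t: int) -> dict[str, dict[str, Any]]:
        graph: dict[str, dict[str, Any]] = defaultdict(dict)
        for u in self._ups[: bisect.bisect_right(self._ts, t)]:
            graph[u.entity][u.key] = u.value
        return dict(graph)


log = [Update(1, "n1", "name", "Ann"), Update(2, "n2", "name", "Bob"),
       Update(5, "n1", "city", "Leeds"), Update(7, "n1", "city", "York")]
lineage, timeline = LineageStore(), TimeStore()
for u in log:
    lineage.append(u)
    timeline.append(u)
```

A point lookup touches only one entity's history. A snapshot scans a prefix of the global log. That difference is why the two access patterns favor different indexes.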

To enable incremental graph computations with lower latency, Aion introduces a compute-efficient in-memory LPG representation. Our experiments so far show that Aion achieves up to 7x higher throughput than existing non-transactional temporal systems and provides up to an order-of-magnitude speedup over Neo4j with minimal storage overhead.
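Incremental computation means updating an analytic result from the changes between two points in time, rather than recomputing it from a full snapshot. The code below is a minimal sketch of the general technique. It assumes the analytic is per-node out-degree and that changes arrive as sets of added and removed relationships. It maintains degrees incrementally and checks them against a full recomputation. The class and function names are hypothetical.

```python
from collections import Counter

Edge = tuple[str, str]  # (start node, end node)


def full_out_degree(edges: set[Edge]) -> Counter:
    """Baseline: recompute every node's out-degree from the whole edge set."""
    return Counter(src for src, _ in edges)


class IncrementalOutDegree:
    """Maintains out-degrees by applying only the deltas between two snapshots."""

    def __init__(self, edges: set[Edge]) -> None:
        self.edges = set(edges)
        self.degree = full_out_degree(self.edges)

    def apply(self, added: set[Edge], removed: set[Edge]) -> None:
        # Work is proportional to the size of the delta, not the size of the graph.
        for src, dst in removed & self.edges:
            self.edges.remove((src, dst))
            self.degree[src] -= 1
            if self.degree[src] == 0:
                del self.degree[src]  # keep the result identical to a fresh recomputation
        for src, dst in added - self.edges:
            self.edges.add((src, dst))
            self.degree[src] += 1


t1_edges = {("a", "b"), ("a", "c"), ("b", "c")}
inc = IncrementalOutDegree(t1_edges)
inc.apply(added={("c", "a"), ("a", "d")}, removed={("a", "b")})
t2_edges = (t1_edges - {("a", "b")}) | {("c", "a"), ("a", "d")}
```

Moving from one point in time to a nearby one then costs work on the order of the number of changed relationships.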

Publications

Neo4j has a strong publication history, and often collaborates with universities and other industrial researchers.

2026

RIOT: Replicated Independently-Ordered Transactions
SIGMOD 2026
Jim Webber, Georgios Theodorakis, Hugo Firth, and Natacha Crooks

Incremental Multilingual Text2Cypher with Adapter Combination
GRADES-NDA 2026
Makbule Gulcin Ozsoy

Improving Text2Cypher with Confidence-Based Test-Time Strategies
KG-LLM 2025
Rima Dessi and Makbule Gulcin Ozsoy

2025

Performance Evaluation of a Multi-Folder Ring Protocol for Total Ordering of Messages
MASCOTS 2025
Paul Ezhilchelvan, Isi Mitrani, and Jim Webber

TuskFlow: An Efficient Graph Database for Long-Running Transactions
VLDB 2025
George Theodorakis, Hugo Firth, James Clarkson, Natacha Crooks, and Jim Webber

Throughput-Driven Database Replication Using a Ring-Based Order Protocol
IDEAS 2025
Ye Lie, Paul Ezhilchelvan, Yingming Wang, and Jim Webber

Text2Cypher Across Languages: Evaluating and Finetuning LLMs
NLPIR 2025
Makbule Gulcin Ozsoy and William Tai

Enhancing Text2Cypher with Schema Filtering
LLM-TEXT2KG 2025
Makbule Gulcin Ozsoy

Text2Cypher: Data Pruning using Hard Example Selection
LLM-DPM 2025
Makbule Gulcin Ozsoy

Text2Cypher: Bridging Natural Language and Graph Databases
GenAIK 2025
Makbule Gulcin Ozsoy, Leila Messallem, Jon Besga, and Gianandrea Minneci

2024

Hardware-Efficient Data Imputation through DBMS Extensibility
VLDB 2024
Hubert Mohr-Daurat, George Theodorakis, and Holger Pirk

An Empirical Evaluation of Variable-length Record B+Trees on a Modern Graph Database System
ICDEW 2024 (SEAGraph)
George Theodorakis, James Clarkson, and Jim Webber

Seraph: Continuous Queries on Property Graph Streams
EDBT 2024
Riccardo Tommasini, Christopher Rost, Angela Bonifati, Emanuele Della Valle, Erhard Rahm, Keith W. Hare, Stefan Plantikow, Petra Selmer, and Hannes Voigt

Implementations Based Evaluation of No-Wait Approach for Resolving Conflicts in Databases
EPEW 2024
Yingming Wang, Paul Ezhilchelvan, Jack Waudby, and Jim Webber

Aion: Efficient Temporal Graph Data Management
EDBT 2024
George Theodorakis, James Clarkson, and Jim Webber

BIFROST: A Future Graph Database Runtime
ICDE 2024
George Theodorakis, James Clarkson, and Jim Webber

Property Graph Stream Processing in Action with Seraph
SIGMOD 2024
Riccardo Tommasini, Christopher Rost, Angela Bonifati, Emanuele Della Valle, Erhard Rahm, Keith W. Hare, Stefan Plantikow, Petra Selmer, and Hannes Voigt

A Roadmap to Graph Analytics
SIGMOD 2024
Angela Bonifati, M. Tamer Ozsu, Yuanyuan Tian, Hannes Voigt, Wenyuan Yu, and Enjie Zhang

PG-Schema: Schemas for Property Graphs
Proceedings of the ACM on Management of Data
Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Alastair Green, Jan Hidders, Bei Li, Leonid Libkin, Victor Marsault, Wim Martens, Filip Murlak, Stefan Plantikow, Ognjen Savkovic, Michael Schmidt, Juan Sequeda, Slawek Staworko, Dominik Tomaszuk, Hannes Voigt, Domagoj Vrgoc, Mingxi Wu, and Dusan Zivkovic

Special issue on big graph data management and processing (editorial)
VLDB Journal
Angela Bonifati and Hannes Voigt (eds.)

2023

Evaluating the Performance of No-Wait Approach to Resolving Write Conflicts in Databases
EPEW 2023
Paul Ezhilchelvan, Isi Mitrani, Jim Webber, and Yingming Wang

Mammoths are Slow: The Overlooked Transactions of Graph Data
VLDB 2023
Audrey Cheng, Jack Waudby, Hugo Firth, Natacha Crooks, and Ion Stoica

Analysis of an epoch commit protocol for distributed processing systems
QEST 2023
Paul Ezhilchelvan, Isi Mitrani, and Jim Webber

Engineering Fast Algorithms for the Bottleneck Matching Problem
ESA 2023
Ioannis Panagiotas, Grégoire Pichon, Somesh Singh, and Bora Uçar

2022

A Performance Study of Epoch-based Commit Protocols in Distributed OLTP Databases
SRDS 2022
Jack Waudby, Paul Ezhilchelvan, Isi Mitrani, and Jim Webber

Pick & Mix Isolation Levels: Mixed Serialization Graph Testing
TPCTC 2022
Jack Waudby, Paul Ezhilchelvan, and Jim Webber

Graph Pattern Matching in GQL and SQL/PGQ
SIGMOD 2022
Alin Deutsch, Nadime Francis, Alastair Green, Keith Hare, Bei Li, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Wim Martens, Jan Michels, Filip Murlak, Stefan Plantikow, Petra Selmer, Oskar van Rest, Hannes Voigt, Domagoj Vrgoč, Mingxi Wu, and Fred Zemke

2021

A GraphBLAS implementation in Pure Java
GRADES-NDA 2021
Florentin Dörre, Alexander Krause, Dirk Habich, and Martin Junghanns

PG-Keys: Keys for Property Graphs
SIGMOD 2021
Renzo Angles, Angela Bonifati, Stefania Dumbrava, George Fletcher, Keith W. Hare, Jan Hidders, Victor E. Lee, Bei Li, Leonid Libkin, Wim Martens, Filip Murlak, Josh Perryman, Ognjen Savković, Michael Schmidt, Juan Sequeda, Slawek Staworko, and Dominik Tomaszuk

The Future is Big Graphs! A Community View on Graph Processing Systems
Communications of the ACM (Vol. 64, No. 9)
Sherif Sakr, Angela Bonifati, Hannes Voigt, Alexandru Iosup, Khaled Ammar, Renzo Angles, Walid Aref, Marcelo Arenas, Maciej Besta, Peter A. Boncz, Khuzaima Daudjee, Emanuele Della Valle, Stefania Dumbrava, Olaf Hartig, Bernhard Haslhofer, Tim Hegeman, Jan Hidders, Katja Hose, Adriana Iamnitchi, Vasiliki Kalavri, Hugo Kapp, Wim Martens, M. Tamer Özsu, Eric Peukert, Stefan Plantikow, Mohamed Ragab, Matei R. Ripeanu, Semih Salihoglu, Christian Schulz, Petra Selmer, Juan F. Sequeda, and Joshua Shinavier

2020

Modeling the Gradual Degradation of Eventually-Consistent Distributed Graph Databases
Queueing Models and Service Management
Paul Ezhilchelvan, Isi Mitrani, and Jim Webber

2019

Big Graph Processing Systems
Dagstuhl Seminar 19491
Angela Bonifati, Alexandru Iosup, Sherif Sakr, and Hannes Voigt

Efficient Query Processing for Dynamically Changing Datasets
SIGMOD Record
Muhammad Idris, Martín Ugarte, Stijn Vansummeren, Hannes Voigt, and Wolfgang Lehner

Schema Validation and Evolution for Graph Databases
Conceptual Modeling. ER 2019
Angela Bonifati, Peter Furniss, Alastair Green, Russ Harmer, Eugenia Oshurko, and Hannes Voigt

Period Index: A Learned 2D Hash Index for Range and Duration Queries
SSTD 2019
Andreas Behrend, Anton Dignös, Johann Gamper, Philip Schmiegelt, Hannes Voigt, Matthias Rottmann, and Karsten Kahl

Understanding Trolls with Efficient Analytics of Large Graphs in Neo4j
BTW 2019
David Allen, Amy Hodler, Michael Hunger, Martin Knobloch, William Lyon, Mark Needham, and Hannes Voigt

Updating Graph Databases with Cypher
VLDB 2019
Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Martin Schuster, Petra Selmer, and Hannes Voigt

Approximate Querying for the Property Graph Language Cypher
Big Data 2019
George Fletcher, Alexandra Poulovassilis, Petra Selmer, and Peter T. Wood

2018

Cypher: An Evolving Query Language for Property Graphs
SIGMOD 2018
Nadime Francis, Alastair Green, Paolo Guagliardo, Leonid Libkin, Tobias Lindaaker, Victor Marsault, Stefan Plantikow, Mats Rydberg, Petra Selmer, and Andrés Taylor

Declarative and distributed graph analytics with GRADOOP
VLDB 2018
Martin Junghanns, Max Kiessling, Niklas Teichmann, Kevin Gómez, André Petermann, and Erhard Rahm

openCypher: New Directions in Property Graph Querying
EDBT 2018
Alastair Green, Martin Junghanns, Max Kiessling, Tobias Lindaaker, Stefan Plantikow, and Petra Selmer

An early look at the LDBC social network benchmark’s business intelligence workload
GRADES-NDA 2018
Gábor Szárnyas, Arnau Prat-Pérez, Alex Averbuch, József Marton, Marcus Paradies, Moritz Kaufmann, Orri Erling, Peter Boncz, Vlad Haprian, and János Benjamin Antal

2017

ACTiCLOUD: Enabling the Next Generation of Cloud Applications
ICDCS 2017
Georgios Goumas, Konstantinos Nikas, Ewnetu Bayuh Lakew, Christos Kotselidis, Andrew Attwood, Erik Elmroth, Michail D. Flouris, Nikos Foutris, John Goodacre, Davide Grohmann, Vasileios Karakostas, Panagiotis Koutsourakis, Martin Kersten, Mikel Luján, Einar Rustad, John Thomson, Luis Tomás, Atle Vesterkjaer, Jim Webber, Ying Zhang, and Nectarios Koziris

2016

Investigations on Path Indexing for Graph Databases
Euro-Par 2016: Parallel Processing Workshop
Jonathan M. Sumrall, George H. L. Fletcher, Alexandra Poulovassilis, Johan Svensson, Magnus Vejlstrup, Chris Vest, and Jim Webber

2015

The LDBC Social Network Benchmark: Interactive Workload
SIGMOD 2015
Orri Erling, Alex Averbuch, Josep Larriba-Pey, Hassan Chafi, Andrey Gubichev, Arnau Prat, Minh-Duc Pham, and Peter Boncz

2012

A Programmatic Introduction to Neo4j
SPLASH (OOPSLA) 2012
Jim Webber

The Graph Traversal Pattern
Graph Data Management: Techniques and Applications
Marko A. Rodriguez and Peter Neubauer

Funding

Neo4j is built upon a solid research foundation. Research is a collaborative endeavor, and we work alongside colleagues in academia to push forward the boundaries of graph data. We offer research funding across a range of activities, from master's level through to project and program funding.

M.Sc. dissertations

Prospective master's students in computing science or allied disciplines are invited to contact us about thesis-level project opportunities. Students will be supported by our R&D team to build and evaluate real-world database implementations for their thesis.

In the past, Neo4j has hosted students from TU Eindhoven, KTH, University of Leipzig, TU Munich, and LTH.

Ph.D. scholarships

Building on our successful track record of collaboration with leading research-intensive universities, Neo4j is able to offer a limited number of bursaries for Ph.D. studentships investigating areas of research interest in graph databases and related fields. Available bursaries are announced through partner universities.

We are looking to recruit a Ph.D. student for a jointly supervised thesis at the University of Surrey (UK) on using AI inside databases. Previously we have sponsored students from Newcastle University and Birkbeck, University of London.

Post-doctoral collaboration

Neo4j researchers collaborate with leading research institutions on the most challenging graph database research problems. Funding is made available to partner universities for post-doctoral staff to work on medium-term systems research in graph databases.

Presently Neo4j is working with Newcastle University, LIRIS, University of Sydney, and UC Berkeley.

Collaborations

Examples of Neo4j’s current and past research collaborations can be found below.

Ongoing Academic Projects

Newcastle University (UK)

Neo4j is engaged with the team led by Dr Paul Ezhilchelvan and Prof Isi Mitrani, which works on novel transaction protocols for graph databases. The work has involved the design, modeling, verification, and implementation of new kinds of transaction processing protocols for scalable, fault-tolerant graph databases. The collaboration is ongoing, and Ph.D. students on the team have the opportunity to intern at Neo4j as part of their studies.

LIRIS (France)

Neo4j supports the ongoing work of Prof Angela Bonifati and her team on query languages for graphs, including both Neo4j’s Cypher and the ISO GQL standard.

Multi-Institution Projects

ACTiCLOUD

An EC-funded Horizon 2020 (H2020) research project to create elastic cloud infrastructure, including servers with large aggregate memory and core counts. As part of this work, Neo4j researched how to exploit the aggregate resources provided by the underlying platform, extending the Cypher runtime and query planner to execute queries in a parallel, NUMA-aware fashion.

Even on standard hardware, the results of this research mean that Cypher queries can be parallelized and have locality costs built into their query plans. Since Neo4j 5.13, users have been able to use the parallel runtime to run large graph analytics jobs efficiently in the database, without custom code. Such jobs were previously the domain of separate compute platforms.
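In Neo4j, a query opts into the parallel runtime with the `CYPHER runtime = parallel` query option. The general pattern behind parallel query execution can be sketched in Python: split the input into small chunks (often called morsels), let worker threads compute partial aggregates independently, then merge the partial results. The morsel size, the worker count, and the per-node relationship count are illustrative assumptions. The sketch does not model NUMA placement, locality costs, or Neo4j's planner. In CPython, threads show the structure of the computation rather than a real speedup for CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def morsels(items, size):
    """Split the input into fixed-size chunks ("morsels")."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def count_by_start(chunk):
    """Per-morsel partial aggregate: number of relationships per start node."""
    return Counter(src for src, _ in chunk)


def parallel_count(rels, morsel_size=2, workers=4):
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_by_start, morsels(rels, morsel_size)):
            total.update(partial)  # merge step: combine the partial aggregates
    return total


rels = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("a", "d")]
result = parallel_count(rels)
```

Because counting is associative and commutative, merging partial results in any order gives the same answer as a serial scan. That property is what lets this kind of aggregation be split across workers.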

LDBC

Neo4j was a founding member of the Linked Data Benchmark Council (LDBC). LDBC is an independent authority for specifying benchmarks and benchmarking procedures, and for verifying and publishing results for software systems designed to manage connected data. Since its foundation, other database vendors have joined the effort, including Oracle, IBM, AWS, and SAP.

Contact us

If you’d like to get in touch with us, please email:
[email protected]