openCypher Will Pave the Road to GQL for Cypher Implementers


The GQL ISO standard has just landed (April 11, 2024), marking a historic moment for graph database languages and a huge milestone in what has already been a relatively long history of language development.

In a previous post, I briefly sketched how Neo4j's proprietary implementation of Cypher is becoming GQL compliant. However, the Cypher world is bigger than Neo4j, its original creator: most graph databases support Cypher via the openCypher project.

This post starts with a brief history of property graph languages to give context to newer graph database practitioners. It then describes a vision for the openCypher project's future in what will soon become a GQL world.

The Origins of Cypher: A New Beginning

The Cypher language emerged in 2010, during the early halcyon days of NoSQL. The early part of that decade saw more and more languages emerge for querying graph databases. Nearly every new vendor entering the stage invented its own language. In fact, Neo4j itself had at least three languages for quite some time.

Unlike most other graph database query languages at the time, Cypher was declarative. It was modeled after SQL, where you describe the outcome you want and let the database do the work of finding the right results. Cypher also strove to reuse existing ideas wherever possible and to innovate only when necessary.

Most graph languages took the alternative approach, called imperative graph querying. With this method, developers had to spell out each step the database should take. While this was easier for vendors, it burdened users. Slowly but surely, Neo4j users upvoted Cypher with their keyboards. Even though the earliest versions were limited, we saw users choose Cypher whenever they could.

By 2015, Cypher had matured considerably and evolved for the better, thanks to real-world hard knocks and community feedback.

Yet as time progressed, new graph query languages kept coming, none of them achieving anything close to Cypher's success. If this kept up, the graph database space would continue to accumulate new languages, making it more and more confusing for users and for the budding ecosystem of graph tools, connectors, and consultancies.

At Neo4j, we realized that if we cared about solving this problem, we needed to give away Cypher.

openCypher 9: The Need for Convergence

In October 2015, Neo4j launched a new open initiative called openCypher. openCypher not only made the Cypher language available to the ecosystem (competitors included!) but also provided documentation, tests, and code artifacts to help implementers incorporate Cypher into their products.¹ Last but not least, it was run as a collaboration with fellow members of the graph database ecosystem, very much in keeping with Neo4j's open source ethos. This started a new chapter in the graph database saga: one of convergence.

openCypher proved a huge success. Today, more than a dozen graph databases and many dozens of tools and connectors support Cypher. Tens of thousands of projects use it, tens of thousands of professionals are certified in it, many universities and online courses teach it, and hundreds of thousands of developers know and use it.

It turns out there is another step one could take: go from a de facto standard to a de jure standard. This entails going to an official standards body with global standing, investing time, rigor, and diligence across multiple parties and many, many meetings and documents, and coming out the other end with an iron-clad and thoroughly vetted standard.

The Advent of GQL: Setting New Standards

In 2016, Neo4j started working towards that goal. We approached other vendors about collaborating on a formal standard, participated in a multi-vendor and academic research project to design a graph query language from scratch on paper, and eventually joined ISO. Momentum reached a crescendo in 2018 when, just ahead of a critical ISO vote, we published an open letter to the database community, asking whether we database vendors should work out our differences and settle on one language rather than minting new ones every few months. Not surprisingly, the answer was a resounding yes. Challenge accepted! The die was cast.

In 2019, the International Organization for Standardization (ISO) announced a new project to create a standard graph query language. They called it GQL, for Graph Query Language. Since then, Neo4j and several other database vendors have been diligently working to define the standard.

Fast forward to today, and GQL is finally here. The ISO committee has officially published GQL as the new international standard for graph query languages. The publication of this standard holds immense potential for the future of graph query languages.

Looking Ahead: openCypher Becomes a Road to GQL

GQL is changing the graph query language scene. In this exciting new world, we have decided to keep the openCypher project alive. The rest of this post will explain what we are going to do to reboot the openCypher project and why we are doing it.

openCypher was initially conceived as a language specification project, with an open forum for discussing new language features, followed by specification work and community votes. In the new GQL world, ISO takes over most of that role. Anyone interested in discussing language features should get involved with ISO or a related forum such as LDBC and actively participate in GQL development.

The other original role of openCypher was to help language implementers by supplying useful artifacts. The openCypher community is quite big, and most openCypher implementers are very likely now thinking about GQL and how to get there. Here, openCypher can still help.

The basic idea is to use the openCypher project to help Cypher database and tooling vendors on their road to GQL. All openCypher implementers, and all their users, begin the road to GQL from a similar place, and a very good one, given the similarities between Cypher and GQL. We can walk the rest of the path together. Over time, openCypher will become a GQL implementation.

What Happens Now?

Our plan does not yet dot all the i’s and cross all the t’s, but here are the highlights.

We will freeze the current openCypher 9; it will remain as it is.

openCypher will start publishing Cypher Improvement Proposals (CIPs) that introduce variations and extensions to openCypher to make it GQL compliant. Only features coming from the GQL standard will be considered for inclusion. Each CIP will explain the change, provide an audit trail, tie the change to the corresponding part of the GQL specification, and introduce the GQL feature in the least disruptive way possible for Cypher users.

Since ISO has already discussed and vetted these features while creating the GQL standard, the openCypher processes will slim down considerably. From openCypher's perspective, all language design work will happen within ISO, and openCypher's own work will be about implementing and reflecting the standard in the language artifacts.

We will also start making new, versioned openCypher releases on a regular cadence (still to be defined; let's assume six months for now). Each release will collect all the CIPs published in that period. We are working on a release naming strategy, but ideally, the name will reference the GQL standard we are working towards (GQL:2024) plus a component indicating progressive steps towards it.

openCypher will continue supplying artifacts: in addition to the language specification (CIPs), it will provide an updated grammar that incorporates the core GQL syntax elements introduced by the published CIPs. We intend to keep the SDK up to date, but this will be a lower priority than updating the specification and grammar. Any help with SDK work is welcome.

openCypher has fulfilled its initial purpose, serving as the basis for a graph database lingua franca across much of the industry. For the team that has invested in curating openCypher, it is heartwarming that, now that GQL is finally here, openCypher can still play a different but useful role: ramping implementers and users onto GQL. Our dream is to see all openCypher implementations become GQL-conformant implementations, after which we will all be speaking GQL! Let's make it happen.


¹ To be clear, the artifacts provided by openCypher are at the language level. Implementation details, such as the planner, runtime, database statistics, internal storage formats, and so on, are up to each individual implementer.