Creating the GQL Database Language Standard


This blog was written by the Neo4j Query Languages Standards and Research Team¹.


A new standard for a property graph database language, ISO/IEC 39075 Information technology — Database languages — GQL, has been published².

This new standard was developed by the international standards committee SC32 WG3³, which is also responsible for developing and enhancing the SQL database language standard.

The Neo4j LANGSTAR (languages, standards, and research) team has been actively participating in the development of the GQL standard since the project began.

This post is a short summary of our (as in LANGSTAR) involvement in what has been quite a unique experience. New query languages don’t come along very often. The last database query language standard developed by ISO was SQL!

What’s in GQL?

In a nutshell, GQL fuses ideas from industry-proven graph query languages, like openCypher, GSQL, and PGQL, with SQL, the foundational language of the database industry, into a full new database language standard, based on eight key ideas:

    1. Querying, updating, and managing graph databases using the property graph model.
    2. User-friendly and familiar language syntax.
    3. Consistent use of visual ‘ASCII-art’ style graph patterns that have been so successful in openCypher.
    4. Natural composition of complex queries, such as read-write-read queries (via reading direction-aligned linear statement flow in the style of “MATCH … RETURN …”).
    5. Flexible management of multiple graph data products in a central data catalog.
    6. Gradual schema design enabled by supporting both schema-free and fixed-schema graphs.
    7. SQL-compatible data types and expressions extended with support for native nested data and aligned with established technical standards (Unicode, IEEE 754, ISO 8601).
    8. A language foundation that is useful on its own, ready for incorporating additional forms of data into the property graph paradigm, and upon which future versions of the standard can be built.
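
To make ideas 1, 3, and 4 a little more concrete, here is a minimal sketch in Python rather than GQL itself. It builds a tiny in-memory property graph and evaluates one 'ASCII-art' pattern against it. The query in the comment is only illustrative, and the labels, property names, and the match_pattern helper are our own assumptions for this example, not something taken from the standard.

```python
# Illustrative GQL-style query this sketch imitates
# (Person, KNOWS, and name are hypothetical names chosen for the example):
#   MATCH (a:Person)-[:KNOWS]->(b:Person)
#   RETURN a.name, b.name

# A tiny property graph: nodes and edges both carry a label and properties.
nodes = {
    1: {"label": "Person", "props": {"name": "Ada"}},
    2: {"label": "Person", "props": {"name": "Grace"}},
    3: {"label": "City", "props": {"name": "London"}},
}
edges = [
    {"src": 1, "dst": 2, "label": "KNOWS", "props": {"since": 2019}},
    {"src": 1, "dst": 3, "label": "LIVES_IN", "props": {}},
]


def match_pattern(src_label, edge_label, dst_label):
    """Yield (a, b) node pairs matching (a:src_label)-[:edge_label]->(b:dst_label)."""
    for edge in edges:
        a, b = nodes[edge["src"]], nodes[edge["dst"]]
        if (
            edge["label"] == edge_label
            and a["label"] == src_label
            and b["label"] == dst_label
        ):
            yield a, b


# The "MATCH ... RETURN ..." flow: match first, then project the name properties.
rows = [
    (a["props"]["name"], b["props"]["name"])
    for a, b in match_pattern("Person", "KNOWS", "Person")
]
print(rows)  # [('Ada', 'Grace')]
```

A real GQL implementation does far more than this (variable-length paths, updates, schemas, a catalog), but the shape of the idea is the same: the pattern describes a small piece of graph, and the query returns one row per place where that piece occurs.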

We will go into more detail on GQL in future blog posts.

How Is an ISO Standard Created?

International standards are created by people who have interest and expertise in the topic being standardized. In the case of SC32 WG3, the participants (individual experts) are delegated by the standards organizations of various ISO member countries around the world. To get involved in the international committee, one must first join a standards organization in some country. The process for joining varies depending on your country.

Working on a standard takes a fair amount of time and effort so, in practice, most individual participants work for companies whose expertise and interest are in data and databases. Experts set their commercial differences aside (but not their opinions!) for the advancement of the space. Standards participants also need a tolerance for acronyms and some amount of bureaucracy.

Over more than 40 years of creating database language standards, SC32 WG3 has developed a certain style and culture. Database language standards carefully specify (sometimes in exhausting detail) the syntax and semantics of the language. WG3 builds consensus on the content of the standards using detailed written papers and discussions during meetings. When WG3 participants accept a paper, the GQL editors integrate its changes into the next version of the GQL draft. If a paper is rejected, the author(s) may revise it and bring it back at a later date, or may abandon the ideas completely. In any case, we have worked to create an environment where we can argue about the technical ideas during the meeting and still have dinner together in the evenings.

The official project to produce the GQL standard began in 2019 with a New Work Item Proposal (NWIP). WG3 initiated the NWIP in June 2019, and the national body participants approved it in September 2019. Counting the work we had done to prepare for the GQL NWIP, it took about five years to complete the GQL standard.

It Takes a Lot of Meetings to Make a Standard!

Prior to 2020, WG3 held two or three in-person meetings a year in various locations around the world. In 2020, we met in person once in January, before face-to-face meetings went out of style, for reasons any reader can probably relate to. When it became clear that we were not going to be able to have face-to-face meetings, we adapted and moved to shorter but more frequent web conferences. Since WG3 is an international standards committee, we have participants from Japan, Korea, China, sometimes Australia, mainland Europe, the UK, and the US. With this distribution of participants, there is no single time that works for everyone. So, we rotated the start time for the web conferences so that everyone had a lousy time of day at some point. The 38 meetings to produce the GQL standard included eleven face-to-face meetings and twenty-seven web conferences.

In the US, there were also two-hour expert group meetings every other Tuesday from 2019 through 2023, plus lots of work and async communication in between sessions.

GQL Trivia

How big is GQL?

628 pages in total. This is about the same number of pages as SQL-92, which was not the first, but the second major revision of SQL.

How many papers does it take to make a standard?

In addition to the main spec, the GQL standard incorporates 430 papers that were developed, reviewed, discussed, and accepted into the GQL standard.

430 papers across 38 meetings works out to a dozen or so GQL papers per meeting (430 / 38 ≈ 11.3). However, while the GQL standard was under development, we were also developing the 2023 edition of the SQL standard. A key addition there was a part of the new SQL standard, SQL/PGQ, that overlaps with GQL. Among other things, it defines rules for mapping tables to graphs and adds GQL syntax for matching patterns. (In case we didn’t mention it, the same Neo4j team that focused on the GQL standard also contributed to adding graph pattern matching to the SQL/PGQ standard.)

Fun⁴ statistics

The record for a single meeting was set in February 2023, when we reviewed and accepted 85 GQL papers during a five-day meeting.

The length of a GQL paper varies. WG3 change proposals include some introduction, descriptive text, examples, the proposed change to the current draft, and some additional material such as references and a checklist. It is difficult to do anything in less than about three pages. Ten to twenty pages is common; a number of papers exceeded 50 pages, and a much smaller number exceeded 100 pages. A paper on graph types stands out: it ran to 177 pages, although the last 100 pages were examples illustrating the results of the proposed changes.

The longest of the 430 GQL papers: 177-page paper on Graph Types


Who Did the Work?

Creating a standard is a collaborative effort – the GQL standard is the result of work by all of the participants.

Organizations that participated in the U.S. GQL Expert Group include DataStax, Google, IBM, Intel, Katana Graph, Neo4j, Optum Technology, Oracle, PuppyQuery, RelationalAI, and TigerGraph. Many of the GQL expert group participants also participated in the international group, WG3. In addition, WG3 had participants from Actian, Boray Data, Cannan Consultancy, CnTechSystems, EDB, Profium, LDBC, TF Informatik, Tokyo Metropolitan University, and the University of Edinburgh.

Reflections

From a Neo4j LANGSTAR point of view, working on a database language standard was a steep learning curve. Writing WG3 change proposals is a skill one learns by experience, by reviewing papers written by others, and by being reviewed. The WG3 process requires writing, presenting, reviewing, modifying, and re-presenting. We learned a lot doing that, and in spite (or because) of the long hours and complex technical discussions, it was gratifying fun!

Our colleagues in the INCITS GQL Expert Group and WG3 are mostly employed by database vendors, so we’ve gained extensive experience playing well with others (often at odd hours, because the meetings bring together experts from around the globe). In a world where it looks increasingly impossible to agree on anything, we have shown that even business competitors can work together to make something good for their customers. That was refreshing and very rewarding.

Additional Information

To learn more, the following blogs and documents provide additional information about the GQL standard, Neo4j Cypher, and openCypher:





¹ The Neo4j Query Languages Standards and Research Team includes Finbar Good, Keith Hare, Stefan Plantikow, and Hannes Voigt.

² As of April 11, 2024.

³ The full designation is “ISO/IEC JTC1 SC32 WG3 Database Languages” where ISO is the International Organization for Standardization and IEC is the International Electrotechnical Commission. JTC1 is a Joint Technical Committee underneath ISO and IEC. JTC1 is responsible for most of the computer-related standards. SC32 is SubCommittee 32, which is responsible for data management and interchange standards. WG3 is Working Group 3, which is responsible for database language standards, namely the SQL standard and now the GQL standard.

⁴ Standards people might have a weird sense of fun.