Apache Spark Developers Have Voted to Include Cypher in Spark 3.0 [Update]

The community vote by Apache Spark contributors has just closed – and the results are positive. Thank you to everyone who participated when we asked for your votes and feedback.


As part of the preparations for a forthcoming Apache Spark 3.0 release, the Spark development community has just completed a positive vote for a Spark Project Improvement Proposal (SPIP) to add property graphs based on DataFrames to Spark.

Based on the achievements of the ongoing Cypher for Apache Spark project, Spark 3.0 users will be able to use the well-established Cypher graph query language for graph query processing, as well as having access to graph algorithms stemming from the GraphFrames project.
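To make the idea concrete, here is a minimal sketch of what "property graphs based on DataFrames" and a Cypher pattern query amount to. It is not the Spark 3.0 API, which the proposal leaves to be defined. It uses plain Python lists as stand-ins for a node table and a relationship table, and every name in it (`nodes`, `relationships`, `match_knows`) is hypothetical and chosen only for illustration.

```python
# Illustrative only: plain Python tables standing in for DataFrames.
# All names here (nodes, relationships, match_knows) are hypothetical.

# Node table: one row per node, with an id, a label and a property.
nodes = [
    {"id": 1, "label": "Person", "name": "Alice"},
    {"id": 2, "label": "Person", "name": "Bob"},
    {"id": 3, "label": "Person", "name": "Carol"},
]

# Relationship table: one row per relationship, with source, target and type.
relationships = [
    {"source": 1, "target": 2, "type": "KNOWS"},
    {"source": 2, "target": 3, "type": "KNOWS"},
]


def match_knows(nodes, relationships):
    """Roughly what this Cypher pattern asks for:

    MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name, b.name
    """
    # Index Person nodes by id so each relationship row can be joined to them.
    people = {n["id"]: n for n in nodes if n["label"] == "Person"}
    return [
        (people[r["source"]]["name"], people[r["target"]]["name"])
        for r in relationships
        if r["type"] == "KNOWS"
        and r["source"] in people
        and r["target"] in people
    ]


result = match_knows(nodes, relationships)
print(result)
```

The Cypher pattern in the docstring draws the sub-graph it is looking for: round brackets for nodes, an arrow for the relationship. A Cypher engine working over tables has to resolve that drawing into joins of roughly this kind.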

The Apache Spark SPIP for property graphs, Cypher queries and graph algorithms in Spark 3.0

This is a great step forward for a standardized approach to graph analytics – including querying and algorithms – in an extremely widely used data science and data integration platform. The vote reflects much patient and detailed work from many groups, and it’s great to see so many contributors collaborating to bring additional graph capability to such a large open source project.

Cypher and Plans for GQL

Cypher continues to gain new implementations in research and industry. Besides its ease of use and strong graph-specific feature set, Cypher is attractive to vendors and users because the openCypher community and implementing vendors strongly support the plan to create a single standard declarative query language called GQL (Graph Query Language). GQL will draw heavily on the ASCII-art, pattern-based representation of sub-graphs pioneered by Cypher and extended in Oracle’s PGQL and LDBC’s G-CORE research language.

The goal is that GQL will be a formal international standard, specified and maintained by the ISO working group that also manages the SQL standard (WG3).

The WG3 committee met last month in Brisbane, and they discussed and encouraged further work on shaping a proposal to initiate the GQL project. The new project should start formally in the second half of this year. Proposals from Neo4j, Oracle and TigerGraph on the content and scope of GQL were discussed at the meeting.

Property Graph & RDF Standards Specialists Will Meet at W3C Workshop

Supporters of GQL – including implementers of Cypher, PGQL and GSQL – are joining experts from the RDF world at a forthcoming W3C workshop on graph data management standards in Berlin early in March.

The over-subscribed W3C workshop will bring together 100 RDF, labelled property graph and SQL standards specialists to figure out the best ways of creating bridges between these disparate but related data models and languages. The goal is to benefit users who increasingly want to create effective graph-aware applications which fit well with existing data technologies.

An openCypher Implementers Meeting (oCIM) Will Follow

The fifth openCypher Implementers Meeting (oCIM) will also be taking place – at the same venue in Berlin – immediately after the W3C workshop.

oCIM participants will be discussing language improvement requests and proposals. These include the ability to carry out Cypher queries that project new graphs – and to incorporate those queries in parameterized views – as well as designs for domain-specific property graph types and relational-to-graph mappings.

Both these key features were first implemented in Cypher for Apache Spark, and they have also been discussed in previous implementers’ meetings. (The graph types and SQL source mappings are also reflected in Neo4j proposals for the forthcoming Property Graph Querying extension to SQL, which is seen as a read-only subset of the planned GQL language.)

The theme of creating a managed and orderly transition from Cypher to GQL is an overarching concern and opportunity for the openCypher community. With my Neo4j hat on, I can say that our company takes the need to avoid disruption to existing customers and their applications extremely seriously.

So, while we are big backers of GQL, we are strong advocates of carefully preserving working and familiar features from the “input” languages that are contributing to the future GQL specification. From a product perspective, we see Cypher as having a long future life while the industry defines – and then standardizes on – the GQL language over the coming years. For more information, contact info@opencypher.org.

GQL Community: The Property Graph Schema Working Group Will Also Meet Face to Face

openCypher advocates, designers and implementers from several companies are active in a broader, emerging GQL community that has already spawned informal working groups to analyze existing graph query languages and to discuss the scope and designs for stronger property graph schema.

There is a strongly felt need for property graph schema/typing and a high level of interest in how to apply flexible or partial schema. The Property Graph Schema Working Group is also meeting face to face after the W3C workshop in Berlin, where there will have been an opportunity to correlate the property graph view against W3C recommendations like OWL and SHACL, which overlap in their concerns.

It’s great to see this level of activity with so many contributors on so many fronts: the push for standardization reflects continuing growth in all aspects of the graph database software and services market.
