As part of the preparations for a forthcoming Apache Spark 3.0 release, the Spark development community has just completed a positive vote for a Spark Project Improvement Proposal (SPIP) to add property graphs based on DataFrames to Spark.
Building on the achievements of the ongoing Cypher for Apache Spark project, Spark 3.0 will let users query graphs with the well-established Cypher graph query language, and will also give them access to graph algorithms stemming from the GraphFrames project.
This is a great step forward for a standardized approach to graph analytics (including querying and algorithms) in an extremely widely used data science and data integration platform. The vote reflects much patient and detailed work from many groups, and it's great to see so many contributors collaborating to bring additional graph capability to such a large open source project.
Cypher and Plans for GQL
Cypher continues to gain new implementations in research and industry. Besides its ease of use and strong graph-specific feature set, Cypher is attractive to vendors and users because the openCypher community and implementing vendors strongly support the plan to create a single standard declarative query language called GQL (Graph Query Language). GQL will draw heavily on the ASCII-art, pattern-based representation of sub-graphs pioneered by Cypher and extended in Oracle's PGQL and in LDBC's G-CORE research language.
The goal is that GQL will be a formal international standard, specified and maintained by the ISO working group that also manages the SQL standard (WG3).
The WG3 committee met last month in Brisbane, and they discussed and encouraged further work on shaping a proposal to initiate the GQL project. The new project should start formally in the second half of this year. Proposals from Neo4j, Oracle and TigerGraph on the content and scope of GQL were discussed at the meeting.
Property Graph & RDF Standards Specialists Will Meet at W3C Workshop
Supporters of GQL, including implementers of Cypher, PGQL and GSQL, are joining experts from the RDF world at a forthcoming W3C workshop on graph data management standards in Berlin in early March.
The over-subscribed W3C workshop will bring together 100 RDF, labelled property graph and SQL standards specialists to figure out the best ways of creating bridges between these disparate but related data models and languages. The goal is to benefit users who increasingly want to create effective graph-aware applications which fit well with existing data technologies.
An openCypher Implementers Meeting (oCIM) Will Follow
The fifth openCypher Implementers Meeting (oCIM) will also be taking place – at the same venue in Berlin – immediately after the W3C workshop.
oCIM participants will be discussing language improvement requests and proposals. These include the ability to carry out Cypher queries that project new graphs – and to incorporate those queries in parameterized views – as well as designs for domain-specific property graph types and relational-to-graph mappings.
Both these key features were first implemented in Cypher for Apache Spark, and they have also been discussed in previous implementers’ meetings. (The graph types and SQL source mappings are also reflected in Neo4j proposals for the forthcoming Property Graph Querying extension to SQL, which is seen as a read-only subset of the planned GQL language.)
The theme of creating a managed and orderly transition from Cypher to GQL is an overarching concern and opportunity for the openCypher community. With my Neo4j hat on, I can say that our company takes the need to avoid disruption to existing customers and their applications extremely seriously.
So, while we are big backers of GQL, we are strong advocates of carefully preserving working and familiar features from the "input" languages that are contributing to the future GQL specification. From a product perspective, we see Cypher as having a long future life while the industry defines, and then standardizes on, the GQL language over the coming years.
GQL Community: The Property Graph Schema Working Group Will Also Meet Face to Face
openCypher advocates, designers and implementers from several companies are active in a broader, emerging GQL community that has already spawned informal working groups to analyze existing graph query languages and to discuss the scope and designs for stronger property graph schema.
There is a strongly felt need for property graph schema/typing, and high interest in how to apply flexible or partial schema. The Property Graph Schema Working Group will also meet face to face in Berlin after the W3C workshop, which will have provided an opportunity to compare the property graph view with W3C recommendations such as OWL and SHACL, whose concerns overlap with those of property graph schema.
It’s great to see this level of activity with so many contributors on so many fronts: the push for standardization reflects continuing growth in all aspects of the graph database software and services market.
About the Author
Alastair Green, Query Languages Standards & Research Lead, Neo4j
Alastair Green leads Neo4j’s work on graph query language development and standards, and he is part of the team making the Cypher language available in Apache Spark. He has a background in enterprise data integration and transaction processing product design and deployment.
He brings a strong mix of consulting, architecture, and product skills to the Neo4j team. He is Neo4j's product manager for the Cypher language, and a member of the Neo4j Cypher Language Group (CLG).
His career in IT began in software development, evolving into pre-sales and post-sales, then into various architect, consulting and business roles, and eventually into founding and running a startup specializing in distributed transaction management. For the last eight years, Alastair has worked in senior data-related product management and enterprise architecture positions in financial services: first at Barclays, and then at RBS, where he was the head of Design Architecture for the Risk Solutions group.