Open Source Contribution Makes ‘SQL for Graphs’ Available on Apache Spark

New contribution to the openCypher project expands the Hadoop ecosystem with graph analytic capabilities for Spark, making Cypher available to the popular in-memory analytic engine

NEW YORK, Oct. 24, 2017 /PRNewswire/ — GraphConnect – Neo4j, the market leader in connected data, today announced that it has donated an early version of the Cypher for Apache™ Spark® (CAPS) language toolkit to the openCypher project. This contribution will allow big data analysts to incorporate graph querying into their workflows, making it easier to apply graph algorithms and dramatically broadening how they reveal connections in their data. Developers of Spark applications now join the users of Neo4j, SAP HANA, Redis Graph and AgensGraph, among others, in gaining access to Cypher, the leading declarative property graph query language. The contribution also expands the tooling available to any developer under the Apache 2.0 license from the openCypher project.

As graph-powered applications and analytic projects succeed, big data teams are looking to connect more of their data and personnel to this work. This is happening at companies such as eBay, for recommendations via conversational commerce; Telia, for smart home services; and Comcast, for smart home content recommendations. Until now, the full power of graph pattern matching has been unavailable to data scientists using Spark and to data wrangling pipelines. With Cypher for Apache Spark, these data scientists can iterate more easily and connect adjacent data sources to their graph applications much more quickly.

“Cypher for Apache Spark is an important milestone both in the pervasiveness of graph technology and in the evolution of the Cypher query language itself,” explains Philip Rathle, VP of product at Neo4j. “In making Cypher available for Apache Spark, we looked closely at the way Spark works with immutable data sets, and then in coordination with the openCypher group, brought in facilities that let graph queries operate over the results of graph queries, and an API that allows graphs to be split, transformed, snapshotted and linked together in processing chains that give huge flexibility in shaping graph data, including data from users’ data lakes. Cypher for Apache Spark is the first implementation of Cypher to allow queries to return graphs, as well as tables of data.”

Cypher for Apache Spark also implements the new multiple-graph and composable-query features emerging from the work of the openCypher Implementers Group, which formed earlier this year. The openCypher project is hosting Cypher for Apache Spark as alpha-stage open source under the Apache 2.0 license, so that other contributors can join in the evolution of this important project at an early stage.

“As data accumulates in lakes at accelerating speeds and in unprecedented volumes, the challenge of extracting value from it by traversing differentiated structures and inferring context from them grows exponentially,” said Stephen O’Grady, analyst and co-founder at RedMonk. “Neo4j and its Cypher graph query language intend to be the de facto solution to precisely this problem.”

Neo4j is the graph platform leader and a relentless champion of making graph technology accessible to a wider audience. This announcement builds on Neo4j’s sponsorship of openCypher, launched in October 2015 to push the whole graph industry forward by tapping into the open source community, making Cypher’s evolution an open exercise and avoiding redundant research. Today more than 20 organizations and universities participate in the openCypher Implementers Group, which meets regularly to discuss how Cypher should evolve. These meetings are open to the public, and governance is by consensus within the Cypher community.