Introducing the Neo4j Connector for Apache Spark

SAN MATEO, Calif. – November 12, 2020 – Neo4j®, the leader in graph technology, announced today the Neo4j Connector for Apache Spark, an integration tool to move data bi-directionally between the Neo4j Graph Platform and Apache Spark™.

Apache Spark is the enterprise data orchestration layer of choice, particularly for complex data pipelines for machine learning applications and predictive data analytics. The Neo4j Connector for Apache Spark provides easy, bi-directional access between Neo4j graph datasets and many other data sources – including relational databases, semistructured and unstructured (NoSQL) repositories – transforming data from tables to graphs and back as needed. The new connector is available at no cost and is fully supported for Neo4j customers.

Spark Connector Workflow and Arch

Caption: With the Neo4j Connector for Apache Spark, users can meld Spark data and Neo4j graph data to answer more questions, gain new insights and create new solutions.

The Neo4j Connector for Apache Spark comes in response to high demand from the Spark and Neo4j communities to apply graphs to machine learning pipelines, unify data silos and derive greater value from existing data stores. According to an independent survey, “Technology Executive Priorities for Knowledge Graphs,” recently conducted by Pulse, the top three reasons motivating enterprise IT decision makers to expand their use of knowledge graphs are to improve machine learning and artificial intelligence systems (60%), open new revenue streams (50%) and connect data silos to make information more accessible (50%).

For Neo4j Customers: Neo4j graphs can be connected to any other system or data source via Spark. The Spark Connector transforms tabular data sources to graph data to reveal more context and insight inside Neo4j. The bi-directional integration means that Spark can clean and transform the data that drives Neo4j graph applications, and Neo4j graph data can feed into any Spark workflow.

For Spark Users: The Neo4j Connector for Apache Spark brings advanced graph capabilities to the Spark ecosystem so businesses can use contextual information to improve forecasting, analytics and predictions. This connector enables teams to easily add Neo4j graph data to improve high-value processes, like machine learning, without reworking existing pipelines.
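The table-to-graph transformation described above can be pictured with a small sketch. This is a plain Python illustration under stated assumptions, not the connector's API: the function names, the BOUGHT relationship type and the sample rows are all hypothetical, and a real pipeline would work with Spark DataFrames rather than Python lists.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]
Edge = Tuple[str, str, str]  # (source node, relationship type, target node)


def rows_to_graph(
    rows: List[Row], src: str, dst: str, rel: str
) -> Tuple[Set[str], List[Edge]]:
    """Turn each table row into two nodes joined by one relationship."""
    nodes: Set[str] = set()
    edges: List[Edge] = []
    for row in rows:
        nodes.update((row[src], row[dst]))
        edges.append((row[src], rel, row[dst]))
    return nodes, edges


def graph_to_rows(edges: List[Edge], src: str, dst: str) -> List[Row]:
    """Flatten relationships back into table rows, one row per relationship."""
    return [{src: s, dst: t} for s, _rel, t in edges]


# Hypothetical sample data, not taken from the announcement.
purchases = [
    {"customer": "alice", "product": "laptop"},
    {"customer": "bob", "product": "laptop"},
]

nodes, edges = rows_to_graph(purchases, "customer", "product", "BOUGHT")
round_trip = graph_to_rows(edges, "customer", "product")
```

In this sketch, two rows that share a product become two customer nodes linked to the same product node, which is the kind of connection a graph makes explicit and a table leaves implicit.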

Amy E. Hodler, Director of Graph Analytics and AI Programs at Neo4j, shared why customers are excited to connect Neo4j and Spark.

“The vast majority of Neo4j’s enterprise customer base has Apache Spark in their data environment,” Hodler said. “With the Neo4j Connector for Apache Spark, our customers can consolidate their data pipelines and supercharge their Neo4j Graphs with access to the massive Spark ecosystem. The connector allows data scientists and application developers to easily meld Neo4j graph data and Spark data to answer more questions, gain new insights and create new solutions.”

In January, Gartner published An Introduction to and Evaluation of Apache Spark for Modern Data Architectures*. The report states, “Spark has evolved into a viable production platform to meet enterprise needs. It is easy for developers to learn and use to develop solutions. Spark has also cultivated a vibrant community of committers and solutions. Spark’s architecture and its applicability to ingest, process and analyze both operational and analytical workloads allow it to reduce the time between obtaining data and delivering insights.”

*Gartner, “An Introduction to and Evaluation of Apache Spark for Modern Data Architectures”, Sanjeev Mohan & Sumit Pal, 14 January 2020.

Learn More about the Neo4j Apache Spark Connector

The Neo4j Apache Spark Connector is available for download here. For more information, read the “The Great Hookup: Announcing Neo4j Connector for Apache Spark” blog post.

Neo4j’s next virtual event on November 17, Connections: Graph Architecture and Integrations, will highlight the key use cases and best practices around integrating Neo4j’s graph database with popular technologies, including Apache Spark.


About Neo4j

Neo4j is the leader in graph database technology. As the world’s most widely deployed graph database, we help global brands – including Comcast, NASA, UBS and Volvo Cars – to reveal and predict how people, processes and systems are interrelated. Using this relationships-first approach, applications built using Neo4j tackle connected data challenges such as analytics and artificial intelligence, fraud detection, real-time recommendations and knowledge graphs. Find out more at



© 2020 Neo4j, Inc., Neo Technology®, Neo4j®, Cypher®, Neo4j® Bloom™ and Neo4j® Aura™ are registered trademarks or a trademark of Neo4j, Inc. Apache Spark™ is a trademark of the Apache Software Foundation. All other marks are owned by their respective companies.