The Neo4j Connector for Apache Spark is intended to make integrating graphs with Spark easy.
There are effectively two ways of using the connector:
As a data source: you can read any set of nodes or relationships as a DataFrame in Spark.
As a sink: you can write any DataFrame to Neo4j as a collection of nodes or relationships or use a Cypher® statement to process records in a DataFrame into the graph pattern of your choice.
Because the connector is based on the new Spark DataSource API, other Spark interpreters for languages such as Python and R work.
The API remains the same, and mostly only slight syntax changes are necessary to accommodate the differences between (for example) Python and Scala.
This connector works with Neo4j 4.4 and 5.x, whether run as a single instance, as a cluster, or a managed service in Neo4j AuraDB. The connector does not rely on any Enterprise Edition features and as such works with Neo4j Community Edition as well, with the appropriate version number.
This connector currently supports Spark 3.0+ with Scala 2.12 and Scala 2.13.
Depending on the combination of Spark and Scala versions you need a different JAR.
JARs are named in the form:
Ensure that you have the appropriate JAR file for your environment. Here’s a compatibility table to help you choose the correct JAR.
|Spark 3.0.x and 3.1.x
|3.2.x and above
If you want an introduction on the Neo4j Connector for Apache Spark, take a look at the training that Andrea Santurbano presented at NODES2020.
This connector is provided under the terms of the Apache 2.0 license, which can be found in the GitHub repository.
For Neo4j Enterprise and Neo4j AuraDB customers, official releases of this connector are supported under the terms of your existing Neo4j support agreement. This support extends only to regular releases and excludes alpha, beta, and pre-releases. If you have any questions about the support policy, get in touch with Neo4j.