Project overview

The Neo4j Connector for Apache Spark is intended to make integrating graphs with Spark easy.

There are effectively two ways of using the connector:

  • As a data source: you can read any set of nodes or relationships as a DataFrame in Spark.

  • As a sink: you can write any DataFrame to Neo4j as a collection of nodes or relationships or use a Cypher statement to process records in a DataFrame into the graph pattern of your choice.

Multiple languages support

Because the connector is based on the new Spark DataSource API, other Spark interpreters for languages such as Python and R work.

The API remains the same, and mostly only slight syntax changes are necessary to accommodate the differences between (for example) Python and Scala.

Compatibility

Neo4j compatibility

This connector works with Neo4j 3.5 and the entire 4.x series of Neo4j, whether run as a single instance, in Causal Cluster mode, or run as a managed service in Neo4j AuraDB. The connector does not rely on Enterprise Edition features and as such works with Neo4j Community Edition as well, with the appropriate version number.

Neo4j versions prior to 3.5 are not supported.

Spark and Scala compatibility

This connector currently supports:

  • Spark 2.4.5+ with Scala 2.11 and Scala 2.12.

  • Spark 3.0+ with Scala 2.12.

Depending on the combination of Spark and Scala versions you need a different JAR. JARs are named in the form: neo4j-connector-apache-spark_${scala.version}_${connector.version}_for_${spark.version}

Ensure that you have the appropriate JAR file for your environment. Here’s a compatibility table to help you choose the correct JAR.

Table 1. Compatibility table
Spark 2.4 Spark 3.0.x and 3.1.x 3.2.x and above

Scala 2.11

neo4j-connector-apache-spark_2.11-4.1.5_for_spark_2.4.jar

(not available)

(not available)

Scala 2.12

neo4j-connector-apache-spark_2.12-4.1.5_for_spark_2.4.jar

neo4j-connector-apache-spark_2.12-4.1.5_for_spark_3.jar

neo4j-connector-apache-spark_2.12-5.0.0_for_spark_3.jar

Scala 2.13

(not available)

neo4j-connector-apache-spark_2.13-4.1.5_for_spark_3.jar

neo4j-connector-apache-spark_2.13-5.0.0_for_spark_3.jar

Training

If you want an introduction on the Neo4j Connector for Apache Spark, take a look at the training that Andrea Santurbano presented at NODES2020.

Availability

This connector is provided under the terms of the Apache 2.0 license, which can be found in the GitHub repository.

Support

For Neo4j Enterprise and Neo4j AuraDB customers, official releases of this connector are supported under the terms of your existing Neo4j support agreement. This support extends only to regular releases and excludes alpha, beta, and pre-releases. If you have any questions about the support policy, get in touch with Neo4j.