Project Overview

This chapter provides an introduction to the Neo4j Connector for Apache Spark

Overview

The Neo4j Connector for Apache Spark is intended to make integrating graphs together with spark easy. There are effectively two ways of using the connector:

  • As a data source: read any set of nodes or relationships as a DataFrame in Spark

  • As a sink: write any DataFrame to Neo4j as a collection of nodes or relationships, or alternatively; use a Cypher statement to process records in a DataFrame into the graph pattern of your choice.

Multi-Languages

Because the connector is based on the new Spark DataSource API, other spark interpreters for languages such as Python and R will work.

The API remains the same, and mostly only slight syntax changes are necessary to accomodate the differences between (for example) Python and Scala.

Compatibility

Neo4j Compatibility

This connector works with Neo4j 3.5, and the entire 4+ series of Neo4j, whether run as a single instance, in causal cluster mode, or run as a managed service in Neo4j Aura. The connector does not rely on enterprise features, and as such will work with Neo4j Community as well, with the appropriate version number.

Neo4j versions prior to 3.5 are not supported

Spark Compatibility

This connector currently supports Spark 2.4.5+ with Scala 2.11 and Scala 2.12 and Spark 3.0+ with Scala 2.12. Depending on the combination of Spark and Scala version you’ll need a different JAR. JARs are named in the form neo4j-connector-apache-spark_${scala.version}_${spark.version}_${connector.version}

Here’s a compatibility table to help you choose the correct JAR.

Table 1. Compatibility Table
Spark 2.4 Spark 3.0+

Scala 2.11

neo4j-connector-apache-spark_2.11-4.0.1_for_spark_2.4.jar

(not available)

Scala 2.12

neo4j-connector-apache-spark_2.12-4.0.1_for_spark_2.4.jar

neo4j-connector-apache-spark_2.12-4.0.1_for_spark_3.jar

Training

If you want an introduction on the Neo4j Connector for Apache Spark, take a look at the training that Andrea Santurbano presented at NODES2020.

Scala Compatibility

This connector works with Scala 2.11 and 2.12. Because of the differences in the APIs, different JAR files are needed depending on your scala version. Ensure that you have the appropriate JAR file for your environment.

Availability

This connector is provided under the terms of the Apache 2.0 license, which can be found in the GitHub repository.

Support

For Neo4j Enterprise and Neo4j Aura customers, official releases of this connector are supported under the terms of your existing Neo4j support agreement. This support extends only to regular releases, and excludes alpha, beta, and pre-releases. If you have any questions about the support policy, please get in touch with Neo4j.