Neo4j and Apache Spark

This approach is deprecated in favor of the Neo4j Connector for Apache Spark. This page is being maintained for reference, but is not current or supported. Please consult the Neo4j Connector for Apache Spark for the latest supported connector.
Goals
This page is deprecated in favor of the Neo4j Connector for Apache Spark.
Prerequisites
You should have a sound understanding of both Apache Spark and Neo4j: their data models, data processing paradigms, and APIs, so you can leverage them effectively together.

Intermediate

General Observations

Apache Spark is a clustered, in-memory data processing solution that scales processing of large datasets easily across many machines. It also comes with GraphX and GraphFrames, two frameworks for running graph compute operations on your data.

You can integrate Neo4j with Spark in a variety of ways. One is to pre-process (aggregate, filter, convert) your raw data before importing it into Neo4j.

Spark can also serve as an external graph compute solution, where you

  1. export data of selected subgraphs from Neo4j to Spark,

  2. compute the analytic aspects, and

  3. write the results back to Neo4j to be used in your Neo4j operations and Cypher queries.

Neo4j itself is capable of running graph processing on medium to large graphs quickly. For instance, the graph-processing project demonstrates that PageRank (5 iterations) can be run on the dbpedia dataset (10M nodes, 125M relationships) in 20 seconds as a Neo4j server extension or user-defined procedure. Spark might be better suited for larger datasets or for more intensive compute operations.

Neo4j-Spark-Connector

The Neo4j Spark Connector uses the binary Bolt protocol to transfer data from and to a Neo4j server.

The information on this page refers to the old (2.4.5) release of the Spark connector. For more up-to-date information and an easier, more modern API, consult the Neo4j Connector for Apache Spark.

It offers Spark 2.0 APIs for RDD, DataFrame, GraphX and GraphFrames, so you're free to choose how you want to use and process your Neo4j graph data in Apache Spark.

Configure the Neo4j URL, user, and password via the spark.neo4j.bolt.* Spark config options.
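
As a minimal sketch, here is how the configuration might look in a standalone Spark application; in the spark-shell the same keys are passed with --conf, as shown further below. The URL and credentials are placeholders; the spark.neo4j.bolt.url and spark.neo4j.bolt.user keys follow the connector's README.

import org.apache.spark.{SparkConf, SparkContext}

// placeholder values; use the Bolt URL and credentials of your own Neo4j instance
val conf = new SparkConf()
  .setAppName("neo4j-spark-example")
  .set("spark.neo4j.bolt.url", "bolt://localhost:7687")
  .set("spark.neo4j.bolt.user", "neo4j")
  .set("spark.neo4j.bolt.password", "<password>")

val sc = new SparkContext(conf)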

The general usage is:

  1. create org.neo4j.spark.Neo4j(sc)

  2. set cypher(query, [params]), nodes(query, [params]), or rels(query, [params]) as a direct query, or
    pattern("Label1", Seq("REL"), "Label2") or pattern( ("Label1","prop1"), ("REL","prop"), ("Label2","prop2") )

  3. optionally define partitions(n), batch(size), rows(count) for parallelism

  4. choose which data type to return:

    • loadRowRdd, loadNodeRdds, loadRelRdd, loadRdd[T]

    • loadDataFrame, loadDataFrame(schema)

    • loadGraph[VD,ED]

    • loadGraphFrame[VD,ED]

Here is a basic example for loading an RDD[Row]:

org.neo4j.spark.Neo4j(sc).cypher("MATCH (n:Person) RETURN n.name").partitions(5).batch(10000).loadRowRdd
To follow along, start the Spark shell with the connector (and GraphFrames) packages declared:

$SPARK_HOME/bin/spark-shell --conf spark.neo4j.bolt.password=<password> \
--packages neo4j-contrib:neo4j-spark-connector:2.0.0-M2,graphframes:graphframes:0.2.0-spark2.0-s_2.11
import org.neo4j.spark._

val neo = Neo4j(sc)

val rdd = neo.cypher("MATCH (n:Person) RETURN id(n) as id ").loadRowRdd
rdd.count

// inferred schema
rdd.first.schema.fieldNames
//   => ["id"]
rdd.first.schema("id")
//   => StructField(id,LongType,true)

neo.cypher("MATCH (n:Person) RETURN id(n)").loadRdd[Long].mean
//   => res30: Double = 236696.5

neo.cypher("MATCH (n:Person) WHERE n.id <= {maxId} RETURN n.id").param("maxId", 10).loadRowRdd.count
//   => res34: Long = 10
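
Besides cypher(), the pattern-based variants load the same kinds of RDDs. A small sketch, mirroring the Person/KNOWS pattern used in the connector's README:

// load the nodes of a Person-KNOWS-Person pattern as an RDD[Row] and count them
neo.pattern("Person", Seq("KNOWS"), "Person").rows(100).loadNodeRdds.count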

Similar operations are available for DataFrames and GraphX. The GraphX integration also allows writing data back to Neo4j with a save operation.
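
A minimal sketch of both, assuming the same Person/KNOWS data as above and the Neo4jGraph helper shipped with the connector (method names follow the connector's 2.0 README; treat this as an illustration rather than the definitive API):

import org.neo4j.spark._
import org.apache.spark.graphx._
import org.apache.spark.graphx.lib._

val neo = Neo4j(sc)

// DataFrame with a schema inferred from the Cypher result
val df = neo.cypher("MATCH (n:Person) RETURN n.name AS name").partitions(4).batch(1000).loadDataFrame
df.printSchema()

// GraphX: load a Person-KNOWS-Person subgraph, run PageRank (5 iterations),
// then save the computed value back to Neo4j as a 'rank' node property
val g = Neo4jGraph.loadGraph(sc, "Person", Seq("KNOWS"), "Person")
val ranked = PageRank.run(g, 5)
Neo4jGraph.saveGraph(sc, ranked, "rank")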

To use GraphFrames you have to declare it as a package (as in the spark-shell invocation above). Then you can load a GraphFrame with graph data from Neo4j and run graph algorithms or pattern matching on it (the latter will be slower than in Neo4j).

import org.neo4j.spark._

val neo = Neo4j(sc)

import org.graphframes._

val graphFrame = neo.pattern(("Person","id"),("KNOWS",null), ("Person","id")).partitions(3).rows(1000).loadGraphFrame

graphFrame.vertices.count
//     => 100
graphFrame.edges.count
//     => 1000

val pageRankFrame = graphFrame.pageRank.maxIter(5).run()
val ranked = pageRankFrame.vertices
ranked.printSchema()

val top3 = ranked.orderBy(ranked.col("pagerank").desc).take(3)
//     => top3: Array[org.apache.spark.sql.Row]
//     => Array([236716,70,0.62285...], [236653,7,0.62285...], [236658,12,0.62285])
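
The pattern matching mentioned above corresponds to GraphFrames motif finding. A small sketch on the graphFrame loaded earlier (find() is standard GraphFrames API; the two-hop motif is just an illustration):

// find two-hop paths a -> b -> c in the loaded GraphFrame and count them
val twoHops = graphFrame.find("(a)-[ab]->(b); (b)-[bc]->(c)")
twoHops.count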

More examples and details can be found in the docs of the GitHub repository.

Neo4j-Mazerunner

An interest in analytical graph processing led Kenny Bastani to work on an integration solution. It allows exporting dedicated datasets, e.g. node or relationship lists, to Spark.

It supports these algorithms:

  • PageRank

  • Closeness Centrality

  • Betweenness Centrality

  • Triangle Counting

  • Connected Components

  • Strongly Connected Components

After running the graph processing algorithms, the results are written back concurrently and transactionally to Neo4j.

One focus of this approach is data safety, which is why it uses a persistent queue (RabbitMQ) to communicate data between Neo4j and Spark.

The infrastructure is set up using Docker containers; there are dedicated containers for Spark, RabbitMQ, HDFS and Neo4j with the Mazerunner extension.

More details can be found on the project’s GitHub page.

Spark for Data Preprocessing

One example of pre-processing raw data (the Chicago Crime dataset) into a format that's well suited for import into Neo4j was demonstrated by Mark Needham. He combined a number of functions into a Spark job that takes the existing data, cleans and aggregates it, and outputs fragments which are later recombined into larger files.
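
As a rough sketch of that idea (not the code from the post linked below; the file paths and column names are hypothetical), such a job reads the raw CSV, selects and deduplicates the columns needed for one node type, and writes CSV fragments that are concatenated later:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("crime-preprocess").getOrCreate()

// hypothetical input: the raw Chicago crime records
val raw = spark.read.option("header", "true").csv("/data/chicago-crimes.csv")

// keep only the columns needed for the crime-type nodes and drop duplicates
val crimeTypes = raw.select("Primary Type", "Description").distinct()

// write CSV part-files; these fragments are recombined into larger import files later
crimeTypes.write.option("header", "true").csv("/data/import/crime_types")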

The approach is detailed in his blog post: "Spark: Generating CSV Files to import into Neo4j".