Using the Apache Spark Connector

This tutorial shows how to use the Neo4j Connector for Apache Spark to write data to and read data from an Aura instance.

Setup

  1. Download Apache Spark.

    Example: Spark 3.4.1, pre-built for Apache Hadoop 3.3 and later with Scala 2.12.

  2. Download the Neo4j Connector JAR file, making sure to match both the Spark version and the Scala version (a quick way to check the Scala version of your Spark build is shown after this list).

    Example: Neo4j Connector 5.1.0, built for Spark 3.x with Scala 2.12.

  3. Decompress the Spark file and launch the Spark shell as in the following example:

    $ spark-3.4.1-bin-hadoop3/bin/spark-shell --jars neo4j-connector-apache-spark_2.12-5.1.0_for_spark_3.jar
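
If you are unsure which Scala version your Spark distribution was built with (needed for step 2), the shell's version banner reports it. The version numbers below are illustrative:

$ spark-3.4.1-bin-hadoop3/bin/spark-shell --version
...
Using Scala version 2.12.17, OpenJDK 64-Bit Server VM, 17.0.8
...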

Running code in Spark

You can paste multi-line Scala code into the Spark shell by activating paste mode with the :paste command.
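
For example, a typical paste-mode session looks like this (the messages are printed by the shell itself):

scala> :paste
// Entering paste mode (ctrl-D to finish)

<your Scala code here>

// Exiting paste mode, now interpreting.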

Create a Spark session and set the Aura connection credentials:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

// Required for the toDS() conversion below (imported automatically in the Spark shell)
import spark.implicits._

// Replace with the actual connection URI and credentials
val url = "neo4j+s://xxxxxxxx.databases.neo4j.io"
val username = "neo4j"
val password = ""

Then, define a Person case class and create a Spark Dataset with some example data:

case class Person(name: String, surname: String, age: Int)

// Create example Dataset
val ds = Seq(
    Person("John", "Doe", 42),
    Person("Jane", "Doe", 40)
).toDS()

Write the data to Aura:

// Write to Neo4j: with SaveMode.Overwrite, nodes are merged on the
// node.keys properties, so re-running the write does not create duplicates
ds.write.format("org.neo4j.spark.DataSource")
    .mode(SaveMode.Overwrite)
    .option("url", url)
    .option("authentication.basic.username", username)
    .option("authentication.basic.password", password)
    .option("labels", ":Person")
    .option("node.keys", "name,surname")
    .save()

You can then query and visualize the data in the Neo4j Browser, for example with a Cypher query such as MATCH (n:Person) RETURN n.

You can also read the data back from Aura:

// Read all nodes with the Person label from Neo4j
val data = spark.read.format("org.neo4j.spark.DataSource")
    .option("url", url)
    .option("authentication.basic.username", username)
    .option("authentication.basic.password", password)
    .option("labels", "Person")
    .load()

// Visualize the data as a table
data.show()
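
Instead of reading whole labels, the connector can also run a Cypher query directly via the query option. A minimal sketch, with an illustrative query and column aliases:

// Read the result of a Cypher query from Neo4j
val names = spark.read.format("org.neo4j.spark.DataSource")
    .option("url", url)
    .option("authentication.basic.username", username)
    .option("authentication.basic.password", password)
    .option("query", "MATCH (n:Person) RETURN n.name AS name, n.age AS age")
    .load()

names.show()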

For further information on how to use the connector, see the Neo4j Spark Connector documentation.