Using the Neo4j Connector for Apache Spark
This tutorial shows how to use the Neo4j Connector for Apache Spark to write data to and read data from an Aura instance.
Setup
- Download Apache Spark.
  Example: Spark 3.4.1, pre-built for Apache Hadoop 3.3 and later with Scala 2.12.
- Download the Neo4j Connector JAR file, making sure to match both the Spark version and the Scala version.
  Example: Neo4j Connector 5.1.0, built for Spark 3.x with Scala 2.12.
- Decompress the Spark file and launch the Spark shell as in the following example:
$ spark-3.4.1-bin-hadoop3/bin/spark-shell --jars neo4j-connector-apache-spark_2.12-5.1.0_for_spark_3.jar
Running code in Apache Spark
You can copy-paste Scala code into the Spark shell by activating paste mode with the :paste command.
Create a Spark session and set the Aura connection credentials:
import org.apache.spark.sql.{SaveMode, SparkSession}
val spark = SparkSession.builder().getOrCreate()
// Replace with the actual connection URI and credentials
val url = "neo4j+s://xxxxxxxx.databases.neo4j.io"
val username = "neo4j"
val password = ""
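Hard-coded credentials are easy to leak through shell history or shared scripts. As a minimal sketch, the same three values could be read from environment variables instead. The variable names NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD are assumptions chosen for illustration; the connector itself does not read them.

```scala
// Assumed environment variable names; illustrative only, not defined by the connector.
// Each lookup falls back to the placeholder value when the variable is unset.
val url = sys.env.getOrElse("NEO4J_URI", "neo4j+s://xxxxxxxx.databases.neo4j.io")
val username = sys.env.getOrElse("NEO4J_USERNAME", "neo4j")
val password = sys.env.getOrElse("NEO4J_PASSWORD", "")
```

Export the variables in the terminal before launching the Spark shell so that the session inherits them.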
Then, create the Person class and a Spark Dataset with some example data:
case class Person(name: String, surname: String, age: Int)
// Create example Dataset
val ds = Seq(
Person("John", "Doe", 42),
Person("Jane", "Doe", 40)
).toDS()
Write the data to Aura:
// Write to Neo4j
ds.write.format("org.neo4j.spark.DataSource")
.mode(SaveMode.Overwrite)
.option("url", url)
.option("authentication.basic.username", username)
.option("authentication.basic.password", password)
.option("labels", ":Person")
.option("node.keys", "name,surname")
.save()
You can then query and visualize the data using the Neo4j Browser.
You can also read the data back from Aura:
// Read from Neo4j
val data = spark.read.format("org.neo4j.spark.DataSource")
.option("url", url)
.option("authentication.basic.username", username)
.option("authentication.basic.password", password)
.option("labels", "Person")
.load()
// Visualize the data as a table
data.show()
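Instead of loading every node with a given label, the connector also accepts a Cypher query through its query option. The following is a minimal sketch, assuming the spark session, the url, username, and password values, and the Person nodes from the earlier steps. The age filter and the column aliases are arbitrary choices for illustration.

```scala
// Assumes `spark`, `url`, `username`, and `password` are already defined in the shell session.
// Illustrative Cypher query: the filter and the column aliases are arbitrary.
val query = "MATCH (p:Person) WHERE p.age > 40 RETURN p.name AS name, p.surname AS surname, p.age AS age"

// Read only the rows returned by the query
val olderPeople = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", url)
  .option("authentication.basic.username", username)
  .option("authentication.basic.password", password)
  .option("query", query)
  .load()

// Visualize the filtered rows as a table
olderPeople.show()
```

Because the filtering happens in the Cypher query, only the matching rows are transferred to Spark. As with the other snippets, enter this in paste mode so that the lines beginning with a dot are parsed as one expression.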
For further information on how to use the connector, read the Neo4j Spark Connector docs.