Databricks quickstart
This page includes instructions for using a third-party platform, which may be subject to changes beyond our control. In case of doubt, refer to the third-party platform's documentation.
Prerequisites
- A Databricks workspace must be available at a URL like https://dbc-xxxxxxxx-yyyy.cloud.databricks.com.
Set up a compute cluster
- Create a compute cluster with Single user access mode, Unrestricted policy, and your preferred Scala runtime. Shared access modes are not currently supported.
- Once the cluster is available, open its page and select the Libraries tab.
- Select Install new and choose Maven as the library source.
- Select Search Packages, search for neo4j-spark-connector on Spark Packages, then select it. Make sure to pick the connector version whose Scala version matches the cluster's runtime.
- Select Install.
Unity Catalog
Neo4j supports the Unity Catalog in Single user access mode only.
Refer to the Databricks documentation for further information.
Session configuration
You can set the Spark configuration on the cluster that runs your notebooks as follows:
- Open the cluster configuration page.
- Select the Advanced Options toggle under Configuration.
- Select the Spark tab.
For example, you can add Neo4j Bearer authentication configuration in the text area as follows:
neo4j.url neo4j://<host>:<port>
neo4j.authentication.type bearer
neo4j.authentication.bearer.token <token>
Databricks advises against storing secrets such as passwords and tokens in plain text. Use secrets instead.
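For example, assuming a neo4j secret scope containing a token secret (hypothetical names; the next sections show how to create and use secrets), Databricks can substitute a secret into a Spark configuration property with the {{secrets/<scope>/<key>}} reference syntax. Whether this substitution applies to the neo4j.* properties is an assumption here; check the Databricks documentation on referencing secrets in Spark configuration for the exact requirements.
neo4j.url neo4j://<host>:<port>
neo4j.authentication.type bearer
neo4j.authentication.bearer.token {{secrets/neo4j/token}}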
Authentication methods
All the authentication methods available in the Neo4j Java Driver (version 4.4 and higher) are supported.
See the Neo4j driver options for more details on authentication configuration.
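As a sketch, the bearer configuration from the previous section can also be set programmatically on the Spark session, for example with a token read from a secret scope (the neo4j scope and token key are hypothetical; see the next section for setting up secrets):
from pyspark.sql import SparkSession

# Hypothetical scope and key; see the next section for setting up secrets
token = dbutils.secrets.get(scope="neo4j", key="token")

spark = (
    SparkSession.builder
    .config("neo4j.url", "neo4j+s://xxxxxxxx.databases.neo4j.io")
    .config("neo4j.authentication.type", "bearer")
    .config("neo4j.authentication.bearer.token", token)
    .getOrCreate()
)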
Set up secrets
You can add secrets to your environment using the Secrets API via the Databricks CLI. If you use Databricks Runtime 15.0 or above, you can add secrets directly from a notebook terminal.
After setting up secrets, you can access them from a Databricks notebook using the Databricks Utilities (dbutils).
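If you prefer to stay in a notebook, a minimal sketch using the Databricks SDK for Python achieves the same result as the CLI (this assumes the databricks-sdk package is available and you are authenticated to the workspace):
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Create a `neo4j` scope and store the Neo4j credentials in it
# (scope and key names are examples)
w.secrets.create_scope(scope="neo4j")
w.secrets.put_secret(scope="neo4j", key="username", string_value="neo4j")
w.secrets.put_secret(scope="neo4j", key="password", string_value="<password>")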
For example, given a neo4j scope and the username and password secrets for basic authentication, you can do the following in a Python notebook:
from pyspark.sql import SparkSession
url = "neo4j+s://xxxxxxxx.databases.neo4j.io"
username = dbutils.secrets.get(scope="neo4j", key="username")
password = dbutils.secrets.get(scope="neo4j", key="password")
dbname = "neo4j"
spark = (
SparkSession.builder.config("neo4j.url", url)
.config("neo4j.authentication.basic.username", username)
.config("neo4j.authentication.basic.password", password)
.config("neo4j.database", dbname)
.getOrCreate()
)
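Because the connection options are set on the Spark session, subsequent reads and writes through the connector can omit them, as the examples in the next sections do. For instance, a quick check that reads nodes with a :User label (created further below):
# The URL, credentials, and database come from the session configuration
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("labels", ":User")
    .load()
)
df.show()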
Delta tables
You can use the Spark connector to read from and write to Delta tables from a Databricks notebook. This does not require any additional setup.
Basic roundtrip
The following example shows how to read a Delta table, write it as nodes and node properties to Neo4j, read the corresponding nodes and node properties from Neo4j, and write them to a new Delta table.
Content of the Delta table
The example assumes that a Delta table users_example exists and contains the following data:
name | surname | age
---|---|---
John | Doe | 42
Jane | Doe | 40
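If the table does not exist yet, you can create it from the data above, for example:
# Create the example Delta table (only needed if it does not exist yet)
(
    spark.createDataFrame(
        [("John", "Doe", 42), ("Jane", "Doe", 40)],
        ["name", "surname", "age"],
    )
    .write.saveAsTable("users_example")
)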
# Read the Delta table
tableDF = spark.read.table("users_example")
# Write the DataFrame to Neo4j as nodes
(
tableDF
.write.format("org.neo4j.spark.DataSource")
.mode("Append")
.option("labels", ":User")
.save()
)
# Read the nodes with `:User` label from Neo4j
neoDF = (
spark.read.format("org.neo4j.spark.DataSource")
.option("labels", ":User")
.load()
)
# Write the DataFrame to another Delta table,
# which will contain the additional columns
# `<id>` and `<labels>`
neoDF.write.saveAsTable("users_new_example")
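To verify the roundtrip, you can display the new table; the <id> and <labels> columns added by the connector appear alongside the original ones:
# Inspect the result of the roundtrip
spark.read.table("users_new_example").show()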
Delta tables to Neo4j nodes and relationships
To avoid deadlocks, always use a single partition (with coalesce(1), as shown in the example below) when writing relationships to Neo4j.
The following example shows how to read a Delta table and write its data as both nodes and relationships to Neo4j.
See the Writing page for details on using the Overwrite mode and on writing nodes only.
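As a hedged sketch of the Overwrite case for nodes only, reusing the tableDF DataFrame from the previous example (node.keys lists the properties used to match existing nodes; refer to the Writing page for full details):
# Update existing `:User` nodes instead of creating new ones
(
    tableDF
    .write.format("org.neo4j.spark.DataSource")
    .mode("Overwrite")
    .option("labels", ":User")
    # Properties used to match existing nodes
    .option("node.keys", "name,surname")
    .save()
)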
Content of the Delta table
The example assumes that a Delta table customers_products_example exists and contains the following data:
name | surname | customerID | product | quantity | order
---|---|---|---|---|---
John | Doe | 1 | Product 1 | 200 | ABC100
Jane | Doe | 2 | Product 2 | 100 | ABC200
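As before, if the table does not exist yet, you can create it from the data above:
# Create the example Delta table (only needed if it does not exist yet)
(
    spark.createDataFrame(
        [
            ("John", "Doe", 1, "Product 1", 200, "ABC100"),
            ("Jane", "Doe", 2, "Product 2", 100, "ABC200"),
        ],
        ["name", "surname", "customerID", "product", "quantity", "order"],
    )
    .write.saveAsTable("customers_products_example")
)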
# Read the Delta table into a DataFrame
relDF = spark.read.table("customers_products_example")
# Write the table to Neo4j using the
# `relationship` write option
(
relDF
# Use a single partition
.coalesce(1)
.write
# Create new relationships
.mode("Append")
.format("org.neo4j.spark.DataSource")
# Assign a type to the relationships
.option("relationship", "BOUGHT")
# Use `keys` strategy
.option("relationship.save.strategy", "keys")
# Create source nodes and assign them a label
.option("relationship.source.save.mode", "Append")
.option("relationship.source.labels", ":Customer")
# Map DataFrame columns to source node properties
.option("relationship.source.node.properties", "name,surname,customerID:id")
# Create target nodes and assign them a label
.option("relationship.target.save.mode", "Append")
.option("relationship.target.labels", ":Product")
# Map DataFrame columns to target node properties
.option("relationship.target.node.properties", "product:name")
# Map DataFrame columns to relationship properties
.option("relationship.properties", "quantity,order")
.save()
)
Neo4j nodes to Delta tables
The following example shows how to read nodes from Neo4j and write them to a Delta table. See the Reading page for details on reading relationships.
# Read the nodes with `:Customer` label from Neo4j
df = (
spark.read.format("org.neo4j.spark.DataSource")
.option("labels", ":Customer")
.load()
)
# Write the DataFrame to another Delta table
df.write.saveAsTable("customers_status_example")
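As a complement, here is a minimal sketch of reading the BOUGHT relationships written earlier back into a DataFrame; the read options mirror the write options used above, and the Reading page documents the full list:
# Read the `BOUGHT` relationships together with their
# source and target nodes from Neo4j
relReadDF = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)
relReadDF.show()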