Reading from Neo4j
The connector provides three data source options to read data from a Neo4j database.
| Option | Description | Value | Default |
|---|---|---|---|
| `labels` | Use this if you only need to read nodes with their properties. | Colon-separated list of node labels to read. | (empty) |
| `relationship` | Use this if you need to read relationships along with their source and target nodes. | Relationship type to read. | (empty) |
| `query` | Use this if you need more flexibility and know how to write a Cypher® query. | Cypher query with a `RETURN` clause. | (empty) |
Examples
All the examples on this page assume that the `SparkSession` is initialized with the appropriate Neo4j connection and authentication options.

You can run the write examples for each option to have some example data to read.
`labels` option

Read the `:Person` nodes.
Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("labels", ":Person")
  .load()

df.show()
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("labels", ":Person")
    .load()
)

df.show()
```
See Read nodes for more information and examples.
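Because the `labels` value is a colon-separated list, you can also restrict the read to nodes that carry several labels at once. A minimal sketch, assuming some nodes carry both `:Person` and a hypothetical `:Customer` label:

```scala
// Read only the nodes that have BOTH the :Person and :Customer labels.
// The :Customer label here is illustrative; any label combination works.
val multiLabelDf = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("labels", ":Person:Customer")
  .load()

multiLabelDf.show()
```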
`relationship` option

Read the `:BOUGHT` relationship with its source and target nodes and its properties.
Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
```
See Read relationships for more information and examples.
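By default, source and target node properties are flattened into separate DataFrame columns. A sketch of one variation: the `relationship.nodes.map` option (covered on the Read relationships page) returns each node as a single map column instead of one column per property. Treat this as illustrative rather than canonical:

```scala
// Sketch: return source and target nodes as map columns rather than
// flattened per-property columns. See the Read relationships page for
// the exact column layout this produces.
val mapDf = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

mapDf.show()
```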
`query` option
Use a Cypher query to read data.
Scala:

```scala
val readQuery = """
  MATCH (n:Person)
  RETURN id(n) AS id, n.fullName AS name
"""

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("query", readQuery)
  .load()

df.show()
```

Python:

```python
read_query = """
    MATCH (n:Person)
    RETURN id(n) AS id, n.fullName AS name
"""

df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("query", read_query)
    .load()
)

df.show()
```
See Read with a Cypher query for more information and examples.
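The `query` option is not limited to returning nodes: any Cypher read query with a `RETURN` clause works. A hedged sketch that reuses the `:Customer`/`:Product` data from the write examples and aggregates purchases per product (the `name` property on `:Product` is an assumption):

```scala
// Sketch: aggregate in Cypher and read the result as a DataFrame.
// Assumes the :Customer/:Product example data and a `name` property.
val countQuery = """
  MATCH (:Customer)-[:BOUGHT]->(p:Product)
  RETURN p.name AS product, count(*) AS purchases
"""

val countsDf = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("query", countQuery)
  .load()

countsDf.show()
```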
Type mapping
See Data type mapping for the full type mapping between Spark DataFrames and Neo4j.
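To see the Spark types the connector has chosen for a particular read, you can inspect the DataFrame schema directly; this works with any of the three options above:

```scala
// Print the Spark schema inferred for a read, e.g. the `labels` example.
df.printSchema()
```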
Performance considerations
If the schema is not specified, the Spark Connector infers it by sampling the result. Since sampling is a potentially expensive operation, consider defining a schema explicitly.
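A minimal sketch of supplying a schema up front, which skips the sampling step. It reuses the `query` example above, so the column names and types are assumptions tied to that query:

```scala
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Declaring the schema up front avoids schema inference by sampling.
// Column names and types match the `query` example above.
val schema = StructType(Seq(
  StructField("id", LongType, nullable = false),
  StructField("name", StringType, nullable = true)
))

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("query", "MATCH (n:Person) RETURN id(n) AS id, n.fullName AS name")
  .schema(schema)
  .load()

df.show()
```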