Reading from Neo4j

The connector provides three data source options to read data from a Neo4j database.

Table 1. Read options

Option | Description | Value | Default
labels | Use this if you only need to read nodes with their properties. | Colon-separated list of node labels to read. | (empty)
relationship | Use this if you need to read relationships along with their source and target nodes. | Relationship type to read. | (empty)
query | Use this if you need more flexibility and know how to write a Cypher® query. | Cypher query with a MATCH clause. | (empty)

Examples

All the examples on this page assume that the SparkSession has been initialized with the appropriate authentication options. See the Quickstart examples for more details.
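As a minimal sketch of the session setup this page assumes, the snippet below collects connection settings into a dictionary and applies them to a SparkSession builder. The configuration keys (`neo4j.url`, `neo4j.authentication.basic.username`, `neo4j.authentication.basic.password`) are recalled from the connector's Quickstart and should be checked against it. The URL, the credentials, and the helper names `neo4j_session_conf` and `build_session` are hypothetical placeholders.

```python
def neo4j_session_conf(url, username, password):
    """Return Spark config entries for basic authentication against Neo4j.

    The key names are assumptions based on the connector's Quickstart.
    """
    return {
        "neo4j.url": url,
        "neo4j.authentication.basic.username": username,
        "neo4j.authentication.basic.password": password,
    }


def build_session(conf):
    """Create (or reuse) a SparkSession with the given config entries."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Hypothetical placeholder values; replace them with your own.
conf = neo4j_session_conf("neo4j://localhost:7687", "neo4j", "password")
# spark = build_session(conf)  # needs PySpark and the connector on the classpath
```

Keeping the settings in one dictionary makes it easy to load them from environment variables or a secrets store instead of hard-coding them.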

To get some example data to read, you can first run the write examples for each option.

labels option

Read the :Person nodes.

Example (Scala)
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("labels", ":Person")
  .load()

df.show()

Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("labels", ":Person")
    .load()
)

df.show()

See Read nodes for more information and examples.
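Because the `labels` value is a colon-separated list, it can be built from a plain list of label names. The sketch below is one way to do that; `labels_option` and `read_nodes` are hypothetical helpers, `Customer` is an illustrative second label, and multiple labels are assumed to select nodes that carry all of them (check Read nodes to confirm).

```python
def labels_option(labels):
    """Join label names into the colon-separated form, e.g. ':Person:Customer'."""
    # Accept names with or without a leading colon.
    cleaned = [label.strip().lstrip(":") for label in labels]
    if not cleaned or not all(cleaned):
        raise ValueError("at least one non-empty label is required")
    return "".join(f":{label}" for label in cleaned)


def read_nodes(spark, labels):
    """Read the nodes matching the given labels into a DataFrame."""
    return (
        spark.read.format("org.neo4j.spark.DataSource")
        .option("labels", labels_option(labels))
        .load()
    )


# Usage, assuming an initialized SparkSession named `spark`:
# df = read_nodes(spark, ["Person", "Customer"])
# df.show()
```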

relationship option

Read the :BOUGHT relationship with its source and target nodes and its properties.

Example (Scala)
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()

Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()

See Read relationships for more information and examples.

query option

Use a Cypher query to read data.

Example (Scala)
val readQuery = """
  MATCH (n:Person)
  RETURN id(n) AS id, n.fullName AS name
"""

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("query", readQuery)
  .load()

df.show()

Example (Python)
read_query = """
    MATCH (n:Person)
    RETURN id(n) AS id, n.fullName AS name
"""

df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("query", read_query)
    .load()
)

df.show()

See Read with a Cypher query for more information and examples.

Type mapping

See Data type mapping for the full type mapping between Spark DataFrames and Neo4j.

Performance considerations

If no schema is specified, the Spark Connector infers one by sampling the data. Because sampling can be an expensive operation, consider defining the schema explicitly.
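A minimal sketch of defining a schema up front, reusing the Cypher query from the query option example. It relies on Spark's standard `DataFrameReader.schema`, which accepts a DDL-formatted string. That the connector then skips sampling, and that `fullName` is a string property, are assumptions here; `read_people_with_schema` is a hypothetical helper name.

```python
def read_people_with_schema(spark):
    """Read :Person nodes through a Cypher query with an explicit schema."""
    read_query = """
        MATCH (n:Person)
        RETURN id(n) AS id, n.fullName AS name
    """
    return (
        spark.read.format("org.neo4j.spark.DataSource")
        # DDL schema string; column names match the RETURN aliases.
        # Assumes `fullName` holds string values.
        .schema("id LONG, name STRING")
        .option("query", read_query)
        .load()
    )


# Usage, assuming an initialized SparkSession named `spark`:
# df = read_people_with_schema(spark)
# df.printSchema()
```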