Reading from Neo4j

Neo4j Connector for Apache Spark allows you to read data from a Neo4j instance in three different ways:

  • By node labels

  • By relationship type

  • By Cypher® query

Getting started

Reading all the nodes of label Person from your local Neo4j instance is as simple as this:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("labels", "Person")
  .load()
  .show()
Table 1. Result of the above code

<id> | <labels> | name | age
0    | [Person] | John | 32

Neo4j read options

Table 2. List of available read options (setting name, description, default value, required)

query
  Cypher query to read the data.
  Default: (none). Required: Yes*

labels
  List of node labels separated by colon. The first label is the primary label.
  Default: (none). Required: Yes*

relationship
  Type of the relationship to read.
  Default: (none). Required: Yes*

schema.flatten.limit
  Number of records used to create the schema (only if APOC is not installed, or for custom Cypher queries provided via the query option).
  Default: 10. Required: No

schema.strategy
  Strategy used by the connector to compute the schema definition for the Dataset. Possible values are string and sample. When set to string, all properties are coerced to String; otherwise the connector samples the Neo4j dataset.
  Default: sample. Required: No

pushdown.filters.enabled
  Enable or disable the PushdownFilters support.
  Default: true. Required: No

pushdown.columns.enabled
  Enable or disable the PushdownColumn support.
  Default: true. Required: No

pushdown.aggregate.enabled
  Enable or disable the PushdownAggregate support.
  Default: true. Required: No

pushdown.limit.enabled (since version 5.1)
  Enable or disable the PushdownLimit support.
  Default: true. Required: No

pushdown.topN.enabled (since version 5.2)
  Enable or disable the PushDownTopN support.
  Default: true. Required: No

partitions
  Defines the parallelization level while pulling data from Neo4j.
  Note: more parallelism does not necessarily mean better query performance, so tune this option according to your Neo4j installation.
  Default: 1. Required: No

Query specific options

query.count
  Used only in combination with the query option. Either a query that returns a count field, like the following:

  MATCH (p:Person)-[r:BOUGHT]->(pr:Product)
  WHERE pr.name = 'An Awesome Product'
  RETURN count(p) AS count

  or a plain number that represents the number of records returned by query. Since this value determines the volume of data pulled from Neo4j, use it carefully.
  Default: (empty). Required: No

Relationship specific options

relationship.nodes.map
  If set to true, source and target nodes are returned as Map<String, String>; otherwise the properties are flattened and every node property is returned as a column prefixed with source or target.
  Default: false. Required: No

relationship.source.labels
  List of source node labels separated by colon.
  Default: (empty). Required: Yes

relationship.target.labels
  List of target node labels separated by colon.
  Default: (empty). Required: Yes

* Only one of the query, labels, and relationship options can be specified at a time.
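As a quick illustration of how these options combine, here is a minimal sketch that reads Person nodes with four partitions and filter pushdown disabled (the values are arbitrary examples, not recommendations):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read Person nodes using 4 partitions and with filter pushdown disabled.
// Tune these options for your own Neo4j installation.
spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("labels", "Person")
  .option("partitions", "4")
  .option("pushdown.filters.enabled", "false")
  .load()
  .show()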

Read data

Reading data from a Neo4j Database can be done in three ways:

Custom Cypher query

You can specify a Cypher query in this way:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("query", "MATCH (n:Person) WITH n LIMIT 2 RETURN id(n) AS id, n.name AS name")
  .load()
  .show()
Table 3. Result of the above code

id | name
0  | John Doe
1  | Jane Doe

We recommend returning individual property fields rather than graph entity types (nodes, relationships, and paths). These map best to Spark's type system and yield the best results. So instead of writing:

MATCH (p:Person) RETURN p

write the following:

MATCH (p:Person) RETURN id(p) AS id, p.name AS name

If your query returns a graph entity, use the labels or relationship modes instead.

The structure of the Dataset returned by the query depends on the query itself. In this particular context, the connector may not be able to sample the schema from the query; in these cases, we suggest setting the schema.strategy option to string, as described in the read options table above.
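For instance, a minimal sketch of forcing the string strategy on a custom query (the query itself is only an example) could look like this:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// With schema.strategy set to string, every returned column is coerced to String
// instead of being typed by sampling the result set.
spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("query", "MATCH (n:Person) RETURN n.name AS name, n.age AS age")
  .option("schema.strategy", "string")
  .load()
  .show()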

A read query must always return some data (that is, it must contain a RETURN statement). If you call stored procedures, remember to YIELD and then RETURN the data.

Script option

The script option allows you to execute a series of preparation scripts before the Spark job execution; the result of the last query can be reused in combination with the query read mode, as follows:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("script", "RETURN 'foo' AS val")
  .option("query", "UNWIND range(1,2) as id RETURN id AS val, scriptResult[0].val AS script")
  .load()
  .show()

Before the extraction from Neo4j starts, the connector runs the content of the script option, and the result of its last query is injected into the query (available as scriptResult, as in the example above).

Table 4. Result of the above code

val | script
1   | foo
2   | foo

Schema

The first 10 results (or the number specified by the schema.flatten.limit option) are flattened, and the schema is created from those properties.

If the query returns no data, sampling is not possible. In this case, the connector creates a schema from the RETURN statement, and every column is of type String. This does not cause any problems, since the dataset is empty.

For example, suppose you have this query:

MATCH (n:NON_EXISTENT_LABEL) RETURN id(n) AS id, n.name, n.age

The created schema is the following:

Column | Type
id     | String
n.name | String
n.age  | String

The returned column order is not guaranteed to match the RETURN statement for Neo4j 3.x and Neo4j 4.0.

Starting from Neo4j 4.1, the order is the same as in the RETURN statement.

Limit the results

This connector does not permit using SKIP or LIMIT at the end of a Cypher query.
Attempts to do this result in errors, such as the message:
SKIP/LIMIT are not allowed at the end of the query.

This is not supported because, internally, the connector uses SKIP/LIMIT pagination to break read sets into multiple partitions and support partitioned reads. As a result, a user-provided SKIP/LIMIT clashes with what the connector itself adds to your query to support parallelism.

There is a workaround though: you can still accomplish the same result by using SKIP/LIMIT inside the query, rather than after the final RETURN clause of the query.

Here’s an example. This first query is rejected and fails:

MATCH (p:Person)
RETURN p.name AS name
ORDER BY name
LIMIT 10

However, you can reformulate the query to make it work:

MATCH (p:Person)
WITH p.name AS name
ORDER BY name
LIMIT 10
RETURN name

The two queries return the exact same data, but only the second one is usable with the Spark connector and partitionable, because of the WITH clause and the simple final RETURN clause. If you choose to reformulate queries to use "internal SKIP/LIMIT", pay careful attention to ordering operations to guarantee the same result set.

You can also use the query.count option rather than reformulating your query (see the query-specific options above), as in the following sketch.
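Here is a minimal sketch of that approach, assuming the count is used to size the SKIP/LIMIT pages described above (the queries themselves are only examples):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// query.count tells the connector how many records the read query returns,
// so the read can be paginated across the requested partitions.
spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("partitions", "4")
  .option("query", "MATCH (p:Person) RETURN p.name AS name")
  .option("query.count", "MATCH (p:Person) RETURN count(p) AS count")
  .load()
  .show()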

Node

You can read nodes by specifying a single label or multiple labels, like so:

Single label
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("labels", "Person")
  .load()
Multiple labels
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("labels", "Person:Customer:Confirmed")
  .load()
The label list can be specified with or without a leading colon:
Person:Customer and :Person:Customer are considered the same.

Columns

When reading data with this method, the DataFrame contains all the properties of the nodes, plus two additional columns (see the sketch after this list):

  • <id> the internal Neo4j ID

  • <labels> a list of labels for that node
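These columns can be referenced like any other DataFrame column; a minimal sketch (label and property names are only examples):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

val df = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("labels", "Person")
  .load()

// <id> and <labels> are regular columns and can be selected alongside node properties.
df.select("<id>", "<labels>", "name").show()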

Schema

If APOC is available, the schema is created with apoc.meta.nodeTypeProperties. Otherwise, the connector executes the following Cypher query:

MATCH (n:<labels>)
RETURN n
ORDER BY rand()
LIMIT <limit>

Where <labels> is the list of labels provided by the labels option and <limit> is the value of the schema.flatten.limit option. The results of this query are flattened, and the schema is created from those properties.

Example
CREATE (p1:Person {age: 31, name: 'Jane Doe'}),
    (p2:Person {name: 'John Doe', age: 33, location: null}),
    (p3:Person {age: 25, location: point({latitude: -37.659560, longitude: -68.178060})})

The following schema is created:

Field    | Type
<id>     | Int
<labels> | String[]
age      | Int
name     | String
location | Point

Relationship

To read a relationship you must specify the relationship type, the source node labels, and the target node labels.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", "Person")
  .option("relationship.target.labels", "Product")
  .load()

This creates the following Cypher query:

MATCH (source:Person)-[rel:BOUGHT]->(target:Product)
RETURN source, rel, target

Node mapping

The result format can be controlled by the relationship.nodes.map option (default is false).

When it is set to false, the source and target node properties are returned in separate columns prefixed with source. or target. (for example, source.name, target.price).

When it is set to true, the source and target node properties are returned as Map[String, String] in two columns named source and target.

Nodes map set to false
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", "Person")
  .option("relationship.target.labels", "Product")
  .load()
  .show()
Table 5. Result of the above code

<rel.id> | <rel.type> | <source.id> | <source.labels> | source.id | source.fullName | <target.id> | <target.labels> | target.name | target.id | rel.quantity
4        | BOUGHT     | 1           | [Person]        | 1         | John Doe        | 0           | [Product]       | Product 1   | 52        | 240
5        | BOUGHT     | 3           | [Person]        | 2         | Jane Doe        | 2           | [Product]       | Product 2   | 53        | 145

Nodes map set to true
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", "Person")
  .option("relationship.target.labels", "Product")
  .load()
  .show()
Table 6. Result of the above code

<rel.id> | <rel.type> | rel.quantity | <source> | <target>
4 | BOUGHT | 240 | {"fullName": "John Doe", "id": 1, "<labels>": "[Person]", "<id>": 1} | {"name": "Product 1", "id": 52, "<labels>": "[Product]", "<id>": 0}
5 | BOUGHT | 145 | {"fullName": "Jane Doe", "id": 2, "<labels>": "[Person]", "<id>": 3} | {"name": "Product 2", "id": 53, "<labels>": "[Product]", "<id>": 2}

Columns

When reading data with this method, the DataFrame contains the following columns:

  • <id> the internal Neo4j ID.

  • <relationshipType> the relationship type.

  • rel.[property name] relationship properties.

The remaining columns depend on the value of the relationship.nodes.map option (a selection sketch follows the list below).

If true:

  • source the Map<String, String> of the source node

  • target the Map<String, String> of the target node

If false:

  • <sourceId> the internal Neo4j ID of the source node

  • <sourceLabels> a list of labels for the source node

  • <targetId> the internal Neo4j ID of the target node

  • <targetLabels> a list of labels for the target node

  • source.[property name] source node properties

  • target.[property name] target node properties
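Because most of these column names contain dots or angle brackets, wrap them in backticks when referencing them. A minimal sketch with relationship.nodes.map set to false (column names are taken from the example above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

val df = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", "Person")
  .option("relationship.target.labels", "Product")
  .load()

// Backticks prevent Spark from interpreting the dots as nested field access.
df.select("`<source.id>`", "`source.fullName`", "`rel.quantity`", "`target.name`").show()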

Filtering

You can use Spark to filter properties of the relationship, the source node, or the target node. Use the correct prefix:

If relationship.nodes.map is set to false:

  • `source.[property]` for the source node properties.

  • `rel.[property]` for the relationship properties.

  • `target.[property]` for the target node properties.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

val df = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", "Person")
  .option("relationship.target.labels", "Product")
  .load()

df.where("`source.id` = 14 AND `target.id` = 16")

If relationship.nodes.map is set to true:

  • `<source>`.`[property]` for the source node map properties.

  • `<rel>`.`[property]` for the relationship map properties.

  • `<target>`.`[property]` for the target node map properties.

In this case, all the map values are strings, so the filter value must be a string too.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()

val df = spark.read.format("org.neo4j.spark.DataSource")
  .option("url", "bolt://localhost:7687")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", "Person")
  .option("relationship.target.labels", "Product")
  .load()

df.where("`<source>`.`id` = '14' AND `<target>`.`id` = '16'")

Schema

When extracting a relationship from Neo4j, the connector first invokes the apoc.meta.relTypeProperties procedure. If APOC is not installed, it executes the following Cypher query:

MATCH (source:<source_labels>)-[rel:<relationship>]->(target:<target_labels>)
RETURN rel
ORDER BY rand()
LIMIT <limit>

Where:

  • <source_labels> is the list of labels provided by the relationship.source.labels option

  • <target_labels> is the list of labels provided by the relationship.target.labels option

  • <relationship> is the relationship type provided by the relationship option

  • <limit> is the value of the schema.flatten.limit option

Performance considerations

If the schema is not specified, the Spark connector uses sampling, as explained in the Schema sections above. Since sampling is potentially an expensive operation, consider supplying your own schema, as in the sketch below.
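Here is a minimal sketch of supplying a schema through Spark's standard DataFrameReader.schema API; the field names and types are assumptions about the example Person nodes, not something the connector defines:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

val spark = SparkSession.builder().getOrCreate()

// A user-defined schema lets the connector skip the sampling step.
// The fields below are assumptions about the example Person nodes.
val personSchema = StructType(Seq(
  StructField("name", DataTypes.StringType),
  StructField("age", DataTypes.LongType)
))

spark.read.format("org.neo4j.spark.DataSource")
  .schema(personSchema)
  .option("url", "bolt://localhost:7687")
  .option("labels", "Person")
  .load()
  .show()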