Read nodes

All the examples in this page assume that the SparkSession has been initialized with the appropriate authentication options. See the Quickstart examples for more details.

With the labels option, the connector reads data from the Neo4j database as a set of nodes with the given labels.

The connector builds a MATCH Cypher® query that uses SKIP and LIMIT to read a batch of rows.

The following examples read the :Person nodes and their properties into a DataFrame.

Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("labels", ":Person")
    .load()

df.show()

Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("labels", ":Person")
    .load()
)

df.show()
Equivalent Cypher query
MATCH (n:Person)
RETURN
  id(n) AS `<id>`,
  labels(n) AS `<labels>`,
  n.surname AS surname,
  n.name AS name,
  n.age AS age
...

The query may include SKIP and LIMIT clauses depending on the level of parallelism.
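For instance, the read can be split into several batches with the connector's partitions option. The following is a minimal sketch, assuming the partitions option is available to control the parallelism level; the actual SKIP/LIMIT windows depend on the total row count:

Example (Scala)
// Sketch of a parallel read: with the partitions option, the connector
// issues one query per partition, each with its own SKIP/LIMIT window
// over the matched nodes.
val dfParallel = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("labels", ":Person")
    .option("partitions", "4")
    .load()

dfParallel.show()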

Table 1. Result
<id>  <labels>  surname  name  age
0     [Person]  Doe      Jane  40
39    [Person]  Doe      John  42

You can read nodes with multiple labels using the colon as a separator. The colon before the first label is optional.

Example (Scala)
val df = spark.read
    .format("org.neo4j.spark.DataSource")
    // ":Person:Employee" and "Person:Employee"
    // are equivalent
    .option("labels", ":Person:Employee")
    .load()

df.show()

Example (Python)
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    # ":Person:Employee" and "Person:Employee"
    # are equivalent
    .option("labels", ":Person:Employee").load()
)

df.show()

DataFrame columns

The resulting DataFrame contains one column for each node property, plus two additional columns:

  • <id>: internal Neo4j ID

  • <labels>: list of labels for each node

The schema for the node property columns is inferred as explained in Schema inference.
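
To check what the connector inferred, you can print the DataFrame schema. A minimal sketch; the exact property types depend on the values stored in your database:

Example (Scala)
// Prints the inferred schema: expect the <id> (long) and <labels>
// (array of string) columns, plus one column per node property with
// a type inferred from the stored values.
df.printSchema()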