Read nodes

All the examples on this page assume that the SparkSession has been initialized with the appropriate authentication options. See the Quickstart examples for more details.

With the labels option, the connector reads data from the Neo4j database as a set of nodes with the given labels.

The connector builds a MATCH Cypher® query that uses SKIP and LIMIT to read a batch of rows.

The following example reads the :Person nodes with their node properties into a DataFrame.

Example (Scala)

val df = spark.read
    .format("org.neo4j.spark.DataSource")
    .option("labels", ":Person")
    .load()

df.show()

Example (Python)

df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("labels", ":Person")
    .load()
)

df.show()

Equivalent Cypher query

MATCH (n:Person)
RETURN
  id(n) AS `<id>`,
  labels(n) AS `<labels>`,
  n.surname AS surname,
  n.name AS name,
  n.age AS age
...

The query may include SKIP and LIMIT clauses depending on the level of parallelism.

Table 1. Result
<id>	<labels>	surname	name	age
0	[Person]	Doe	Jane	40
39	[Person]	Doe	John	42
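
To picture how SKIP and LIMIT can divide a read into batches, here is a minimal sketch in plain Python. It is an illustration only and not the connector's actual implementation: the function names, the even split across partitions, and the simplified `RETURN n` clause are all assumptions made for this example.

```python
from __future__ import annotations


def batch_ranges(total_rows: int, partitions: int) -> list[tuple[int, int]]:
    """Split total_rows into contiguous (skip, limit) batches.

    Hypothetical helper: assumes rows are divided as evenly as possible,
    which may differ from what the connector really does.
    """
    if total_rows < 0 or partitions < 1:
        raise ValueError("total_rows must be >= 0 and partitions >= 1")
    # Ceiling division gives the size of every batch except possibly the last.
    batch_size = -(-total_rows // partitions)
    ranges = []
    skip = 0
    while skip < total_rows:
        limit = min(batch_size, total_rows - skip)
        ranges.append((skip, limit))
        skip += limit
    return ranges


def batch_query(label: str, skip: int, limit: int) -> str:
    """Build a simplified, illustrative Cypher query for one batch."""
    return f"MATCH (n:{label}) RETURN n SKIP {skip} LIMIT {limit}"


if __name__ == "__main__":
    # Print one query per batch for 10 rows read by 3 partitions.
    for skip, limit in batch_ranges(10, 3):
        print(batch_query("Person", skip, limit))
```

Under these assumptions, each batch covers a distinct range of rows, so together the batches read every node exactly once as long as the underlying data and row order do not change between queries.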

You can read nodes with multiple labels using the colon as a separator. The colon before the first label is optional.

Example (Scala)

val df = spark.read
    .format("org.neo4j.spark.DataSource")
    // ":Person:Employee" and "Person:Employee"
    // are equivalent
    .option("labels", ":Person:Employee")
    .load()

df.show()

Example (Python)

df = (
    spark.read.format("org.neo4j.spark.DataSource")
    # ":Person:Employee" and "Person:Employee"
    # are equivalent
    .option("labels", ":Person:Employee")
    .load()
)

df.show()
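
The equivalence of the two label spellings can be sketched in plain Python. This is an assumption-laden illustration, not the connector's own parsing code: `parse_labels` and `match_pattern` are hypothetical names, and the sketch does not handle labels that need backtick escaping.

```python
from __future__ import annotations


def parse_labels(value: str) -> list[str]:
    """Split a colon-separated labels string, ignoring an optional leading colon."""
    return [part.strip() for part in value.split(":") if part.strip()]


def match_pattern(value: str) -> str:
    """Build an illustrative Cypher node pattern from a labels string."""
    labels = parse_labels(value)
    if not labels:
        raise ValueError("at least one label is required")
    return "(n:" + ":".join(labels) + ")"


if __name__ == "__main__":
    # Both spellings produce the same node pattern.
    print(match_pattern(":Person:Employee"))
    print(match_pattern("Person:Employee"))
```

In this sketch, both strings resolve to the same list of labels, so the resulting pattern matches only nodes that carry every listed label.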

DataFrame columns

The resulting DataFrame contains one column for each node property, plus two additional columns:

<id>: internal Neo4j ID
<labels>: list of labels for each node

The schema for the node property columns is inferred as explained in Schema inference.