Read nodes
All the examples in this page assume that the |
With the labels
option, the connector reads data from the Neo4j database as a set of nodes with the given labels.
The connector builds a MATCH
Cypher® query that uses SKIP
and LIMIT
to read a batch of rows.
The code from the example reads the :Person
nodes with their node properties into a DataFrame.
val df = spark.read
.format("org.neo4j.spark.DataSource")
.option("labels", ":Person")
.load()
df.show()
df = (
spark.read.format("org.neo4j.spark.DataSource")
.option("labels", ":Person")
.load()
)
df.show()
Equivalent Cypher query
MATCH (n:Person)
RETURN
id(n) AS `<id>`,
labels(n) AS `<labels>`,
n.surname AS surname,
n.name AS name,
n.age AS age
...
The query may include SKIP
and LIMIT
clauses depending on the level of parallelism.
<id> | <labels> | surname | name | age |
---|---|---|---|---|
0 |
[Person] |
Doe |
Jane |
40 |
39 |
[Person] |
Doe |
John |
42 |
You can read nodes with multiple labels using the colon as a separator. The colon before the first label is optional.
val df = spark.read
.format("org.neo4j.spark.DataSource")
// ":Person:Employee" and "Person:Employee"
// are equivalent
.option("labels", ":Person:Employee")
.load()
df.show()
df = (
spark.read.format("org.neo4j.spark.DataSource")
# ":Person:Employee" and "Person:Employee"
# are equivalent
.option("labels", ":Person:Employee").load()
)
df.show()
DataFrame columns
The resulting DataFrame contains as many columns as the number of node properties plus two additional columns:
-
<id>
: internal Neo4j ID -
<labels>
: list of labels for each node
The schema for the node property columns is inferred as explained in Schema inference.