Read relationships
You can read a relationship and its source and target nodes by specifying the relationship type, the source node labels, and the target node labels.
Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
```
Equivalent Cypher query

```cypher
MATCH (source:Customer)
MATCH (target:Product)
MATCH (source)-[rel:BOUGHT]->(target)
RETURN ...
```
The exact RETURN clause depends on the value of the relationship.nodes.map option.
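The connector builds its projection internally; purely as a hedged sketch (not the connector's literal generated query), the two shapes of the `RETURN` clause might look like:

```cypher
// Sketch only. With relationship.nodes.map = false, node fields are flattened:
// RETURN id(rel), type(rel), rel.order, rel.quantity,
//        id(source), labels(source), source.name, source.surname, source.id,
//        id(target), labels(target), target.name
// With relationship.nodes.map = true, whole nodes come back as maps:
MATCH (source:Customer)
MATCH (target:Product)
MATCH (source)-[rel:BOUGHT]->(target)
RETURN id(rel), type(rel), rel.order, rel.quantity, source, target
```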
DataFrame columns
When reading data with this method, the DataFrame contains the following columns:
- `<rel.id>`: the relationship's internal Neo4j ID
- `<rel.type>`: the relationship type
- `rel.[property name]`: one column per relationship property
Additional columns are added depending on the value of the relationship.nodes.map option:
| `relationship.nodes.map` set to `false` (default) | `relationship.nodes.map` set to `true` |
|---|---|
| `<source.id>`, `<source.labels>`, and `source.[property name]` columns for the source node, plus `<target.id>`, `<target.labels>`, and `target.[property name]` columns for the target node | A `<source>` column and a `<target>` column, each containing a map with the node's `<id>`, `<labels>`, and properties |
`relationship.nodes.map` set to `false`

Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  // Can be omitted, since `false` is the default
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    # Can be omitted, since `false` is the default
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.show()
```
| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3189 | BOUGHT | 1100 | [Customer] | Doe | John | 1 | 1040 | [Product] | Product 1 | ABC100 | 200 |
| 3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |
`relationship.nodes.map` set to `true`

Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Use `false` to print the whole DataFrame
df.show(false)
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

# Use `truncate=False` to print the whole DataFrame
df.show(truncate=False)
```
| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
|---|---|---|---|---|---|
| 3189 | BOUGHT | {surname: "Doe", name: "John", id: 1, <labels>: ["Customer"], <id>: 1100} | {name: "Product 1", <labels>: ["Product"], <id>: 1040} | ABC100 | 200 |
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |
The schema for the node and relationship property columns is inferred as explained in Schema inference.
Filtering
You can use Spark's `where` and `filter` functions to filter on properties of the relationship, the source node, or the target node.
The correct filter format depends on the value of the `relationship.nodes.map` option.
| `relationship.nodes.map` set to `false` (default) | `relationship.nodes.map` set to `true` |
|---|---|
| Filter on the flattened columns, quoting names in backticks (for example `` `source.id` > 1 ``) | Filter on the map entries with backtick-quoted dotted access (for example `` `<source>`.`id` > 1 ``) |
Examples:
`relationship.nodes.map` set to `false`

Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.where("`source.id` > 1").show()
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "false")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

df.where("`source.id` > 1").show()
```
| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3190 | BOUGHT | 1099 | [Customer] | Doe | Jane | 2 | 1039 | [Product] | Product 2 | ABC200 | 100 |
`relationship.nodes.map` set to `true`

Scala:

```scala
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Use `false` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(false)
```

Python:

```python
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("relationship", "BOUGHT")
    .option("relationship.nodes.map", "true")
    .option("relationship.source.labels", ":Customer")
    .option("relationship.target.labels", ":Product")
    .load()
)

# Use `truncate=False` to print the whole DataFrame
df.where("`<source>`.`id` > 1").show(truncate=False)
```
| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
|---|---|---|---|---|---|
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |