Read relationships

All the examples on this page assume that the SparkSession has been initialized with the appropriate authentication options. See the Quickstart examples for more details.
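For reference, a minimal setup might look like the following sketch. It assumes connector version 5.x, which accepts session-level options prefixed with neo4j.; the URL and credentials are placeholders for your own deployment.

import org.apache.spark.sql.SparkSession

// Placeholder connection details: replace the URL and the
// basic-auth credentials with those of your own deployment.
val spark = SparkSession.builder()
  .config("neo4j.url", "neo4j://localhost:7687")
  .config("neo4j.authentication.basic.username", "neo4j")
  .config("neo4j.authentication.basic.password", "password")
  .getOrCreate()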

You can read a relationship and its source and target nodes by specifying the relationship type, the source node labels, and the target node labels.

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()
Equivalent Cypher query
MATCH (source:Customer)
MATCH (target:Product)
MATCH (source)-[rel:BOUGHT]->(target)
RETURN ...

The exact RETURN clause depends on the value of the relationship.nodes.map option.

DataFrame columns

When reading data with this method, the DataFrame contains the following columns:

  • <rel.id>: internal Neo4j ID

  • <rel.type>: relationship type

  • rel.[property name]: relationship properties

Additional columns are added depending on the value of the relationship.nodes.map option:

relationship.nodes.map set to false (default):

  • <source.id>: internal Neo4j ID of source node

  • <source.labels>: list of labels for source node

  • source.[property name]: source node properties

  • <target.id>: internal Neo4j ID of target node

  • <target.labels>: list of labels for target node

  • target.[property name]: target node properties

relationship.nodes.map set to true:

  • source: map of source node properties

  • target: map of target node properties
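Since most of these column names contain a literal dot, wrap them in backticks when referencing them by name in Spark SQL expressions, so that the dot is not parsed as nested-field access. A small sketch, using column names from the example dataset below:

// Backticks keep `source.name` and `rel.quantity` from being
// parsed as struct field access.
df.select("`source.name`", "`rel.quantity`").show()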

Examples:

relationship.nodes.map set to false
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  // This option can be omitted, since `false` is the default
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()
Table 1. Result
| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
| 3189     | BOUGHT     | 1100        | [Customer]      | Doe            | John        | 1         | 1040        | [Product]       | Product 1   | ABC100    | 200          |
| 3190     | BOUGHT     | 1099        | [Customer]      | Doe            | Jane        | 2         | 1039        | [Product]       | Product 2   | ABC200    | 100          |

relationship.nodes.map set to true
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Pass `false` to show() to print full column values without truncation
df.show(false)
Table 2. Result
| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
| 3189 | BOUGHT | {surname: "Doe", name: "John", id: 1, <labels>: ["Customer"], <id>: 1100} | {name: "Product 1", <labels>: ["Product"], <id>: 1040} | ABC100 | 200 |
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |
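With relationship.nodes.map set to true, individual node properties live inside the <source> and <target> map columns. As a sketch, you can extract single entries with the standard Spark map-access functions:

import org.apache.spark.sql.functions.col

// Pull individual entries out of the node property maps;
// the keys match the node property names shown above.
df.select(
    col("<source>").getItem("name").as("customer"),
    col("<target>").getItem("name").as("product")
  )
  .show()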

The schema for the node and relationship property columns is inferred as explained in Schema inference.
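For instance, calling printSchema() on the DataFrame shows the inferred type of each column; the exact output depends on your data:

// Prints the inferred schema; output is illustrative, for example:
// root
//  |-- <rel.id>: long (nullable = false)
//  |-- <rel.type>: string (nullable = false)
//  ...
df.printSchema()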

Filtering

You can use the where and filter functions in Spark to filter on properties of the relationship, the source node, or the target node. The correct filter format depends on the value of the relationship.nodes.map option.

relationship.nodes.map set to false (default):

  • `source.[property]` for the source node properties

  • `rel.[property]` for the relationship property

  • `target.[property]` for the target node property

relationship.nodes.map set to true:

  • `<source>`.`[property]` for the source node map properties

  • `<rel>`.`[property]` for the relationship map property

  • `<target>`.`[property]` for the target node map property

Examples:

relationship.nodes.map set to false
val df = spark.read.format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.where("`source.id` > 1").show()
Table 3. Result
| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
| 3190     | BOUGHT     | 1099        | [Customer]      | Doe            | Jane        | 2         | 1039        | [Product]       | Product 2   | ABC200    | 100          |

relationship.nodes.map set to true
val df = spark.read.format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Pass `false` to show() to print full column values without truncation
df.where("`<source>`.`id` > 1").show(false)
Table 4. Result
| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |
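If you prefer the Column API over SQL expression strings, the same filters can be written as in the following sketch (equivalent to the where calls above, each applied to the DataFrame read with the corresponding option):

import org.apache.spark.sql.functions.col

// relationship.nodes.map = false: escape the literal dot with backticks
df.filter(col("`source.id`") > 1).show()

// relationship.nodes.map = true: extract the value from the map column
df.filter(col("<source>").getItem("id") > 1).show(false)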