Read relationships

All the examples on this page assume that the SparkSession has been initialized with the appropriate authentication options. See the Quickstart examples for more details.
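For reference, a minimal setup might look like the following sketch. It assumes connector version 5.x, which accepts session-level options prefixed with neo4j.; the URL and credentials are placeholders for your own deployment.

import org.apache.spark.sql.SparkSession

// Placeholder connection details: replace the URL and the
// basic-auth credentials with those of your own deployment.
val spark = SparkSession.builder()
  .config("neo4j.url", "neo4j://localhost:7687")
  .config("neo4j.authentication.basic.username", "neo4j")
  .config("neo4j.authentication.basic.password", "password")
  .getOrCreate()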

You can read a relationship and its source and target nodes by specifying the relationship type, the source node labels, and the target node labels.

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()
Equivalent Cypher query
MATCH (source:Customer)
MATCH (target:Product)
MATCH (source)-[rel:BOUGHT]->(target)
RETURN ...

The exact RETURN clause depends on the value of the relationship.nodes.map option.

DataFrame columns

When reading data with this method, the DataFrame contains the following columns:

  • <rel.id>: internal Neo4j ID

  • <rel.type>: relationship type

  • rel.[property name]: relationship properties

Additional columns are added depending on the value of the relationship.nodes.map option:

relationship.nodes.map set to false (default):

  • <source.id>: internal Neo4j ID of source node

  • <source.labels>: list of labels for source node

  • source.[property name]: source node properties

  • <target.id>: internal Neo4j ID of target node

  • <target.labels>: list of labels for target node

  • target.[property name]: target node properties

relationship.nodes.map set to true:

  • source: map of source node properties

  • target: map of target node properties
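Since most of these column names contain a literal dot, wrap them in backticks when referencing them by name in Spark SQL expressions, so that the dot is not parsed as nested-field access. A small sketch, using column names from the example dataset below:

// Backticks keep `source.name` and `rel.quantity` from being
// parsed as struct field access.
df.select("`source.name`", "`rel.quantity`").show()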

Examples:

relationship.nodes.map set to false
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  // This option can be omitted, since `false` is the default
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.show()
Table 1. Result
| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
| 3189     | BOUGHT     | 1100        | [Customer]      | Doe            | John        | 1         | 1040        | [Product]       | Product 1   | ABC100    | 200          |
| 3190     | BOUGHT     | 1099        | [Customer]      | Doe            | Jane        | 2         | 1039        | [Product]       | Product 2   | ABC200    | 100          |

relationship.nodes.map set to true
val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Pass `false` to show() to print full column values without truncation
df.show(false)
Table 2. Result
| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
| 3189 | BOUGHT | {surname: "Doe", name: "John", id: 1, <labels>: ["Customer"], <id>: 1100} | {name: "Product 1", <labels>: ["Product"], <id>: 1040} | ABC100 | 200 |
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |
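With relationship.nodes.map set to true, individual node properties live inside the <source> and <target> map columns. As a sketch, you can extract single entries with the standard Spark map-access functions:

import org.apache.spark.sql.functions.col

// Pull individual entries out of the node property maps;
// the keys match the node property names shown above.
df.select(
    col("<source>").getItem("name").as("customer"),
    col("<target>").getItem("name").as("product")
  )
  .show()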

The schema for the node and relationship property columns is inferred as explained in Schema inference.
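For instance, calling printSchema() on the DataFrame shows the inferred type of each column; the exact output depends on your data:

// Prints the inferred schema; output is illustrative, for example:
// root
//  |-- <rel.id>: long (nullable = false)
//  |-- <rel.type>: string (nullable = false)
//  ...
df.printSchema()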

Filtering

You can use the where and filter functions in Spark to filter on properties of the relationship, the source node, or the target node. The correct filter format depends on the value of the relationship.nodes.map option.

relationship.nodes.map set to false (default):

  • `source.[property]` for the source node properties

  • `rel.[property]` for the relationship property

  • `target.[property]` for the target node property

relationship.nodes.map set to true:

  • `<source>`.`[property]` for the source node map properties

  • `<rel>`.`[property]` for the relationship map property

  • `<target>`.`[property]` for the target node map property

Examples:

relationship.nodes.map set to false
val df = spark.read.format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "false")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

df.where("`source.id` > 1").show()
Table 3. Result
| <rel.id> | <rel.type> | <source.id> | <source.labels> | source.surname | source.name | source.id | <target.id> | <target.labels> | target.name | rel.order | rel.quantity |
| 3190     | BOUGHT     | 1099        | [Customer]      | Doe            | Jane        | 2         | 1039        | [Product]       | Product 2   | ABC200    | 100          |

relationship.nodes.map set to true
val df = spark.read.format("org.neo4j.spark.DataSource")
  .option("relationship", "BOUGHT")
  .option("relationship.nodes.map", "true")
  .option("relationship.source.labels", ":Customer")
  .option("relationship.target.labels", ":Product")
  .load()

// Pass `false` to show() to print full column values without truncation
df.where("`<source>`.`id` > 1").show(false)
Table 4. Result
| <rel.id> | <rel.type> | <source> | <target> | rel.order | rel.quantity |
| 3190 | BOUGHT | {surname: "Doe", name: "Jane", id: 2, <labels>: ["Customer"], <id>: 1099} | {name: "Product 2", <labels>: ["Product"], <id>: 1039} | ABC200 | 100 |
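If you prefer the Column API over SQL expression strings, the same filters can be written as in the following sketch (equivalent to the where calls above, each applied to the DataFrame read with the corresponding option):

import org.apache.spark.sql.functions.col

// relationship.nodes.map = false: escape the literal dot with backticks
df.filter(col("`source.id`") > 1).show()

// relationship.nodes.map = true: extract the value from the map column
df.filter(col("<source>").getItem("id") > 1).show(false)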