Write nodes

All the examples on this page assume that the SparkSession has been initialized with the appropriate authentication options. See the Quickstart examples for more details.

With the labels option, the connector writes a DataFrame to the Neo4j database as a set of nodes with the given labels.

The connector builds a CREATE or a MERGE Cypher® query, depending on the save mode (CREATE for Append, MERGE for Overwrite). The query uses the UNWIND clause to write the rows in batches: each batch is passed as an events list whose size is set by the batch.size option.

The following example creates new nodes with the :Person label.

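Example (Scala)
import org.apache.spark.sql.SaveMode
// Required for .toDF() on a local collection (spark is the active SparkSession)
import spark.implicits._

case class Person(name: String, surname: String, age: Int)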

val peopleDF = List(
    Person("John", "Doe", 42),
    Person("Jane", "Doe", 40)
).toDF()

peopleDF.write
    .format("org.neo4j.spark.DataSource")
    .mode(SaveMode.Append)
    .option("labels", ":Person")
    .save()
Example (Python)
# Create example DataFrame
peopleDF = spark.createDataFrame(
    [
        {"name": "John", "surname": "Doe", "age": 42},
        {"name": "Jane", "surname": "Doe", "age": 40},
    ]
)

(
    peopleDF.write.format("org.neo4j.spark.DataSource")
    .mode("Append")
    .option("labels", ":Person")
    .save()
)
Equivalent Cypher query
UNWIND $events AS event
CREATE (n:Person)
SET n += event.properties
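To illustrate the batching described above, here is a minimal Python sketch (not the connector's code) that splits rows into event lists of at most batch.size elements, one list per query execution. The helper name to_event_batches and the per-event shape {"properties": row} are assumptions, inferred from the event.properties reference in the Cypher query; the third row is illustrative data only.

```python
from typing import Any, Dict, Iterator, List


# Hypothetical helper: illustrates the batching idea only; it is not the
# connector's actual implementation.
def to_event_batches(
    rows: List[Dict[str, Any]], batch_size: int
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of events, wrapping each row under a "properties" key."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield [{"properties": dict(row)} for row in chunk]


# The query from this page; each batch would be bound to its $events parameter.
CREATE_QUERY = """UNWIND $events AS event
CREATE (n:Person)
SET n += event.properties"""

rows = [
    {"name": "John", "surname": "Doe", "age": 42},
    {"name": "Jane", "surname": "Doe", "age": 40},
    {"name": "Jim", "surname": "Roe", "age": 35},  # illustrative extra row
]

for events in to_event_batches(rows, batch_size=2):
    # One query execution per batch: 2 events, then 1 event.
    print(len(events), [e["properties"]["name"] for e in events])
```

With a batch size of 2, the three rows produce two executions of the same query: one with two events and one with a single event.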

You can write nodes with multiple labels using the colon as a separator. The colon before the first label is optional.

Example (Scala)
peopleDF.write
    .format("org.neo4j.spark.DataSource")
    .mode(SaveMode.Append)
    // ":Person:Employee" and "Person:Employee"
    // are equivalent
    .option("labels", ":Person:Employee")
    .save()
Example (Python)
(
    peopleDF.write.format("org.neo4j.spark.DataSource")
    .mode("Append")
    # ":Person:Employee" and "Person:Employee"
    # are equivalent
    .option("labels", ":Person:Employee")
    .save()
)
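The equivalence between ":Person:Employee" and "Person:Employee" can be sketched as a simple parsing step. The parse_labels and label_pattern helpers below are hypothetical and only approximate what the connector does; in particular, they assume empty segments are discarded and they ignore label escaping with backticks.

```python
from typing import List


# Hypothetical helper: splits a labels option value on ':'.
def parse_labels(option: str) -> List[str]:
    """Return the labels in order, dropping empty parts.

    A leading colon produces an empty first part, which is discarded,
    so ":Person:Employee" and "Person:Employee" give the same result.
    """
    return [label for label in option.split(":") if label]


# Hypothetical helper: renders labels as a Cypher label expression.
def label_pattern(labels: List[str]) -> str:
    """Join labels as ':A:B', the form used inside a node pattern."""
    return "".join(f":{label}" for label in labels)


for value in (":Person:Employee", "Person:Employee"):
    labels = parse_labels(value)
    print(value, "->", labels, "->", f"CREATE (n{label_pattern(labels)})")
```

Both option values print the same node pattern, CREATE (n:Person:Employee).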

Node keys

With the Overwrite mode, you must specify the DataFrame columns to use as keys to match the nodes. The node.keys option takes a comma-separated list of key:value pairs, where the key is the DataFrame column name and the value is the node property name.

If key and value are the same, you can omit the value. For example, "name:name,surname:surname" is equivalent to "name,surname".

The same Scala example, using the Overwrite save mode:

Overwrite example (Scala)
peopleDF.write
    .format("org.neo4j.spark.DataSource")
    .mode(SaveMode.Overwrite)
    .option("labels", ":Person")
    .option("node.keys", "name,surname")
    .save()
Equivalent Cypher query
UNWIND $events AS event
MERGE (n:Person {
  name: event.keys.name,
  surname: event.keys.surname
})
SET n += event.properties
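The node.keys mapping and the MERGE pattern derived from it can be sketched in Python as follows. The parse_node_keys and merge_pattern helpers are hypothetical, not part of the connector. The sketch also assumes that event.keys is indexed by node property name, which the query above cannot confirm because the column and property names coincide there.

```python
from typing import Dict


# Hypothetical helper: maps DataFrame column -> node property.
def parse_node_keys(option: str) -> Dict[str, str]:
    """Parse a comma-separated list of column[:property] pairs.

    When the property part is omitted, it defaults to the column name,
    so "name:name,surname:surname" and "name,surname" are equivalent.
    """
    mapping: Dict[str, str] = {}
    for pair in option.split(","):
        pair = pair.strip()
        if not pair:
            continue
        column, sep, prop = pair.partition(":")
        mapping[column] = prop if sep and prop else column
    return mapping


# Hypothetical helper: builds the MERGE clause matching on event.keys.
def merge_pattern(label: str, keys: Dict[str, str]) -> str:
    """Render 'MERGE (n:Label {prop: event.keys.prop, ...})'."""
    props = ", ".join(f"{prop}: event.keys.{prop}" for prop in keys.values())
    return f"MERGE (n:{label} {{{props}}})"


keys = parse_node_keys("name,surname")
print(keys)
print(merge_pattern("Person", keys))
```

Renaming works the same way: under these assumptions, "id:personId" would match the DataFrame column id against the node property personId.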

Because Spark jobs write concurrently, MERGE alone cannot prevent duplicate nodes in Overwrite mode. To guarantee node uniqueness, create a property uniqueness constraint on the properties listed in node.keys.