Define a schema
There are two alternatives to using schema inference:
Use the string strategy

When you set the schema.strategy option to string, every DataFrame column is assigned the String type.
string strategy example

val df = spark.read
  .format("org.neo4j.spark.DataSource")
  .option("schema.strategy", "string")
  .option("query", "MATCH (n:Person) WITH n LIMIT 2 RETURN id(n) as id, n.age as age")
  .load()
This strategy is useful when the type of a property may differ from node to node, for example when a property holds numeric values on some nodes and string values on others.
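As a quick check, printing the schema of the DataFrame above confirms that every column comes back as a string, regardless of the underlying property types:

Schema check example

// With schema.strategy set to string, all columns are StringType.
df.printSchema()
// root
//  |-- id: string (nullable = true)
//  |-- age: string (nullable = true)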
Define a custom schema
If you need more control, you can provide your own schema using the .schema() method.
Custom schema example

import org.apache.spark.sql.types.{DataTypes, StructType, StructField}

val userSchema = StructType(
  Array(
    StructField("id", DataTypes.StringType),
    StructField("age", DataTypes.StringType)
  )
)

spark.read.format("org.neo4j.spark.DataSource")
  .schema(userSchema)
  .option("query", "MATCH (n:Person) WITH n LIMIT 2 RETURN id(n) as id, n.age as age")
  .load()
The user-defined schema only works if all the values of a property can be converted to the desired type. If you need to convert only some of the values, use the string strategy and some custom Scala or Python code.
Type conversion example

import scala.jdk.CollectionConverters._

val result = df.collectAsList()
for (row <- result.asScala) {
  // if <some specific condition> then convert like below;
  // `age` is the second column returned by the query, hence index 1
  println(s"""Age is: ${row.getString(1).toLong}""")
}
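If the condition can be expressed on columns, the conversion can also stay inside Spark instead of collecting rows to the driver. The following is a minimal sketch, not from the connector itself, assuming the condition is simply "the value looks like an integer"; rows that do not match get null instead of failing the conversion:

Column-based conversion example

import org.apache.spark.sql.functions.{col, when}

// Convert `age` to Long only where the string is all digits;
// other rows get null rather than causing a cast failure.
val converted = df.withColumn(
  "age",
  when(col("age").rlike("^[0-9]+$"), col("age").cast("long"))
)
converted.show()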