Load / Import Apache Parquet

Library Requirements

The Apache Parquet procedures have dependencies on a client library that is not included in the APOC Extended library.

These dependencies are included in apoc-hadoop-dependencies-5.20.0-all.jar, which can be downloaded from the releases page.

Once that file is downloaded, it should be placed in the plugins directory and the Neo4j Server restarted.
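After the restart, a quick sanity check is to list the Parquet procedures the server has registered. This is a minimal sketch assuming Neo4j 5.x `SHOW PROCEDURES` syntax. It only confirms that the procedures are registered. It does not guarantee that the dependency jar was picked up, which may only become apparent when a procedure is actually called.

```cypher
// List every registered procedure whose name mentions parquet
SHOW PROCEDURES YIELD name
WHERE name CONTAINS 'parquet'
RETURN name
ORDER BY name
```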

Available Procedures

The table below describes the available procedures:

Name                  Description
apoc.load.parquet     Loads Parquet data from the provided Parquet file or binary
apoc.import.parquet   Imports Parquet data from the provided Parquet file or binary

As with the other load and import procedures, apoc.load.parquet only retrieves the Parquet content, while apoc.import.parquet creates nodes and relationships in the database.

These procedures are intended to be used together with the apoc.export.parquet.* procedures.

Configuration parameters

The procedures support the following config parameters:

Table 1. Config parameters
name        type      default   description
batchSize   Integer             the transaction batch size
mapping     Map                 to map complex files. See Mapping config section below

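The following sketch shows how these parameters are passed, importing a file with an explicit batch size. It assumes a test.parquet file produced by apoc.export.parquet.all, as in the example that follows. The value 1000 is purely illustrative, and the yielded columns are those listed in the import results table further down.

```cypher
// Import test.parquet, committing every 1000 rows (illustrative value)
CALL apoc.import.parquet('test.parquet', {batchSize: 1000})
YIELD nodes, relationships, batches
RETURN nodes, relationships, batches
```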

Given the following sample graph:

CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (JoelS:Person {name:'Joel Silver', born:1952})
CREATE
(Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix),
(LillyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),
(JoelS)-[:PRODUCED]->(TheMatrix)

If we export this graph to a test.parquet file with CALL apoc.export.parquet.all('test.parquet'), we can load the result with:

CALL apoc.load.parquet('test.parquet')
Table 2. Results
value
{__id: 0, tagline: "Welcome to the Real World", title: "The Matrix", released: 1999, __labels: ["Movie"]}
{__id: 1, born: 1964, name: "Keanu Reeves", __labels: ["Person"]}
{__id: 2, born: 1967, name: "Carrie-Anne Moss", __labels: ["Person"]}
{__id: 3, born: 1961, name: "Laurence Fishburne", __labels: ["Person"]}
{__id: 4, born: 1960, name: "Hugo Weaving", __labels: ["Person"]}
{__id: 5, born: 1967, name: "Lilly Wachowski", __labels: ["Person"]}
{__id: 6, born: 1965, name: "Lana Wachowski", __labels: ["Person"]}
{__id: 7, born: 1952, name: "Joel Silver", __labels: ["Person"]}
{__type: "ACTED_IN", roles: ["Neo"], __target_id: 0, __source_id: 1}
{__type: "ACTED_IN", roles: ["Trinity"], __target_id: 0, __source_id: 2}
{__type: "ACTED_IN", roles: ["Morpheus"], __target_id: 0, __source_id: 3}
{__type: "ACTED_IN", roles: ["Agent Smith"], __target_id: 0, __source_id: 4}
{__type: "DIRECTED", __target_id: 0, __source_id: 5}
{__type: "DIRECTED", __target_id: 0, __source_id: 6}
{__type: "PRODUCED", __target_id: 0, __source_id: 7}

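Because apoc.load.parquet only streams rows back, its output can be post-processed like any other Cypher row stream. The following is a minimal sketch that assumes the test.parquet file from above. In the sample graph, only the rows exported from Person nodes carry a born value, so filtering on it keeps just the people.

```cypher
// Keep only the rows that came from Person nodes (the only ones with `born`)
CALL apoc.load.parquet('test.parquet')
YIELD value
WITH value
WHERE value.born IS NOT NULL
RETURN value.name AS name, value.born AS born
ORDER BY born
```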
Alternatively, we can re-import the nodes and relationships stored in test.parquet with:

CALL apoc.import.parquet('test.parquet')
Table 3. Results
file source format nodes relationships properties time rows batchSize batches data

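To check what the import created, the re-imported graph can be queried directly. This sketch assumes the import runs against a database that does not already contain the sample graph. Because the procedure creates nodes rather than matching existing ones, running it against the original database would presumably duplicate the graph.

```cypher
// Import the file and report how many nodes and relationships were created
CALL apoc.import.parquet('test.parquet')
YIELD nodes, relationships
RETURN nodes, relationships;

// Inspect the re-imported ACTED_IN relationships
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name AS actor, r.roles AS roles, m.title AS movie
ORDER BY actor;
```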
The procedures above can also load or import a Parquet byte array, such as the one produced by CALL apoc.export.parquet.all.stream(). For example, the following queries produce the same results as the ones above:

Load procedure
// create a byte array
CALL apoc.export.parquet.all.stream()
YIELD value WITH value AS bytes
// load the byte array
CALL apoc.load.parquet(bytes)
YIELD value RETURN value
Import procedure
// create a byte array
CALL apoc.export.parquet.all.stream()
YIELD value WITH value AS bytes
// import the byte array
CALL apoc.import.parquet(bytes)
YIELD source RETURN source

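The byte array does not have to be built inside the same query. This sketch assumes a query parameter, hypothetically named $bytes, that holds Parquet bytes. For example, a client could obtain the value from an earlier apoc.export.parquet.all.stream() call and send it back through a Neo4j driver.

```cypher
// $bytes: hypothetical parameter carrying a Parquet byte array
CALL apoc.load.parquet($bytes)
YIELD value
RETURN value
```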
Mapping config

To import complex types that Parquet does not support natively, such as Point, Duration or List of Duration, we can use the mapping config to convert the values to the desired data type. For example, if we have a node (:MyLabel {durationProp: duration('P5M1.5D')}) and we export it to a Parquet file or binary, we can import it by specifying a mapping map whose keys are property keys and whose values are the target property types.

In this example, with the load procedure:

CALL apoc.load.parquet(fileOrBinary, {mapping: {durationProp: 'Duration'}})

Or with the import procedure:

CALL apoc.import.parquet(fileOrBinary, {mapping: {durationProp: 'Duration'}})
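An end-to-end sketch of the Duration case described above follows. It uses a hypothetical file name, mapping.parquet, and is meant to run as three consecutive statements, for example in cypher-shell.

```cypher
// Create a node with a Duration property, which Parquet cannot represent natively
CREATE (:MyLabel {durationProp: duration('P5M1.5D')});

// Export the whole database to a Parquet file (hypothetical file name)
CALL apoc.export.parquet.all('mapping.parquet');

// Load it back, converting durationProp to a Duration value
CALL apoc.load.parquet('mapping.parquet', {mapping: {durationProp: 'Duration'}})
YIELD value
RETURN value.durationProp AS durationProp;
```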

The mapping value types can be one of the following:

  • Point

  • LocalDateTime

  • LocalTime

  • DateTime

  • Time

  • Date

  • Duration

  • Char

  • Byte

  • Double

  • Float

  • Short

  • Int

  • Long

  • Node

  • Relationship

  • BaseType followed by Array, to map a list of values, where BaseType can be one of the previous types, for example DurationArray
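The following sketch applies the Array suffix alongside a scalar type. It assumes a hypothetical node that stores a list of durations and a point, exported to a hypothetical arrays.parquet file. Each property gets its own entry in the mapping map.

```cypher
// Hypothetical node with a list of durations and a cartesian point
CREATE (:MyLabel {durations: [duration('P1D'), duration('PT2H')], location: point({x: 1.0, y: 2.0})});

CALL apoc.export.parquet.all('arrays.parquet');

// Map the list property with DurationArray and the point with Point
CALL apoc.import.parquet('arrays.parquet', {mapping: {durations: 'DurationArray', location: 'Point'}})
YIELD nodes
RETURN nodes;
```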