Export Apache Parquet

Library Requirements

The Apache Parquet procedures have dependencies on a client library that is not included in the APOC Extended library.

These dependencies are included in apoc-hadoop-dependencies-5.20.0-all.jar, which can be downloaded from the releases page.

After downloading the file, place it in the plugins directory and restart the Neo4j server.
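
After the restart, it can be useful to confirm that the Parquet export procedures were actually registered. The following is a minimal sketch, assuming a Neo4j 5 server where the SHOW PROCEDURES command is available; it only lists procedures and does not export anything:

```cypher
// List every registered procedure whose name starts with the Parquet export prefix.
// An empty result suggests the APOC Extended jar or the dependencies jar was not loaded.
SHOW PROCEDURES
YIELD name, description
WHERE name STARTS WITH 'apoc.export.parquet'
RETURN name, description
ORDER BY name
```

If the procedures are missing, the usual things to check are that both jars are in the plugins directory and that the server was restarted.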

Available Procedures

The table below describes the available procedures:

| Name | Description |
|---|---|
| apoc.export.parquet.all | Exports the full database as a Parquet file |
| apoc.export.parquet.data | Exports the given nodes and relationships as a Parquet file |
| apoc.export.parquet.graph | Exports the given graph as a Parquet file |
| apoc.export.parquet.query | Exports the results of the given Cypher query as a Parquet file |
| apoc.export.parquet.all.stream | Exports the full database as a Parquet byte array |
| apoc.export.parquet.data.stream | Exports the given nodes and relationships as a Parquet byte array |
| apoc.export.parquet.graph.stream | Exports the given graph as a Parquet byte array |
| apoc.export.parquet.query.stream | Exports the results of the given Cypher query as a Parquet byte array |

The exported result can be read back by using the Parquet import and load procedures.

Configuration parameters

The procedures support the following config parameters:

Table 1. Config parameters

| name | type | default | description |
|---|---|---|---|
| batchSize | long | 20000 | Number of results after which the Parquet file or byte array is updated |
| mapping | Map | {} | Used to map complex files. See the Mapping config section below |
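
As an illustration of how a config parameter is passed, the following sketch overrides batchSize for a full-database export. It assumes that the config map is the last argument of the procedure and that the file name test.parquet is acceptable in your environment; the value 100 is arbitrary and chosen only for the example:

```cypher
// Update the Parquet file every 100 results instead of the default 20000.
// Assumption: the optional config map follows the file name argument.
CALL apoc.export.parquet.all('test.parquet', {batchSize: 100})
YIELD file, nodes, relationships, batchSize
RETURN file, nodes, relationships, batchSize
```

A smaller batch size writes more often, while a larger one keeps more results in memory between writes.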

Usage

The examples in this section are based on the following sample graph:

CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (JoelS:Person {name:'Joel Silver', born:1952})
CREATE
(Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix),
(LillyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),
(JoelS)-[:PRODUCED]->(TheMatrix);

The following query exports the whole database to the Parquet file test.parquet:

CALL apoc.export.parquet.all('test.parquet')
Table 2. Results

| file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | data |
|---|---|---|---|---|---|---|---|---|---|---|
| "file:///test.parquet" | "graph: nodes(8), rels(7)" | "parquet" | 8 | 7 | 0 | 0 | 0 | 20000 | 0 | null |

The following query exports the specified nodes and relationships to the Parquet file testData.parquet:

MATCH (n:Person)-[r]->()
WITH collect(n) AS nodes, collect(r) AS rels
CALL apoc.export.parquet.data(nodes, rels, 'testData.parquet')
YIELD file RETURN file
Table 3. Results

| file |
|---|
| "file:///testData.parquet" |

The following query exports the specified graph to the Parquet file testGraph.parquet:

CALL apoc.graph.fromDB('neo4j',{}) YIELD graph
CALL apoc.export.parquet.graph(graph, 'testGraph.parquet')
YIELD file RETURN file
Table 4. Results

| file |
|---|
| "file:///testGraph.parquet" |

The following query exports the result of the specified Cypher query to the Parquet file testQuery.parquet:

CALL apoc.export.parquet.query("MATCH (n:Person) RETURN n", 'testQuery.parquet')
Table 5. Results

| file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | data |
|---|---|---|---|---|---|---|---|---|---|---|
| "file:///testQuery.parquet" | "statement: cols(1)" | "parquet" | 8 | 7 | 0 | 0 | 0 | 20000 | 0 | null |

We can also return a Parquet byte array directly as the result, instead of writing a file, by using the apoc.export.parquet.<type>.stream procedures. For example:

CALL apoc.export.parquet.all.stream
Table 6. Results

| value |
|---|
| <byte_array_parquet_file> |
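
The same pattern should apply to the other stream variants. The sketch below streams the result of a Cypher query rather than the whole database. It assumes that apoc.export.parquet.query.stream takes the query string as its first argument, mirroring apoc.export.parquet.query without the file name, and that it yields the value column shown above:

```cypher
// Stream the Person nodes of the sample graph as a Parquet byte array.
// Assumption: the stream variant takes the same query argument as
// apoc.export.parquet.query, minus the file name, and yields `value`.
CALL apoc.export.parquet.query.stream("MATCH (n:Person) RETURN n")
YIELD value
RETURN value
```

Streaming is convenient when the client should receive the Parquet content directly, for example when it has no access to the server's file system.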