Export Apache Parquet - APOC Extended Documentation

Library Requirements

The Apache Parquet procedures have dependencies on a client library that is not included in the APOC Extended library.

These dependencies are included in apoc-hadoop-dependencies-5.26.3-all.jar, which can be downloaded from the releases page.

Once the file is downloaded, place it in the plugins directory and restart the Neo4j server.
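
After the restart, one way to confirm that the procedures were registered is to list them. This is a minimal sketch that assumes a Neo4j version supporting the SHOW PROCEDURES command; it only filters the procedure catalogue by name prefix and does not export anything.

```cypher
// List the Parquet export procedures that the server has registered.
// An empty result suggests the dependency jar was not picked up.
SHOW PROCEDURES
YIELD name, description
WHERE name STARTS WITH 'apoc.export.parquet'
RETURN name, description
ORDER BY name;
```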

Available Procedures

The table below describes the available procedures:

Name	Description
apoc.export.parquet.all	Exports the full database as a Parquet file
apoc.export.parquet.data	Exports the given nodes and relationships as a Parquet file
apoc.export.parquet.graph	Exports the given graph as a Parquet file
apoc.export.parquet.query	Exports the result of the given Cypher query as a Parquet file
apoc.export.parquet.all.stream	Exports the full database as a Parquet byte array
apoc.export.parquet.data.stream	Exports the given nodes and relationships as a Parquet byte array
apoc.export.parquet.graph.stream	Exports the given graph as a Parquet byte array
apoc.export.parquet.query.stream	Exports the result of the given Cypher query as a Parquet byte array

We can import or load the exported result by using the Parquet import and load procedures.
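
As a rough illustration of reading an export back, the sketch below assumes that a file named test.parquet has already been exported (as in the Usage section), that the apoc.load.parquet procedure from the same library is installed, and that it yields a `value` column with one map per row. The procedure name and the yielded column are assumptions here and should be checked against the Load Parquet documentation.

```cypher
// Assumption: apoc.load.parquet(file) yields one `value` map per exported row.
// Only a few rows are returned, to inspect the structure of the export.
CALL apoc.load.parquet('test.parquet')
YIELD value
RETURN value
LIMIT 5;
```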

Configuration parameters

The procedures support the following config parameters:

Table 1. Config parameters
name	type	default	description
batchSize	long	20000	number of results after which the Parquet file or byte array is updated
mapping	Map	{}	to map complex files. See `Mapping config` section below
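
The config parameters are typically passed as a map in the last argument of the procedure. The following sketch assumes that convention holds for apoc.export.parquet.all and uses an arbitrary batch size of 100 for illustration; the yielded column names are taken from the result tables in the Usage section.

```cypher
// Assumption: the config map is the final argument.
// batchSize: 100 is an arbitrary illustrative value, not a recommendation.
CALL apoc.export.parquet.all('test.parquet', {batchSize: 100})
YIELD file, batchSize, batches
RETURN file, batchSize, batches;
```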

Usage

The examples in this section are based on the following sample graph:

CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (JoelS:Person {name:'Joel Silver', born:1952})
CREATE
(Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix),
(LillyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),
(JoelS)-[:PRODUCED]->(TheMatrix);

The following query exports the whole database to the Parquet file test.parquet:

CALL apoc.export.parquet.all('test.parquet')

Table 2. Results
file	source	format	nodes	relationships	properties	time	rows	batchSize	batches	data
"file:///test.parquet"	"graph: nodes(8), rels(7)"	"parquet"	8	7	0	0	0	20000	0	null

The following query exports the specified nodes and relationships to the Parquet file testData.parquet:

MATCH (n:Person)-[r]->()
WITH collect(n) as nodes, collect(r) as rels
CALL apoc.export.parquet.data(nodes, rels, 'testData.parquet')
YIELD file RETURN file

Table 3. Results
file
"file:///testData.parquet"

The following query exports the specified graph to the Parquet file testGraph.parquet:

CALL apoc.graph.fromDB('neo4j',{}) YIELD graph
CALL apoc.export.parquet.graph(graph, 'testGraph.parquet')
YIELD file RETURN file

Table 4. Results
file
"file:///testGraph.parquet"

The following query exports the result of the specified Cypher query to the Parquet file testQuery.parquet:

CALL apoc.export.parquet.query("MATCH (n:Person) RETURN n", 'testQuery.parquet')

Table 5. Results
file	source	format	nodes	relationships	properties	time	rows	batchSize	batches	data
"file:///testQuery.parquet"	"statement: cols(1)"	"parquet"	8	7	0	0	0	20000	0	null

We can also return a Parquet byte array directly as the result, instead of writing a file, by using the apoc.export.parquet.<type>.stream procedures. For example:

CALL apoc.export.parquet.all.stream

Table 6. Results
value
<byte_array_parquet_file>
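
The other stream variants can be used in the same way. As a sketch, the query below assumes that apoc.export.parquet.query.stream takes the Cypher statement as its first argument, mirroring the file-based apoc.export.parquet.query without the file name, and that it yields the same `value` column shown above.

```cypher
// Assumption: same arguments as apoc.export.parquet.query, minus the file name.
// Each returned row carries a Parquet byte array in `value`.
CALL apoc.export.parquet.query.stream("MATCH (n:Person) RETURN n")
YIELD value
RETURN value;
```

Returning the byte array is convenient when the client cannot read the server's file system, since the Parquet content travels back with the query result.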