Export Apache Parquet
Library Requirements
The Apache Parquet procedures have dependencies on a client library that is not included in the APOC Extended library.
These dependencies are included in apoc-hadoop-dependencies-5.26.0-all.jar, which can be downloaded from the releases page.
Once that file is downloaded, place it in the plugins directory and restart the Neo4j server.
Available Procedures
The table below describes the available procedures:
Name | Description |
---|---|
apoc.export.parquet.all | Exports the full database as a Parquet file |
apoc.export.parquet.data | Exports the given nodes and relationships as a Parquet file |
apoc.export.parquet.graph | Exports the given graph as a Parquet file |
apoc.export.parquet.query | Exports the results of the given Cypher query as a Parquet file |
apoc.export.parquet.all.stream | Exports the full database as a Parquet byte array |
apoc.export.parquet.data.stream | Exports the given nodes and relationships as a Parquet byte array |
apoc.export.parquet.graph.stream | Exports the given graph as a Parquet byte array |
apoc.export.parquet.query.stream | Exports the results of the given Cypher query as a Parquet byte array |
The exported result can be imported or loaded with the Parquet import and load procedures.
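As a rough sketch of reading an export back, the query below assumes that the apoc.load.parquet procedure from the same library is installed and that it yields one value map per row. Neither the procedure name nor its output column appears in this section, and test.parquet is only an illustrative file name.

```cypher
// Assumption: apoc.load.parquet(input) is available and yields a `value` map per row.
// 'test.parquet' is an illustrative name for a previously exported file.
CALL apoc.load.parquet('test.parquet')
YIELD value
RETURN value
LIMIT 5
```

The LIMIT clause only keeps the preview short and can be dropped.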
Configuration parameters
The procedures support the following config parameters:
name | type | default | description |
---|---|---|---|
batchSize | long | 20000 | Number of results after which the Parquet file or byte array is updated |
mapping | Map | {} | Used to map complex files |
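A minimal sketch of passing a config parameter, assuming the file-based procedures accept the config map as an optional last argument (the argument position is an assumption, and the batchSize value of 100 is purely illustrative):

```cypher
// Assumption: the config map is the last argument of the procedure.
// batchSize: 100 is an illustrative value, not a recommendation.
CALL apoc.export.parquet.all('test.parquet', {batchSize: 100})
YIELD file, batchSize
RETURN file, batchSize
```

Returning batchSize alongside file makes it easy to confirm which value the export actually used.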
Usage
The examples in this section are based on the following sample graph:
CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Carrie:Person {name:'Carrie-Anne Moss', born:1967})
CREATE (Laurence:Person {name:'Laurence Fishburne', born:1961})
CREATE (Hugo:Person {name:'Hugo Weaving', born:1960})
CREATE (LillyW:Person {name:'Lilly Wachowski', born:1967})
CREATE (LanaW:Person {name:'Lana Wachowski', born:1965})
CREATE (JoelS:Person {name:'Joel Silver', born:1952})
CREATE
(Keanu)-[:ACTED_IN {roles:['Neo']}]->(TheMatrix),
(Carrie)-[:ACTED_IN {roles:['Trinity']}]->(TheMatrix),
(Laurence)-[:ACTED_IN {roles:['Morpheus']}]->(TheMatrix),
(Hugo)-[:ACTED_IN {roles:['Agent Smith']}]->(TheMatrix),
(LillyW)-[:DIRECTED]->(TheMatrix),
(LanaW)-[:DIRECTED]->(TheMatrix),
(JoelS)-[:PRODUCED]->(TheMatrix);
The following procedure exports the full database to the Parquet file test.parquet
CALL apoc.export.parquet.all('test.parquet')
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | data |
---|---|---|---|---|---|---|---|---|---|---|
"file:///test.parquet" | "graph: nodes(8), rels(7)" | "parquet" | 8 | 7 | 0 | 0 | 0 | 20000 | 0 | null |
The following procedure exports the specified nodes and relationships to the Parquet file testData.parquet
MATCH (n:Person)-[r]->()
WITH collect(n) AS nodes, collect(r) AS rels
CALL apoc.export.parquet.data(nodes, rels, 'testData.parquet')
YIELD file RETURN file
file |
---|
"file:///testData.parquet" |
The following procedure exports the specified graph to the Parquet file testGraph.parquet
CALL apoc.graph.fromDB('neo4j',{}) YIELD graph
CALL apoc.export.parquet.graph(graph, 'testGraph.parquet')
YIELD file RETURN file
file |
---|
"file:///testGraph.parquet" |
The following procedure exports the specified query result to a Parquet file
CALL apoc.export.parquet.query("MATCH (n:Person) RETURN n", 'testQuery.parquet')
file | source | format | nodes | relationships | properties | time | rows | batchSize | batches | data |
---|---|---|---|---|---|---|---|---|---|---|
"file:///testQuery.parquet" | "statement: cols(1)" | "parquet" | 8 | 7 | 0 | 0 | 0 | 20000 | 0 | null |
We can also return a Parquet byte array directly as the result by using the apoc.export.parquet.<type>.stream procedures, for example:
CALL apoc.export.parquet.all.stream
value |
---|
<byte_array_parquet_file> |
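The same pattern should apply to the other stream variants. The sketch below assumes that apoc.export.parquet.query.stream takes the Cypher statement as its first argument, mirroring the file-based apoc.export.parquet.query call without the file name. That signature is inferred rather than shown in this section.

```cypher
// Assumption: the stream variant takes the query string and no file name.
CALL apoc.export.parquet.query.stream("MATCH (n:Person) RETURN n")
YIELD value
RETURN value
```

Under that assumption, each returned value would be a Parquet byte array rather than a file written to disk.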