Export operations
1. Create Neo4j databases from named graphs
We can create new Neo4j databases from named in-memory graphs stored in the graph catalog.
All nodes, relationships and properties present in an in-memory graph are written to a new Neo4j database.
This includes data that has been projected in gds.graph.create and data that has been added by running algorithms in mutate mode.
The newly created database will be stored in the Neo4j databases directory using a given database name.
The feature is useful in scenarios such as the following:
- Avoid heavy write load on the operational system by exporting the data instead of writing back.
- Create an analytical view of the operational system that can be used as a basis for running algorithms.
- Produce snapshots of analytical results and persist them for archiving and inspection.
- Share analytical results within the organization.
1.1. Syntax
CALL gds.graph.export(graphName: String, configuration: Map)
YIELD
dbName: String,
graphName: String,
nodeCount: Integer,
nodePropertyCount: Integer,
relationshipCount: Integer,
relationshipTypeCount: Integer,
relationshipPropertyCount: Integer,
writeMillis: Integer
Name | Type | Optional | Description |
---|---|---|---|
graphName | String | no | The name under which the graph is stored in the catalog. |
configuration | Map | no | Additional parameters to configure the database export. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
dbName | String | | no | The name of the exported Neo4j database. |
writeConcurrency | Integer | | yes | The number of concurrent threads used for writing the database. |
enableDebugLog | Boolean | | yes | Prints debug information to Neo4j log files. |
batchSize | Integer | | yes | Number of entities processed by one single thread at a time. |
defaultRelationshipType | String | | yes | Relationship type used for |
additionalNodeProperties | String, List or Map | | yes | Allows for exporting additional node properties from the original graph backing the in-memory graph. |
Name | Type | Description |
---|---|---|
dbName | String | The name of the exported Neo4j database. |
graphName | String | The name under which the graph is stored in the catalog. |
nodeCount | Integer | The number of nodes exported. |
nodePropertyCount | Integer | The number of node properties exported. |
relationshipCount | Integer | The number of relationships exported. |
relationshipTypeCount | Integer | The number of relationship types exported. |
relationshipPropertyCount | Integer | The number of relationship properties exported. |
writeMillis | Integer | Milliseconds for writing the graph into the new database. |
1.2. Example
Export the my-graph graph from GDS into a Neo4j database called mydatabase:
CALL gds.graph.export('my-graph', { dbName: 'mydatabase' })
The new database can be started using database management commands.
The database must not exist before the export procedure is run. After the export, it has to be created manually using the following commands.
:use system
CREATE DATABASE mydatabase;
:use mydatabase
MATCH (n) RETURN n;
1.3. Example with additional node properties
Suppose we have a graph my-db-graph in the Neo4j database that has a string node property myproperty, and that we have a corresponding in-memory graph called my-in-memory-graph which does not have the myproperty node property.
If we want to export my-in-memory-graph but additionally add the myproperty properties from my-db-graph, we can use the additionalNodeProperties configuration parameter.
Export my-in-memory-graph from GDS with myproperty from my-db-graph into a Neo4j database called mydatabase:
CALL gds.graph.export('my-in-memory-graph', { dbName: 'mydatabase', additionalNodeProperties: ['myproperty']})
The new database can be started using database management commands.
The original database ( |
The additionalNodeProperties parameter uses the same syntax as nodeProperties of the graph create procedure.
So we could, for instance, define a default value for our myproperty.
Export my-in-memory-graph from GDS with myproperty from my-db-graph with a default value into a Neo4j database called mydatabase:
CALL gds.graph.export('my-in-memory-graph', { dbName: 'mydatabase', additionalNodeProperties: [{ myproperty: {defaultValue: 'my-default-value'}}] })
2. Export a named graph to CSV
We can export named in-memory graphs stored in the graph catalog to a set of CSV files.
All nodes, relationships and properties present in an in-memory graph are exported.
This includes data that has been projected with gds.graph.create and data that has been added by running algorithms in mutate mode.
The location of the exported CSV files can be configured via the configuration parameter gds.export.location in neo4j.conf.
All files will be stored in a subfolder using the specified export name.
The export will fail if a folder with the given export name already exists.
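Because an existing folder makes the export fail, it can be worth checking the target before calling the procedure. The following is a minimal sketch, assuming the script runs on the machine that hosts the export location; the helper name export_folder_is_free is hypothetical and not part of GDS, and the location passed in must be the value configured as gds.export.location.

```python
from pathlib import Path


def export_folder_is_free(export_location, export_name):
    """Return True if nothing named export_name exists in export_location yet.

    Hypothetical helper: export_location is the directory configured as
    gds.export.location, export_name is the exportName passed to the procedure.
    """
    return not (Path(export_location) / export_name).exists()
```

If the function returns False, either pick a different export name or remove the existing folder before running the export.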
The |
2.1. Syntax
CALL gds.beta.graph.export.csv(graphName: String, configuration: Map)
YIELD
graphName: String,
exportName: String,
nodeCount: Integer,
nodePropertyCount: Integer,
relationshipCount: Integer,
relationshipTypeCount: Integer,
relationshipPropertyCount: Integer,
writeMillis: Integer
Name | Type | Optional | Description |
---|---|---|---|
graphName | String | no | The name under which the graph is stored in the catalog. |
configuration | Map | no | Additional parameters to configure the database export. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
exportName | String | | no | The name of the directory where the graph is exported to. The absolute path of the exported CSV files depends on the configuration parameter gds.export.location in neo4j.conf. |
writeConcurrency | Integer | | yes | The number of concurrent threads used for writing the database. |
defaultRelationshipType | String | | yes | Relationship type used for |
additionalNodeProperties | String, List or Map | | yes | Allows for exporting additional node properties from the original graph backing the in-memory graph. |
Name | Type | Description |
---|---|---|
graphName | String | The name under which the graph is stored in the catalog. |
exportName | String | The name of the directory where the graph is exported to. |
nodeCount | Integer | The number of nodes exported. |
nodePropertyCount | Integer | The number of node properties exported. |
relationshipCount | Integer | The number of relationships exported. |
relationshipTypeCount | Integer | The number of relationship types exported. |
relationshipPropertyCount | Integer | The number of relationship properties exported. |
writeMillis | Integer | Milliseconds for writing the graph into the new database. |
2.2. Estimation
Like many other procedures in GDS, the CSV export has an estimation mode. For more details see Memory Estimation.
Using the gds.beta.graph.export.csv.estimate procedure, it is possible to estimate the required disk space of the exported CSV files.
The estimation uses sampling to generate a more accurate estimate.
CALL gds.beta.graph.export.csv.estimate(graphName: String, configuration: Map)
YIELD
nodeCount: Integer,
relationshipCount: Integer,
requiredMemory: String,
treeView: String,
mapView: Map,
bytesMin: Integer,
bytesMax: Integer,
heapPercentageMin: Float,
heapPercentageMax: Float;
Name | Type | Optional | Description |
---|---|---|---|
graphName | String | no | The name under which the graph is stored in the catalog. |
configuration | Map | no | Additional parameters to configure the database export. |
Name | Type | Default | Optional | Description |
---|---|---|---|---|
exportName | String | | no | Name of the folder the exported CSV files are saved at. |
samplingFactor | Double | | yes | The fraction of nodes and relationships to sample for the estimation. |
writeConcurrency | Integer | | yes | The number of concurrent threads used for writing the database. |
defaultRelationshipType | String | | yes | Relationship type used for |
Name | Type | Description |
---|---|---|
nodeCount | Integer | The number of nodes in the graph. |
relationshipCount | Integer | The number of relationships in the graph. |
requiredMemory | String | An estimation of the required memory in a human readable format. |
treeView | String | A more detailed representation of the required memory, including estimates of the different components in human readable format. |
mapView | Map | A more detailed representation of the required memory, including estimates of the different components in structured format. |
bytesMin | Integer | The minimum number of bytes required. |
bytesMax | Integer | The maximum number of bytes required. |
heapPercentageMin | Float | The minimum percentage of the configured maximum heap required. |
heapPercentageMax | Float | The maximum percentage of the configured maximum heap required. |
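One possible use of the estimate is a disk space check before the actual export. The sketch below is an illustration only: it assumes that bytesMax can be read as an upper bound on the size of the exported CSV files, that the value has already been fetched from the estimate procedure, and that the script runs on the host of the export location. The helper name fits_on_disk is hypothetical.

```python
import shutil


def fits_on_disk(bytes_max, export_location):
    """Return True if the free space at export_location covers bytes_max.

    bytes_max: the bytesMax value yielded by the estimate procedure
               (assumed here to bound the size of the CSV files).
    export_location: the directory configured as gds.export.location.
    """
    free_bytes = shutil.disk_usage(export_location).free
    return bytes_max <= free_bytes
```

Since the estimation is based on sampling, leaving some headroom on top of bytesMax is a reasonable precaution.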
2.3. Export format
The format of the exported CSV files is based on the format that is supported by the Neo4j Admin import command.
2.3.1. Nodes
Nodes are exported into files grouped by node labels, i.e., for every label combination that exists in the graph a set of export files is created.
The naming schema of the exported files is: nodes_LABELS_INDEX.csv, where:
- LABELS is the ordered list of labels joined by _.
- INDEX is a number between 0 and concurrency.
For each label combination one or more data files are created, as each exporter thread exports into a separate file.
Additionally, each label combination produces a single header file, which contains a single line describing the columns in the data files. More information about the header files can be found here: CSV header format.
For example, a graph with the node label combinations :A, :B and :A:B might create the following files:
nodes_A_header.csv
nodes_A_0.csv
nodes_B_header.csv
nodes_B_0.csv
nodes_B_2.csv
nodes_A_B_header.csv
nodes_A_B_0.csv
nodes_A_B_1.csv
nodes_A_B_2.csv
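The naming schema for node files can be sketched in a few lines of Python. This is an illustration rather than GDS code: the helper name node_file_names is hypothetical, and it assumes that "ordered" means the labels are sorted alphabetically.

```python
def node_file_names(labels, indices):
    """Return (header_file, data_files) for one label combination.

    Hypothetical helper. Assumes labels are joined in alphabetical order;
    indices are the file indexes actually written by the exporter threads.
    """
    joined = "_".join(sorted(labels))
    header = f"nodes_{joined}_header.csv"
    data = [f"nodes_{joined}_{index}.csv" for index in indices]
    return header, data


header, data = node_file_names(["B", "A"], [0, 1, 2])
print(header)
print(data)
```

The indices are passed in explicitly because not every thread necessarily writes a file for every label combination, as the missing nodes_B_1.csv in the listing suggests.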
2.3.2. Relationships
The format of the relationship files is similar to that of the node files.
Relationships are exported into files grouped by the relationship type.
The naming schema of the exported files is: relationships_TYPE_INDEX.csv, where:
- TYPE is the relationship type.
- INDEX is a number between 0 and concurrency.
For each relationship type one or more data files are created, as each exporter thread exports into a separate file.
Additionally, each relationship type produces a single header file, which contains a single line describing the columns in the data files.
For example, a graph with the relationship types :KNOWS and :LIVES_IN might create the following files:
relationships_KNOWS_header.csv
relationships_KNOWS_0.csv
relationships_LIVES_IN_header.csv
relationships_LIVES_IN_0.csv
relationships_LIVES_IN_2.csv
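When post-processing an export folder, it can help to pair each header file with its data files. The following sketch groups file names by the naming schemas described in this section. The function name group_export_files and the regular expression are assumptions made for illustration; they are not part of GDS.

```python
import re
from collections import defaultdict

# Matches names such as nodes_A_B_header.csv or relationships_LIVES_IN_2.csv.
FILE_PATTERN = re.compile(r"(nodes|relationships)_(.+)_(header|\d+)\.csv")


def group_export_files(file_names):
    """Group file names by (kind, labels-or-type) into header and data files."""
    groups = defaultdict(lambda: {"header": None, "data": []})
    for name in file_names:
        match = FILE_PATTERN.fullmatch(name)
        if match is None:
            continue  # skip files that do not follow the naming schema
        kind, key, suffix = match.groups()
        if suffix == "header":
            groups[(kind, key)]["header"] = name
        else:
            groups[(kind, key)]["data"].append(name)
    return dict(groups)
```

Note that the key for node files is the joined label string (for example A_B); it cannot be split back into individual labels reliably if a label itself contains an underscore.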
2.4. Example
Export the my-graph graph from GDS into a directory my-export:
CALL gds.beta.graph.export.csv('my-graph', { exportName: 'my-export' })
2.5. Example with additional node properties
Suppose we have a graph my-db-graph in the Neo4j database that has a string node property myproperty, and that we have a corresponding in-memory graph called my-in-memory-graph which does not have the myproperty node property.
If we want to export my-in-memory-graph but additionally add the myproperty properties from my-db-graph, we can use the additionalNodeProperties configuration parameter.
Export my-in-memory-graph from GDS with the myproperty from my-db-graph into a directory my-export:
CALL gds.beta.graph.export.csv('my-in-memory-graph', { exportName: 'my-export', additionalNodeProperties: ['myproperty']})
The original database ( |
The additionalNodeProperties parameter uses the same syntax as nodeProperties of the graph create procedure.
So we could, for instance, define a default value for our myproperty.
Export my-in-memory-graph from GDS with myproperty from my-db-graph with a default value into a directory called my-export:
CALL gds.beta.graph.export.csv('my-in-memory-graph', { exportName: 'my-export', additionalNodeProperties: [{ myproperty: {defaultValue: 'my-default-value'}}] })