Copy a database store
You can use the neo4j-admin database copy command to copy a database, create a compacted/defragmented copy of a database, or clean up database inconsistencies.
neo4j-admin database copy reclaims the unused space, creates a defragmented copy of the data store, and creates the node label and relationship type lookup indexes.
Any additional schema (indexes and constraints) is not included.
However, the command outputs Cypher statements, which you can run to recreate the indexes and constraints.
Command limitations
Command
neo4j-admin database copy copies the data store of an existing offline database to a new database.
Syntax
neo4j-admin database copy [--verbose]
[--from-path-data=<path> --from-path-txn=<path>]
[--to-path-data=<path> --to-path-txn=<path>]
[--to-path-schema=<path>]
[--force]
[--compact-node-store[=true|false]]
[--to-format=<format>]
[--ignore-nodes-with-labels=<label>[,<label>...]]
[--ignore-relationships-with-types=<type>[,<type>...]]
[--copy-only-node-properties=<label.property>[,<label.property>...]]
[--copy-only-nodes-with-labels=<label>[,<label>...]]
[--copy-only-relationship-properties=<relationship.property>[,<relationship.property>...]]
[--copy-only-relationships-with-types=<type>[,<type>...]]
[--skip-labels=<label>[,<label>...]]
[--skip-node-properties=<label.property>[,<label.property>...]]
[--skip-properties=<property>[,<property>...]]
[--skip-relationship-properties=<relationship.property>[,<relationship.property>...]]
[--from-pagecache=<size>]
[--temp-path=<path>]
<fromDatabase>
<toDatabase>
Required parameters
To use this command, you must specify the parameters <fromDatabase> and <toDatabase>.
From Neo4j v5.5, you can use the same value for <fromDatabase> and <toDatabase> if you do not need an actual copy of the database.
The command then replaces the original database with the newly created copy.
- <fromDatabase>: Name of the source database.
- <toDatabase>: Name of the target database. If it is the same as <fromDatabase>, the database is first copied to a temporary location (by default, the current working directory, or the path defined by --temp-path) and then moved to replace the original.
Optional parameters
Option | Description |
---|---|
--verbose | Enable verbose output. |
--from-path-data | Path to the /databases directory, containing the directory of the source database. It can be used to target databases outside of the installation. Default: |
--from-path-txn | Path to the /transactions directory, containing the transaction directory of the source database. |
--to-path-data | Path to the /databases directory, containing the directory of the target database. Default: |
--to-path-txn | Path to the /transactions directory, containing the transaction directory of the target database. |
--to-path-schema | Path to the directory where the schema commands file will be created. Default: The current directory. |
--force | Force the command to proceed even if the integrity of the database cannot be verified. |
--compact-node-store | Enforce node store compaction. By default, the node store is not compacted on copy, since compaction changes the node IDs. |
--to-format | Set the format for the new database. Valid values are Default: The format of the source database. |
--ignore-nodes-with-labels | A comma-separated list of labels. Nodes that have any of the specified labels will not be included in the copy. Cannot be combined with |
--ignore-relationships-with-types | A comma-separated list of relationship types. Relationships with any of the specified relationship types will not be included in the copy. Cannot be combined with |
--copy-only-node-properties | A comma-separated list of property keys to include in the copy for nodes with the specified label. Nodes whose labels are not explicitly mentioned in the list will have all their properties included in the copy. Cannot be combined with |
--copy-only-nodes-with-labels | A comma-separated list of labels. All nodes that have any of the specified labels will be included in the copy. Cannot be combined with |
--copy-only-relationship-properties | A comma-separated list of property keys to include in the copy for relationships with the specified type. Relationship types that are not explicitly mentioned will have all their properties included in the copy. Cannot be combined with |
--copy-only-relationships-with-types | A comma-separated list of relationship types. All relationships with any of the specified types will be included in the copy. Cannot be combined with |
--skip-labels | A comma-separated list of labels to ignore during the copy. |
--skip-node-properties | A comma-separated list of property keys to ignore for nodes with the specified label. Cannot be combined with |
--skip-properties | A comma-separated list of property keys to ignore during the copy. Cannot be combined with |
--skip-relationship-properties | A comma-separated list of property keys to ignore for relationships with the specified type. Cannot be combined with |
--from-pagecache | The size of the page cache to use for reading. |
--temp-path | Path to a directory to be used as a staging area when the source and target databases are the same. Default: The current directory. |
Examples
Copying the data store of a database
You can use neo4j-admin database copy to copy the data store of a database, for example, neo4j.
- Stop the database named neo4j:
  STOP DATABASE neo4j
- Copy the data store from neo4j to a new database called copy:
  bin/neo4j-admin database copy neo4j copy
- Run the following command to verify that the database has been successfully copied:
  ls -al data/databases
  Copying a database does not automatically create it. Therefore, it will not be visible if you run SHOW DATABASES at this point.
- Create the copied database:
  CREATE DATABASE copy
- Verify that the copy database is online:
  SHOW DATABASES
- If your original database has a schema defined, change your active database to copy and recreate the schema using the schema commands saved in the file <database-name>-schema.cypher.
  You can use --to-path-schema to specify a different directory for the schema file.
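The walkthrough above can be scripted. The following is a minimal sketch, not an official tool, and it rests on assumptions that are not part of the steps above: it runs from the Neo4j home directory, bin/cypher-shell reaches the server with credentials taken from the NEO4J_USERNAME and NEO4J_PASSWORD environment variables, and the administration commands are sent to the system database. The helper names run and copy_database are hypothetical, and setting DRY_RUN=1 prints each command instead of executing it. The schema step is left out because it depends on whether the source database defines indexes or constraints.

```shell
# Source this file in bash, then call: copy_database SOURCE TARGET
# Assumptions: run from the Neo4j home directory; cypher-shell credentials come
# from NEO4J_USERNAME / NEO4J_PASSWORD. Helper names are hypothetical.

# Execute a command, or only print it when DRY_RUN=1 (quoting is not preserved).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Send one Cypher administration command to the system database.
system_cypher() {
  run bin/cypher-shell -d system "$1"
}

copy_database() {
  local src=$1 dst=$2
  system_cypher "STOP DATABASE $src"               # the source must be offline
  run bin/neo4j-admin database copy "$src" "$dst"  # copy the data store
  system_cypher "CREATE DATABASE $dst"             # copying does not create the database
  system_cypher "SHOW DATABASES"                   # check that the copy is online
}
```

For example, DRY_RUN=1 copy_database neo4j copy prints the four commands so you can review them before running copy_database neo4j copy for real. Like the walkthrough, the sketch leaves the source database stopped.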
Filtering data while copying a database
You can use neo4j-admin database copy to filter out any unwanted data while copying a database, for example, by removing nodes, labels, properties, and relationships.
bin/neo4j-admin database copy neo4j copy --ignore-nodes-with-labels="Cat,Dog"
The command creates a copy of the database neo4j but without the nodes with the labels :Cat and :Dog.
Labels are processed independently, that is, the filter ignores any node that has the label :Cat, the label :Dog, or both.
Further compacting an existing database
You can use the command neo4j-admin database copy with the option --compact-node-store to further compact the store of an existing database.
This example uses the same values for <toDatabase> and <fromDatabase>, which means that the command compacts the database in place by creating a new version of the database.
After running the command, you need to recreate the indexes and constraints using the generated schema script.
If the database belongs to a cluster, you also need to reseed the cluster.
Even though only one copy of the database remains at the end, you still need double the disk space during the operation.
- Stop the database named neo4j:
  STOP DATABASE neo4j
- Compact the neo4j database using the command:
  bin/neo4j-admin database copy neo4j neo4j --compact-node-store --temp-path=<my-preferred-staging-area>
  --temp-path can be used to specify a different directory to use as a temporary staging area. If omitted, the current working directory is used.
- Start the neo4j database. This is the newly created version of the database.
  START DATABASE neo4j
- If your original database has a schema defined, recreate the schema using the schema commands saved in the file <database-name>-schema.cypher.
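The in-place compaction steps can be sketched as a shell function. This is a minimal sketch under assumptions that are not part of the steps above: it runs from the Neo4j home directory, bin/cypher-shell reads credentials from NEO4J_USERNAME and NEO4J_PASSWORD, and the schema file lands in the current directory (the --to-path-schema default) as <database-name>-schema.cypher. The helper names run and compact_in_place are hypothetical, DRY_RUN=1 prints the commands instead of executing them, and cluster reseeding is not covered.

```shell
# Source this file in bash, then call: compact_in_place DATABASE STAGING_DIR
# Assumptions: run from the Neo4j home directory; cypher-shell credentials come
# from NEO4J_USERNAME / NEO4J_PASSWORD. Helper names are hypothetical.

# Execute a command, or only print it when DRY_RUN=1 (quoting is not preserved).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

compact_in_place() {
  local db=$1 staging=$2
  run bin/cypher-shell -d system "STOP DATABASE $db"
  # Same source and target name: the copy is staged in $staging, then replaces the original.
  run bin/neo4j-admin database copy "$db" "$db" --compact-node-store --temp-path="$staging"
  run bin/cypher-shell -d system "START DATABASE $db"
  # Recreate indexes and constraints only if a schema file was written.
  if [ -f "$db-schema.cypher" ]; then
    run bin/cypher-shell -d "$db" -f "$db-schema.cypher"
  fi
}
```

Keep in mind that the staging directory needs room for a full copy of the store while the command runs.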
For a detailed example of how to reclaim unused space, see Reclaim unused space.
Estimating the processing time
Estimations for how long the neo4j-admin database copy command takes can be made based on the following:
- Neo4j, like many other databases, does I/O in 8K pages.
- Your disc manufacturer will have a value for the maximum IOPS it can process.
For example, if your disc manufacturer has provided a maximum of 5000 IOPS, you can reasonably expect up to 5000 such page operations a second.
Therefore, the maximal theoretical throughput you can expect is 40 MB/s (or 144 GB/hour) on that disc.
You may then assume that the best-case scenario for running neo4j-admin database copy on that 5000 IOPS disc is that it takes at least 1 hour to process a 144 GB database. [1]
However, it is important to remember that the process must read 144 GB from the source database, and must also write to the target store (assuming the target store is of comparable size).
Additionally, there are internal processes during the copy that read, modify, and write the store multiple times.
Therefore, with an additional 144 GB of both read and write, the best-case scenario for running neo4j-admin database copy on a 5000 IOPS disc is that it takes at least 3 hours to process a 144 GB database.
Finally, in almost all cloud environments, the published IOPS value may differ from the actual value, and the disc may not be able to sustain the maximum IOPS continuously. The real processing time for this example could be well above the estimate of 3 hours.
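The arithmetic in this section can be reproduced with a small shell function. This is a rough, best-case sketch: estimate_copy_time is a hypothetical helper, it uses the 8000-byte block size from the footnote formula, and it assumes that the whole copy costs about three passes over the store, which is how the 3-hour figure above is reached.

```shell
# Usage: estimate_copy_time IOPS STORE_GB   (hypothetical helper name)
# Best-case estimate only; real disks, especially in the cloud, are often slower.
estimate_copy_time() {
  awk -v iops="$1" -v gb="$2" 'BEGIN {
    block = 8000                          # bytes per page operation (footnote value)
    mb_per_s = iops * block / 1000000     # MB/s = (IOPS * B) / 10^6
    gb_per_hour = mb_per_s * 3600 / 1000  # GB/hour = (MB/s * 3600) / 1000
    one_pass = gb / gb_per_hour           # hours to move the store size once
    printf "%.1f MB/s, %.1f GB/hour\n", mb_per_s, gb_per_hour
    # Assumption: about three passes in total, matching the 3-hour figure above.
    printf "one pass: %.2f h, best case for the copy: %.2f h\n", one_pass, 3 * one_pass
  }'
}
```

With the example values, estimate_copy_time 5000 144 prints 40.0 MB/s, 144.0 GB/hour on the first line and a best case of 3.00 h for the copy on the second. Treat the result as a lower bound.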
[1] MB/s = (IOPS * B) ÷ 10^6, where B is the block size in bytes; in the case of Neo4j, this is 8000, an approximation of the 8K page size that keeps the arithmetic simple. GB/hour can then be calculated as (MB/s * 3600) ÷ 1000.