Copy a database store

A user database or backup can be copied to a Neo4j instance using the copy command of neo4j-admin.

neo4j-admin database copy is not supported for use on the system database.

neo4j-admin database copy is not supported for use on composite databases. It must be run directly on the databases that are part of a composite database.

It is important to note that neo4j-admin database copy is an IOPS-intensive process. Using this process for upgrading or migration purposes can have significant performance implications, depending on your disc specification. It is therefore not appropriate for all use cases.

Estimating the processing time

Estimations for how long the neo4j-admin database copy command takes can be made based upon the following:

  • Neo4j, like many other databases, does IO in 8K pages.

  • Your disc manufacturer will have a value for the maximum IOPS it can process.

For example, if your disc manufacturer has provided a maximum of 5000 IOPS, you can reasonably expect up to 5000 such page operations a second. Therefore, the maximum theoretical throughput you can expect is 40 MB/s (or 144 GB/hour) on that disc. You may then assume that the best-case scenario for running neo4j-admin database copy on that 5000 IOPS disc is that it takes at least 1 hour to process a 144 GB database. [1]

However, it is important to remember that the process must read 144 GB from the source database, and must also write to the target store (assuming the target store is of comparable size). Additionally, there are internal processes during the copy that read, modify, and write the store multiple times. Therefore, with an additional 144 GB of both read and write, the best-case scenario for running neo4j-admin database copy on a 5000 IOPS disc is that it takes at least 3 hours to process a 144 GB database.
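The arithmetic above can be sketched as a small shell function. This is an illustration only: the function name estimate_copy_hours and its parameters are hypothetical, the default of three passes over the data (read, write, and internal rework) and the 8000-byte page size are taken from the reasoning in this section, and real throughput is likely to be lower.

```shell
# estimate_copy_hours IOPS DB_SIZE_GB [PASSES] [PAGE_BYTES]
# Hypothetical helper: best-case hours for a copy, following the formula in footnote 1.
# PASSES defaults to 3 and PAGE_BYTES to 8000, as assumed in the text above.
estimate_copy_hours() {
  awk -v iops="$1" -v size="$2" -v passes="${3:-3}" -v page="${4:-8000}" 'BEGIN {
    mb_per_s    = (iops * page) / 1000000    # MB/s = (IOPS * B) / 10^6
    gb_per_hour = (mb_per_s * 3600) / 1000   # GB/hour = (MB/s * 3600) / 1000
    printf "%.1f\n", (size * passes) / gb_per_hour
  }'
}

# 144 GB database on a 5000 IOPS disc, three passes over the data
estimate_copy_hours 5000 144
```

Passing 1 as the third argument gives the single-pass figure from the previous paragraph. Treat the result as a lower bound rather than a prediction.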

Finally, it is also important to consider that in almost all Cloud environments, the published IOPS value may differ from the actual value, and the disc may not be able to sustain the maximum IOPS continuously. The real processing time for this example could be well above the estimate of 3 hours.

For detailed information about supported methods of upgrade and migration, see the Neo4j Upgrade and Migration Guide.

Command

neo4j-admin database copy copies the data store of an existing offline database to a new database.

Usage

The neo4j-admin database copy command can be used to clean up database inconsistencies, compact stores, and do a direct migration from Neo4j 3.5 to any 4.x version. It can process an optional set of filters, which you can use to remove any unwanted data before copying the database. The command also reclaims the unused space of a database and creates a defragmented copy of that database or backup in the destination Neo4j instance.

neo4j-admin database copy copies the data store and creates the node label and relationship type lookup indexes. Any additional schema (indexes and constraints) is not included. However, the command outputs Cypher statements, which you can run to recreate the indexes and constraints.

For a detailed example of how to reclaim unused space, see Reclaim unused space.

neo4j-admin database copy preserves the node IDs; however, the relationships get new IDs.

Syntax

neo4j-admin database copy   [--verbose]
                            [--from-path-data=<path> --from-path-txn=<path>]
                            [--to-path-data=<path> --to-path-txn=<path>]
                            [--to-path-schema=<path>]
                            [--force]
                            [--compact-node-store[=true|false]]
                            [--to-format=<format>]
                            [--ignore-nodes-with-labels=<label>[,<label>...]]
                            [--ignore-relationships-with-types=<type>[,<type>...]]
                            [--copy-only-node-properties=<label.property>[,<label.property>...]]
                            [--copy-only-nodes-with-labels=<label>[,<label>...]]
                            [--copy-only-relationship-properties=<relationship.property>[,<relationship.property>...]]
                            [--copy-only-relationships-with-types=<type>[,<type>...]]
                            [--skip-labels=<label>[,<label>...]]
                            [--skip-node-properties=<label.property>[,<label.property>...]]
                            [--skip-properties=<property>[,<property>...]]
                            [--skip-relationship-properties=<relationship.property>[,<relationship.property>...]]
                            [--from-pagecache=<size>]
                            <fromDatabase>
                            <toDatabase>

<toDatabase> — Name of the target database.

<fromDatabase> — Name of the source database.

Options

Option Description

--verbose

Enable verbose output.

--from-path-data

Path to the /databases directory, containing the directory of the source database.

It can be used to target databases outside of the installation.

Default: server.directories.data/databases

--from-path-txn

Path to the /transactions directory, containing the transaction directory of the source database.

--to-path-data

Path to the /databases directory, containing the directory of the target database.

Default: server.directories.data/databases

--to-path-txn

Path to the /transactions directory, containing the transaction directory of the target database.

--to-path-schema

Path to the directory where the schema commands file will be created.

Default: The current directory.

--force

Force the command to proceed even if the integrity of the database cannot be verified.

--compact-node-store

Enforce node store compaction.

By default, the node store is not compacted on copy, since compaction changes the node IDs.

--to-format

Set the format for the new database.

Valid values are same, standard, high_limit, and aligned. The high_limit format is only available in Enterprise Edition. If you go from high_limit to standard or aligned, there is no validation that the data will fit.

Default: The format of the source database.

--ignore-nodes-with-labels

A comma-separated list of labels.

Nodes that have any of the specified labels will not be included in the copy. Cannot be combined with --copy-only-nodes-with-labels.

--ignore-relationships-with-types

A comma-separated list of relationship types.

Relationships with any of the specified relationship types will not be included in the copy. Cannot be combined with --copy-only-relationships-with-types.

--copy-only-node-properties

A comma-separated list of property keys to include in the copy for nodes with the specified label.

Nodes whose labels are not explicitly mentioned in the list will have all their properties included in the copy. Cannot be combined with --skip-properties or --skip-node-properties.

--copy-only-nodes-with-labels

A comma-separated list of labels.

All nodes that have any of the specified labels will be included in the copy. Cannot be combined with --ignore-nodes-with-labels.

--copy-only-relationship-properties

A comma-separated list of property keys to include in the copy for relationships with the specified type.

Relationship types that are not explicitly mentioned will have all their properties included in the copy. Cannot be combined with --skip-properties or --skip-relationship-properties.

--copy-only-relationships-with-types

A comma-separated list of relationship types.

All relationships with any of the specified types will be included in the copy. Cannot be combined with --ignore-relationships-with-types.

--skip-labels

A comma-separated list of labels to ignore during the copy.

--skip-node-properties

A comma-separated list of property keys to ignore for nodes with the specified label.

Cannot be combined with --skip-properties or --copy-only-node-properties.

--skip-properties

A comma-separated list of property keys to ignore during the copy.

Cannot be combined with --skip-node-properties, --copy-only-node-properties, --skip-relationship-properties, and --copy-only-relationship-properties.

--skip-relationship-properties

A comma-separated list of property keys to ignore for relationships with the specified type.

Cannot be combined with --skip-properties or --copy-only-relationship-properties.

--from-pagecache

The size of the page cache to use for reading.

You can use the --from-pagecache option to speed up the copy operation by specifying how much cache to allocate when reading the source. Assign --from-pagecache as much memory as you can spare, since Neo4j does random reads from the source.
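Several of the filter options above cannot be combined. The following is a minimal sketch of a pre-flight check that scans a planned argument list for those pairs before the copy is started. The function check_copy_filters is hypothetical and is not part of neo4j-admin; the pairs it checks are the ones listed in the option descriptions above.

```shell
# check_copy_filters ARGS...
# Hypothetical helper, not part of neo4j-admin. Returns 1 and prints a message
# if ARGS contain two filter options that the Options section says cannot be combined.
# Note: uses plain (global) shell variables to stay POSIX-compatible.
check_copy_filters() {
  seen=" "
  for arg in "$@"; do
    case $arg in
      --*) seen="$seen${arg%%=*} " ;;   # record the option name without its value
    esac
  done
  # Each line below is one pair of mutually exclusive options.
  while read -r a b; do
    case $seen in
      *" $a "*)
        case $seen in
          *" $b "*) echo "conflict: $a cannot be combined with $b" >&2; return 1 ;;
        esac ;;
    esac
  done <<'EOF'
--ignore-nodes-with-labels --copy-only-nodes-with-labels
--ignore-relationships-with-types --copy-only-relationships-with-types
--copy-only-node-properties --skip-properties
--copy-only-node-properties --skip-node-properties
--copy-only-relationship-properties --skip-properties
--copy-only-relationship-properties --skip-relationship-properties
--skip-properties --skip-node-properties
--skip-properties --skip-relationship-properties
EOF
  return 0
}

# Check a planned invocation before running it.
if check_copy_filters --ignore-nodes-with-labels="Cat,Dog" neo4j copy; then
  echo "filters ok"
fi
```

The check only looks at option names, so it does not validate labels, property keys, or paths. neo4j-admin remains the authority on which combinations are accepted.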

Examples

Example 1. Use neo4j-admin database copy to copy the data store of the database neo4j.

  1. Stop the database named neo4j:

    STOP DATABASE neo4j

  2. Copy the data store from neo4j to a new database called copy:

    bin/neo4j-admin database copy neo4j copy

  3. Run the following command to verify that the database has been successfully copied:

    ls -al ../data/databases

  Copying a database does not automatically create it. Therefore, it will not be visible if you run SHOW DATABASES at this point.

  4. Create the copied database:

    CREATE DATABASE copy

  5. Verify that the copy database is online:

    SHOW DATABASES

  6. If your original database has a schema defined, change your active database to copy and recreate the schema using the schema commands saved in the file <database-name>-schema.cypher.

  --to-path-schema can be used to specify a different directory for the schema file.

Example 2. Use neo4j-admin database copy to filter the data you want to copy.

The command can perform some basic forms of processing. You can filter the data that you want to copy by removing nodes, labels, properties, and relationships.

bin/neo4j-admin database copy neo4j copy --ignore-nodes-with-labels="Cat,Dog"

The command creates a copy of the database neo4j but without the nodes with the labels :Cat and :Dog.

Labels are processed independently, i.e., the filter ignores any node with a label :Cat, :Dog, or both.
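The "any of the specified labels" rule can be illustrated with a small shell sketch. It only models the matching rule described above and is not how neo4j-admin is implemented. The function node_is_ignored and the comma-separated format used for a node's labels are hypothetical.

```shell
# node_is_ignored IGNORE_LIST NODE_LABELS
# IGNORE_LIST: comma-separated labels, as passed to --ignore-nodes-with-labels.
# NODE_LABELS: comma-separated labels of one node (hypothetical input format).
# Returns 0 if the node carries at least one label from IGNORE_LIST, 1 otherwise.
node_is_ignored() {
  old_ifs=$IFS
  IFS=,
  for label in $1; do               # split IGNORE_LIST on commas
    case ",$2," in
      *",$label,"*) IFS=$old_ifs; return 0 ;;
    esac
  done
  IFS=$old_ifs
  return 1
}

# A node is left out of the copy if it has :Cat, :Dog, or both.
for labels in "Cat" "Dog,Pet" "Cat,Dog" "Bird"; do
  if node_is_ignored "Cat,Dog" "$labels"; then
    echo "$labels: not copied"
  else
    echo "$labels: copied"
  fi
done
```

In this sketch only the node labeled Bird is reported as copied. A node with an extra label such as Pet is still left out, because one matching label is enough.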

For a detailed example of how to use neo4j-admin database copy to filter out data for sharding a database, see Sharding data with the copy command.


1. The calculations are based on MB/s = (IOPS * B) ÷ 10^6, where B is the block size in bytes; in the case of Neo4j, this is 8000. GB/hour can then be calculated from (MB/s * 3600) ÷ 1000.