Copy a database

This chapter describes the copy command of Neo4j Admin.

The copy command of neo4j-admin is used to copy data from an existing database to a new database.

The syntax follows:

neo4j-admin copy --to-database=<database>
                [--neo4j-home-directory=<path>]
                (--from-database=<database> | --from-path=<path>)
                [--from-pagecache=<size>]
                [--from-path-tx=<path>]
                [--to-format=<format>]
                [--to-pagecache=<size>]
                [--delete-nodes-with-labels=<label>[,<label>...]]
                [--keep-only-nodes-with-labels=<label>[,<label>...]]
                [--skip-labels=<label>[,<label>...]]
                [--skip-properties=<property>[,<property>...]]
                [--skip-node-properties=<label.property>[,<label.property>...]]
                [--keep-only-node-properties=<label.property>[,<label.property>...]]
                [--skip-relationship-properties=<relationship.property>[,<relationship.property>...]]
                [--keep-only-relationship-properties=<relationship.property>[,<relationship.property>...]]
                [--skip-relationships=<relationship>[,<relationship>...]]
                [--force]
                [--verbose]

The existing database must be stopped before copying from it, and the destination database must not yet exist.

The copy command can process an optional set of filters. Filters can be used to remove data that is unwanted in the destination database.

The schema definitions, i.e. indexes and constraints, are not automatically transferred. However, they are extracted and presented as Cypher statements so that you can recreate the ones you want.

Options

Option Description

--from-database

The database name to copy from. The command assumes that the database is in the configured location.

--from-pagecache

The size of the page cache to use for reading.

--from-path

The path to the database to copy from. It can be used to target databases outside of the installation, e.g. backups.

--from-path-tx

The path to the transaction log files. You only need to use this if the command is unable to determine where they are located.

--to-database

The destination database name.

--neo4j-home-directory

Path to the home directory for the copied database. Default is the same as the database copied from.

--to-format

The store format of the destination database. Valid values are same, standard, high_limit, aligned.

The default value for this option is the format of the source database.

--to-pagecache

The size of the page cache to use for writing.

--delete-nodes-with-labels

A list of labels. Any node matching any of the labels will be ignored during copy.

Cannot be combined with --keep-only-nodes-with-labels.

--keep-only-nodes-with-labels

A list of labels. All nodes that have any of the specified labels will be kept.

Cannot be combined with --delete-nodes-with-labels.

--skip-properties

A list of property keys to ignore during the copy.

Cannot be combined with --skip-node-properties, --keep-only-node-properties, --skip-relationship-properties, or --keep-only-relationship-properties.

--skip-node-properties

A list of property keys to ignore for nodes with the specified label.

Cannot be combined with --skip-properties or --keep-only-node-properties.

--keep-only-node-properties

A list of property keys to keep for nodes with the specified label. Any labels not explicitly mentioned will keep their properties.

Cannot be combined with --skip-properties or --skip-node-properties.

--skip-relationship-properties

A list of property keys to ignore for relationships with the specified type.

Cannot be combined with --skip-properties or --keep-only-relationship-properties.

--keep-only-relationship-properties

A list of property keys to keep for relationships with the specified type. Any relationship types not explicitly mentioned will keep their properties.

Cannot be combined with --skip-properties or --skip-relationship-properties.

--skip-labels

A list of labels to ignore during the copy.

--skip-relationships

A list of relationship types to ignore during the copy.

--verbose

Will instruct the tool to print more verbose output.

--force

Will force the command to proceed even if the integrity of the database cannot be verified.

Due to the way filters are processed, the IDs of nodes might change. This is true even if no filters are specified.
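The "Cannot be combined with" rules in the option descriptions above can be checked before running the command. The following is a minimal Python sketch, not part of neo4j-admin: the `CONFLICTS` table is transcribed from the descriptions above, and the helper name `conflicting_pairs` is hypothetical.

```python
from itertools import combinations

# Pairs of filter options that the option descriptions mark as
# "Cannot be combined with". Transcribed by hand from the option list.
CONFLICTS = {
    frozenset(pair)
    for pair in [
        ("--delete-nodes-with-labels", "--keep-only-nodes-with-labels"),
        ("--skip-properties", "--skip-node-properties"),
        ("--skip-properties", "--keep-only-node-properties"),
        ("--skip-properties", "--skip-relationship-properties"),
        ("--skip-properties", "--keep-only-relationship-properties"),
        ("--skip-node-properties", "--keep-only-node-properties"),
        ("--skip-relationship-properties", "--keep-only-relationship-properties"),
    ]
}


def conflicting_pairs(options):
    """Return the pairs of option names in `options` that cannot be combined."""
    present = sorted(set(options))
    return [
        (a, b)
        for a, b in combinations(present, 2)
        if frozenset((a, b)) in CONFLICTS
    ]


if __name__ == "__main__":
    # One conflicting pair is reported for this combination.
    print(conflicting_pairs(["--skip-properties", "--keep-only-node-properties"]))
    # Label filters and property filters can be mixed freely.
    print(conflicting_pairs(["--skip-labels", "--keep-only-node-properties"]))
```

An empty result means that the chosen filters do not violate any of the documented restrictions; it says nothing about whether the filter values themselves are valid.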

Examples

Example 1. Use the copy command to take a copy of the database neo4j.

To begin, you must stop the database named neo4j; this can be done by issuing the following Cypher statement:

STOP DATABASE neo4j

You can now copy the data from neo4j, to a new database called copy:

$neo4j-home> bin/neo4j-admin copy --from-database=neo4j --to-database=copy

A new database with the name copy now exists on the server, but it is not automatically picked up by Neo4j. To start the new database you have to insert it into Neo4j with the following Cypher query:

CREATE DATABASE copy

Neo4j will then detect the copied database and begin to use it.

Remember to start the database you copied from, if you still want it, using:

START DATABASE neo4j

The console output is saved to logs/neo4j-admin-copy-<date>.log.
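The order of the steps in this example (stop, copy, create, start) can be written down as a small plan. The Python sketch below only builds the statements and the argument list; it does not run anything. The function names `build_copy_command` and `copy_plan` are hypothetical, and the only options it emits are those listed in the syntax summary.

```python
# Filter options taken from the syntax summary of the copy command.
FILTER_OPTIONS = {
    "delete-nodes-with-labels",
    "keep-only-nodes-with-labels",
    "skip-labels",
    "skip-properties",
    "skip-node-properties",
    "keep-only-node-properties",
    "skip-relationship-properties",
    "keep-only-relationship-properties",
    "skip-relationships",
}


def build_copy_command(from_database, to_database, **filters):
    """Build the argument list for `neo4j-admin copy`.

    Keyword names map to filter options (underscores become hyphens) and
    list values are joined with commas.
    """
    args = [
        "bin/neo4j-admin",
        "copy",
        f"--from-database={from_database}",
        f"--to-database={to_database}",
    ]
    for name, values in filters.items():
        option = name.replace("_", "-")
        if option not in FILTER_OPTIONS:
            raise ValueError(f"unknown filter option: --{option}")
        args.append(f"--{option}={','.join(values)}")
    return args


def copy_plan(from_database, to_database, **filters):
    """Return the ordered steps of Example 1 as (kind, payload) tuples."""
    return [
        ("cypher", f"STOP DATABASE {from_database}"),
        ("shell", build_copy_command(from_database, to_database, **filters)),
        ("cypher", f"CREATE DATABASE {to_database}"),
        ("cypher", f"START DATABASE {from_database}"),
    ]


if __name__ == "__main__":
    for kind, payload in copy_plan("neo4j", "copy"):
        print(kind, payload)
```

Keeping the plan as data makes it easy to review the exact command line before handing it to a shell or a driver of your choice.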

Example 2. Use the copy command with filters.

The command can perform some basic forms of processing. You can remove nodes, labels, properties, and/or relationships.

The difference between --skip-labels and --delete-nodes-with-labels is that --skip-labels only removes the labels, potentially leaving nodes without any labels, whereas --delete-nodes-with-labels leaves out the matching nodes altogether.

$neo4j-home> bin/neo4j-admin copy --from-database=neo4j --to-database=copy --delete-nodes-with-labels="Cat,Dog"

After this command, you will have a copy of the database neo4j, without nodes with the labels :Cat and :Dog.

Labels are processed independently, i.e. the filter described above will delete any node with either :Cat or :Dog, and not only nodes that have both of the labels.
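To make the difference between the two label filters concrete, here is a small Python model of their behavior as described above. It is an illustration only, not the tool's implementation; the node dictionaries and function names are hypothetical.

```python
def delete_nodes_with_labels(nodes, labels):
    """Drop every node that carries at least one of the given labels."""
    labels = set(labels)
    return [n for n in nodes if not labels & set(n["labels"])]


def skip_labels(nodes, labels):
    """Keep every node, but strip the given labels from it."""
    labels = set(labels)
    return [
        {**n, "labels": [l for l in n["labels"] if l not in labels]}
        for n in nodes
    ]


if __name__ == "__main__":
    # Hypothetical sample data.
    nodes = [
        {"name": "a", "labels": ["Cat"]},
        {"name": "b", "labels": ["Dog", "Pet"]},
        {"name": "c", "labels": ["Bird"]},
    ]
    # Only node "c" remains: "a" and "b" each match one of the labels.
    print(delete_nodes_with_labels(nodes, ["Cat", "Dog"]))
    # All three nodes remain; node "a" is left without any labels.
    print(skip_labels(nodes, ["Cat", "Dog"]))
```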

1. Sharding data with the copy command

The copy command can be used to filter out data for a Fabric installation.

In the following example we will go through a sample database that will be separated into 3 shards.

Example 3. Use the copy command to filter out data for a Fabric installation.

The sample database contains the following data:

(p1 :Person :S2 {id:123, name: "Ava"})
(p2 :Person :S2 {id:124, name: "Bob"})
(p3 :Person :S3 {id:125, name: "Cat", age: 54})
(p4 :Person :S3 {id:126, name: "Dan"})

(t1 :Team :S1 :SAll {id:1, name: "Foo", mascot: "Pink Panther"})
(t2 :Team :S1 :SAll {id:2, name: "Bar", mascot: "Cookie Monster"})

(d1 :Division :SAll {name: "Marketing"})

(p1)-[:MEMBER]->(t1)
(p2)-[:MEMBER]->(t2)
(p3)-[:MEMBER]->(t1)
(p4)-[:MEMBER]->(t2)

The data has been prepared using queries to add the labels :S1, :S2, :S3, and :SAll, which denote the target shard. Shard 1 will contain the team data. Shard 2 and Shard 3 will contain person data.
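As a rough model of how these labels decide shard membership, the Python sketch below applies a keep-only label filter followed by a label-stripping step to the sample data. It assumes that a relationship is copied only when both of its end nodes are copied; that rule is an assumption of this sketch, and the function name `shard` is hypothetical.

```python
# Sample data from this example: node name -> set of labels.
NODES = {
    "p1": {"Person", "S2"},
    "p2": {"Person", "S2"},
    "p3": {"Person", "S3"},
    "p4": {"Person", "S3"},
    "t1": {"Team", "S1", "SAll"},
    "t2": {"Team", "S1", "SAll"},
    "d1": {"Division", "SAll"},
}
RELATIONSHIPS = [
    ("p1", "MEMBER", "t1"),
    ("p2", "MEMBER", "t2"),
    ("p3", "MEMBER", "t1"),
    ("p4", "MEMBER", "t2"),
]


def shard(keep_labels, skip_labels):
    """Model --keep-only-nodes-with-labels combined with --skip-labels."""
    keep, skip = set(keep_labels), set(skip_labels)
    # A node is kept if it carries any of the keep labels; the temporary
    # sharding labels are then stripped from the kept nodes.
    nodes = {
        name: labels - skip
        for name, labels in NODES.items()
        if labels & keep
    }
    # Assumption: a relationship survives only if both end nodes survive.
    rels = [r for r in RELATIONSHIPS if r[0] in nodes and r[2] in nodes]
    return nodes, rels


if __name__ == "__main__":
    temporary = ["S1", "S2", "S3", "SAll"]
    for keep in (["S1", "SAll"], ["S2", "SAll"], ["S3", "SAll"]):
        print(keep, shard(keep, temporary))
```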

  1. We start by creating Shard 1 with:

    $neo4j-home> bin/neo4j-admin copy --from-database=neo4j \
       --to-database=shard1 \
       --keep-only-nodes-with-labels=S1,SAll \
       --skip-labels=S1,S2,S3,SAll

    The --keep-only-nodes-with-labels option is used to filter out everything that doesn’t have the label :S1 or :SAll.

    The --skip-labels option is also used so as to not include the temporary labels we created for the sharding process. The resulting shard will contain the following:

    (t1 :Team {id:1, name: "Foo", mascot: "Pink Panther"})
    (t2 :Team {id:2, name: "Bar", mascot: "Cookie Monster"})
    
    (d1 :Division {name: "Marketing"})

  2. Next we create Shard 2:

    $neo4j-home> bin/neo4j-admin copy --from-database=neo4j \
       --to-database=shard2 \
       --keep-only-nodes-with-labels=S2,SAll \
       --skip-labels=S1,S2,S3,SAll \
       --keep-only-node-properties=Team.id

    In Shard 2 we want to keep the :Team nodes as proxy nodes in order to be able to link together information from the separate shards. The nodes will be included since they have the label :SAll, but we specify --keep-only-node-properties so as to not duplicate the team information from Shard 1.

    (p1 :Person {id:123, name: "Ava"})
    (p2 :Person {id:124, name: "Bob"})
    
    (t1 :Team {id:1})
    (t2 :Team {id:2})
    
    (d1 :Division {name: "Marketing"})
    
    (p1)-[:MEMBER]->(t1)
    (p2)-[:MEMBER]->(t2)

    Observe that --keep-only-node-properties did not filter out Person.name since the :Person label was not mentioned in the filter.

  3. Finally, we do exactly the same thing for Shard 3, but with the filter --skip-node-properties, instead of --keep-only-node-properties.

    $neo4j-home> bin/neo4j-admin copy --from-database=neo4j \
       --to-database=shard3 \
       --keep-only-nodes-with-labels=S3,SAll \
       --skip-labels=S1,S2,S3,SAll \
       --skip-node-properties=Team.name,Team.mascot

    This will produce:

    (p3 :Person {id:125, name: "Cat", age: 54})
    (p4 :Person {id:126, name: "Dan"})
    
    (t1 :Team {id:1})
    (t2 :Team {id:2})
    
    (d1 :Division {name: "Marketing"})
    
    (p3)-[:MEMBER]->(t1)
    (p4)-[:MEMBER]->(t2)

    As demonstrated, we can achieve the same result with both --skip-node-properties and --keep-only-node-properties.

In this example it is easier to use --keep-only-node-properties, since only one property should be kept.
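The equivalence of the two property filters for this data set can be modeled in a few lines of Python. This is a sketch of the documented behavior, not the tool's code: the function names are hypothetical, and it assumes that a node is affected when it carries any label mentioned in the filter.

```python
def parse_specs(specs):
    """Turn ["Team.name", "Team.mascot"] into {"Team": {"name", "mascot"}}."""
    by_label = {}
    for spec in specs:
        label, prop = spec.split(".", 1)
        by_label.setdefault(label, set()).add(prop)
    return by_label


def keep_only_node_properties(node, specs):
    """Model --keep-only-node-properties for a single node."""
    by_label = parse_specs(specs)
    mentioned = [l for l in node["labels"] if l in by_label]
    if not mentioned:
        # Labels not mentioned in the filter keep all their properties.
        return dict(node["props"])
    keep = set().union(*(by_label[l] for l in mentioned))
    return {k: v for k, v in node["props"].items() if k in keep}


def skip_node_properties(node, specs):
    """Model --skip-node-properties for a single node."""
    by_label = parse_specs(specs)
    drop = set()
    for label in node["labels"]:
        drop |= by_label.get(label, set())
    return {k: v for k, v in node["props"].items() if k not in drop}


if __name__ == "__main__":
    team = {"labels": ["Team"],
            "props": {"id": 1, "name": "Foo", "mascot": "Pink Panther"}}
    person = {"labels": ["Person"], "props": {"id": 123, "name": "Ava"}}
    # Both filters reduce the :Team node to its id property.
    print(keep_only_node_properties(team, ["Team.id"]))
    print(skip_node_properties(team, ["Team.name", "Team.mascot"]))
    # The :Person node is not mentioned and keeps all of its properties.
    print(keep_only_node_properties(person, ["Team.id"]))
```

The keep-only variant stays correct if further properties are later added to :Team nodes, while the skip variant would need each new property listed explicitly.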

The relationship property filters work in the same way.