Tutorial: Back up and copy a database in a Neo4j standalone instance

This tutorial provides a detailed example of how to back up a 3.5 database and use the neo4j-admin copy command to migrate it to a 4.x Neo4j standalone instance.

The neo4j-admin copy command can be used to clean up database inconsistencies, compact stores, and perform a migration at the same time. It supports a wide range of migrations, for example, from 3.5.x directly to 4.2.y Enterprise Edition, skipping the intermediate steps 3.5.latest → 4.0.latest → 4.1.latest → 4.2.latest. In a regular upgrade, those steps are needed to migrate the schema store. Because the neo4j-admin copy command does not copy the schema store at all, they are not needed here. However, if a schema is defined, you have to recreate it by running the commands that the neo4j-admin copy operation outputs. The neo4j-admin copy command can be applied only to an offline database.

The neo4j-admin copy command is an Enterprise Edition feature. Therefore, you cannot use it to migrate to Community Edition, for example, from Neo4j Community Edition 3.5 to Neo4j Community Edition 4.2. However, you can use it to migrate from Neo4j Community Edition to Neo4j Enterprise Edition.

The following is an example of how to check your 3.5 database store usage, perform a backup, compact the database backup (using neo4j-admin copy), and create it in a Neo4j 4.x standalone instance.

The copied nodes keep their IDs, but relationships get new IDs. Therefore, if you need to preserve relationship IDs, follow the regular backup and restore upgrade path instead.

1. Check your 3.5 database store usage

Before you back up and copy your 3.5 database, let’s look at the database store usage and see how it changes when you load, delete, and then reload data.

  1. Log in to Neo4j Browser on your running 3.5 Neo4j standalone instance and add 100k nodes to the graph.db database using the following command:

    FOREACH (x IN RANGE (1,100000) | CREATE (n:Person {name:x}))
  2. Create an index on the name property of the Person node:

    CREATE INDEX ON :Person(name)
  3. Use the dbms.checkpoint() procedure to flush all cached updates from the page cache to the store files.

    CALL dbms.checkpoint()
  4. In your terminal, navigate to the graph.db database directory ($neo4j_home/data/databases/graph.db) and run the following command to check the store size of the loaded nodes and properties:

    ls -alh
    ...
    -rw-r--r--   1 username  staff   1.4M 26 Nov 15:51 neostore.nodestore.db
    -rw-r--r--   1 username  staff   3.9M 26 Nov 15:51 neostore.propertystore.db
    ...

    The output reports that the node store (neostore.nodestore.db) and the property store (neostore.propertystore.db) occupy 1.4M and 3.9M, respectively.

  5. In Neo4j Browser, delete the nodes created above and run CALL dbms.checkpoint() again to force a checkpoint.

    MATCH (n) DETACH DELETE n
    CALL dbms.checkpoint()
  6. Now, add just one node, force a checkpoint, and repeat step 4 to see if the store size has changed.

    CREATE (n:Person {name:"John"})
    CALL dbms.checkpoint()

    If you check the size of the node store and the property store now, they are still 1.4M and 3.9M, even though the database contains only one node and one property. Neo4j does not shrink store files on disk when data is deleted.

In a production database, where numerous load/delete operations are performed, the store files can end up occupying a significant amount of unused space.
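To repeat the size check from steps 4 and 6 without reading the ls output by eye, you could use a small shell function like the one below. It is a hypothetical helper, not part of Neo4j: it prints the byte size (the same apparent size that ls -al reports) of the two store files discussed above, assuming the 3.5 file names neostore.nodestore.db and neostore.propertystore.db.

```shell
# Hypothetical helper (not part of Neo4j): print the apparent size in bytes,
# as shown by ls -al, of the node store and property store files.
# Assumes the 3.5 store file names used in this tutorial.
store_sizes() {
  local dir="$1" f
  for f in neostore.nodestore.db neostore.propertystore.db; do
    if [ -f "$dir/$f" ]; then
      # wc -c counts bytes; $(( )) drops the padding some wc versions add
      printf '%s %s\n' "$f" "$(( $(wc -c < "$dir/$f") ))"
    else
      printf '%s missing\n' "$f"
    fi
  done
}

# Usage (adjust the path to your installation):
# store_sizes "$neo4j_home/data/databases/graph.db"
```

Running it after each checkpoint should show the same sizes before and after the delete, which is the behavior described above.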

2. Back up your 3.5 database

Navigate to the bin folder of your 3.5 installation ($neo4j_home/bin) and run the following command to back up your database to the target folder. If the folder where you want to place your backup does not exist, you have to create it first. In this example, it is called /tmp/3.5.24.

./neo4j-admin backup --backup-dir=/tmp/3.5.24 --name=graphdbbackup

For details on performing a backup and the different command options, see Operations Manual → Perform a backup.
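If you script the backup, a sketch like the following creates the target folder when it does not exist and then runs the same backup command. The function name backup_35 and the NEO4J_ADMIN override (for pointing at a different neo4j-admin binary) are hypothetical; the --backup-dir and --name options are the ones used above.

```shell
# Hypothetical wrapper: create the backup folder if needed, then run the
# backup command from this step. NEO4J_ADMIN is an assumed override that
# defaults to ./neo4j-admin (run from $neo4j_home/bin).
backup_35() {
  local backup_dir="${1:-/tmp/3.5.24}"
  local name="${2:-graphdbbackup}"
  local admin="${NEO4J_ADMIN:-./neo4j-admin}"
  # mkdir -p succeeds silently if the folder already exists
  mkdir -p "$backup_dir" || return 1
  "$admin" backup --backup-dir="$backup_dir" --name="$name"
}

# Usage, from $neo4j_home/bin of the 3.5 instance:
# backup_35 /tmp/3.5.24 graphdbbackup
```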

3. Copy your 3.5 database backup to 4.x Neo4j

You can use the neo4j-admin copy command to reclaim the unused space and create a defragmented copy of your database backup in your 4.x standalone instance.

To speed up the copy operation, you can use the --from-pagecache and --to-pagecache options to specify how much page cache to allocate for reading the source and writing the destination. As a rule of thumb, set --to-pagecache to around 1-2GB, since writing the new store is mostly sequential. Assign --from-pagecache whatever memory you can spare, since Neo4j performs random reads on the source.
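The sketch below shows where the two options go on the copy command used in the next step. copy_backup and NEO4J_ADMIN are hypothetical names, and the default sizes (4g for reading, 1g for writing) are placeholders that follow the rule of thumb above, not tested recommendations. Size values are assumed to use the same form as the 8m page cache reported in the copy log.

```shell
# Hypothetical wrapper around neo4j-admin copy with explicit page cache sizes.
# The default sizes are placeholders; adjust them to the memory you have.
copy_backup() {
  local from_path="$1"
  local to_db="$2"
  local from_pc="${3:-4g}"  # random reads on the source: give it what you can spare
  local to_pc="${4:-1g}"    # mostly sequential writes: around 1-2GB
  local admin="${NEO4J_ADMIN:-./neo4j-admin}"
  "$admin" copy --from-path="$from_path" --to-database="$to_db" \
    --from-pagecache="$from_pc" --to-pagecache="$to_pc"
}

# Usage, from $neo4j_home/bin of the 4.x instance:
# copy_backup /private/tmp/3.5.24/graphdbbackup compactdb 8g 2g
```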

  1. In your 4.x Neo4j standalone instance, navigate to the bin folder ($neo4j_home/bin) and run the following command to create a compacted store copy of your 3.5 database backup. Any inconsistent nodes, properties, and relationships are not copied over to the newly created store.

    ./neo4j-admin copy --from-path=/private/tmp/3.5.24/graphdbbackup --to-database=compactdb
    Starting to copy store, output will be saved to:  $neo4j_home/logs/neo4j-admin-copy-2020-11-26.16.07.19.log
    2020-11-26 16:07:19.939+0000 INFO [StoreCopy] ### Copy Data ###
    2020-11-26 16:07:19.940+0000 INFO [StoreCopy] Source: /private/tmp/3.5.24/graphdbbackup (page cache 8m)
    2020-11-26 16:07:19.940+0000 INFO [StoreCopy] Target:  $neo4j_home/data/databases/compactdb (page cache 8m)
    2020-11-26 16:07:19.940+0000 INFO [StoreCopy] Empty database created, will start importing readable data from the source.
    2020-11-26 16:07:21.661+0000 INFO [o.n.i.b.ImportLogic] Import starting
    
    Import starting 2020-11-26 16:07:21.699+0000
      Estimated number of nodes: 50.00 k
      Estimated number of node properties: 50.00 k
      Estimated number of relationships: 0.00
      Estimated number of relationship properties: 50.00 k
      Estimated disk space usage: 2.680MiB
      Estimated required memory usage: 8.598MiB
    
    (1/4) Node import 2020-11-26 16:07:22.220+0000
      Estimated number of nodes: 50.00 k
      Estimated disk space usage: 1.698MiB
      Estimated required memory usage: 8.598MiB
    .......... .......... .......... .......... ..........   5% ∆239ms
    .......... .......... .......... .......... ..........  10% ∆1ms
    .......... .......... .......... .......... ..........  15% ∆1ms
    .......... .......... .......... .......... ..........  20% ∆0ms
    .......... .......... .......... .......... ..........  25% ∆1ms
    .......... .......... .......... .......... ..........  30% ∆0ms
    .......... .......... .......... .......... ..........  35% ∆0ms
    .......... .......... .......... .......... ..........  40% ∆1ms
    .......... .......... .......... .......... ..........  45% ∆0ms
    .......... .......... .......... .......... ..........  50% ∆1ms
    .......... .......... .......... .......... ..........  55% ∆0ms
    .......... .......... .......... .......... .........-  60% ∆51ms
    .......... .......... .......... .......... ..........  65% ∆0ms
    .......... .......... .......... .......... ..........  70% ∆0ms
    .......... .......... .......... .......... ..........  75% ∆1ms
    .......... .......... .......... .......... ..........  80% ∆0ms
    .......... .......... .......... .......... ..........  85% ∆0ms
    .......... .......... .......... .......... ..........  90% ∆1ms
    .......... .......... .......... .......... ..........  95% ∆0ms
    .......... .......... .......... .......... .......... 100% ∆0ms
    
    (2/4) Relationship import 2020-11-26 16:07:22.543+0000
      Estimated number of relationships: 0.00
      Estimated disk space usage: 1006KiB
      Estimated required memory usage: 15.60MiB
    (3/4) Relationship linking 2020-11-26 16:07:22.879+0000
      Estimated required memory usage: 7.969MiB
    (4/4) Post processing 2020-11-26 16:07:23.272+0000
      Estimated required memory usage: 7.969MiB
    -......... .......... .......... .......... ..........   5% ∆356ms
    .......... .......... .......... .......... ..........  10% ∆0ms
    .......... .......... .......... .......... ..........  15% ∆1ms
    .......... .......... .......... .......... ..........  20% ∆0ms
    .......... .......... .......... .......... ..........  25% ∆0ms
    .......... .......... .......... .......... ..........  30% ∆1ms
    .......... .......... .......... .......... ..........  35% ∆0ms
    .......... .......... .......... .......... ..........  40% ∆0ms
    .......... .......... .......... .......... ..........  45% ∆1ms
    .......... .......... .......... .......... ..........  50% ∆0ms
    .......... .......... .......... .......... ..........  55% ∆0ms
    .......... .......... .......... .......... ..........  60% ∆0ms
    .......... .......... .......... .......... ..........  65% ∆1ms
    .......... .......... .......... .......... ..........  70% ∆0ms
    .......... .......... .......... .......... ..........  75% ∆0ms
    .......... .......... .......... .......... ..........  80% ∆0ms
    .......... .......... .......... .......... ..........  85% ∆0ms
    .......... .......... .......... .......... ..........  90% ∆0ms
    .......... .......... .......... .......... ..........  95% ∆1ms
    .......... .......... .......... .......... .......... 100% ∆0ms
    
    
    IMPORT DONE in 2s 473ms.
    Imported:
      1 nodes
      0 relationships
      1 properties
    Peak memory usage: 15.60MiB
    2020-11-26 16:07:24.140+0000 INFO [o.n.i.b.ImportLogic] Import completed successfully, took 2s 473ms. Imported:
      1 nodes
      0 relationships
      1 properties
    2020-11-26 16:07:24.668+0000 INFO [StoreCopy] Import summary: Copying of 100704 records took 4 seconds (25176 rec/s). Unused Records 100703 (99%) Removed Records 0 (0%)
    2020-11-26 16:07:24.669+0000 INFO [StoreCopy] ### Extracting schema ###
    2020-11-26 16:07:24.669+0000 INFO [StoreCopy] Trying to extract schema...
    2020-11-26 16:07:24.920+0000 INFO [StoreCopy] ... found 1 schema definitions. The following can be used to recreate the schema:
    2020-11-26 16:07:24.922+0000 INFO [StoreCopy]
    
    CALL db.createIndex('index_5c0607ad', ['Person'], ['name'], 'native-btree-1.0', {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0],`spatial.cartesian.min`: [-1000000.0, -1000000.0],`spatial.wgs-84.min`: [-180.0, -90.0],`spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0],`spatial.cartesian.max`: [1000000.0, 1000000.0],`spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0],`spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0],`spatial.wgs-84.max`: [180.0, 90.0]})
    2020-11-26 16:07:24.923+0000 INFO [StoreCopy] You have to manually apply the above commands to the database when it is stared to recreate the indexes and constraints. The commands are saved to $neo4j_home/logs/neo4j-admin-copy-2020-11-26.16.07.19.log as well for reference.
  2. Run the following command to verify that the database has been successfully copied.

    ls -al ../data/databases
    total 0
    drwxr-xr-x@  5 username  staff   160 26 Nov 18:00 .
    drwxr-xr-x@  5 username  staff   160 26 Nov 18:00 ..
    drwxr-xr-x  35 username  staff  1120 26 Nov 17:58 compactdb
    -rw-r--r--   1 username  staff     0 26 Nov 18:00 store_lock
    drwxr-xr-x  33 username  staff  1056 26 Nov 18:00 system

    Copying a database does not automatically create it. Therefore, it is not listed when you run SHOW DATABASES in Cypher Shell or Neo4j Browser.
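Because the schema statements are also written to the neo4j-admin copy log, you can collect them into a file instead of copying them by hand. The following is a minimal sketch with a hypothetical function name. It assumes the log file keeps the layout of the console output shown above, with each statement on its own line starting with CALL db.create, and it appends the terminating semicolon that cypher-shell expects.

```shell
# Hypothetical helper: pull the schema statements that neo4j-admin copy
# wrote to its log into a file that cypher-shell can run.
# Assumes each statement sits on its own line starting with "CALL db.create".
extract_schema() {
  local log="$1"
  local out="$2"
  # Keep only the recreate statements, trim surrounding whitespace, and add
  # a trailing ";" where it is missing.
  grep '^[[:space:]]*CALL db\.create' "$log" \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/;$/!s/$/;/' > "$out"
}

# Usage:
# extract_schema "$neo4j_home/logs/neo4j-admin-copy-2020-11-26.16.07.19.log" schema.cypher
```

Once compactdb has been created, you could apply the file with cypher-shell -d compactdb < schema.cypher.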

4. Create your compacted database

You can now create the copied database and compare its store size with the size of the backed-up database.
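To compare the backup with the copy as a whole rather than file by file, a hypothetical helper such as compare_sizes can report the disk usage of both directories with du -sk. The totals include every file in each directory (index files, transaction logs, and so on), so they will not match the per-file sizes from ls exactly.

```shell
# Hypothetical helper: compare the disk usage (KiB, via du -sk) of two
# directories, for example the 3.5 backup and the copied database.
compare_sizes() {
  local before after
  before=$(du -sk "$1" | cut -f1)
  after=$(du -sk "$2" | cut -f1)
  printf 'before: %s KiB\nafter: %s KiB\n' "$before" "$after"
  # Integer percentage of space saved; skipped if the first directory is empty
  if [ "$before" -gt 0 ]; then
    printf 'saved: %s%%\n' $(( (before - after) * 100 / before ))
  fi
}

# Usage:
# compare_sizes /tmp/3.5.24/graphdbbackup "$neo4j_home/data/databases/compactdb"
```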

  1. Log in to the Cypher Shell command-line console, change the active database to system (:USE system;), and create the compactdb database. For more information about the Cypher Shell command-line interface (CLI) and how to use it, see Operations Manual → Cypher Shell.

    CREATE DATABASE compactdb;
    0 rows available after 145 ms, consumed after another 0 ms
  2. Verify that the compactdb database is online.

    SHOW DATABASES;
    +-------------------------------------------------------------------------------------------------------+
    | name            | address          | role         | requestedStatus | currentStatus | error | default |
    +-------------------------------------------------------------------------------------------------------+
    | "compactdb"     | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    | "neo4j"         | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | TRUE    |
    | "system"        | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    +-------------------------------------------------------------------------------------------------------+
    
    3 rows available after 10 ms, consumed after another 3 ms
  3. Change your active database to compactdb and recreate the schema using the output from the neo4j-admin copy command.

    CALL db.createIndex('index_5c0607ad', ['Person'], ['name'], 'native-btree-1.0', {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0],`spatial.cartesian.min`: [-1000000.0, -1000000.0],`spatial.wgs-84.min`: [-180.0, -90.0],`spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0],`spatial.cartesian.max`: [1000000.0, 1000000.0],`spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0],`spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0],`spatial.wgs-84.max`: [180.0, 90.0]});
    +-----------------------------------------------------------------------------------+
    | name             | labels     | properties | providerName       | status          |
    +-----------------------------------------------------------------------------------+
    | "index_5c0607ad" | ["Person"] | ["name"]   | "native-btree-1.0" | "index created" |
    +-----------------------------------------------------------------------------------+
    
    1 row available after 50 ms, consumed after another 5 ms
  4. Verify that all the data has been successfully copied. In this example, there should be one node.

    MATCH (n) RETURN n.name;
    +--------+
    | n.name |
    +--------+
    | "John" |
    +--------+
    
    1 row available after 106 ms, consumed after another 2 ms
  5. Exit the Cypher Shell command-line console.

    :exit;
    
    Bye!
  6. Navigate to the compactdb database ($neo4j_home/data/databases/compactdb) and check the store size of the copied nodes and properties.

    ls -alh
    ...
    -rw-r--r--   1 username  staff   8.0K 26 Nov 17:58 neostore.nodestore.db
    -rw-r--r--   1 username  staff   8.0K 26 Nov 17:58 neostore.propertystore.db
    ...

    The output reports that the node store and the property store now occupy only 8K each, compared to the previous 1.4M and 3.9M.
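If you prefer to script the steps in this section instead of typing them into the interactive console, cypher-shell also reads statements from standard input. The sketch below is an example under stated assumptions, not an official procedure: run_cypher, create_compactdb, and the CYPHER_SHELL override are hypothetical names, and it assumes cypher-shell is on your PATH and can authenticate without prompting (for example, through the NEO4J_USERNAME and NEO4J_PASSWORD environment variables). CREATE DATABASE may return before the database is fully online, so a query that runs immediately afterwards could fail.

```shell
# Hypothetical, non-interactive version of the steps above.
# Assumes cypher-shell is on PATH and authenticates without prompting;
# CYPHER_SHELL is an assumed override for the binary to call.
run_cypher() {
  local db="$1"
  local query="$2"
  # cypher-shell reads statements from stdin when input is piped in
  printf '%s\n' "$query" | "${CYPHER_SHELL:-cypher-shell}" -d "$db" --format plain
}

create_compactdb() {
  local schema_file="${1:-}"
  run_cypher system "CREATE DATABASE compactdb;" || return 1
  run_cypher system "SHOW DATABASES;" || return 1
  # Optionally replay schema statements saved from the neo4j-admin copy output
  if [ -n "$schema_file" ]; then
    run_cypher compactdb "$(cat "$schema_file")" || return 1
  fi
  run_cypher compactdb "MATCH (n) RETURN count(n) AS nodes;"
}

# Usage:
# create_compactdb schema.cypher
```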