Tutorial: Back up and copy a single database in a running single instance

This tutorial provides a detailed example of how to back up a single database (version 3.5 in this example) and use the neo4j-admin copy command to copy it into a running 4.x Neo4j standalone instance.

The neo4j-admin copy command can be used to clean up database inconsistencies, compact stores, and upgrade/migrate a database (from Community or Enterprise) to a later version of Neo4j Enterprise edition. Since the neo4j-admin copy command does not copy the schema store, the intermediary steps of the sequential path are not needed. If a schema is defined, you just have to recreate it by running the commands that the neo4j-admin copy operation outputs.

Keep in mind that:

  • The neo4j-admin copy command copies the node IDs, but the relationships get new IDs.

  • neo4j-admin copy is a copy tool, not an upgrade or migration tool.
    It copies only a single database and cannot be applied to the system database.

Therefore, if you want to preserve your relationship IDs, or to upgrade the whole DBMS, you should follow the sequential path.

It is important to note that neo4j-admin copy is an IOPS-intensive process.

Estimations for how long the neo4j-admin copy command will take can be made based on the following:

  • Neo4j, like many other databases, does I/O in 8K pages.

  • The maximum value of IOPS your disc can process, provided by your disc manufacturer.

For example, if your disc manufacturer has provided a maximum of 5000 IOPS, you can reasonably expect up to 5000 such page operations a second. Therefore, the maximum theoretical throughput you can expect is 40 MB/s (or 144 GB/hour) on that disc. You may then assume that the best-case scenario for running neo4j-admin copy on that 5000 IOPS disc is that it will take at least 1 hour to process a 144 GB database. [1]

However, it is important to remember that the process must read 144 GB from the source database, and must also write to the target store (assuming the target store is of comparable size). Additionally, there are internal processes during the copy that will read/modify/write the store multiple times. Therefore, with an additional 144 GB of both read and write, the best-case scenario for running neo4j-admin copy on a 5000 IOPS disc is that it will actually take at least 3 hours to process a 144 GB database.

Finally, it is also important to consider that in almost all cloud environments, the published IOPS value may not match the actual value, and the disc may not be able to sustain the maximum possible IOPS continuously. The real processing time for this example could be well above the 3-hour estimate.
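The arithmetic above can be sketched as a few shell functions. This is a rough illustration rather than an official sizing tool: the function names and the `passes` parameter are hypothetical, the page size of 8000 bytes follows the footnote formula, and shell integer arithmetic truncates any fractions.

```shell
# Best-case copy-time estimate from a disc's rated IOPS.
# Assumption: 8000-byte pages, as in the footnote formula.
PAGE_BYTES=8000

# MB/s = (IOPS * B) / 10^6
throughput_mb_per_s() {
  echo $(( $1 * PAGE_BYTES / 1000000 ))
}

# GB/hour = (MB/s * 3600) / 1000
throughput_gb_per_hour() {
  echo $(( $(throughput_mb_per_s "$1") * 3600 / 1000 ))
}

# Hours needed for a store of size_gb, counting "passes" full passes over the data.
# passes=1 models reading the source only; passes=3 models the read, the write,
# and the additional read/modify/write work described above (a simplification).
best_case_hours() {
  iops="$1"; size_gb="$2"; passes="$3"
  echo $(( size_gb * passes / $(throughput_gb_per_hour "$iops") ))
}

throughput_mb_per_s 5000       # prints 40
throughput_gb_per_hour 5000    # prints 144
best_case_hours 5000 144 3     # prints 3
```

Because the result is a best case, treat it as a lower bound. Very low IOPS values make the integer throughput round down to zero, in which case the last function fails with a division-by-zero error.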

This tutorial walks through the basics of checking the store usage of your database (version 3.5 in this example), performing a backup, compacting the database backup (using neo4j-admin copy), and creating it in a running Neo4j 4.x standalone instance.

1. Check your 3.5 database store usage

Before you back up and copy your 3.5 database, let’s look at the database store usage and see how it changes when you load, delete, and then reload data.

  1. Log in to Neo4j Browser of your running 3.5 Neo4j standalone instance and add 100k nodes to the graph.db database using the following command:

    FOREACH (x IN RANGE (1,100000) | CREATE (n:Person {name:x}))
  2. Create an index on the name property of the Person node:

    CREATE INDEX ON :Person(name)
  3. Use the dbms.checkpoint() procedure to flush all cached updates from the page cache to the store files.

    CALL dbms.checkpoint()
  4. In your terminal, navigate to the graph.db database ($neo4j_home/data/databases/graph.db) and run the following command to check the store size of the loaded nodes and properties.

    ls -alh
    ...
    -rw-r--r--   1 username  staff   1.4M 26 Nov 15:51 neostore.nodestore.db
    -rw-r--r--   1 username  staff   3.9M 26 Nov 15:51 neostore.propertystore.db
    ...

    The output reports that the node store (neostore.nodestore.db) and the property store (neostore.propertystore.db) occupy 1.4M and 3.9M, respectively.

  5. In Neo4j Browser, delete the nodes created above and run CALL dbms.checkpoint() again to force a checkpoint.

    MATCH (n) DETACH DELETE n;
    CALL dbms.checkpoint();
  6. Now, add just one node, force a checkpoint, and repeat step 4 to see if the store size has changed.

    CREATE (n:Person {name:"John"});
    CALL dbms.checkpoint();

    If you check the size of the node store and the property store now, they will still be 1.4M and 3.9M, even though the database only contains one node and one property. Neo4j does not shrink the store files on the hard drive.

In a production database, where numerous load/delete operations are performed, the result is significant unused space occupied by the store files.
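The size check from step 4 can be repeated after each change with a small helper. This is a minimal sketch: the function name store_sizes is hypothetical, and it reports sizes in bytes using wc -c rather than the human-readable ls -alh output shown above.

```shell
# Print the size in bytes of the node store and property store files
# found in the database directory passed as the first argument.
store_sizes() {
  db_dir="$1"
  for f in neostore.nodestore.db neostore.propertystore.db; do
    if [ -f "$db_dir/$f" ]; then
      # wc -c reports the file length in bytes; tr strips any padding spaces
      printf '%s %s\n' "$f" "$(wc -c < "$db_dir/$f" | tr -d ' ')"
    else
      printf '%s missing\n' "$f"
    fi
  done
}
```

For example, run store_sizes "$neo4j_home/data/databases/graph.db" after each checkpoint and compare the numbers before and after the delete.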

2. Back up your 3.5 database

Navigate to the /bin folder, and run the following command to back up your database to the target folder. If the folder where you want to place your backup does not exist, you have to create it first. In this example, it is called /tmp/3.5.24.

./neo4j-admin backup --backup-dir=/tmp/3.5.24 --name=graphdbbackup

For details on performing a backup and the different command options, see Operations Manual → Perform a backup.
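If the backup folder does not exist yet, it can be created before running the backup. A minimal sketch, assuming the example path /tmp/3.5.24 used above; the variable name backup_dir is only illustrative.

```shell
# Create the backup folder; -p makes this a no-op if the folder already exists.
backup_dir=/tmp/3.5.24
mkdir -p "$backup_dir"

# Then run the backup from the /bin folder, as in the command above:
#   ./neo4j-admin backup --backup-dir="$backup_dir" --name=graphdbbackup
```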

3. Copy your 3.5 database backup to 4.x Neo4j

You can use the neo4j-admin copy command to reclaim the unused space and create a defragmented copy of your database backup in your 4.x standalone instance.

To speed up the copy operation, you can use the --from-pagecache and --to-pagecache options to specify how much cache to allocate when reading the source and writing the destination. As a rule of thumb, --to-pagecache should be around 1-2GB, since it mostly does sequential writes. The --from-pagecache should then be assigned whatever memory you can spare, since Neo4j does random reads from the source.
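As an illustration of these two options, the following sketch assembles a copy command with explicit page cache sizes and prints it as a dry run. The sizes 4g and 2g are assumptions for a machine with memory to spare, not recommendations, and the size notation is assumed to follow the same style as the "8m" default that appears in the copy log output. The paths and database name are the ones used in this tutorial.

```shell
# Hypothetical sizes: adjust them to the memory available on your machine.
from_pagecache=4g   # assumption: memory you can spare for random reads from the source
to_pagecache=2g     # within the 1-2GB rule of thumb for mostly sequential writes

# Assemble the command and print it as a dry run.
# To execute it, run the printed command from the /bin folder of the 4.x instance.
copy_cmd="./neo4j-admin copy --from-path=/private/tmp/3.5.24/graphdbbackup --to-database=compactdb --from-pagecache=$from_pagecache --to-pagecache=$to_pagecache"
echo "$copy_cmd"
```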

  1. In your 4.x Neo4j standalone instance, navigate to the /bin folder and run the following command to create a compacted store copy of your 3.5 database backup. Any inconsistent nodes, properties, and relationships will not be copied over to the newly created store.

    ./neo4j-admin copy --from-path=/private/tmp/3.5.24/graphdbbackup --to-database=compactdb
    Starting to copy store, output will be saved to:  $neo4j_home/logs/neo4j-admin-copy-2020-11-26.16.07.19.log
    2020-11-26 16:07:19.939+0000 INFO [StoreCopy] ### Copy Data ###
    2020-11-26 16:07:19.940+0000 INFO [StoreCopy] Source: /private/tmp/3.5.24/graphdbbackup (page cache 8m)
    2020-11-26 16:07:19.940+0000 INFO [StoreCopy] Target:  $neo4j_home/data/databases/compactdb (page cache 8m)
    2020-11-26 16:07:19.940+0000 INFO [StoreCopy] Empty database created, will start importing readable data from the source.
    2020-11-26 16:07:21.661+0000 INFO [o.n.i.b.ImportLogic] Import starting
    
    Import starting 2020-11-26 16:07:21.699+0000
      Estimated number of nodes: 50.00 k
      Estimated number of node properties: 50.00 k
      Estimated number of relationships: 0.00
      Estimated number of relationship properties: 50.00 k
      Estimated disk space usage: 2.680MiB
      Estimated required memory usage: 8.598MiB
    
    (1/4) Node import 2020-11-26 16:07:22.220+0000
      Estimated number of nodes: 50.00 k
      Estimated disk space usage: 1.698MiB
      Estimated required memory usage: 8.598MiB
    .......... .......... .......... .......... ..........   5% ∆239ms
    .......... .......... .......... .......... ..........  10% ∆1ms
    .......... .......... .......... .......... ..........  15% ∆1ms
    .......... .......... .......... .......... ..........  20% ∆0ms
    .......... .......... .......... .......... ..........  25% ∆1ms
    .......... .......... .......... .......... ..........  30% ∆0ms
    .......... .......... .......... .......... ..........  35% ∆0ms
    .......... .......... .......... .......... ..........  40% ∆1ms
    .......... .......... .......... .......... ..........  45% ∆0ms
    .......... .......... .......... .......... ..........  50% ∆1ms
    .......... .......... .......... .......... ..........  55% ∆0ms
    .......... .......... .......... .......... .........-  60% ∆51ms
    .......... .......... .......... .......... ..........  65% ∆0ms
    .......... .......... .......... .......... ..........  70% ∆0ms
    .......... .......... .......... .......... ..........  75% ∆1ms
    .......... .......... .......... .......... ..........  80% ∆0ms
    .......... .......... .......... .......... ..........  85% ∆0ms
    .......... .......... .......... .......... ..........  90% ∆1ms
    .......... .......... .......... .......... ..........  95% ∆0ms
    .......... .......... .......... .......... .......... 100% ∆0ms
    
    (2/4) Relationship import 2020-11-26 16:07:22.543+0000
      Estimated number of relationships: 0.00
      Estimated disk space usage: 1006KiB
      Estimated required memory usage: 15.60MiB
    (3/4) Relationship linking 2020-11-26 16:07:22.879+0000
      Estimated required memory usage: 7.969MiB
    (4/4) Post processing 2020-11-26 16:07:23.272+0000
      Estimated required memory usage: 7.969MiB
    -......... .......... .......... .......... ..........   5% ∆356ms
    .......... .......... .......... .......... ..........  10% ∆0ms
    .......... .......... .......... .......... ..........  15% ∆1ms
    .......... .......... .......... .......... ..........  20% ∆0ms
    .......... .......... .......... .......... ..........  25% ∆0ms
    .......... .......... .......... .......... ..........  30% ∆1ms
    .......... .......... .......... .......... ..........  35% ∆0ms
    .......... .......... .......... .......... ..........  40% ∆0ms
    .......... .......... .......... .......... ..........  45% ∆1ms
    .......... .......... .......... .......... ..........  50% ∆0ms
    .......... .......... .......... .......... ..........  55% ∆0ms
    .......... .......... .......... .......... ..........  60% ∆0ms
    .......... .......... .......... .......... ..........  65% ∆1ms
    .......... .......... .......... .......... ..........  70% ∆0ms
    .......... .......... .......... .......... ..........  75% ∆0ms
    .......... .......... .......... .......... ..........  80% ∆0ms
    .......... .......... .......... .......... ..........  85% ∆0ms
    .......... .......... .......... .......... ..........  90% ∆0ms
    .......... .......... .......... .......... ..........  95% ∆1ms
    .......... .......... .......... .......... .......... 100% ∆0ms
    
    
    IMPORT DONE in 2s 473ms.
    Imported:
      1 nodes
      0 relationships
      1 properties
    Peak memory usage: 15.60MiB
    2020-11-26 16:07:24.140+0000 INFO [o.n.i.b.ImportLogic] Import completed successfully, took 2s 473ms. Imported:
      1 nodes
      0 relationships
      1 properties
    2020-11-26 16:07:24.668+0000 INFO [StoreCopy] Import summary: Copying of 100704 records took 4 seconds (25176 rec/s). Unused Records 100703 (99%) Removed Records 0 (0%)
    2020-11-26 16:07:24.669+0000 INFO [StoreCopy] ### Extracting schema ###
    2020-11-26 16:07:24.669+0000 INFO [StoreCopy] Trying to extract schema...
    2020-11-26 16:07:24.920+0000 INFO [StoreCopy] ... found 1 schema definitions. The following can be used to recreate the schema:
    2020-11-26 16:07:24.922+0000 INFO [StoreCopy]
    
    CALL db.createIndex('index_5c0607ad', ['Person'], ['name'], 'native-btree-1.0', {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0],`spatial.cartesian.min`: [-1000000.0, -1000000.0],`spatial.wgs-84.min`: [-180.0, -90.0],`spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0],`spatial.cartesian.max`: [1000000.0, 1000000.0],`spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0],`spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0],`spatial.wgs-84.max`: [180.0, 90.0]})
    2020-11-26 16:07:24.923+0000 INFO [StoreCopy] You have to manually apply the above commands to the database when it is stared to recreate the indexes and constraints. The commands are saved to $neo4j_home/logs/neo4j-admin-copy-2020-11-26.16.07.19.log as well for reference.
  2. Run the following command to verify that the database has been successfully copied.

    ls -al ../data/databases
    total 0
    drwxr-xr-x@  5 username  staff   160 26 Nov 18:00 .
    drwxr-xr-x@  5 username  staff   160 26 Nov 18:00 ..
    drwxr-xr-x  35 username  staff  1120 26 Nov 17:58 compactdb
    -rw-r--r--   1 username  staff     0 26 Nov 18:00 store_lock
    drwxr-xr-x  33 username  staff  1056 26 Nov 18:00 system

    Copying a database does not automatically create it. Therefore, it will not be visible if you do SHOW DATABASES in Cypher Shell or Neo4j Browser.

4. Create your compacted backup

You can now create the copied database and compare its store size with the size of the backed-up database.

  1. Log in to the Cypher Shell command-line console, change the active database to system (:USE system;), and create the compactdb database. For more information about the Cypher Shell command-line interface (CLI) and how to use it, see Operations Manual → Cypher Shell.

    CREATE DATABASE compactdb;
    0 rows available after 145 ms, consumed after another 0 ms
  2. Verify that the compactdb database is online.

    SHOW DATABASES;
    +-------------------------------------------------------------------------------------------------------+
    | name            | address          | role         | requestedStatus | currentStatus | error | default |
    +-------------------------------------------------------------------------------------------------------+
    | "compactdb"     | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    | "neo4j"         | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | TRUE    |
    | "system"        | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
    +-------------------------------------------------------------------------------------------------------+
    
    3 rows available after 10 ms, consumed after another 3 ms
  3. Change your active database to compactdb and recreate the schema using the output from the neo4j-admin copy command.

    CALL db.createIndex('index_5c0607ad', ['Person'], ['name'], 'native-btree-1.0', {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0],`spatial.cartesian.min`: [-1000000.0, -1000000.0],`spatial.wgs-84.min`: [-180.0, -90.0],`spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0],`spatial.cartesian.max`: [1000000.0, 1000000.0],`spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0],`spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0],`spatial.wgs-84.max`: [180.0, 90.0]});
    +-----------------------------------------------------------------------------------+
    | name             | labels     | properties | providerName       | status          |
    +-----------------------------------------------------------------------------------+
    | "index_5c0607ad" | ["Person"] | ["name"]   | "native-btree-1.0" | "index created" |
    +-----------------------------------------------------------------------------------+
    
    1 row available after 50 ms, consumed after another 5 ms
  4. Verify that all the data has been successfully copied. In this example, there should be one node.

    MATCH (n) RETURN n.name;
    +--------+
    | n.name |
    +--------+
    | "John" |
    +--------+
    
    1 row available after 106 ms, consumed after another 2 ms
  5. Exit the Cypher Shell command-line console.

    :exit;
    
    Bye!
  6. Navigate to the compactdb database ($neo4j_home/data/databases/compactdb) and check the store size of the copied nodes and properties.

    ls -alh
    ...
    -rw-r--r--   1 username  staff   8.0K 26 Nov 17:58 neostore.nodestore.db
    -rw-r--r--   1 username  staff   8.0K 26 Nov 17:58 neostore.propertystore.db
    ...

    The output reports that the node store and the property store now occupy only 8K each, compared to the previous 1.4M and 3.9M.


1. The calculations are based on MB/s = (IOPS * B) ÷ 10^6, where B is the block size in bytes; in the case of Neo4j, this is 8000. GB/hour can then be calculated from (MB/s * 3600) ÷ 1000.