Tutorial: Back up and copy a single database in a running cluster

This tutorial provides a detailed example of how to back up a single database (in this example, a version 3.5 database) and use the neo4j-admin copy command to copy it into a running 4.x Neo4j cluster.

The neo4j-admin copy command can be used to clean up database inconsistencies, compact stores, and upgrade/migrate a database (from Community or Enterprise) to a later version of Neo4j Enterprise edition. Since the neo4j-admin copy command does not copy the schema store, the intermediary steps of the sequential path are not needed. If a schema is defined, the commands that the neo4j-admin copy operation outputs can be used to create the new schema.

Keep in mind that:

  • The neo4j-admin copy command copies the node IDs, but the relationships get new IDs.

  • The neo4j-admin copy command is used to copy a single database from a specified Neo4j DBMS path to another Neo4j DBMS. Note that no schema data will be created on the new Neo4j DBMS. The system database cannot be copied with the neo4j-admin copy command.

Therefore, if you want to preserve your relationship IDs, or to upgrade the whole DBMS, you should follow the sequential path.

It is important to note that neo4j-admin copy is an IOPS-intensive process.

You can estimate how long the neo4j-admin copy command will take based on the following:

  • Neo4j, like many other databases, does I/O in 8K pages.

  • The maximum IOPS your disc can process, as specified by the disc manufacturer.

For example, if your disc manufacturer specifies a maximum of 5000 IOPS, you can reasonably expect up to 5000 such page operations per second. Therefore, the maximum theoretical throughput you can expect is 40 MB/s (or 144 GB/hour) on that disc. You may then assume that the best-case scenario for running neo4j-admin copy on that 5000 IOPS disc is that it will take at least 1 hour to process a 144 GB database. [1]

However, remember that the process must read 144 GB from the source database and must also write to the target store (assuming the target store is of comparable size). Additionally, internal processes during the copy read, modify, and write the store multiple times. With an additional 144 GB of both reads and writes, the best-case scenario for running neo4j-admin copy on a 5000 IOPS disc is therefore that it takes at least 3 hours to process a 144 GB database.
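The arithmetic behind these estimates can be reproduced with a short shell calculation. The 5000 IOPS figure, the 8K page size, and the roughly 3x data volume are the example values and assumptions stated above:

```shell
# Best-case throughput estimate for neo4j-admin copy on a given disc.
IOPS=5000          # manufacturer-rated IOPS (example value from above)
PAGE_BYTES=8000    # Neo4j does I/O in 8K pages
DB_GB=144          # example database size

MBPS=$(( IOPS * PAGE_BYTES / 1000000 ))   # MB/s = (IOPS * B) / 10^6
GBPH=$(( MBPS * 3600 / 1000 ))            # GB/hour = (MB/s * 3600) / 1000
# Read source + write target + internal read/modify/write: roughly 3x the data.
HOURS=$(( 3 * DB_GB / GBPH ))

echo "${MBPS} MB/s, ${GBPH} GB/hour, best case ~${HOURS} hours for ${DB_GB} GB"
```

With the example numbers, this prints 40 MB/s, 144 GB/hour, and a best case of about 3 hours, matching the estimate above.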

Finally, consider that in almost all cloud environments, the actual IOPS may fall short of the published value, and the disc may not sustain its maximum IOPS continuously. The real processing time for this example could therefore be well above the 3-hour estimate.

This tutorial walks through the basics of checking your database store usage (in this example, on version 3.5), performing a backup, compacting the database backup using neo4j-admin copy, and creating the copy in a running Neo4j 4.x cluster.

1. Check your 3.5 database store usage

Before you back up and copy your 3.5 database, let’s look at the database store usage and see how it changes when you load, delete, and then reload data.

  1. Log in to Neo4j Browser of your running 3.5 Neo4j standalone instance and add 100k nodes to the graph.db database using the following command:

    FOREACH (x IN RANGE (1,100000) | CREATE (n:Person {name:x}))
  2. Create an index on the name property of the Person node:

    CREATE INDEX ON :Person(name)
  3. Use the dbms.checkpoint() procedure to flush all cached updates from the page cache to the store files.

    CALL dbms.checkpoint()
  4. In your terminal, navigate to the graph.db database ($neo4j_home/data/databases/graph.db) and run the following command to check the store size of the loaded nodes and properties.

    ls -alh
    ...
    -rw-r--r--   1 username  staff   1.4M 26 Nov 15:51 neostore.nodestore.db
    -rw-r--r--   1 username  staff   3.9M 26 Nov 15:51 neostore.propertystore.db
    ...

    The output reports that the node store (neostore.nodestore.db) and the property store (neostore.propertystore.db) occupy 1.4M and 3.9M, respectively.

  5. In Neo4j Browser, delete the nodes created above and run CALL dbms.checkpoint() again to force a checkpoint.

    MATCH (n) DETACH DELETE n
    CALL dbms.checkpoint()
  6. Now, add just one node, force a checkpoint, and repeat step 4 to see if the store size has changed.

    CREATE (n:Person {name:"John"})
    CALL dbms.checkpoint()

    If you check the size of the node store and the property store now, they will still be 1.4M and 3.9M, even though the database only contains one node and one property. Neo4j does not shrink the store files on the hard drive.

In a production database, where numerous load/delete operations are performed, the result is significant unused space occupied by the store files.

2. Back up your 3.5 database

Navigate to the /bin folder and run the following command to back up your database into the target folder. If the folder where you want to place your backup does not exist, you must create it first. In this example, it is called /tmp/3.5.24.

./neo4j-admin backup --backup-dir=/tmp/3.5.24 --name=graphdbbackup

For details on performing a backup and the different command options, see Operations Manual → Perform a backup.

3. Copy your 3.5 database backup to the 4.x Neo4j cluster

You can use the neo4j-admin copy command to reclaim the unused space and create a defragmented copy of your database backup in your 4.x cluster.

To speed up the copy operation, you can use the --from-pagecache and --to-pagecache options to specify how much page cache to allocate for reading the source and writing the destination. As a rule of thumb, --to-pagecache should be around 1-2GB, since writing mostly involves sequential I/O. --from-pagecache should then be assigned whatever memory you can spare, since Neo4j does random reads from the source.
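For example, the copy command used in the next step could be run with explicit page-cache sizing. This is a sketch only: the 4G value for --from-pagecache is an assumption, so adjust it to the memory you can actually spare on your machine.

```shell
# Sketch: neo4j-admin copy with explicit page-cache allocation.
# --from-pagecache=4G is an assumed value; size it to your spare memory.
# --to-pagecache=1G follows the 1-2GB rule of thumb for sequential writes.
./neo4j-admin copy --from-path=/private/tmp/3.5.24/graphdbbackup \
                   --to-database=compactdb \
                   --from-pagecache=4G \
                   --to-pagecache=1G
```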

  1. On each cluster member, navigate to the /bin folder and run the following command to create a compacted store copy of your 3.5 database backup. Any inconsistent nodes, properties, or relationships will not be copied over to the newly created store.

    ./neo4j-admin copy --from-path=/private/tmp/3.5.24/graphdbbackup --to-database=compactdb
    Selecting JVM - Version:11.0.6+8-LTS, Name:Java HotSpot(TM) 64-Bit Server VM, Vendor:Oracle Corporation
    Starting to copy store, output will be saved to: /Users/renetapopova/neo4j/cc-4.4.0/core1/logs/neo4j-admin-copy-2022-02-07.11.13.05.log
    2022-02-07 11:13:06.920+0000 INFO  [StoreCopy] ### Copy Data ###
    2022-02-07 11:13:06.923+0000 INFO  [StoreCopy] Source: /private/tmp/3.5.24/graphdbbackup (page cache 8m)
    2022-02-07 11:13:06.924+0000 INFO  [StoreCopy] Target: /Users/renetapopova/neo4j/cc-4.4.0/core1/data/databases/compactdb
    2022-02-07 11:13:06.924+0000 INFO  [StoreCopy] Empty database created, will start importing readable data from the source.
    2022-02-07 11:13:09.911+0000 INFO  [o.n.i.b.ImportLogic] Import starting
    
    Import starting 2022-02-07 11:13:09.963+0000
      Estimated number of nodes: 50.00 k
      Estimated number of node properties: 50.00 k
      Estimated number of relationships: 0.00
      Estimated number of relationship properties: 50.00 k
      Estimated disk space usage: 2.680MiB
      Estimated required memory usage: 36.71MiB
    
    (1/4) Node import 2022-02-07 11:13:11.069+0000
      Estimated number of nodes: 50.00 k
      Estimated disk space usage: 1.698MiB
      Estimated required memory usage: 36.71MiB
    .......... .......... .......... .......... ..........   5% ∆236ms
    .......... .......... .......... .......... ..........  10% ∆24ms
    .......... .......... .......... .......... ..........  15% ∆3ms
    .......... .......... .......... .......... ..........  20% ∆2ms
    .......... .......... .......... .......... ..........  25% ∆1ms
    .......... .......... .......... .......... ..........  30% ∆0ms
    .......... .......... .......... .......... ..........  35% ∆0ms
    .......... .......... .......... .......... ..........  40% ∆3ms
    .......... .......... .......... .......... ..........  45% ∆2ms
    .......... .......... .......... .......... ..........  50% ∆1ms
    .......... .......... .......... .......... ..........  55% ∆0ms
    .......... .......... .......... .......... .........-  60% ∆77ms
    .......... .......... .......... .......... ..........  65% ∆2ms
    .......... .......... .......... .......... ..........  70% ∆0ms
    .......... .......... .......... .......... ..........  75% ∆1ms
    .......... .......... .......... .......... ..........  80% ∆0ms
    .......... .......... .......... .......... ..........  85% ∆0ms
    .......... .......... .......... .......... ..........  90% ∆0ms
    .......... .......... .......... .......... ..........  95% ∆0ms
    .......... .......... .......... .......... .......... 100% ∆0ms
    
    Node import COMPLETED in 458ms
    
    (2/4) Relationship import 2022-02-07 11:13:11.528+0000
      Estimated number of relationships: 0.00
      Estimated disk space usage: 1006KiB
      Estimated required memory usage: 43.90MiB
    Relationship import COMPLETED in 571ms
    
    (3/4) Relationship linking 2022-02-07 11:13:12.100+0000
      Estimated required memory usage: 36.08MiB
    Relationship linking COMPLETED in 645ms
    
    (4/4) Post processing 2022-02-07 11:13:12.745+0000
      Estimated required memory usage: 36.08MiB
    -......... .......... .......... .......... ..........   5% ∆717ms
    .......... .......... .......... .......... ..........  10% ∆1ms
    .......... .......... .......... .......... ..........  15% ∆0ms
    .......... .......... .......... .......... ..........  20% ∆1ms
    .......... .......... .......... .......... ..........  25% ∆1ms
    .......... .......... .......... .......... ..........  30% ∆0ms
    .......... .......... .......... .......... ..........  35% ∆0ms
    .......... .......... .......... .......... ..........  40% ∆0ms
    .......... .......... .......... .......... ..........  45% ∆0ms
    .......... .......... .......... .......... ..........  50% ∆1ms
    .......... .......... .......... .......... ..........  55% ∆0ms
    .......... .......... .......... .......... ..........  60% ∆0ms
    .......... .......... .......... .......... ..........  65% ∆0ms
    .......... .......... .......... .......... ..........  70% ∆0ms
    .......... .......... .......... .......... ..........  75% ∆0ms
    .......... .......... .......... .......... ..........  80% ∆1ms
    .......... .......... .......... .......... ..........  85% ∆0ms
    .......... .......... .......... .......... ..........  90% ∆0ms
    .......... .......... .......... .......... ..........  95% ∆0ms
    .......... .......... .......... .......... .......... 100% ∆0ms
    
    Post processing COMPLETED in 1s 781ms
    
    
    IMPORT DONE in 4s 606ms.
    Imported:
      1 nodes
      0 relationships
      1 properties
    Peak memory usage: 43.90MiB
    2022-02-07 11:13:14.527+0000 INFO  [o.n.i.b.ImportLogic] Import completed successfully, took 4s 606ms. Imported:
      1 nodes
      0 relationships
      1 properties
    2022-02-07 11:13:15.484+0000 INFO  [StoreCopy] Import summary: Copying of 100704 records took 8 seconds (12588 rec/s). Unused Records 100703 (99%) Removed Records 0 (0%)
    2022-02-07 11:13:15.485+0000 INFO  [StoreCopy] ### Extracting schema ###
    2022-02-07 11:13:15.485+0000 INFO  [StoreCopy] Trying to extract schema...
    2022-02-07 11:13:15.606+0000 INFO  [StoreCopy] ... found 1 readable schema definitions. The following can be used to recreate the schema:
    2022-02-07 11:13:15.606+0000 INFO  [StoreCopy]
    
    CREATE BTREE INDEX `index_5c0607ad` FOR (n:`Person`) ON (n.`name`) OPTIONS {indexProvider: 'native-btree-1.0', indexConfig: {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0], `spatial.cartesian.min`: [-1000000.0, -1000000.0], `spatial.wgs-84.min`: [-180.0, -90.0], `spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0], `spatial.cartesian.max`: [1000000.0, 1000000.0], `spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0], `spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0], `spatial.wgs-84.max`: [180.0, 90.0]}}
    2022-02-07 11:13:15.606+0000 INFO  [StoreCopy] You have to manually apply the above commands to the database when it is started to recreate the indexes and constraints. The commands are saved to /Users/renetapopova/neo4j/cc-4.4.0/core1/logs/neo4j-admin-copy-2022-02-07.11.13.05.log as well for reference.
  2. On each cluster member, run the following command to verify that the database has been successfully copied.

    ls -al ../data/databases
    total 0
    drwxr-xr-x@  6 renetapopova  staff   192 Feb  7 11:11 .
    drwxr-xr-x@  8 renetapopova  staff   256 Feb  7 10:36 ..
    drwxr-xr-x  34 renetapopova  staff  1088 Feb  7 11:12 compactdb
    drwxr-xr-x  38 renetapopova  staff  1216 Feb  7 10:39 neo4j
    -rw-r--r--   1 renetapopova  staff     0 Feb  7 10:36 store_lock
    drwxr-xr-x  39 renetapopova  staff  1248 Feb  7 10:39 system

    Copying a database does not automatically create it. Therefore, it is not visible when you run SHOW DATABASES in Cypher Shell or Neo4j Browser.

4. Create your compacted backup on one of the cluster members

You create the database copy on only one of the cluster members, using the command CREATE DATABASE. The command is automatically routed to the leader and, from there, to the other cluster members.

  1. On one of the cluster members, navigate to the /bin folder and run the following command to log in to the Cypher Shell command-line console:

    ./cypher-shell -u neo4j -p password
  2. Change the active database to system:

    :USE system;
  3. Create the compactdb database:

    CREATE DATABASE compactdb;
    0 rows available after 145 ms, consumed after another 0 ms
  4. Verify that the compactdb database is online.

    SHOW DATABASES;
    +----------------------------------------------------------------------------------------------------------------------------------+
    | name        | aliases | access       | address          | role       | requestedStatus | currentStatus | error | default | home  |
    +----------------------------------------------------------------------------------------------------------------------------------+
    | "compactdb" | []      | "read-write" | "localhost:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "compactdb" | []      | "read-write" | "localhost:7688" | "leader"   | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "compactdb" | []      | "read-write" | "localhost:7689" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "neo4j"     | []      | "read-write" | "localhost:7687" | "leader"   | "online"        | "online"      | ""    | TRUE    | TRUE  |
    | "neo4j"     | []      | "read-write" | "localhost:7688" | "follower" | "online"        | "online"      | ""    | TRUE    | TRUE  |
    | "neo4j"     | []      | "read-write" | "localhost:7689" | "follower" | "online"        | "online"      | ""    | TRUE    | TRUE  |
    | "system"    | []      | "read-write" | "localhost:7687" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "system"    | []      | "read-write" | "localhost:7688" | "follower" | "online"        | "online"      | ""    | FALSE   | FALSE |
    | "system"    | []      | "read-write" | "localhost:7689" | "leader"   | "online"        | "online"      | ""    | FALSE   | FALSE |
    +----------------------------------------------------------------------------------------------------------------------------------+
    
    9 rows
    ready to start consuming query after 21 ms, results consumed after another 29 ms
  5. Change your active database to compactdb and recreate the schema using the output from the neo4j-admin copy command.

    CREATE BTREE INDEX `index_5c0607ad` FOR (n:`Person`) ON (n.`name`) OPTIONS {indexProvider: 'native-btree-1.0', indexConfig: {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0], `spatial.cartesian.min`: [-1000000.0, -1000000.0], `spatial.wgs-84.min`: [-180.0, -90.0], `spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0], `spatial.cartesian.max`: [1000000.0, 1000000.0], `spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0], `spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0], `spatial.wgs-84.max`: [180.0, 90.0]}};
    0 rows
    ready to start consuming query after 95 ms, results consumed after another 0 ms
    Added 1 indexes
  6. On each cluster member, log in to the Cypher Shell command-line console, change the active database to compactdb, and verify that the index has been successfully created:

    CALL db.indexes;
    +----------------------------------------------------------------------------------------------------------------------------------------------+
    | id | name             | state    | populationPercent | uniqueness  | type     | entityType | labelsOrTypes | properties | provider           |
    +----------------------------------------------------------------------------------------------------------------------------------------------+
    | 1  | "index_343aff4e" | "ONLINE" | 100.0             | "NONUNIQUE" | "LOOKUP" | "NODE"     | []            | []         | "token-lookup-1.0" |
    | 2  | "index_5c0607ad" | "ONLINE" | 100.0             | "NONUNIQUE" | "BTREE"  | "NODE"     | ["Person"]    | ["name"]   | "native-btree-1.0" |
    +----------------------------------------------------------------------------------------------------------------------------------------------+
    
    2 rows
    ready to start consuming query after 31 ms, results consumed after another 5 ms
  7. Verify that all the data has been successfully copied. In this example, there should be one node.

    MATCH (n) RETURN n.name;
    +--------+
    | n.name |
    +--------+
    | "John" |
    +--------+
    
    1 row available after 106 ms, consumed after another 2 ms
  8. Exit the Cypher Shell command-line console.

    :exit;
    
    Bye!

    You can now compare the store size with the size of the backed-up database.

  9. On one of the cluster members, navigate to the compactdb database ($core1_home/data/databases/compactdb) and check the store size of the copied nodes and properties.

    ls -alh
    ...
    -rw-r--r--   1 username  staff   736K Feb  7 16:00 neostore.nodestore.db
    -rw-r--r--   1 username  staff    16K Feb  7 16:00 neostore.propertystore.db
    ...

    The output reports that the node store and the property store now occupy only 736K and 16K respectively, compared to the previous 1.4M and 3.9M.


1. The calculations are based on MB/s = (IOPS * B) ÷ 10^6, where B is the block size in bytes; in the case of Neo4j, this is 8000. GB/hour can then be calculated from (MB/s * 3600) ÷ 1000.