Tutorial: Back up and copy a single database in a running cluster
This tutorial provides a detailed example of how to back up a single database (in this example, a version 3.5 database) and use the neo4j-admin copy command to copy it into a running 4.x Neo4j cluster.
The neo4j-admin copy command can be used to clean up database inconsistencies, compact stores, and upgrade/migrate a database (from Community or Enterprise edition) to a later version of Neo4j Enterprise edition.
Since the neo4j-admin copy command does not copy the schema store, the intermediary steps of the sequential path are not needed.
If a schema is defined, the commands that the neo4j-admin copy operation outputs can be used to recreate the schema.
Keep in mind: if you want to preserve your relationship IDs, or to upgrade the whole DBMS, you should follow the sequential path instead.
Estimations of how long the neo4j-admin copy command will take can be made from the IOPS (I/O operations per second) that your disk can sustain, because the command processes the store one block at a time.
For example, if your disk manufacturer has provided a maximum of 5000 IOPS, you can reasonably expect up to 5000 such page operations a second.
Therefore, the maximal theoretical throughput you can expect is 40 MB/s (or 144 GB/hour) on that disk.
You may then assume that the best-case scenario for reading a 144 GB source store at that rate is about 1 hour. However, it is important to remember that the process must read 144 GB from the source database, and must also write to the target store (assuming the target store is of comparable size).
Additionally, there are internal processes during the copy that read/modify/write the store multiple times.
Therefore, with an additional 144 GB of both read and write, the best-case scenario for running neo4j-admin copy on this example becomes 3 hours.
Finally, it is also important to consider that in almost all cloud environments, the published IOPS value may not match what is actually delivered, and the disk may not be able to maintain the maximum possible IOPS continuously. The real processing time for this example could therefore be well above the estimation of 3 hours.
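The throughput arithmetic above can be checked with a few lines of shell. The 5000 IOPS figure and the 8000-byte block size are the values used in this example; substitute your own disk's numbers.

```shell
# Theoretical copy throughput from disk IOPS.
# Each I/O operation moves one 8000-byte block.
iops=5000
block_bytes=8000

mb_per_s=$(( iops * block_bytes / 1000000 ))   # (IOPS * B) / 10^6
gb_per_hour=$(( mb_per_s * 3600 / 1000 ))      # MB/s -> GB/hour

echo "${mb_per_s} MB/s, ${gb_per_hour} GB/hour"
# prints: 40 MB/s, 144 GB/hour
```

At that rate, reading a 144 GB store once takes about 1 hour, which is where the best-case estimates above come from.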
This tutorial walks through the basics of checking your database store usage (in this example, version 3.5), performing a backup, compacting the database backup using neo4j-admin copy, and creating it in a running Neo4j 4.x cluster.
Check your 3.5 database store usage
Before you back up and copy your 3.5 database, let’s look at the database store usage and see how it changes when you load, delete, and then reload data.
-
Log in to Neo4j Browser of your running 3.5 Neo4j standalone instance and add 100k nodes to the graph.db database using the following command:

FOREACH (x IN RANGE (1,100000) | CREATE (n:Person {name:x}))
-
Create an index on the name property of the Person node:

CREATE INDEX ON :Person(name)
-
Use the dbms.checkpoint() procedure to flush all cached updates from the page cache to the store files.

CALL dbms.checkpoint()
-
In your terminal, navigate to the graph.db database ($neo4j_home/data/databases/graph.db) and run the following command to check the store size of the loaded nodes and properties.

ls -alh

...
-rw-r--r--  1 username  staff  1.4M 26 Nov 15:51 neostore.nodestore.db
-rw-r--r--  1 username  staff  3.9M 26 Nov 15:51 neostore.propertystore.db
...

The output reports that the node store (neostore.nodestore.db) and the property store (neostore.propertystore.db) occupy 1.4M and 3.9M, respectively.
-
In Neo4j Browser, delete the nodes created above and run CALL dbms.checkpoint() again to force a checkpoint.

MATCH (n) DETACH DELETE n

CALL dbms.checkpoint()
-
Now, add just one node, force a checkpoint, and repeat step 4 to see if the store size has changed.
CREATE (n:Person {name:"John"})
CALL dbms.checkpoint()
If you check the size of the node store and the property store now, they will still be 1.4M and 3.9M, even though the database contains only one node and one property. Neo4j does not shrink the store files on the hard drive.
In a production database, where numerous load/delete operations are performed, the result is a significant amount of unused space occupied by store files.
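A small helper makes it easy to watch this effect between steps. The store_size function below is illustrative (not part of Neo4j); it totals the neostore.* files in a database directory.

```shell
# store_size: print the combined size of the Neo4j store files
# (neostore.*) in the database directory given as $1.
store_size() {
    du -ch "$1"/neostore.* 2>/dev/null | tail -n 1 | awk '{print $1}'
}

# Example, using the path from this tutorial (adjust to your installation):
# store_size "$neo4j_home/data/databases/graph.db"
```

Running it before and after the delete-and-checkpoint step shows the same total, confirming that the space is not reclaimed in place.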
Back up your 3.5 database
Navigate to the /bin folder and run the following command to back up your database into the target folder. If the folder where you want to place your backup does not exist, you have to create it first. In this example, it is called /tmp/3.5.24.
./neo4j-admin backup --backup-dir=/tmp/3.5.24 --name=graphdbbackup
For details on performing a backup and the different command options, see Operations Manual → Perform a backup.
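The step above can be wrapped in a short script that creates the target folder first if it is missing. The backup_db function name is illustrative; it assumes you run it from the 3.5 installation's bin folder.

```shell
# backup_db: ensure the backup folder exists, then run neo4j-admin backup.
# $1 = backup directory, $2 = backup name.
backup_db() {
    mkdir -p "$1" || return 1
    ./neo4j-admin backup --backup-dir="$1" --name="$2"
}

# Usage, with the folder and name from this example:
# backup_db /tmp/3.5.24 graphdbbackup
```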
Copy your 3.5 database backup to 4.x Neo4j cluster
You can use the neo4j-admin copy command to reclaim the unused space and create a defragmented copy of your database backup in your 4.x cluster.
To speed up the copy operation, you can use the --from-pagecache and --to-pagecache options to increase the amount of page cache available to the command.
-
On each cluster member, navigate to the /bin folder and run the following command to create a compacted store copy of your 3.5 database backup. Any inconsistent nodes, properties, and relationships will not be copied over to the newly created store.
./neo4j-admin copy --from-path=/private/tmp/3.5.24/graphdbbackup --to-database=compactdb
Selecting JVM - Version:11.0.6+8-LTS, Name:Java HotSpot(TM) 64-Bit Server VM, Vendor:Oracle Corporation Starting to copy store, output will be saved to: /Users/renetapopova/neo4j/cc-4.4.0/core1/logs/neo4j-admin-copy-2022-02-07.11.13.05.log 2022-02-07 11:13:06.920+0000 INFO [StoreCopy] ### Copy Data ### 2022-02-07 11:13:06.923+0000 INFO [StoreCopy] Source: /private/tmp/3.5.24/graphdbbackup (page cache 8m) 2022-02-07 11:13:06.924+0000 INFO [StoreCopy] Target: /Users/renetapopova/neo4j/cc-4.4.0/core1/data/databases/compactdb 2022-02-07 11:13:06.924+0000 INFO [StoreCopy] Empty database created, will start importing readable data from the source. 2022-02-07 11:13:09.911+0000 INFO [o.n.i.b.ImportLogic] Import starting Import starting 2022-02-07 11:13:09.963+0000 Estimated number of nodes: 50.00 k Estimated number of node properties: 50.00 k Estimated number of relationships: 0.00 Estimated number of relationship properties: 50.00 k Estimated disk space usage: 2.680MiB Estimated required memory usage: 36.71MiB (1/4) Node import 2022-02-07 11:13:11.069+0000 Estimated number of nodes: 50.00 k Estimated disk space usage: 1.698MiB Estimated required memory usage: 36.71MiB .......... .......... .......... .......... .......... 5% ∆236ms .......... .......... .......... .......... .......... 10% ∆24ms .......... .......... .......... .......... .......... 15% ∆3ms .......... .......... .......... .......... .......... 20% ∆2ms .......... .......... .......... .......... .......... 25% ∆1ms .......... .......... .......... .......... .......... 30% ∆0ms .......... .......... .......... .......... .......... 35% ∆0ms .......... .......... .......... .......... .......... 40% ∆3ms .......... .......... .......... .......... .......... 45% ∆2ms .......... .......... .......... .......... .......... 50% ∆1ms .......... .......... .......... .......... .......... 55% ∆0ms .......... .......... .......... .......... .........- 60% ∆77ms .......... .......... .......... .......... 
.......... 65% ∆2ms .......... .......... .......... .......... .......... 70% ∆0ms .......... .......... .......... .......... .......... 75% ∆1ms .......... .......... .......... .......... .......... 80% ∆0ms .......... .......... .......... .......... .......... 85% ∆0ms .......... .......... .......... .......... .......... 90% ∆0ms .......... .......... .......... .......... .......... 95% ∆0ms .......... .......... .......... .......... .......... 100% ∆0ms Node import COMPLETED in 458ms (2/4) Relationship import 2022-02-07 11:13:11.528+0000 Estimated number of relationships: 0.00 Estimated disk space usage: 1006KiB Estimated required memory usage: 43.90MiB Relationship import COMPLETED in 571ms (3/4) Relationship linking 2022-02-07 11:13:12.100+0000 Estimated required memory usage: 36.08MiB Relationship linking COMPLETED in 645ms (4/4) Post processing 2022-02-07 11:13:12.745+0000 Estimated required memory usage: 36.08MiB -......... .......... .......... .......... .......... 5% ∆717ms .......... .......... .......... .......... .......... 10% ∆1ms .......... .......... .......... .......... .......... 15% ∆0ms .......... .......... .......... .......... .......... 20% ∆1ms .......... .......... .......... .......... .......... 25% ∆1ms .......... .......... .......... .......... .......... 30% ∆0ms .......... .......... .......... .......... .......... 35% ∆0ms .......... .......... .......... .......... .......... 40% ∆0ms .......... .......... .......... .......... .......... 45% ∆0ms .......... .......... .......... .......... .......... 50% ∆1ms .......... .......... .......... .......... .......... 55% ∆0ms .......... .......... .......... .......... .......... 60% ∆0ms .......... .......... .......... .......... .......... 65% ∆0ms .......... .......... .......... .......... .......... 70% ∆0ms .......... .......... .......... .......... .......... 75% ∆0ms .......... .......... .......... .......... .......... 80% ∆1ms .......... .......... 
.......... .......... .......... 85% ∆0ms .......... .......... .......... .......... .......... 90% ∆0ms .......... .......... .......... .......... .......... 95% ∆0ms .......... .......... .......... .......... .......... 100% ∆0ms Post processing COMPLETED in 1s 781ms IMPORT DONE in 4s 606ms. Imported: 1 nodes 0 relationships 1 properties Peak memory usage: 43.90MiB 2022-02-07 11:13:14.527+0000 INFO [o.n.i.b.ImportLogic] Import completed successfully, took 4s 606ms. Imported: 1 nodes 0 relationships 1 properties 2022-02-07 11:13:15.484+0000 INFO [StoreCopy] Import summary: Copying of 100704 records took 8 seconds (12588 rec/s). Unused Records 100703 (99%) Removed Records 0 (0%) 2022-02-07 11:13:15.485+0000 INFO [StoreCopy] ### Extracting schema ### 2022-02-07 11:13:15.485+0000 INFO [StoreCopy] Trying to extract schema... 2022-02-07 11:13:15.606+0000 INFO [StoreCopy] ... found 1 readable schema definitions. The following can be used to recreate the schema: 2022-02-07 11:13:15.606+0000 INFO [StoreCopy] CREATE BTREE INDEX `index_5c0607ad` FOR (n:`Person`) ON (n.`name`) OPTIONS {indexProvider: 'native-btree-1.0', indexConfig: {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0], `spatial.cartesian.min`: [-1000000.0, -1000000.0], `spatial.wgs-84.min`: [-180.0, -90.0], `spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0], `spatial.cartesian.max`: [1000000.0, 1000000.0], `spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0], `spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0], `spatial.wgs-84.max`: [180.0, 90.0]}} 2022-02-07 11:13:15.606+0000 INFO [StoreCopy] You have to manually apply the above commands to the database when it is started to recreate the indexes and constraints. The commands are saved to /Users/renetapopova/neo4j/cc-4.4.0/core1/logs/neo4j-admin-copy-2022-02-07.11.13.05.log as well for reference.
-
On each cluster member, run the following command to verify that the database has been successfully copied.
ls -al ../data/databases
total 0 drwxr-xr-x@ 6 renetapopova staff 192 Feb 7 11:11 . drwxr-xr-x@ 8 renetapopova staff 256 Feb 7 10:36 .. drwxr-xr-x 34 renetapopova staff 1088 Feb 7 11:12 compactdb drwxr-xr-x 38 renetapopova staff 1216 Feb 7 10:39 neo4j -rw-r--r-- 1 renetapopova staff 0 Feb 7 10:36 store_lock drwxr-xr-x 39 renetapopova staff 1248 Feb 7 10:39 system
Copying a database does not automatically create it. Therefore, it will not be visible if you run SHOW DATABASES in Cypher® Shell or Neo4j Browser.
Create your compacted backup on one of the cluster members
You create the database copy on only one of the cluster members, using the CREATE DATABASE command.
The command is automatically routed to the leader, and from there to the other cluster members.
-
On one of the cluster members, navigate to the /bin folder and run the following command to log in to the Cypher Shell command-line console:
./cypher-shell -u neo4j -p password
-
Change the active database to system:

USE system;
-
Create the compactdb database:

CREATE DATABASE compactdb;
0 rows available after 145 ms, consumed after another 0 ms
-
Verify that the compactdb database is online.

SHOW DATABASES;
+----------------------------------------------------------------------------------------------------------------------------------+ | name | aliases | access | address | role | requestedStatus | currentStatus | error | default | home | +----------------------------------------------------------------------------------------------------------------------------------+ | "compactdb" | [] | "read-write" | "localhost:7687" | "follower" | "online" | "online" | "" | FALSE | FALSE | | "compactdb" | [] | "read-write" | "localhost:7688" | "leader" | "online" | "online" | "" | FALSE | FALSE | | "compactdb" | [] | "read-write" | "localhost:7689" | "follower" | "online" | "online" | "" | FALSE | FALSE | | "neo4j" | [] | "read-write" | "localhost:7687" | "leader" | "online" | "online" | "" | TRUE | TRUE | | "neo4j" | [] | "read-write" | "localhost:7688" | "follower" | "online" | "online" | "" | TRUE | TRUE | | "neo4j" | [] | "read-write" | "localhost:7689" | "follower" | "online" | "online" | "" | TRUE | TRUE | | "system" | [] | "read-write" | "localhost:7687" | "follower" | "online" | "online" | "" | FALSE | FALSE | | "system" | [] | "read-write" | "localhost:7688" | "follower" | "online" | "online" | "" | FALSE | FALSE | | "system" | [] | "read-write" | "localhost:7689" | "leader" | "online" | "online" | "" | FALSE | FALSE | +----------------------------------------------------------------------------------------------------------------------------------+ 9 rows ready to start consuming query after 21 ms, results consumed after another 29 ms
-
On one of the cluster members, change your active database to compactdb and recreate the schema using the output from the neo4j-admin copy command.

CREATE BTREE INDEX `index_5c0607ad` FOR (n:`Person`) ON (n.`name`) OPTIONS {indexProvider: 'native-btree-1.0', indexConfig: {`spatial.cartesian-3d.min`: [-1000000.0, -1000000.0, -1000000.0], `spatial.cartesian.min`: [-1000000.0, -1000000.0], `spatial.wgs-84.min`: [-180.0, -90.0], `spatial.cartesian-3d.max`: [1000000.0, 1000000.0, 1000000.0], `spatial.cartesian.max`: [1000000.0, 1000000.0], `spatial.wgs-84-3d.min`: [-180.0, -90.0, -1000000.0], `spatial.wgs-84-3d.max`: [180.0, 90.0, 1000000.0], `spatial.wgs-84.max`: [180.0, 90.0]}};
0 rows ready to start consuming query after 95 ms, results consumed after another 0 ms Added 1 indexes
-
On each cluster member, log in to the Cypher Shell command-line console, change the active database to compactdb, and verify that the index has been successfully created:

CALL db.indexes;
+----------------------------------------------------------------------------------------------------------------------------------------------+ | id | name | state | populationPercent | uniqueness | type | entityType | labelsOrTypes | properties | provider | +----------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | "index_343aff4e" | "ONLINE" | 100.0 | "NONUNIQUE" | "LOOKUP" | "NODE" | [] | [] | "token-lookup-1.0" | | 2 | "index_5c0607ad" | "ONLINE" | 100.0 | "NONUNIQUE" | "BTREE" | "NODE" | ["Person"] | ["name"] | "native-btree-1.0" | +----------------------------------------------------------------------------------------------------------------------------------------------+ 2 rows ready to start consuming query after 31 ms, results consumed after another 5 ms
-
Verify that all the data has been successfully copied. In this example, there should be one node.
MATCH (n) RETURN n.name;
+--------+ | n.name | +--------+ | "John" | +--------+ 1 row available after 106 ms, consumed after another 2 ms
-
Exit the Cypher Shell command-line console.
:exit;
Bye!
You can now compare the store size with the size of the backed up database.
-
On one of the cluster members, navigate to the compactdb database ($core1_home/data/databases/compactdb) and check the store size of the copied nodes and properties.

ls -alh

...
-rw-r--r--  1 username  staff  736K  Feb  7 16:00 neostore.nodestore.db
-rw-r--r--  1 username  staff  16K   Feb  7 16:00 neostore.propertystore.db
...

The output reports that the node store and the property store now occupy only 736K and 16K, respectively, compared to the previous 1.4M and 3.9M.
MB/s = (IOPS * B) ÷ 10^6, where B is the block size in bytes; in the case of Neo4j, this is 8000. GB/hour can then be calculated as (MB/s * 3600) ÷ 1000.