This section introduces how to prepare for backing up a Neo4j database.
Backing up your Neo4j database to remote or offline storage is a fundamental part of database operations. Your backup strategy should be designed around your specific requirements. These may include performance demands and how much downtime and data loss you can tolerate in the event of network or hardware failure. If you are running a Neo4j Causal Cluster, the backup strategy must also take the overall cluster design into account.
Online backups are typically required for production environments, but it is also possible to perform offline backups.
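Offline backups are a more limited method for backing up a database. For example, online backups run against a live Neo4j instance, while offline backups require the database to be shut down. Online backups can also be full or incremental, whereas offline backups do not support incremental backups.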
For more details about offline backups, see Section 10.3, “Dump and load databases”.
For any backup, it is important to store the data separately from the production system, so that the two share no common dependencies.
It is advisable to keep the backup on stable storage outside of the cluster servers, on different (network-attached) storage, and preferably off-site. In the cloud, for example, use a different availability zone within the same cloud, or use a separate cloud.
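One shared dependency that is easy to overlook is a backup directory that sits on the same disk or volume as the database. The following is a minimal Python sketch of a pre-flight check, assuming both paths are locally mounted; the function name on_same_device is hypothetical, and the check relies only on os.stat(). It can catch an obvious shared device, but it cannot confirm the off-site or cross-zone separation recommended above.

```python
import os


def on_same_device(data_dir: str, backup_dir: str) -> bool:
    """Return True if both paths live on the same mounted filesystem.

    True means the backup shares at least one dependency (the disk or
    volume) with the production data. False does NOT prove independence:
    two mounts can still share a storage array, a host, or an availability zone.
    """
    # st_dev identifies the device that holds the file or directory.
    return os.stat(data_dir).st_dev == os.stat(backup_dir).st_dev
```

A backup job could refuse to start when on_same_device(data_path, backup_path) returns True, where both paths are whatever your installation actually uses.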
Since backups are kept for a long time, the longevity of archival storage should be considered as part of backup planning.
When the backup program is started, it will start up a new Java process. If there is a running Neo4j database, this will run in parallel to the Neo4j process. On a production system, Neo4j is typically configured to take maximum advantage of the system’s available RAM. If you run backups on a production system, the overall performance can be negatively affected, and in extreme cases can cause failure with an out-of-memory error. It is therefore strongly recommended to run backups from a different server than the production server.
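A rough way to see this risk is to add up memory use before placing a backup on the production host. The Python sketch below is worked arithmetic only: the function name is hypothetical, all figures are made-up examples in GiB, and it ignores JVM memory beyond heap and page cache, so real usage will be somewhat higher.

```python
def backup_headroom_gib(
    total_ram: int,
    db_heap: int,
    db_pagecache: int,
    os_reserve: int,
    backup_heap: int,
    backup_pagecache: int,
) -> int:
    """Approximate RAM (GiB) left once the database, the OS and a backup share one host.

    A negative value means the host is over-committed, and the database or
    the backup process risks an out-of-memory failure.
    """
    production = db_heap + db_pagecache + os_reserve
    backup = backup_heap + backup_pagecache
    return total_ram - production - backup


# Hypothetical 64 GiB server whose memory is already fully allocated to Neo4j and the OS.
print(backup_headroom_gib(64, 16, 40, 8, 2, 1))  # -3
```

Because a production server is usually sized so that the first three terms consume nearly all RAM, any memory given to the backup process tends to push the result below zero.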
If you are running a Causal Cluster, you also have the option of running backups on a Read Replica. For an in-depth discussion, refer to Section 4.2.5, “Backup planning for a Causal Cluster”.
If it is not possible to use a separate backup server, you can control the impact on the production system by explicitly defining how much memory to allocate to the backup process:
Heap size: set the environment variable HEAP_SIZE before starting the backup program. HEAP_SIZE configures the maximum heap size allocated for the backup process. If HEAP_SIZE is not specified, the Java Virtual Machine chooses a value based on server resources.
Page cache: pass the --pagecache option to the neo4j-admin backup command. If it is not explicitly defined, the page cache defaults to 8MB.
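Both memory limits can be applied to a single backup run. The following Python sketch is illustrative only: the helper names build_backup_invocation and run_backup are hypothetical, and it assumes Neo4j 3.5-style neo4j-admin backup options (--backup-dir, --name, --from) alongside --pagecache. Check neo4j-admin help backup for the options your version accepts. Size strings such as "2G" are placeholder examples.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple


def build_backup_invocation(
    backup_dir: str,
    name: str,
    heap_size: str,
    pagecache: str,
    source: Optional[str] = None,
    base_env: Optional[Dict[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a memory-limited `neo4j-admin backup` run.

    Assumes Neo4j 3.5-style options; verify them for your version.
    """
    # Copy the caller's environment so PATH, JAVA_HOME, etc. are preserved.
    env = dict(os.environ if base_env is None else base_env)
    # HEAP_SIZE caps the maximum heap of the backup JVM.
    env["HEAP_SIZE"] = heap_size

    argv = [
        "neo4j-admin",
        "backup",
        f"--backup-dir={backup_dir}",
        f"--name={name}",
        # Without this option the backup page cache defaults to 8MB.
        f"--pagecache={pagecache}",
    ]
    if source is not None:
        # For example, the backup address of a Read Replica in a Causal Cluster.
        argv.append(f"--from={source}")
    return argv, env


def run_backup(argv: List[str], env: Dict[str, str]) -> None:
    # Raises CalledProcessError if neo4j-admin exits with a non-zero status.
    subprocess.run(argv, env=env, check=True)
```

For example, build_backup_invocation("/mnt/backups", "graph.db-backup", "2G", "4G", source="replica-1.example.com:6362") targets a hypothetical Read Replica address, caps the backup heap at 2G, and gives the backup a 4G page cache. Building argv and env separately from running them makes the memory settings easy to inspect or log before anything is launched.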