This section introduces backing up a Neo4j database.
Backing up your Neo4j database to remote or offline storage is a fundamental part of database operations. Design the backup strategy around your specific requirements, which may include performance demands and tolerance for downtime and data loss in case of network or hardware failure. If you run a Neo4j Causal Cluster, the backup strategy also needs to take the overall cluster design into account.
This chapter is dedicated to online backups, as this is what is most commonly needed in production environments. Online backups run against a live Neo4j instance and can be full or incremental. It is also possible to perform offline backups, which is a more limited method: the database must be shut down, and incremental backups are not supported. For details on offline backups, see Section 10.3, “Dump and load databases”.
Databases are backed up and restored using commands of the Neo4j Admin tool, as described in the following sections.
Backups are performed locally on the database server, or over the network, from a running Neo4j server. The resulting files are created locally, or on a network mounted directory, on the server performing the backup. Two parameters must be configured in order to perform backups.
For any backup, it is important that the data is stored separately from the production system, in a location that shares no common dependencies with it. It is advisable to keep the backup on stable storage outside of the cluster servers, on different (network attached) storage, and preferably off site (for example in the cloud, in a different availability zone within the same cloud, or in a separate cloud). Since backups are kept for a long time, the longevity of archival storage should be considered as part of backup planning.
When the backup program is started, it starts a new Java process. If a Neo4j database is running on the same machine, the backup process runs in parallel with the Neo4j process. On a production system, Neo4j is typically configured to take maximum advantage of the system’s available RAM. If you run backups on a production system, overall performance can therefore be negatively affected, and in extreme cases the system can fail with an out-of-memory error. It is therefore strongly recommended to run backups from a different server than the production server.
If running a Causal Cluster, we also have the option of doing backups on a Read Replica. For an in-depth discussion about this, refer to Section 4.2.5, “Backup planning for a Causal Cluster”.
If it is not possible to use a separate backup server, we want to explicitly define how much memory to allocate to the backup process, in order to control the impact on the production system.
Heap size: set the environment variable HEAP_SIZE before starting the backup program. If not specified by HEAP_SIZE, the Java Virtual Machine will choose a value based on server resources.
Page cache: pass the --pagecache option to the neo4j-admin backup command. If not explicitly defined, the page cache will default to 8MB.
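The two memory settings described above can be applied together when the backup is launched from a script. The following is a minimal Python sketch, not a definitive implementation. HEAP_SIZE and --pagecache come from the text above. The --backup-dir, --name and --from options, and the values 2G, 4G, /mnt/backup, graph.db-backup and 192.168.1.34, are assumptions based on the Neo4j 3.x command line and are used here only for illustration, so check them against the help output of your installed neo4j-admin. The helper names are hypothetical.

```python
import os
import subprocess


def build_backup_command(backup_dir, name, source=None, pagecache="8m"):
    """Build the argument list for a neo4j-admin backup invocation.

    --pagecache is described in the text above. --backup-dir, --name and
    --from are assumed from the Neo4j 3.x command line.
    """
    cmd = [
        "neo4j-admin",
        "backup",
        f"--backup-dir={backup_dir}",
        f"--name={name}",
        f"--pagecache={pagecache}",
    ]
    if source is not None:
        # Address of the running Neo4j server to back up (assumed option).
        cmd.append(f"--from={source}")
    return cmd


def build_backup_env(heap_size, base_env=None):
    """Return a copy of the environment with HEAP_SIZE set for the backup JVM."""
    env = dict(os.environ if base_env is None else base_env)
    env["HEAP_SIZE"] = heap_size
    return env


def run_backup(backup_dir, name, source=None, heap_size="2G", pagecache="4G"):
    """Start the backup as a separate process with explicit memory limits."""
    cmd = build_backup_command(backup_dir, name, source=source, pagecache=pagecache)
    env = build_backup_env(heap_size)
    # check=True raises CalledProcessError if neo4j-admin exits with a non-zero code.
    return subprocess.run(cmd, env=env, check=True)
```

A call such as run_backup("/mnt/backup", "graph.db-backup", source="192.168.1.34") would then start the backup with an explicit heap size and page cache. Keeping command construction separate from execution makes it possible to log or review the exact command before anything runs on a production host.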