6.1. Plan a backup

This section describes how to prepare for backing up a Neo4j database.

Backing up your Neo4j database to remote or offline storage is a fundamental part of database operations. Your backup strategy should be designed around your specific requirements. These may include demands on performance, and how much downtime and data loss you can tolerate in the event of a network or hardware failure. If you are running a Neo4j Causal Cluster, the backup strategy must also take the overall cluster design into account.

6.1.1. Online and offline backups

Online backups are typically required for production environments, but it is also possible to perform offline backups.

Offline backups are a more limited method for backing up a database. For example:

  • Online backups run against a live Neo4j instance, while offline backups require that the database is shut down.
  • Online backups can be full or incremental, while offline backups do not support incremental backups.

For more details about offline backups, see Section 10.3, “Dump and load databases”.
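The difference between the two procedures can be sketched in Python. This is a minimal sketch, not a Neo4j tool. The helper functions are hypothetical, the paths, host name and port 6362 are placeholders, `neo4j stop`/`neo4j start` assume a tarball installation, and the command-line flags assume the neo4j-admin syntax of the Neo4j 3.x series. Check them against the neo4j-admin help for your version.

```python
import shlex
from typing import List


def offline_backup_steps(database: str, dump_path: str) -> List[List[str]]:
    """Hypothetical helper: command sequence for an offline backup.

    The database must be stopped before the dump and started again
    afterwards, so an offline backup always implies downtime.
    """
    return [
        ["neo4j", "stop"],  # assumes a tarball install; service installs may differ
        ["neo4j-admin", "dump", f"--database={database}", f"--to={dump_path}"],
        ["neo4j", "start"],
    ]


def online_backup_steps(source: str, backup_dir: str, name: str) -> List[List[str]]:
    """Hypothetical helper: an online backup is one command run against a live instance."""
    return [
        ["neo4j-admin", "backup", f"--from={source}",
         f"--backup-dir={backup_dir}", f"--name={name}"],
    ]


if __name__ == "__main__":
    print("Offline:")
    for step in offline_backup_steps("graph.db", "/mnt/backups/graph.db.dump"):
        print("  " + shlex.join(step))
    print("Online:")
    for step in online_backup_steps("db-host:6362", "/mnt/backups", "graph.db-backup"):
        print("  " + shlex.join(step))
```

The sketch shows the shape of the two procedures. The offline path brackets the dump with a stop and a start, while the online path is a single command that leaves the database running.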

6.1.2. Storage considerations

For any backup, it is important to store the backup data separately from the production system, with no dependencies in common.

It is advisable to keep the backup on stable storage outside the cluster servers, on separate (network-attached) storage, and preferably off-site. In the cloud, for example, this could mean using a different availability zone within the same cloud, or a separate cloud altogether.
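One coarse way to catch the most obvious violation of this advice is to check whether the backup directory lives on the same filesystem as the database files. You can do this by comparing device IDs. The sketch below is a hypothetical helper, not a Neo4j tool, and the paths are placeholders. It only detects a shared filesystem. It cannot tell whether two filesystems sit on the same physical disk, host, or availability zone.

```python
import os


def shares_device(data_dir: str, backup_dir: str) -> bool:
    """Return True if both paths are on the same filesystem (same st_dev).

    A shared device means a single storage failure could destroy both the
    production data and its backup.
    """
    return os.stat(data_dir).st_dev == os.stat(backup_dir).st_dev


if __name__ == "__main__":
    # Hypothetical paths: a Neo4j data directory and a network-attached mount.
    data_dir, backup_dir = "/var/lib/neo4j/data", "/mnt/backup"
    if all(os.path.isdir(p) for p in (data_dir, backup_dir)):
        if shares_device(data_dir, backup_dir):
            print("WARNING: backup target shares a filesystem with the data directory")
        else:
            print("Backup target is on a separate filesystem")
```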

Since backups are kept for a long time, the longevity of archival storage should be considered as part of backup planning.

6.1.3. Memory considerations

When the backup program is started, it launches a new Java process. If a Neo4j database is running on the same server, this process runs in parallel with the Neo4j process. On a production system, Neo4j is typically configured to take maximum advantage of the system’s available RAM. Running backups on a production system can therefore degrade overall performance and, in extreme cases, cause a failure with an out-of-memory error. It is therefore strongly recommended to run backups from a different server than the production server.
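As a rough illustration of how a backup on the production server can run out of memory, the following sketch adds up the memory already committed to Neo4j. It then checks whether the backup process's heap and page cache still fit. The helper names, the 1 GiB operating-system reserve, and the example figures are assumptions for illustration only. A JVM also uses off-heap memory beyond its heap and page cache, so treat a passing check as a lower bound on what is needed, not a guarantee.

```python
def parse_size(size: str) -> int:
    """Parse sizes such as '512m', '4G' or '8MB' into bytes (binary units)."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}
    s = size.strip().lower()
    if s.endswith("b"):
        s = s[:-1]
    if s and s[-1] in units:
        return int(float(s[:-1]) * units[s[-1]])
    return int(s)


def backup_fits(total_ram: str, neo4j_heap: str, neo4j_pagecache: str,
                backup_heap: str, backup_pagecache: str,
                os_reserve: str = "1g") -> bool:
    """Return True if the backup's heap and page cache fit in the RAM left
    over after Neo4j and an assumed operating-system reserve."""
    committed = sum(parse_size(s) for s in (neo4j_heap, neo4j_pagecache, os_reserve))
    requested = parse_size(backup_heap) + parse_size(backup_pagecache)
    return committed + requested <= parse_size(total_ram)


if __name__ == "__main__":
    # Hypothetical 32 GiB server: 8 GiB heap + 20 GiB page cache for Neo4j.
    print(backup_fits("32g", "8g", "20g", "2g", "512m"))  # 29 + 2.5 = 31.5 GiB → True
    print(backup_fits("32g", "8g", "20g", "2g", "4g"))    # 29 + 6 = 35 GiB → False
```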

If you are running a Causal Cluster, you also have the option of running backups on a Read Replica. For an in-depth discussion, refer to Section 4.2.5, “Backup planning for a Causal Cluster”.

If it is not possible to use a separate backup server, you can limit the impact on the production system by explicitly defining how much memory to allocate to the backup process:

Configure heap size for the backup
Set the environment variable HEAP_SIZE before starting the backup program. HEAP_SIZE configures the maximum heap size allocated to the backup process. If HEAP_SIZE is not set, the Java Virtual Machine chooses a value based on the server's resources.
Configure page cache for the backup
Set the page cache size for the backup program with the --pagecache option of the neo4j-admin backup command. If not explicitly defined, the page cache defaults to 8MB.
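The following sketch combines the two settings to build the argument list and environment for a memory-capped backup. Only HEAP_SIZE and --pagecache come from the text above. The helper names are hypothetical, the paths and sizes are placeholders, and the --backup-dir, --name and --from flags assume the neo4j-admin backup syntax of the Neo4j 3.x series.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple


def backup_invocation(backup_dir: str, name: str, heap_size: str, pagecache: str,
                      source: Optional[str] = None) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: argv and environment for a memory-capped online backup."""
    cmd = [
        "neo4j-admin", "backup",
        f"--backup-dir={backup_dir}",  # assumed 3.x flag
        f"--name={name}",              # assumed 3.x flag
        f"--pagecache={pagecache}",    # page cache for the backup (defaults to 8MB if omitted)
    ]
    if source is not None:
        cmd.append(f"--from={source}")  # assumed 3.x flag; omitted means the tool's default
    # HEAP_SIZE caps the maximum heap of the backup JVM. Copy the current
    # environment so that PATH and other variables remain available.
    env = dict(os.environ, HEAP_SIZE=heap_size)
    return cmd, env


def run_backup(backup_dir: str, name: str, heap_size: str, pagecache: str,
               source: Optional[str] = None) -> None:
    """Run the backup; raises CalledProcessError if neo4j-admin exits non-zero."""
    cmd, env = backup_invocation(backup_dir, name, heap_size, pagecache, source)
    subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    cmd, env = backup_invocation("/mnt/backup", "graph.db-backup", "2G", "512M")
    print("HEAP_SIZE=" + env["HEAP_SIZE"], " ".join(cmd))
```

Passing HEAP_SIZE through the child process's environment, rather than exporting it globally, keeps the limit scoped to this one backup run.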
