9.1. Backup planning

This section explains how to prepare for backing up a Neo4j deployment.

9.1.1. Introduction

Designing an appropriate backup strategy for your Neo4j DBMS is a fundamental part of operations. The backup strategy should take into account elements such as:

  • Demands on performance during backup actions.
  • Tolerance for data loss in case of failure.
  • Tolerance for downtime in case of failure.
  • Data volumes.

The backup strategy will answer questions such as:

  • Which backup method should we use: online or offline backups?
  • What physical setup meets our demands?
  • What backup media (offline or remote storage, cloud storage, etc.) should we use?
  • How long do we archive backups for?
  • How frequently should we perform backups?

    If using online backups:

    • How often should we perform full backups?
    • How often should we perform incremental backups?
  • How do we test recovery routines, and how often?

9.1.2. Online and offline backups

Online backups are typically required for production environments, but it is also possible to perform offline backups.

Offline backups are a more limited method for backing up a database. For example:

  • Online backups run against a live Neo4j instance, while offline backups require that the database is shut down.
  • Online backups can be full or incremental, but there is no support for backing up incrementally with offline backups.

For more details about offline backups, see Section 15.7, “Dump and load databases”.

The remainder of this chapter is dedicated to describing online backups.

9.1.3. Server configuration

The table below lists the basic server parameters relevant to backups. By default the backup service is enabled but listens only on localhost (127.0.0.1). The listen address must be changed if backups are to be taken from another machine.

Table 9.1. Server parameters for backups
Parameter name               Default value     Description
dbms.backup.enabled          true              Enable support for running online backups.
dbms.backup.listen_address   127.0.0.1:6362    Listening server for online backups.
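
As an illustration of the parameters in Table 9.1, the following neo4j.conf fragment is a minimal sketch of how the backup service might be made reachable from another machine. The 0.0.0.0 bind address is an assumption chosen for illustration only; binding to one specific interface address is usually the more restrictive choice.

```properties
# Keep the online backup service enabled (this is already the default).
dbms.backup.enabled=true

# Assumption for illustration: listen on all network interfaces rather than
# only on localhost, so that a backup client on another machine can connect.
dbms.backup.listen_address=0.0.0.0:6362
```

Once the service listens on a non-local address, the backup port is reachable over the network, so it is worth deciding at the same time who may connect to it and whether SSL/TLS should be required.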

9.1.4. Databases to back up

Since a Neo4j DBMS can host multiple databases, and they are backed up independently of one another, it is important to plan a backup strategy for every database so that none is overlooked. A new deployment has two databases by default: neo4j and system. The system database contains configuration, such as the operational states of databases and the security configuration.

9.1.5. Storage considerations

For any backup it is important to store the data separately from the production system, in a location that shares no dependencies with it, and preferably off-site. If you are running Neo4j in the cloud, you could for example use a different availability zone or even a separate cloud provider.

Since backups are kept for a long time, the longevity of archival storage should be considered as part of backup planning.

You may also want to override the settings used for pruning and rotation of transaction log files. The transaction log files are files that keep track of recent changes. Please note that removing transaction logs manually can result in a broken backup.

Recovered servers do not need all of the transaction log files that have already been applied, so it is possible to reduce storage size even further by keeping the transaction log files to the bare minimum.

This can be done by setting dbms.tx_log.rotation.size=1M and dbms.tx_log.rotation.retention_policy=3 files. Alternatively, you can supply these settings through the --additional-config override.
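
The two settings named above could be written as the following configuration fragment. This is a sketch only: the same lines could go into neo4j.conf or into a separate file supplied through the --additional-config override, and the values are simply the ones quoted in the text.

```properties
# Rotate transaction log files once they reach 1 MB.
dbms.tx_log.rotation.size=1M

# Keep only the three most recent transaction log files.
dbms.tx_log.rotation.retention_policy=3 files
```

Pruning is left to Neo4j through these settings. As noted earlier, transaction log files should not be removed by hand.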

9.1.6. Cluster considerations

In a cluster it is possible to take a backup from any server, and each server has two configurable ports capable of serving a backup. These ports are configured by dbms.backup.listen_address and causal_clustering.transaction_listen_address respectively. Functionally they are equivalent for backups. Separating them allows some operational flexibility, while using a single port simplifies the configuration.

It is generally recommended to select Read Replicas to act as backup servers, since they are more numerous than Core Servers in typical cluster deployments. Furthermore, any performance issues that a large backup causes on a Read Replica will not affect the performance or redundancy of the Core Cluster. If a Read Replica is not available, then a Core can be picked based on factors such as its physical proximity, bandwidth, performance, and liveness.

Note that both Read Replicas and Cores can fall behind the leader and be out-of-date. We can look at transaction IDs in order to avoid taking a backup from a server that has lagged too far behind. The latest transaction ID can be found by exposing Neo4j metrics or via Neo4j Browser. To view the latest processed transaction ID (and other metrics) in Neo4j Browser, type :sysinfo at the prompt.

9.1.7. Using SSL/TLS for backups

The backup server can be configured to require SSL/TLS. If that is the case, the backup client must also be configured to use it with a compatible policy. Refer to Section 12.2, “SSL framework” to learn how SSL is configured in general. See the table below for details about how the configured SSL policies map to the configured ports.

Table 9.2. Mapping backup configuration to SSL policies
Backup target address on database server       SSL policy setting on database server   SSL policy setting on backup client   Default port
dbms.backup.listen_address                     dbms.ssl.policy.backup                  dbms.ssl.policy.backup                6362
causal_clustering.transaction_listen_address   dbms.ssl.policy.cluster                 dbms.ssl.policy.backup                6000
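
To make the mapping in Table 9.2 concrete, the fragment below sketches what a server-side policy for the dedicated backup port might look like. The sub-settings shown (enabled, base_directory, private_key, public_certificate), the directory and the file names are assumptions based on the general SSL framework and are not taken from this section, so verify them against Section 12.2, “SSL framework” before use.

```properties
# Server side: policy applied to connections on dbms.backup.listen_address (6362).
# Assumption: the policy is switched on with an ".enabled" sub-setting.
dbms.ssl.policy.backup.enabled=true

# Assumed location and file names of the key material (illustrative only).
dbms.ssl.policy.backup.base_directory=certificates/backup
dbms.ssl.policy.backup.private_key=private.key
dbms.ssl.policy.backup.public_certificate=public.crt
```

According to the table, the backup client uses dbms.ssl.policy.backup in both cases. When the backup targets causal_clustering.transaction_listen_address instead, the server side is governed by dbms.ssl.policy.cluster, so the client's backup policy must be compatible with the server's cluster policy.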

9.1.8. Additional files to back up

The files listed below are included in neither online nor offline backups. Make sure to back them up separately.

  • If you have a cluster, it may be relevant to back up neo4j.conf on each server.
  • Back up all the files used for SSL/TLS, i.e. private keys, public certificates, and the contents of the trusted and revoked directories. The locations of these are described in Section 12.2, “SSL framework”. If you have a cluster, you should back up these files on each server in the cluster.