This section explains how to prepare for backing up a Neo4j deployment.
Designing an appropriate backup strategy for your Neo4j DBMS is a fundamental part of operations. The backup strategy should take into account elements such as:
The backup strategy will answer questions such as:
How frequently should backups be performed?
If using online backups:
Online backups are typically required for production environments, but it is also possible to perform offline backups.
Offline backups are a more limited method for backing up a database. For example:
For more details about offline backups, see Section 14.7, “Dump and load databases”.
The remainder of this chapter is dedicated to describing online backups.
The table below lists the basic server parameters relevant to backups. By default, the backup service is enabled but listens only on localhost (127.0.0.1). This must be changed if backups are to be taken from another machine.
Since a Neo4j DBMS can host multiple databases, and they are backed up independently of one another, it is important to plan a backup strategy for every database so that none is overlooked.
In a new deployment there are two databases by default,
The system database contains configuration, e.g. operational states of databases, security configuration, etc.
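As a minimal sketch of the "do not forget any databases" check, the snippet below compares the databases a DBMS hosts against the databases a backup job covers. The function name and the sample database names are hypothetical and only illustrate the idea; how the two lists are obtained is left out.

```python
def databases_missing_backup(hosted, backed_up):
    """Return the hosted databases that have no backup, sorted by name."""
    return sorted(set(hosted) - set(backed_up))


# Illustrative names only; replace with the databases of your own DBMS.
hosted = ["system", "neo4j", "sales"]
backed_up = ["neo4j", "sales"]

missing = databases_missing_backup(hosted, backed_up)
print(missing)  # ['system']
```

A check like this could run after each backup cycle and raise an alert when the result is not empty.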
For any backup, it is important to store the data separately from the production system, with no common dependencies, and preferably off-site. If you are running Neo4j in the cloud, you could, for example, use a different availability zone or even a separate cloud provider.
Since backups are kept for a long time, the longevity of archival storage should be considered as part of backup planning.
You may also want to override the settings used for pruning and rotation of transaction log files. Transaction log files keep track of recent changes. Note that removing transaction logs manually can result in a broken backup.
Recovered servers do not need all of the transaction log files that have already been applied, so it is possible to reduce storage size even further by reducing the size of the files to the bare minimum.
This can be done by setting
Alternatively you can use the
In a cluster it is possible to take a backup from any server, and each server has two configurable ports capable of serving
These ports are configured by
Functionally, they are equivalent for backups, but separating them can allow some operational flexibility, while using just a single port can simplify the configuration.
It is generally recommended to select Read Replicas to act as backup servers, since they are more numerous than Core Servers in typical cluster deployments. Furthermore, the possibility of performance issues on a Read Replica, caused by a large backup, will not affect the performance or redundancy of the Core Cluster. If a Read Replica is not available, then a Core can be picked based on factors such as its physical proximity, bandwidth, performance, and liveness.
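The selection rule above can be sketched as follows. This is an illustration under stated assumptions, not part of Neo4j: the `Server` class, its fields, and the single `score` value (standing in for proximity, bandwidth, performance, and liveness) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    role: str        # "read_replica" or "core"
    available: bool
    score: float     # hypothetical combined rating; higher is better


def pick_backup_server(servers):
    """Prefer an available Read Replica; fall back to the best available Core."""
    candidates = [s for s in servers if s.available]
    replicas = [s for s in candidates if s.role == "read_replica"]
    pool = replicas or [s for s in candidates if s.role == "core"]
    if not pool:
        return None
    return max(pool, key=lambda s: s.score)


servers = [
    Server("core-1", "core", True, 0.9),
    Server("core-2", "core", True, 0.7),
    Server("replica-1", "read_replica", False, 0.8),
]
print(pick_backup_server(servers).name)  # core-1
```

In the example, the only Read Replica is unavailable, so the highest-rated Core is chosen instead.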
Note that both Read Replicas and Cores can fall behind the leader and be out-of-date.
To avoid taking a backup from a server that has lagged too far behind, compare its latest transaction ID with that of the leader.
The latest transaction ID can be found by exposing Neo4j metrics or via Neo4j Browser.
To view the latest processed transaction ID (and other metrics) in Neo4j Browser, type :sysinfo at the prompt.
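The lag check can be sketched as below, assuming the latest transaction ID of each server has already been collected (for example from metrics). The function name, the server names, the ID values, and the `max_lag` threshold are hypothetical.

```python
def servers_within_lag(tx_ids, leader_tx_id, max_lag):
    """Return names of servers at most max_lag transactions behind the
    leader, ordered from most to least up to date."""
    fresh = {name: tx for name, tx in tx_ids.items()
             if leader_tx_id - tx <= max_lag}
    return sorted(fresh, key=lambda name: fresh[name], reverse=True)


# Illustrative values; real IDs would come from metrics or :sysinfo.
tx_ids = {"replica-1": 10480, "replica-2": 9100, "core-2": 10495}
print(servers_within_lag(tx_ids, leader_tx_id=10500, max_lag=100))
# ['core-2', 'replica-1']
```

Here replica-2 is 1400 transactions behind the leader and is therefore excluded as a backup source.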
The backup server can be configured to require SSL/TLS. If that is the case, the backup client must also be configured to use it with a compatible policy. Refer to SSL framework to learn how SSL is configured in general. See the table below for more details about how the configured SSL policies map to the configured ports.
|Backup target address on database server||SSL policy setting on database server||SSL policy setting on backup client||Default port|
The files listed below are not included in either online or offline backups. Make sure to back them up separately.