Back up an online database

Remember to plan your backup carefully and to back up each of your databases, including the system database.

Command

A Neo4j database can be backed up in online mode using the backup command of neo4j-admin. The command must be invoked as the neo4j user to ensure the appropriate file permissions.

Neo4j v5.0 introduces a new version of the backup command which produces immutable backup artifacts (as opposed to mutable folders as in previous versions).

Backup artifact

The neo4j-admin database backup command produces one backup artifact file per database each time it is run. A backup artifact file is an immutable file containing the backup data of a given database along with metadata such as the database name and ID, the backup time, and the lowest and highest transaction IDs.

Backup artifacts can be of two types:

  1. a full backup containing the whole database store or

  2. a differential backup containing a log of transactions to apply to a database store contained in a full backup artifact.

Backup chain

The first time the backup command is run, a full backup artifact is produced for a given database. Subsequent runs produce differential backup artifacts.

A backup chain consists of a full backup optionally followed by a sequence of n contiguous differential backups.

Figure 1. Backup chain
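
As a minimal sketch of how a chain might be built on a schedule, the following picks a backup type per weekday and prints the resulting command line. The weekly policy (full backup on Sunday, differential otherwise) and the function names backup_type_for_day and print_backup_command are assumptions for illustration only; the --type option and its FULL and DIFF values are described under Options. The command is printed rather than executed, and a DIFF run presumably needs an existing full backup in the target directory, as the chain definition implies.

```shell
# Hypothetical policy: full backup on Sundays, differential otherwise.
# $1 is the ISO weekday number as printed by `date +%u` (1 = Monday ... 7 = Sunday).
backup_type_for_day() {
    if [ "$1" -eq 7 ]; then
        echo FULL
    else
        echo DIFF
    fi
}

# Prints the command line instead of running it, so the sketch is safe to try.
# /mnt/backups/neo4j and the database name neo4j come from the examples on this page.
print_backup_command() {
    echo "neo4j-admin database backup --type=$(backup_type_for_day "$1") --to-path=/mnt/backups/neo4j neo4j"
}

print_backup_command "$(date +%u)"
```

Leaving --type at its default of AUTO avoids the need for such a policy; the sketch is only useful when the length of each chain should be controlled explicitly.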

Usage

The neo4j-admin database backup command can be used for performing both full and differential backups of an online database. The command can be run both locally and remotely. However, it uses a significant amount of resources, such as memory and CPU. Therefore, it is recommended to perform the backup on a separate dedicated machine. The neo4j-admin database backup command also supports SSL/TLS. For more information, see Online backup configurations.

neo4j-admin database backup is not supported in Neo4j Aura.

Syntax

neo4j-admin database backup     --to-path=<path>
                                [--from=<host:port>[,<host:port>]...]
                                [--type=<type>]
                                [--compress=<true/false>]
                                [--keep-failed=<true/false>]
                                [--pagecache=<size>]
                                [--include-metadata=<all/users/roles>]
                                [--parallel-recovery=<true/false>]
                                [--inspect-path=<path>]
                                [--verbose]
                                [--expand-commands]
                                [--additional-config=<path>]
                                <database>

Options

Option Default Description

--to-path

Directory to place backup in.

--from

localhost:6362

Comma-separated list of host and port of Neo4j instances, each of which is tried in order.

--type

AUTO

Type of backup to perform. Possible values are: FULL, DIFF, AUTO.

--compress

true

Request backup artifact to be compressed. If disabled, backup artifact creation is faster but the size of the produced artifact is approximately equal to the size of the backed-up database.

--keep-failed

false

Request failed backup to be preserved for further post-failure analysis. If enabled, a directory with the failed backup database is preserved.

--pagecache

8m

The size of the page cache to use for the backup process.

--include-metadata

Include metadata in the backup. Metadata contains security settings related to the database. Cannot be used for backing up the system database.

  • roles - commands to create the roles and privileges (for both database and graph) that affect the use of the database.

  • users - commands to create the users that can use the database and their role assignments.

  • all - include roles and users.

--parallel-recovery

false

Allow multiple threads to apply transactions to a backup in parallel. For some databases and workloads, this may reduce execution times significantly.

--parallel-recovery is an experimental option. Consult Neo4j support before use.

--inspect-path

List and show the metadata of the backup artifact(s). Accepts a folder or a file.

--verbose

Enable verbose output.

--expand-commands

Allow command expansion in config value evaluation.

--additional-config

Configuration file that provides additional settings or overrides the existing settings in the neo4j.conf file.

Parameters

Parameter Default Description

<database>

Name of the remote database to back up.

The value can contain * and ? for globbing, in which case all matching databases are backed up.

With a single * as a value, you can back up all the databases of the DBMS.

Exit codes

Depending on whether the backup was successful or not, neo4j-admin database backup exits with different codes. The error codes include details of what error was encountered.

Table 1. Neo4j Admin backup exit codes when backing up one database
Code  Description
0     Success.
1     Backup failed, or succeeded but encountered problems such as some servers being unreachable. See logs for more details.

Table 2. Neo4j Admin backup exit codes when backing up multiple databases
Code  Description
0     All databases are backed up successfully.
1     One or several backups failed, or succeeded with problems.
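
The exit codes can be checked by a small wrapper when the backup is started from a script or a scheduler. The following is a sketch only: the function name run_and_report and the message texts are hypothetical, and the wrapper simply runs whatever command it is given.

```shell
# Runs the command given as arguments and reports its exit status
# in the terms of the exit code tables. The status is returned unchanged
# so that a caller (for example a scheduler) can still act on it.
run_and_report() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "backup succeeded"
    else
        echo "backup failed or completed with problems (exit code $status); check the logs"
    fi
    return "$status"
}

# Usage (not executed here):
#   run_and_report neo4j-admin database backup --to-path=/mnt/backups/neo4j neo4j
```

Because a non-zero code covers both outright failures and backups that succeeded with problems, the wrapper does not try to distinguish the two and points to the logs instead.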

Online backup configurations

Server configuration

The table below lists the basic server parameters relevant to backups. Note that by default, the backup service is enabled but only listens on localhost (127.0.0.1). This needs to be changed if backups are to be taken from another machine.

Table 3. Server parameters for backups
Parameter name                Default value   Description
server.backup.enabled         true            Enable support for running online backups.
server.backup.listen_address  127.0.0.1:6362  Listening server for online backups.

Memory configuration

The following options are available for configuring the memory allocated to the backup client:

  • Configure heap size for the backup

HEAP_SIZE configures the maximum heap size allocated for the backup process. This is done by setting the environment variable HEAP_SIZE before starting the operation. If not specified, the Java Virtual Machine chooses a value based on the server resources.

  • Configure page cache for the backup

The page cache size can be configured by using the --pagecache option of the neo4j-admin database backup command.

You should give the Neo4j page cache as much memory as possible, as long as it satisfies the following constraint:

Neo4j page cache + OS page cache < available RAM, where 2 to 4GB should be dedicated to the operating system’s page cache.

For example, if your current database has a Total mapped size of 128GB as per the debug.log, and you have enough free space (meaning you have left aside 2 to 4 GB for the OS), then you can set --pagecache to 128GB.
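
The constraint can be worked through with a few lines of shell arithmetic. This is a rough sketch rather than an official sizing rule: the function name pagecache_gb is hypothetical, all sizes are whole gigabytes, and subtracting the heap size from the available RAM is an added assumption that the guidance above does not state explicitly.

```shell
# All sizes are whole gigabytes.
# $1 = RAM on the backup machine
# $2 = memory left for the OS page cache (2 to 4 per the guidance above)
# $3 = heap size given to the backup process (assumption: counted against RAM)
# $4 = "Total mapped size" of the database from debug.log
pagecache_gb() {
    budget=$(( $1 - $2 - $3 ))
    if [ "$budget" -lt 1 ]; then
        echo "not enough memory for a page cache" >&2
        return 1
    fi
    # There is no benefit in a page cache larger than the mapped size.
    if [ "$budget" -gt "$4" ]; then
        echo "$4"
    else
        echo "$budget"
    fi
}

pagecache_gb 160 4 8 128   # prints 128
pagecache_gb 64 4 8 128    # prints 52
```

In the second case the result would be passed to the backup command as --pagecache=52G.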

Computational resources configurations

Transaction log files

The transaction log files, which keep track of recent changes, are rotated and pruned based on a provided configuration. For example, setting db.tx_log.rotation.retention_policy=3 files keeps 3 transaction log files in the backup. Because recovered servers do not need all of the transaction log files that have already been applied, you can further reduce storage size by shrinking the files to the bare minimum. This can be done by setting db.tx_log.rotation.size=1M and db.tx_log.rotation.retention_policy=3 files. You can use the --additional-config option to override the configurations in the neo4j.conf file.

Removing transaction logs manually can result in a broken backup.
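
One way to apply the two transaction log settings without editing neo4j.conf is to write them to a separate file and pass that file with --additional-config. The sketch below assumes a hypothetical file location, /tmp/backup-overrides.conf, and a hypothetical helper name, write_backup_overrides; the backup command itself is shown as a comment and is not executed.

```shell
# Writes the two transaction log settings discussed above to the file given as $1.
write_backup_overrides() {
    cat > "$1" <<'EOF'
db.tx_log.rotation.size=1M
db.tx_log.rotation.retention_policy=3 files
EOF
}

# /tmp/backup-overrides.conf is a hypothetical location.
write_backup_overrides /tmp/backup-overrides.conf

# The file would then be passed to the backup command (not executed here):
#   neo4j-admin database backup --additional-config=/tmp/backup-overrides.conf \
#       --to-path=/mnt/backups/neo4j neo4j
```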

Security configurations

Securing your backup network communication with an SSL policy and a firewall protects your data from unwanted intrusion and leakage. When using the neo4j-admin database backup command, you can configure the backup server to require SSL/TLS, and the backup client to use a compatible policy. For more information on how to configure SSL in Neo4j, see SSL framework.

The default backup port is 6362, configured with the key server.backup.listen_address. The SSL policy for backups is configured with keys that start with dbms.ssl.policy.backup.

As an example, add the following content to your neo4j.conf file:

dbms.ssl.policy.backup.enabled=true
dbms.ssl.policy.backup.tls_versions=TLSv1.2
dbms.ssl.policy.backup.ciphers=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384
dbms.ssl.policy.backup.client_auth=REQUIRE

For a detailed list of recommendations regarding security in Neo4j, see Security checklist.

It is very important to ensure that there is no external access to the port specified by the setting server.backup.listen_address. Failing to protect this port may leave a security hole open by which an unauthorized user can make a copy of the database onto a different machine. In production environments, external access to the backup port should be blocked by a firewall.

Cluster configurations

In a cluster topology, a backup can be taken from any server that hosts the database to back up. Each server has two configurable ports capable of serving a backup, configured by server.backup.listen_address and server.cluster.listen_address respectively. Functionally, they are equivalent for backups, but separating them can allow some operational flexibility, while using just a single port can simplify the configuration.

It is generally recommended to select secondary servers to act as backup servers, since they are more numerous than primary servers in typical cluster deployments. Furthermore, performance issues on a secondary server caused by a large backup do not affect the performance or redundancy of the primary servers. If a secondary server is not available, a primary can be selected based on factors such as its physical proximity, bandwidth, performance, and liveness.

Use the SHOW DATABASES command to learn which database is hosted on which server.

To avoid taking a backup from a cluster member that is lagging behind, you can look at the transaction IDs by exposing Neo4j metrics or via Neo4j Browser. To view the latest processed transaction IDs (and other metrics) in Neo4j Browser, type :sysinfo at the prompt.

Targeting multiple servers

It is recommended to provide a list of multiple target servers when taking a backup from a cluster, since that may allow a backup to succeed even if a server is down or not all databases are hosted on the same servers. If the command finds one or more servers that do not respond, it continues trying to back up from the other servers and continues backing up the other requested databases, but it exits with a non-zero code to alert the user that there is a problem. If a name pattern is used for the database together with multiple target servers, all servers contribute to the list of matching databases.
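
When the list of servers is kept elsewhere (an inventory file, for instance), the --from value can be assembled from it. The following sketch uses a hypothetical helper named build_from_option that joins its arguments with commas and appends the default backup port 6362 to entries that carry no port. It assumes host names or IPv4 addresses only.

```shell
# Builds a --from option from host names or addresses, appending the
# default backup port 6362 to any entry that does not already carry a port.
# Assumes IPv4 addresses or host names (no IPv6 literals).
build_from_option() {
    list=""
    for host in "$@"; do
        case "$host" in
            *:*) entry="$host" ;;
            *)   entry="$host:6362" ;;
        esac
        if [ -z "$list" ]; then
            list="$entry"
        else
            list="$list,$entry"
        fi
    done
    echo "--from=$list"
}

build_from_option 192.168.1.34 192.168.1.35:6363 192.168.1.36
# prints --from=192.168.1.34:6362,192.168.1.35:6363,192.168.1.36:6362
```

The servers are tried in the order given, so the preferred backup servers (typically secondaries) would be listed first.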

Examples

The following are examples of how to back up a single database, e.g., the default database neo4j, and multiple databases, using the neo4j-admin database backup command. The target directory /mnt/backups/neo4j must exist before calling the command and the database(s) must be online.

Example 1. Use neo4j-admin database backup to back up a single database.
bin/neo4j-admin database backup --to-path=/mnt/backups/neo4j neo4j

To back up several databases that match a database name pattern, you can use name globbing. Quote the pattern so that the shell passes it to neo4j-admin unexpanded. For example, to back up all databases that start with n from your three-node cluster, run:

Example 2. Use neo4j-admin database backup to back up multiple databases.
bin/neo4j-admin database backup --from=192.168.1.34,192.168.1.35,192.168.1.36 --to-path=/mnt/backups/neo4j --pagecache=4G "n*"