Backup and restore planning

There are two main reasons for backing up your Neo4j databases and storing them in a safe, off-site location:

  • to be able to recover your data quickly after a failure, for example, one caused by hardware faults, human error, or natural disaster.

  • to be able to perform routine administrative operations, such as moving a database from one instance to another, upgrading, or reclaiming space.

Backup and restore strategy

Depending on your particular deployment and environment, it is important to design an appropriate backup and restore strategy.

There are various factors to consider when deciding on your strategy, such as:

  • Type of environment – development, test, or production.

  • Data volumes.

  • Number of databases.

  • Available system resources.

  • Downtime tolerance during backup and restore.

  • Demands on Neo4j performance during backup and restore. This factor might lead you to perform these operations during an off-peak period.

  • Tolerance for data loss in case of failure.

  • Tolerance for downtime in case of failure. If you have zero tolerance for downtime and data loss, you might want to consider performing an online or even a scheduled backup.

  • Frequency of updates to the database.

  • Type of backup and restore method (online or offline), which may depend on whether you want to:

    • perform full backups (online or offline).

    • perform differential backups (online only).

    • use SSL/TLS for the backup network communication (online only).

    • keep your databases as archive files (online or offline).

  • How many backups you want to keep.

  • Where the backups will be stored – drive or remote server, cloud storage, different data center, different location, etc.

    It is recommended to store your database backups on a separate server or drive, preferably off-site, rather than alongside the database files. This ensures that if your Neo4j DBMS crashes for any reason, you can still access the backups and perform a restore.

  • How you will test recovery routines, and how often.
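The retention factor listed above (how many backups you want to keep) can be expressed as a small rule. The following is a minimal sketch and not part of Neo4j: the function name `select_backups_to_delete`, the `keep_last` parameter, the `*.backup` pattern, and the choice to order files by modification time are all assumptions made for illustration.

```python
from pathlib import Path


def select_backups_to_delete(backup_dir, keep_last, pattern="*.backup"):
    """Return backup files that exceed the retention count, oldest first.

    Hypothetical helper: it only selects files and never deletes anything,
    so the list can be reviewed before any file is removed.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    # Sort newest first by modification time (an assumption; a timestamp
    # embedded in the file name could be used instead).
    files = sorted(
        Path(backup_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    expired = files[keep_last:]
    # Report the expired files oldest first.
    return sorted(expired, key=lambda p: p.stat().st_mtime)
```

A count-based rule like this is only a starting point. Differential backups may depend on earlier backups, so check those dependencies before removing files by count alone.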

Backup and restore options

Neo4j supports backing up and restoring both online and offline databases. It uses Neo4j Admin tool commands, which can be run from a live as well as an offline Neo4j DBMS. All neo4j-admin commands must be invoked as the neo4j user to ensure the appropriate file permissions.

  • neo4j-admin database backup/restore (Enterprise only) – used for performing online backup (full and differential) and restore operations. The database to be backed up must be in online mode. The command produces an immutable artifact, which has an inspectable API to aid management and operability. This command is suitable for production environments, where you cannot afford downtime.

    The command can also be invoked over the network if access is enabled using server.backup.listen_address.

    Make sure to limit access to the backup server port to fully trusted, specific devices. Firewall policies should be considered. For more information, refer to the Server configurations section.

    When using neo4j-admin database backup in a cluster, it is recommended to back up from an external instance rather than reusing instances that form part of the cluster.

  • neo4j-admin database dump/load – used for performing offline dump and load operations. The database to be dumped must be in offline mode. The dump command can only be invoked from the server command line and is suitable for environments where downtime is not a factor. The command produces an archive file that follows the format <databasename><timestamp>.dump.

  • neo4j-admin database copy – used for copying an offline database or backup. This command can be used for cleaning up database inconsistencies and reclaiming unused space.

File system copy-and-paste of databases is not supported and may result in unwanted behavior, such as corrupt stores.
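To make the difference between the online backup and offline dump commands concrete, the sketch below assembles their command lines as argument lists without running them. It is illustrative only: the helper names `build_backup_command` and `build_dump_command` are hypothetical, and the flags shown (`--to-path`, `--type`, `--from`) reflect Neo4j 5 syntax, which should be checked against `neo4j-admin database backup --help` and `neo4j-admin database dump --help` for your version.

```python
def build_backup_command(database, to_path, from_address=None, backup_type="AUTO"):
    """Build (but do not run) an online backup command as an argument list."""
    if backup_type not in {"FULL", "DIFF", "AUTO"}:
        raise ValueError(f"unsupported backup type: {backup_type}")
    cmd = [
        "neo4j-admin", "database", "backup",
        f"--to-path={to_path}",
        f"--type={backup_type}",
    ]
    if from_address:
        # Only needed when backing up over the network from another host.
        cmd.append(f"--from={from_address}")
    cmd.append(database)
    return cmd


def build_dump_command(database, to_path):
    """Build (but do not run) an offline dump command as an argument list."""
    return ["neo4j-admin", "database", "dump", f"--to-path={to_path}", database]
```

A list built this way could be passed to `subprocess.run(cmd, check=True)` by a process running as the neo4j user. Any host name, port, and target directory passed in are placeholders for values from your own deployment.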

Considerations for backing up and restoring databases in a cluster

Backing up a database in a clustered environment does not differ fundamentally from a standalone backup, except that you must know which server in the cluster to connect to. Use SHOW DATABASE <database> to learn which servers are hosting the database you want to back up. See Listing a single database for more information.

However, restoring a database in a cluster is different since it is not known in advance how a database is going to be allocated to the servers in a cluster. This method relies on the seed already existing on one of the servers. The recommended way to restore a database in a cluster is to seed from URI.

The Neo4j Admin commands backup, restore, dump, load, copy, and check-consistency are not supported for use on Composite databases. Instead, run them directly on the databases that are associated with the Composite database.
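The lookup step described above for cluster backups, where SHOW DATABASE <database> tells you which servers host a database, can be post-processed once its rows have been fetched. The sketch below is a hypothetical helper that assumes each row is a dictionary with the `name`, `address`, and `currentStatus` columns returned by SHOW DATABASE; how the rows are fetched (driver, Cypher Shell, etc.) is left out.

```python
def online_hosts(rows, database):
    """Return the addresses of servers that report the database as online.

    `rows` is assumed to be a list of dicts, one per SHOW DATABASE row.
    """
    return [
        row["address"]
        for row in rows
        if row["name"] == database and row["currentStatus"] == "online"
    ]


# Example rows with made-up addresses, for illustration only.
sample_rows = [
    {"name": "neo4j", "address": "server1.example:7687", "currentStatus": "online"},
    {"name": "neo4j", "address": "server2.example:7687", "currentStatus": "offline"},
    {"name": "system", "address": "server1.example:7687", "currentStatus": "online"},
]
candidates = online_hosts(sample_rows, "neo4j")
```

The address column holds the server's Bolt address. The backup listener is configured separately through server.backup.listen_address, so translating a chosen host into a backup address depends on your deployment.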

Table 1. The following table describes the commands' capabilities and usage.
Capability/ Usage backup/restore dump/load copy

Neo4j Edition: Enterprise (backup/restore), all (dump/load), Enterprise (copy)

Run from an online Neo4j DBMS

Run from an offline Neo4j DBMS

Run against a user database

Run against the system database

Run against a Composite database

Perform full backups

n/a

Perform differential backups

n/a

Applied to an online database

Applied to an offline database

only restore

Can be run remotely

only backup

Command input: database/archive (.backup) (backup/restore), database/archive (.dump) (dump/load), database (copy)

Command output: archive (.backup)/database (backup/restore), archive (.dump)/database (dump/load), database; no schema store (copy)

Clean up database inconsistencies

Compact data store

Databases to back up

A Neo4j DBMS can host multiple databases. Both Neo4j Community and Enterprise Editions have a default user database, called neo4j, and a system database, which contains configurations, e.g., operational states of databases, security configuration, schema definitions, login credentials, and roles. In the Enterprise Edition, you can also create additional user databases. Each of these databases is backed up independently of one another.

It is very important to store a recent backup of your databases, including the system database, in a safe location.

Additional files to back up

The following files must be backed up separately from the databases:

  • The neo4j.conf file. If you have a cluster deployment, you should back up the configuration file for each cluster member.

  • All the files used for encryption, i.e., private key, public certificate, and the contents of the trusted and revoked directories. The locations of these are described in SSL framework. If you have a cluster, you should back up these files for each cluster member.

  • Any custom plugins you use. Make sure that you keep copies of them in a safe location.

  • The license key files for Bloom or GDS Enterprise, if you use these products.
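The files listed above live outside the databases, so they need their own collection step. The following is a minimal sketch of bundling them into one archive with Python's standard `tarfile` module. The helper name `archive_additional_files` is hypothetical, and the paths you pass in (configuration file, certificate directories, plugins, license keys) depend entirely on your installation.

```python
import tarfile
from pathlib import Path


def archive_additional_files(paths, archive_path):
    """Pack the existing paths into a gzip-compressed tar archive.

    Returns the paths that were missing and therefore skipped, so that
    gaps in the backup are visible to the caller.
    """
    missing = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for raw in paths:
            path = Path(raw)
            if path.exists():
                # Directories are added recursively; entries are stored
                # under the final path component only.
                tar.add(path, arcname=path.name)
            else:
                missing.append(str(path))
    return missing
```

Because such an archive would contain private keys, it should be stored and transferred with the same care as the keys themselves. In a cluster, the step would be repeated for each cluster member.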

Storage considerations

For any backup, it is important that you store your backups separately from the production system, in a location that shares no common dependencies with it, and preferably off-site. If you are running Neo4j in the cloud, you may use a different availability zone or even a separate cloud provider. Since backups are kept for a long time, the longevity of archival storage should be considered as part of backup planning.