Basic Neo4j 4.0 Administration

Managing Backups

About this module

Online backup is used in production where the application cannot tolerate the database being unavailable.

In this part of the training, you will learn how to back up and restore databases in a Neo4j instance.

Overview of backup with Neo4j

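Backup is an essential process for any deployed application. The Causal Clustering capabilities of Neo4j enable you to provide uninterrupted access to databases because data is replicated on more than one server. Backup serves a different purpose: replication keeps the databases available, whereas a backup lets you return to the state of a database at an earlier time.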

  • Backup enables you to take a snapshot of a database or set of databases at a given time.
  • Backups are essential for a deployed application, especially if rogue application behavior changes the database in a way that cannot be detected or automatically undone. In this scenario, you need to restore the database to its state at a certain point in time.

Backup architecture

A best practice is to run the backup process from a different system than the one where the Neo4j instance is running.

The backup is performed with the backup functionality of the neo4j-admin tool. Running it on a separate system keeps the backup process from degrading the performance of the Neo4j instance that is servicing clients. This means that you must install Neo4j on at least two systems. You do not configure anything on the server from which the backups are performed, and you do not run a Neo4j instance on that server either.

BackupArchitecture

A common practice for many enterprises is to back up their databases to Amazon S3. If any backups are stored in S3, they should be encrypted, and so should the channel used to send the backups to S3.
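As a rough illustration of the S3 practice described above, the following sketch uploads a backup directory with the AWS CLI. It is not part of the course material: the BACKUP_DIR and S3_BUCKET values are hypothetical placeholders, and it assumes the AWS CLI is installed and configured with credentials. The AWS CLI uses HTTPS by default, which covers the channel, and --sse AES256 requests server-side encryption of the stored objects.

```shell
# Hypothetical locations; replace them with your own backup directory and bucket.
BACKUP_DIR="${BACKUP_DIR:-/backups/neo4j}"
S3_BUCKET="${S3_BUCKET:-s3://example-neo4j-backups}"
AWS_CLI="${AWS_CLI:-aws}"

upload_backup() {
    # $1: name of the database whose backup directory is uploaded.
    # The transfer goes over HTTPS by default (encrypted channel);
    # --sse AES256 asks S3 to encrypt the objects at rest.
    "$AWS_CLI" s3 cp "$BACKUP_DIR/$1" "$S3_BUCKET/$1" --recursive --sse AES256
}
```

For example, `upload_backup movies` would copy the movies backup directory to the bucket.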

Backup concepts

  • When backing up the databases of a Neo4j instance, a best practice is to back up the system database and the user databases.
  • You perform a full backup initially. During subsequent backups, you back up increments that contain only the data that changed since the last backup.
  • Another important part of the backup process is confirming that the database being backed up is consistent.

You can read about additional backup options in the Neo4j Operations Manual.

Enabling online backup

To enable a Neo4j instance to be backed up online, you must add these two properties to the neo4j.conf file for the Neo4j instance that will be backed up:

dbms.backup.enabled=true
dbms.backup.listen_address=<host-address>:<6362-6372>

Where host-address is the address of a server from which you will run neo4j-admin to perform the backup. You must specify a port number that will not conflict with existing ports used on the server being backed up.
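Before restarting the instance, it can help to confirm that both properties are actually present and not commented out. The function below is a minimal sketch of such a check; the name backup_enabled is made up for this example, and the path to neo4j.conf is something you supply.

```shell
# Sketch: succeed only if the given neo4j.conf enables online backup.
backup_enabled() {
    conf_file="$1"
    # Only uncommented lines match; both properties must be present.
    grep -Eq '^[[:space:]]*dbms\.backup\.enabled[[:space:]]*=[[:space:]]*true' "$conf_file" &&
    grep -Eq '^[[:space:]]*dbms\.backup\.listen_address[[:space:]]*=' "$conf_file"
}
```

You could call it as `backup_enabled conf/neo4j.conf && echo "online backup is enabled"`.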

Performing the backup

After you restart the Neo4j instance with the configuration changes, you can initiate the backup, with consistency checking, on the server you specified in host-address as follows:

neo4j-admin backup --backup-dir=<backup-directory>
                   [--verbose]
                   --from=<Neo4j-instance-host-address>:<port>
                   --database=<database-name>
                   --check-consistency
                   --report-dir=<report-directory>

If a database has previously been backed up to the backup directory, the backup is incremental: it contains only the changes made since the last backup.

There are a number of other options you can specify related to performance and specific types of checks. See the Neo4j Operations Manual for details.

If the backup succeeds, the command returns 0. If it returns a non-zero value, either the backup failed or a consistency check failed, and you need to investigate which.
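The return value makes the backup easy to script. The following sketch backs up a list of databases one by one and reports any that returned a non-zero value. The function name backup_all and the default values for BACKUP_DIR, REPORT_DIR, and NEO4J_HOST are assumptions made for illustration only; the neo4j-admin options are the ones shown above.

```shell
NEO4J_ADMIN="${NEO4J_ADMIN:-neo4j-admin}"
BACKUP_DIR="${BACKUP_DIR:-/backups/neo4j}"     # hypothetical location
REPORT_DIR="${REPORT_DIR:-/backups/reports}"   # hypothetical location
NEO4J_HOST="${NEO4J_HOST:-neo4j-host:6362}"    # hypothetical host:port

backup_all() {
    # Arguments: database names, for example system plus the user databases.
    failed=""
    for db in "$@"; do
        if "$NEO4J_ADMIN" backup --backup-dir="$BACKUP_DIR" \
               --from="$NEO4J_HOST" --database="$db" \
               --check-consistency --report-dir="$REPORT_DIR"; then
            echo "backup of $db succeeded"
        else
            # $? still holds the exit status of the backup command here.
            echo "backup of $db failed or is inconsistent (exit status $?)" >&2
            failed="$failed $db"
        fi
    done
    # Return non-zero if any database needs to be investigated.
    [ -z "$failed" ]
}
```

A call such as `backup_all system maindb movies movies2` follows the best practice of backing up the system database together with the user databases.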

You need to decide whether consistency checking is done during the backup or on the backup image after the backup has completed. If you need to speed up the backup process, or if the backup degrades the performance of the Neo4j instance, it is better to check the consistency after the backup.

Note
In most cases, you will need to set the HEAP_SIZE environment variable before you start the backup process.
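The sketch below shows one way to defer the consistency check and to set HEAP_SIZE beforehand. It assumes that the Neo4j 4.0 neo4j-admin accepts --check-consistency=false on backup and --backup=<path> on check-consistency; verify both against the Neo4j Operations Manual for your version. The function name backup_then_check and the 2G heap value are illustrative assumptions, not recommendations from the course.

```shell
NEO4J_ADMIN="${NEO4J_ADMIN:-neo4j-admin}"

backup_then_check() {
    # $1: host:port of the Neo4j instance   $2: database name
    # $3: backup directory                  $4: report directory
    # Illustrative heap size only; size it for your own data.
    export HEAP_SIZE="${HEAP_SIZE:-2G}"
    # Step 1: take the backup without the consistency check.
    "$NEO4J_ADMIN" backup --backup-dir="$3" --from="$1" --database="$2" \
        --check-consistency=false || return 1
    # Step 2: check the backup image afterwards, away from the live instance.
    "$NEO4J_ADMIN" check-consistency --backup="$3/$2" --report-dir="$4"
}
```

Because the second step reads only the backup image, it does not add load to the Neo4j instance that is servicing clients.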

Restore architecture

You use the neo4j-admin tool to restore a database. You run the restore process on the same system where the Neo4j instance resides.

RestoreArchitecture

If you need to restore all databases, then you can first shut down the Neo4j instance and restore all of them.

If you need to restore a specific database, you must ensure that the Neo4j instance is started, but the database that you want to restore is stopped.

Restoring from a backup

If you need to restore a specific database from a backup, you must first stop the database you want to restore.

Here is how you restore the database from a backup:

neo4j-admin restore
          --from=<backup-directory>
          --database=<database-name>
          [--force]
          [--verbose]

If you specify --force, the existing database will be replaced. You then need to create it again against the system database.

Note
If you restore a database as root with --force, make sure that you change the ownership (recursively) of the database directory to neo4j:neo4j before creating the database.
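The stop, restore, and start sequence for a single database can be scripted roughly as follows. This is a sketch under assumptions: restore_db and run_system_cypher are names invented for this example, NEO4J_USER and NEO4J_PASSWORD are hypothetical environment variables, and cypher-shell is assumed to accept a statement as its final argument together with -d to select the database.

```shell
NEO4J_ADMIN="${NEO4J_ADMIN:-neo4j-admin}"
CYPHER_SHELL="${CYPHER_SHELL:-cypher-shell}"

run_system_cypher() {
    # Runs one statement against the system database.
    "$CYPHER_SHELL" -u "${NEO4J_USER:-neo4j}" -p "$NEO4J_PASSWORD" -d system "$1"
}

restore_db() {
    # $1: database name   $2: directory that holds the backup of that database
    run_system_cypher "STOP DATABASE $1;" || return 1
    "$NEO4J_ADMIN" restore --from="$2" --database="$1" --force || return 1
    # If the restore ran as root, change the ownership of the restored files
    # to neo4j:neo4j at this point, before the database is started.
    run_system_cypher "START DATABASE $1;"
}
```

The function stops at the first failing step, so a failed restore never leads to the database being started.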

Exercise #11: Managing backups

In this exercise, you will perform an online backup of a database, using the same host for the backup process. Then you will modify the database. Finally, you will restore the database from the backup and confirm that it was successfully restored.

Note
In your real application, if you were to back up a production stand-alone Neo4j instance, you would use a different host from the host that is running the Neo4j instance.

Before you begin

  1. Make sure you have a terminal window open to your Docker Neo4j instance (neo4j) for this course.
  2. Stop the Docker Neo4j instance.

Exercise steps:

  1. Modify the Neo4j configuration so that online backup is enabled and will be done on this same host. For example, your neo4j.conf properties should look something like this:
    dbms.backup.enabled=true
    dbms.backup.listen_address=localhost:6362
  2. Start the Docker Neo4j instance.
  3. View the logs to make sure it started with no errors.
  4. Since we are using Docker for our training exercises, you must create a directory that is accessible from the Docker Neo4j instance. In the $HOME/docker-neo4j/neo4j/logs directory, create a directory named backups and ensure that it has all permissions (chmod 777 backups).
  5. In cypher-shell, make sure that all databases are started. There should be four databases:
    • system
    • maindb
    • movies
    • movies2
  6. Confirm that the movies database has 171 nodes:
    :USE movies
    MATCH (n) RETURN count(n);
  7. Perform an online backup of the system database using these guidelines:
    1. Perform a consistency check.
    2. Use the backups directory for the location of the backup.
    3. Use the reports directory for the report location.
[sudo] docker exec --interactive neo4j bin/neo4j-admin backup --backup-dir=logs/backups --from=localhost:6362 --check-consistency --database=system --report-dir=logs/reports

The result of the backup should look something like this:

BackupSystemDocker

  8. Repeat the backup procedure for the user databases:
    • maindb
    • movies
    • movies2
  9. In cypher-shell, drop the movies database.
  10. Use the restore tool to restore the movies database.
    [sudo] docker exec --interactive neo4j bin/neo4j-admin restore --from=logs/backups/movies --database=movies
  11. In cypher-shell, create the movies database that was just restored.
  12. Confirm that the movies database has 171 nodes.
    MATCH (n) RETURN count(n);
  13. Exit cypher-shell.
  14. Invoke cypher-shell to add nodes to the movies database using the movies.cypher file.

On OS X or Linux:

cat ~/docker-neo4j/files/movies.cypher | docker exec --interactive neo4j bin/cypher-shell --database movies -u neo4j -p <passwordYouSpecified>

On Windows:

type files\movies.cypher | docker exec --interactive neo4j bin/cypher-shell --database movies -u neo4j -p <passwordYouSpecified>

  15. In cypher-shell, confirm that the database contains 342 nodes:
    MATCH (n) RETURN count(n);
  16. Perform an online backup using these guidelines:
    1. Back up the movies database.
    2. Perform a consistency check.
    3. Use the backups directory for the location of the backup.

The result of the backup should look as follows:

BackupMoviesDocker

  17. Invoke cypher-shell to add more nodes to the movies database using the movies.cypher file.

On OS X or Linux:

cat ~/docker-neo4j/files/movies.cypher | docker exec --interactive neo4j bin/cypher-shell --database movies -u neo4j -p <passwordYouSpecified>

On Windows:

type files\movies.cypher | docker exec --interactive neo4j bin/cypher-shell --database movies -u neo4j -p <passwordYouSpecified>

  18. In cypher-shell, confirm that the database contains 513 nodes:
    MATCH (n) RETURN count(n);
  19. Next, you will restore the movies database to the state that has 342 nodes. Stop the movies database.
  20. Restore the movies database using these guidelines:
    1. Use the same backups location.
    2. Specify --force so that the database will be replaced.
[sudo] docker exec --interactive neo4j bin/neo4j-admin restore --from=logs/backups/movies --database=movies --force
  21. Connect to the Neo4j instance with cypher-shell.
  22. Start the movies database.
  23. Confirm that the movies database has 342 nodes.
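The exercise shows the backup command only for the system database. One possible way to repeat it for the user databases is a small loop like the sketch below. It reuses the options from the system backup command in the exercise and assumes that docker can be run without sudo; the function names docker_backup and backup_user_databases are invented for this example.

```shell
DOCKER="${DOCKER:-docker}"

docker_backup() {
    # $1: database name. Same options as the system backup in the exercise.
    "$DOCKER" exec --interactive neo4j bin/neo4j-admin backup \
        --backup-dir=logs/backups --from=localhost:6362 \
        --check-consistency --database="$1" --report-dir=logs/reports
}

backup_user_databases() {
    # Stop at the first backup that returns a non-zero value.
    for db in maindb movies movies2; do
        docker_backup "$db" || return 1
    done
}
```

Calling `docker_backup movies` on its own corresponds to the later step that backs up only the movies database.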

Exercise summary

You have gained experience backing up all databases, backing up a single database, and restoring a database.

Check your understanding

Question 1

What is a best practice for performing backups for the databases of a Neo4j instance?

Select the correct answers.

  • Run the backup process as a background process.
  • Run the backup process on a separate system where Neo4j Enterprise Edition has been installed.
  • Stop each database before performing the backup.
  • Check the consistency of the backup.

Question 2

To perform an online backup, what must you configure in neo4j.conf?

Select the correct answers.

  • Which databases will be backed up.
  • When the databases will be backed up.
  • The port that will be used for the backup.
  • The IP address of the system from where the backup process will run.

Question 3

Suppose you need to restore two databases named customers and orders from backups. What must you do before you perform the restore process?

Select the correct answer.

  • DROP each database.
  • STOP each database.
  • RESTART each database.
  • Remove all files in the databases and transactions directories for these databases.

Summary

You should now be able to back up and restore a Neo4j database.
