Back up and restore (online)
Neo4j uses the Admin Service to perform backups. The Admin Service is available only inside the Kubernetes cluster, and access to it should be guarded. For more information, see Accessing Neo4j.
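For example, if the chart is deployed with the release name standalone in the default namespace (the names used in the examples below are assumptions), you can confirm that the Admin Service exists and is exposed only as a cluster-internal ClusterIP service:

  kubectl get service standalone-admin --namespace default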
Back up a database(s) to a cloud provider (AWS, GCP, and Azure) bucket
You can back up a Neo4j database (or several databases) to any cloud provider (AWS, GCP, or Azure) bucket using the neo4j/neo4j-admin Helm chart. From Neo4j 5.10.0, the neo4j/neo4j-admin Helm chart also supports backing up multiple databases in a single job.
Prerequisites
Before you can back up a database and upload it to your bucket, verify that you have the following:
- A cloud provider bucket (AWS, GCP, or Azure) with read and write access, so that the backup can be uploaded.
- Credentials to access the cloud provider bucket, such as a service account JSON key file for GCP, a credentials file for AWS, or storage account credentials for Azure.
- A Kubernetes cluster running on one of the cloud providers with the Neo4j Helm chart installed. For more information, see Quickstart: Deploy a standalone instance or Quickstart: Deploy a cluster.
Steps
To perform a backup of a Neo4j database to any cloud provider (AWS, GCP, and Azure) bucket, follow these steps:
- Update the repository to get the latest charts:
  helm repo update
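If the neo4j Helm repository has not been added to your Helm configuration yet, helm repo update has nothing to refresh; add the repository first:

  helm repo add neo4j https://helm.neo4j.com/neo4j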
- Create a Kubernetes secret with the credentials to access the cloud provider bucket, using one of the following options:

  GCP
  Create the secret named gcpcreds using your GCP service account JSON key file. The JSON key file contains all the details of the service account that has access to the bucket:
  kubectl create secret generic gcpcreds --from-file=credentials=/path/to/gcpcreds.json

  AWS
  Create a credentials file in the following format:
  [default]
  region = us-east-1
  aws_access_key_id = <your-aws_access_key_id>
  aws_secret_access_key = <your-aws_secret_access_key>

  Then, create the secret named awscreds via the credentials file:
  kubectl create secret generic awscreds --from-file=credentials=/path/to/your/credentials

  Azure
  Create a credentials file in the following format:
  AZURE_STORAGE_ACCOUNT_NAME=<your-azure-storage-account-name>
  AZURE_STORAGE_ACCOUNT_KEY=<your-azure-storage-account-key>

  Then, create the secret named azurecreds via the credentials file:
  kubectl create secret generic azurecreds --from-file=credentials=/path/to/your/credentials
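Whichever provider you use, you can check that the secret exists and exposes the expected credentials key before referencing it from the Helm chart, for example (using the gcpcreds name from above):

  kubectl describe secret gcpcreds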
- Configure the backup parameters in the backup-values.yaml file using one of the following options:
The following examples show the minimum configuration required to perform a backup to a cloud provider bucket. For more information about the available backup parameters, see Backup parameters.
GCP
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin" # This is the Neo4j Admin Service name.
  database: "neo4j,system"
  cloudProvider: "gcp"
  secretName: "gcpcreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true
AWS
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "aws"
  secretName: "awscreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true
Azure
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.10.0"
  jobSchedule: "* * * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  backoffLimit: 3

backup:
  bucketName: "my-bucket"
  databaseAdminServiceName: "standalone-admin"
  database: "neo4j,system"
  cloudProvider: "azure"
  secretName: "azurecreds"
  secretKeyName: "credentials"

consistencyCheck:
  enabled: true
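Note that the jobSchedule value in these examples ("* * * * *") runs a backup every minute, which is mainly useful for testing. For a production schedule, you would typically use a standard cron expression, for example a nightly run:

  jobSchedule: "0 2 * * *"  # every day at 02:00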
The /backups mount created by default is an emptyDir type volume. This means that the data stored in this volume is not persistent and will be lost when the pod is deleted. To use a persistent volume for backups, add the following section to the backup-values.yaml file:
tempVolume:
  persistentVolumeClaim:
    claimName: backup-pvc
You need to create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.
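A minimal sketch of such a PersistentVolumeClaim (the requested size and access mode are assumptions; size it to fit your backups and set a storageClassName if your cluster has no suitable default):

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: backup-pvc
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 100Gi

Apply it with kubectl apply -f backup-pvc.yaml before installing the chart.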
- Install the neo4j/neo4j-admin Helm chart using the backup-values.yaml file:
  helm install backup-name neo4j/neo4j-admin -f /path/to/your/backup-values.yaml
The neo4j/neo4j-admin Helm chart installs a CronJob, which launches a backup pod based on the job schedule. This pod performs a backup of one or more databases, runs a consistency check of the backup file(s), and uploads them to the cloud provider bucket.
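You can inspect the created CronJob and, if you do not want to wait for the next scheduled run, trigger a one-off job from it (manual-backup is an arbitrary job name):

  kubectl get cronjobs
  kubectl create job --from=cronjob/<backup-cronjob-name> manual-backup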
- Monitor the backup pod logs using kubectl logs pod/<neo4j-backup-pod-name> to check the progress of the backup.
- Check that the backup files and the consistency check reports have been uploaded to the cloud provider bucket.
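For example, with the bucket name my-bucket used in the examples above, you can list the uploaded objects with your cloud provider's CLI:

  gsutil ls gs://my-bucket                                                                       # GCP
  aws s3 ls s3://my-bucket/                                                                      # AWS
  az storage blob list --account-name <storage-account> --container-name my-bucket --output table   # Azure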
Backup parameters
To see which options are configurable on the Helm chart, use helm show values with the neo4j/neo4j-admin Helm chart.
From Neo4j 5.10, the neo4j/neo4j-admin Helm chart also supports assigning your Neo4j pods to specific nodes using nodeSelector
labels, and from Neo4j 5.11, using affinity/anti-affinity rules or tolerations.
For more information, see Assigning backup pods to specific nodes and the Kubernetes official documentation on Affinity and anti-affinity rules and Taints and Tolerations.
For example:
helm show values neo4j/neo4j-admin
## @param nameOverride String to partially override common.names.fullname
nameOverride: ""
## @param fullnameOverride String to fully override common.names.fullname
fullnameOverride: ""
# disableLookups will disable all the lookups done in the helm charts
# This should be set to true when using ArgoCD since ArgoCD uses helm template and the helm lookups will fail
# You can enable this when executing helm commands with --dry-run command
disableLookups: false
neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.11.0"
  podLabels: {}
  #  app: "demo"
  #  acac: "dcdddc"
  podAnnotations: {}
  #  ssdvvs: "svvvsvs"
  #  vfsvswef: "vcfvgb"
  # define the backup job schedule . default is * * * * *
  jobSchedule: ""
  # default is 3
  successfulJobsHistoryLimit:
  # default is 1
  failedJobsHistoryLimit:
  # default is 3
  backoffLimit:
  #add labels if required
  labels: {}

backup:
  # Ensure the bucket is already existing in the respective cloud provider
  # In case of azure the bucket is the container name in the storage account
  # bucket: azure-storage-container
  bucketName: ""

  #address details of the neo4j instance from which backup is to be done (serviceName or ip either one is required)
  #ex: standalone-admin.default.svc.cluster.local:6362
  # admin service name - standalone-admin
  # namespace - default
  # cluster domain - cluster.local
  # port - 6362
  #ex: 10.3.3.2:6362
  # admin service ip - 10.3.3.2
  # port - 6362
  databaseAdminServiceName: ""
  databaseAdminServiceIP: ""
  #default name is 'default'
  databaseNamespace: ""
  #default port is 6362
  databaseBackupPort: ""
  #default value is cluster.local
  databaseClusterDomain: ""

  #name of the database to backup ex: neo4j or neo4j,system (You can provide comma-separated database names)
  # In case of comma-separated databases, failure of any single database will lead to failure of the complete operation
  database: ""

  # cloudProvider can be either gcp, aws, or azure
  cloudProvider: ""

  # name of the kubernetes secret containing the respective cloud provider credentials
  # Ensure you have read,write access to the mentioned bucket
  # For AWS :
  # add the below in a file and create a secret via
  # 'kubectl create secret generic awscred --from-file=credentials=/demo/awscredentials'
  # [ default ]
  # region = us-east-1
  # aws_access_key_id = XXXXX
  # aws_secret_access_key = XXXX
  # For AZURE :
  # add the storage account name and key in below format in a file create a secret via
  # 'kubectl create secret generic azurecred --from-file=credentials=/demo/azurecredentials'
  # AZURE_STORAGE_ACCOUNT_NAME=XXXX
  # AZURE_STORAGE_ACCOUNT_KEY=XXXX
  # For GCP :
  # create the secret via the gcp service account json key file.
  # ex: 'kubectl create secret generic gcpcred --from-file=credentials=/demo/gcpcreds.json'
  secretName: ""
  # provide the keyname used in the above secret
  secretKeyName: ""

  #setting this to true will not delete the backup files generated at the /backup mount
  keepBackupFiles: true

  #Below are all neo4j-admin database backup flags / options
  #To know more about the flags read here : https://neo4j.com/docs/operations-manual/current/backup-restore/online-backup/
  pageCache: ""
  includeMetadata: "all"
  type: "AUTO"
  keepFailed: false
  parallelRecovery: false
  verbose: true
  heapSize: ""

#Below are all neo4j-admin database check flags / options
#To know more about the flags read here : https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin/consistency-checker/
consistencyCheck:
  enable: false
  checkIndexes: true
  checkGraph: true
  checkCounts: true
  checkPropertyOwners: true
  #The database name for which consistency check needs to be done.
  #Defaults to the backup.database values if left empty
  #The database name here should match with one of the database names present in backup.database. If not, the consistency check will be ignored
  database: ""
  maxOffHeapMemory: ""
  threads: ""
  verbose: true

# Set to name of an existing Service Account to use if desired
serviceAccountName: ""

# Volume to use as temporary storage for files before they are uploaded to cloud. For large databases local storage may not have sufficient space.
# In that case set an ephemeral or persistent volume with sufficient space here
# The chart defaults to an emptyDir, use this to overwrite default behavior
#tempVolume:
#  persistentVolumeClaim:
#    claimName: backup-pvc

# securityContext defines privilege and access control settings for a Pod. Making sure that we don't run Neo4j as root user.
securityContext:
  runAsNonRoot: true
  runAsUser: 7474
  runAsGroup: 7474
  fsGroup: 7474
  fsGroupChangePolicy: "Always"

# default ephemeral storage of backup container
resources:
  requests:
    ephemeralStorage: "4Gi"
  limits:
    ephemeralStorage: "5Gi"

# nodeSelector labels
# please ensure the respective labels are present on one of nodes or else helm charts will throw an error
nodeSelector: {}
#  label1: "true"
#  label2: "value1"

# set backup pod affinity
affinity: {}
#  podAffinity:
#    requiredDuringSchedulingIgnoredDuringExecution:
#      - labelSelector:
#          matchExpressions:
#            - key: security
#              operator: In
#              values:
#                - S1
#        topologyKey: topology.kubernetes.io/zone
#  podAntiAffinity:
#    preferredDuringSchedulingIgnoredDuringExecution:
#      - weight: 100
#        podAffinityTerm:
#          labelSelector:
#            matchExpressions:
#              - key: security
#                operator: In
#                values:
#                  - S2
#          topologyKey: topology.kubernetes.io/zone

#Add tolerations to the Neo4j pod
tolerations: []
#  - key: "key1"
#    operator: "Equal"
#    value: "value1"
#    effect: "NoSchedule"
#  - key: "key2"
#    operator: "Equal"
#    value: "value2"
#    effect: "NoSchedule"
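As an illustration of the nodeSelector and tolerations values shown above, the following backup-values.yaml fragment schedules the backup pod onto nodes carrying a hypothetical nodepool=backups label and tolerates a matching taint (both the label and the taint are assumptions, not chart defaults):

  nodeSelector:
    nodepool: "backups"

  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "backups"
      effect: "NoSchedule"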
Restore a single database
To restore a single offline database or a database backup, you first need to delete the database that you want to replace, unless you want to restore the backup as an additional database in your DBMS. Then, use the neo4j-admin database restore command to restore the database backup. Finally, use the Cypher command CREATE DATABASE name to create the restored database in the system database.
Delete the database that you want to replace
Before you restore the database backup, you have to delete the database that you want to replace with that backup using the Cypher command DROP DATABASE name against the system database.
If you want to restore the backup as an additional database in your DBMS, you can proceed to the next section.
For Neo4j cluster deployments, you run the Cypher command DROP DATABASE name only on one of the cluster servers.
- Connect to the Neo4j DBMS:
  kubectl exec -it <release-name>-0 -- bash
- Connect to the system database using cypher-shell:
  cypher-shell -u neo4j -p <password> -d system
- Drop the database you want to replace with the backup:
  DROP DATABASE neo4j;
- Exit the Cypher Shell command-line console:
  :exit;
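To confirm that the database has been dropped, you can list the databases while still connected to the system database (before exiting Cypher Shell):

  SHOW DATABASES;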
Restore the database backup
You use the neo4j-admin database restore command to restore the database backup, and then the Cypher command CREATE DATABASE name to create the restored database in the system database.
For information about the command syntax, options, and usage, see Restore a database backup.
For Neo4j cluster deployments, restore the database backup on each cluster server.
- Run the neo4j-admin database restore command to restore the database backup:
  neo4j-admin database restore neo4j --from-path=/backups/neo4j --expand-commands
- Connect to the system database using cypher-shell:
  cypher-shell -u neo4j -p <password> -d system
- Create the neo4j database.
  For Neo4j cluster deployments, you run the Cypher command CREATE DATABASE name only on one of the cluster servers.
  CREATE DATABASE neo4j;
- Open the browser at http://<external-ip>:7474/browser/ and check that all data has been successfully restored.
- Execute a Cypher command against the neo4j database, for example:
  MATCH (n) RETURN n
If you have backed up your database with the option --include-metadata, you can manually restore the users and roles metadata. For more information, see Restore a database backup → Example.
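A sketch of that manual metadata restore, assuming the restore command wrote the script to the default data/scripts/<database>/restore_metadata.cypher location under the Neo4j home directory (check the path reported by neo4j-admin database restore before running it):

  cypher-shell -u neo4j -p <password> -d system -f /var/lib/neo4j/data/scripts/neo4j/restore_metadata.cypher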