Back up and restore (online)

To perform backups, Neo4j uses the Admin Service, which is available only inside the Kubernetes cluster. Access to it should be restricted. For more information, see Accessing Neo4j.

Back up databases to a cloud provider (AWS, GCP, or Azure) bucket

You can back up a Neo4j database to a cloud provider (AWS, GCP, or Azure) bucket using the neo4j/neo4j-admin Helm chart. From Neo4j 5.10.0, the neo4j/neo4j-admin Helm chart also supports backing up multiple databases.

Prerequisites

Before you can back up a database and upload it to your bucket, verify that you have the following:

  • A cloud provider bucket (AWS, GCP, or Azure) with read and write access, so that the backup can be uploaded.

  • Credentials to access the cloud provider bucket, such as a service account JSON key file for GCP, a credentials file for AWS, or storage account credentials for Azure.

  • A Kubernetes cluster running on one of the cloud providers with the Neo4j Helm chart installed. For more information, see Quickstart: Deploy a standalone instance or Quickstart: Deploy a cluster.
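
The last prerequisite can be checked from a terminal before you continue. The following is a minimal sketch and not part of the Helm chart: check_backup_prereqs is a hypothetical helper, and the Service name standalone-admin and the namespace default are assumptions taken from the examples on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Neo4j Helm charts).
# Succeeds if kubectl and helm are available and the Admin Service exists.
check_backup_prereqs() {
  local service="${1:-standalone-admin}"   # assumed Admin Service name
  local namespace="${2:-default}"          # assumed namespace

  command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found" >&2; return 1; }
  command -v helm >/dev/null 2>&1 || { echo "helm not found" >&2; return 1; }

  # The Admin Service is the endpoint that the backup connects to.
  if ! kubectl get service "$service" --namespace "$namespace" >/dev/null 2>&1; then
    echo "Service ${service} not found in namespace ${namespace}" >&2
    return 1
  fi
  echo "ok"
}

# Example: check_backup_prereqs standalone-admin default
```

The helper only confirms that the Service object exists. It does not check bucket permissions or credentials, which still need to be verified with your cloud provider's tooling.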

Steps

To perform a backup of a Neo4j database to any cloud provider (AWS, GCP, and Azure) bucket, follow these steps:

  1. Update the repository to get the latest charts:

    helm repo update

  2. Create a Kubernetes secret with the credentials to access the cloud provider bucket using one of the following options:

    GCP

    Create the secret named gcpcreds using your GCP service account JSON key file. The JSON key file contains all the details of the service account that has access to the bucket.

    kubectl create secret generic gcpcreds --from-file=credentials=/path/to/gcpcreds.json

    AWS

    1. Create a credentials file in the following format:

      [ default ]
      region = us-east-1
      aws_access_key_id = <your-aws_access_key_id>
      aws_secret_access_key = <your-aws_secret_access_key>

    2. Create the secret named awscreds from the credentials file:

      kubectl create secret generic awscreds --from-file=credentials=/path/to/your/credentials

    Azure

    1. Create a credentials file in the following format:

      AZURE_STORAGE_ACCOUNT_NAME=<your-azure-storage-account-name>
      AZURE_STORAGE_ACCOUNT_KEY=<your-azure-storage-account-key>

    2. Create the secret named azurecreds from the credentials file:

      kubectl create secret generic azurecreds --from-file=credentials=/path/to/your/credentials

  3. Configure the backup parameters in the backup-values.yaml file using one of the following options:

    The following examples show the minimum configuration required to perform a backup to a cloud provider bucket. For more information about the available backup parameters, see Backup parameters.

    GCP

    neo4j:
      image: "neo4j/helm-charts-backup"
      imageTag: "5.10.0"
      jobSchedule: "* * * * *"
      successfulJobsHistoryLimit: 3
      failedJobsHistoryLimit: 1
      backoffLimit: 3

    backup:
      bucketName: "my-bucket"
      databaseAdminServiceName: "standalone-admin" # This is the Neo4j Admin Service name.
      database: "neo4j,system"
      cloudProvider: "gcp"
      secretName: "gcpcreds"
      secretKeyName: "credentials"

    consistencyCheck:
      enable: true

    AWS

    neo4j:
      image: "neo4j/helm-charts-backup"
      imageTag: "5.10.0"
      jobSchedule: "* * * * *"
      successfulJobsHistoryLimit: 3
      failedJobsHistoryLimit: 1
      backoffLimit: 3

    backup:
      bucketName: "my-bucket"
      databaseAdminServiceName: "standalone-admin"
      database: "neo4j,system"
      cloudProvider: "aws"
      secretName: "awscreds"
      secretKeyName: "credentials"

    consistencyCheck:
      enable: true

    Azure

    neo4j:
      image: "neo4j/helm-charts-backup"
      imageTag: "5.10.0"
      jobSchedule: "* * * * *"
      successfulJobsHistoryLimit: 3
      failedJobsHistoryLimit: 1
      backoffLimit: 3

    backup:
      bucketName: "my-bucket"
      databaseAdminServiceName: "standalone-admin"
      database: "neo4j,system"
      cloudProvider: "azure"
      secretName: "azurecreds"
      secretKeyName: "credentials"

    consistencyCheck:
      enable: true

    The /backups mount created by default is an emptyDir type volume. This means that the data stored in this volume is not persistent and will be lost when the pod is deleted. To use a persistent volume for backups add the following section to the backup-values.yaml file:

    tempVolume:
      persistentVolumeClaim:
        claimName: backup-pvc

    You need to create the persistent volume and persistent volume claim before installing the neo4j-admin Helm chart. For more information, see Volume mounts and persistent volumes.

  4. Install the neo4j/neo4j-admin Helm chart using the backup-values.yaml file:

    helm install backup-name neo4j/neo4j-admin -f /path/to/your/backup-values.yaml

    The neo4j/neo4j-admin Helm chart installs a cronjob that launches a pod based on the job schedule. This pod performs a backup of one or multiple databases, a consistency check of the backup file(s), and uploads them to the cloud provider bucket.

  5. Monitor the backup pod logs using kubectl logs pod/<neo4j-backup-pod-name> to check the progress of the backup.

  6. Check that the backup files and the consistency check reports have been uploaded to the cloud provider bucket.
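
Because the chart installs a CronJob, you may not want to wait for the next scheduled run when testing a new configuration. As a sketch, a one-off Job can be started from the CronJob with kubectl create job --from. The helper name run_backup_now and the generated Job name are hypothetical, and you need to look up the actual CronJob name yourself, for example with kubectl get cronjobs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a one-off Job from an existing CronJob
# and print the name of the new Job.
run_backup_now() {
  local cronjob="$1"               # CronJob name, as listed by `kubectl get cronjobs`
  local namespace="${2:-default}"  # assumed namespace
  local job
  job="${cronjob}-manual-$(date +%s)"   # illustrative Job name with a timestamp suffix

  kubectl create job "$job" --from="cronjob/${cronjob}" --namespace "$namespace" >/dev/null || return 1
  printf '%s\n' "$job"
}

# Example usage:
#   job="$(run_backup_now <cronjob-name>)"
#   kubectl logs --follow "job/${job}"
```

The Job created this way uses the same pod template as the scheduled runs, so its logs should show the same backup, consistency check, and upload stages described in step 4.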

Backup parameters

To see which options are configurable on the Helm chart, run helm show values against the neo4j/neo4j-admin Helm chart.

From Neo4j 5.10, the neo4j/neo4j-admin Helm chart also supports assigning the backup pods to specific nodes using nodeSelector labels, and from Neo4j 5.11, using affinity/anti-affinity rules or tolerations. For more information, see Assigning backup pods to specific nodes and the Kubernetes official documentation on Affinity and anti-affinity rules and Taints and Tolerations.

For example:

helm show values neo4j/neo4j-admin

## @param nameOverride String to partially override common.names.fullname
nameOverride: ""
## @param fullnameOverride String to fully override common.names.fullname
fullnameOverride: ""
# disableLookups will disable all the lookups done in the helm charts
# This should be set to true when using ArgoCD since ArgoCD uses helm template and the helm lookups will fail
# You can enable this when executing helm commands with --dry-run command
disableLookups: false

neo4j:
  image: "neo4j/helm-charts-backup"
  imageTag: "5.11.0"
  podLabels: {}
#    app: "demo"
#    acac: "dcdddc"
  podAnnotations: {}
#    ssdvvs: "svvvsvs"
#    vfsvswef: "vcfvgb"
  # define the backup job schedule . default is * * * * *
  jobSchedule: ""
  # default is 3
  successfulJobsHistoryLimit:
  # default is 1
  failedJobsHistoryLimit:
  # default is 3
  backoffLimit:
  #add labels if required
  labels: {}

backup:
  # Ensure the bucket is already existing in the respective cloud provider
  # In case of azure the bucket is the container name in the storage account
  # bucket: azure-storage-container
  bucketName: ""

  #address details of the neo4j instance from which backup is to be done (serviceName or ip either one is required)

  #ex: standalone-admin.default.svc.cluster.local:6362
  # admin service name -  standalone-admin
  # namespace - default
  # cluster domain - cluster.local
  # port - 6362

  #ex: 10.3.3.2:6362
  # admin service ip - 10.3.3.2
  # port - 6362

  databaseAdminServiceName: ""
  databaseAdminServiceIP: ""
  #default name is 'default'
  databaseNamespace: ""
  #default port is 6362
  databaseBackupPort: ""
  #default value is cluster.local
  databaseClusterDomain: ""

  #name of the database to backup ex: neo4j or neo4j,system (You can provide command separated database names)
  # In case of comma separated databases failure of any single database will lead to failure of complete operation
  database: ""
  # cloudProvider can be either gcp, aws, or azure
  cloudProvider: ""

  # name of the kubernetes secret containing the respective cloud provider credentials
  # Ensure you have read,write access to the mentioned bucket
  # For AWS :
  # add the below in a file and create a secret via
  # 'kubectl create secret generic awscred --from-file=credentials=/demo/awscredentials'

  #  [ default ]
  #  region = us-east-1
  #  aws_access_key_id = XXXXX
  #  aws_secret_access_key = XXXX

  # For AZURE :
  # add the storage account name and key in below format in a file create a secret via
  # 'kubectl create secret generic azurecred --from-file=credentials=/demo/azurecredentials'

  #  AZURE_STORAGE_ACCOUNT_NAME=XXXX
  #  AZURE_STORAGE_ACCOUNT_KEY=XXXX

  # For GCP :
  # create the secret via the gcp service account json key file.
  # ex: 'kubectl create secret generic gcpcred --from-file=credentials=/demo/gcpcreds.json'
  secretName: ""
  # provide the keyname used in the above secret
  secretKeyName: ""

  #setting this to true will not delete the backup files generated at the /backup mount
  keepBackupFiles: true

  #Below are all neo4j-admin database backup flags / options
  #To know more about the flags read here : https://neo4j.com/docs/operations-manual/current/backup-restore/online-backup/
  pageCache: ""
  includeMetadata: "all"
  type: "AUTO"
  keepFailed: false
  parallelRecovery: false
  verbose: true
  heapSize: ""

#Below are all neo4j-admin database check flags / options
#To know more about the flags read here : https://neo4j.com/docs/operations-manual/current/tools/neo4j-admin/consistency-checker/
consistencyCheck:
  enable: false
  checkIndexes: true
  checkGraph: true
  checkCounts: true
  checkPropertyOwners: true
  #The database name for which consistency check needs to be done.
  #Defaults to the backup.database values if left empty
  #The database name here should match with one of the database names present in backup.database. If not , the consistency check will be ignored
  database: ""
  maxOffHeapMemory: ""
  threads: ""
  verbose: true

# Set to name of an existing Service Account to use if desired
serviceAccountName: ""

# Volume to use as temporary storage for files before they are uploaded to cloud. For large databases local storage may not have sufficient space.
# In that case set an ephemeral or persistent volume with sufficient space here
# The chart defaults to an emptyDir, use this to overwrite default behavior
#tempVolume:
#  persistentVolumeClaim:
#    claimName: backup-pvc

# securityContext defines privilege and access control settings for a Pod. Making sure that we don't run Neo4j as root user.
securityContext:
  runAsNonRoot: true
  runAsUser: 7474
  runAsGroup: 7474
  fsGroup: 7474
  fsGroupChangePolicy: "Always"

# default ephemeral storage of backup container
resources:
  requests:
    ephemeralStorage: "4Gi"
  limits:
    ephemeralStorage: "5Gi"

# nodeSelector labels
# please ensure the respective labels are present on one of nodes or else helm charts will throw an error
nodeSelector: {}
#  label1: "true"
#  label2: "value1"

# set backup pod affinity
affinity: {}
#  podAffinity:
#    requiredDuringSchedulingIgnoredDuringExecution:
#      - labelSelector:
#          matchExpressions:
#            - key: security
#              operator: In
#              values:
#                - S1
#        topologyKey: topology.kubernetes.io/zone
#  podAntiAffinity:
#    preferredDuringSchedulingIgnoredDuringExecution:
#      - weight: 100
#        podAffinityTerm:
#          labelSelector:
#            matchExpressions:
#              - key: security
#                operator: In
#                values:
#                  - S2
#          topologyKey: topology.kubernetes.io/zone

#Add tolerations to the Neo4j pod
tolerations: []
#  - key: "key1"
#    operator: "Equal"
#    value: "value1"
#    effect: "NoSchedule"
#  - key: "key2"
#    operator: "Equal"
#    value: "value2"
#    effect: "NoSchedule"
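
The comments under backup in the output above describe the Admin Service address as a combination of service name, namespace, cluster domain, and port. The following sketch only illustrates how those four values map to an address of the form shown in the comments, filling in the documented defaults (default, cluster.local, 6362) when a part is omitted. admin_service_address is a hypothetical helper and is not something the chart provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose <service>.<namespace>.svc.<cluster-domain>:<port>.
admin_service_address() {
  local service="$1"
  local namespace="${2:-default}"       # default namespace per the chart comments
  local domain="${3:-cluster.local}"    # default cluster domain per the chart comments
  local port="${4:-6362}"               # default backup port per the chart comments
  printf '%s.%s.svc.%s:%s\n' "$service" "$namespace" "$domain" "$port"
}

admin_service_address standalone-admin
# prints standalone-admin.default.svc.cluster.local:6362
```

This can be useful when checking connectivity from another pod, since it gives the address that corresponds to a given set of databaseAdminServiceName, databaseNamespace, databaseClusterDomain, and databaseBackupPort values.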

Restore a single database

To restore a single offline database or a database backup, you first need to delete the database that you want to replace, unless you want to restore the backup as an additional database in your DBMS. Then, use the neo4j-admin database restore command to restore the database backup. Finally, use the Cypher command CREATE DATABASE name to create the restored database in the system database.

Delete the database that you want to replace

Before you restore the database backup, you have to delete the database that you want to replace with that backup using the Cypher command DROP DATABASE name against the system database. If you want to restore the backup as an additional database in your DBMS, then you can proceed to the next section.

For Neo4j cluster deployments, run the Cypher command DROP DATABASE name on only one of the cluster servers. The command is automatically routed from there to the other cluster members.

  1. Connect to the Neo4j DBMS:

    kubectl exec -it <release-name>-0 -- bash

  2. Connect to the system database using cypher-shell:

    cypher-shell -u neo4j -p <password> -d system

  3. Drop the database you want to replace with the backup:

    DROP DATABASE neo4j;

  4. Exit the Cypher Shell command-line console:

    :exit;
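
The same steps can be run non-interactively, which is convenient in scripts. This sketch assumes that the password is available in a NEO4J_PASSWORD environment variable and that cypher-shell is on the PATH inside the pod. drop_database is a hypothetical helper name. IF EXISTS lets the command succeed even if the database has already been dropped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop a database by running cypher-shell inside a Neo4j pod.
# Assumes NEO4J_PASSWORD is set in the calling shell.
drop_database() {
  local pod="$1"        # for example <release-name>-0
  local database="$2"   # database to replace with the backup
  kubectl exec "$pod" -- cypher-shell -u neo4j -p "$NEO4J_PASSWORD" -d system \
    "DROP DATABASE ${database} IF EXISTS;"
}

# Example: drop_database <release-name>-0 neo4j
```

Passing the password with -p makes it visible in the process list of the pod, so treat this as a convenience for test environments rather than a recommended practice.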

Restore the database backup

You use the neo4j-admin database restore command to restore the database backup, and then the Cypher command CREATE DATABASE name to create the restored database in the system database. For information about the command syntax, options, and usage, see Restore a database backup.

For Neo4j cluster deployments, restore the database backup on each cluster server.

  1. Run the neo4j-admin database restore command to restore the database backup:

    neo4j-admin database restore neo4j --from-path=/backups/neo4j --expand-commands

  2. Connect to the system database using cypher-shell:

    cypher-shell -u neo4j -p <password> -d system

  3. Create the neo4j database.

    For Neo4j cluster deployments, you run the Cypher command CREATE DATABASE name only on one of the cluster servers.

    CREATE DATABASE neo4j;

  4. Open the browser at http://<external-ip>:7474/browser/ and check that all data has been successfully restored.

  5. Execute a Cypher command against the neo4j database, for example:

    MATCH (n) RETURN n

    If you have backed up your database with the option --include-metadata, you can manually restore the users and roles metadata. For more information, see Restore a database backup → Example.
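
The checks in steps 4 and 5 can also be done from the command line. The following sketch assumes a NEO4J_PASSWORD environment variable in the calling shell. verify_restore is a hypothetical helper name. It first shows the status of the restored database and then counts its nodes, which is usually lighter than returning every node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a restored database is listed and holds data.
# Assumes NEO4J_PASSWORD is set in the calling shell.
verify_restore() {
  local pod="$1"        # for example <release-name>-0
  local database="$2"   # name of the restored database

  # Show the status of the restored database as recorded in the system database.
  kubectl exec "$pod" -- cypher-shell -u neo4j -p "$NEO4J_PASSWORD" -d system \
    "SHOW DATABASE ${database} YIELD name, currentStatus;" || return 1

  # Count the nodes instead of returning all of them.
  kubectl exec "$pod" -- cypher-shell -u neo4j -p "$NEO4J_PASSWORD" -d "$database" \
    "MATCH (n) RETURN count(n) AS nodes;"
}

# Example: verify_restore <release-name>-0 neo4j
```

Comparing the node count with a count taken before the backup gives a quick, though not exhaustive, indication that the restore is complete.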

To restore the system database, follow the steps described in Dump and load databases (offline).