Volume mounts and persistent volumes with the Neo4j Helm chart

Neo4j Helm chart uses volume mounts and persistent volumes to manage the storage of data and other Neo4j files.

Volume mounts

A volume mount is part of a Kubernetes Pod spec that describes how and where a volume is mounted within a container.

The Neo4j Helm chart creates the following volume mounts:

  • backups mounted at /backups

  • data mounted at /data

  • import mounted at /import

  • licenses mounted at /licenses

  • logs mounted at /logs

  • metrics mounted at /metrics (Neo4j Community Edition does not generate metrics.)

It is also possible to specify a plugins volume mount (mounted at /plugins), but this is not created by the default Helm chart. For more information, see Add plugins using a plugins volume.

Persistent volumes

PersistentVolume (PV) is a storage resource in the Kubernetes cluster that has a lifecycle independent of any individual pod that uses the PV.
PersistentVolumeClaim (PVC) is a request for a storage resource by a user. PVCs consume PV resources. For more information about what PVs are and how they work, see the Kubernetes official documentation.

The type of PV used and its configuration can have a significant effect on the performance of Neo4j. Some PV types are not suitable for use with Neo4j at all.

The volume type used for the data volume mount is particularly important. Neo4j supports the following PV types for the data volume mount:

  • persistentVolumeClaim

  • hostPath when using Docker Desktop [1].

Neo4j data volume mount does not support azureFile and nfs.

awsElasticBlockStore, azureDisk, gcePersistentDisk are now deprecated volume types in Kubernetes and their use is no longer supported by the Neo4j Helm chart. If you currently use one of these volume types, consult your Kubernetes vendor’s documentation on migrating to Container Storage Interface (CSI) driver-based storage.

For volume mounts other than the data volume mount, generally, all PV types are presumed to work.

hostPath, local, and emptyDir types are expected to perform well, provided suitable underlying storage, such as SSD, is used. However, these volume types have operational limitations and are not recommended.

It is also not recommended to use an HDD or cloud storage, such as AWS S3 mounted as a drive.

Mapping volume mounts to persistent volumes

By default, the Neo4j Helm chart uses a single PV, named data, to support volume mounts.

The volume used for each volume mount can be changed by modifying the volumes.<volume name> object in the Helm chart values.

The Neo4j Helm chart volumes object supports different modes:

Description

Dynamic volumes are recommended for most production workloads due to ease of management. The volume mount is backed by a PV that Kubernetes dynamically provisions using a dedicated StorageClass. The StorageClass is specified in the storageClassName field.

Example

The data volume uses a dedicated storage class:

storage-class-values.yaml
neo4j:
  name: standalone-with-storage-class
volumes:
  data:
    mode: dynamic
    dynamic:
      storageClassName: "neo4j-data"
      requests:
        storage: 10Gi

mode: share

Description

The volume mount shares the underlying volume from one of the other volume objects.

Example

The logs volume mount uses the data volume (this is the default behavior).

volumes:
  logs:
    mode: "share"
    share:
      name: "data"

mode: defaultStorageClass

Description

The volume mount is backed by a PV that Kubernetes dynamically provisions using the default StorageClass.

Example

A dynamically provisioned data volume with a size of 10Gi.

volumes:
  data:
    mode: "defaultStorageClass"
    defaultStorageClass:
      requests:
        storage: 10Gi

For the data volume, if requests.storage is not set, defaultStorageClass defaults to a 10Gi volume. For all other volumes, defaultStorageClass.requests.storage must be set explicitly when using defaultStorageClass mode.

mode: volume

Description

A complete Kubernetes volume object can be specified for the volume mount. Generally, volumes specified in this way have to be manually provisioned.

volume can be any valid Kubernetes volume type. This mode is typically used to mount a pre-existing Persistent Volume Claim (PVC).

For details on how to specify volume objects, see the Kubernetes documentation.

Set file permissions on mounted volumes

The Neo4j Helm chart supports an additional field not present in normal Kubernetes volume objects: setOwnerAndGroupWritableFilePermissions: true|false. If set to true, an initContainer will be run to modify the file permissions of the mounted volume, so that the contents can be written and read by the Neo4j process. This is to help with certain volume implementations that are not aware of the SecurityContext set on pods using them.

Example - reference an existing PersistentVolume

The backups volume mount is backed by the specified PVC. When this method is used, the persistentVolumeClaim object must already exist.

volumes:
  backups:
    mode: volume
    volume:
      persistentVolumeClaim:
        claimName: my-neo4j-pvc

mode: selector

Description

The volume to use is chosen from the existing PVs based on the provided selector object and a PVC that is dynamically generated.

If no matching PVs exist, the Neo4j pod will be unable to start. To match, a PV must have the specified StorageClass, match the label selectorTemplate, and have sufficient storage capacity to meet the requested storage amount.

Example

The data volume is chosen from the available volumes with the neo4j storage class and the label developer: alice.

volumes:
  import:
    mode: selector
    selector:
      storageClassName: "neo4j"
      requests:
        storage: 128Gi
      selectorTemplate:
        matchLabels:
          developer: "alice"

For the data volume, if requests.storage is not set, selector defaults to a 100Gi volume. For all other volumes, selector.requests.storage must be set explicitly when using selector mode.

mode: volumeClaimTemplate

Description

A complete Kubernetes volumeClaimTemplate object is specified for the volume mount. Volumes specified in this way are dynamically provisioned.

Example - provision Neo4j storage using a volume claim template

The data volume uses a dynamically provisioned PVC from the default storage class.

volumes:
  data:
    mode: volumeClaimTemplate
    volumeClaimTemplate:
      storageClassName: "default"
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

In all cases, do not forget to set the mode field when customizing the volumes object. If not set, the default mode is used, regardless of the other properties set on the volume object.

Provision persistent volumes with Neo4j Helm chart

Provision persistent volumes dynamically

With the Neo4j Helm chart, you can provision a PV dynamically using the default or a custom StorageClass. To see a list of available storage classes in your Kubernetes cluster, run the following command:

kubectl get storageclass

Provision a PV using a dedicated StorageClass Recommended

For production workloads, it is recommended to create a dedicated storage class for Neo4j, which uses the Retain reclaim policy. This is to avoid data loss when disks are deleted after removing the persistent volume resource.

Example: Deploy Neo4j using a dedicated StorageClass

The following example shows how to deploy a Neo4j server with a dynamically provisioned PV that uses a dedicated storageClass.

  1. Create a dedicated storage class that uses the Retain reclaim policy:

    1. Create a storage class in GKE that uses the Retain reclaim policy and pd-ssd high-performance SSD disks:

      cat <<EOF | kubectl apply -f -
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: neo4j-data
      provisioner: pd.csi.storage.gke.io
      parameters:
        type: pd-ssd
      reclaimPolicy: Retain
      volumeBindingMode: WaitForFirstConsumer
      allowVolumeExpansion: true
      EOF
    2. Check the storage class is created:

      kubectl get storageclass neo4j-data
      NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
      neo4j-data           pd.csi.storage.gke.io   Retain          WaitForFirstConsumer   true                   7s
    1. Create a storage class in EKS that uses the Retain reclaim policy and gp3 high-performance SSD disks:

      The EBS CSI Driver addon is required to provision EBS disks in EKS clusters. See the AWS documentation for instructions on installing the driver.

      cat <<EOF | kubectl apply -f -
      kind: StorageClass
      apiVersion: storage.k8s.io/v1
      metadata:
        name: neo4j-data
      provisioner: ebs.csi.aws.com
      parameters:
        type: gp3
      reclaimPolicy: Retain
      allowVolumeExpansion: true
      volumeBindingMode: WaitForFirstConsumer
      EOF
    2. Check the storage class is created:

      kubectl get storageclass neo4j-data
      NAME            PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
      neo4j-data      ebs.csi.aws.com         Retain          WaitForFirstConsumer   true                   2m41s
    1. Create a storage class in GKE that uses the Retain reclaim policy and pd-ssd high-performance SSD disks:

      cat <<EOF | kubectl apply -f -
      apiVersion: storage.k8s.io/v1
      kind: StorageClass
      metadata:
        name: neo4j-data
      provisioner: disk.csi.azure.com
      parameters:
        skuName: Premium_LRS
      reclaimPolicy: Retain
      volumeBindingMode: WaitForFirstConsumer
      allowVolumeExpansion: true
      EOF
    2. Check the storage class is created:

      kubectl get storageclass neo4j-data
      NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
      neo4j-data           disk.csi.azure.com      Retain          WaitForFirstConsumer   true                   7s
  2. Install a Neo4j server with a data volume that uses the new storage class:

    1. Create a file storage-class-values.yaml that configures the data volume to use the new storage class:

      storage-class-values.yaml
      neo4j:
        name: standalone-with-storage-class
      volumes:
        data:
          mode: dynamic
          dynamic:
            storageClassName: "neo4j-data"
            requests:
              storage: 10Gi
    2. Install a single Neo4j server:

      helm install standalone-with-storage-class neo4j -f storage-class-values.yaml
    3. When the installation completes, verify that a PVC has been created:

      kubectl get pvc
      NAME                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      data-standalone-with-storage-class-0   Bound    pvc-5d400f06-f99f-43ac-bf37-6079d692eaac   10Gi       RWO            neo4j-data     23m
  3. Clean up the resources:

    The storage class uses the Retain retention policy, meaning the disk will not be deleted after removing the PVC. To delete the disk, patch the PVC to use the Delete retention policy and delete the PVC:

    export pv_name=$(kubectl get pvc data-standalone-with-storage-class-0 -o jsonpath='{.spec.volumeName}')
    
    kubectl patch pv $pv_name -p '{"spec":{"persistentVolumeReclaimPolicy": "Delete"}}'
    
    kubectl delete pvc data-standalone-with-storage-class-0

    For the data volume, if requests.storage is not set, dynamic defaults to a 100Gi volume. For all other volumes, dynamic.requests.storage must be set explicitly when using dynamic mode.

Provision a PV using defaultStorageClass

Using the default StorageClass of the running Kubernetes cluster is the quickest way to spin up and run Neo4j for simple tests, handling small amounts of data. However, it is not recommended for large amounts of data, as it may lead to performance issues.

Example: Deploy Neo4j using defaultStorageClass

The following example shows how to deploy a Neo4j server with a dynamically provisioned PV that uses the default StorageClass.

  1. Create a file default-storage-class-values.yaml that configures the data volume to use the default StorageClass and a storage size 100Gi:

    storage-class-values.yaml
    volumes:
      data:
        mode: "defaultStorageClass"
        defaultStorageClass:
          requests:
            storage: 100Gi
  2. Install a single Neo4j server:

    helm install standalone-with-default-storage-class neo4j -f default-storage-class-values.yaml

Provision persistent volumes manually

Optionally, the Helm chart can use manually created disks for Neo4j storage. This installation option has more steps than using dynamic volumes, but it does provide more control over how disks are provisioned.

The instructions for the manual provisioning of PVs vary according to the type of PV being used and the underlying infrastructure. In general, there are two steps:

  1. Create the disk/volume to be used for storage in the underlying infrastructure. For example:

    • If using a csi volume — create the Persistent Disk using the cloud provider CLI or console.

    • If using a hostPath volume — on the host node, create the path (directory).

  2. Create a PV in Kubernetes that references the underlying resource created in step 1.

    1. Ensure that the created PV’s app label matches the name of the Neo4j Helm release.

    2. Ensure that the created PV’s capacity.storage matches the storage available on the underlying infrastructure.

If no suitable PV or PVC exists, the Neo4j pod will not start.

The Neo4j StatefulSet can select a persistent volume to use based on its labels. A Neo4j Helm release uses only manually provisioned PVs that have:

  • storageClassName that uses the provisioner kubernetes.io/no-provisioner.

  • An app label — set in their metadata, which matches the name of the neo4j.name value of the Helm installation.

  • Sufficient storage capacity — the PV capacity must be greater than or equal to the value of volumes.data.selector.requests.storage set for the Neo4j Helm release (default is 100Gi).

The neo4j/neo4j-persistent-volume Helm chart provides a convenient way to provision the persistent volume.

Example: Deploy Neo4j using a selector volume

The following example shows how to deploy Neo4j using a selector volume.

  1. Create a file persistent-volume-selector.yaml that configures the data volume to use a selector:

    storage-class-values.yaml
    neo4j:
      name: volume-selector
    volumes:
      data:
        mode: selector
        selector:
          storageClassName: "manual"
          accessModes:
            - ReadWriteOnce
          requests:
            storage: 10Gi
  2. Export environment variables to be used by the commands:

    export RELEASE_NAME=volume-selector
    export GCP_ZONE="$(gcloud config get compute/zone)"
    export GCP_PROJECT="$(gcloud config get project)"
  3. Create the disks to be used by the persistent volume:

    gcloud compute disks create --size 10Gi --type pd-ssd "${RELEASE_NAME}"
  4. Use the neo4j/neo4j-persistent-volume chart to configure the persistent volume. This command will create a persistent volume and a manual storage class that uses the kubernetes.io/no-provisioner provisioner.

    helm install "${RELEASE_NAME}"-disk neo4j/neo4j-persistent-volume \
           --set neo4j.name="${RELEASE_NAME}" \
           --set data.driver=pd.csi.storage.gke.io \
           --set data.storageClassName="manual" \
           --set data.reclaimPolicy="Delete" \
           --set data.createPvc=false \
           --set data.createStorageClass=true \
           --set data.volumeHandle="projects/${GCP_PROJECT}/zones/${GCP_ZONE}/disks/${RELEASE_NAME}" \
           --set data.capacity.storage=10Gi
  5. Now install Neo4j using the persistent-volume-selector.yaml created earlier:

    helm install "${RELEASE_NAME}" neo4j/neo4j -f persistent-volume-selector.yaml
  6. Clean up the helm installation and disks created for the example:

    helm uninstall ${RELEASE_NAME} ${RELEASE_NAME}-disk
    kubectl delete pvc data-${RELEASE_NAME}-0
    gcloud compute disks delete ${RELEASE_NAME} --quiet

The EBS CSI Driver addon is required to provision EBS disks in EKS clusters. You can run the command kubectl get daemonset ebs-csi-node -n kube-system to check if it is installed See the AWS Documentation for instructions on installing the driver.

  1. Create a file persistent-volume-selector.yaml that configures the data volume to use a selector:

    storage-class-values.yaml
    neo4j:
      name: volume-selector
    volumes:
      data:
        mode: selector
        selector:
          storageClassName: "manual"
          accessModes:
            - ReadWriteOnce
          requests:
            storage: 10Gi
  2. Export environment variables to be used by the commands:

    readonly RELEASE_NAME=volume-selector
    readonly AWS_ZONE={availability zone of EKS cluster}
  3. Create the disks to be used by the persistent volume:

    export volumeId=$(aws ec2 create-volume \
                        --availability-zone="${AWS_ZONE}" \
                        --size=10 \
                        --volume-type=gp3 \
                        --tag-specifications 'ResourceType=volume,Tags=[{Key=volume,Value='"${RELEASE_NAME}"'}]' \
                        --no-cli-pager \
                        --output text \
                        --query VolumeId)
  4. Use the neo4j/neo4j-persistent-volume chart to configure the persistent volume. This command will create a persistent volume and a manual storage class that uses the kubernetes.io/no-provisioner provisioner.

    helm install "${RELEASE_NAME}"-disk neo4j-persistent-volume \
        --set neo4j.name="${RELEASE_NAME}" \
        --set data.driver=ebs.csi.aws.com \
        --set data.reclaimPolicy="Delete" \
        --set data.createPvc=false \
        --set data.createStorageClass=true \
        --set data.volumeHandle="${volumeId}" \
        --set data.capacity.storage=10Gi
  5. Now install Neo4j using the persistent-volume-selector.yaml created earlier:

    helm install "${RELEASE_NAME}" neo4j/neo4j -f persistent-volume-selector.yaml
  6. Clean up the helm installation and disks created for the example:

    helm uninstall ${RELEASE_NAME} ${RELEASE_NAME}-disk
        kubectl delete pvc data-${RELEASE_NAME}-0
        aws ec2 delete-volume --volume-id ${volumeId}
  1. Create a file persistent-volume-selector.yaml that configures the data volume to use a selector:

    storage-class-values.yaml
    neo4j:
      name: volume-selector
    volumes:
      data:
        mode: selector
        selector:
          storageClassName: "manual"
          accessModes:
            - ReadWriteOnce
          requests:
            storage: 10Gi
  2. Export environment variables to be used by the commands:

    readonly AKS_CLUSTER_NAME={AKS Cluster name}
    readonly AZ_RESOURCE_GROUP={Resource group of cluster}
    readonly AZ_LOCATION={Location of cluster}
  3. Create the disks to be used by the persistent volume:

    export node_resource_group=$(az aks show --resource-group "${AZ_RESOURCE_GROUP}" --name "${AKS_CLUSTER_NAME}" --query nodeResourceGroup -o tsv)
    export disk_id=$(az disk create --name "${RELEASE_NAME}" --size-gb "10" --max-shares 1 --resource-group "${node_resource_group}" --location ${AZ_LOCATION} --output tsv --query id)
  4. Use the neo4j/neo4j-persistent-volume chart to configure the persistent volume. This command will create a persistent volume and a manual storage class that uses the kubernetes.io/no-provisioner provisioner.

    helm install "${RELEASE_NAME}"-disk neo4j-persistent-volume \
        --set neo4j.name="${RELEASE_NAME}" \
        --set data.driver=disk.csi.azure.com \
        --set data.storageClassName="manual" \
        --set data.reclaimPolicy="Delete" \
        --set data.createPvc=false \
        --set data.createStorageClass=true \
        --set data.volumeHandle="${disk_id}" \
        --set data.capacity.storage=10Gi
  5. Now install Neo4j using the persistent-volume-selector.yaml created earlier:

    helm install "${RELEASE_NAME}" neo4j/neo4j -f persistent-volume-selector.yaml
  6. Clean up the helm installation and disks created for the example:

    helm uninstall ${RELEASE_NAME} ${RELEASE_NAME}-disk
    kubectl delete pvc data-${RELEASE_NAME}-0
    az disk delete --name ${RELEASE_NAME} -y

Provision a PVC for Neo4j Storage

An alternative method for manual provisioning is to use a manually provisioned PVC. This is supported by the Neo4j Helm chart using the volume mode.

The neo4j/neo4j-persistent-volume Helm chart can be used to create a PV and PVC for a manually provisioned disk. A full example can be found in the Neo4j GitHub repository For example, to use a pre-existing PVC called my-neo4j-pvc set these values:

volumes:
  data:
    mode: "volume"
    volume:
      persistentVolumeClaim:
        claimName: my-neo4j-pvc

Reuse a persistent volume

After uninstalling the Neo4j Helm chart, both the PVC and the PV remain and can be reused by a new install of the Helm chart. If you delete the PVC, the PV moves into a Released status and will not be reusable.

To be able to reuse the PV by a new install of the Neo4j Helm chart, remove its connection to the previous PVC:

  1. Edit the PV by running the following command:

    kubectl edit pv <pv-name>
  2. Remove the section spec.claimRef.
    The PV goes back to the Available status and can be reused by a new install of the Neo4j Helm chart.

The performance of Neo4j is very dependent on the latency, IOPS capacity, and throughput of the storage it is using. For the best performance of Neo4j, use the best available disks (e.g., SSD) and set IOPS throttling/quotas to high values. For some cloud providers, IOPS throttling is proportional to the size of the volume. In these cases, the best performance is achieved by setting the size of the volume based on the desired IOPS rather than the amount required for data storage.


1. Not recommended because of inconsistencies in Docker Desktop handling of hostPath volumes.