Volume mounts and persistent volumes with the Neo4j Helm charts

This section describes the volume mounts created by the Neo4j Helm chart and the PersistentVolume types that can be used.

1. Volume mounts

A volume mount is part of a Kubernetes Pod spec that describes how and where a volume is mounted within a container.

The Neo4j Helm chart creates the following volume mounts:

  • backups mounted at /backups

  • data mounted at /data

  • import mounted at /import

  • licenses mounted at /licenses

  • logs mounted at /logs

  • metrics mounted at /metrics (Neo4j Community Edition does not generate metrics)

It is also possible to specify a plugins volume mount (mounted at /plugins), but this is not created by the default Helm chart.
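
If a plugins volume is required, it can be configured through the same volumes object as the other mounts. A minimal sketch, assuming the plugins volume is declared like any other volume and shares the underlying data volume (the share mode is described below):

volumes:
  plugins:
    mode: "share"
    share:
      name: "data"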

2. Persistent volumes

PersistentVolume (PV) is a storage resource in the Kubernetes cluster that has a lifecycle independent of any individual Pod that uses the PV.
PersistentVolumeClaim (PVC) is a request for a storage resource by a user. PVCs consume PV resources. For more information about what PVs are and how they work, see the Kubernetes official documentation.
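
PVs and PVCs in a cluster can be inspected with standard kubectl commands, for example:

# PersistentVolumes are cluster-scoped; PersistentVolumeClaims are namespaced
kubectl get pv
kubectl get pvc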

The type of PV used and its configuration can have a significant effect on the performance of Neo4j. Some PV types are not suitable for use with Neo4j at all.

The volume type used for the data volume mount is particularly important. Neo4j supports the following PV types for the data volume mount:

  • persistentVolumeClaim

  • gcePersistentDisk

  • azureDisk

  • awsElasticBlockStore

  • hostPath [1]

  • local

  • emptyDir

Neo4j data volume mounts do not support:

  • azureFile

  • nfs

For volume mounts other than the data volume mount, all PV types are generally presumed to work.

hostPath, local, and emptyDir types are expected to perform well, provided suitable underlying storage, such as SSD, is used. However, these volume types have operational limitations and are not recommended.

It is also not recommended to use HDDs or cloud object storage, such as AWS S3, mounted as a drive.

3. Mapping volume mounts to persistent volumes

By default, the Neo4j Helm chart uses a single PV, named data, to back all of the chart’s volume mounts.

The volume used for each volume mount can be changed by modifying the volumes.<volume name> object in the Helm Chart values.

The volumes object in the Neo4j Helm chart supports the following modes:

3.1. mode: share

Description

The volume mount shares the underlying volume from one of the other volume objects.

Example

The logs volume mount uses the data volume (this is the default behaviour).

volumes:
  logs:
    mode: "share"
    share:
      name: "data"

3.2. mode: defaultStorageClass

Description

The volume mount is backed by a PV that Kubernetes dynamically provisions using the default StorageClass.

Example

A dynamically provisioned data volume with a size of 10Gi.

volumes:
  data:
    mode: "defaultStorageClass"
    defaultStorageClass:
      requests:
        storage: 10Gi

For the data volume, if requests.storage is not set, defaultStorageClass will default to a 10Gi volume. For all other volumes, defaultStorageClass.requests.storage must be set explicitly when using defaultStorageClass mode.
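
Which StorageClass is the default depends on the cluster. It can be checked by listing the storage classes; the default is marked (default) in the output:

kubectl get storageclass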

3.3. mode: dynamic

Description

The volume mount is backed by a PV that Kubernetes dynamically provisions using the specified StorageClass.

Example

A dynamically provisioned import volume with a size of 1Ti using the neo4j storage class.

volumes:
  import:
    mode: dynamic
    dynamic:
      storageClassName: "neo4j"
      requests:
        storage: 1Ti

For the data volume, if requests.storage is not set, dynamic will default to a 100Gi volume. For all other volumes, dynamic.requests.storage must be set explicitly when using dynamic mode.

3.4. mode: volume

Description

A complete Kubernetes volume object can be specified for the volume mount. Generally, volumes specified in this way have to be manually provisioned.

volume can be any valid Kubernetes volume type. This mode can be used in a variety of ways:

  • Attach an existing PersistentVolume by name.

  • Attach cloud disks/volumes, e.g., gcePersistentDisk, azureDisk, or awsElasticBlockStore, without creating Kubernetes PersistentVolumes.

  • Attach the contents of a ConfigMap or Secret (as a read-only volume).

    For details of how to specify volume objects, see the Kubernetes documentation.
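
For example, the licenses volume mount could be backed by a ConfigMap. A sketch, assuming a hypothetical ConfigMap named my-licenses that holds the license files:

volumes:
  licenses:
    mode: volume
    volume:
      configMap:
        name: "my-licenses"  # hypothetical ConfigMap name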

Example - mount an AWS EBS volume

The data volume mount backed by the specified EBS volume. When this method is used, the EBS volume must already exist.

volumes:
  data:
    mode: volume
    volume:
      awsElasticBlockStore:
        volumeID: "vol-0795be227aff63b2a"
        fsType: ext4

Set file permissions on mounted volumes

The Neo4j Helm chart supports an additional field that is not present in normal Kubernetes volume objects: setOwnerAndGroupWritableFilePermissions: true|false. If set to true, an initContainer is run to modify the file permissions of the mounted volume so that the contents can be written and read by the Neo4j process. This helps with certain volume implementations that are not aware of the SecurityContext set on Pods using them.

Example - reference an existing PersistentVolume

The backups volume mount backed by the specified PVC. When this method is used, the persistentVolumeClaim object must already exist.

volumes:
  backups:
    mode: volume
    volume:
      setOwnerAndGroupWritableFilePermissions: true
      persistentVolumeClaim:
        claimName: my-neo4j-pvc

3.5. mode: selector

Description

The volume to use is chosen from the existing PVs based on the provided selector object; a PVC is dynamically generated to bind to the selected PV.

If no matching PVs exist, the Neo4j pod will be unable to start. To match, a PV must have the specified StorageClass, match the label selectorTemplate, and have sufficient storage capacity to meet the requested storage amount.

Example

The data volume chosen from the available volumes with the neo4j storage class and the label developer: alice.

volumes:
  data:
    mode: selector
    selector:
      storageClassName: "neo4j"
      requests:
        storage: 128Gi
      selectorTemplate:
        matchLabels:
          developer: "alice"
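
An existing PV can be given a matching label with kubectl, for example:

kubectl label pv <pv-name> developer=alice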

For the data volume, if requests.storage is not set, selector will default to a 100Gi volume. For all other volumes, selector.requests.storage must be set explicitly when using selector mode.

3.6. mode: volumeClaimTemplate

Description

A complete Kubernetes volumeClaimTemplate object is specified for the volume mount. Generally, volumes specified in this way are dynamically provisioned. For details of how to specify volumeClaimTemplate objects, see the Kubernetes documentation.
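
For illustration, a sketch of a volumeClaimTemplate for the data volume, assuming the object accepts standard PersistentVolumeClaim spec fields (check the chart's values schema for the exact structure):

volumes:
  data:
    mode: "volumeClaimTemplate"
    volumeClaimTemplate:
      # standard PVC spec fields (assumed)
      accessModes:
        - ReadWriteOnce
      storageClassName: "neo4j"
      resources:
        requests:
          storage: 100Gi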

In all cases, do not forget to set the mode field when customizing the volumes object. If not set, the default mode is used, regardless of the other properties set on the volume object.

4. Provision persistent volumes with Neo4j Helm chart

With the Neo4j Helm charts, you can provision a PV manually or dynamically, using the default or a custom StorageClass.

  • Manual provisioning of persistent volumes (recommended; the default). Manually provisioned PVs must be labelled with an app label that matches the name of the Neo4j Helm release.

  • Dynamic provisioning using the default StorageClass. Recommended only for small-scale development work.

  • Dynamic provisioning using a dedicated StorageClass.

4.1. Provision persistent volumes manually

You provision a PV for Neo4j to use by explicitly creating it (for example, using kubectl create -f persistentVolume.yaml) before installing the Neo4j Helm release. If no suitable PV exists, the Neo4j pod will not start.

Why prefer manual provisioning?
  • Manual provisioning provides the strongest protection against the automatic removal of volumes containing critical data.

  • The performance of Neo4j is very dependent on the latency, IOPS capacity, and throughput of the storage it is using. Manual provisioning is the best way to ensure the underlying storage is configured for Neo4j performance.

  • Explicitly configuring the underlying storage before installing Neo4j is worthwhile because changing the underlying storage after installation, while preserving the data stored in Neo4j, is difficult and may cause significant Neo4j downtime.

A Neo4j Helm release uses only manually provisioned PVs that have:

  • storageClassName set to manual.

  • An app label in their metadata that matches the name of the Neo4j Helm release.

  • Sufficient storage capacity. The PV capacity must be greater than or equal to the value of volumes.data.selector.requests.storage set for the Neo4j Helm release (default is 100Gi).

For example, if the release name is my-release and the requested storage is 100Gi, then the PV object must have storageClassName, app label, and capacity as shown in this example:

apiVersion: v1
kind: PersistentVolume
metadata:
  labels:
    app: "my-release"
spec:
  capacity:
    storage: 100Gi
  storageClassName: "manual"

Then, you install the Neo4j release using the same name:

helm install "my-release" neo4j/neo4j-standalone
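
To check that a PV carrying the expected label exists, filter PVs by the app label:

kubectl get pv -l app=my-release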

4.1.1. Configure the Neo4j Helm release for manual provisioning

The Neo4j Helm chart uses manual provisioning by default, so it is not necessary to set any chart values explicitly. The following default values are used for manual provisioning:

volumes:
  data:
    mode: "selector"
    selector:
      storageClassName: "manual"
      requests:
        storage: 100Gi

With this method, a PVC is dynamically generated for the manually provisioned PV.

An alternative method for manual provisioning is to use a manually provisioned PVC. This is supported by the Neo4j Helm chart using the volume mode. For example, to use a pre-existing PVC called my-neo4j-pvc set these values:

volumes:
  data:
    mode: "volume"
    volume:
      persistentVolumeClaim:
        claimName: my-neo4j-pvc
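
For illustration, such a PVC could be created manually with a manifest like this sketch (it assumes the manual StorageClass and the 100Gi size used elsewhere in this section):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-neo4j-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: "manual"
  resources:
    requests:
      storage: 100Gi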

4.1.2. Configure manual provisioning of persistent volumes

The instructions for manually provisioning PVs vary according to the type of PV being used and the underlying infrastructure. In general, there are two steps:

  1. Create the disk/volume to be used for storage in the underlying infrastructure. For example:

    • If using a gcePersistentDisk volume — in Google Compute Engine, create the Persistent Disk.

    • If using a hostPath volume — on the host node, create the path (directory).

  2. Create a PV in Kubernetes that references the underlying resource created in step 1 (a complete example is shown after this list).

    1. Ensure that the created PV’s app label matches the name of the Neo4j Helm release.

    2. Ensure that the created PV’s capacity.storage matches the storage available on the underlying infrastructure.
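
As an illustration, a complete manually provisioned PV for a hostPath volume might look like the following sketch (the PV name and host path are hypothetical):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: "neo4j-data-pv"      # hypothetical PV name
  labels:
    app: "my-release"        # must match the Neo4j Helm release name
spec:
  capacity:
    storage: 100Gi           # match the storage available on the host
  accessModes:
    - ReadWriteOnce
  storageClassName: "manual"
  hostPath:
    path: "/data/neo4j"      # hypothetical directory created in step 1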

The performance of Neo4j is very dependent on the latency, IOPS capacity, and throughput of the storage it is using. For the best performance of Neo4j, use the best available disks (e.g., SSD) and set IOPS throttling/quotas to high values. For some cloud providers, IOPS throttling is proportional to the size of the volume. In these cases, the best performance is achieved by setting the size of the volume based on the desired IOPS rather than the amount required for data storage.

4.1.3. Reuse a persistent volume

After uninstalling the Neo4j Helm chart, both the PVC and the PV remain and can be reused by a new install of the Helm chart. If you delete the PVC, the PV moves to the Released status and cannot be reused until its connection to the old claim is removed.

To make the PV available to a new install of the Neo4j Helm chart, remove its connection to the previous PVC:

  1. Edit the PV by running the following command:

    kubectl edit pv <pv-name>
  2. Remove the section spec.claimRef.

The PV goes back to the Available status and can be reused by a new install of the Neo4j Helm chart.
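
The same change can be made non-interactively; a sketch using kubectl patch to drop the claimRef section:

kubectl patch pv <pv-name> --type json -p '[{"op": "remove", "path": "/spec/claimRef"}]'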

4.2. Provision persistent volumes dynamically

When using dynamic provisioning, the Neo4j release depends on Kubernetes to create a PV on-demand when Neo4j is installed.
For more information on dynamic provisioning, see the Kubernetes official documentation.

Why use dynamic provisioning?

Dynamic provisioning of PVs for Neo4j is a good choice for development and test environments, where ease of installation is more important than flexibility in managing the underlying storage and preserving the stored data in all situations. With dynamic provisioning, a Neo4j Helm release uses either a specific Kubernetes StorageClass or the default StorageClass of the running Kubernetes cluster.

Using the default StorageClass is the quickest way to spin up and run Neo4j for simple tests, handling small amounts of data. However, it is not recommended for large amounts of data, as it may lead to performance issues.

It is recommended to create a dedicated StorageClass for Neo4j so that the underlying storage configuration can be specified to match the Neo4j usage as much as possible.

The volumes object in the Neo4j values.yaml file is used to configure dynamic provisioning.

4.2.1. Use the default StorageClass to dynamically provision persistent volumes

To use the default StorageClass and a storage size of 100Gi, set the following values:

volumes:
  data:
    mode: "defaultStorageClass"
    defaultStorageClass:
      requests:
        storage: 100Gi

4.2.2. Use a dedicated StorageClass to dynamically provision persistent volumes

To use a dedicated StorageClass, you define it in a YAML file and create it using kubectl create. The permitted specification values depend on the provisioner being used. Full details of StorageClass specification are covered in the Kubernetes official documentation.
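
As an illustration, a StorageClass definition for SSD-backed storage might look like the following sketch; the provisioner and parameters depend on your cluster (here the AWS EBS CSI driver is assumed):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: neo4j-storage
provisioner: ebs.csi.aws.com   # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3                    # SSD-backed volume type
reclaimPolicy: Retain          # keep the volume when the claim is deleted
allowVolumeExpansion: true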

For example, to use a dedicated StorageClass called neo4j-storage with a storage size of 100Gi for the data volume, set the following values:

volumes:
  data:
    mode: "dynamic"
    dynamic:
      storageClassName: "neo4j-storage"
      requests:
        storage: 100Gi

The performance of Neo4j is very dependent on the latency, IOPS capacity, and throughput of the storage it is using. For the best performance of Neo4j, use the best available disks (e.g., SSD) and set IOPS throttling/quotas to high values. For some cloud providers, IOPS throttling is proportional to the size of the volume. In these cases, the best performance is achieved by setting the size of the volume based on the desired IOPS rather than the amount required for data storage.


[1] Not recommended because of inconsistencies in Docker Desktop handling of hostPath volumes.