Rolling upgrade of Neo4j cluster using Helm chart across 2025–2026 releases
This example demonstrates how to perform a rolling upgrade of a Neo4j cluster between any two releases within the 2025–2026 cycle.
Architecture overview
Understanding how the Neo4j Helm chart deploys cluster members is essential for planning upgrades.
Each Neo4j cluster member (a server) is deployed as a separate Helm release, each with its own StatefulSet containing a single replica (`replicas: 1`).
This architecture differs from typical StatefulSet-based applications where all replicas live in one StatefulSet.
For a 3-server cluster, you have:
Helm Release: server-1 → StatefulSet: server-1 → Pod: server-1-0
Helm Release: server-2 → StatefulSet: server-2 → Pod: server-2-0
Helm Release: server-3 → StatefulSet: server-3 → Pod: server-3-0
This design enables per-server offline maintenance mode and independent lifecycle management, but it means:
- `helm upgrade` must be run once per cluster server.
- Each server's PodDisruptionBudget (PDB), if enabled, only covers its own single pod.
- Cluster-wide disruption protection requires a separate, manually created PDB (see the PDB section below).
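The release-to-StatefulSet-to-pod naming above can be sketched in shell. The release names (`server-1` through `server-3`) are hypothetical placeholders; the `-0` suffix follows from each StatefulSet having a single replica:

```shell
# Sketch: how StatefulSet and pod names follow from release names in this layout.
# Release names are hypothetical; substitute your own Helm release names.
for release in server-1 server-2 server-3; do
  statefulset="$release"   # each Helm release owns one StatefulSet of the same name
  pod="${release}-0"       # replicas: 1, so the only pod is always ordinal 0
  echo "release=$release -> statefulset=$statefulset -> pod=$pod"
done
```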
Rolling upgrade of a cluster
A rolling upgrade updates each cluster member (server) one at a time, keeping the cluster available throughout the process. You must ensure the cluster returns to a healthy state before upgrading the next server.
Prerequisites
- All cluster servers must be running the same version before starting.
- The cluster must be in a healthy state.
- Update the Helm repo before starting: `helm repo update neo4j`.
Key settings that affect upgrades
| Setting | Default | Description |
|---|---|---|
| `image.customImage` | Unset (uses the chart's default image) | Override the Neo4j image. Use when you need a specific image from a private registry. When unset, the chart uses the Neo4j version matching the chart's version. |
| `podSpec.terminationGracePeriodSeconds` | `3600` | Time allowed for Neo4j to shut down gracefully. Neo4j flushes in-memory data to disk on shutdown. Do not reduce this without understanding the implications. |
| `neo4j.offlineMaintenanceModeEnabled` | `false` | When `true`, the server starts in offline maintenance mode and does not join the cluster. |
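A per-server values.yaml fragment using these settings might look as follows. This is illustrative only: the registry and image tag are placeholders, the `podSpec.terminationGracePeriodSeconds` key path is an assumption about the chart's values layout, and `3600` simply restates the default rather than a tuned value:

```yaml
# Illustrative sketch: pin a custom image and keep the default grace period.
image:
  customImage: my-registry.example.com/neo4j:2025.01.0   # placeholder registry/tag
podSpec:
  terminationGracePeriodSeconds: 3600   # default; do not reduce casually
```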
Step-by-step procedure
Repeat the following steps for each cluster member, one at a time.
1. Check the cluster is healthy.

   Before upgrading a server, verify the cluster is healthy. Check that all servers are hosting their assigned databases (the query should return no results):

   ```shell
   kubectl exec <any-running-server>-0 -- cypher-shell -u neo4j -p <password> \
     "SHOW SERVERS YIELD name, hosting, requestedHosting, serverId WHERE requestedHosting <> hosting"
   ```

   Check that all databases are in their expected state (the query should return no results):

   ```shell
   kubectl exec <any-running-server>-0 -- cypher-shell -u neo4j -p <password> \
     "SHOW DATABASES YIELD name, address, currentStatus, requestedStatus, statusMessage WHERE currentStatus <> requestedStatus RETURN name, address, currentStatus, requestedStatus, statusMessage"
   ```

2. Upgrade a server.

   Upgrade to the target chart version (which includes the matching Neo4j image):

   ```shell
   helm upgrade <server-release-name> neo4j/neo4j --version <chart-version> -f <server-values>.yaml
   ```

   If you need to use a specific Neo4j image instead of the chart's default, set `image.customImage` in the server's values.yaml before running `helm upgrade`.

3. Monitor the rollout.

   ```shell
   kubectl rollout status --watch --timeout=600s statefulset/<server-release-name>
   ```

4. Verify the server has rejoined the cluster.

   Wait for the pod to be ready, then check the server's state (it should show `Enabled` and `Available`):

   ```shell
   kubectl exec <server-release-name>-0 -- cypher-shell -u neo4j -p <password> \
     "SHOW SERVERS"
   ```

   Confirm all databases on this server are in their expected state:

   ```shell
   kubectl exec <server-release-name>-0 -- cypher-shell -u neo4j -p <password> \
     "SHOW DATABASES YIELD name, address, currentStatus, requestedStatus WHERE currentStatus <> requestedStatus RETURN name, address, currentStatus, requestedStatus"
   ```

5. Move to the next server.

   Once the cluster is healthy again, repeat from step 1 for the next server.
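The per-server procedure above can be sketched as a script. This is a dry-run skeleton: it echoes each command rather than executing it (set `DRY_RUN=` to run for real), and `SERVERS`, `CHART_VERSION`, the values-file naming, and the `NEO4J_PASSWORD` variable are all assumptions to adapt to your deployment:

```shell
#!/bin/sh
# Dry-run sketch of the rolling-upgrade loop; each iteration's health check
# gates the next upgrade. Set DRY_RUN= to execute instead of echoing.
DRY_RUN=echo
SERVERS="server-1 server-2 server-3"   # hypothetical release names
CHART_VERSION="2025.1.0"               # placeholder target chart version

for s in $SERVERS; do
  # 1. Health check: this query should return no rows before proceeding.
  $DRY_RUN kubectl exec "${s}-0" -- cypher-shell -u neo4j -p "$NEO4J_PASSWORD" \
    "SHOW SERVERS YIELD name, hosting, requestedHosting WHERE requestedHosting <> hosting"
  # 2. Upgrade this server's Helm release.
  $DRY_RUN helm upgrade "$s" neo4j/neo4j --version "$CHART_VERSION" -f "${s}.values.yaml"
  # 3. Wait for the rollout to finish before moving on.
  $DRY_RUN kubectl rollout status --watch --timeout=600s "statefulset/$s"
done
```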
Protecting cluster availability during worker node upgrades (PodDisruptionBudgets)
Kubernetes platforms that perform automated worker node upgrades (e.g., Kubermatic, GKE node auto-upgrades, EKS managed node groups) may drain and replace multiple nodes simultaneously. Without a PDB, all Neo4j pods can be evicted at once, causing cluster downtime.
Per-release PDB (built-in)
The Neo4j Helm chart includes an optional PDB that can be enabled per release:
```yaml
podDisruptionBudget:
  enabled: true
  minAvailable: 1
```
However, because each cluster server is a separate Helm release with its own StatefulSet, each per-release PDB only protects a single pod.
A `minAvailable: 1` PDB on a single-pod StatefulSet effectively prevents that pod from being evicted at all, which blocks worker node draining entirely.
> The per-release PDB is not suitable for controlling the pace of worker node rolling upgrades across a cluster.
Cluster-wide PDB (recommended for worker node upgrades)
To ensure that at most one Neo4j pod is evicted at a time during worker node upgrades, create a single PDB that spans all cluster servers using a shared label.
All Neo4j pods in a cluster share the label `helm.neo4j.com/neo4j.name: <neo4j.name>`, where `<neo4j.name>` is the value set in `neo4j.name` in your values.yaml.
Important considerations:

- Create the PDB before initiating worker node upgrades.
- The PDB must be in the same namespace as the Neo4j pods.
- If you add or remove cluster servers, review and update the PDB accordingly (especially if using `minAvailable`).
- This PDB is managed outside of the Helm chart lifecycle: you must create, update, and delete it manually or via your own automation.
Example: cluster-wide PDB for a 3-server cluster
If your cluster uses `neo4j.name: my-cluster`, create the following PDB:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: neo4j-cluster-pdb
  namespace: <namespace>
spec:
  minAvailable: 2
  selector:
    matchLabels:
      helm.neo4j.com/neo4j.name: "my-cluster"
```
This ensures that Kubernetes will only evict one Neo4j pod at a time, because at least two out of three must remain available. Worker node upgrade controllers (Kubermatic, GKE, etc.) respect this constraint and drain nodes sequentially.
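The eviction math behind this can be checked directly: for a `minAvailable`-style PDB, the number of allowed voluntary disruptions is the healthy replica count minus `minAvailable`, so the cluster-wide PDB permits one eviction while a per-release `minAvailable: 1` PDB permits none:

```shell
# Allowed voluntary disruptions = healthy replicas - minAvailable.
cluster_allowed=$((3 - 2))     # 3-server cluster, cluster-wide minAvailable: 2
per_release_allowed=$((1 - 1)) # single-pod StatefulSet, per-release minAvailable: 1
echo "cluster-wide PDB allows $cluster_allowed eviction(s) at a time"
echo "per-release PDB allows $per_release_allowed eviction(s): draining is blocked"
```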
Choosing minAvailable vs maxUnavailable
| Setting | Value | Effect |
|---|---|---|
| `minAvailable` | `2` | At least 2 servers must remain running at all times. |
| `maxUnavailable` | `1` | At most 1 server can be down at any time. |
Both achieve the same result for a 3-server cluster.
Use whichever is clearer for your operational model.
Note that if you scale the cluster by adding more servers, a `maxUnavailable: 1` PDB automatically adapts, while `minAvailable: 2` needs to be updated.
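For reference, the `maxUnavailable` form of the same cluster-wide PDB looks like this (use one form or the other, not both; the name and label follow the `my-cluster` example above):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: neo4j-cluster-pdb
  namespace: <namespace>
spec:
  maxUnavailable: 1   # adapts automatically if servers are added
  selector:
    matchLabels:
      helm.neo4j.com/neo4j.name: "my-cluster"
```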