Quickstart: Deploy a Neo4j cluster for analytic queries

This quickstart shows how to configure and deploy a special Neo4j cluster that comprises one primary server and N secondary servers to support analytic queries. The primary server handles the transaction workloads, whereas the secondary servers are configured with the Neo4j Graph Data Science library (GDS) and are used only for the analytic workload.
Information on using GDS in a cluster can be found in the Neo4j Graph Data Science library documentation.

The cluster is deployed to a cloud or a local Kubernetes cluster using the Neo4j Helm chart.

Prerequisites

Before you can deploy a Neo4j cluster on Kubernetes, you need to have:

A Kubernetes cluster running and the kubectl command-line tool installed and configured to communicate with your cluster. For more information, see Quickstart: Deploy a cluster → Prerequisites.
A valid license for Neo4j Enterprise Edition. For more information, see Install GDS Enterprise Edition (EE) and Bloom plugins.
The latest version of the Neo4j Helm chart repository.
(Optional) A valid license for GDS Enterprise Edition. To install a licensed plugin, you must provide the license files in a Kubernetes secret. For more information, see Install GDS Enterprise Edition (EE) and Bloom plugins.

Create a value YAML file for each type of server

To set up a Neo4j cluster for analytic queries, you need to create a value YAML file for each type of server, primary and secondary. For example:

Create a value YAML file for the primary server, for example, primary-value.yaml:

neo4j:
  name: analytics-cluster
  acceptLicenseAgreement: "yes"
  edition: enterprise
  password: my-password
volumes:
  data:
    mode: defaultStorageClass

# Disable the Neo4j load balancer and enable the internal service so that the servers can access each other:
services:
  neo4j:
    enabled: false
  internals:
    enabled: true

# Enable the analytics cluster and set the type to primary:
analytics:
  enabled: true
  type:
    name: primary

Create a value YAML file for the secondary servers, for example, secondary-gds.yaml. The password must be the same as for the primary server. If you are using GDS Enterprise Edition, you also need to create a secret with the license file and mount it as the /licenses volume mount. For more information on how to create a secret, see Install GDS Enterprise Edition (EE) and Bloom plugins.

neo4j:
  name: analytics-cluster
  acceptLicenseAgreement: "yes"
  edition: enterprise
  password: my-password
volumes:
  data:
    mode: defaultStorageClass
  # Define the volume mount for the license file:
  licenses:
    disableSubPathExpr: true
    mode: volume
    volume:
      secret:
        secretName: gds-license
        items:
          - key: gds.license
            path: gds.license

# Set the environment variables to download the plugins:
env:
  NEO4J_PLUGINS: '["graph-data-science"]'

# Set the configuration for the plugins directory and the mount for the license file:
config:
  gds.enterprise.license_file: "/licenses/gds.license"
  server.directories.plugins: "plugins"

# Disable the Neo4j load balancer and enable the internal service so that the servers can access each other:
services:
  neo4j:
    enabled: false
  internals:
    enabled: true

# Enable the analytics cluster and set the type to secondary:
analytics:
  enabled: true
  type:
    name: secondary

For all available options, see Customizing a Neo4j Helm chart.

Install the servers

Install a single Neo4j server using the neo4j-primary.yaml file, created in the previous section:
```
helm install primary neo4j/neo4j -f /path/to/neo4j-primary.yaml
```
Install the first secondary server using the secondary-gds.yaml file, created in the previous section:
```
helm install gds1 neo4j/neo4j -f /path/to/secondary-gds.yaml
```
Repeat step 2 to deploy a second secondary server. Use a different name, for example, gds2.

Verify that the GDS library is installed and licensed

Connect to each of the gds pods using the kubectl exec command:
```
kubectl exec -it gds1-0 -- bash
```

From the bin folder, connect to the system database of the gds1 server using the cypher-shell command:

cypher-shell -u neo4j -p my-password -d system -a bolt://gds1-internals.default.svc.cluster.local:7687

Run the following Cypher function to verify that the GDS library is installed:
```
RETURN gds.version();
```
Call gds.isLicensed() to verify that the GDS library is licensed:
```
RETURN gds.isLicensed();
```
The returned value must be true.

Verify the cluster formation

To verify that the cluster is deployed and running, you can install a load balancer and access Neo4j from the Neo4j Browser.

Deploy a Neo4j load balancer to the same namespace as the Neo4j cluster:

helm install lb neo4j/neo4j-load-balancer --set neo4j.name="analytics-cluster"

When deployed, copy the EXTERNAL_IP of the LoadBalancer service. For more information, see Access the Neo4j cluster from outside Kubernetes.
In a web browser, open the Neo4j Browser at http://EXTERNAL_IP:7474/browser and log in using the password you have configured in your values YAML files.

Verify that the cluster is deployed and running:

SHOW SERVERS;

+------------------------------------------------------------------------------------------------------------------------------------------+
| name                                   | address                                  | state     | health      | hosting                    |
+------------------------------------------------------------------------------------------------------------------------------------------+
| "16cd6e9c-aa5a-4737-8ed5-e0df36ce52d3" | "gds2.default.svc.cluster.local:7687"    | "Free"    | "Available" | ["system"]                 |
| "bafbe254-a8a2-498d-9b60-6b3fd0124045" | "primary.default.svc.cluster.local:7687" | "Enabled" | "Available" | ["neo4j", "system"]		     |
| "f1478d5d-1718-4430-a9b6-26fe9695ca30" | "gds1.default.svc.cluster.local:7687"    | "Free"    | "Available" | ["system"]                 |
+------------------------------------------------------------------------------------------------------------------------------------------+

The output shows that the secondary servers are in state free and host only the system database.

Enable the secondary servers to support analytic queries

To support analytic queries on the secondary servers, you need to enable them and change the neo4j database topology to include them.

In Neo4j Browser, enable the secondary servers to support analytic queries:

ENABLE SERVER "f1478d5d-1718-4430-a9b6-26fe9695ca30";
ENABLE SERVER "16cd6e9c-aa5a-4737-8ed5-e0df36ce52d3";

Alter the database topology to include the secondary servers:
```
ALTER DATABASE neo4j SET TOPOLOGY 1 PRIMARY 2 SECONDARY;
```

Check the status of the neo4j database to confirm the change:

SHOW DATABASE neo4j;

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name    | type       | aliases | access       | address                                  | role        | writer | requestedStatus | currentStatus | statusMessage | default | home | constituents |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "neo4j" | "standard" | []      | "read-write" | "primary.default.svc.cluster.local:7687" | "primary"   | TRUE   | "online"        | "online"      | ""            | TRUE    | TRUE | []           |
| "neo4j" | "standard" | []      | "read-write" | "gds1.default.svc.cluster.local:7687"    | "secondary" | FALSE  | "online"        | "online"      | ""            | TRUE    | TRUE | []           |
| "neo4j" | "standard" | []      | "read-write" | "gds2.default.svc.cluster.local:7687"    | "secondary" | FALSE  | "online"        | "online"      | ""            | TRUE    | TRUE | []           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Check the status of all servers:

SHOW SERVERS;

+------------------------------------------------------------------------------------------------------------------------------------------+
| name                                   | address                                  | state     | health      | hosting                    |
+------------------------------------------------------------------------------------------------------------------------------------------+
| "16cd6e9c-aa5a-4737-8ed5-e0df36ce52d3" | "gds2.default.svc.cluster.local:7687"    | "Enabled" | "Available" | ["neo4j", "system"]        |
| "bafbe254-a8a2-498d-9b60-6b3fd0124045" | "primary.default.svc.cluster.local:7687" | "Enabled" | "Available" | ["neo4j", "system"]        |
| "f1478d5d-1718-4430-a9b6-26fe9695ca30" | "gds1.default.svc.cluster.local:7687"    | "Enabled" | "Available" | ["neo4j", "system"]        |
+------------------------------------------------------------------------------------------------------------------------------------------+