Deploy an analytics cluster

In analytic use cases, the analytic queries can be intensive enough to warrant running them on separate servers, away from the servers handling the transactional workload. This example shows how to set up a cluster to achieve that. Information on using Neo4j’s graph data science library in a cluster can be found here: Neo4j Graph Data Science Library Manual → GDS with Neo4j cluster.

The analytics cluster can be set up in two ways, with fault tolerance or without fault tolerance. Bear in mind that the GDS library does not support fault tolerance and therefore GDS should only be deployed on servers configured for analytic queries, as described below.

With fault tolerance

Deploy the cluster

Example 1. Configure a cluster with five servers, two only for read queries

In this example, three servers named server01.example.com, server02.example.com and server03.example.com are configured as the transactional part of the cluster. Two more servers names server04.example.com and server05.example.com are configured for the analytical queries. Neo4j Enterprise Edition is installed on all five servers. They are configured by preparing neo4j.conf on each server.

Key points:

  • All servers include all members in their discovery list

  • The servers for analytics have mode constraints configured that restrict their hosting mode to secondary to prevent them from participating in normal write operations.

neo4j.conf on server01.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server01.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000
neo4j.conf on server02.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server02.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000
neo4j.conf on server03.example.com:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server03.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000
neo4j.conf on server04.example.com - an analytics server:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server04.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY
neo4j.conf on server05.example.com - an analytics server:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server05.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000,server04.example.com:5000,server05.example.com:5000
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY

The Neo4j servers are ready to be started. The startup order does not matter.

After the cluster has started, it is possible to connect to any of the instances and run SHOW SERVERS to check the status of the cluster. This shows information about each member of the cluster:

SHOW SERVERS;
+-----------------------------------------------------------------------------------------------------------+
| name                                   | address          | state     | health      | hosting             |
+-----------------------------------------------------------------------------------------------------------+
| "f3bd1199-bc6f-4a38-b25c-5f7588df5182" | "server01:7687" | "Enabled" | "Available" | ["system", "neo4j"] |
| "b331e481-c2ba-4b4e-82f3-bb51fe171483" | "server02:7687" | "Enabled" | "Available" | ["system"]          |
| "bd80e8fd-a51b-406a-9ed4-42daf4792aa6" | "server03:7687" | "Enabled" | "Available" | ["system"]          |
| "df3758b1-337f-4b8a-a9de-8e745ca96549" | "server04:7687" | "Free"    | "Available" | ["system"]          |
| "9207bfd9-aa1b-40c2-b965-edcd3955a20e" | "server05:7687" | "Free"    | "Available" | ["system"]          |
+-----------------------------------------------------------------------------------------------------------+

For more extensive information about each server, use the SHOW SERVERS YIELD * command:

SHOW SERVERS YIELD *;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| serverId                               | name                                   | address          | state     | health      | hosting             | requestedHosting    | tags | allowedDatabases | deniedDatabases | modeConstraint | version     |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "f3bd1199-bc6f-4a38-b25c-5f7588df5182" | "f3bd1199-bc6f-4a38-b25c-5f7588df5182" | "server01:7687" | "Enabled" | "Available" | ["system", "neo4j"] | ["system", "neo4j"] | []   | []               | []              | "NONE"         | "5.8.0"     |
| "b331e481-c2ba-4b4e-82f3-bb51fe171483" | "b331e481-c2ba-4b4e-82f3-bb51fe171483" | "server02:7687" | "Enabled" | "Available" | ["system"]          | ["system"]          | []   | []               | []              | "NONE"         | "5.8.0"     |
| "bd80e8fd-a51b-406a-9ed4-42daf4792aa6" | "bd80e8fd-a51b-406a-9ed4-42daf4792aa6" | "server03:7687" | "Enabled" | "Available" | ["system"]          | ["system"]          | []   | []               | []              | "NONE"         | "5.8.0"     |
| "df3758b1-337f-4b8a-a9de-8e745ca96549" | "df3758b1-337f-4b8a-a9de-8e745ca96549" | "server04:7687" | "Free"    | "Available" | ["system"]          | []                  | []   | []               | []              | "SECONDARY"    | "5.8.0"     |
| "9207bfd9-aa1b-40c2-b965-edcd3955a20e" | "9207bfd9-aa1b-40c2-b965-edcd3955a20e" | "server05:7687" | "Free"    | "Available" | ["system"]          | []                  | []   | []               | []              | "SECONDARY"    | "5.8.0"     |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Create new databases in the cluster

As mentioned in the Introduction, a server in a cluster can either host a database in primary or secondary mode. The primaries provide write operations and fault tolerance (if there are more than one of them). The secondaries provide somewhere for read queries to run, including the analytic queries for this example.

Example 2. Create a new database with three primaries and two secondaries

In the system database from the previous example, execute the following Cypher command to create a new database:

CREATE DATABASE bar
TOPOLOGY 3 PRIMARIES 2 SECONDARIES

Without fault tolerance

If fault tolerance is not a priority, it is possible to have a single server handling write operations for the DBMS. This means fewer servers are needed, but a failure can affect the entire DBMS.

If the single writer server or process fails, all databases will be unavailable for writes. They may be less available for reads as well, since that server manages the DBMS.

The following example shows how to set up a non-fault tolerant analytics cluster with three members.

Deploy the cluster

Example 3. Configure a cluster with three servers

In this example, three servers named server01.example.com, server02.example.com and server03.example.com are configured. Neo4j Enterprise Edition is installed on all three servers. They are configured by preparing neo4j.conf on each server. Note that server01.example.com is different from the others, and is the only server where write operations take place. The other servers are able to execute read queries, and if using GDS, to write results back to the writing server.

Key points:

  • The writer server only has itself in the list of discovery. This means it does not seek out the other members when it starts, they have to discover it. This is required in order to have a cluster with only a single primary for the system database.

  • The servers for analytics have mode constraints configured that restrict their hosting mode to secondary to prevent them from participating in normal write operations.

neo4j.conf on server01.example.com - the writer:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server01.example.com
# Only has self in this list
dbms.cluster.discovery.endpoints=server01.example.com:5000
neo4j.conf on server02.example.com - an analytics server:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server02.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY
neo4j.conf on server03.example.com - an analytics server:
server.default_listen_address=0.0.0.0
server.default_advertised_address=server03.example.com
dbms.cluster.discovery.endpoints=server01.example.com:5000,server02.example.com:5000,server03.example.com:5000
initial.server.mode_constraint=SECONDARY
server.cluster.system_database_mode=SECONDARY

The Neo4j servers are ready to be started. The startup order does not matter.

After the cluster has started, it is possible to connect to any of the instances and run SHOW SERVERS to check the status of the cluster. This shows information about each member of the cluster:

SHOW SERVERS;
+-----------------------------------------------------------------------------------------------------------+
| name                                   | address          | state     | health      | hosting             |
+-----------------------------------------------------------------------------------------------------------+
| "d6fbe54b-0c6a-4959-9bcb-dcbbe80262a4" | "server01:7687" | "Enabled" | "Available" | ["system", "neo4j"] |
| "e56b49ea-243f-11ed-861d-0242ac120002" | "server02:7687" | "Free"    | "Available" | ["system"]          |
| "73e9a990-0a97-4a09-91e9-622bf0b239a4" | "server03:7687" | "Free"    | "Available" | ["system"]          |
+-----------------------------------------------------------------------------------------------------------+

For more extensive information about each server, use the SHOW SERVERS YIELD * command:

SHOW SERVERS YIELD *;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| serverId                               | name                                   | address          | state     | health      | hosting             | requestedHosting    | tags | allowedDatabases | deniedDatabases | modeConstraint | version     |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "d6fbe54b-0c6a-4959-9bcb-dcbbe80262a4" | "d6fbe54b-0c6a-4959-9bcb-dcbbe80262a4" | "server01:7687" | "Enabled" | "Available" | ["system", "neo4j"] | ["system", "neo4j"] | []   | []               | []              | "NONE"         | "5.8.0"     |
| "e56b49ea-243f-11ed-861d-0242ac120002" | "e56b49ea-243f-11ed-861d-0242ac120002" | "server02:7687" | "Free"    | "Available" | ["system"]          | ["system"]          | []   | []               | []              | "SECONDARY"    | "5.8.0"     |
| "73e9a990-0a97-4a09-91e9-622bf0b239a4" | "73e9a990-0a97-4a09-91e9-622bf0b239a4" | "server03:7687" | "Free"    | "Available" | ["system"]          | ["system"]          | []   | []               | []              | "SECONDARY"    | "5.8.0"     |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Create new databases in the cluster

As mentioned in the Introduction, a server in a cluster can either host a database in primary or secondary mode. For transactional workloads, a database topology with several primaries is preferred for fault tolerance and automatic failover. The database topology may prioritize secondaries over primaries if the workload is more analytical. Such configuration is optimized for scalability but it is not fault-tolerant and does not provide automatic failover.

Example 4. Create a new database with one primary and two secondaries

In the system database on the writer from the previous example, execute the following Cypher command to create a new database:

CREATE DATABASE bar
TOPOLOGY 1 PRIMARY 2 SECONDARIES
Startup time

The instance may appear unavailable while it is joining the cluster. If you want to follow along with the startup, you can see the messages in neo4j.log.

Running analytic queries

If running large normal Cypher queries, it is possible to use server tags to identify the large servers, and a routing policy to direct the read queries towards those servers. See Multi-data center routing for more details.