A method to replicate a Causal Cluster to new hardware with minimum downtime

If the opportunity arises such that you are in need of replicating your existing Causal Cluster to a new hardware setup, the following can be used to allow for minimal downtime.

Let us first start with an existing 3 instances cluster with the following characteristics:

neo4j> call dbms.cluster.overview
+---------------------------------------------------------------------------------------------------------------------------------------------+
| id                                     | addresses                                                                    | role       | groups |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| "ffc16977-4ab8-41b5-a4e2-e0e32e8abd6f" | ["bolt://10.1.1.1:7617", "http://10.1.1.1:7474", "https://10.1.1.1:7473"] | "LEADER"   | []     |
| "f0a78cd1-7ba3-45f6-aba3-0abb60d785ef" | ["bolt://10.1.1.2:7627", "http://10.1.1.2:7474", "https://10.1.1.2:7473"] | "FOLLOWER" | []     |
| "2fe26571-6fcc-4d1e-9f42-b81d08579057" | ["bolt://10.1.1.3:7637", "http://10.1.1.3:7474", "https://10.1.1.3:7473"] | "FOLLOWER" | []     |
+---------------------------------------------------------------------------------------------------------------------------------------------+

Each instance has defined its conf/neo4j.conf with https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_causal_clustering.expected_core_cluster_size and causal_clustering.initial_discovery_members defined as:

causal_clustering.expected_core_cluster_size=3
causal_clustering.initial_discovery_members=10.1.1.1:5001,10.1.1.2:5002,10.1.1.3:5003

All other ports referenced are using the default values.

To add 3 new instances, for example at IP address 10.2.2.1, 10.2.2.2 and 10.2.2.3 perform the following steps

install and create the new 3 instances cluster at IP addresses 10.2.2.1, 10.2.2.2 and 10.2.2.3.
in each of these 3 new instances conf/neo4j.conf define their ha.initial_hosts to be defined as:
```
causal_clustering.initial_discovery_members=10.1.1.1:5001,10.1.1.2:5001,10.1.1.3:5001
```

Start up each instance at 10.2.2.1, 10.2.2.2, and 10.2.2.3. These 3 new instances will then join the initial cluster at 10.1.1.1, 10.1.1.2 and 10.1.1.3 and copy down the databases\graph.db. Running dbms.cluster.overview(); will return output similar to:

+---------------------------------------------------------------------------------------------------------------------------------------------+
| id                                     | addresses                                                                    | role       | groups |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| "ffc16977-4ab8-41b5-a4e2-e0e32e8abd6f" | ["bolt://10.1.1.1:7687", "http://10.1.1.1:7474", "https://10.1.1.1:7473"] | "LEADER"   | []     |
| "f0a78cd1-7ba3-45f6-aba3-0abb60d785ef" | ["bolt://10.1.1.2:7687", "http://10.1.1.2:7474", "https://10.1.1.2:7473"] | "FOLLOWER" | []     |
| "2fe26571-6fcc-4d1e-9f42-b81d08579057" | ["bolt://10.1.1.3:7687", "http://10.1.1.3:7474", "https://10.1.1.3:7473"] | "FOLLOWER" | []     |
| "847b74c2-34a9-4458-b0e2-ea36cf25fdbf" | ["bolt://10.2.2.1:7687", "http://10.2.2.1:7474", "https://10.2.2.1:7473"] | "FOLLOWER" | []     |
| "39f92686-f581-4454-b288-a2254d38ea5c" | ["bolt://10.2.2.2:7687", "http://10.2.2.2:7474", "https://10.2.2.2:7473"] | "FOLLOWER" | []     |
| "e4114ad2-dcd1-4d22-8f56-a085524c9ed0" | ["bolt://10.2.2.2:7687", "http://10.2.2.3:7474", "https://10.2.2.3:7473"] | "FOLLOWER" | []     |
+---------------------------------------------------------------------------------------------------------------------------------------------+

Once the 3 new instances have completed the copy of graph.db from master, one can then cleanly stop the 3 initial instances at 10.1.1.1, 10.1.1.2, and 10.1.1.3 via a bin/neo4j stop. The 3 remaining instances will continue to run:

+---------------------------------------------------------------------------------------------------------------------------------------------+
| id                                     | addresses                                                                    | role       | groups |
+---------------------------------------------------------------------------------------------------------------------------------------------+
| "847b74c2-34a9-4458-b0e2-ea36cf25fdbf" | ["bolt://10.2.2.1:7687", "http://10.2.2.1:7474", "https://10.2.2.1:7473"] | "LEADER"   | []     |
| "39f92686-f581-4454-b288-a2254d38ea5c" | ["bolt://10.2.2.2:7687", "http://10.2.2.2:7474", "https://10.2.2.2:7473"] | "FOLLOWER" | []     |
| "e4114ad2-dcd1-4d22-8f56-a085524c9ed0" | ["bolt://10.2.2.2:7687", "http://10.2.2.3:7474", "https://10.2.2.3:7473"] | "FOLLOWER" | []     |
+---------------------------------------------------------------------------------------------------------------------------------------------+

If a Load Balancer was in front of the 3 instance cluster, at 10.1.1.1, 10.1.1.2, and 10.1.1.3, it should be updated to now point to 10.2.2.1, 10.2.2.2, and 10.2.2.3.
Since the initial 3 instances have been shut down and to provide ability for the 3 new instances to successfully restart at some later time, update the causal_clustering.initial_discovery_members of the new 3 instances and change:
```
causal_clustering.initial_discovery_members=10.1.1.1:5001,10.1.1.2:5001,10.1.1.3:5001
```
to:
```
causal_clustering.initial_discovery_members=10.2.2.1:5001,10.2.2.2:5001,10.2.2.3:5001
```
If you are currently using the Bolt driver to connect to the cluster, you would then need to update the connection string to reference a new url, for example changing bolt+routing://10.1.1.1:7678 to bolt+routing://10.2.2.1:7678.

Is this page helpful?

Knowledge Base

A method to replicate a Causal Cluster to new hardware with minimum downtime