Causal Clustering Minimum Core Size At Formation
causal_clustering.minimum_core_cluster_size_at_formation is defined as the minimum number of Core machines initially required
to form a cluster. The cluster will form when at least this many Core members have discovered each other.
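The formation rule can be sketched as a one-line predicate. This is a toy model of the condition described above, not Neo4j's actual implementation; the function name is mine.

```python
# Minimal sketch of the formation rule: the cluster forms once at least
# minimum_core_cluster_size_at_formation Core members have discovered
# each other. (Illustrative model, not Neo4j code.)

def can_form_cluster(discovered_cores: int,
                     minimum_core_cluster_size_at_formation: int = 3) -> bool:
    return discovered_cores >= minimum_core_cluster_size_at_formation

print(can_form_cluster(2))  # False: only 2 of the required 3 cores found
print(can_form_cluster(3))  # True: the cluster can now form
```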
The following example shows a five-core cluster with causal_clustering.minimum_core_cluster_size_at_formation=3. Cores 1 and 2 are initialised and the first election occurs as below:
Core 3 now joins and the cluster forms successfully:
Causal Clustering Minimum Core Size At Runtime
causal_clustering.minimum_core_cluster_size_at_runtime is defined as the minimum size of the dynamically adjusted voting
set (which only core members may be a part of). Adjustments to the voting set happen automatically as the availability of
core members changes, due to explicit operations such as starting or stopping a member, or unintended issues such as network
partitions. Note that this dynamic scaling of the voting set is generally desirable as under some circumstances it can
increase the number of instance failures which may be tolerated. A majority of the voting set must be available before voting
in or out members.
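The two constraints above (majority of the current voting set, and the configured floor) can be sketched together. The helper names below are my own; this is an assumption-laden model of the rule as described, not Neo4j's internal API.

```python
# Sketch of the voting-set adjustment rule: a member may only be voted
# out when a strict majority of the *current* voting set is available,
# and the set may never shrink below the configured minimum.

def has_majority(available: int, voting_set_size: int) -> bool:
    return available > voting_set_size // 2

def can_vote_out(voting_set_size: int, available: int,
                 minimum_core_cluster_size_at_runtime: int) -> bool:
    return (has_majority(available, voting_set_size)
            and voting_set_size - 1 >= minimum_core_cluster_size_at_runtime)

print(can_vote_out(3, 2, 2))  # True: 2 of 3 is a majority, floor is 2
print(can_vote_out(3, 2, 3))  # False: shrinking to 2 would breach the floor
print(can_vote_out(2, 1, 2))  # False: 1 of 2 is not a majority
```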
Let’s try out causal_clustering.minimum_core_cluster_size_at_runtime=2 on a 3-node cluster. If we lose one core, the cluster still has consensus and can scale its voting set down to 2. But if we lose one more, only a single node remains; it lacks consensus, so the voting set can’t scale down further, and the cluster waits for that just-failed node to become available again. At this point, even if the first node of the 3 that failed comes back online, it can’t be added back to the cluster, since voting it back in also requires consensus. We’re stuck waiting on only the last node that failed.
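The walkthrough above can be traced step by step with a toy model (my own simplification of the voting rules as described, not Neo4j code):

```python
# Toy model of the 3-node cluster with
# minimum_core_cluster_size_at_runtime=2.

def step(voting_set: int, available: int, minimum: int) -> int:
    """New voting-set size after one failed member is (maybe) voted out."""
    has_majority = available > voting_set // 2
    if has_majority and voting_set - 1 >= minimum:
        return voting_set - 1  # failed member voted out; the set shrinks
    return voting_set          # no majority, or at the floor: unchanged

voting_set = 3
# Core 3 fails: 2 of 3 members is a majority, so the voting set
# scales down to 2 after leader_election_timeout.
voting_set = step(voting_set, available=2, minimum=2)
print(voting_set)  # 2
# Core 2 fails next: 1 of 2 is not a majority, so the set stays at 2
# and the cluster is read-only until core 2 specifically returns.
voting_set = step(voting_set, available=1, minimum=2)
print(voting_set)  # 2
```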
In the example below, core 1 (leader) sees core 3 leaving the cluster as below:
At this point, the cluster still has quorum for writes done at the leader, core 1. But after a subsequent period of causal_clustering.leader_election_timeout (default 7s), core 3 is voted out of the cluster, because the minimum cluster size at runtime is set to 2.
If we then take core 2 offline also, the cluster becomes read only:
However, if we now add back core 3 before adding back core 2, we end up with two cores in the follower state. The cluster remains read-only, and we see the following in core 1’s log:
And a write transaction results in the below exception:
It is not until core 2 is added back that the three cores form a writeable cluster once again:
If, however, we stick with the default cluster size at runtime of 3, the cluster cannot scale down to 2 (the first node to fail is not voted out), but it keeps consensus and write capability. When the second node fails and we’re down to 1, we lose consensus and write capability just as in the previous scenario, but we can regain both if either of the two failed nodes comes back online, not just the most recently failed one.
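The difference between the two settings comes down to which returning node can restore a majority of the voting set. A small sketch, under the same simplifying assumptions as above (the predicate name is mine):

```python
# After two failures, one node is alive. Whether a returning node
# restores write quorum depends on whether it is still counted in the
# voting set: a voted-out node does not add to voting-set availability.

def restores_write_quorum(voting_set: int, returning_member_in_set: bool) -> bool:
    available_after_return = 1 + (1 if returning_member_in_set else 0)
    return available_after_return > voting_set // 2

# minimum_core_cluster_size_at_runtime=3: both failed cores remain in the set.
print(restores_write_quorum(3, True))   # True for either returning core
# minimum_core_cluster_size_at_runtime=2: the first-failed core was voted out.
print(restores_write_quorum(2, False))  # False: the voted-out core can't help
print(restores_write_quorum(2, True))   # True: only the last-failed core helps
```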
In conclusion, the only effective difference between minimum_core_cluster_size_at_runtime at 2 instead of the default of 3 is that when we’re down to 1 operational node (after having scaled down to a cluster size of 2), we have to wait until the just-failed node comes back online; the node that failed before it can’t rejoin, because adding a “new” node to the cluster requires a majority of the current voting set, which a single remaining node cannot provide.
Having a smaller minimum_core_cluster_size_at_runtime is therefore a more relevant optimisation when the base/resting cluster size is larger (e.g. 5). In that situation, a minimum_core_cluster_size_at_runtime of 3, rather than 5, allows the cluster to tolerate 3 failures, rather than 2, before losing write capability, provided that those 3 failures don’t happen faster than the cluster is able to vote out failing members (causal_clustering.leader_election_timeout). Using 2 instead of the default of 3 doesn’t affect the ability to tolerate 1 failure in 3, and there typically isn’t a good reason to set it to 2. One may, however, set minimum_core_cluster_size_at_runtime to a value smaller than the total number of cores in a cluster of 5 or more.
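The failure-tolerance arithmetic above can be checked with a hypothetical helper (a model built on the assumptions in this article, notably that each failed member is voted out before the next failure occurs):

```python
# How many sequential failures a cluster tolerates before losing write
# quorum, assuming each failure is voted out (within
# leader_election_timeout) before the next one happens.

def tolerated_failures(cluster_size: int, minimum_at_runtime: int) -> int:
    alive, voting_set, failures = cluster_size, cluster_size, 0
    while True:
        alive -= 1                        # the next member fails
        if alive <= voting_set // 2:      # majority of the set lost:
            return failures               # this failure is not tolerated
        failures += 1
        if voting_set - 1 >= minimum_at_runtime:
            voting_set -= 1               # failed member voted out

print(tolerated_failures(5, 5))  # 2: the voting set never shrinks below 5
print(tolerated_failures(5, 3))  # 3: the set can scale down to 3
print(tolerated_failures(3, 3))  # 1
print(tolerated_failures(3, 2))  # 1: scaling to 2 adds no extra tolerance
```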