Designing a resilient multi-data center cluster
Enterprise Edition
Overview
The goal of deploying a resilient multi-data center cluster is to achieve high availability, disaster recovery, and tolerance against the loss of a data center.
Take the cluster architecture and topology into account, and decide where database primaries and secondaries are located, balancing performance against fault tolerance.
Pay attention to networking and traffic routing:
- If database primaries are distant from each other, write latency increases.
- To commit a change, the writer primary must get confirmation from a quorum of members, including itself. If primaries are far apart, network latency adds to commit time. For example, with three primaries the writer needs one confirmation besides its own, so each commit waits at least a round trip to the nearest other primary.
Recommended cluster design patterns
Read resilience with user database secondaries
For better read performance, you can locate all database primaries in one data center (DC) and the secondaries in another DC. This setup also provides fast writes, because they are performed within a single DC.
However, if the DC with the primaries goes down, the cluster loses write availability, though read availability may remain via the secondaries.
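As a minimal sketch, a database with this shape can be created with an explicit topology. The database name `sales` and the counts here are assumptions for illustration; the statement only sets how many primaries and secondaries exist, while their placement across DCs is steered at the server level (for example, with the `allowedDatabases` option shown later in this guide).

```cypher
// Hypothetical user database: three primaries (intended for DC1)
// and two secondaries (intended for DC2). Counts are illustrative.
CREATE DATABASE sales TOPOLOGY 3 PRIMARIES 2 SECONDARIES;
```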
Recovering from the loss of a data center
You can restore the cluster's write availability without the failed DC:
- If you have enough secondary members of the database in another DC, you can switch their mode to primary, without having to restore a copy or wait a long time for primary copies to recover.
- You can use secondaries to re-seed databases if needed. See the `dbms.recreateDatabase()` procedure for more details.
Example recovery steps
- Promote secondary copies of the `system` database to primaries to make the `system` database write-available. This requires restarting processes. For other scenarios, see the steps in the Disaster recovery guide on how to make the `system` database write-available again.
- Mark missing servers as not available by cordoning them. For each `Unavailable` server, run `CALL dbms.cluster.cordonServer("unavailable-server-id")` on the remaining cluster.
- Recreate each user database, letting it choose the existing servers as seeders, as sketched below. You need to accept a smaller topology that fits in the remaining DC.
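The cordon and recreate steps can be expressed in Cypher roughly as follows. The server ID, the database name `neo4j`, and the reduced topology are assumptions; check the `dbms.recreateDatabase()` documentation for the exact options your version supports.

```cypher
// Run against the system database on a surviving server.

// Cordon each unavailable server (the server ID is a placeholder).
CALL dbms.cluster.cordonServer("unavailable-server-id");

// Accept a topology that fits the surviving DC, then recreate the
// database so it seeds from the remaining copies.
ALTER DATABASE neo4j SET TOPOLOGY 1 PRIMARY 2 SECONDARIES;
CALL dbms.recreateDatabase("neo4j");
```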
For detailed scenarios, see the Disaster recovery guide.
Geo-distribution of user database primaries
You can place each primary copy in a different data center (DC), using at least three data centers.
This way, if one DC fails, only a single primary member is lost, and the cluster can continue operating without data loss.
However, every write operation pays the cross-data center latency cost.
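A minimal sketch of such a database, assuming the hypothetical name `critical` and one available cluster server per data center; the allocator decides actual placement, so the one-primary-per-DC layout has to be arranged through server-level constraints.

```cypher
// Hypothetical database with one primary copy per data center,
// assuming three servers, one in each DC.
CREATE DATABASE critical TOPOLOGY 3 PRIMARIES;
```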
Recovering from the loss of a data center
Losing one DC in this setup does not cost the cluster its quorum, so it keeps running, only with reduced fault tolerance (no room for further failures).
To restore fault tolerance, you can either wait until the affected DC is back online or start a new primary member elsewhere to re-establish three-DC fault tolerance.
Example recovery steps
- Start and enable a new server. See How to add a server to the cluster for details.
- Remove the unavailable server from the cluster, as sketched below:
  - First, deallocate databases from it.
  - Then drop the server.
  For more information, see Managing servers in a cluster.
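A rough Cypher sketch of these steps; the server IDs are placeholders.

```cypher
// Run against the system database on an available server.

// Enable the newly started server so it can host databases.
ENABLE SERVER "new-server-id";

// Move database allocations off the lost server, then remove it.
DEALLOCATE DATABASES FROM SERVER "unavailable-server-id";
DROP SERVER "unavailable-server-id";
```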
For detailed scenarios, see the Disaster recovery guide.
Exclusive geo-distribution for the `system` database
Figure: the `system` database distributed across three data centers.
You can place all primaries for user databases in one data center (DC) and all secondaries in another.
In a third DC, deploy a server that only hosts a primary member of the `system` database (in addition to those in the first two data centers).
- This server can be a small machine, since the `system` database has minimal resource requirements.
- To prevent user databases from being allocated to it, set the `allowedDatabases` constraint to some name that will never be used, as sketched below.
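A minimal sketch of that constraint, assuming a hypothetical server ID; `no-such-database` is an arbitrary name no real database will ever use, and the `system` database is hosted regardless of this option.

```cypher
// Confine the third-DC server to the system database by allowing
// only a database name that will never exist.
ALTER SERVER "third-dc-server-id" SET OPTIONS {allowedDatabases: ["no-such-database"]};
```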
Your writes will be fast, because they occur within a single DC.
If a DC goes down, you retain write availability for the `system` database, which makes restoring write availability to the user databases easier.
However, if the DC with the primaries goes down, the user databases become write-unavailable, though read availability may still be maintained via the secondaries.
Recovering from the loss of a data center
If you lose the DC with the primaries, the user databases become write-unavailable, though the secondaries should continue to provide read availability.
Because of the third DC, the `system` database remains write-available, so you can restore write availability to the user databases without process downtime.
However, if you need to use the `dbms.recreateDatabase()` procedure, it involves downtime for the user database.
Example recovery steps
- Mark missing servers as not available by cordoning them. For each `Unavailable` server, run `CALL dbms.cluster.cordonServer("unavailable-server-id")` on one of the available servers.
- Recreate each user database, letting it select the existing servers as seeders, as sketched below. You need to accept a smaller topology that fits in the remaining data center.
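These steps might look as follows in Cypher; the server ID, database name, and topology are assumptions for illustration.

```cypher
// Run on one of the available servers.

// List the unavailable servers, then cordon each of them.
SHOW SERVERS YIELD name, health WHERE health = "Unavailable" RETURN name;
CALL dbms.cluster.cordonServer("unavailable-server-id");

// Accept a smaller topology, then recreate each user database.
ALTER DATABASE neo4j SET TOPOLOGY 1 PRIMARY 1 SECONDARY;
CALL dbms.recreateDatabase("neo4j");
```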
For detailed scenarios, see the Disaster recovery guide.
Cluster design patterns to avoid
Two data centers with unbalanced membership
Suppose you decide to set up just two data centers, placing two primaries in data center 1 (DC1) and one primary in data center 2 (DC2).
If the writer primary is located in DC1, then writes can be fast because a local quorum can be reached.
This setup can tolerate the loss of one data center, but only if the failure is in DC2. If DC1 fails, you lose two primary members, which means the quorum is lost and the cluster becomes unavailable for writes.
Keep in mind that any issue could push the system back to cross-data center write latencies. Worse, because of the latency, the member in DC2 may fall behind. In that case, a failure of a member in DC1 means the database is write-unavailable until the DC2 member has caught up.
If leadership shifts to DC2, all writes become slow.
Finally, there is no guarantee against data loss if DC1 goes down, because the primary member in DC2 may lag behind on writes, even at the append stage.
Two data centers with balanced membership
The worst scenario is to operate with just two data centers and place two or three primaries in each of them.
The failure of either data center then leads to loss of quorum and, therefore, loss of the cluster's write availability.
Besides, all writes have to pay the cross-data center latency cost.
This design pattern is strongly discouraged.
Summary
Setup | Design | Pros | Cons | Best use case |
---|---|---|---|---|
Recommended patterns | | | | |
Secondaries for read resilience | Primaries in one data center, secondaries in other data centers. | Fast writes and fast reads. | Loss of the DC with the primaries removes write availability until the databases are recreated in the surviving DC. | Applications needing fast writes. The cluster can tolerate downtime during recovery. |
Geo-distributed data centers (3DC) | Each primary in a different data center (≥3). | Tolerates the loss of a full data center without losing write availability or data. | Every write pays cross-data center latency. | Critical systems needing continuous availability even if a full data center fails. |
Full geo-distribution for the `system` database | User database primaries in one DC, secondaries in another, and an extra `system` database primary in a third DC. | Fast writes; the `system` database stays write-available if a DC is lost, which eases recovery. | User databases become write-unavailable if the DC with their primaries fails. | Balanced approach: fast normal operations, easier recovery, some downtime acceptable. |
Non-recommended patterns | | | | |
Two DCs – Unbalanced membership | Two primaries are in DC1, one primary is in DC2. | Fast writes if a leader is in DC1. | Loss of DC1 means loss of quorum and write availability, with no guarantee against data loss. | Should be avoided. |
Two DCs – Balanced membership | Equal primaries in two DCs. | (none significant) | Loss of either DC means loss of quorum; all writes pay cross-data center latency. | Should be avoided. |