Designing a resilient multi-data center cluster
Enterprise Edition
Overview
The goal of deploying a resilient multi-data center cluster is to achieve high availability, disaster recovery, and tolerance against the loss of a data center.
Take the cluster architecture and topology into account, and decide where database primaries and secondaries are located, balancing performance against fault tolerance.
Pay attention to networking and traffic routing:
- If database primaries are distant from each other, write latency increases.
- To commit a change, the writer primary must get confirmation from a quorum of members, including itself. If primaries are far apart, network latency adds to commit time. For example, with three primaries the writer needs one confirmation besides its own, so each commit waits at least a round trip to the nearest other primary.
Recommended cluster design patterns
Read resilience with user database secondaries
For better read performance, you can locate all database primaries in one data center (DC) and the secondaries in another DC. This setup also provides fast writes, because they are performed within a single DC.
However, if the DC with the primaries goes down, the cluster loses write availability, though read availability may remain via the secondaries.
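As a minimal sketch, a database with this shape can be created with an explicit topology. The database name `sales` and the counts here are assumptions for illustration; the statement only sets how many primaries and secondaries exist, while their placement across DCs is steered at the server level (for example, with the `allowedDatabases` option shown later in this guide).

```cypher
// Hypothetical user database: three primaries (intended for DC1)
// and two secondaries (intended for DC2). Counts are illustrative.
CREATE DATABASE sales TOPOLOGY 3 PRIMARIES 2 SECONDARIES;
```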
Recovering from the loss of a data center
You can restore the cluster's write availability without the failed DC:
- If you have enough secondary members of the database in another DC, you can switch their mode to primary, without having to restore a copy or wait a long time for primary copies to recover.
- You can use secondaries to re-seed databases if needed. See the `dbms.recreateDatabase()` procedure for more details.
Example recovery steps
- Promote secondary copies of the `system` database to primaries to make the `system` database write-available. This requires restarting processes. For other scenarios, see the steps in the Disaster recovery guide on how to make the `system` database write-available again.
- Mark missing servers as not available by cordoning them. For each `Unavailable` server, run `CALL dbms.cluster.cordonServer("unavailable-server-id")` on the remaining cluster.
- Recreate each user database, letting it choose the existing servers as seeders, as sketched below. You need to accept a smaller topology that fits in the remaining DC.
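The cordon and recreate steps can be expressed in Cypher roughly as follows. The server ID, the database name `neo4j`, and the reduced topology are assumptions; check the `dbms.recreateDatabase()` documentation for the exact options your version supports.

```cypher
// Run against the system database on a surviving server.

// Cordon each unavailable server (the server ID is a placeholder).
CALL dbms.cluster.cordonServer("unavailable-server-id");

// Accept a topology that fits the surviving DC, then recreate the
// database so it seeds from the remaining copies.
ALTER DATABASE neo4j SET TOPOLOGY 1 PRIMARY 2 SECONDARIES;
CALL dbms.recreateDatabase("neo4j");
```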
For detailed scenarios, see the Disaster recovery guide.
Geo-distribution of user database primaries
You can place each primary copy in a different data center (DC), using at least three data centers.
This way, if one DC fails, only a single primary member is lost, and the cluster can continue operating without data loss.
However, every write operation pays the cross-data center latency cost.
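A minimal sketch of such a database, assuming the hypothetical name `critical` and one available cluster server per data center; the allocator decides actual placement, so the one-primary-per-DC layout has to be arranged through server-level constraints.

```cypher
// Hypothetical database with one primary copy per data center,
// assuming three servers, one in each DC.
CREATE DATABASE critical TOPOLOGY 3 PRIMARIES;
```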
Recovering from the loss of a data center
Losing one DC in this setup does not cost the cluster its quorum, so it keeps running, only with reduced fault tolerance (no room for further failures).
To restore fault tolerance, you can either wait until the affected DC is back online or start a new primary member elsewhere to re-establish three-DC fault tolerance.
Example recovery steps
- Start and enable a new server. See How to add a server to the cluster for details.
- Remove the unavailable server from the cluster, as sketched below:
  - First, deallocate databases from it.
  - Then drop the server.
  For more information, see Managing servers in a cluster.
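A rough Cypher sketch of these steps; the server IDs are placeholders.

```cypher
// Run against the system database on an available server.

// Enable the newly started server so it can host databases.
ENABLE SERVER "new-server-id";

// Move database allocations off the lost server, then remove it.
DEALLOCATE DATABASES FROM SERVER "unavailable-server-id";
DROP SERVER "unavailable-server-id";
```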
For detailed scenarios, see the Disaster recovery guide.
Exclusive geo-distribution for the `system` database
Figure: the `system` database distributed across three data centers.
You can place all primaries for user databases in one data center (DC) and all secondaries in another.
In a third DC, deploy a server that only hosts a primary member of the `system` database (in addition to those in the first two data centers).
- This server can be a small machine, since the `system` database has minimal resource requirements.
- To prevent user databases from being allocated to it, set the `allowedDatabases` constraint to some name that will never be used, as sketched below.
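A minimal sketch of that constraint, assuming a hypothetical server ID; `no-such-database` is an arbitrary name no real database will ever use, and the `system` database is hosted regardless of this option.

```cypher
// Confine the third-DC server to the system database by allowing
// only a database name that will never exist.
ALTER SERVER "third-dc-server-id" SET OPTIONS {allowedDatabases: ["no-such-database"]};
```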
Your writes will be fast, because they occur within a single DC.
If a DC goes down, you retain write availability for the `system` database, which makes restoring write availability to the user databases easier.
However, if the DC with the primaries goes down, the user databases become write-unavailable, though read availability may still be maintained via the secondaries.
Recovering from the loss of a data center
If you lose the DC with the primaries, the user databases become write-unavailable, though the secondaries should continue to provide read availability.
Because of the third DC, the `system` database remains write-available, so you can restore write availability to the user databases without process downtime.
However, if you need to use the `dbms.recreateDatabase()` procedure, it involves downtime for the user database.
Example recovery steps
- Mark missing servers as not available by cordoning them. For each `Unavailable` server, run `CALL dbms.cluster.cordonServer("unavailable-server-id")` on one of the available servers.
- Recreate each user database, letting it select the existing servers as seeders, as sketched below. You need to accept a smaller topology that fits in the remaining data center.
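These steps might look as follows in Cypher; the server ID, database name, and topology are assumptions for illustration.

```cypher
// Run on one of the available servers.

// List the unavailable servers, then cordon each of them.
SHOW SERVERS YIELD name, health WHERE health = "Unavailable" RETURN name;
CALL dbms.cluster.cordonServer("unavailable-server-id");

// Accept a smaller topology, then recreate each user database.
ALTER DATABASE neo4j SET TOPOLOGY 1 PRIMARY 1 SECONDARY;
CALL dbms.recreateDatabase("neo4j");
```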
For detailed scenarios, see the Disaster recovery guide.
Cluster design patterns to avoid
Two data centers with unbalanced membership
Suppose you decide to set up just two data centers, placing two primaries in data center 1 (DC1) and one primary in data center 2 (DC2).
If the writer primary is located in DC1, then writes can be fast because a local quorum can be reached.
This setup can tolerate the loss of one data center, but only if the failure is in DC2. If DC1 fails, you lose two primary members, which means the quorum is lost and the cluster becomes unavailable for writes.
Keep in mind that any issue could push the system back to cross-data center write latencies. Worse, because of the latency, the member in DC2 may fall behind. In that case, a failure of a member in DC1 means the database is write-unavailable until the DC2 member has caught up.
If leadership shifts to DC2, all writes become slow.
Finally, there is no guarantee against data loss if DC1 goes down, because the primary member in DC2 may lag behind on writes, even at the append stage.
Two data centers with balanced membership
The worst scenario is to operate with just two data centers and place two or three primaries in each of them.
The failure of either data center then leads to loss of quorum and, therefore, loss of the cluster's write availability.
Besides, all writes have to pay the cross-data center latency cost.
This design pattern is strongly discouraged.
Summary
Setup | Design | Pros | Cons | Best use case |
---|---|---|---|---|
Recommended patterns | | | | |
Secondaries for read resilience | Primaries in one data center, secondaries in other data centers. | Fast writes and fast reads. | Loss of the DC with the primaries removes write availability until the databases are recreated in the surviving DC. | Applications needing fast writes. The cluster can tolerate downtime during recovery. |
Geo-distributed data centers (3DC) | Each primary in a different data center (≥3). | Tolerates the loss of a full data center without losing write availability or data. | Every write pays cross-data center latency. | Critical systems needing continuous availability even if a full data center fails. |
Full geo-distribution for the `system` database | User database primaries in one DC, secondaries in another, and an extra `system` database primary in a third DC. | Fast writes; the `system` database stays write-available if a DC is lost, which eases recovery. | User databases become write-unavailable if the DC with their primaries fails. | Balanced approach: fast normal operations, easier recovery, some downtime acceptable. |
Non-recommended patterns | | | | |
Two DCs – Unbalanced membership | Two primaries are in DC1, one primary is in DC2. | Fast writes if a leader is in DC1. | Loss of DC1 means loss of quorum and write availability, with no guarantee against data loss. | Should be avoided. |
Two DCs – Balanced membership | Equal primaries in two DCs. | (none significant) | Loss of either DC means loss of quorum; all writes pay cross-data center latency. | Should be avoided. |