This section gives an introduction to Neo4j Causal Clustering.
Neo4j’s Causal Clustering provides three main features: safety, in that Core Servers provide a fault-tolerant platform for transaction processing; scale, in that Read Replicas provide a massively scalable platform for graph queries; and causal consistency, in that a client application is guaranteed to read at least its own writes.
Together, these features allow the end-user system to remain fully functional, reading from and writing to the database even in the event of multiple hardware and network failures, and they make reasoning about database interactions straightforward.
In the remainder of this section we will provide an overview of how causal clustering works in production, including both operational and application aspects.
From an operational point of view, it is useful to view the cluster as being composed of its two different roles: Core and Read Replica.
The two roles are foundational in any production deployment but are managed at different scales from one another and undertake different roles in managing the fault tolerance and scalability of the overall cluster.
Core Servers' main responsibility is to safeguard data.
The Core Servers do so by replicating all transactions using the Raft protocol.
Raft ensures that the data is safely durable before confirming transaction commit to the end user application.
In practice this means that once a majority of Core Servers in a cluster (N/2 + 1) have accepted the transaction, it is safe to acknowledge the commit to the end user application.
This safety requirement has an impact on write latency: writes are acknowledged by the fastest majority, but as the number of Core Servers in the cluster grows, so does the size of the majority needed to acknowledge a write.
In practice this means that there are relatively few machines in a typical Core Server cluster: just enough to provide sufficient fault tolerance for the specific deployment. This is calculated with the formula M = 2F + 1, where M is the number of Core Servers required to tolerate F faults. For example, in order to tolerate 2 failed Core Servers we would need to deploy a cluster of 5 Core Servers.
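The two formulas above can be checked with a few lines of arithmetic (a sketch; the function names are illustrative, not part of any Neo4j API):

```python
def core_servers_needed(faults_to_tolerate: int) -> int:
    """M = 2F + 1: Core Servers needed to tolerate F failed servers."""
    return 2 * faults_to_tolerate + 1

def majority(cluster_size: int) -> int:
    """N/2 + 1: the smallest majority that must accept a transaction."""
    return cluster_size // 2 + 1

# A 5-server cluster tolerates 2 failures and commits once 3 servers accept.
print(core_servers_needed(2))  # 5
print(majority(5))             # 3
```

Note that an even-sized cluster buys no extra fault tolerance: majority(6) is 4, so a 6-server cluster still tolerates only 2 failures, the same as 5 servers.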
Note that should the Core Server cluster suffer enough failures that it can no longer process writes, it will become read-only to preserve safety.
Read Replicas' main responsibility is to scale out graph workloads (Cypher queries, procedures, and so on). Read Replicas act like caches for the data that the Core Servers safeguard, but they are not simple key-value caches. In fact Read Replicas are fully-fledged Neo4j databases capable of fulfilling arbitrary (read-only) graph queries and procedures.
Read Replicas are asynchronously replicated from Core Servers via transaction log shipping. Periodically (usually in the millisecond range) a Read Replica will poll a Core Server for any new transactions that it has processed since the last poll, and the Core Server will ship those transactions to the Read Replica. Many Read Replicas can be fed data from a relatively small number of Core Servers, allowing for a large fan-out of the query workload for scale.
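The poll-and-ship cycle can be sketched as follows. This is a minimal toy model of the idea, not the Neo4j implementation; the class and method names are illustrative:

```python
# Toy model of transaction log shipping between a Core Server and a
# Read Replica. All names here are illustrative.
class Core:
    def __init__(self):
        self.log = []  # committed transactions, in commit order

    def commit(self, tx):
        self.log.append(tx)

    def transactions_since(self, last_seen):
        # Ship every transaction the replica has not yet applied.
        return self.log[last_seen:]

class ReadReplica:
    def __init__(self):
        self.applied = []

    def poll(self, core):
        # Periodically ask a Core Server for new transactions and apply them.
        for tx in core.transactions_since(len(self.applied)):
            self.applied.append(tx)

core = Core()
core.commit("CREATE (:User {name: 'alice'})")
replica = ReadReplica()
replica.poll(core)
print(replica.applied == core.log)  # True: the replica has caught up
```

Because replication is asynchronous, a replica that has not polled recently may briefly lag behind the Core Servers; the bookmark mechanism described below is what protects clients from observing that lag.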
Unlike Core Servers, however, Read Replicas do not participate in decision making about cluster topology. Read Replicas should typically be run in relatively large numbers and treated as disposable. Losing a Read Replica does not impact the cluster’s availability, aside from the loss of its fraction of graph query throughput. It does not affect the fault tolerance capabilities of the cluster.
While the operational mechanics of the cluster are interesting from an application point of view, it is also helpful to think about how applications will use the database to get their work done. In an application we typically want to read from the graph and write to the graph. Depending on the nature of the workload we usually want reads from the graph to take into account previous writes to ensure causal consistency.
Causal consistency is one of numerous consistency models used in distributed computing. It ensures that causally related operations are seen by every instance in the system in the same order. Client applications never see stale data and (logically) interact with the database as if it were a single server. Consequently client applications enjoy read-your-own-writes semantics, making interaction with even large clusters simple and predictable.
Causal consistency makes it easy to write to Core Servers (where data is safe) and read those writes from a Read Replica (where graph operations are scaled out). For example, causal consistency guarantees that the write which created a user account will be present when that same user subsequently attempts to log in.
On executing a transaction, the client can ask for a bookmark which it then presents as a parameter to subsequent transactions. Using that bookmark the cluster can ensure that only servers which have processed the client’s bookmarked transaction will run its next transaction. This provides a causal chain which ensures correct read-after-write semantics from the client’s point of view.
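The routing decision behind bookmarks can be sketched as a simple filter. This is a toy model of the rule described above, not the driver's actual implementation; a bookmark here is reduced to the id of the last transaction the client has seen, and all names are illustrative:

```python
# Toy model of bookmark-based routing: only servers that have already
# applied the client's bookmarked transaction may serve its next query.
class Server:
    def __init__(self, name, last_applied_tx):
        self.name = name
        self.last_applied_tx = last_applied_tx

def eligible_servers(servers, bookmark):
    """Return the servers that have caught up to the bookmark."""
    return [s for s in servers if s.last_applied_tx >= bookmark]

servers = [
    Server("replica-1", last_applied_tx=90),   # lagging
    Server("replica-2", last_applied_tx=100),  # caught up
    Server("core-1", last_applied_tx=100),
]
# The client's last write was transaction 95, so replica-1 is too stale.
print([s.name for s in eligible_servers(servers, bookmark=95)])
# ['replica-2', 'core-1']
```

In practice a lagging server need not be excluded outright: it can simply wait until it has applied the bookmarked transaction before running the query. Either way, the client never reads past its own writes.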
Aside from the bookmark everything else is handled by the cluster. The database drivers work with the cluster topology manager to choose the most appropriate Core Servers and Read Replicas to provide high quality of service.
In this section we have examined Causal Clustering at a high level from an operational and an application development point of view. We now understand that the Core Servers in the cluster are responsible for the long-term safekeeping of data, while the more numerous Read Replicas are responsible for scaling out graph query workloads. Reasoning about this powerful architecture is greatly simplified by the Neo4j drivers, which abstract the cluster topology and provide consistency guarantees such as causal consistency.