4.2.9. Configure for multi-data center operations

This section shows how to configure Neo4j servers so that they are topology/data center-aware. It describes the precise configuration needed to achieve a scalable multi-data center deployment.

Enabling multi-data center operation

The multi-data center functionality is separately licensed and must be specifically enabled. See the section called “Licensing for multi-data center operations” for details.

Enable multi-data center operations

Before doing anything else, we must enable the multi-data center functionality. This is described in the section called “Licensing for multi-data center operations”.

Server groups

In order to optimize the use of our Causal Cluster servers according to our specific requirements, we sort them into Server Groups. Server Group membership can map to data centers, availability zones, or any other significant topological elements from the operator’s domain. Server Groups can also overlap.

Server Groups are defined as a key that maps onto a set of servers in a Causal Cluster. Server Group membership is defined on each server using the causal_clustering.server_groups parameter in neo4j.conf. Each server in a Causal Cluster can belong to zero or many server groups.

Example 4.12. Definition of Server Group membership

The membership of a server group or groups can be set in neo4j.conf as in the following examples:

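# Add the current instance to the groups `us` and `us-east`
causal_clustering.server_groups=us,us-east

# Add the current instance into the group `london`
causal_clustering.server_groups=london

# Add the current instance into the group `eu`
causal_clustering.server_groups=eu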

We must be aware that membership of each server group is explicit. For example, a server in the gb-london group is not automatically part of some gb or eu group unless that server is explicitly added to those groups. That is, any (implied) relationship between groups is reified only when those groups are used as the basis for requesting data from upstream systems.
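The explicit nature of group membership can be illustrated with a minimal Java sketch. It is not Neo4j code: the class ExplicitMembershipSketch and its isMember method are hypothetical, and the set of strings simply stands in for the value of causal_clustering.server_groups on one server.

```java
import java.util.Set;

// Hypothetical sketch; not part of Neo4j.
final class ExplicitMembershipSketch {

    // A server belongs to exactly the groups listed in its own configuration.
    // No parent group is inferred from a name such as gb-london.
    static boolean isMember(Set<String> configuredGroups, String group) {
        return configuredGroups.contains(group);
    }

    public static void main(String[] args) {
        Set<String> londonOnly = Set.of("gb-london");
        System.out.println(isMember(londonOnly, "gb-london")); // prints true
        System.out.println(isMember(londonOnly, "gb"));        // prints false
    }
}
```

For the server to count as a member of gb or eu as well, those names would have to appear in its configured set.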

Server Groups are not mandatory, but unless they are present, we cannot set up specific upstream transaction dependencies for servers. In the absence of any specified server groups, the cluster defaults to its most pessimistic fallback behavior: each Read Replica will catch up from a random Core Server.

Strategy plugins

Strategy plugins are sets of rules that define how Read Replicas contact servers in the cluster in order to synchronize transaction logs. Neo4j comes with a set of pre-defined strategies, and also provides a Domain Specific Language (DSL) to flexibly create user-defined strategies. Finally, Neo4j supports an API which advanced users may use to enhance upstream recommendations.

Once a strategy plugin resolves a satisfactory upstream server, it is used for pulling transactions to update the local Read Replica for a single synchronization. For subsequent updates, the procedure is repeated so that the most preferred available upstream server is always resolved.

Configuring upstream selection strategy using pre-defined strategies

Neo4j ships with the following pre-defined strategy plugins:

Plugin name | Resulting behavior
connect-to-random-core-server | Connect to any Core Server selecting at random from those currently available.
typically-connect-to-random-read-replica | Connect to any available Read Replica, but around 10% of the time connect to any random Core Server.
connect-randomly-within-server-group | Connect at random to any available instance (Core Servers and Read Replicas) in any of the server groups specified in causal_clustering.server_groups.

Pre-defined strategies are used by configuring the causal_clustering.upstream_selection_strategy option. Doing so allows us to specify an ordered preference of strategies to resolve an upstream provider of transaction data. We provide a comma-separated list of strategy plugin names with preferred strategies earlier in that list. The upstream strategy is chosen by asking each of the strategies in list-order whether they can provide an upstream server from which transactions can be pulled.

Example 4.13. Define an upstream selection strategy

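Consider the following configuration example:

causal_clustering.upstream_selection_strategy=connect-randomly-within-server-group,typically-connect-to-random-read-replica
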
With this configuration the instance will first try to connect to any other instance in the group(s) specified in causal_clustering.server_groups. Should we fail to find any live instances in those groups, then we will connect to a random Read Replica.

Figure 4.16. Pipeline of strategies: the first satisfactory response from a strategy will be used.

To ensure that downstream servers can still access live data in the event of upstream failures, the last resort of any instance is always to contact a random Core Server. This is equivalent to ending the causal_clustering.upstream_selection_strategy configuration with connect-to-random-core-server.
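The list-order resolution and the final fallback can be sketched in Java as follows. This is only an illustration under stated assumptions, not Neo4j's implementation: the class UpstreamPipelineSketch, its resolve method, and the server names are hypothetical, and each strategy is modelled as a supplier that either yields a server or yields nothing.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch; none of these types are part of Neo4j.
final class UpstreamPipelineSketch {

    // Ask each strategy in list order; the first non-empty answer wins.
    // If every strategy comes back empty, use the last resort, which
    // stands in for contacting a random Core Server.
    static String resolve(List<Supplier<Optional<String>>> strategies,
                          Supplier<String> lastResort) {
        for (Supplier<Optional<String>> strategy : strategies) {
            Optional<String> upstream = strategy.get();
            if (upstream.isPresent()) {
                return upstream.get();
            }
        }
        return lastResort.get();
    }

    public static void main(String[] args) {
        // Assumed scenario: no live server in our groups, one live Read Replica.
        Supplier<Optional<String>> serverGroup = () -> Optional.empty();
        Supplier<Optional<String>> readReplica = () -> Optional.of("replica-2");
        String chosen = resolve(List.of(serverGroup, readReplica), () -> "core-1");
        System.out.println(chosen); // prints replica-2
    }
}
```

Because the procedure is repeated for every synchronization, a server group that becomes available again would be preferred on the next call.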

Configuring user-defined strategies

Neo4j Causal Clusters support a small DSL for the configuration of client-cluster load balancing. This is described in detail in the section called “Policy definitions” and the section called “Filters”. The same DSL is used to describe preferences for how an instance binds to another instance to request transaction updates.

The DSL is made available by selecting the user-defined strategy as follows:

causal_clustering.upstream_selection_strategy=user-defined

Once the user-defined strategy has been specified, we can add configuration to the causal_clustering.user_defined_upstream_strategy setting based on the server groups that have been set for the cluster.

We will describe this functionality with two examples:

Example 4.14. Defining a user-defined strategy

For illustrative purposes we propose four regions: north, east, south, and west. Within each region we have a number of data centers, such as north1 or west2. We configure our server groups so that each data center maps to its own server group. Additionally, we assume that each data center fails independently of the others and that a region can act as a supergroup of its constituent data centers. An instance in the north region might therefore have a configuration like causal_clustering.server_groups=north2,north, which puts it in two groups that match our physical topology, as shown in the diagram below.

Figure 4.17. Mapping regions and data centers onto server groups

Once we have our server groups, our next task is to define some upstream selection rules based on them. For our design purposes, let’s say that any instance in one of the north region data centers prefers to catch up within its own data center if it can, but will resort to any northern instance otherwise. To configure that behavior we add:

causal_clustering.user_defined_upstream_strategy=groups(north2); groups(north); halt()

The configuration is in precedence order from left to right. The groups() operator yields a server group from which to catch up. In this case only if there are no servers in the north2 server group will we proceed to the groups(north) rule which yields any server in the north server group. Finally, if we cannot resolve any servers in any of the previous groups, then we will stop the rule chain via halt().

Note that the use of halt() will end the rule chain explicitly. If we don’t use halt() at the end of the rule chain, then the all() rule is implicitly added. all() is expansive: it offers up all servers and so increases the likelihood of finding an available upstream server. However all() is indiscriminate and the servers it offers are not guaranteed to be topologically or geographically local, potentially increasing the latency of synchronization.
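The difference between ending a rule chain with halt() and relying on the implicit all() can be sketched in Java. This is a simplified model rather than Neo4j's DSL evaluator: the class HaltVersusAllSketch, the evaluate method, and the server names are hypothetical, and each inner list stands for whatever one groups() rule yielded.

```java
import java.util.List;

// Hypothetical sketch; not Neo4j's DSL implementation.
final class HaltVersusAllSketch {

    // Rule results are given in precedence order; the first non-empty one wins.
    // If every rule is empty, halt() yields nothing, whereas the implicit
    // all() offers every server in the cluster.
    static List<String> evaluate(List<List<String>> ruleResults,
                                 boolean endsWithHalt,
                                 List<String> allServers) {
        for (List<String> result : ruleResults) {
            if (!result.isEmpty()) {
                return result;
            }
        }
        if (endsWithHalt) {
            return List.of();
        }
        return allServers;
    }

    public static void main(String[] args) {
        List<String> north2 = List.of();   // assumed: no north2 servers available
        List<String> north = List.of();    // assumed: no north servers available
        List<String> cluster = List.of("south1-a", "west2-b");
        System.out.println(evaluate(List.of(north2, north), true, cluster));  // prints []
        System.out.println(evaluate(List.of(north2, north), false, cluster)); // prints [south1-a, west2-b]
    }
}
```

In the second call the instance still finds an upstream server, but one that may sit in a distant region.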

The example above shows a simple hierarchy of preferences. But we can be more sophisticated if we so choose. For example we can place conditions on the server groups from which we catch up.

Example 4.15. User-defined strategy with conditions

In this example we wish to roughly qualify cluster health before choosing from where to catch up. For this we use the min() filter as follows:

causal_clustering.user_defined_upstream_strategy=groups(north2)->min(3); groups(north)->min(3); all();

groups(north2)->min(3) states that we want to catch up from the north2 server group if it has at least three available machines, which we here take as an indicator of good health. If north2 can’t meet that requirement (is not healthy enough), then we try to catch up from any server across the north region, provided at least three of them are available, as per groups(north)->min(3). Finally, if we cannot catch up from a sufficiently healthy north region, then we’ll (explicitly) fall back to the whole cluster with all().

The min() filter is a simple but reasonable indicator of server group health.
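The effect of the min() filter on the rule chain from this example can be sketched in Java. The sketch is a model under stated assumptions, not Neo4j's implementation: the class MinFilterSketch, its methods, and the server names are hypothetical, and the map stands for the servers currently available in each server group.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Neo4j's DSL implementation.
final class MinFilterSketch {

    // min(n): keep the candidate set only if it holds at least n servers.
    static List<String> min(List<String> servers, int minimum) {
        if (servers.size() >= minimum) {
            return servers;
        }
        return List.of();
    }

    // Mirrors the chain: north2 with min(3), then north with min(3), then all().
    static List<String> candidates(Map<String, List<String>> availableByGroup,
                                   List<String> allServers) {
        List<String> empty = List.of();
        List<String> dataCenter = min(availableByGroup.getOrDefault("north2", empty), 3);
        if (!dataCenter.isEmpty()) {
            return dataCenter;
        }
        List<String> region = min(availableByGroup.getOrDefault("north", empty), 3);
        if (!region.isEmpty()) {
            return region;
        }
        return allServers;
    }

    public static void main(String[] args) {
        // Assumed scenario: north2 has only two live servers, north has three.
        Map<String, List<String>> available = Map.of(
                "north2", List.of("n2-a", "n2-b"),
                "north", List.of("n1-a", "n2-a", "n2-b"));
        List<String> all = List.of("n1-a", "n2-a", "n2-b", "s1-a");
        System.out.println(candidates(available, all)); // prints [n1-a, n2-a, n2-b]
    }
}
```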

Building upstream strategy plugins using Java

Neo4j supports an API which advanced users may use to enhance upstream recommendations in arbitrary ways: load, subnet, machine size, or anything else accessible from the JVM. In such cases we are invited to build our own implementations of org.neo4j.causalclustering.readreplica.UpstreamDatabaseSelectionStrategy to suit our own needs, and register them with the strategy selection pipeline just like the pre-packaged plugins.

We have to override the org.neo4j.causalclustering.readreplica.UpstreamDatabaseSelectionStrategy#upstreamDatabase() method in our code. Extending that class gives us access to the following items:

Resource Description


This is a directory service which provides access to the addresses of all servers and server groups in the cluster.


This provides the configuration from neo4j.conf for the local instance. Configuration for our own plugin can reside here.


This provides the unique cluster MemberId of the current instance.
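To make the shape of such a plugin more concrete, the following self-contained Java sketch mimics it with stand-in types. Everything in it is hypothetical: StrategyStandIn is not the Neo4j base class, and the Optional<String> return type and the IOPS map are assumptions made so that the example compiles without the Neo4j jars. A real plugin would extend org.neo4j.causalclustering.readreplica.UpstreamDatabaseSelectionStrategy and use the resources listed above instead.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins; a real plugin extends the Neo4j class named above.
final class PluginShapeSketch {

    // Stand-in for the abstract strategy class: one method to override.
    abstract static class StrategyStandIn {
        abstract Optional<String> upstreamDatabase();
    }

    // Picks the server with the highest measured IOPS, if any is known.
    static final class PreferServersWithHighIops extends StrategyStandIn {
        private final Map<String, Integer> iopsByServer; // assumed metric source

        PreferServersWithHighIops(Map<String, Integer> iopsByServer) {
            this.iopsByServer = iopsByServer;
        }

        @Override
        Optional<String> upstreamDatabase() {
            return iopsByServer.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey);
        }
    }

    public static void main(String[] args) {
        StrategyStandIn strategy =
                new PreferServersWithHighIops(Map.of("core-1", 900, "replica-2", 1500));
        System.out.println(strategy.upstreamDatabase()); // prints Optional[replica-2]
    }
}
```

Returning an empty result when no server qualifies lets the next strategy in the configured list take over.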

Once our code is written and tested, we have to prepare it for deployment. UpstreamDatabaseSelectionStrategy plugins are loaded via the Java Service Loader. This means that when we package our code into a jar file, we have to create a file META-INF/services/org.neo4j.causalclustering.readreplica.UpstreamDatabaseSelectionStrategy in which we write the fully qualified class name(s) of the plugins, e.g. org.example.myplugins.PreferServersWithHighIOPS.

To deploy this jar into the Neo4j server we simply copy it into the plugins directory and restart the instance.

Favoring data centers

In the section called “Bias cluster leadership with follower-only instances” we saw how we can bias the leadership credentials of instances in a cluster. Generally speaking, this is rarely needed in a homogeneous cluster inside a single data center.

In a multi-DC scenario it remains a rare requirement, but it does allow expert operators to bias which data centers host Raft leaders (and thus where writes are directed). We apply causal_clustering.refuse_to_be_leader=true in those data centers where we do not want leaders to materialize. In doing so we implicitly prefer the instances where we have not applied that setting.

This may be useful when planning highly distributed multi-data center deployments. However, it must be considered very carefully, because in failure scenarios it limits the availability of the cluster. It is advisable to engage Neo4j professional services to help design a suitably resilient topology.