Multi-data center routing

Introduction

When deploying a multi-data center cluster it is often desirable to take advantage of locality to reduce latency and improve performance. For example, it is preferable that graph-intensive workloads are executed in the local data center at LAN latencies rather than in a faraway data center at WAN latencies. Neo4j’s load balancing and catchup strategy plugins for multi-data center scenarios facilitate precisely this.

Neo4j’s load balancing is a cooperative system in which the driver asks the cluster on a recurring basis where it should direct the different classes of its workload (e.g., writes and reads). This allows the driver to work independently for long stretches of time, yet check back from time to time to adapt to changes such as a new server having been added for increased capacity. In certain failure situations the driver asks again immediately, for example when it cannot use any of its allocated servers.

This is mostly transparent from the perspective of a client. On the server side, the load balancing behaviors are configured using a simple Domain Specific Language (DSL) and exposed under a named load balancing policy which the driver can bind to. All server-side configuration is performed on the Primary servers.

Catchup strategy plugins are sets of rules that define how secondary servers contact upstream servers in the cluster in order to synchronize transaction logs. Neo4j comes with a set of pre-defined strategies, and user-defined strategies can also be created using the same DSL. Finally, Neo4j supports an API which advanced users may use to enhance upstream recommendations.

Once a catchup strategy plugin resolves a satisfactory upstream server, it is used for pulling transactions to update the local secondary for a single synchronization. For subsequent updates, the procedure is repeated so that the most preferred available upstream server is always resolved.

Prerequisite configuration

Server tags

Both load balancing across multiple data centers and user-defined catchup strategies are predicated on the Server Tag concept.

In order to optimize the use of the cluster’s servers according to the specific requirements, they are grouped using Server Tags. Server tags can map to data centers, availability zones, or any other significant topological elements from the operator’s domain, e.g., us, us-east. Applying the same tag to multiple servers logically groups them together. Note that servers can have multiple tags.

Server tags are defined as a key that maps onto a set of servers in a cluster. Server tags are defined on each server using the initial.server.tags parameter in neo4j.conf. Each server in a cluster can be tagged with zero or more server tags. Server tags can be altered at runtime via the ALTER SERVER command, see Altering server options for more details.

Example 1. Definition of grouping servers using server tags

Grouping servers using server tags is achieved in neo4j.conf as in the following examples:

Tag the current instance with us and us-east
initial.server.tags=us,us-east
Tag the current instance with london
initial.server.tags=london
Tag the current instance with eu
initial.server.tags=eu

Note that membership of a group implied by a server tag is explicit. For example, a server tagged with gb-london is not automatically part of the same tag group as a server that is tagged with gb or eu unless that server is also explicitly tagged with those tags.
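
Server tags can also be changed on a running cluster via ALTER SERVER. The following is only a sketch: the server name server01 is illustrative, and the exact option syntax is documented under Altering server options.

ALTER SERVER 'server01' SET OPTIONS {tags: ['us', 'us-east']}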

Primaries for reading

Depending on the deployment and the number of servers available in the cluster, different strategies make sense for whether or not the read workload should be routed to the primary servers. The following configuration allows the routing of read workload to primary servers. Valid values are true and false.

dbms.routing.reads_on_primaries_enabled=true

The load balancing framework

There are different topology-aware load balancing options available for client applications in a multi-data center Neo4j deployment. The cluster's load balancing can be configured in different ways so that client applications direct their workload at the most appropriate cluster members, such as those nearby.

The load balancing system is based on a plugin architecture for future extensibility and for allowing user customizations. The current version ships with exactly one such canned plugin called the server policies plugin.

The server policies plugin is selected by setting the following property:

dbms.routing.load_balancing.plugin=server_policies

Under the server policies plugin, a number of load balancing policies can be configured server-side and exposed to drivers under unique names. The drivers, in turn, must select an appropriate policy on instantiation by specifying its name. Policies are commonly named after geographical regions or intended application groups.

It is crucial to define the exact same policies on all servers, since this is to be regarded as cluster-wide configuration and failure to do so leads to unpredictable behavior. Similarly, policies in active use should not be removed or renamed, since doing so breaks applications that try to use them. It is perfectly acceptable, and expected, however, that policies are modified under the same name.

If a driver asks for a policy name that is not available, then the driver is not able to use the cluster. A driver that does not specify any name at all gets the behavior of the default policy as configured. The default policy, if left unchanged, distributes the load across all servers. It is possible to change the default policy to any behavior that a named policy can have.

A misconfigured driver or load balancing policy results in suboptimal routing choices and can even prevent successful interactions with the cluster entirely.

The details of how to write a custom plugin are not documented here. Please contact Neo4j Professional Services if you think that you need a custom plugin.

Use load balancing from Neo4j drivers

Once enabled and configured, the custom load balancing feature is used by drivers to route traffic as intended. See the Neo4j Drivers manuals for instructions on how to configure drivers to use custom load balancing.
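
As an illustration only, with drivers that accept a neo4j:// routing URI the policy name is typically supplied as part of the routing context in the connection URI. The host name below is hypothetical, and mypolicy is assumed to be a policy defined on the servers:

neo4j://server1.example.com:7687?policy=mypolicy

The exact mechanism for supplying the routing context differs between driver languages and versions, so consult the relevant driver manual.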

Policy definitions

The configuration of load balancing policies is transparent to client applications and expressed via a simple DSL. The syntax consists of a set of rules which are considered in order. The first rule to produce a non-empty result is the final result.

rule1; rule2; rule3

Each rule in turn consists of a set of filters which limit the considered servers, starting with the complete set. Note that the evaluation of each rule starts fresh with the complete set of available servers.

There is a fixed set of filters from which rules are composed, and the filters are chained together using arrows.

filter1 -> filter2 -> filter3

If there are any servers still left after the last filter, then the rule evaluation has produced a result and this is returned to the driver. However, if there are no servers left, then the next rule is considered. If no rule is able to produce a usable result, then a failure is signalled to the driver.

Policy names

The policies are configured under the namespace of the server_policies plugin and named as desired. Policy names can contain alphanumeric characters and underscores, and they are case sensitive. Below is the property key for a policy with the name mypolicy.

dbms.routing.load_balancing.config.server_policies.mypolicy=

The actual policy is defined in the value part using the DSL.

The default policy name is reserved for the default policy. It is possible to configure this policy like any other and it is used by driver clients that do not specify a policy.
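
For example, the default policy could be configured, using the filters described below, to prefer servers tagged with eu (a tag from Example 1) while still falling back to the whole cluster. This is a sketch only; any rule chain that is valid for a named policy is equally valid for default:

dbms.routing.load_balancing.config.server_policies.default=\
tags(eu)->min(2);\
all();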

Additionally, any number of policies can be created using unique policy names. The policy name can suggest a particular region or an application for which it is intended to be used.

Filters

There are four filters available for specifying rules, detailed below. The syntax is similar to a method call with parameters.

  • tags(name1, name2, …​)

    • Only servers that are tagged with any of the specified tags pass the filter.

    • The defined names must match those of the server tags.

    • Prior to Neo4j 5.4, tags() was referred to as groups(). groups() continues to work but is deprecated.

  • min(count)

    • Servers pass this filter only if at least count of them remain; otherwise none pass.

    • Allows overload conditions to be managed.

  • all()

    • No need to specify since it is implicit at the beginning of each rule.

    • Implicitly the last rule (override this behavior using halt).

  • halt()

    • Only makes sense as the last filter in the last rule.

    • Stops the processing of any more rules.

The tags filter is essentially an OR-filter: for example, tags(A,B) passes any server with either tag A, tag B, or both (the union of the server sets). An AND-filter can also be created by chaining two filters, as in tags(A) -> tags(B), which only passes servers with both tags (the intersection of the server sets).
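
As an illustration, the following hypothetical policy (its name and tags reuse those from Example 1) first requires servers carrying both the eu and london tags, then accepts any eu server, and finally falls back to the whole cluster:

dbms.routing.load_balancing.config.server_policies.eu_london=\
tags(eu)->tags(london)->min(2);\
tags(eu);\
all();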

Load balancing examples

The discussion on multi-data center clusters introduced a four-region, multi-data center setup, with regions named after the cardinal compass points and numbered data centers within those regions. The same hypothetical setup is used here as well.

Figure 1. Mapping regions and data centers onto server tags

The behavior of the load balancer is configured in the property dbms.routing.load_balancing.config.server_policies.<policy-name>. The specified rules allow for fine-tuning how the cluster routes requests under load.

The examples make use of the line continuation character \ for better readability. It is valid syntax in neo4j.conf as well, and it is recommended to break up complicated rule definitions in this way, with a new rule on every line.

The most restrictive strategy is to insist on a particular data center to the exclusion of all others:

Example 2. Specific data center only
dbms.routing.load_balancing.config.server_policies.north1_only=\
tags(north1)->min(2); halt();

This case states that the intention is to send queries to servers tagged with north1, which maps onto a specific physical data center, provided there are two of them available. If at least two servers tagged with north1 cannot be provided, then the operation should halt(), i.e. not try any other data center.

While the previous example demonstrates the basic form of load balancing rules, it is possible to be a little more expansive:

Example 3. Specific data center preferably
dbms.routing.load_balancing.config.server_policies.north1=\
tags(north1)->min(2);

In this case if at least two servers are tagged with north1 then the load is balanced across them. Otherwise, any server in the whole cluster is used, falling back to the implicit, final all() rule.

The previous example considered only a single data center before resorting to the whole cluster. If there is a hierarchy or region concept exposed through the server tags, the fallback can be more graceful:

Example 4. Gracefully falling back to neighbors
dbms.routing.load_balancing.config.server_policies.north_app1=\
tags(north1,north2)->min(2);\
tags(north);\
all();

This example says that the cluster should load balance across servers with the north1 and north2 tags provided there are at least two machines available across them. Failing that, any server in the north region can be used, and if the whole of the north is offline, any server in the cluster can be used.

Catchup strategy plugins

Catchup strategy plugins are sets of rules that define how secondaries contact upstream servers in the cluster in order to synchronize transaction logs. Neo4j comes with a set of pre-defined strategies, and also leverages the DSL to flexibly create user-defined strategies. Finally, Neo4j supports an API which advanced users may use to enhance upstream server recommendations.

Once a catchup strategy plugin resolves a satisfactory upstream server, it is used for pulling transactions to update the local secondary for a single synchronization. For subsequent updates, the procedure is repeated so that the most preferred available upstream server is always resolved.

Configuring upstream selection strategy using pre-defined catchup strategies

Neo4j ships with the following pre-defined catchup strategy plugins. These provide coarse-grained algorithms for selecting an upstream server:

  • connect-to-random-primary-server

    Connect to any primary server, selecting at random from those currently available.

  • typically-connect-to-random-secondary

    Connect to any available secondary server, but around 10% of the time connect to any random primary server.

  • connect-randomly-to-server-tags

    Connect at random to any available secondary server tagged with any of the server tags specified in the comma-separated list server.cluster.catchup.connect_randomly_to_server_tags.

  • leader-only

    Connect only to the current Raft leader of the primary servers.

  • connect-randomly-to-server-group

    Connect at random to any available secondary server in the server groups specified in the comma-separated list server.cluster.catchup.connect_randomly_to_server_group. Deprecated, please use connect-randomly-to-server-tags.

  • connect-randomly-within-server-group

    Connect at random to any available secondary server in any of the server groups to which this server belongs. Deprecated, please use connect-randomly-to-server-tags.

Pre-defined strategies are used by configuring the server.cluster.catchup.upstream_strategy option. This allows an ordered preference of strategies to be specified for resolving an upstream provider of transaction data. The option takes a comma-separated list of strategy plugin names, with the preferred strategies earlier in the list. The catchup strategy is selected by asking each of the strategies, in list order, whether it can provide an upstream server from which transactions can be pulled.

Example 5. Define an upstream server selection strategy

Consider the following configuration example:

server.cluster.catchup.upstream_strategy=connect-randomly-to-server-tags,typically-connect-to-random-secondary

With this configuration the secondary server first tries to connect to any other server with tag(s) specified in server.cluster.catchup.connect_randomly_to_server_tags. Should it fail to find any live servers with those tags, then it connects to a random secondary server.
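
Note that connect-randomly-to-server-tags is only expected to yield servers if the accompanying tag list is also configured. A minimal sketch of the two settings together, reusing the hypothetical tags from Example 1:

server.cluster.catchup.connect_randomly_to_server_tags=eu,london
server.cluster.catchup.upstream_strategy=connect-randomly-to-server-tags,typically-connect-to-random-secondary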

Figure 2. The first satisfactory response from a strategy will be used.

To ensure that downstream servers can still access live data in the event of upstream failures, the last resort of any server is always to contact a random primary server. This is equivalent to ending the server.cluster.catchup.upstream_strategy configuration with connect-to-random-primary-server.

Configuring user-defined catchup strategies

Neo4j clusters support a small DSL for the configuration of client-cluster load balancing. This is described in detail in Domain Specific Language and Filters. The same DSL is used to describe preferences for how a server binds to another server to request transaction updates.

The DSL is made available by selecting the user-defined catchup strategy as follows:

server.cluster.catchup.upstream_strategy=user-defined

Once the user-defined strategy has been specified, configuration can be added to the server.cluster.catchup.user_defined_upstream_strategy setting, based on the server tags that have been set for the cluster.

This functionality is described with two examples:

Example 6. Defining a user-defined strategy

For illustrative purposes four regions are proposed: north, south, east, and west, and within each region there are a number of data centers such as north1 or west2. The server tags are configured so that each data center maps onto its own server tag. Additionally, it is assumed that each data center fails independently of the others and that a region can act as a supergroup of its constituent data centers. So a server in the north region might have a configuration like initial.server.tags=north2,north, which puts it in two groups that map onto the physical topology, as shown in the diagram below.

Figure 3. Mapping regions and data centers onto server tags

Once the servers are tagged, the next task is to define some upstream selection rules based on those tags. For design purposes, assume that any server in one of the north region data centers prefers to catch up within its own data center if it can, but resorts to any northern instance otherwise. To configure that behavior, add:

server.cluster.catchup.user_defined_upstream_strategy=tags(north2); tags(north); halt()

The configuration is in precedence order from left to right. The tags() operator yields the servers carrying the given tag as candidates from which to catch up. In this case, only if there are no servers tagged with north2 does the operation proceed to the tags(north) rule, which yields any server tagged with north. Finally, if no servers can be resolved with any of the previous tags, then the rule chain is stopped via halt().

Note that the use of halt() ends the rule chain explicitly. If a halt() is not used at the end of the rule chain, then the all() rule is implicitly added. all() is expansive: it offers up all servers and so increases the likelihood of finding an available upstream server. However all() is indiscriminate and the servers it offers are not guaranteed to be topologically or geographically local, potentially increasing the latency of synchronization.

The example above shows a simple hierarchy of preferences expressed through the use of server tags. But the hierarchy can be more sophisticated. For example, conditions can be placed on the tagged catchup servers.

Example 7. User-defined strategy with conditions

In this example the aim is to roughly qualify cluster health before selecting where to catch up from. For this, the min() filter is used as follows:

server.cluster.catchup.user_defined_upstream_strategy=tags(north2)->min(3); tags(north)->min(3); all();

tags(north2)->min(3) states that catchup from servers tagged with north2 should be performed only if there are at least three such servers available, which here is interpreted as an indicator of good health. If north2 can’t meet that requirement, then catchup should be attempted from any server tagged with north, provided there are at least three of them available as per tags(north)->min(3). Finally, if catchup cannot be performed from a sufficiently healthy north region, then the operation (explicitly) falls back to the whole cluster with all().

The min() filter is a simple but reasonable health indicator of a set of servers with the same tag.

Favoring data centers

In a multi-data center scenario it is possible, though rarely necessary, to bias where writes for a specified database are directed. db.cluster.raft.leader_transfer.priority_tag can be used to specify a set of servers with a given tag that should have priority when selecting the leader for a given database. The priority tag can be set for one or multiple databases, and it means that the cluster attempts to keep the leadership for the configured database on a server tagged with the configured server tag.
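
As a sketch, and reusing the hypothetical us tag from Example 1, the preference could be expressed in neo4j.conf as follows; consult the configuration settings reference for how to scope the setting to a particular database:

db.cluster.raft.leader_transfer.priority_tag=us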

A database for which db.cluster.raft.leader_transfer.priority_tag has been configured is excluded from the automatic balancing of leaderships across a cluster. It is therefore recommended to not use this configuration unless it is necessary.