Leadership, routing, and load balancing

Elections and leadership

The servers in a cluster use the Raft protocol to ensure consistency and safety. An implementation detail of Raft is that it uses a Leader role to impose an ordering on an underlying log with other instances acting as Followers which replicate the leader’s state. Specifically in Neo4j, this means that writes to the database are ordered by the server currently playing the Leader role for the respective database.

Only servers hosting a database in primary mode can be elected leaders for that database, provided that the cluster contains more than one primary server. If a Neo4j DBMS cluster contains multiple databases, each one of those databases operates within a logically separate Raft group, and therefore each has an individual leader. This means that a server may act both as Leader for some databases, and as Follower for other databases.

If a follower has not heard from the leader for a while, then it can initiate an election and attempt to become the new leader. The follower makes itself a Candidate and asks other servers to vote for it. If it can get a majority of the votes, then it assumes the leader role. Servers do not vote for a candidate which is less up-to-date than itself. There can only be one leader at any time per database, and that leader is guaranteed to have the most up-to-date log.

Elections are expected to occur during the normal running of a cluster and they do not pose an issue in and of itself. If you are experiencing frequent re-elections and they are disturbing the operation of the cluster then you should try to figure out what is causing them. Some common causes are environmental issues (e.g. a flaky networking) and work overload conditions (e.g. more concurrent queries and transactions than the hardware can handle).

Leadership balancing

Write transactions are always routed to the leader for the respective database. As a result, unevenly distributed leaderships may cause write queries to be disproportionately directed to a subset of servers. By default, Neo4j avoids this by automatically transferring database leaderships so that they are evenly distributed throughout the cluster. Additionally, Neo4j automatically transfers database leaderships away from instances where those databases are configured to be read-only using server.databases.read_only or similar.

Client-side routing

Client-side routing is when the database client takes control over which cluster member to send specific requests to. Typically this would be to make sure that write operations are sent to the server that can write for the target database, and that read operations are sent to other servers.

Client-side routing is based on getting a routing table from a cluster member, and then using that information to make the routing decisions. A routing table contains information about the writers, readers, and routers for a specific database. There is usually one writer, though there may be none if the database is read only or unhealthy. With the default configuration, all other servers that host the database are considered readers, i.e. the writer is not in the list of readers. This is to let it focus on the write load and not have to manage two kinds of interactions. Typically, all servers that host the database are listed as routers, which are servers that can be contacted to get a new routing table for that database.

Neo4j Drivers retrieve a routing table the first time they attempt to connect to a database, and fetch a fresh one after the configured time-to-live, or if it seems the routing table has got out of date. For example, if the routing table lists server-3 as the writer for the database, but write requests get rejected with a not able to write error, the driver may decide to get a new routing table, because the writer could be a different server.

For lower level details about getting routing tables, refer to the Bolt protocol documentation.

Routing policies

You can control the routing table that servers provide by using routing policies. Policies filter the full set of possible servers for each category according to the rules you define. For example, this can be used to preferentially route to a local data centre, or to specific large machines, depending on your policies.

Server-side routing

Server-side routing is a complement to the client-side routing.

In a cluster deployment of Neo4j, Cypher queries may be directed to a cluster member that is unable to run the given queries. With server-side routing enabled, such queries are rerouted internally to a cluster member that is expected to be able to run them. This situation can occur for write-transaction queries when they address a database for which the receiving cluster member is not the leader.

The cluster role for cluster members is per database. Thus, if a write-transaction query is sent to a cluster member that is not the leader for the specified database (specified either via the Bolt Protocol or with Cypher USE clause), server-side routing is performed if properly configured.

Server-side routing is enabled by the DBMS, by setting dbms.routing.enabled=true for each cluster member. The listen address (server.routing.listen_address) and advertised address (server.routing.advertised_address) also need to be configured for server-side routing communication.

Client connections need to state that server-side routing should be used and this is available for Neo4j Drivers and HTTP API.

Neo4j Drivers can only use server-side routing when the neo4j:// URI scheme is used. The Drivers do not perform any routing when the bolt:// URI scheme is used, instead connecting directly to the specified host.

On the cluster-side you must fulfill the following prerequisites to make server-side routing available:

Set dbms.routing.enabled=true on each member of the cluster.
Configure server.routing.listen_address, and provide the advertised address using server.routing.advertised_address on each member.
Optionally, you can set dbms.routing.default_router=SERVER on each member of the cluster.

The last prerequisite enforces server-side routing on the clients by sending out a routing table with exactly one entry to the client. Therefore, dbms.routing.default_router=SERVER configures a cluster member to make its routing table behave like a standalone instance. The implication is that if a Neo4j Driver connects to this cluster member, then the Neo4j Driver sends all requests to that cluster member. Please note that the default configuration for dbms.routing.default_router is dbms.routing.default_router=CLIENT. See dbms.routing.default_router for more information.

The HTTP-API of each member benefits from these settings automatically.

Server-side routing connector configuration

Rerouted queries are communicated over the Bolt Protocol using a designated communication channel. The receiving end of the communication is configured using the following settings:

Server-side routing driver configuration

Server-side routing uses the Neo4j Java driver to connect to other cluster members. This driver is configured with settings of the format:

dbms.routing.driver.<setting>

Server-side routing encryption

Encryption of server-side routing communication is configured by the cluster SSL policy. For more information, see Cluster Encryption.