Leadership, routing and load balancing

Elections and leadership

The servers in a cluster use the Raft protocol to ensure consistency and safety. An implementation detail of Raft is that it uses a Leader role to impose an ordering on an underlying log with other instances acting as Followers which replicate the leader’s state. Specifically in Neo4j, this means that writes to the database are ordered by the server currently playing the Leader role for the respective database.

Only servers hosting a database in primary mode can be elected leaders for that database, provided that the cluster contains more than one primary server. If a Neo4j DBMS cluster contains multiple databases, each one of those databases operates within a logically separate Raft group, and therefore each has an individual leader. This means that a server may act both as Leader for some databases, and as Follower for other databases.

If a follower has not heard from the leader for a while, then it can initiate an election and attempt to become the new leader. The follower makes itself a Candidate and asks other servers to vote for it. If it can get a majority of the votes, then it assumes the leader role. Servers do not vote for a candidate which is less up-to-date than itself. There can only be one leader at any time per database, and that leader is guaranteed to have the most up-to-date log.

Elections are expected to occur during the normal running of a cluster and they do not pose an issue in and of itself. If you are experiencing frequent re-elections and they are disturbing the operation of the cluster then you should try to figure out what is causing them. Some common causes are environmental issues (e.g. a flaky networking) and work overload conditions (e.g. more concurrent queries and transactions than the hardware can handle).

Leadership balancing

Write transactions are always routed to the leader for the respective database. As a result, unevenly distributed leaderships may cause write queries to be disproportionately directed to a subset of servers. By default, Neo4j avoids this by automatically transferring database leaderships so that they are evenly distributed throughout the cluster. Additionally, Neo4j automatically transfers database leaderships away from instances where those databases are configured to be read-only using server.databases.read_only or similar.

Server-side routing

Server-side routing is a complement to the client-side routing, performed by a Neo4j Driver.

In a cluster deployment of Neo4j, Cypher queries may be directed to a cluster member that is unable to run the given queries. With server-side routing enabled, such queries are rerouted internally to a cluster member that is expected to be able to run them. This situation can occur for write-transaction queries when they address a database for which the receiving cluster member is not the leader.

The cluster role for cluster members is per database. Thus, if a write-transaction query is sent to a cluster member that is not the leader for the specified database (specified either via the Bolt Protocol or by the Cypher syntax: USE clause), server-side routing is performed if properly configured.

Server-side routing is enabled by the DBMS, by setting dbms.routing.enabled=true for each cluster member. The listen address (server.routing.listen_address) and advertised address (server.routing.advertised_address) also need to be configured for server-side routing communication.

Client connections need to state that server-side routing should be used and this is available for Neo4j Drivers and HTTP API.

Neo4j Drivers can only use server-side routing when the neo4j:// URI scheme is used. The Drivers do not perform any routing when the bolt:// URI scheme is used, instead connecting directly to the specified host.

On the cluster-side you must fulfill the following prerequisites to make server-side routing available:

  • Set dbms.routing.enabled=true on each member of the cluster.

  • Configure server.routing.listen_address, and provide the advertised address using server.routing.advertised_address on each member.

  • Optionally, you can set dbms.routing.default_router=SERVER on each member of the cluster.

The last prerequisite enforces server-side routing on the clients by sending out a routing table with exactly one entry to the client. Therefore, dbms.routing.default_router=SERVER configures a cluster member to make its routing table behave like a standalone instance. The implication is that if a Neo4j Driver connects to this cluster member, then the Neo4j Driver sends all requests to that cluster member. Please note that the default configuration for dbms.routing.default_router is dbms.routing.default_router=CLIENT. See dbms.routing.default_router for more information.

The HTTP-API of each member benefits from these settings automatically.

Server-side routing connector configuration

Rerouted queries are communicated over the Bolt Protocol using a designated communication channel. The receiving end of the communication is configured using the following settings:

Server-side routing driver configuration

Server-side routing uses the Neo4j Java driver to connect to other cluster members. This driver is configured with settings of the format:

Server-side routing encryption

Encryption of server-side routing communication is configured by the cluster SSL policy. For more information, see Cluster Encryption.