Using with Neo4j Causal Cluster

Overview

Neo4j Clustering is a feature available in Enterprise Edition which allows high availability of the database through having multiple database members.

Neo4j Enterprise uses a LEADER/FOLLOWER operational view, where writes are always processed by the leader, while reads can be serviced by either followers, or optionally be read replicas, which maintain a copy of the database and serve to scale out read operations horizontally.

Kafka Connect

When using Neo4j Streams in this method with a Neo4j cluster, the most important consideration is to use a routing driver. Generally, these are identified by a URI with bolt+routing:// as the scheme instead of just bolt:// as the scheme.

If your Neo4j cluster is located at graph.mycompany.com:7687, simply configure the Kafka Connect worker with

neo4j.server.uri=bolt+routing://graph.mycompany.com:7687

The use of the bolt+routing driver will mean that the Neo4j Driver itself will handle connecting to the correct cluster member, and managing changes to the cluster membership over time.

For further information on routing drivers, see the Neo4j Driver Manual.

Neo4j Plugin

When using Neo4j Streams as a plugin together with a cluster, there are several things to keep in mind:

  • The plugin must be present in the plugins directory of all cluster members, and not just one.

  • The configuration settings must be present in:

    • all neo4j.conf files for versions prior of 4.0.7, and not just one;

    • all streams.conf files for versions since 4.0.7, and not just one.

Through the course of the cluster lifecycle, the leader may change; for this reason the plugin must be everywhere, and not just on the leader.

The plugin detects the leader, and will not attempt to perform a write (i.e. in the case of the consumer) on a follower where it would fail. The plugin checks cluster toplogy as needed.

Additionally for CDC, a consideration to keep in mind is that as of Neo4j 3.5, committed transactions are only published on the leader as well. In practical terms, this means that as new data is committed to Neo4j, it is the leader that will be publishing that data back out to Kafka, if you have the producer configured.

The neo4j streams utility procedures, in particular CALL streams.publish, can work on any cluster member, or read replica. CALL streams.consume may also be used on any cluster member, however it is important to keep in mind that due to the way clustering in Neo4j works, using streams.consume together with write operations will not work on a cluster follower or read replica, as only the leader can process writes.

Remote Clients

Sometimes there will be remote applications that talk to Neo4j via official drivers, that want to use streams functionality. Best practices in these cases are:

  • Always use a bolt+routing:// driver URI when communicating with the cluster in the client application.

  • Use Explicit Write Transactions in your client application when using procedure calls such as CALL streams.consume to ensure that the routing driver routes the query to the leader.