When to use Kafka Connect vs. Neo4j Streams as a Plugin

This section covers how to decide whether to run as a Kafka Connect worker, or as a Neo4j Plugin.

Kafka Connect

Pros

  • Processing is outside of Neo4j so that memory & CPU impact doesn’t impact Neo4j. You don’t need to size the database with Kafka utilization in mind.

  • Much easier for Kafka pros to manage; they benefit from the Confluent ecosystem, such as connecting the REST API to manipulate connectors, the control center to administer & monitor them.

  • By restarting the worker, you can restart your sink strategy without having downtime for Neo4j.

  • Upgrade Neo4j-Streams without restarting the cluster Strictly an external bolt client, so better overall security management of plugin actions.

Cons

  • You can’t do TransactionEventHandlers from outside of the database, so you can only sink to Neo4j, you can’t produce from it.

  • If you’re using Confluent Cloud, you can’t host the connector in the cloud (yet). So this requires a 3rd piece of architecture: Confluent Cloud, Neo4j, and the Connect Worker (usually a separate VM)

  • Possibly worse throughput due to bolt latency & overhead, and separate network hop.

Neo4j-Streams Plugin

Pros

  • Much easier for Neo4j pros to manage

  • You can produce records back out to Kafka

  • You can use Neo4j procedures so that "custom produce" (see later section) becomes an option.

  • Possibly better throughput, because you don’t have bolt latency / overhead

Cons

  • Memory & CPU consumption on your Neo4j Server

  • Requires restarting Neo4j in order to update your configuration and/or Cypher.

  • Upgrading plugin requires cluster restart

  • Need to track config to be identical across all members in the cluster

  • Lesser ability to manage the plugin because it is running inside of the database and not under a particular user account.