Optimizing Kafka


Neo4j can’t ingest data quickly if Kafka isn’t set up correctly. While this isn’t a common source of problems, it does come up. Confluent has good overall documentation on optimizing Kafka that is worth being familiar with.

The main trade-offs are the following, and they have to make sense at the Kafka layer before they can make sense for Neo4j (a configuration sketch follows the list).

  • Do you want to optimize for high throughput, the rate at which data is moved from producers to brokers or from brokers to consumers?

  • Do you want to optimize for low latency, the elapsed time it takes a message to move end to end (from producers to brokers to consumers)?

  • Do you want to optimize for high durability, which guarantees that committed messages are never lost?

  • Do you want to optimize for high availability, which minimizes downtime in the event of unexpected failures? Kafka is a distributed system, and it is designed to tolerate failures.
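
As a concrete illustration, here is a minimal sketch of how the first three trade-offs surface as producer configuration. It assumes the standard Apache Kafka Java client (kafka-clients); the class name KafkaTradeoffConfigs and the broker address localhost:9092 are placeholders, not values from this documentation. Larger batches plus compression favor throughput, linger.ms=0 favors latency, and acks=all with idempotence favors durability.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KafkaTradeoffConfigs {

        // Shared plumbing: broker address and serializers (placeholder values).
        static Properties base() {
            Properties p = new Properties();
            p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            return p;
        }

        // Throughput: batch aggressively and compress, accepting higher latency.
        static Properties forThroughput() {
            Properties p = base();
            p.put(ProducerConfig.LINGER_MS_CONFIG, "100");     // wait up to 100 ms to fill a batch
            p.put(ProducerConfig.BATCH_SIZE_CONFIG, "262144"); // 256 KB batches
            p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
            return p;
        }

        // Latency: send each record as soon as possible, skipping batching and compression.
        static Properties forLatency() {
            Properties p = base();
            p.put(ProducerConfig.LINGER_MS_CONFIG, "0");
            p.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "none");
            return p;
        }

        // Durability: require acknowledgement from all in-sync replicas, and
        // enable idempotence so retries cannot duplicate committed messages.
        static Properties forDurability() {
            Properties p = base();
            p.put(ProducerConfig.ACKS_CONFIG, "all");
            p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
            return p;
        }

        public static void main(String[] args) {
            // Pick the profile that matches your goal; producing is otherwise identical.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(forDurability())) {
                // producer.send(...) as usual
            }
        }
    }

Availability, by contrast, is tuned mostly at the broker and topic level: replication.factor determines how many copies of each partition exist, and min.insync.replicas trades durability against the cluster's ability to keep accepting writes while some replicas are down.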