Optimizing Kafka

Neo4j cannot ingest data quickly if Kafka is not configured correctly. This is not a common source of problems, but it has come up. Confluent's documentation on optimizing Kafka gives a good overview and is worth being familiar with.

The main trade-offs are listed below. They have to make sense at the Kafka layer before they can make sense for Neo4j.

  • Do you want to optimize for high throughput, which is the rate at which data moves from producers to brokers or from brokers to consumers?

  • Do you want to optimize for low latency, which is the time a message takes to travel end-to-end (from producers to brokers to consumers)?

  • Do you want to optimize for high durability, which guarantees that messages that have been committed will not be lost?

  • Do you want to optimize for high availability, which minimizes downtime in case of unexpected failures? Kafka is a distributed system, and it is designed to tolerate failures.
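
To make the four goals above more concrete, here is a rough sketch of which Kafka settings each one typically touches. The configuration keys are standard Kafka producer, consumer, and topic properties; the values, the `PROFILES` table, and the `settings_for` helper are illustrative assumptions, not tuned recommendations and not part of any Neo4j or Kafka API. Treat them as starting points to benchmark against your own workload.

```python
# Illustrative starting points only: the keys are real Kafka properties,
# but every value here is an assumption that should be benchmarked.
PROFILES = {
    "throughput": {
        # Larger, compressed batches move more data per request.
        "producer": {"batch.size": 200000, "linger.ms": 50,
                     "compression.type": "lz4", "acks": "1"},
        # Wait for more data per fetch to reduce request overhead.
        "consumer": {"fetch.min.bytes": 100000},
    },
    "latency": {
        # Send immediately and skip compression work.
        "producer": {"linger.ms": 0, "compression.type": "none", "acks": "1"},
        # Return fetches as soon as any data is available.
        "consumer": {"fetch.min.bytes": 1},
    },
    "durability": {
        # Wait for all in-sync replicas and avoid duplicates on retry.
        "producer": {"acks": "all", "enable.idempotence": True},
        # Commit offsets only after the application has processed messages.
        "consumer": {"enable.auto.commit": False},
        # Require two in-sync replicas; never elect an out-of-sync leader.
        "topic": {"min.insync.replicas": 2,
                  "unclean.leader.election.enable": False},
    },
    "availability": {
        # A lower timeout detects failed consumers sooner.
        "consumer": {"session.timeout.ms": 10000},
        # Keep accepting writes with one replica; allow unclean elections.
        "topic": {"min.insync.replicas": 1,
                  "unclean.leader.election.enable": True},
    },
}


def settings_for(goal, role):
    """Return a copy of the sketch settings for a goal and role.

    role is "producer", "consumer", or "topic"; an empty dict means
    this sketch suggests no change for that role.
    """
    if goal not in PROFILES:
        raise ValueError(f"unknown goal: {goal!r}")
    return dict(PROFILES[goal].get(role, {}))


if __name__ == "__main__":
    for goal in PROFILES:
        print(goal, settings_for(goal, "producer"))
```

Note that the durability and availability profiles pull `min.insync.replicas` and `unclean.leader.election.enable` in opposite directions, which is why the goal has to be chosen at the Kafka layer first.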