Most Kafka benchmarks appear to test high throughput but not low latency. Apache Kafka was traditionally used for high throughput rather than latency-sensitive messaging, but it does have a low-latency configuration. (Mostly setting linger.ms=0 and reducing buffer sizes). In this configuration, you can get below 1-millisecond latency a good percentage of the time for modest throughputs.
Benchmarks tend to focus on clustering Kafka in a high-throughput configuration. While this is perhaps the most common use case, how does it perform if you need lower latencies?