Experiments with Kafka's head-of-line blocking (2023)
Recorded: Jan. 19, 2026, 10:03 a.m.
| Original | Summarized |
Experiments with Kafka's head-of-line blocking. Context: No two consumers from the same Consumer Group can read from the same partition. Therefore, to avoid idle consumers, a topic must have at least as many partitions as there are consumers. Beanstalkd Architecture: With beanstalkd, jobs are sent to tubes. Consumers simply connect to the server and reserve jobs from a given tube. For a given beanstalkd server, jobs are given out in the same order they were enqueued. beanstalkd_tube.put(msg.to_s) kafka_producer.produce( If we only had a single consumer, the total time to complete all jobs would be at least 40s, as that consumer would sleep for 10s four times. If we had an unlimited number of consumers, the minimum total time would be 10s, as at least four consumers would have to sleep for 10s in parallel. main_tube = beanstalkd.tubes[BEANSTALKD_MAIN_TUBE] After sleeping, consumers produce a dummy message to a different topic/tube, which is used by an out-of-band watcher process that keeps track of global progress. Each watcher process starts the clock when the first dummy message is received and stops it when the 100th message is received. At this point, with no messages having been produced, we can inspect the topology of Kafka partitions and consumers: As seen above, each consumer has been assigned two partitions, and all 10 are empty. Time to produce the 100 messages: And wait for the results: The full experiment is available on github.com/arturhoo/kafka-experiment. As seen above, the beanstalkd setup was able to process 96 out of the 100 messages in less than one second. The Kafka setup, however, had two long 10s periods of time where no messages were processed; that is because there was at least one consumer (queue-kafka-consumer-2) who was assigned two messages with a sleep duration of 10s. Artur Rodrigues github.com/arturhoo |
Artur Rodrigues’ article explores Kafka’s head-of-line blocking through an experiment comparing it to beanstalkd, a traditional work queue system. The piece begins by contextualizing Kafka’s architecture, emphasizing its design for high-throughput message streaming rather than as a job queue. In Kafka, messages are organized into topics with partitions, and consumers within the same group divide the partitions among themselves. This ensures that no two consumers in a group read from the same partition, but it introduces head-of-line blocking: a consumer busy with a slow message delays every later message in its assigned partitions. Rodrigues contrasts this with beanstalkd, where any idle consumer can reserve the next job from a tube, so job allocation continues even while some workers are busy. The experiment quantifies the difference with a workload of 100 jobs, four of which sleep for 10 seconds, and measures the total processing duration under both systems. The Kafka setup uses a topic with 10 partitions and five consumers in a single consumer group, each assigned two partitions. Each consumer processes the messages from its partitions sequentially, so a slow message stalls all later messages in both of that consumer’s partitions. In contrast, beanstalkd has no partitioning mechanism: consumers reserve jobs directly from a tube. Jobs are served in the order they were enqueued, but multiple consumers pull from the same queue, so processing continues in parallel even if some consumers are delayed. The experiment’s code reflects this divergence: Kafka consumers process messages from their assigned partitions, while beanstalkd consumers reserve jobs directly from the shared tube. 
A watcher process tracks global progress in both systems: after each job, the consumer produces a dummy message to a separate topic or tube, and the watcher starts its clock at the first dummy message and stops it at the 100th. Rodrigues observes that Kafka’s head-of-line blocking significantly increases total processing time. While beanstalkd completes all 100 jobs in approximately 10 seconds, Kafka takes roughly twice as long (about 20 seconds), because the slow 10-second jobs block the partitions of the consumers that receive them. The consumer group assigns two partitions per consumer, and a consumer that receives more than one slow job becomes a bottleneck. In the experiment, one Kafka consumer (queue-kafka-consumer-2) is assigned two 10-second jobs, producing two 10-second periods during which no messages are processed. This contrasts sharply with beanstalkd, where the four slow jobs tie up at most four consumers while the short jobs keep flowing: 96 of the 100 messages are processed in under one second. Rodrigues attributes this difference to Kafka’s partitioning model, which prioritizes order preservation and scalability for streaming use cases but sacrifices latency in job-queue scenarios. The article also highlights Kafka’s scalability advantages when multiple consumer groups are involved. Unlike beanstalkd, where a single queue is consumed by one group at a time, Kafka allows independent consumer groups to process the same topic concurrently. This feature is ideal for scenarios requiring real-time data replication or analytics but complicates latency management when multiple groups are active. In the experiment, however, a single consumer group is used, and parallelism within a group is capped by the number of partitions, fixed at 10 here. 
Rodrigues notes that increasing the number of partitions could mitigate this issue, but it requires careful planning and may not always be feasible in production environments. The experiment’s results underscore the trade-offs between Kafka’s design philosophy and traditional work queue systems like beanstalkd, which explicitly prioritize fairness and low latency for job processing. Rodrigues concludes that while Kafka excels in high-throughput, ordered message streaming scenarios, its head-of-line blocking behavior makes it suboptimal for use cases requiring rapid job execution. The article’s detailed analysis of the experiment, including code snippets and timing data, provides concrete evidence of this limitation. By comparing Kafka’s partition-based model with beanstalkd’s simple job reservation mechanism, Rodrigues illustrates how architectural choices impact system performance. The findings suggest that developers should carefully evaluate Kafka’s suitability for their specific use cases, particularly when dealing with heterogeneous workloads where some tasks may take significantly longer than others. The experiment also raises questions about potential optimizations, such as dynamically adjusting partitioning or implementing fallback mechanisms for slow consumers. Ultimately, the article serves as a practical guide for understanding Kafka’s strengths and weaknesses, emphasizing the importance of aligning system design with application requirements. |
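
The timing argument above (at least 40s with one consumer, about 10s with a shared queue, about 20s when one Kafka consumer receives two slow jobs) can be checked with a small Ruby model that only does bookkeeping and never sleeps. This is a rough sketch, not the article's code. The positions of the four slow jobs, the round-robin mapping of messages to partitions, the contiguous partition-to-consumer assignment, and the zero duration of short jobs are all assumptions chosen so that one consumer ends up with two slow jobs, as happened in the experiment. The method names are hypothetical.

```ruby
# Toy model of the experiment: durations are added up, nothing actually sleeps.

# Build the workload: `count` jobs, slow ones at `slow_indexes`.
# The slow positions are hypothetical; short jobs are assumed instantaneous.
def build_jobs(count: 100, slow_indexes: [2, 13, 40, 77], slow_seconds: 10)
  Array.new(count) { |i| slow_indexes.include?(i) ? slow_seconds : 0 }
end

# Shared FIFO queue (beanstalkd-like): the next job in line goes to
# whichever consumer becomes free first.
def shared_queue_makespan(jobs, consumers)
  free_at = Array.new(consumers, 0)
  jobs.each do |duration|
    idx = free_at.each_index.min_by { |i| free_at[i] }
    free_at[idx] += duration
  end
  free_at.max
end

# Partitioned log (Kafka-like): job i lands in partition i % partitions
# (assumed round-robin), and each consumer owns an equal, contiguous block
# of partitions (assumed). A consumer works through its jobs one by one,
# so its finish time is the sum of its job durations.
def partitioned_makespan(jobs, partitions, consumers)
  per_consumer = partitions / consumers
  busy = Array.new(consumers, 0)
  jobs.each_with_index do |duration, i|
    owner = (i % partitions) / per_consumer
    busy[owner] += duration
  end
  busy.max
end

jobs = build_jobs
puts "single consumer:            #{shared_queue_makespan(jobs, 1)}s"   # 40s
puts "shared queue, 5 consumers:  #{shared_queue_makespan(jobs, 5)}s"   # 10s
puts "10 partitions, 5 consumers: #{partitioned_makespan(jobs, 10, 5)}s" # 20s
```

With these assumed positions, jobs 2 and 13 fall into partitions 2 and 3, which belong to the same consumer, so that consumer needs 20s while the others sit idle. Moving one of the slow jobs to a partition owned by a different consumer brings the partitioned result down to 10s, which shows that the Kafka outcome depends on where the slow messages happen to land.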