Yesterday, Twitter shared that it is migrating to Apache Kafka from EventBus, its in-house Pub/Sub system built on top of Apache DistributedLog. The main reasons behind the move were Kafka’s lower latency, resource savings, and strong community support.
Apache Kafka is an open-source distributed stream-processing platform that provides a unified, high-throughput, low-latency way of handling real-time data feeds. It has been adopted by large companies such as LinkedIn, Netflix, and Uber, making it the de facto real-time messaging system in the industry.
The Twitter team spent several months evaluating Kafka under workloads similar to those run on EventBus, such as durable writes, tailing reads, catchup reads, and high-fanout reads. The evaluation led them to the following reasons for moving to Kafka:
The evaluation showed that Kafka provides significantly lower latency regardless of throughput. Latency was measured as the timestamp difference between the time a message was created and the time the consumer read it. The team attributed the lower latency to several factors in how the two systems are built.
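The end-to-end measurement described above (creation timestamp to consumer read timestamp) can be sketched in a few lines. This is only an illustration, not Twitter's benchmark code: the `Message` class, `end_to_end_latency_ms`, and `percentile` are hypothetical names, and the sketch assumes the producer and consumer clocks are synchronized, since any clock skew between hosts shows up directly in the result.

```python
import math
import time
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message that records its creation time on the producer side."""
    payload: bytes
    created_at: float = field(default_factory=time.time)  # seconds since epoch


def end_to_end_latency_ms(message, read_at=None):
    """Latency = time the consumer read the message minus time it was created."""
    if read_at is None:
        read_at = time.time()
    return (read_at - message.created_at) * 1000.0


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latency samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Simulate a consumer reading a handful of messages shortly after creation.
    latencies = []
    for i in range(5):
        msg = Message(payload=b"event-%d" % i)
        time.sleep(0.001)  # stand-in for the broker hop and the consumer fetch
        latencies.append(end_to_end_latency_ms(msg))
    print("p50 latency (ms):", percentile(latencies, 50))
    print("p99 latency (ms):", percentile(latencies, 99))
```

Reporting percentiles rather than a single average is a common choice for this kind of comparison, because tail latency often differs between systems more than the median does.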
EventBus separates the serving and storage layers, which calls for additional hardware, while Kafka uses a single host to provide both. During the evaluation, the team saw 68% resource savings for single-consumer use cases and 75% for multiple-consumer use cases. In addition, the latest versions of Kafka support data replication, which provides the durability guarantees the team needs.
Hundreds of developers contribute to the Kafka project, which ensures regular bug fixes, improvements, and new features, whereas only eight or so engineers work on EventBus/DistributedLog. Kafka comes with features that Twitter needed, such as a streaming library, an at-least-once HDFS pipeline, and exactly-once processing, which are not yet implemented in EventBus. Additionally, the team will be able to find solutions to problems they encounter on either the client or the server side, and will have access to better documentation.
Check out the complete announcement on Twitter’s website.