Kafka Architecture Deep Dive

In the previous post, we introduced Apache Kafka and explained why it’s such a powerful platform for event streaming. Now, it’s time to look under the hood. In this post, we’ll explore Kafka’s internal architecture and the roles of its core components.

High-Level Overview

Kafka is a distributed system that runs as a cluster of one or more servers (called brokers). It handles incoming streams of data (records) from producers, stores them reliably, and makes them available to consumers.

Core Components

1. Broker

A broker is a Kafka server that receives messages from producers, stores them on disk, and serves them to consumers. A Kafka cluster typically contains multiple brokers for scalability and fault tolerance.
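
To make this concrete, here is a rough sketch that asks a cluster which brokers it belongs to, using the Java AdminClient; the bootstrap address is an illustrative placeholder:

// Sketch: listing the brokers in a cluster with the Kafka AdminClient.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.Node;

import java.util.Properties;

public class ListBrokersExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // any reachable broker can bootstrap the client

        try (Admin admin = Admin.create(props)) {
            // Each Node returned here is one broker in the cluster.
            for (Node broker : admin.describeCluster().nodes().get()) {
                System.out.printf("broker id=%d host=%s port=%d%n", broker.id(), broker.host(), broker.port());
            }
        }
    }
}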

2. Topic

Topics are logical channels to which producers publish records and from which consumers read. Topics are split into partitions to allow Kafka to scale horizontally.
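
As a sketch of how a topic comes into being, the snippet below creates one programmatically with the Java AdminClient; the topic name, partition count, and replication factor are illustrative values, not recommendations:

// Sketch: creating a topic with 3 partitions and replication factor 2.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // 3 partitions for parallelism, 2 replicas for fault tolerance
            NewTopic orders = new NewTopic("orders", 3, (short) 2);
            admin.createTopics(List.of(orders)).all().get(); // block until the cluster confirms
        }
    }
}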

3. Partition

Each topic is divided into one or more partitions. Partitions are ordered, immutable sequences of records that are continually appended to. Partitions enable parallelism and load balancing.
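
As a simplified sketch of the idea (Kafka’s actual default partitioner hashes the serialized key with murmur2, but the principle is the same), a record key maps to a partition roughly like this:

// Simplified sketch of key-based partition selection: the same key always
// maps to the same partition, which preserves per-key ordering.
public class PartitionSketch {
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("customer-42", 3)); // always the same partition for this key
    }
}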

4. Producer

Producers are client applications that send records to Kafka topics. The producer decides which partition each record goes to: by default, records that share a key are hashed to the same partition, while records without a key are spread across partitions.
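
Here is a minimal producer sketch; the topic name, key, and value are illustrative:

// Sketch: sending a keyed record to the "orders" topic.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // With a key, every record for "customer-42" lands in the same partition.
            producer.send(new ProducerRecord<>("orders", "customer-42", "order-created"));
            producer.flush(); // make sure the record is actually sent before exiting
        }
    }
}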

5. Consumer

Consumers are client applications that read records from topics. They typically run as part of a consumer group, which coordinates consumption across its members so that each record is processed by only one consumer in the group.

6. Consumer Group

Consumers in the same group share the work of reading from partitions. Each partition is consumed by only one consumer in the group at a time, which enables parallel consumption.
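
Below is a minimal sketch of a consumer that joins an assumed "billing" group and reads from the illustrative "orders" topic used above; if you start a second copy with the same group.id, the partitions are split between the two instances:

// Sketch: a consumer in the "billing" group reading from "orders".
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "billing"); // consumers sharing this id split the partitions between them
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest"); // start from the beginning if the group has no offsets yet

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}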

7. ZooKeeper (Deprecated in Newer Versions)

Kafka historically relied on Apache ZooKeeper for metadata management and cluster coordination. Starting with Kafka 2.8, the cluster can manage its own metadata through a built-in Raft-based controller quorum, known as KRaft mode, and newer releases treat KRaft as the standard deployment while phasing out ZooKeeper.

How Kafka Stores Data

Kafka persists data to disk for durability and fault tolerance. Data is stored in a commit log format, and each record within a partition has a unique offset. Kafka consumers track offsets to know where to resume reading.
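
Because an offset is just a position in a partition’s log, a consumer can also jump to one explicitly. The sketch below assigns a single partition manually and seeks to an arbitrary, illustrative offset:

// Sketch: reading a partition from a specific offset.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class SeekExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("orders", 0);
            consumer.assign(List.of(partition)); // manual assignment, no consumer group needed
            consumer.seek(partition, 42L);       // resume reading from offset 42 in partition 0

            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}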

Data Flow

  1. Producer sends a record to a specific topic (and optionally a partition).
  2. The broker writes the record to disk in that partition.
  3. Consumer subscribes to the topic and reads from the partition, tracking its offset.
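
Tying these steps together, the sketch below (a variation of the producer example above, with the same illustrative topic and key) uses a send callback to print the partition and offset the broker assigned once the write is acknowledged:

// Sketch: the broker's acknowledgement carries the partition and offset of the write.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class DataFlowExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "customer-42", "order-created"),
                    (RecordMetadata metadata, Exception error) -> {
                        if (error == null) {
                            // Step 2: the broker appended the record and assigned it an offset.
                            System.out.printf("written to partition %d at offset %d%n",
                                    metadata.partition(), metadata.offset());
                        } else {
                            error.printStackTrace();
                        }
                    });
            producer.flush();
        }
    }
}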

Replication & Fault Tolerance

Each partition has a configurable number of replicas. One replica is elected as the leader, and the rest are followers. If a broker fails, a follower can take over as leader, ensuring high availability.
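
As a sketch of how this is configured, the snippet below creates a topic with three replicas and requires at least two of them to be in sync before a write is accepted; the topic name and values are illustrative:

// Sketch: a replicated topic with a minimum in-sync replica requirement.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            NewTopic payments = new NewTopic("payments", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2")); // reject writes if fewer than 2 replicas are in sync
            admin.createTopics(List.of(payments)).all().get();
        }
    }
}

On the producer side, setting acks=all makes the partition leader wait for the in-sync followers to replicate a record before acknowledging it, which pairs naturally with min.insync.replicas.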

Retention & Compaction

Kafka retains messages for a configurable period or until a log size limit is reached. You can also enable log compaction, where Kafka retains only the latest record for each key—ideal for changelog use cases.
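
As an illustrative sketch, the snippet below creates one topic with time- and size-based retention and another with log compaction enabled; the topic names and limits are placeholders:

// Sketch: retention and compaction settings applied at topic creation time.
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class RetentionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Delete-based retention: drop segments older than 7 days or beyond ~1 GiB per partition.
            NewTopic clicks = new NewTopic("clicks", 3, (short) 2)
                    .configs(Map.of(
                            "retention.ms", "604800000",
                            "retention.bytes", "1073741824"));

            // Compacted changelog: keep only the latest record per key.
            NewTopic userProfiles = new NewTopic("user-profiles", 3, (short) 2)
                    .configs(Map.of("cleanup.policy", "compact"));

            admin.createTopics(List.of(clicks, userProfiles)).all().get();
        }
    }
}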

Conclusion

Kafka’s architecture is designed for durability, scalability, and high throughput. Understanding how producers, brokers, topics, and consumers work together is key to designing effective Kafka-based solutions.
