Kafka 101: Core Concepts and How It Works

Apache Kafka is a high-throughput, low-latency, scalable messaging system. Its distributed architecture allows it to handle large-scale data streams reliably. In this post, we’ll explore the fundamental concepts of Kafka and how it works.

What is Apache Kafka?

Kafka is a messaging system originally developed by LinkedIn and later open-sourced under the Apache Software Foundation. It is widely used for big data processing, real-time stream analytics, and event-driven architectures.

Figure 1. High-level architecture overview of Apache Kafka showing the interaction between producers, brokers, and consumers. Source: Created by the author.

Key Components of Kafka

To understand Kafka, it’s important first to get familiar with its core components:

Topic, Partition, Segment, and Log in Kafka

Figure 2. Kafka topic structure showing the relationship between topics, partitions, and segments in a distributed setup. Source: Created by the author.

Topic

A topic in Kafka is a logical category where messages are grouped. Producers send messages to specific topics, while consumers read from them. Topics are divided into partitions to handle data more efficiently.
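
As a quick sketch (assuming a local broker at localhost:9092 and the CLI tools shipped with the Kafka/Confluent distribution; the topic name is only an example), creating and listing topics looks like this:

# create a topic named user-activity with 3 partitions, one copy of each
bin/kafka-topics --bootstrap-server localhost:9092 --create --topic user-activity --partitions 3 --replication-factor 1

# list all topics known to the broker
bin/kafka-topics --bootstrap-server localhost:9092 --list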

Partition

Partitions enable horizontal scaling of a topic. Each partition behaves as an independent append-only log: messages are written sequentially, are never modified, and are only removed once the retention period (or size limit) has been exceeded.

Key features:

- Each partition is hosted on a broker, so a topic’s partitions can be spread across the cluster.
- Message order is guaranteed within a single partition, but not across partitions.
- More partitions allow more consumers in a group to read from the topic in parallel.

Figure 3. Leader-follower replication model in Kafka demonstrating how data is replicated across multiple brokers for fault tolerance. Source: Created by the author.

Example:

Topic: "user-activity"

Partition 0 → Broker 1
Partition 1 → Broker 2
Partition 2 → Broker 3
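
To see such a mapping on a running cluster, you can describe the topic (a sketch assuming the user-activity topic exists and a broker is reachable at localhost:9092):

# print one line per partition with its Leader, Replicas, and Isr (in-sync replicas)
bin/kafka-topics --bootstrap-server localhost:9092 --describe --topic user-activity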

Replication Factor

The replication factor in Kafka defines how many copies of a partition are stored on different brokers. It enhances data durability and protects against broker failures.

Key features:

- Each partition has one leader replica; the other copies are followers that replicate the leader’s data.
- Producers and consumers interact with the leader, while followers stay in sync in the background.
- If the broker hosting a leader fails, one of the in-sync followers is elected as the new leader.

Example replication setup:

Topic: "user-activity", Replication Factor: 3

Partition 0 → Leader: Broker 1, Replicas: Broker 2, Broker 3
Partition 1 → Leader: Broker 2, Replicas: Broker 1, Broker 3
Partition 2 → Leader: Broker 3, Replicas: Broker 1, Broker 2

With this setup, the failure of any single broker causes no data loss and the topic remains available, because an in-sync replica on another broker takes over as leader.
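
As a sketch of how this would be created (assuming a cluster with at least three brokers, which the single-node setup later in this post does not provide), the replication factor is simply passed at topic creation time:

# 3 partitions, each stored on 3 different brokers
bin/kafka-topics --bootstrap-server localhost:9092 --create --topic user-activity --partitions 3 --replication-factor 3

Running the --describe command from the previous section would then show three brokers under Replicas for every partition.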

Segment

Partitions are divided into smaller files called segments. This helps optimize disk usage and speeds up data lookup.

Key features:

- A partition’s log is stored on disk as a sequence of segment files; only the newest (active) segment accepts new writes.
- When the active segment reaches a configured size or age, it is closed and a new segment is rolled.
- Retention works at the segment level: expired segments are deleted as whole files, which keeps cleanup cheap.

Figure 4. Internal structure of a Kafka broker showing how messages are organized in segments within partitions. Source: Created by the author.

Example:

Partition 0
├── segment_0001.log
├── segment_0002.log
└── segment_0003.log
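
In an actual Kafka data directory, segment files are named after the base offset of their first record and sit alongside index files. As a sketch (assuming the log.dirs path configured later in this post and partition 0 of a user-activity topic), listing the directory might look like this:

# inspect the on-disk files backing one partition
ls /tmp/kraft-logs/user-activity-0

# typical contents:
# 00000000000000000000.log        - the segment holding the records
# 00000000000000000000.index      - offset index for fast lookups
# 00000000000000000000.timeindex  - timestamp index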

Log and Record

Kafka’s log-based architecture stores messages in log files. Each partition acts as an append-only log, where messages are written sequentially and remain immutable.

Example of a Kafka record:

Key: "user_123"
Value: "{"action": "login", "timestamp": "2025-03-18T12:34:56Z"}"
Timestamp: "2025-03-18T12:34:56Z"

Consumers read and process records from the logs of specific topics.
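
To produce keyed records like the one above from the command line, the console producer can parse a key out of each input line (a sketch assuming a local broker and the user-activity topic; the choice of ':' as separator is arbitrary):

# treat everything before the first ':' as the key, the rest as the value
bin/kafka-console-producer --bootstrap-server localhost:9092 --topic user-activity --property parse.key=true --property key.separator=:

# example input line typed into the producer:
# user_123:{"action": "login", "timestamp": "2025-03-18T12:34:56Z"}

On the consuming side, adding --property print.key=true to the console consumer makes it print the key alongside each value.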

Figure 5. Kafka's log-based storage model illustrating how messages are appended sequentially in partition logs. Source: Created by the author.

How Does Kafka Work?

To understand Kafka’s operation, consider the following steps:

  1. A producer sends a message to a specific topic.

  2. A broker stores the message in the appropriate partition of that topic.

  3. A consumer reads messages from the topic and processes them.

  4. Messages are retained for a configured duration (retention period) and then deleted.

This process enables Kafka to deliver real-time data streaming and forms the backbone of event-driven systems.
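
The retention period mentioned in step 4 can be tuned per topic at runtime. As a sketch (assuming the user-activity topic and a local broker; the value is arbitrary), kafka-configs can shorten retention to 24 hours:

# keep records for 24 hours (86,400,000 ms) instead of the broker default (typically 7 days)
bin/kafka-configs --bootstrap-server localhost:9092 --entity-type topics --entity-name user-activity --alter --add-config retention.ms=86400000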

Advantages of Kafka

Here’s why Kafka is widely adopted:

- High throughput and low latency, even under very large message volumes.
- Horizontal scalability: topics are partitioned, so capacity grows by adding brokers.
- Durability and fault tolerance: messages are persisted to disk and replicated across brokers.
- Replayability: consumers track their own offsets, so data can be re-read at any point within the retention window.
- A rich ecosystem, including Kafka Connect, Kafka Streams, and a wide range of client libraries.

Installing Apache Kafka (KRaft Mode) via Confluent Platform on macOS using .tar.gz

Confluent Platform extends Apache Kafka with additional tools and integrations, and it now supports KRaft mode (ZooKeeper-free architecture). Here’s how to set up Confluent Platform with KRaft mode on macOS using the .tar.gz archive.

🔍 What is Confluent Platform?

Confluent Platform is an enhanced distribution of Apache Kafka built by its original creators. It bundles Apache Kafka with additional tools like Schema Registry, Kafka Connect, REST Proxy, and ksqlDB, making it easier to build, monitor, and scale real-time data pipelines.

Traditionally, Kafka required ZooKeeper—a separate coordination service—to manage metadata and cluster state. However, ZooKeeper adds operational complexity. Confluent Platform now supports KRaft mode, a newer architecture where Kafka handles its own metadata internally, removing the need for ZooKeeper altogether.

In this guide, we use Confluent Platform in KRaft mode, allowing a simpler, ZooKeeper-free setup.

🛠️ Prerequisites

Before you begin, ensure the following:

- You are on macOS with a terminal, plus curl or a browser to download the archive.
- A supported Java runtime is installed (java -version should work; Java 11 or 17 is a safe choice for Confluent Platform 7.5).
- Enough free disk space for the extracted archive and Kafka’s log directory.

Step 1: Download the Confluent Platform

Go to the Confluent downloads page and download the tar.gz version.

Or use curl:

curl -O https://packages.confluent.io/archive/7.5/confluent-7.5.0.tar.gz

Extract it:

tar -xzf confluent-7.5.0.tar.gz
cd confluent-7.5.0

Step 2: Configure KRaft (No ZooKeeper)

Edit the KRaft server config:

vim etc/kafka/kraft/server.properties

Make sure it includes:

process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
log.dirs=/tmp/kraft-logs

You can use a different log path if you like. Save and exit the file.

Step 3: Format the Metadata

Before starting the Kafka server, format the storage directory for KRaft:

bin/kafka-storage format -t $(bin/kafka-storage random-uuid) -c etc/kafka/kraft/server.properties

Step 4: Start the Kafka Server

Start the Kafka broker (now also acting as the controller):

bin/kafka-server-start etc/kafka/kraft/server.properties

No ZooKeeper is needed. You should see logs showing both the broker and controller roles starting.
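
To confirm the broker is reachable before moving on, you can query it from another terminal (a quick check assuming the default listener on localhost:9092):

# asks the broker which API versions it supports; any response means it is up
bin/kafka-broker-api-versions --bootstrap-server localhost:9092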

Step 5: Create a Topic and Send Messages

In a new terminal, create a topic:

bin/kafka-topics --bootstrap-server localhost:9092 --create --topic test-topic --partitions 1 --replication-factor 1

Then, produce messages (also in a new terminal):

bin/kafka-console-producer --bootstrap-server localhost:9092 --topic test-topic

And consume them (also in a new terminal):

bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic test-topic --from-beginning

As you type messages into the producer terminal, you’ll see them appear in the consumer terminal almost instantly, demonstrating Kafka’s real-time streaming capability.

Step 6 (Optional): Add to PATH

To run the Confluent Kafka tools from any directory, add the following lines to your ~/.zshrc (or ~/.bash_profile):

export CONFLUENT_HOME=~/path/to/confluent-7.5.0
export PATH=$PATH:$CONFLUENT_HOME/bin

Then reload your shell configuration:

source ~/.zshrc  # or ~/.bash_profile
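
Once the new PATH is loaded, the CLI tools resolve from any directory; a quick way to verify:

# should print the Kafka version bundled with Confluent Platform
kafka-topics --version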

Summary

In this post, we covered the core concepts of Apache Kafka and walked through installing Kafka in KRaft mode using the Confluent Platform on macOS. By eliminating ZooKeeper and leveraging the Confluent distribution, we achieved a cleaner and easier Kafka setup—ideal for modern, event-driven applications.

You’re now ready to start building real-time streaming pipelines locally. For production use, consider exploring multi-node clusters, replication strategies, and integrating tools like Schema Registry or Kafka Connect.
