Dockerize Kafka multi-node cluster

Dhruv Saksena
5 min read · Jun 20, 2021


This weekend I deployed a multi-node Kafka cluster via Docker, and it's really easy once you are clear on the basic concepts of containerisation.

Kafka is a distributed system with multiple broker and ZooKeeper nodes. Every partition has an elected leader, and clients read and write through that leader. When a client connects to the cluster via any of the brokers, it asks for the leader of the specific partition it is interested in. Kafka returns the leader node along with metadata about the endpoints the client should talk to. From then on, all reads and writes for that partition go to its leader.

I tried it via Docker Compose, and the following is my docker-compose.yml file:

version: '3'
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:latest
    networks:
      - kafka-network
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - 22181:2181
    volumes:
      - /Users/dhruv/Documents/docker/kafka/zk-1:/var/lib/zookeeper/data
  zookeeper-2:
    image: confluentinc/cp-zookeeper:latest
    networks:
      - kafka-network
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - 32181:2181
    volumes:
      - /Users/dhruv/Documents/docker/kafka/zk-2:/var/lib/zookeeper/data
  kafka-1:
    image: confluentinc/cp-kafka:latest
    networks:
      - kafka-network
    depends_on:
      - zookeeper-1
      - zookeeper-2
    volumes:
      - /Users/dhruv/Documents/docker/kafka/broker-1:/var/lib/kafka/data
    ports:
      - 29092:29092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:2181,zookeeper-2:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-1:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  kafka-2:
    image: confluentinc/cp-kafka:latest
    networks:
      - kafka-network
    depends_on:
      - zookeeper-1
      - zookeeper-2
    volumes:
      - /Users/dhruv/Documents/docker/kafka/broker-2:/var/lib/kafka/data
    ports:
      - 39092:39092
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:2181,zookeeper-2:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-2:9092,PLAINTEXT_HOST://localhost:39092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
networks:
  kafka-network:
    driver: bridge

Following are the details of the file:

Zookeeper

Here, I am using Confluent's ZooKeeper Docker image to bring up the Kafka ZooKeeper ensemble. I have deployed two ZooKeeper containers, named zookeeper-1 and zookeeper-2, running in replicated mode. You can increase the number of nodes at any time by adding an entry for zookeeper-3. For production it's recommended to run an odd number of ZooKeeper instances so that leader election always has a clear majority.

Here, ZOOKEEPER_TICK_TIME corresponds to tickTime in ZooKeeper, which is the basic time unit in which other ZooKeeper settings are expressed. Confluent's image offers other parameters as well, like ZOOKEEPER_INIT_LIMIT and ZOOKEEPER_SYNC_LIMIT, which you can use in your configuration.

The values you set for these are multiplied by tickTime to give the actual time in milliseconds.

Ex: ZOOKEEPER_INIT_LIMIT: 2 denotes 2 * 2000 ms (the tickTime set in the config), i.e. 4000 ms.

Here, I am mounting /Users/dhruv/Documents/docker/kafka/zk-1 as the directory to persist all of the ZooKeeper container's data on my local storage. Docker containers use their own union file system to keep the data generated by their applications, but when a container is destroyed that data is lost as well. To avoid this, we map a directory in the container's file system to the host file system, so that the container's data is persisted on the host too.

Kafka Brokers

Here also, we are using an image from Confluent to construct the broker containers.

Here, we declare a dependency on ZooKeeper so that Docker Compose starts the ZooKeeper containers first and then the brokers.

Note: depends_on only controls the order in which services are started. If you want a broker to start executing only after ZooKeeper has started successfully, you need to define a healthcheck and a startup condition in the Compose file.
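A sketch of what that could look like, assuming your Compose version supports the long-form depends_on condition and that nc is available inside the image (both are assumptions to verify for your setup):

```yaml
# Sketch only: wait for ZooKeeper to pass a healthcheck before starting the broker.
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:latest
    healthcheck:
      test: ["CMD-SHELL", "nc -z localhost 2181 || exit 1"]
      interval: 10s
      retries: 5
  kafka-1:
    image: confluentinc/cp-kafka:latest
    depends_on:
      zookeeper-1:
        condition: service_healthy   # start only once the healthcheck passes
```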

Here, we assign a unique id to every broker via KAFKA_BROKER_ID. KAFKA_ZOOKEEPER_CONNECT lists the ZooKeeper nodes the broker should connect to.

Here, we have defined two listeners for the cluster, PLAINTEXT and PLAINTEXT_HOST. Both use the PLAINTEXT security protocol to connect to the cluster; this listener-to-protocol mapping is specified via KAFKA_LISTENER_SECURITY_PROTOCOL_MAP.

Of these two listeners, PLAINTEXT is the one used for inter-broker communication. This is specified via KAFKA_INTER_BROKER_LISTENER_NAME.

Whenever a client connects to any of the brokers, KAFKA_ADVERTISED_LISTENERS is the metadata passed back to the client along with the leader information.

Now, let's execute the kafkacat command within the Docker network from a terminal.
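A command along these lines queries the cluster metadata from inside the Docker network (the broker name and port follow the Compose file above):

```shell
# List cluster metadata (-L) via the internal PLAINTEXT listener of broker-2.
# Run from a container attached to kafka-network.
kafkacat -b kafka-2:9092 -L
```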

I'm executing kafkacat in metadata list mode (-L) to see what metadata will be sent to a client that connects to broker-2. For the PLAINTEXT_HOST listener it returns localhost:39092, which is the address a client on the host machine uses to reach the cluster. Within the Docker internal network, however, you can simply use kafka-1:9092 or kafka-2:9092, the addresses advertised on the internal PLAINTEXT listener.

Now, if I execute the same command (with network_mode set to host) outside of the Docker network, on my host machine, the output implies that from outside we need to use localhost:39092.
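The host-side version of the same metadata query looks like this (using the port published in the Compose file above):

```shell
# Same metadata listing, but from the host via the published PLAINTEXT_HOST port.
kafkacat -b localhost:39092 -L
```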

To summarize, a client connects to a Kafka cluster with the --bootstrap-server list only for the initial connection. Kafka in turn returns the metadata specified in KAFKA_ADVERTISED_LISTENERS, and the client uses those addresses for all subsequent production and consumption of messages.

Now, let’s bring up the services with the following command in the folder where your docker-compose.yml is present.

docker compose up

This will bring up all the services: both ZooKeeper nodes and both brokers.

Now, let's enter any broker container and create a topic with the command below:

kafka-topics --create --bootstrap-server localhost:39092 --replication-factor 1 --partitions 2 --topic reviews

To check if the topic has been created execute the below command-

kafka-topics --list --bootstrap-server localhost:39092

To check the leader information in the cluster for any given topic execute the below command-

kafka-topics --bootstrap-server localhost:39092 --describe --topic reviews

As we configured the broker on port 39092 with id 2, that same broker shows up as a leader in the output.
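As a final sanity check (a sketch, assuming the reviews topic created above), you can push a message through the host listener and read it back:

```shell
# Produce one message to the reviews topic via the host listener...
echo "first review" | kafka-console-producer --bootstrap-server localhost:39092 --topic reviews

# ...and consume it back from the beginning of the topic.
kafka-console-consumer --bootstrap-server localhost:39092 --topic reviews --from-beginning --max-messages 1
```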

Hope this helps !!
