Kafka Interview Questions and Answers
Top 100 Apache Kafka Interview Questions & Answers
This section provides common interview questions about Apache Kafka, along with concise answers.
What is Apache Kafka?
- A distributed event streaming platform designed for high-throughput, low-latency data feeds.
- Used for building real-time data pipelines and streaming applications.
What are the core components of Kafka?
- Producers
- Consumers
- Brokers
- Topics
- Partitions
- Replicas
- Zookeeper (or KRaft)
What is a Kafka Topic?
- A category or feed name to which records are published.
- Consumers subscribe to topics to receive messages.
What is a Kafka Partition?
- A sequence of messages within a topic.
- Topics are divided into partitions for parallelism and scalability.
- Messages within a partition are ordered.
Why are Partitions important in Kafka?
- Enable parallelism for both producers (sending messages) and consumers (reading messages).
- Allow topics to span multiple brokers, handling more data than a single machine.
- Provide ordering guarantees within a partition.
What is a Kafka Broker?
- A single Kafka server.
- A Kafka cluster consists of one or more brokers.
What is a Kafka Cluster?
- A group of Kafka brokers working together.
What is a Producer in Kafka?
- An application that publishes (sends) messages to Kafka topics.
What is a Consumer in Kafka?
- An application that subscribes to topics and processes (reads) messages from them.
What is a Consumer Group?
- A group of consumers that share a common group.id.
- Messages from a topic's partitions are distributed among the consumers within a group.
How does Kafka ensure message ordering?
- Kafka guarantees ordering within a single partition.
- It does not guarantee ordering across multiple partitions in the same topic.
What is an Offset?
- A unique identifier assigned to each message within a partition.
- It's a monotonically increasing integer.
How do Consumers track their position?
- Consumers track their position in each partition by storing the offset of the last message they have processed.
- This is called "offset management".
Where are Consumer Offsets stored?
- In Kafka itself, in a dedicated internal topic called __consumer_offsets.
What is Zookeeper's role in Kafka (pre-KRaft)?
- Used for managing and coordinating Kafka brokers (cluster metadata, leader election for brokers and partitions).
- Note: This is being replaced by KRaft.
What is KRaft?
- Kafka Raft Metadata mode.
- It's a new consensus protocol that replaces Zookeeper for managing Kafka's metadata, simplifying the architecture.
What is Replication Factor?
- The number of copies of a partition that are maintained across different brokers.
- It determines fault tolerance.
What is an ISR?
- In-Sync Replicas.
- The set of replicas for a partition that are fully caught up with the leader and considered healthy.
- Producers can wait for acknowledgments from ISRs.
What is the Leader Replica?
- One replica of a partition is designated as the leader.
- All producer writes go to the leader, and consumers typically read from the leader.
What is a Follower Replica?
- Replicas that passively replicate the data from the leader.
- They become the new leader if the current leader fails.
How does Kafka achieve fault tolerance?
- Through replication.
- If a broker fails, a follower replica on another broker can be elected as the new leader for the affected partitions.
Explain Kafka's Delivery Semantics.
- At-most-once: Messages might be lost but are never redelivered.
- At-least-once: Messages are never lost but might be redelivered.
- Exactly-once: Messages are delivered exactly once. Achieved in Kafka Streams and with transactional producers/consumers.
How to achieve At-least-once delivery?
- By setting acks=all on the producer.
- By ensuring the consumer commits offsets after processing the message.
How to achieve Exactly-once delivery?
- Requires enabling idempotence on the producer (enable.idempotence=true).
- Using transactions for atomic writes across multiple partitions/topics or when integrating with external systems.
- Kafka Streams provides exactly-once processing guarantees.
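- Example (an illustrative Java sketch, not the only approach; broker address, topic names, and the transactional.id are placeholders):
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", "example-tx-producer");   // enables transactions (implies idempotence)

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("orders", "order-1", "created"));
    producer.send(new ProducerRecord<>("audit", "order-1", "created"));
    producer.commitTransaction();    // both writes become visible atomically
} catch (Exception e) {
    producer.abortTransaction();     // simplified; fatal errors (e.g., producer fenced) require closing the producer
}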
What is Idempotence in Kafka Producers?
- Guarantees that sending the same message multiple times due to retries will result in the message being written to the log exactly once.
- Prevents duplicates from producer retries.
What are Kafka Transactions?
- Allow producers to send messages to multiple partitions and topics atomically.
- Either all messages in a transaction are committed, or none are.
- Essential for exactly-once delivery when writing to multiple destinations.
What is the role of a Producer's acks setting?
- Controls the level of acknowledgment the producer requires from the broker before considering a write successful.
- acks=0: no wait.
- acks=1: wait for the leader only.
- acks=all: wait for the leader and all in-sync replicas.
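- Example (illustrative Java producer configuration; broker address and values are placeholders):
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("acks", "all");                 // wait for the leader and all in-sync replicas
props.put("enable.idempotence", "true");  // make retries safe (no duplicates from producer retries)
props.put("retries", Integer.MAX_VALUE);  // retry transient failures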
What is the Consumer Group Rebalance?
- The process by which partitions are reassigned among the consumers within a consumer group.
- Occurs when a consumer joins or leaves the group, or when a topic's partitions change.
How does a Consumer Group Rebalance affect consumers?
- Consumers stop processing messages.
- They release their assigned partitions.
- They are then assigned a new set of partitions.
- This can cause processing pauses.
What is a Rebalance Listener?
- An interface (ConsumerRebalanceListener) that allows consumer applications to react to rebalance events.
- Useful for tasks like committing offsets before losing a partition.
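- Example (a minimal Java sketch; the topic name is a placeholder and the consumer is assumed to be created elsewhere):
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;
import java.util.Collection;
import java.util.Collections;

// consumer is an existing KafkaConsumer<String, String>
consumer.subscribe(Collections.singletonList("orders"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        consumer.commitSync();   // commit processed offsets before these partitions are taken away
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        // optionally seek to externally stored offsets for the newly assigned partitions
    }
});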
What is Kafka Connect?
- A framework for streaming data between Kafka and other systems.
- Examples: databases, key-value stores, search indexes, file systems.
What are Source Connectors in Kafka Connect?
- Ingest data from external systems into Kafka topics.
What are Sink Connectors in Kafka Connect?
- Export data from Kafka topics to external systems.
What are Converters in Kafka Connect?
- Handle the serialization and deserialization of message keys and values.
- Convert data between the Kafka Connect internal format and the format used in Kafka or external systems.
What are Transforms (Single Message Transforms - SMTs) in Kafka Connect?
- Lightweight message transformations.
- Applied to individual messages as they flow through a connector.
What is Kafka Streams?
- A client-side library for building stateful and stateless stream processing applications.
- Uses Kafka's client libraries directly.
Explain the difference between KStream and KTable.
- KStream: Unbounded sequence of records (events).
- KTable: Changelog stream representing a continuously updating table (state).
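- Example (illustrative Streams DSL sketch; topic names are placeholders):
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();

// KStream: every record is an independent event
KStream<String, String> pageViews = builder.stream("page-views");

// KTable: a continuously updated count per key
KTable<String, Long> viewsPerUser = pageViews.groupByKey().count();

viewsPerUser.toStream().to("page-views-per-user", Produced.with(Serdes.String(), Serdes.Long()));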
What is State in Kafka Streams?
- Kafka Streams applications can maintain local state stores (key-value, window, session).
- Used to remember information across processing steps.
How does Kafka Streams handle state fault tolerance?
- State stores are backed by Kafka topics (changelog topics).
- State can be restored by replaying the changelog topic.
What is Windowing in Kafka Streams?
- Grouping records based on time boundaries.
- Used for aggregations or joins on time-series data.
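- Example (illustrative Streams DSL sketch; topic name, window size, and grace period are arbitrary placeholders):
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;
import java.time.Duration;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> pageViews = builder.stream("page-views");

// count events per key in 5-minute tumbling windows, allowing 1 minute for late records
KTable<Windowed<String>, Long> windowedCounts = pageViews
        .groupByKey()
        .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(1)))   // older clients: TimeWindows.of(...).grace(...)
        .count();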
What is Schema Registry?
- A centralized repository for managing schemas (like Avro, Protobuf, JSON Schema) for data in Kafka.
Why use Schema Registry?
- Ensures data compatibility.
- Provides central schema management.
- Reduces boilerplate code in producer/consumer applications.
What serialization formats are commonly used with Kafka?
- JSON
- Avro
- Protobuf
- Plain strings/bytes
- Avro and Protobuf are often preferred when used with Schema Registry.
How does Kafka handle message retention?
- Retains messages for a configurable period of time (time or size) or until explicitly deleted.
- Retention is set per topic.
What is Log Compaction?
- A retention policy that retains the last known value for each message key within a partition.
- Removes older records with the same key.
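- Example (enabling compaction on an existing topic; topic name and broker address are placeholders):
kafka-configs --bootstrap-server <broker_address> \
  --alter --entity-type topics --entity-name <topic_name> \
  --add-config cleanup.policy=compact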
What are Kafka Command-Line Tools?
- Utilities for administrative tasks.
- Examples: creating/deleting topics, describing topics, listing consumer groups, producing/consuming messages.
How do you create a topic using the command line?
kafka-topics --create \
--topic <topic_name> \
--bootstrap-server <broker_address> \
--partitions <num_partitions> \
--replication-factor <num_replicas>
How do you produce a message from the command line?
kafka-console-producer \
--topic <topic_name> \
--bootstrap-server <broker_address>
How do you consume messages from the command line?
kafka-console-consumer \
--topic <topic_name> \
--bootstrap-server <broker_address> \
--from-beginning
What are the key configuration parameters for a Kafka Producer?
bootstrap.servers
key.serializer
value.serializer
acks
retries
batch.size
linger.ms
enable.idempotence
What are the key configuration parameters for a Kafka Consumer?
bootstrap.servers
group.id
key.deserializer
value.deserializer
auto.offset.reset
enable.auto.commit
auto.commit.interval.ms
max.poll.records
Explain auto.offset.reset in consumers.
- Determines where to start reading if no committed offset exists (or the committed offset is no longer valid) for a consumer group.
- earliest: Start from the beginning of the partition.
- latest: Start from the end (most recent messages).
Explain enable.auto.commit in consumers.
- If true, offsets are automatically committed periodically in the background based on the auto.commit.interval.ms setting.
- If false, manual commits are required using commitSync() or commitAsync().
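- Example (a minimal Java consumer with manual commits; broker address, group id, topic, and the process() helper are placeholders):
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "example-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("enable.auto.commit", "false");     // take control of offset commits
props.put("auto.offset.reset", "earliest");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("orders"));

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        process(record);       // hypothetical application logic
    }
    consumer.commitSync();     // commit only after the batch has been processed (at-least-once)
}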
What is the impact of increasing the number of partitions?
- Higher potential throughput.
- Increased management overhead.
- More complex rebalances.
- No cross-partition ordering guarantee.
When should you use more partitions?
- When needing higher throughput than current partitions can handle.
- When needing more parallel consumer instances in a consumer group.
What is the role of message keys?
- Used for partitioning (default partitioner hashes the key to determine the partition).
- Ensures messages with the same key go to the same partition, preserving ordering for that key.
- Essential for Log Compaction.
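- Example (illustrative Java send with a key; topic, key, and value are placeholders):
// records with the same key always go to the same partition under the default partitioner
producer.send(new ProducerRecord<>("user-events", "user-42", "logged_in"));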
What happens if a Producer sends a message with a null key?
- The default partitioner spreads such messages across available partitions: round-robin in older client versions, sticky (batch-by-batch) partitioning in newer ones.
What happens if a Producer sends a message with a specific partition number?
- The message goes directly to that specified partition, ignoring the key and any partitioner logic.
How does Kafka handle backpressure on the consumer side?
- Consumers use a pull model, meaning they request batches of messages from the broker.
- Consumers control their own fetch rate, preventing brokers from overwhelming them.
What is the difference between a Queue and a Topic in messaging systems?
- Queue: Typically point-to-point. A message is consumed by only one consumer. Messages are usually removed after consumption.
- Topic: Publish-subscribe model. A message published to a topic can be consumed by multiple independent consumer groups. Messages are retained for a configurable period.
Can a single consumer instance belong to multiple consumer groups?
- Yes. A consumer instance can subscribe to topics as part of different consumer groups simultaneously.
Can multiple consumer instances with the same group.id read from the same partition simultaneously?
- No. Within a single consumer group, each partition is assigned to at most one consumer instance at any given time during a stable state.
What is the maximum size of a message in Kafka?
- Configurable via broker settings (message.max.bytes, replica.fetch.max.bytes).
- Defaults are typically around 1 MB.
- Can be increased, but has performance implications (memory, disk I/O, network).
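- Example (raising the per-topic limit; the 5 MB value, topic name, and broker address are placeholders):
kafka-configs --bootstrap-server <broker_address> \
  --alter --entity-type topics --entity-name <topic_name> \
  --add-config max.message.bytes=5242880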
How do you monitor Kafka?
- Using JMX metrics exposed by brokers, producers, and consumers.
- Integrating with monitoring tools like Prometheus and Grafana.
- Monitoring OS-level metrics (CPU, memory, disk I/O, network).
- Using Kafka-specific monitoring tools (Confluent Control Center, etc.).
What are some important metrics to monitor for Kafka Brokers?
- Request rate and latency.
- Network traffic (bytes in/out).
- Disk usage and I/O.
- Leader and follower counts per partition.
- ISR size for partitions.
- Offline partitions.
What are some important metrics to monitor for Kafka Producers?
- Request rate and latency.
- Error rate.
- Batch size and rate.
- Compression rate.
- Queue size (messages waiting to be sent).
What are some important metrics to monitor for Kafka Consumers?
- Fetch rate and latency.
- Records consumed rate.
- Commit latency.
- Consumer lag (difference between latest offset and committed offset).
- Rebalance rate.
What is Consumer Lag?
- The difference, in terms of offsets, between the latest message written to a partition and the last message processed (and its offset committed) by a specific consumer group for that partition.
- High lag indicates the consumer group is falling behind.
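- Example (inspecting per-partition lag; group id and broker address are placeholders; the output includes current offset, log-end offset, and lag columns):
kafka-consumer-groups --bootstrap-server <broker_address> \
  --describe --group <group_id>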
How can you reduce Consumer Lag?
- Add more consumers to the consumer group (up to the number of partitions).
- Optimize the consumer's message processing logic.
- Increase the number of partitions for the topic (if applicable).
- Adjust consumer fetch parameters (e.g., fetch.min.bytes, fetch.max.wait.ms).
What is Kafka Security?
- Implementing measures to secure your Kafka cluster and data.
- Includes:
- Authentication: Verifying the identity of clients (Producers, Consumers, Brokers). Examples: SASL (GSSAPI/Kerberos, PLAIN), SSL/TLS Client Authentication.
- Authorization: Controlling what authenticated clients are allowed to do (e.g., read/write to topics). Uses Access Control Lists (ACLs).
- Encryption: Protecting data in transit (SSL/TLS) and potentially at rest (though Kafka itself doesn't encrypt data at rest by default).
What are ACLs in Kafka?
- Access Control Lists.
- Define permissions for principals (users, service accounts) on Kafka resources.
- Resources include topics, brokers, consumer groups, and the cluster itself.
- Permissions include Read, Write, Create, Delete, Alter, Describe, ClusterAction.
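- Example (granting a principal read access to a topic and a consumer group; principal and resource names are placeholders):
kafka-acls --bootstrap-server <broker_address> \
  --add --allow-principal User:alice \
  --operation Read \
  --topic <topic_name> \
  --group <group_id>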
Explain SASL vs. SSL for Authentication.
- SASL (Simple Authentication and Security Layer): A framework for authentication protocols. Kafka supports mechanisms like GSSAPI (Kerberos), PLAIN, SCRAM. Authenticates the client to the broker.
- SSL/TLS (Secure Sockets Layer / Transport Layer Security): Primarily provides encryption for data in transit. Can also be used for authentication using client certificates.
- They can be used together: SSL for encryption and SASL for authentication over the encrypted connection.
How do you handle schema evolution in Kafka?
- Using a Schema Registry to centralize and manage schemas.
- Employing a schema format that supports evolution (like Avro or Protobuf).
- Defining compatibility rules in the Schema Registry (e.g., Backward, Forward, Full).
- Producers use the latest compatible schema, Consumers can read data written with older (compatible) schemas.
What are the different types of schema compatibility?
- Backward Compatibility: New schema can read data written with the old schema. (Consumers using new code can read old data). Safe for consumers.
- Forward Compatibility: Old schema can read data written with the new schema. (Consumers using old code can read new data). Safe for producers.
- Full Compatibility: Both backward and forward compatible.
- None: No compatibility checks enforced.
What are the benefits of using Avro with Kafka and Schema Registry?
- Compact binary format (smaller messages).
- Rich type system.
- Strong schema evolution support with well-defined compatibility rules.
- Code generation from schemas.
- Avoids issues with JSON's lack of strict schema enforcement.
What is the role of replica.lag.time.max.ms?
- Broker configuration.
- The maximum time a follower replica is allowed to lag behind the leader replica before being considered out of sync and removed from the In-Sync Replicas (ISR) set.
What is the significance of min.insync.replicas?
- Topic or broker configuration.
- When acks=all is used by the producer, this setting specifies the minimum number of replicas (including the leader) that must acknowledge a write before the producer considers the write successful.
- Ensures data durability even if some replicas are unavailable.
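- Example (setting the value on a topic; the value 2 and names are placeholders, paired with acks=all on the producer):
kafka-configs --bootstrap-server <broker_address> \
  --alter --entity-type topics --entity-name <topic_name> \
  --add-config min.insync.replicas=2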
How does Kafka scale horizontally?
- Brokers: Add more brokers to the cluster to increase overall capacity (storage, network, processing).
- Partitions: Increase the number of partitions for a topic to increase parallel processing capability for producers and consumers.
- Consumers: Add more consumers to a consumer group to increase parallel processing of partitions.
What is the difference between Kafka and traditional Message Queues (e.g., RabbitMQ)?
- Kafka: Designed for high-throughput, durable event streaming, and replayability (distributed log). Data is retained for a period and consumed by multiple groups independently. Pull-based consumer model.
- Traditional MQ: Often designed for point-to-point or fan-out with message removal after consumption. Typically uses a push-based model to consumers. Less emphasis on long-term data retention and replayability.
When would you choose Kafka over a traditional Message Queue?
- High volume of data needing to be processed in real-time.
- Need for durable storage and the ability to replay past events.
- Requirement for multiple, independent applications to consume the same data stream.
- Building real-time data pipelines and streaming applications.
What are the trade-offs of using Kafka?
- Complexity: Distributed system with multiple components (brokers, Zookeeper/KRaft, Schema Registry, Connect, Streams).
- Operational Overhead: Requires careful monitoring, tuning, and management.
- Learning Curve: Concepts like partitions, offsets, consumer groups, and delivery semantics need to be understood.
What is the role of the Kafka Controller Broker?
- One broker in the cluster is elected as the controller.
- Manages the state of the cluster, including:
- Performing leader election for partitions when a broker fails.
- Managing partition reassignment.
- Maintaining the list of in-sync replicas (ISRs).
How does Kafka handle broker failures?
- The Controller broker detects the failure.
- The Controller initiates a leader election process for all partitions whose leader was on the failed broker.
- A new leader is elected from the partition's In-Sync Replicas (ISRs).
- Clients (Producers and Consumers) are notified of the leadership change.
How does Kafka handle producer failures?
- Producers can be configured with retries (retries parameter).
- With idempotence enabled (enable.idempotence=true), retries are safe and do not introduce duplicates.
- Without idempotence, retries can lead to duplicate messages in Kafka.
How does Kafka handle consumer failures?
- If a consumer instance fails, its connection to the broker is lost.
- This triggers a consumer group rebalance.
- The partitions previously assigned to the failed consumer are reassigned to other healthy consumers within the same consumer group.
What is the difference between Manual and Automatic Offset Committing?
- Automatic: Offsets are committed periodically in the background based on the auto.commit.interval.ms setting. Risk of losing messages (if the commit happens before processing is complete and the consumer fails) or processing messages multiple times (if the consumer fails after processing but before the next commit).
- Manual: The application explicitly calls commitSync() or commitAsync() after processing a batch of messages. Provides more control and is necessary for At-least-once or Exactly-once processing guarantees.
When would you use commitSync() vs. commitAsync()?
- commitSync(): Blocks the consumer until the offset commit is successful. Simpler to use but reduces throughput. Good for ensuring offsets are committed before shutting down or during rebalances.
- commitAsync(): Non-blocking. The consumer continues fetching messages while the commit happens in the background. Offers higher throughput but requires handling commit failures via callbacks.
What are the different types of Joins in Kafka Streams?
- KStream-KStream Join: Joins two event streams based on a common key within a defined time window.
- KTable-KTable Join: Joins two changelog streams (tables) based on a common key.
- KStream-KTable Join: Joins an event stream with a table. The event from the stream is joined with the current state of the table.
- KStream-GlobalKTable Join: Joins an event stream with a GlobalKTable (a full copy of a table on every stream instance).
What is RocksDB used for in Kafka Streams?
- RocksDB is the default state store backend for Kafka Streams.
- It's an embedded key-value store used to persist and query the local state maintained by stream processing applications (e.g., for aggregations, joins, windowing).
How can you scale Kafka Connect?
- By running multiple Kafka Connect worker processes in distributed mode.
- Tasks assigned to connectors are distributed among these workers.
- Adding more workers increases the overall capacity and fault tolerance of the Connect cluster.
What is the difference between standalone and distributed mode in Kafka Connect?
- Standalone Mode: A single Connect worker process runs all connectors and tasks. Not fault-tolerant; if the worker fails, all connectors stop. Suitable for development or testing.
- Distributed Mode: Multiple Connect worker processes form a cluster. Connectors and tasks are distributed and managed across workers. Fault-tolerant; if a worker fails, its tasks are reassigned to other workers. Scalable and suitable for production.
How does Kafka Connect handle failures?
- In distributed mode, if a Connect worker process fails, the Connect cluster detects the failure.
- The tasks that were running on the failed worker are automatically reassigned to other healthy workers in the cluster.
- Connectors and tasks can also be configured to restart automatically upon failure.
What are Connectors, Tasks, and Workers in Kafka Connect?
- Connector: Defines the configuration for copying data between a source/sink system and Kafka. It manages the creation and lifecycle of Tasks.
- Task: The unit of parallelism within a connector. A connector divides the job of copying data into multiple tasks, which run independently on workers.
- Worker: The process that runs the connector and task instances. In distributed mode, multiple workers form a cluster.
What is Kafka's relationship with the CAP theorem?
- The CAP theorem states that a distributed system can only guarantee two out of Consistency, Availability, and Partition Tolerance.
- Kafka is often described as a CP system, prioritizing Consistency and Partition Tolerance over strict Availability in the event of a network partition. Kafka guarantees consistency within a partition's replicas.
What is the difference between synchronous and asynchronous sending in Kafka Producers?
- Synchronous Sending: The producer sends a message and waits for an acknowledgment (based on the acks setting) before sending the next message. Simpler code but lower throughput.
- Asynchronous Sending: The producer sends messages in the background without waiting for an immediate acknowledgment. Uses callbacks to handle responses or errors. Offers much higher throughput.
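- Example (illustrative Java sketch; topic, key, and value are placeholders):
// synchronous: block until the broker responds (get() throws checked exceptions)
RecordMetadata meta = producer.send(new ProducerRecord<>("orders", "k", "v")).get();

// asynchronous: return immediately and handle the result in a callback
producer.send(new ProducerRecord<>("orders", "k", "v"), (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace();   // handle the failure (retry, alert, dead letter, ...)
    }
});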
What are the benefits of using compression in Kafka?
- Reduces network bandwidth usage between producers, brokers, and consumers.
- Reduces disk space usage on brokers.
- Can potentially improve throughput by reducing the amount of data transferred, even if CPU usage increases slightly for compression/decompression.
What are common compression types in Kafka?
- Gzip
- Snappy
- LZ4
- ZStandard (ZSTD) - often recommended for good balance of compression and performance.
What is the purpose of request.timeout.ms and delivery.timeout.ms in the producer?
- request.timeout.ms: The maximum time the producer will wait for a response to a single request (e.g., a metadata request or a send request) before giving up and retrying or failing.
- delivery.timeout.ms: The maximum time elapsed from when a message is sent (producer.send()) until it is successfully delivered to the broker or fails. This includes time spent in the producer buffer, retries, etc.
How do you handle poison pill messages in consumers?
- Logging and Skipping: Log the error and skip the message, committing the offset to move forward.
- Dead Letter Queue (DLQ): Send the problematic message to a separate topic (DLQ) for later inspection and handling.
- Robust Error Handling/Retries: Implement retry logic within the consumer or use frameworks that provide it.
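- Example (a rough dead-letter-queue sketch in Java; the DLQ topic, dlqProducer, and process() helper are illustrative assumptions):
for (ConsumerRecord<String, String> record : records) {
    try {
        process(record);    // hypothetical application logic
    } catch (Exception e) {
        // forward the poison pill to a dead letter topic for later inspection
        dlqProducer.send(new ProducerRecord<>("orders.dlq", record.key(), record.value()));
    }
}
consumer.commitSync();      // commit so the bad record is not re-consumed forever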
Explain the concept of "hardening" a Kafka cluster.
- Making the cluster more resilient, secure, and performant.
- Involves:
- Increasing replication factor for topics.
- Configuring min.insync.replicas appropriately.
- Adding more brokers for increased capacity and fault tolerance.
- Implementing authentication and authorization (ACLs).
- Enabling encryption (SSL/TLS).
- Implementing robust monitoring and alerting.
- Regular backups of metadata.
What is the role of the Kafka Log?
- The append-only, ordered sequence of messages stored on the broker's disk for each partition.
What is a Log Segment?
- Kafka partitions are broken down into segments (files) for easier management, retention, and cleanup.
How does Kafka store messages on disk?
- As a sequence of log segments within the directory for each partition. Messages are appended sequentially.
What is Zero-Copy Principle in Kafka?
- Kafka uses the sendfile system call to transfer data efficiently from disk to the network socket without copying it between kernel and user-space buffers. Improves throughput.
What are the benefits of Kafka's pull-based consumer model?
- Consumers control their own rate, can batch messages efficiently, and can rewind/replay the log.
What are the benefits of Kafka's push-based producer model?
- Brokers can control the rate at which they accept data, managing their load.
What is Idempotent Producer?
- A producer configured with enable.idempotence=true, guaranteeing that retrying a send will not result in duplicate messages.
What is the difference between a KStream and a GlobalKTable?
- KStream: Local view of events, partitioned across stream processing instances.
- GlobalKTable: Full copy of the table state available on *every* stream processing instance. Useful for joining against a small, static dataset.
How does Kafka Streams handle out-of-order events?
- Using timestamps (event time, processing time, ingestion time) and windowing with grace periods to handle late-arriving records.
What is Event Time vs. Processing Time vs. Ingestion Time?
- Event Time: Time event occurred in the source system.
- Processing Time: Time event was processed by the stream application.
- Ingestion Time: Time event was stored in Kafka.