Disliked: acks=all messages are also lost?

Message Queuing is a technical module that will definitely be asked in the interview, although it is not as important as concurrent programming and databases, it is also a key question in the interview. So today we will look at a high frequency in MQ, but may break your previous knowledge of an interview question.

By key question I mean that this interview question will affect your overall interview outcome.

We get asked a lot when interviewing for Message Queue (MQ), especially when interviewing for Kafka:How do I ensure that messages are not lost?

Then, our answer will be divided into the following 3 parts:

assurancesProducer messages are not lost。
assurancesKafka service (server-side) messages are not lost。
assurancesConsumer messages are not lost。

Only by ensuring that none of these three parts of the message is lost can you ensure that Kafka as a whole does not lose messages.

This is because the Kafka message delivery process is as follows (3 parts in total):

1. How to ensure that producer messages are not lost?

So how do you ensure that producer messages are not lost?

To figure this out, we need to understand the flow of execution of a message sent by a producer.

The execution flow of a message sent by a Kafka producer is as follows:

By default, all messages are cached in the RecordAccumulator cache, and then the Sender thread pulls the messages and sends them to the Kafka server. Through the collaboration of the RecordAccumulator and Sender threads, the batch sending of messages, performance optimization, and exception handling are achieved, ensuring efficient and reliable message through the collaboration of the RecordAccumulator and Sender threads.

1.1 RecordAccumulator Cache Role

pending message: The RecordAccumulator is a key component of the Kafk a producer that acts as a cache for messages sent from the Main Thread. These messages are waiting in the RecordAccumulato r to be sent in bulk by the Sender thread.
batch file: RecordAccumulator reduces the number of network requests for a single message to be sent by collecting messages in bulk, thus improving sending efficiency. the Sender thread can fetch messages from RecordAccumulator in bulk and send them to the Kafka cluster in one go, which reduces the resource consumption of network transmission.
performance optimization: The cache size of the RecordAccumulator can be configured via the Producer Client parameter (default value is 32MB). A proper cache size can balance memory usage and sending efficiency to achieve optimal performance.
memory management: If the RecordAccumulator's cache is full, the producer blocks when it calls the send() method again to send a message (the default blocking time is 60 seconds, which can be configured via a parameter). If the blocking timeout expires, an exception is thrown. This mechanism helps prevent the producer from exhausting system resources by caching messages indefinitely.
ByteBuffer Multiplexing: In order to minimize the resource consumption caused by the frequent creation and release of ByteBuffers, RecordAccumulator maintains an internal BufferPool that enables the reuse of ByteBuffers. ByteBuffers of a specific size are cached for reuse in subsequent messages.

1.2 Role of the Sender thread

pull: The Sender thread is a background thread in the Kafka producer that is responsible for pulling cached messages from the RecordAccumulator.The Sender thread periodically polls the RecordAccumulator to check if there are any new messages that need to be sent.
Batch build requests: When a Sender thread finds a new message to send, it constructs one or more ProducerRequest requests. Each request contains multiple messages for efficient bulk sending. This bulk sending mechanism can significantly improve network transmission efficiency.
Sending a Message to a Kafka Cluster: The Sender thread sends the constructed ProducerRequest to the appropriate partition of the Kafka cluster. It sends the message to the corresponding Broker node based on the partition's Leader node information.
Exception handling: During the message sending process, exceptions may occur such as network failure, partition unavailability, etc. The Sender thread is responsible for handling these exceptions, such as retrying, reconnecting, etc., to ensure that the message is sent reliably.
Status Update: Once a message has been successfully received and logged in the Kafka Broker's log, the Sender thread notifies the RecordAccumulator to update the status of the message. This way, the producer knows which messages have been successfully sent and which ones need to be retried.

2. Two scenarios of producer message loss

After understanding the flow of messages sent by the Kafka producer, we can know that there are two scenarios in which messages are lost in this session:

Network jitter (message unreachable): The link between the producer and the Kafka server is unreachable and the send times out. At this time the state of each node is normal, but the consumer side just did not consume the message, as if the message was lost.
No message acknowledgment (ack): After the producer message is sent, there is no ack message acknowledgement, and the message is returned as successful, but after the message is sent, the Kafka service goes down or loses power, resulting in the loss of the message.

How do we fix this?

2.1 Handling of network fluctuations

If the network fluctuates, you can set the message retry, because the network jitter message is not reachable, so as long as you configure the number of retries, then the message will be retried to ensure that the message is not lost.

In a Spring Boot project, you only need to set the number of retries for the producer in the configuration file:

spring:  
  kafka:  
    producer:  
      retries: 3

2.2 Message Acknowledgement Settings

The Kafka producer's ACK (Acknowledgment) mechanism is the way the producer waits for an acknowledgment after sending a message to the Kafka cluster. This mechanism determines when the producer considers a message to have been successfully sent and has a direct impact on the reliability and performance of the message.

There are three main types of ACK mechanisms for Kafka producers.

① acks=0

The producer considers the message committed as soon as it is sent to the network buffer and does not wait for any response from the server. The number of retries set at this point is invalid.

specificities：

peak performance: Highest throughput due to not waiting for any acknowledgement.
Minimum reliability: Messages may be lost in transit, and the producer has no way of knowing if the message successfully reached the server.

Applicable Scenarios: Scenarios that do not require high message reliability but seek extreme performance.

② acks=1

After sending the message to the partition leader of the topic, the producer waits for the acknowledgment of the leader, that is, the message is considered to have been committed (at this point, the leader writes successfully and does not flush to disk), without waiting for the acknowledgment of all replicas.

specificities：

Medium reliability and performance: Provides a degree of reliability because the producer receives an acknowledgment only after the leader replica acknowledges the message. However, if the leader replica fails after the acknowledgement and the message has not been copied to the other replicas, the message may be lost.
Balancing Performance and Reliability: Provides a compromise between producer performance and message reliability.

Applicable Scenarios: For scenarios where transferring normal logs allows for the occasional loss of a small amount of data.

③ acks=all or acks=-1

The producer needs to wait until all synchronized replicas (ISRs, In-Sync Replicas) have successfully written the message before considering it committed.

specificities：

Highest reliability: The producer receives an acknowledgement only after all synchronized copies have acknowledged receipt of the message, ensuring the reliability of the message.
lower performance: The need to wait for acknowledgements from all synchronized copies may result in increased latency in message sending, which may affect performance.

Applicable Scenarios: For scenarios that require high message reliability, such as financial transactions and other mission-critical applications.

In a Spring Boot project, acks can be set in a configuration file:

spring:  
  kafka:  
    producer:  
      acks: all

=all messages must not be lost?

Normally when we set acks=all, it is actually guaranteed that no data will be lost. HoweverIn a special case, if the Topic has only one Partition, i.e. only one Leader node, the message will be lost.。

If there is only one Leader node, the acks=all setting has a similar effect to the acks=1 setting, and if the Leader goes down after acknowledging the message and before it has a chance to flush the message to disk, then the message will be lost.

All things must be evil, and when an interviewer asks you with a question statement, the answer is basically no. If it's a yes, the interviewer probably won't ask you again, so when you hear a question that goes against common sense, think hard first about whether there are any other answers to the question.

Post-lesson Reflections

How do Kafka servers and consumers ensure that messages are not lost?

This article has been included in my interview mini-site, which contains modules such as Redis, JVM, Concurrency, Concurrency, MySQL, Spring, Spring MVC, Spring Boot, Spring Cloud, MyBatis, Design Patterns, Message Queuing and more.