SRE Essentials - Exploring Kafka's Zero-Copy Technology

If you're familiar with Kafka, one of its performance optimizations may have caught your attention: zero-copy, an optimization provided by the operating system.

Zero-copy avoids non-essential copies of data. It does not, of course, mean that there are no copies at all.

In Kafka's case, the OS can move data from the page cache toward the socket without passing it through the Kafka broker's Java program. This saves some extra copies and some switches between user space and kernel space. Let's look at an example.

Traditional copy

If your application reads a file from disk and sends it over the network, it may perform several unnecessary copies, as well as unnecessary switches between user state and kernel state.

Kafka traditional copy

Some terminology:

read buffer: the operating system's page cache
socket buffer: the byte buffer the OS uses to manage packets
NIC buffer: the byte buffer on the network interface card (NIC)
DMA copy: DMA stands for Direct Memory Access, a feature of the memory controller that allows hardware (graphics cards, sound cards, network cards, etc.) to directly access certain data in memory (RAM) without CPU intervention.

In this example, we have 4 mode switches (between user state and kernel state) and 4 data copies.

1. The application (Kafka in this case) issues a read, and the data is loaded from disk into the read buffer by DMA copy (user state -> kernel state)
2. The data is copied from the read buffer into the application's buffer (kernel state -> user state)
3. To send the data to the network, the application first writes it to the socket buffer (user state -> kernel state)
4. The data is copied from the socket buffer to the NIC buffer by DMA copy (the call returns from kernel state to user state once the response data is written)
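
The four steps above correspond roughly to an ordinary read/write loop. Below is a minimal Java sketch of that pattern, not Kafka's actual code. The class name `TraditionalCopySend`, the method `send`, and the buffer `appBuffer` are hypothetical names chosen for illustration, and a `ByteArrayOutputStream` stands in for a real socket's output stream so the example runs without a network.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class TraditionalCopySend {

    // Copies a file to an output stream through a user-space buffer.
    // Each loop iteration performs one read and one write call.
    public static long send(Path file, OutputStream socketOut) throws IOException {
        byte[] appBuffer = new byte[8192]; // the application's buffer (user space)
        long total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read: enter the kernel, data moves disk -> read buffer -> appBuffer,
            // then control returns to user state (steps 1 and 2).
            while ((n = in.read(appBuffer)) != -1) {
                // write: enter the kernel, data moves appBuffer -> socket buffer;
                // the NIC later picks it up, and control returns (steps 3 and 4).
                socketOut.write(appBuffer, 0, n);
                total += n;
            }
        }
        socketOut.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".log");
        try {
            Files.write(file, new byte[20_000]);
            // Stand-in for socket.getOutputStream(); an assumption made for this demo.
            ByteArrayOutputStream fakeSocket = new ByteArrayOutputStream();
            System.out.println("bytes sent: " + send(file, fakeSocket));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

With a real connection you would pass `socket.getOutputStream()` instead of the stand-in stream. Note that every chunk passes through `appBuffer`, even though the application never inspects it.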

Zero-copy

Kafka zero-copy

To reduce the number of copies and send data from disk to the network as directly as possible, Kafka stores data on disk in the same format as the response it will send.

In traditional copy mode, steps 2 and 3 serve little purpose, because Kafka does no additional processing of the data; it simply forwards it. Can the data be sent directly from disk to the network? The answer is yes. With zero-copy, the data on disk still has to be loaded into the read buffer first, but it is no longer copied into the application's buffer. Instead it goes from the read buffer straight to the NIC buffer. Step 2 in the figure, "Appends just file descriptors", means that only a descriptor is handed to the socket buffer; the actual data is never copied there.

This is called a scatter-gather operation (also known as vectored I/O). Scatter-gather means storing only a pointer to the read buffer data in the socket buffer and letting DMA read the data directly from memory.
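
On the JVM, this kind of transfer is exposed through `FileChannel.transferTo`, which on Linux can be backed by the `sendfile` system call. The following is a minimal sketch rather than Kafka's own implementation: the class `ZeroCopySend` and the method `sendFile` are hypothetical names, and the demo uses a loopback connection purely for illustration. Whether the user-space copy is actually avoided depends on the operating system and on the type of the target channel.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {

    // Sends a whole file to the target channel without staging the bytes
    // in an application-level buffer. Assumes a blocking target channel.
    public static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (sent < size) {
                sent += in.transferTo(sent, size - sent, target);
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segment", ".log");
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            Files.write(file, "hello zero-copy".getBytes(StandardCharsets.UTF_8));
            // Port 0 asks the OS for any free port on the loopback interface.
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                long sent = sendFile(file, accepted);
                // Read the payload back on the client side to show it arrived.
                ByteBuffer received = ByteBuffer.allocate((int) sent);
                while (received.hasRemaining() && client.read(received) != -1) {
                    // keep reading until the buffer is full or the stream ends
                }
                System.out.println(new String(received.array(), StandardCharsets.UTF_8));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The application code never touches the file's bytes: it only tells the kernel which file region to send and where to send it.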

What was the end result?

4 mode switches become 2
The 2 DMA copies remain; only the 2 copies through the application's buffer are eliminated
1 tiny copy of the pointer is added

Kafka traditional copy vs. zero-copy comparison

In Kafka

You may have heard that Kafka achieves high performance because of zero-copy, but in practice zero-copy matters less than you might expect in most Kafka clusters, for two reasons.

The CPU is rarely the bottleneck. The network saturates much faster, so in most cases the absence of copies in memory doesn't make much of a difference.
Enabling SSL/TLS encryption prevents Kafka from using zero-copy at all, because the broker then has to encrypt the data in user space before sending it.

Original text: /p/apache-kafka-zero-copy-operating-system-optimization
Translator: Ba Huite, author of the Geek Time column "Operation and Maintenance Monitoring System Practical Notes" and initiator of the Open-Falcon and Nightingale open source projects. He is currently a co-founder of the startup Flashcat, specializing in monitoring and observability. Feel free to talk with me about monitoring and observability technologies and products.