Optimizing Data Packing in Go: Performance Benchmarking and Analysis
Background: on a LAN, packets captured on the NICs of multiple machines need to be collected on a single machine.
Original approach: write files with `tcpdump -w`, then rsync them to the collector at regular intervals.
Revised approach: rewrite the capture and synchronization logic in Go so that captured packets are sent directly over the network to the server, which writes them, eliminating the intermediate write to disk on the capturing machines.
Constructing a pcap file is simple: write a pcap file header once, then prefix each packet with a small metadata record.
This can be done with pcapgo. For a captured frame of n bytes held in `data`:
```go
// Describe the captured frame, then append it to the pcap writer.
ci := gopacket.CaptureInfo{
	CaptureLength: int(n),
	Length:        int(n),
	Timestamp:     time.Now(),
}
if ci.CaptureLength > len(data) {
	ci.CaptureLength = len(data)
}
w.WritePacket(ci, data[:ci.CaptureLength])
```
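For completeness, the writer above needs the pcap file header written once before any packets; a minimal setup with pcapgo might look like this (file name, snaplen, and link type are illustrative):

```go
package main

import (
	"log"
	"os"

	"github.com/google/gopacket/layers"
	"github.com/google/gopacket/pcapgo"
)

func main() {
	f, err := os.Create("capture.pcap")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Write the pcap file header exactly once, before any packets.
	w := pcapgo.NewWriter(f)
	if err := w.WriteFileHeader(65536, layers.LinkTypeEthernet); err != nil {
		log.Fatal(err)
	}
	// Each captured frame is then appended via w.WritePacket with its
	// CaptureInfo metadata, as in the snippet above.
}
```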
To tell which machine each packet came from, an Id is added. Together with the capture metadata and the raw packet bytes, the structure looks like this:
```go
// CaptureInfo is from github.com/google/gopacket
type CaptureInfo struct {
	// Timestamp is the time the packet was captured, if that is known.
	Timestamp time.Time `json:"ts" msgpack:"ts"`
	// CaptureLength is the total number of bytes read off of the wire.
	CaptureLength int `json:"cap_len" msgpack:"cap_len"`
	// Length is the size of the original packet. Should always be >= CaptureLength.
	Length int `json:"len" msgpack:"len"`
	// InterfaceIndex is the index of the interface the packet was captured on.
	InterfaceIndex int `json:"iface_idx" msgpack:"iface_idx"`
}

type CapturePacket struct {
	CaptureInfo
	Id   uint32 `json:"id" msgpack:"id"`
	Data []byte `json:"data" msgpack:"data"`
}
```
One detail still needs to be settled: what format should be used to send the captured packets to the server? JSON, msgpack, or a custom format?
JSON and msgpack have formal specifications, are general-purpose, and are hard to get wrong, but their performance is somewhat worse.
Compared to JSON/msgpack, a custom format can drop unnecessary fields, omit keys from the serialized output entirely, and use targeted optimizations to reduce memory allocation and relieve GC pressure.
The optimization ideas for the custom binary protocol are as follows:
- Encode the CaptureInfo/Id fields as fixed-width values of N bytes: CaptureLength/Length each fit in 2 bytes, and Id fits in 1 byte if the number of machines is small.
- Memory reuse:
  - Encode allocates no memory internally; it writes directly into an externally supplied buffer. If the caller works synchronously, the whole path allocates zero memory.
  - Decode allocates no memory internally; it only parses the metadata and takes Data as a sub-slice of the input. If the caller works synchronously, the whole path allocates zero memory.
  - If the caller works asynchronously, Data must be copied where Encode/Decode is called; this can be optimized by allocating from four pooled size classes of 128, 1024, 8192, and 65536 bytes.
There are two optimization points here, addressed by acquirePacketBuf below:
- Asynchronous operation needs each packet to own its buffer, which cannot be shared, so a sync.Pool per size class is used to build that buffer.
- The metadata is serialized into a fixed-length scratch buffer; allocating it with make (or letting an array escape to the heap) would add GC pressure, so it is pooled as well.
```go
// Buffer pools, one per size class; each holds pointers to fixed-size arrays so
// that getting and putting a buffer does not itself allocate the data on the heap.
var (
	smallBufPool  = sync.Pool{New: func() any { return new([CapturePacketMetaLen + 128]byte) }}
	midBufPool    = sync.Pool{New: func() any { return new([CapturePacketMetaLen + 1024]byte) }}
	largeBufPool  = sync.Pool{New: func() any { return new([CapturePacketMetaLen + 8192]byte) }}
	xlargeBufPool = sync.Pool{New: func() any { return new([CapturePacketMetaLen + 65536]byte) }}
)

// acquirePacketBuf returns a zero-length buffer with enough capacity for n bytes,
// plus a function that returns the buffer to its pool.
func acquirePacketBuf(n int) ([]byte, func()) {
	var (
		buf   []byte
		putfn func()
	)
	if n <= CapturePacketMetaLen+128 {
		smallBuf := smallBufPool.Get().(*[CapturePacketMetaLen + 128]byte)
		buf = smallBuf[:0]
		putfn = func() { smallBufPool.Put(smallBuf) }
	} else if n <= CapturePacketMetaLen+1024 {
		midBuf := midBufPool.Get().(*[CapturePacketMetaLen + 1024]byte)
		buf = midBuf[:0]
		putfn = func() { midBufPool.Put(midBuf) }
	} else if n <= CapturePacketMetaLen+8192 {
		largeBuf := largeBufPool.Get().(*[CapturePacketMetaLen + 8192]byte)
		buf = largeBuf[:0]
		putfn = func() { largeBufPool.Put(largeBuf) }
	} else {
		xlargeBuf := xlargeBufPool.Get().(*[CapturePacketMetaLen + 65536]byte)
		buf = xlargeBuf[:0]
		putfn = func() { xlargeBufPool.Put(xlargeBuf) }
	}
	return buf, putfn
}
```
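A sketch of how the asynchronous path might use these pooled buffers: the capture loop copies each frame into its own pooled buffer and hands it off together with the release function (the packetJob type, handOff helper, and channel are illustrative, not part of the original code):

```go
// packetJob pairs pooled bytes with the function that returns them to their pool.
type packetJob struct {
	buf     []byte
	release func()
}

// handOff copies a freshly captured frame into a pooled buffer so the capture
// loop can immediately reuse its own read buffer.
func handOff(data []byte, jobs chan<- packetJob) {
	// Room for the fixed-width metadata plus the frame itself; in the real
	// encoder the metadata would be appended before the frame bytes.
	buf, release := acquirePacketBuf(CapturePacketMetaLen + len(data))
	buf = append(buf, data...)
	jobs <- packetJob{buf: buf, release: release}
}

// The consumer writes the buffer out and then releases it:
//
//	for job := range jobs {
//		w.Write(job.buf)
//		job.release()
//	}
```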
```go
// metaBufPool holds fixed-size scratch buffers for the serialized metadata.
var metaBufPool = sync.Pool{New: func() any { return new([CapturePacketMetaLen]byte) }}

func (binaryPack) EncodeTo(p *CapturePacket, w io.Writer) (int, error) {
	buf := metaBufPool.Get().(*[CapturePacketMetaLen]byte)
	defer metaBufPool.Put(buf)
	binary.BigEndian.PutUint64(buf[0:], uint64(p.Timestamp.UnixNano()))
	...
	return nm + nd, err
}
```
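The body above is elided; as one concrete illustration of the fixed-width idea, the layout below uses field widths that happen to add up to the 22-byte overhead seen in the size table, together with an allocation-free metadata decode. The offsets, widths, and byte order are assumptions for illustration, not necessarily the benchmark's actual format.

```go
// Assumed 22-byte metadata layout:
//   [0:8)   Timestamp, Unix nanoseconds
//   [8:10)  CaptureLength (uint16)
//   [10:12) Length (uint16)
//   [12:14) InterfaceIndex (uint16)
//   [14:18) Id (uint32)
//   [18:22) length of Data (uint32)
// (needs "encoding/binary", "io", and "time"; assumes the CapturePacket type above)
const exampleMetaLen = 22

// decodeMeta parses the fixed-width header without allocating. Data is a
// sub-slice of b; an asynchronous caller must copy it before reusing b.
func decodeMeta(b []byte) (CapturePacket, error) {
	if len(b) < exampleMetaLen {
		return CapturePacket{}, io.ErrUnexpectedEOF
	}
	var p CapturePacket
	p.Timestamp = time.Unix(0, int64(binary.BigEndian.Uint64(b[0:8])))
	p.CaptureLength = int(binary.BigEndian.Uint16(b[8:10]))
	p.Length = int(binary.BigEndian.Uint16(b[10:12]))
	p.InterfaceIndex = int(binary.BigEndian.Uint16(b[12:14]))
	p.Id = binary.BigEndian.Uint32(b[14:18])
	n := int(binary.BigEndian.Uint32(b[18:22]))
	if len(b) < exampleMetaLen+n {
		return CapturePacket{}, io.ErrUnexpectedEOF
	}
	p.Data = b[exampleMetaLen : exampleMetaLen+n]
	return p, nil
}
```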
Encoded packet sizes (table and analysis summarized with Tongyi Qianwen)
Method | Raw data length (bytes) | Encoded data length (bytes) | Size change (bytes) |
---|---|---|---|
Binary Pack | 72 | 94 | +22 |
Binary Pack | 1024 | 1046 | +22 |
Binary Pack | 16384 | 16406 | +22 |
MsgPack | 72 | 150 | +78 |
MsgPack | 1024 | 1103 | +79 |
MsgPack | 16384 | 16463 | +79 |
Json Pack | 72 | 191 | +119 |
Json Pack | 1024 | 1467 | +443 |
Json Pack | 16384 | 21949 | +5565 |
Json Compress Pack | 72 | 195 | +123 |
Json Compress Pack | 1024 | 1114 | +90 |
Json Compress Pack | 16384 | 15504 | -120 |
Analysis
- Binary Pack:
  - For smaller data (72 bytes), encoding adds 22 bytes.
  - For larger data (16384 bytes), encoding still adds only 22 bytes.
  - Overall, Binary Pack encodes most efficiently, adding a small, constant number of bytes.
- MsgPack:
  - For smaller data (72 bytes), encoding adds 78 bytes.
  - For larger data (16384 bytes), encoding adds 79 bytes.
  - MsgPack is less efficient than Binary Pack on small payloads, but its overhead stays essentially constant as the payload grows.
- Json Pack:
  - For smaller data (72 bytes), encoding adds 119 bytes.
  - For larger data (16384 bytes), encoding adds 5565 bytes.
  - Json Pack encodes least efficiently, and its overhead grows with the payload size.
- Json Compress Pack:
  - For smaller data (72 bytes), encoding adds 123 bytes.
  - For larger data (16384 bytes), the encoded output is 120 bytes smaller than the raw data.
  - Json Compress Pack adds the most overhead on small payloads but shrinks large payloads, showing the benefit of compression.
The table makes it easy to compare how the different packing methods behave at different data sizes.
Benchmarks
JSON
You can see that reusing the buffer is a significant improvement, mainly due to the reduced memory allocation.
```text
BenchmarkJsonPack/encode#72-20               17315143      647.1 ns/op     320 B/op    3 allocs/op
BenchmarkJsonPack/encode#1024-20              4616841       2835 ns/op    1666 B/op    3 allocs/op
BenchmarkJsonPack/encode#16384-20              365313      34289 ns/op   24754 B/op    3 allocs/op
BenchmarkJsonPack/encode_with_buf#72-20      24820188      447.4 ns/op     128 B/op    2 allocs/op
BenchmarkJsonPack/encode_with_buf#1024-20    13139395      910.6 ns/op     128 B/op    2 allocs/op
BenchmarkJsonPack/encode_with_buf#16384-20    1414260       8472 ns/op     128 B/op    2 allocs/op
BenchmarkJsonPack/decode#72-20                8699952       1364 ns/op     304 B/op    8 allocs/op
BenchmarkJsonPack/decode#1024-20              2103712       5605 ns/op    1384 B/op    8 allocs/op
BenchmarkJsonPack/decode#16384-20              159140      73101 ns/op   18664 B/op    8 allocs/op
```
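The encode_with_buf variant presumably encodes into a reused buffer rather than letting json.Marshal allocate a fresh slice each time; here is a sketch of that pattern with a pooled bytes.Buffer (the helper and pool names are illustrative, not the benchmark's actual code):

```go
// Pool of reusable buffers so repeated encodes avoid fresh allocations.
// (needs "bytes", "encoding/json", and "sync"; assumes the CapturePacket type above)
var jsonBufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// encodeJSONWithBuf marshals p into a pooled buffer and passes the bytes to
// write, which must not retain them after returning.
func encodeJSONWithBuf(p *CapturePacket, write func([]byte) error) error {
	buf := jsonBufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer jsonBufPool.Put(buf)

	if err := json.NewEncoder(buf).Encode(p); err != nil {
		return err
	}
	return write(buf.Bytes())
}
```

The same pooled-buffer idea applies to the msgpack encoder in the next section.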
msgpack
Again, buffer reuse gives a clear boost. The dividing line between JSON and msgpack is around 1024 bytes: above that, msgpack is much faster, and its decode memory usage does not grow with the payload size.
```text
BenchmarkMsgPack/encode#72-20                10466427       1199 ns/op     688 B/op    8 allocs/op
BenchmarkMsgPack/encode#1024-20               6599528       2132 ns/op    1585 B/op    8 allocs/op
BenchmarkMsgPack/encode#16384-20              1478127       8806 ns/op   18879 B/op    8 allocs/op
BenchmarkMsgPack/encode_with_buf#72-20       26677507      388.2 ns/op     192 B/op    4 allocs/op
BenchmarkMsgPack/encode_with_buf#1024-20     31426809      400.2 ns/op     192 B/op    4 allocs/op
BenchmarkMsgPack/encode_with_buf#16384-20    22588560      494.5 ns/op     192 B/op    4 allocs/op
BenchmarkMsgPack/decode#72-20                19894509      654.2 ns/op     280 B/op   10 allocs/op
BenchmarkMsgPack/decode#1024-20              18211321      664.0 ns/op     280 B/op   10 allocs/op
BenchmarkMsgPack/decode#16384-20             13755824      769.1 ns/op     280 B/op   10 allocs/op
```
Effect of compression
On an intranet, bandwidth is generally not a problem, so based on these benchmark results compression was ruled out directly.
```text
BenchmarkJsonCompressPack/encode#72-20          19934     709224 ns/op   1208429 B/op   26 allocs/op
BenchmarkJsonCompressPack/encode#1024-20        17577     766349 ns/op   1212782 B/op   26 allocs/op
BenchmarkJsonCompressPack/encode#16384-20       11757     860371 ns/op   1253975 B/op   25 allocs/op
BenchmarkJsonCompressPack/decode#72-20         490164      28972 ns/op     42048 B/op   15 allocs/op
BenchmarkJsonCompressPack/decode#1024-20       187113      71612 ns/op     47640 B/op   23 allocs/op
BenchmarkJsonCompressPack/decode#16384-20       35790     346580 ns/op    173352 B/op   30 allocs/op
```
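For reference, a "Json Compress Pack" along these lines, JSON-encoding the packet and then gzipping it with a fresh writer per call, would be consistent with the very large per-op allocations above (a sketch; the benchmark's actual implementation may differ):

```go
// encodeJSONCompress marshals p to JSON and gzips the result. Allocating a new
// gzip.Writer on every call is expensive, which fits the ~1.2 MB/op encode cost.
// (needs "bytes", "compress/gzip", and "encoding/json"; assumes the CapturePacket type above)
func encodeJSONCompress(p *CapturePacket) ([]byte, error) {
	raw, err := json.Marshal(p)
	if err != nil {
		return nil, err
	}
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}
```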
Custom binary protocol
With memory reuse, serialization and deserialization are dramatically faster; in synchronous operation it achieves zero-byte allocation. Asynchronous scenarios use fixed-size pooled allocations, with the two return values escaping to the heap.
```text
BenchmarkBinaryPack/encode#72-20                72744334      187.1 ns/op     144 B/op    2 allocs/op
BenchmarkBinaryPack/encode#1024-20              17048832      660.6 ns/op    1200 B/op    2 allocs/op
BenchmarkBinaryPack/encode#16384-20              2085050       6280 ns/op   18495 B/op    2 allocs/op
BenchmarkBinaryPack/encode_with_pool#72-20      34700313      109.2 ns/op      64 B/op    2 allocs/op
BenchmarkBinaryPack/encode_with_pool#1024-20    39370662      101.1 ns/op      64 B/op    2 allocs/op
BenchmarkBinaryPack/encode_with_pool#16384-20   18445262      177.2 ns/op      64 B/op    2 allocs/op
BenchmarkBinaryPack/encode_to#72-20            705428736      16.96 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/encode_to#1024-20          575312358      20.78 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/encode_to#16384-20         100000000      113.4 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/decode_meta#72-20         1000000000      2.890 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/decode_meta#1024-20       1000000000      2.886 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/decode_meta#16384-20      1000000000      2.878 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/decode_with_pool#72-20     106808395      31.51 ns/op      16 B/op    1 allocs/op
BenchmarkBinaryPack/decode_with_pool#1024-20   100319094      35.94 ns/op      16 B/op    1 allocs/op
BenchmarkBinaryPack/decode_with_pool#16384-20   26447718      138.6 ns/op      16 B/op    1 allocs/op
```
To summarize.
First, the summary from Tongyi Qianwen:
Binary Pack:
- encode_to: optimal performance, almost no memory allocation, suitable for scenarios with high performance requirements.
- encode_with_pool: optimized using memory pool, significantly reduces time and memory overhead for most scenarios.
- encode: standard method with high time and memory overhead.
MsgPack:
- encode_with_buf: uses preallocated buffers, significantly reducing time and memory overhead for most scenarios.
- encode: standard method with high time and memory overhead.
- decode: average decoding performance, high memory overhead.
Json Pack:
- encode_with_buf: uses preallocated buffers, significantly reducing time and memory overhead for most scenarios.
- encode: standard method with high time and memory overhead.
- decode: poor decoding performance and high memory overhead.
Json Compress Pack:
- encode: standard method; time and memory overhead are very high, not recommended for performance-sensitive scenarios.
- decode: poor decoding performance and high memory overhead.
My summary.
For transmission within an intranet, network bandwidth is generally not the bottleneck, so data compression can be skipped; the results above also show that compression is very resource-intensive.
If you do not care about the content of the data and the volume is large (e.g., forwarding pcap packets), a custom protocol is probably the better fit: parsing fixed-length metadata leaves a lot of room for optimization, binary parsing is faster than JSON/msgpack, and memory allocation is minimal.
References
- Benchmark code and results for packet construction: github.com/zxhio/benchmark/tree/main/pack