Optimizing Data Packing in Go: Performance Benchmarking and Analysis
Background: on a LAN, packets captured on the NICs of multiple machines need to be collected on a single machine.
Original approach: write files with `tcpdump -w`, then rsync them to the collector at regular intervals.
Revised approach: rewrite the capture and synchronization logic in Go so that captured packets are sent directly over the network to the server, which writes them, eliminating the intermediate write to disk on the capturing machines.
Constructing a pcap file is simple: write a pcap file header once, then prefix each packet with a small metadata record.
This can be done with pcapgo. For a captured frame of n bytes held in `data`:
```go
// Describe the captured frame, then append it to the pcap writer.
ci := gopacket.CaptureInfo{
	CaptureLength: int(n),
	Length:        int(n),
	Timestamp:     time.Now(),
}
if ci.CaptureLength > len(data) {
	ci.CaptureLength = len(data)
}
w.WritePacket(ci, data[:ci.CaptureLength])
```
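For completeness, the writer above needs the pcap file header written once before any packets; a minimal setup with pcapgo might look like this (file name, snaplen, and link type are illustrative):

```go
package main

import (
	"log"
	"os"

	"github.com/google/gopacket/layers"
	"github.com/google/gopacket/pcapgo"
)

func main() {
	f, err := os.Create("capture.pcap")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Write the pcap file header exactly once, before any packets.
	w := pcapgo.NewWriter(f)
	if err := w.WriteFileHeader(65536, layers.LinkTypeEthernet); err != nil {
		log.Fatal(err)
	}
	// Each captured frame is then appended via w.WritePacket with its
	// CaptureInfo metadata, as in the snippet above.
}
```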
To tell which machine each packet came from, an Id is added. Together with the capture metadata and the raw packet bytes, the structure looks like this:
```go
// CaptureInfo is from github.com/google/gopacket
type CaptureInfo struct {
	// Timestamp is the time the packet was captured, if that is known.
	Timestamp time.Time `json:"ts" msgpack:"ts"`
	// CaptureLength is the total number of bytes read off of the wire.
	CaptureLength int `json:"cap_len" msgpack:"cap_len"`
	// Length is the size of the original packet. Should always be >= CaptureLength.
	Length int `json:"len" msgpack:"len"`
	// InterfaceIndex is the index of the interface the packet was captured on.
	InterfaceIndex int `json:"iface_idx" msgpack:"iface_idx"`
}

type CapturePacket struct {
	CaptureInfo
	Id   uint32 `json:"id" msgpack:"id"`
	Data []byte `json:"data" msgpack:"data"`
}
```
One detail still needs to be settled: what format should be used to send the captured packets to the server? JSON, msgpack, or a custom format?
JSON and msgpack have formal specifications, are general-purpose, and are hard to get wrong, but their performance is somewhat worse.
Compared to JSON/msgpack, a custom format can drop unnecessary fields, omit keys from the serialized output entirely, and use targeted optimizations to reduce memory allocation and relieve GC pressure.
The optimization ideas for the custom binary protocol are as follows:
- Encode the CaptureInfo/Id fields as fixed-width values of N bytes: CaptureLength/Length each fit in 2 bytes, and Id fits in 1 byte if the number of machines is small.
- Memory reuse:
  - Encode allocates no memory internally; it writes directly into an externally supplied buffer. If the caller works synchronously, the whole path allocates zero memory.
  - Decode allocates no memory internally; it only parses the metadata and takes Data as a sub-slice of the input. If the caller works synchronously, the whole path allocates zero memory.
  - If the caller works asynchronously, Data must be copied where Encode/Decode is called; this can be optimized by allocating from four pooled size classes of 128, 1024, 8192, and 65536 bytes.
There are two optimization points here, addressed by acquirePacketBuf below:
- Asynchronous operation needs each packet to own its buffer, which cannot be shared, so a sync.Pool per size class is used to build that buffer.
- The metadata is serialized into a fixed-length scratch buffer; allocating it with make (or letting an array escape to the heap) would add GC pressure, so it is pooled as well.
```go
// Buffer pools, one per size class; each holds pointers to fixed-size arrays so
// that getting and putting a buffer does not itself allocate the data on the heap.
var (
	smallBufPool  = sync.Pool{New: func() any { return new([CapturePacketMetaLen + 128]byte) }}
	midBufPool    = sync.Pool{New: func() any { return new([CapturePacketMetaLen + 1024]byte) }}
	largeBufPool  = sync.Pool{New: func() any { return new([CapturePacketMetaLen + 8192]byte) }}
	xlargeBufPool = sync.Pool{New: func() any { return new([CapturePacketMetaLen + 65536]byte) }}
)

// acquirePacketBuf returns a zero-length buffer with enough capacity for n bytes,
// plus a function that returns the buffer to its pool.
func acquirePacketBuf(n int) ([]byte, func()) {
	var (
		buf   []byte
		putfn func()
	)
	if n <= CapturePacketMetaLen+128 {
		smallBuf := smallBufPool.Get().(*[CapturePacketMetaLen + 128]byte)
		buf = smallBuf[:0]
		putfn = func() { smallBufPool.Put(smallBuf) }
	} else if n <= CapturePacketMetaLen+1024 {
		midBuf := midBufPool.Get().(*[CapturePacketMetaLen + 1024]byte)
		buf = midBuf[:0]
		putfn = func() { midBufPool.Put(midBuf) }
	} else if n <= CapturePacketMetaLen+8192 {
		largeBuf := largeBufPool.Get().(*[CapturePacketMetaLen + 8192]byte)
		buf = largeBuf[:0]
		putfn = func() { largeBufPool.Put(largeBuf) }
	} else {
		xlargeBuf := xlargeBufPool.Get().(*[CapturePacketMetaLen + 65536]byte)
		buf = xlargeBuf[:0]
		putfn = func() { xlargeBufPool.Put(xlargeBuf) }
	}
	return buf, putfn
}
```
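A sketch of how the asynchronous path might use these pooled buffers: the capture loop copies each frame into its own pooled buffer and hands it off together with the release function (the packetJob type, handOff helper, and channel are illustrative, not part of the original code):

```go
// packetJob pairs pooled bytes with the function that returns them to their pool.
type packetJob struct {
	buf     []byte
	release func()
}

// handOff copies a freshly captured frame into a pooled buffer so the capture
// loop can immediately reuse its own read buffer.
func handOff(data []byte, jobs chan<- packetJob) {
	// Room for the fixed-width metadata plus the frame itself; in the real
	// encoder the metadata would be appended before the frame bytes.
	buf, release := acquirePacketBuf(CapturePacketMetaLen + len(data))
	buf = append(buf, data...)
	jobs <- packetJob{buf: buf, release: release}
}

// The consumer writes the buffer out and then releases it:
//
//	for job := range jobs {
//		w.Write(job.buf)
//		job.release()
//	}
```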
```go
// metaBufPool holds fixed-size scratch buffers for the serialized metadata.
var metaBufPool = sync.Pool{New: func() any { return new([CapturePacketMetaLen]byte) }}

func (binaryPack) EncodeTo(p *CapturePacket, w io.Writer) (int, error) {
	buf := metaBufPool.Get().(*[CapturePacketMetaLen]byte)
	defer metaBufPool.Put(buf)
	binary.BigEndian.PutUint64(buf[0:], uint64(p.Timestamp.UnixNano()))
	...
	return nm + nd, err
}
```
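The body above is elided; as one concrete illustration of the fixed-width idea, the layout below uses field widths that happen to add up to the 22-byte overhead seen in the size table, together with an allocation-free metadata decode. The offsets, widths, and byte order are assumptions for illustration, not necessarily the benchmark's actual format.

```go
// Assumed 22-byte metadata layout:
//   [0:8)   Timestamp, Unix nanoseconds
//   [8:10)  CaptureLength (uint16)
//   [10:12) Length (uint16)
//   [12:14) InterfaceIndex (uint16)
//   [14:18) Id (uint32)
//   [18:22) length of Data (uint32)
// (needs "encoding/binary", "io", and "time"; assumes the CapturePacket type above)
const exampleMetaLen = 22

// decodeMeta parses the fixed-width header without allocating. Data is a
// sub-slice of b; an asynchronous caller must copy it before reusing b.
func decodeMeta(b []byte) (CapturePacket, error) {
	if len(b) < exampleMetaLen {
		return CapturePacket{}, io.ErrUnexpectedEOF
	}
	var p CapturePacket
	p.Timestamp = time.Unix(0, int64(binary.BigEndian.Uint64(b[0:8])))
	p.CaptureLength = int(binary.BigEndian.Uint16(b[8:10]))
	p.Length = int(binary.BigEndian.Uint16(b[10:12]))
	p.InterfaceIndex = int(binary.BigEndian.Uint16(b[12:14]))
	p.Id = binary.BigEndian.Uint32(b[14:18])
	n := int(binary.BigEndian.Uint32(b[18:22]))
	if len(b) < exampleMetaLen+n {
		return CapturePacket{}, io.ErrUnexpectedEOF
	}
	p.Data = b[exampleMetaLen : exampleMetaLen+n]
	return p, nil
}
```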
Encoded packet sizes (table and analysis summarized with Tongyi Qianwen)
Method | Raw data length (bytes) | Encoded data length (bytes) | Size change (bytes) |
---|---|---|---|
Binary Pack | 72 | 94 | +22 |
Binary Pack | 1024 | 1046 | +22 |
Binary Pack | 16384 | 16406 | +22 |
MsgPack | 72 | 150 | +78 |
MsgPack | 1024 | 1103 | +79 |
MsgPack | 16384 | 16463 | +79 |
Json Pack | 72 | 191 | +119 |
Json Pack | 1024 | 1467 | +443 |
Json Pack | 16384 | 21949 | +5565 |
Json Compress Pack | 72 | 195 | +123 |
Json Compress Pack | 1024 | 1114 | +90 |
Json Compress Pack | 16384 | 15504 | -120 |
Analysis
- Binary Pack:
  - For smaller data (72 bytes), encoding adds 22 bytes.
  - For larger data (16384 bytes), encoding still adds only 22 bytes.
  - Overall, Binary Pack encodes most efficiently, adding a small, constant number of bytes.
- MsgPack:
  - For smaller data (72 bytes), encoding adds 78 bytes.
  - For larger data (16384 bytes), encoding adds 79 bytes.
  - MsgPack is less efficient than Binary Pack on small payloads, but its overhead stays essentially constant as the payload grows.
- Json Pack:
  - For smaller data (72 bytes), encoding adds 119 bytes.
  - For larger data (16384 bytes), encoding adds 5565 bytes.
  - Json Pack encodes least efficiently, and its overhead grows with the payload size.
- Json Compress Pack:
  - For smaller data (72 bytes), encoding adds 123 bytes.
  - For larger data (16384 bytes), the encoded output is 120 bytes smaller than the raw data.
  - Json Compress Pack adds the most overhead on small payloads but shrinks large payloads, showing the benefit of compression.
The table makes it easy to compare how the different packing methods behave at different data sizes.
Benchmarks
JSON
You can see that reusing the buffer is a significant improvement, mainly due to the reduced memory allocation.
```text
BenchmarkJsonPack/encode#72-20               17315143      647.1 ns/op     320 B/op    3 allocs/op
BenchmarkJsonPack/encode#1024-20              4616841       2835 ns/op    1666 B/op    3 allocs/op
BenchmarkJsonPack/encode#16384-20              365313      34289 ns/op   24754 B/op    3 allocs/op
BenchmarkJsonPack/encode_with_buf#72-20      24820188      447.4 ns/op     128 B/op    2 allocs/op
BenchmarkJsonPack/encode_with_buf#1024-20    13139395      910.6 ns/op     128 B/op    2 allocs/op
BenchmarkJsonPack/encode_with_buf#16384-20    1414260       8472 ns/op     128 B/op    2 allocs/op
BenchmarkJsonPack/decode#72-20                8699952       1364 ns/op     304 B/op    8 allocs/op
BenchmarkJsonPack/decode#1024-20              2103712       5605 ns/op    1384 B/op    8 allocs/op
BenchmarkJsonPack/decode#16384-20              159140      73101 ns/op   18664 B/op    8 allocs/op
```
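The encode_with_buf variant presumably encodes into a reused buffer rather than letting json.Marshal allocate a fresh slice each time; here is a sketch of that pattern with a pooled bytes.Buffer (the helper and pool names are illustrative, not the benchmark's actual code):

```go
// Pool of reusable buffers so repeated encodes avoid fresh allocations.
// (needs "bytes", "encoding/json", and "sync"; assumes the CapturePacket type above)
var jsonBufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// encodeJSONWithBuf marshals p into a pooled buffer and passes the bytes to
// write, which must not retain them after returning.
func encodeJSONWithBuf(p *CapturePacket, write func([]byte) error) error {
	buf := jsonBufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer jsonBufPool.Put(buf)

	if err := json.NewEncoder(buf).Encode(p); err != nil {
		return err
	}
	return write(buf.Bytes())
}
```

The same pooled-buffer idea applies to the msgpack encoder in the next section.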
msgpack
Again, buffer reuse gives a clear boost. The dividing line between JSON and msgpack is around 1024 bytes: above that, msgpack is much faster, and its decode memory usage does not grow with the payload size.
```text
BenchmarkMsgPack/encode#72-20                10466427       1199 ns/op     688 B/op    8 allocs/op
BenchmarkMsgPack/encode#1024-20               6599528       2132 ns/op    1585 B/op    8 allocs/op
BenchmarkMsgPack/encode#16384-20              1478127       8806 ns/op   18879 B/op    8 allocs/op
BenchmarkMsgPack/encode_with_buf#72-20       26677507      388.2 ns/op     192 B/op    4 allocs/op
BenchmarkMsgPack/encode_with_buf#1024-20     31426809      400.2 ns/op     192 B/op    4 allocs/op
BenchmarkMsgPack/encode_with_buf#16384-20    22588560      494.5 ns/op     192 B/op    4 allocs/op
BenchmarkMsgPack/decode#72-20                19894509      654.2 ns/op     280 B/op   10 allocs/op
BenchmarkMsgPack/decode#1024-20              18211321      664.0 ns/op     280 B/op   10 allocs/op
BenchmarkMsgPack/decode#16384-20             13755824      769.1 ns/op     280 B/op   10 allocs/op
```
Effect of compression
On an intranet, bandwidth is generally not a problem, so based on these benchmark results compression was ruled out directly.
```text
BenchmarkJsonCompressPack/encode#72-20          19934     709224 ns/op   1208429 B/op   26 allocs/op
BenchmarkJsonCompressPack/encode#1024-20        17577     766349 ns/op   1212782 B/op   26 allocs/op
BenchmarkJsonCompressPack/encode#16384-20       11757     860371 ns/op   1253975 B/op   25 allocs/op
BenchmarkJsonCompressPack/decode#72-20         490164      28972 ns/op     42048 B/op   15 allocs/op
BenchmarkJsonCompressPack/decode#1024-20       187113      71612 ns/op     47640 B/op   23 allocs/op
BenchmarkJsonCompressPack/decode#16384-20       35790     346580 ns/op    173352 B/op   30 allocs/op
```
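For reference, a "Json Compress Pack" along these lines, JSON-encoding the packet and then gzipping it with a fresh writer per call, would be consistent with the very large per-op allocations above (a sketch; the benchmark's actual implementation may differ):

```go
// encodeJSONCompress marshals p to JSON and gzips the result. Allocating a new
// gzip.Writer on every call is expensive, which fits the ~1.2 MB/op encode cost.
// (needs "bytes", "compress/gzip", and "encoding/json"; assumes the CapturePacket type above)
func encodeJSONCompress(p *CapturePacket) ([]byte, error) {
	raw, err := json.Marshal(p)
	if err != nil {
		return nil, err
	}
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}
```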
Custom binary protocol
With memory reuse, serialization and deserialization are dramatically faster; in synchronous operation it achieves zero-byte allocation. Asynchronous scenarios use fixed-size pooled allocations, with the two return values escaping to the heap.
```text
BenchmarkBinaryPack/encode#72-20                72744334      187.1 ns/op     144 B/op    2 allocs/op
BenchmarkBinaryPack/encode#1024-20              17048832      660.6 ns/op    1200 B/op    2 allocs/op
BenchmarkBinaryPack/encode#16384-20              2085050       6280 ns/op   18495 B/op    2 allocs/op
BenchmarkBinaryPack/encode_with_pool#72-20      34700313      109.2 ns/op      64 B/op    2 allocs/op
BenchmarkBinaryPack/encode_with_pool#1024-20    39370662      101.1 ns/op      64 B/op    2 allocs/op
BenchmarkBinaryPack/encode_with_pool#16384-20   18445262      177.2 ns/op      64 B/op    2 allocs/op
BenchmarkBinaryPack/encode_to#72-20            705428736      16.96 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/encode_to#1024-20          575312358      20.78 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/encode_to#16384-20         100000000      113.4 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/decode_meta#72-20         1000000000      2.890 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/decode_meta#1024-20       1000000000      2.886 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/decode_meta#16384-20      1000000000      2.878 ns/op       0 B/op    0 allocs/op
BenchmarkBinaryPack/decode_with_pool#72-20     106808395      31.51 ns/op      16 B/op    1 allocs/op
BenchmarkBinaryPack/decode_with_pool#1024-20   100319094      35.94 ns/op      16 B/op    1 allocs/op
BenchmarkBinaryPack/decode_with_pool#16384-20   26447718      138.6 ns/op      16 B/op    1 allocs/op
```
To summarize.
First, the summary from Tongyi Qianwen:
Binary Pack:
- encode_to: optimal performance, almost no memory allocation, suitable for scenarios with high performance requirements.
- encode_with_pool: optimized using memory pool, significantly reduces time and memory overhead for most scenarios.
- encode: standard method with high time and memory overhead.
MsgPack:
- encode_with_buf: uses preallocated buffers, significantly reducing time and memory overhead for most scenarios.
- encode: standard method with high time and memory overhead.
- decode: average decoding performance, high memory overhead.
Json Pack:
- encode_with_buf: uses preallocated buffers, significantly reducing time and memory overhead for most scenarios.
- encode: standard method with high time and memory overhead.
- decode: poor decoding performance and high memory overhead.
Json Compress Pack:
- encode: standard method; time and memory overhead are very high, not recommended for performance-sensitive scenarios.
- decode: poor decoding performance and high memory overhead.
My summary.
For transmission within an intranet, network bandwidth is generally not the bottleneck, so data compression can be skipped; the results above also show that compression is very resource-intensive.
If you do not care about the content of the data and the volume is large (e.g., forwarding pcap packets), a custom protocol is probably the better fit: parsing fixed-length metadata leaves a lot of room for optimization, binary parsing is faster than JSON/msgpack, and memory allocation is minimal.
References
- Benchmark code and results for packet construction: github.com/zxhio/benchmark/tree/main/pack