Implementation of RAFT's buffer in cornerstone

1. Overview:

Talking about raft protocol implementation does not bypass the popular online mit6.824, but its go language, there is no official lab answer, the framework is also very obscure and difficult to understand, and the whole network does not have a blog has a clear explanation, some just dump a bunch of terms and then directly paste the code without any comments, very unsuitable for learning.
But the github topcornerstoneis a pure c++ implementation of a very elegant and streamlined raft protocol , code style beautiful , easy to understand the code , clear interface , very friendly to the c++ party , but also very suitable for beginners to learn raft to learn .
Given that no one has ever parsed the source code for such a great piece of code as cornerstone, I decided to document the process of learning the source code and parsing it in detail.
So where do we have to start analyzing cornerstone? The starting entry point should be as small as possible and at the same time highly generalized and used again in many segments.
(indicates contrast)bufferIt is a very important concept in cornerstone, from rpc sending request, receiving response, to logging and so on all use buffer to store information.
So let's start our parsing of the cornerstone source code with buffer.

overall organization

As shown, the overall buffer can be divided into 3 parts:
size: RecordsdataThe number of bytes in the block, numbered from 0 (size = 0 means that the data block is empty rather than the buffer being empty), excluding the precedingsizetogether withpos（size + pos common nameheader）。
According to the size of the size (that is, the number of bytes in the data block), the buffer can be divided into large and small blocks, where size >= 0x8000 is a large block, otherwise it is a small block.(Here is a question: the size of the block size is a range, how do we quickly determine whether the buffer is a large block or a small block? (The answer to this question will be discussed in detail later)
pos: RecordsdataThe read and write pointers to the blocks, also numbered from 0 (hereposrecords the location of both read and write operations, and is itself a pointer to aI can't save two messages.So we need to manually adjust it ourselvespos）
data: stores the actual data inside the buffer, which can beint，ulong，strand many other types

memory allocation

Let's start with the source code:

bufptr buffer::alloc(const size_t size) {
    if (size >= 0x80000000) {
        throw std::out_of_range("size exceed the max size that cornrestone::buffer could support");
    }

    if (size >= 0x8000) {
        bufptr buf(reinterpret_cast<buffer*>(new char[size + sizeof(uint) * 2]), &free_buffer);
        any_ptr ptr = reinterpret_cast<any_ptr>(());
        __init_b_block(ptr, size);
        return buf;
    }

    bufptr buf(reinterpret_cast<buffer*>(new char[size + sizeof(ushort) * 2]), &free_buffer);
    any_ptr ptr = reinterpret_cast<any_ptr>(());
    __init_s_block(ptr, size);
    return buf;
}

We will only analyze the allocation of large chunks here, and the same for small chunks of code.
(1)First determine the size of the requested allocation size, if thesize >= 0x80000000The exception is thrown directly.

(2)(coll.) fail (a student)size >= 0x8000(aka INT_MAX = 32768) when it means to allocate large blocks. This is accomplished through thenew char[size + sizeof(uint) * 2]The requested size + number of bytes for the header is allocated. Here bufptr is defined in theinteriorusing bufptr = uptr<buffer, void (*)(buffer*)>;The second parameter is a unique_ptr that points to a buffer class. According to the definition of bufptr, we can know that it is a unique_ptr pointing to the buffer class, and the second parametervoid(*)(buffer*)is a function pointer with void return value and buffer* parameter, which corresponds to the source code's&free_buffer, is a customized function that frees the contents pointed to by bufptr.

(3)Expanding bufptr isunique_ptr<buffer, &free_buffer> buf(reinterpret_cast<buffer*>(new char[size + sizeof(uint) * 2]), &free_buffer), where reinterpret_cast is used for unrelated type interconversions. new char[] returns a char * pointer, but according to theunique_ptr<T> A(xxx)The xxx inside the syntax brackets is a pointer to type T, so we need to use reinterpret_cast to convert the char * pointer to a buffer *.

(4)Finish allocating the memory then go to theany_ptr ptr = reinterpret_cast<any_ptr>(());Here any_ptr is in thebasic_types.hxxThe definition inside istypedef void* any_ptr;The And()is a member function of unique_ptr to obtain its original pointer, thenany_ptr ptr = reinterpret_cast<any_ptr>(());What this line accomplishes is to extract the original pointer and convert it to thevoid*typology

(5)Next.__init_b_block(ptr, size);This macro defines

#define __init_block(p, s, t) ((t*)(p))[0] = (t)s;\
    ((t*)(p))[1] = 0
#define __init_s_block(p, s) __init_block(p, s, ushort)
#define __init_b_block(p, s) __init_block(p, s, uint);\
    *((uint*)(p)) |= 0x80000000

b_block for large blocks, s_block for small blocks
(5.1)Whether large or small chunks are passed__init_block(p, s, t)to initialize, t denotes the type (ushortmaybeuint), p is the pointer to the buffer, s is the buffer'ssizeParameters.

(5.2)aheadThe general architecture of the bufferInside we said that the buffer is divided into three parts, so here p[0],p[1] is obviously the correspondingsize together withpos Parameters.

(5.3)Why initializesize together withposParameters are not used directly with thep[0] = size, p[1] = posAnd? Here.((t*)(p))[0]，((t*)(p))[1]What is it again?
As we stipulatesize >= 0x8000(USHORT_MAX = 32768), that is, the size of our p[0] storage in the big block can not be used ushort to express, must be used uint type, so we will p pointer strong to uint * type, so uint * meaning of p[0] will be indicated to p start to count backward uint number of bytes to store the size of our. pos is also the same reason, because the pos is a description of the data block, so pos is a read/write pointer to the data block.\(\in\) [0, size), also need to consider whether to use uint type or ushort type.

(5.4)When initializing a chunk why*((uint*)(p)) |= 0x80000000？
Here is the key to the problem we described earlier of how to determine whether the buffer is a large or small block.
0x80000000 converted to decimal is 2³¹, due to the large chunks ofsize、posis a uint, so it has 32 and unsigned bits, 2³¹It just happens to occupy the highest bit of the uint.
Let p strongly converted to uint * type and then * take the content of the uint type of value (actually is the uint type of p [0]), and then its | = 0x80000000 so that the highest bit is 1.

And exactly how is this highest bit of 1 used to determine the size of the block?

#define __is_big_block(p) (0x80000000 & *((uint*)(p)))

We convert p[0] to the uint type, followed by a concatenation with 0x80000000.

If it's a big chunk, since we__init_b_blockThe result of comparing the highest position 1 of p[0] with 0x80000000 at the time of the corresponding__is_big_blockThe return value is 1
If it is a small block, since the & operation is performed bitwise, the highest bit is 0, and the following bits get 0 due to 0x80000000 being all 0, corresponding to the__is_big_blockThe return value is 0
Instead of performing bitwise operations, we thensize >= 0x80000000The judgment to quickly determine whether the buffer is a large or small chunk.

(6) Understanding completion__init_b_blockAfter defining the macros of thebufptrTaking the original pointer and then converting it toany_ptr？
First we have to know that unique_ptr in smart pointers haveexclusive ownershipconcept, while uint* and ushort* are both ordinary pointers with no ownership management, so they cannot be converted.
But unique_ptr gives us the get() member function, which allows us toNo transfer of ownerships use raw pointers, which are convertible to the uint* or ushort* we need.
So we need to first call the()Takes the original pointer and converts it to any_ptr of type void* and then to uint* or ushort* as needed.

Writing of data

4.1 Write of byte data

void buffer::put(byte b) {
    if (size() - pos() < sz_byte) {
        throw std::overflow_error("insufficient buffer to store byte");
    }

    byte* d = data();
    *d = b;
    __mv_fw_block(this, sz_byte);
}

Before we explain exactly how to write it again, let's explain the strange function inside the code.

(1)size()function (math.)

size_t buffer::size() const {
    return (size_t)(__size_of_block(this));
}

__size_of_blockThe macro definition of

#define __size_of_block(p) (__is_big_block(p)) ? (*((uint*)(p)) ^ 0x80000000) : *((ushort*)(p))

If it is a large block, we will speak into the int type of p[0] iso or 0x80000000. earlier we said that the large block of p[0] need to |= 0x80000000 will be the highest position 1 to achieve the purpose of rapid judgment of the size of the block. In the large block of real p[0] that the size of the data, we need to take the opposite or will be eliminated 1 to get the real size.
If it's a small chunk, take the ushort type p[0] directly

(2)pos()function (math.)

size_t buffer::pos() const {
    return (size_t)(__pos_of_block(this));
}

The corresponding macro definition is

#define __pos_of_s_block(p) ((ushort*)(p))[1]
#define __pos_of_b_block(p) ((uint*)(p))[1]

Choose uint or ushort type p[1] depending on whether the chunk is large or small

(3)data()function (math.)

byte* buffer::data() const {
    return __data_of_block(this);
}

The corresponding macro definition:

#define __data_of_block(p) (__is_big_block(p)) ? (byte*) (((byte*)(((uint*)(p)) + 2)) + __pos_of_b_block(p)) : (byte*) (((byte*)(((ushort*)p) + 2)) + __pos_of_s_block(p))

The __data_of_block here is a bit complicated, so let's look at it as a step-by-step breakdown of a large block as an example, and ditto for a small block.

first by(uint*)(p)Convert p to uint* type, then + 2 (2 bytes of uint) on top of that. According to the general architecture of the buffer earlier we know that the first two regions of the buffer aresizetogether withposThe size and pos of the chunk are both uints. The size and pos of the chunk are both uint, so by converting p to uint* and then + 2, we can jump to the beginning of the data chunk.
But the process of reading and writing buffer read/write pointer will change, for example, we wrote 1, and then want to write 0, if we still write from the beginning of the data will directly overwrite the beginning of the 1, only from the end of the 1 to continue to write 0 to be reasonable, in other words, we have to realize the append mode. In other words, we want to realize append mode.((byte*)(((uint*)(p)) + 2)) + __pos_of_b_block(p)) pass (a bill or inspection etc)+__pos_of_b_block(p) Jumps to the current location of the read/write pointer.
The same thing happens with small blocks, except that they are switched from uint* to ushort*.

With these two steps we get a pointer to the current location of the read/write pointer.

(4)__mv_fw_block(this, sz_byte);
This is a macro definition

#define __mv_fw_block(p, d) if(__is_big_block(p)){\
    ((uint*)(p))[1] += (d);\
    }\
    else{\
    ((ushort*)(p))[1] += (ushort)(d);\
    }

Each time we read or write the buffer, we update the p[1] representation of theposthat implements stream reads or stream writes.
For example, to read in 12345, we read 1 and let pos += 1, so that if we read it again, we can read 2. If we read 123 once, we let pos += 3, so that if we read it again, we can start with 4. The same applies to writing.

After introducing these functions, let's go back to writing byte data.

First determine whether the data written exceeds the size of the buffer, and if so, throw an overflow exception.
Otherwise fetch the current read/write pointer location of thebyte *d = data();
pass (a bill or inspection etc)*d = bWrite byte-type data b
__mv_fw_block(this, sz_byte);Update read and write pointers

4.2 Writing data of type int32

Unlike simple direct byte data writing, multibyte data writing is sequential.
Let's look at the source code first:

void buffer::put(int32 val) {
    if (size() - pos() < sz_int) {
        throw std::overflow_error("insufficient buffer to store int32");
    }

    byte* d = data();
    for (size_t i = 0; i < sz_int; ++i) {
        *(d + i) = (byte)(val >> (i * 8));
    }

    __mv_fw_block(this, sz_int);
}

The front is similar to writing byte data, but the focus is on the back of the for loop.

    for (size_t i = 0; i < sz_int; ++i) {
        *(d + i) = (byte)(val >> (i * 8));
    }

sz_intmeanNumber of bytes in intThe for loop iterates through the forint valof each byte.
*（d + i）is to assign a value to the position of the ith byte counting from d.
(byte)(val >> (i * 8)); It's shifting val to the right.i * 8bits, and then the right-shifted lower 8 bits are taken by byte conversion.

Together.The position of the i-th byte counting from d is assigned to val shifted i * 8 bits to the lower 8 bits.。
Let's give specific examples:
For example to put in 1010001111111110010 (decimal = 327658).

First i = 0, put 11110010 into d[0] of the buffer.
Second time i = 1, put 01111111 into d[1] of the buffer.
Third time i = 2, put 00010100 into d[2] of the buffer.
Fourth time i = 3, put 00000000 into d[3] of the buffer.

Obviously you can see that this is storing val in reverse order byte by byte, so why not just store it positively in the original order?
The answer is because it's hard, we do byte conversions by (byte), and that conversion is intercepting theLow 8-bitspeciousHigher 8-bitSo we can only store val in reverse order when we for loop through each byte of val.

4.3 Writing data of type ulong

void buffer::put(ulong val) {
    if (size() - pos() < sz_ulong) {
        throw std::overflow_error("insufficient buffer to store int32");
    }

    byte* d = data();
    for (size_t i = 0; i < sz_ulong; ++i) {
        *(d + i) = (byte)(val >> (i * 8));
    }

    __mv_fw_block(this, sz_ulong);
}

Parsing is the same as 4.2, the for loop iterates through each byte of val and stores them in reverse order.

4.4 Writing data of type string

void buffer::put(const std::string& str) {
    if (size() - pos() < (() + 1)) {
        throw std::overflow_error("insufficient buffer to store a string");
    }

    byte* d = data();
    for (size_t i = 0; i < (); ++i) {
        *(d + i) = (byte)str[i];
    }

    *(d + ()) = (byte)0;
    __mv_fw_block(this, () + 1);
}

due toEach element of the string type is of type char(Byte is actually unsigned char), so even though string is also multi-byte data, it does not need to be stored in reverse order like int32 or ulong.
*(d + ()) = (byte)0;Setting the last byte to 0 indicates the termination of the string, in the same way that a c-string such as "hello World!" automatically adds '\0' at the end.

4.5 Write of buffer type data

void buffer::put(const buffer& buf) {
    size_t sz = size();
    size_t p = pos();
    size_t src_sz = ();
    size_t src_p = ();
    if ((sz - p) < (src_sz - src_p)) {
        throw std::overflow_error("insufficient buffer to hold the other buffer");
    }

    byte* d = data();
    byte* src = ();
    ::memcpy(d, src, src_sz - src_p);
    __mv_fw_block(this, src_sz - src_p);
}

Deep copy via memcpy.

Reading of data

5.1 byte data reading

byte buffer::get_byte() {
    size_t avail = size() - pos();
    if (avail < sz_byte) {
        throw std::overflow_error("insufficient buffer available for a byte");
    }

    byte val = *data();
    __mv_fw_block(this, sz_byte);
    return val;
}

First determine whether the buffer has enough space, if not, throw an overflow exception
Call data() directly to get the location of the read/write pointer and take its contents in * notation to get the desired val
__mv_fw_block(this, sz_byte);Update read and write pointers

5.2 int32 data reading

int32 buffer::get_int() {
    size_t avail = size() - pos();
    if (avail < sz_int) {
        throw std::overflow_error("insufficient buffer available for an int32 value");
    }

    byte* d = data();
    int32 val = 0;
    for (size_t i = 0; i < sz_int; ++i) {
        int32 byte_val = (int32)*(d + i);
        val += (byte_val << (i * 8));
    }

    __mv_fw_block(this, sz_int);
    return val;
}

Focus on the for loop that iterates through all the bytes of an int32 starting at d.

For each byte, take out thebyte_val
commander-in-chief (military)byte_valshift lefti * 8Bits are added to theval
come (or go) backval

ahead4.2 Write of int32 dataWe've already said that multibyte data is stored in reverse order (except for string types)

Let's take 1010001111111110010 (decimal = 327658) as an example again
Since the data in the reverse order storage buffer should be 11110010|01111111|00010100 |00000000

i = 0, fetch byte_val = 11110010, this byte_val should occupy bits 0-7 in the original data, so we left-shift by 0 * 8 = 0 bits and accumulate to val
i = 1, fetch byte_val = 01111111, this byte_val should occupy 8-15 bits in the original data, so we shift left by 1 * 8 = 8 bits and accumulate to val
i = 2, fetch byte_val = 00010100, this byte_val should occupy 16-23 bits in the original data, so we shift left by 2 * 8 = 16 bits and accumulate to val
i = 3, fetch byte_val = 00000000, this byte_val should occupy 24-31 bits in the original data, so we left-shift 3 * 8 = 24 bits and accumulate to val

In the end, we get the normal order of the data val

5.3 ulong data reading

ulong buffer::get_ulong() {
    size_t avail = size() - pos();
    if (avail < sz_ulong) {
        throw std::overflow_error("insufficient buffer available for an ulong value");
    }

    byte* d = data();
    ulong val = 0L;
    for (size_t i = 0; i < sz_ulong; ++i) {
        ulong byte_val = (ulong)*(d + i);
        val += (byte_val << (i * 8));
    }

    __mv_fw_block(this, sz_ulong);
    return val;
}

The analysis is the same as 5.2, except that the number of bytes traversed is changed from sz_int to sz_ulong.

5.4 Reading of string types

const char* buffer::get_str() {
    size_t p = pos();
    size_t s = size();
    size_t i = 0;
    byte* d = data();
    while ((p + i) < s && *(d + i)) ++i;
    if (p + i >= s || i == 0) {
        return nilptr;
    }

    __mv_fw_block(this, i + 1);
    return reinterpret_cast<const char*>(d);
}

The string type data is read a little differently.

while ((p + i) < s && *(d + i)) ++i;
    if (p + i >= s || i == 0) {
        return nilptr;
    }

First of all we don't know the length of the string type data, we only know that it ends in 0. So we have to use while to keep fetching
Inside the while, the judgment of 0 is done directly with thewhile(*(d + i)) ++iThat's all. Why do you need another one?(p + i) < sAnd?
Because only*（d + i）We don't know if we're going out of bounds, and for some reason we may not find a 0 even after traversing the buffer.
This time it's time for us to add another(p + i) < sMake sure you don't cross the line.
in the event thatp + i >= sindicating transgression of boundaries.i == 0means there is no data, should return nilptr

    __mv_fw_block(this, i + 1);
    return reinterpret_cast<const char*>(d);

Finally, adjust the read and write pointers of the__mv_fw_block(this, i + 1);bei + 1bytes instead ofiReasons.
Assuming string = "abc" (ending in the '\0' tag), the while condition is still satisfied when i = 2, i++→i = 3
It's a good idea to set pos = 0 before reading in the string, when__mv_fw_blockwill pos += (i + 1) → pos = 4
Indicates that the next read/write operation should start from pos = 4. That is, it skips the first position after "abc" and '\0'.
in the event that__mv_fw_block(this, i + 1);beibytes, it means that the next read/write operation should start from '\0' and will change '\0' resulting in the inability of the string to be recognized.

5.5 Read the data and store it in the specified buffer.

void buffer::get(bufptr& dst) {
    size_t sz = dst->size() - dst->pos();
    ::memcpy(dst->data(), data(), sz);
    __mv_fw_block(this, sz);
}

First calculate the number of bytes to copysz,dst->size() - dst->pos();Indicates that it is all the bytes left in dst
::memcpy(dst->data(), data(), sz); Counting back from data()szBytes of data are copied to dst.
Adjusting read and write pointers__mv_fw_block(this, sz);

6. Summary

1. Before determining the size of the block, we cleverly use the highest bit to quickly determine through bitwise operations.
The two most important operations areread together withWrite (put), we can see the idea of reverse order storage when faced with multi-byte data.
3. Faced with a situation where a smart pointer needs to be converted, we should first take aget()to get the original pointer and then pass thereinterpret_castConversion is performed.