
A 10,000-word article that takes you deep into the underlying data structures of Redis


Data Structures of the Redis Database

In a Redis key-value pair, the key is always a String object, while the value can be any of the Redis data types: String, List, Hash, Set, or Zset.

Under the hood, Redis uses a global hash table to hold all key-value pairs. The biggest benefit of a hash table is that a key-value pair can be found in O(1) time. A hash table is essentially an array, and the elements of the array are called hash buckets.

  • redisDb structure, representing a Redis database; it holds a pointer to the dict structure (there are 16 databases by default);
  • dict structure, which holds 2 hash tables; hash table 1 is used normally, and hash table 2 is only used during rehashing;
  • dictht structure, representing a hash table; it holds the hash table array, where each element of the array is a pointer to a hash table node structure (dictEntry);
  • dictEntry structure, representing a hash table node; it holds a void* key and a void* value pointer, where key points to a String object and value can point to any of the Redis data types.
struct redisServer {
   //...
    redisDb *db; //...
//...
int dbnum; //default 16
}

typedef struct redisDb {
    dict *dict; // Global hash table.
    //...
} redisDb;

struct dict {
   //...
     dictht ht[2]; // two hash tables; one starts empty and is used during rehash migration
    //...
long rehashidx; /* rehashing not in progress if rehashidx == -1 */
};

typedef struct dictht {
    dictEntry **table; // Array of hash table nodes.
    unsigned long size; // size of the hash table
    unsigned long sizemask; // hash table size mask, used to calculate the index value, always equal to size-1
    unsigned long used; // the number of nodes already in the hash table
} dictht;

struct dictEntry { // hash table node
    void *key;                  // key
    union {
        void *val;              // value as an object pointer
        uint64_t u64;           // value as an unsigned integer
        int64_t s64;            // value as a signed integer
        double d;               // value as a double
    } v;                        // value
    struct dictEntry *next;     // pointer to the next node in the same bucket
    void *metadata[];
};

The void* key and void* value pointers point to Redis objects. Redis has a global hash table in which the key is a String and the value can be an object of any type; in Java terms it could be represented generically as Map<String, Object>. Since Redis is implemented directly in C, every concrete object is represented by a redisObject structure, whose type field records the concrete type, as follows:

typedef struct redisObject {
    unsigned type: 4; // object type
    unsigned storage: 2; // REDIS_VM_MEMORY or REDIS_VM_SWAPPING
    unsigned encoding: 4; // encoding used by the object
    unsigned lru: 22; // LRU time (relative to the global LRU clock)
    int refcount; // Reference count of the object.
    void *ptr; // The underlying implementation data structure pointing to the object.
} robj;

  • type, which identifies what type of object it is (String, List, Hash, Set, and Zset);
  • encoding, which identifies which underlying data structure the object uses;
  • ptr, a pointer to the underlying data structure.

The correspondence between the Redis data types (also called Redis objects) and the underlying data structures is as follows:

  • By default a Hash is stored as a listpack; when the number of saved field-value pairs exceeds 512 or a single value exceeds 64 bytes, it is converted to a hashtable.
  • By default a Zset uses a listpack as its storage structure; when the number of elements reaches 128 or a single value is 64 bytes or more, the storage structure is changed to a skiplist.

These thresholds are all configurable, so there is no need to memorize them:

hash-max-listpack-entries 512
hash-max-listpack-value 64

zset-max-listpack-entries 128
zset-max-listpack-value 64
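
These settings can also be changed at runtime with the CONFIG SET command, using the same parameter names shown above, for example CONFIG SET hash-max-listpack-entries 1024 (on Redis versions that use the listpack parameter names).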

SDS

SDS stands for Simple Dynamic String.

Defects of C strings

Getting the string length is O(n)

In C, the end of a character array is marked by "\0", which means the end of the string.
Therefore the C function for getting a string's length, strlen, walks the character array one character at a time and stops when it meets "\0", then returns the number of characters counted so far, i.e. the string length, so the complexity is O(n).

Strings can only hold text data

The end of the character array is marked by "\0".
Therefore the string cannot contain a "\0" character anywhere except at the end, otherwise the first "\0" read would be mistaken for the end of the string. This restriction means C strings can only hold text data, not binary data such as images, audio or video files.

Buffer overflow possible

C strings do not record the size of their own buffer, so the strcat function assumes that the programmer has already allocated enough memory in dest to hold the entire contents of the src string; if that assumption does not hold, a buffer overflow occurs, which may crash the program.

SDS structure

  • len, which records the length of the string. This way, when you get the length of the string, you only need to return the value of this member variable, and the time complexity is only O(1).
  • alloc, the length of the space allocated for the character array. When modifying a string, the remaining space can be computed as alloc - len and used to check whether there is enough room for the modification; if not, the SDS space is automatically expanded to the size required before the actual modification is performed, so no buffer overflow can occur.
  • flags, which are used to represent different types of SDS. 5 types are designed, namely sdshdr5, sdshdr8, sdshdr16, sdshdr32 and sdshdr64.
  • buf[], an array of characters, is used to hold the actual data. Not only can it hold strings, but also binary data.

The SDS API uses binary to process the data that SDS stores in buf[], allowing Redis to store not only text data, but also binary data in any format.
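
For reference, a simplified sketch of what an SDS header looks like, modeled on the sdshdr8 variant in Redis' sds.h (the real code defines several width variants, sdshdr5 through sdshdr64):

struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len;         /* used length of buf */
    uint8_t alloc;       /* allocated length, excluding the header and the null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];          /* actual data; may contain arbitrary binary bytes */
};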

Dynamic in SDS actually refers to dynamic capacity expansion

hisds hi_sdsMakeRoomFor(hisds s, size_t addlen)
{
    ... ...
    // s currently has enough free space, no need to expand, just return it
    if (avail >= addlen)
        return s;
    // get the current length of s
    len = hi_sdslen(s);
    sh = (char *)s - hi_sdsHdrSize(oldtype);
    // the minimum length s needs after expansion
    newlen = (len + addlen);
    // work out how much space to actually allocate based on the new length
    // (HI_SDS_MAX_PREALLOC is defined as (1024*1024), i.e. 1MB)
    if (newlen < HI_SDS_MAX_PREALLOC)
        // new length < HI_SDS_MAX_PREALLOC: allocate twice the required space
        newlen *= 2;
    else
        // otherwise allocate the required length plus 1MB
        newlen += HI_SDS_MAX_PREALLOC;
    ...
}

// #define HI_SDS_MAX_PREALLOC (1024*1024)
  • If the required sds length is less than 1 MB, the new allocation is doubled, i.e. 2 × newlen;
  • If the required sds length is 1 MB or more, the new allocation is newlen + 1 MB.
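
For example, if len is 600 KB and addlen is 100 KB, newlen = 700 KB < 1 MB, so 1400 KB are allocated; if len is 900 KB and addlen is 300 KB, newlen = 1.2 MB >= 1 MB, so 1.2 MB + 1 MB = 2.2 MB are allocated.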

Doubly linked list

Linked lists in Redis are doubly linked lists.

typedef struct listNode {
    //predecessor node
    struct listNode *prev;
    //successor node
    struct listNode *next;
    //The value of the node
    void *value;
} listNode;

However, Redis encapsulates the list structure on top of the listNode structure; similar to how Java defines a Node node and encapsulates a LinkedList on top of it.

typedef struct list {
    // head node of the list
    listNode *head;
    // tail node of the list
    listNode *tail;
    // node value copy function
    void *(*dup)(void *ptr);
    // node value free function
    void (*free)(void *ptr);
    // node value comparison function
    int (*match)(void *ptr, void *key);
    // number of nodes in the list
    unsigned long len;
} list;

The advantages and disadvantages of Redis' linked list are the same as those of ordinary doubly linked lists.

Compressed list

Compressed lists, developed by Redis to conserve memory, are sequential data structures consisting of contiguous blocks of memory, similar to arrays.

The compression list has four important fields:

  • zlbytes, records the number of bytes of memory occupied by the entire compressed list;
  • zltail, record how many bytes the "tail" node of the compressed list is from the start address, that is, the offset of the tail of the list;
  • zllen, which records the number of nodes contained in the compression list;
  • zlend, mark the end point of the compressed list, fixed value 0xFF (decimal 255).

In a compressed list, locating the first or the last element is O(1): the first entry starts right after the fixed-size header, and zltail gives the offset of the last entry. Finding any other element is not as efficient, because the list has to be scanned entry by entry, which is O(N), so compressed lists are not suitable for storing too many elements.

The compression list node (entry) is composed as follows:

Fields in an entry:

  • prevlen, which records the length of the previous entry, so the list can be traversed from back to front;
  • encoding, which records the type and length of the entry's actual data; there are two main types, strings and integers;
  • data, the entry's actual data; its type and length are determined by encoding.

When inserting data into a compressed list, the list allocates different amounts of space for the prevlen and encoding fields depending on whether the data is a string or an integer and on how large it is. This idea of allocating different amounts of space based on the size and type of the data is exactly how Redis saves memory.

The prevlen attribute in each node in the compressed list records the length of the previous node, and the amount of space in the prevlen attribute is related to the value of the previous node length, for example:

  • If the length of the previous node is less than 254 bytes, then the prevlen attribute requires 1 byte of space to hold this length value;
  • If the length of the previous node is greater than or equal to 254 bytes, then the prevlen attribute requires 5 bytes of space to hold this length value;
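
A minimal sketch of how these two rules could be applied when writing a prevlen field (a hypothetical helper written for illustration, not the actual ziplist code; 254, i.e. 0xFE, is the marker that introduces the 5-byte form):

#include <stdint.h>
#include <string.h>

/* Encode the previous entry's length into p and return how many bytes
 * the prevlen field occupies. */
unsigned int encodePrevlen(unsigned char *p, uint32_t prevlen) {
    if (prevlen < 254) {
        p[0] = (unsigned char)prevlen;   /* 1-byte form */
        return 1;
    } else {
        p[0] = 0xFE;                     /* marker: a 4-byte length follows */
        memcpy(p + 1, &prevlen, sizeof(prevlen));
        return 5;                        /* 1 marker byte + 4 length bytes */
    }
}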

Chained updates

When adding a new element to a compressed list, or modifying an existing one, the memory occupied by the list may need to be reallocated if there is not enough space. Because prevlen stores the length of the previous entry, a large newly inserted element can force the prevlen of the following entry to grow (for example from 1 byte to 5 bytes); if that growth in turn makes the next entry's prevlen grow as well, a "chained update" occurs, in which the space of every subsequent entry has to be reallocated, degrading the performance of accessing the compressed list.

Advantages and disadvantages of compressed lists

Pros:

  • memory is saved by allocating different amounts of space depending on the size and type of the data;
  • a compressed list is a memory-compact data structure occupying one contiguous block of memory, so it can make good use of the CPU cache (the nodes of a linked list are not contiguous in memory, so a linked list cannot); in addition, data of different lengths is encoded accordingly, which effectively reduces memory overhead.

Drawbacks:

  • You can't save too many elements or the query will be less efficient;
  • When adding or modifying an element, the memory space occupied by the compressed list needs to be reallocated, which may even trigger a chain update problem.

Compressed lists have now been replaced by listpack.

hash table

This structure is used both by the global hash table and by objects whose underlying implementation is a hash table.

Node structure in the hash table

struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;     /* Next entry in the same hash bucket. */
    void *metadata[];          
};

The dictEntry structure contains not only pointers to keys and values, but also a pointer to the next hash table node, which can link multiple key-value pairs with the same hash value as a way of solving the problem of hash conflicts, which is chained hashing.

For approaches to resolving hash conflicts, you can read this article: Three ways to resolve hash conflicts.

Redis first resolves hash conflicts with chaining (the "zipper" method), and then relies on rehashing, i.e. re-hashing into a larger table, to keep conflicts under control.

rehash

Redis defines a dict structure which has two hash tables (ht_table[2]) defined in it.

struct dict {
   //...
    dictEntry **ht_table[2]; // two hash tables; one starts empty and is used for rehash migration
    //...
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
};

During the normal service phase, inserted data is written to hash table 1, and hash table 2 has no space allocated. As the data grows (as judged by the load factor), a rehash operation is triggered: space is allocated for hash table 2, the key-value pairs of hash table 1 are migrated over, and the two tables then swap roles.

If hash table 1 holds a very large amount of data, migrating everything to hash table 2 in one go involves copying a huge amount of data and may leave Redis unable to serve other requests. Redis therefore uses a progressive (incremental) rehash.

The incremental rehash procedure is as follows:

  1. Allocate space for hash table 2 in advance;
  2. During the rehash, every time a hash table element is added, deleted, looked up, or updated, Redis also migrates all key-value pairs at one index position of hash table 1 to hash table 2;
  3. As client requests keep being processed, at some point all key-value pairs of hash table 1 will have been migrated to hash table 2, and the rehash operation is complete.

This spreads the overhead of a one-time bulk data migration effort over multiple requests, avoiding the time-consuming operation of a one-time rehash.

During a progressive rehash there are two hash tables, so deletions, lookups and updates are performed on both of them. For example, when looking up the value of a key, the lookup starts in hash table 1, and if the key is not found there it continues in hash table 2. New key-value pairs, however, are only written to hash table 2; hash table 1 no longer accepts insertions, which guarantees that the number of key-value pairs in hash table 1 only decreases, so that when the rehash completes, hash table 1 ends up empty.

The hash table lookup process:

dictEntry *dictFind(dict *d, const void *key)
{
    dictEntry *he;
    uint64_t h, idx, table;

    if (dictSize(d) == 0) return NULL; /* dict is empty */
    if (dictIsRehashing(d)) _dictRehashStep(d); // if a progressive rehash is in progress, perform one rehash step first
    h = dictHashKey(d, key); // compute the hash value of the key
    // delete, find and update operations are performed on both hash tables
    for (table = 0; table <= 1; table++) {
        idx = h & DICTHT_SIZE_MASK(d->ht_size_exp[table]);
        he = d->ht_table[table][idx];
        while(he) {
            void *he_key = dictGetKey(he);
            if (key == he_key || dictCompareKeys(d, key, he_key))
                return he;
            he = dictGetNext(he);
        }
        if (!dictIsRehashing(d)) return NULL;
    }
    return NULL;
}

The key point is that insertion checks whether a rehash is in progress: if not, the new entry is inserted into hash table 0; if a rehash is in progress, it is inserted directly into hash table 1, because inserting it into hash table 0 would only mean it has to be rehashed into hash table 1 later anyway.

int htidx = dictIsRehashing(d) ? 1 : 0;

Trigger conditions for rehash

Load factor = number of nodes stored in the hash table / hash table size

There are two main conditions that trigger a rehash operation:

  • When the load factor is greater than or equal to 1, and Redis is not currently executing the bgsave or bgrewriteaof command (i.e. no RDB snapshot or AOF rewrite is in progress), a rehash operation is performed.
  • When the load factor is greater than or equal to 5, hash conflicts are already very serious, so a rehash is forced regardless of whether an RDB snapshot or AOF rewrite is in progress (see the sketch below).
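
A simplified sketch of this check, modeled on the _dictExpandIfNeeded helper found in older versions of dict.c (dict_can_resize is cleared while a bgsave or bgrewriteaof child process exists, and dict_force_resize_ratio is 5; the details differ between Redis versions):

static int _dictExpandIfNeeded(dict *d)
{
    /* An incremental rehash is already in progress: nothing to do. */
    if (dictIsRehashing(d)) return DICT_OK;

    /* The hash table is empty: expand it to the initial size. */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* Load factor >= 1 and resizing is allowed (no RDB/AOF child),
     * or load factor above dict_force_resize_ratio (5): grow the table. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio))
    {
        return dictExpand(d, d->ht[0].used*2);
    }
    return DICT_OK;
}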

Integer set

When a Set object contains only integer-valued elements and the number of elements is not large, the data structure of an integer set is used as the underlying implementation.

typedef struct intset {
    // encoding of the elements (int16/int32/int64)
    uint32_t encoding;
    // number of elements in the set
    uint32_t length;
    // array holding the elements
    int8_t contents[];
} intset;

The container that holds the elements is a contents array. Although contents is declared as an array of type int8_t, it does not actually hold any elements of type int8_t. The true type of the contents array depends on the value of the encoding attribute in the intset structure. For example:

  • If the value of the encoding attribute is INTSET_ENC_INT16, then contents is an array of type int16_t, and each element of the array is of type int16_t;
  • If the value of the encoding attribute is INTSET_ENC_INT32, then contents is an array of type int32_t, and each element of the array is of type int32_t;
  • If the value of the encoding attribute is INTSET_ENC_INT64, then contents is an array of type int64_t, and every element in the array is of type int64_t;
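
How the encoding is chosen for a given value can be sketched as follows (closely modeled on the _intsetValueEncoding helper in intset.c, where the INTSET_ENC_* constants are defined as sizeof(int16_t), sizeof(int32_t) and sizeof(int64_t)):

#include <stdint.h>

/* Return the smallest encoding able to hold the value v. */
static uint8_t _intsetValueEncoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX)
        return INTSET_ENC_INT64;
    else if (v < INT16_MIN || v > INT16_MAX)
        return INTSET_ENC_INT32;
    else
        return INTSET_ENC_INT16;
}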

Integer set upgrade

To add a new element to the set of integers, if the type of the new element (int32_t) is longer than the type of all the existing elements of the set (int16_t), the set of integers needs to be upgraded, i.e., the contents array needs to be expanded according to the type of the new element (int32_t) before the new element can be added to the set of integers. Of course, the process of upgrading should also maintain the order of the integer collection.

Benefits of upgrading the set of integers:
If you want an array to hold elements of type int16_t, int32_t, and int64_t at the same time, the easiest thing to do is to use an array of type int64_t directly. However, this will result in a waste of memory when and if the elements are all of type int16_t.

The main idea of using integer collections is to save memory overhead.

Skip list

Skip lists support node lookups with an average complexity of O(logN).

A skip list is an improvement on the ordinary linked list: it builds a "multi-level" ordered linked list, which makes it possible to locate data quickly.

typedef struct zskiplist {
    // head and tail pointers, so both ends of the skip list can be reached in O(1)
    struct zskiplistNode *header, *tail;
    // length of the skip list (number of nodes)
    unsigned long length;
    // maximum number of levels currently in the skip list
    int level;
} zskiplist;

typedef struct zskiplistNode {
    // element value of the Zset object
    sds ele;
    // element weight (score)
    double score;
    // backward pointer to the previous node, so the skip list can be
    // traversed in reverse order starting from the tail node
    struct zskiplistNode *backward;

    // array of levels; each level holds a forward pointer and a span
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

The skip list structure is as follows:

When the ratio of the number of nodes in two adjacent levels of the skip list is about 2:1, the lookup complexity can be reduced to O(logN).

Does the skip list in Redis really jump two steps at a time?

If the 2:1 ratio were maintained by adjusting existing nodes every time a node is added or deleted, that would clearly bring extra overhead.

Instead, when a skip list node is created, Redis generates a random number in the range [0, 1); if the random number is less than 0.25 (a 25% probability), the number of levels is increased by 1, and another random number is generated, until a random number not less than 0.25 ends the process, which finally determines the level of the node. Since the probability of drawing a number in [0, 0.25) is 25%, the probability of adding one more level never exceeds 25%. With this approach, inserting a new node only requires updating the pointers of its neighboring nodes, and the levels of other nodes do not need to change, which keeps the insertion operation cheap.

// #define ZSKIPLIST_P 0.25
int zslRandomLevel(void) {
    static const int threshold = ZSKIPLIST_P*RAND_MAX;
    int level = 1; //initialize to one level index
    while (random() < threshold)
        level += 1;//add one level if random number is less than 0.25
//If level does not exceed the maximum number of levels, then return, otherwise return the maximum number of levels
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}

Why not use an AVL tree?

  • For range lookups, a skip list is simpler than an AVL tree: the AVL tree has to do an in-order traversal after finding the minimum value, while the skip list simply walks forward along its bottom level.
  • Skip lists are simpler to implement: AVL insertions and deletions may require left/right rotations, which bring extra overhead, whereas inserting or deleting a skip list node only requires modifying the pointers of neighboring nodes.

listpack

listpack is designed as a replacement for the compressed list. Its most important characteristic is that each listpack node no longer stores the length of the previous node, which fixes the compressed list's chained update problem.

listpack keeps many of the good design decisions of the compressed list: it still uses one contiguous block of memory to store data compactly, and, to save memory, listpack nodes, like compressed list nodes, use different encodings for data of different sizes.

typedef struct {
    /* When string is used, it is provided with the length (slen). */
    unsigned char *sval;
    uint32_t slen;
    /* When integer is used, 'sval' is NULL, and lval holds the value. */
    long long lval;
} listpackEntry;
  • encoding, which defines the encoding type of the element, will encode integers and strings of different lengths;
  • data, the actual data stored;
  • len, the total length of encoding+data;

listpack has no field recording the length of the previous node; a listpack entry only records its own length, so adding a new element does not affect the length fields of other nodes, which avoids the compressed list's chained update problem.

quicklist

The current version of quicklist is a combination of a doubly linked list and listpacks: the quicklist itself is a linked list, and each element of that list is a listpack.

typedef struct quicklist {
    // head of the quicklist
    quicklistNode *head;
    // tail of the quicklist
    quicklistNode *tail;
    // total number of elements across all listpacks
    unsigned long count; /* total count of all entries in all listpacks */
    // number of quicklistNodes
    unsigned long len; /* number of quicklistNodes */
    signed int fill : QL_FILL_BITS; /* fill factor for individual nodes */
    unsigned int compress : QL_COMP_BITS; /* depth of end nodes not to compress;0=off */
    unsigned int bookmark_count: QL_BM_BITS;
    quicklistBookmark bookmarks[];
} quicklist;

typedef struct quicklistEntry {
    const quicklist *quicklist;
    quicklistNode *node;
	// pointer into the listpack
    unsigned char *zi;
    unsigned char *value;
    long long longval;
    size_t sz;
    int offset;
} quicklistEntry;

typedef struct quicklistNode {
	// previous quicklistNode
    struct quicklistNode *prev;
	// next quicklistNode
    struct quicklistNode *next;
    unsigned char *entry;
	// size of the listpack in bytes
    size_t sz; /* entry size in bytes */
	// number of elements in the listpack
    unsigned int count : 16; /* count of items in listpack */
    unsigned int encoding : 2; /* RAW==1 or LZF==2 */
    unsigned int container : 2; /* PLAIN==1 or PACKED==2 */
    unsigned int recompress : 1; /* was this node previous compressed? */
    unsigned int attempted_compress : 1; /* node can't compress; too small */
    unsigned int dont_compress : 1; /* prevent compression of entry that will be used later */
    unsigned int extra : 9; /* more bits to steal for future usage */
} quicklistNode;

BitMap

For more details, see the bitmap source code; it will not be described in detail here.

HyperLogLog

The HyperLogLog algorithm is a very clever algorithm for approximating the number of elements in a sea of de-duplicated elements. It internally maintains 16384 buckets to keep track of the number of elements in their respective buckets. When an element arrives, it is hashed to one of the buckets, affecting the count value of that bucket with a certain probability. Because it is a probabilistic algorithm, the counts of individual buckets are not accurate, but when all the bucket counts are summed up, the result is very close to the real count.

To understand the HyperLogLog algorithm, let's first simplify its counting logic. Since it is a deduplicated count, an exact answer would require a set: record every element in the set and use the scard command to get the set's size as the total. With very many elements a single set would be huge, so split it into 16384 small sets. When an element arrives, a hash function assigns it to one of the small sets, and the same element is always hashed to the same small set, so the total count is the sum of the sizes of all the small sets. Counting exactly this way supports both adding and removing elements.

Splitting the set by itself brings no obvious benefit, since the total memory footprint is unchanged. HyperLogLog optimizes these small sets by compressing their storage until it is tiny: each bucket in HyperLogLog occupies only 6 bits, which obviously cannot hold all the elements of the bucket; instead, it records a logarithm of the number of elements in the bucket.

To see what this logarithmic value is, consider a small problem. For a random integer, the probability that its last bit is 0 is 50% (it is either 0 or 1); similarly, the probability of two trailing zeros is 25%, of three trailing zeros 12.5%, and in general the probability of k trailing zeros is 2^(-k). If we generate a large, unknown number of random integers, but record K, the maximum number of trailing zeros seen on any of them, we can use K to approximate the number of integers: the estimate is 2^K.

Of course the result is very inaccurate, because you might then randomize a very large number of integers, but the maximum number of consecutive zeros at the end, K, doesn't change, and the estimate is still 2^K. You might think that it would be nice if K were a floating point number, so that each time you randomize a new element, it could go up just a little bit, and then the estimate would be a lot more accurate.
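
A toy program (not Redis code, just an illustration of the idea) that estimates how many random values it has seen by tracking only K, the maximum number of trailing zero bits observed:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

static int trailing_zeros(uint64_t x) {
    if (x == 0) return 64;
    int n = 0;
    while ((x & 1) == 0) { n++; x >>= 1; }
    return n;
}

int main(void) {
    long n = 100000;   /* actual number of random values */
    int max_k = 0;
    srand(42);
    for (long i = 0; i < n; i++) {
        /* build a pseudo-random 64-bit value from two rand() calls */
        uint64_t v = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
        int k = trailing_zeros(v);
        if (k > max_k) max_k = k;
    }
    printf("actual n = %ld, estimate 2^%d = %llu\n",
           n, max_k, (unsigned long long)1 << max_k);
    return 0;
}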

HyperLogLog allocates 16384 buckets, records the maximum trailing-zero count K for each bucket, and averages those maximums into a single value K#, which is a floating-point number; estimating the total number of elements as 2^K# is then relatively accurate. This is only a simplification of the algorithm: the real algorithm applies many correction factors, whose mathematical derivation is too extensive to cover precisely here.

Now let's look at the concrete implementation of HyperLogLog in Redis. A HyperLogLog actually occupies about 16384 * 6 bits / 8 = 12 KB. But when the count is small, most buckets have a count of zero; if too many of those 12 KB are zero bytes, the space can be saved. Redis therefore uses sparse storage when counts are small, which occupies far less than 12 KB; the opposite of sparse storage is dense storage, which always occupies a fixed 12 KB.

dense storage structure

Whether sparse or dense, Redis internally uses a string bitmap to store the counts of all the HyperLogLog buckets. The dense storage structure is very simple: 16384 consecutive 6-bit counters packed into the bitmap.

So given a bucket number, how do you get its 6bit count? The 6bit may be inside a byte, or it may be across a byte boundary. We need to shift and splice one or both of these bytes appropriately to get the count value.

Assuming that the bucket number is idx, the starting byte position offset of this 6bit count value is denoted by offset_bytes, and its starting bit position offset in this byte is denoted by offset_bits. We have

offset_bytes = (idx * 6) / 8
offset_bits = (idx * 6) % 8

The former is the quotient and the latter the remainder. For example, bucket 2 has a byte offset of 1, i.e. the 2nd byte, and a bit offset of 4, meaning that the count of bucket 2 starts at the 5th bit of the 2nd byte. Note that the bit order within a byte is low on the left and high on the right here, whereas we normally write bytes with the high bits on the left, so this has to be reversed mentally.

If offset_bits is less than or equal to 2, the 6 bits lie within a single byte, and the count value val can be obtained directly with the following expression:

val = buffer[offset_bytes] >> offset_bits # shift to the right

If offset_bits is greater than 2, then it crosses a byte boundary, which requires splicing two byte bit fragments.

# the low bits come from the first byte
low_val = buffer[offset_bytes] >> offset_bits
# number of bits taken from the first byte
low_bits = 8 - offset_bits
# the high bits come from the next byte
high_val = buffer[offset_bytes+1]
# splice the two fragments, keeping the lower 6 bits
val = (high_val << low_bits | low_val) & 0b111111

The Redis source code below is a bit more obscure: it seems to handle only the case of crossing a byte boundary. In fact, when the 6 bits lie within a single byte, the bits contributed by the next byte are shifted left by at least 6 positions and then masked off by HLL_REGISTER_MAX, so the same code handles both the single-byte and the double-byte case.

// Get the count value of the specified bucket
#define HLL_DENSE_GET_REGISTER(target,p,regnum) do { \
    uint8_t *_p = (uint8_t*) p; \
    unsigned long _byte = regnum*HLL_BITS/8; \
    unsigned long _fb = regnum*HLL_BITS&7; /* same as %8 */ \
    unsigned long _fb8 = 8 - _fb; \
    unsigned long b0 = _p[_byte]; \
    unsigned long b1 = _p[_byte+1]; \
    target = ((b0 >> _fb) | (b1 << _fb8)) & HLL_REGISTER_MAX; \
} while(0)

// Set the count value for the specified bucket
#define HLL_DENSE_SET_REGISTER(p,regnum,val) do { \
    uint8_t *_p = (uint8_t*) p; \
    unsigned long _byte = regnum*HLL_BITS/8; \
    unsigned long _fb = regnum*HLL_BITS&7; \
    unsigned long _fb8 = 8 - _fb; \
    unsigned long _v = val; \
    _p[_byte] &= ~(HLL_REGISTER_MAX << _fb); \
    _p[_byte] |= _v << _fb; \
    _p[_byte+1] &= ~(HLL_REGISTER_MAX >> _fb8); \
    _p[_byte+1] |= _v >> _fb8; \
} while(0)

sparse storage structure

Sparse storage applies to the case where many count values are zero. The following figure represents the state of general sparse storage count values.

When multiple consecutive buckets all have a zero count, Redis uses one byte of the form 00xxxxxx to indicate how many: the 6-bit integer after the 00 prefix, plus one, is the number of zero counters (plus one because a run of length zero would be meaningless). For example, 00010101 means 22 consecutive zero counters. 6 bits can express at most 64 consecutive zeros, so for longer runs Redis uses a two-byte form 01xxxxxx yyyyyyyy, whose 14 payload bits can express up to 16384 consecutive zero counters. This means the initial state of a HyperLogLog, in which all 16384 buckets are zero, can be represented with just 2 bytes.

If consecutive buckets have a non-zero count, a byte of the form 1vvvvvxx is used: the middle 5 bits are the count value and the last 2 bits the number of consecutive buckets, meaning that (xx + 1) consecutive counters all have the value (vvvvv + 1). For example, 10101011 means 4 consecutive counters with value 11. Again both fields are offset by one, because a zero in either of them would describe a zero count, which should be expressed with the zero-run forms instead. Note that this form can express a count of at most 32, while a 6-bit counter in dense storage can go up to 63; as soon as any counter in sparse storage needs a value greater than 32, Redis immediately converts the HyperLogLog from sparse to dense storage.


Redis gives each of the above three byte representations a single directive in order to easily express sparse storage.

  1. ZERO:len, a single byte of the form 00[len-1], representing up to 64 consecutive zero counters;
  2. VAL:value,len, a single byte of the form 1[value-1][len-1], representing len consecutive counters whose value is value;
  3. XZERO:len, two bytes of the form 01[len-1], representing up to 16384 consecutive zero counters.
#define HLL_SPARSE_XZERO_BIT 0x40 /* 01xxxxxx */
#define HLL_SPARSE_VAL_BIT 0x80 /* 1vvvvvxx */
#define HLL_SPARSE_IS_ZERO(p) (((*(p)) & 0xc0) == 0) /* 00xxxxxx */
#define HLL_SPARSE_IS_XZERO(p) (((*(p)) & 0xc0) == HLL_SPARSE_XZERO_BIT)
#define HLL_SPARSE_IS_VAL(p) ((*(p)) & HLL_SPARSE_VAL_BIT)
#define HLL_SPARSE_ZERO_LEN(p) (((*(p)) & 0x3f)+1)
#define HLL_SPARSE_XZERO_LEN(p) (((((*(p)) & 0x3f) << 8) | (*((p)+1)))+1)
#define HLL_SPARSE_VAL_VALUE(p) ((((*(p)) >> 2) & 0x1f)+1)
#define HLL_SPARSE_VAL_LEN(p) (((*(p)) & 0x3)+1)
#define HLL_SPARSE_VAL_MAX_VALUE 32
#define HLL_SPARSE_VAL_MAX_LEN 4
#define HLL_SPARSE_ZERO_MAX_LEN 64
#define HLL_SPARSE_XZERO_MAX_LEN 16384
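
For example, a freshly created HyperLogLog, in which all 16384 counters are zero, is represented by a single XZERO instruction: len - 1 = 16383 = 0b11111111111111, so the two bytes are 01111111 11111111, i.e. 0x7F 0xFF, which is exactly what the HLL_SPARSE_XZERO_SET macro in createHLLObject below writes.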


Redis converts from sparse to dense storage when either of the following conditions is met:

  1. Any counter needs to grow from 32 to 33, because the VAL instruction can represent a value of at most 32.
  2. The total number of bytes occupied by the sparse storage exceeds 3000 bytes, a threshold that can be adjusted with the hll_sparse_max_bytes parameter.

counting cache

As mentioned earlier, the total represented by a HyperLogLog is obtained by summing and averaging the counts of the 16384 buckets and then applying a correction formula. This requires traversing all the buckets and involves a lot of floating-point arithmetic, so it is a relatively expensive computation.

So Redis caches the total in an extra 64-bit field: if the highest bit is 1, the cached value is stale; if it is 0, the remaining 63 bits hold the count value.

When the count of any bucket in the HyperLogLog changes, the count cache is marked stale, but the total is not recomputed immediately; recomputation is deferred until the user explicitly calls the pfcount command, which refreshes the cache. For dense storage the refresh has to walk the 16384 bucket counts and sum and average them, but for sparse storage the computation is much lighter, which means heavy computation only happens once the count is already large. On the other hand, once the count is large, most pfadd operations no longer change any bucket at all.

This means that frequently calling pfcount on an HLL counter that changes a lot could, in principle, be a small performance issue. This concern is mentioned in Redis author antirez's blog, but he did careful stress testing and found it is not a problem: the average time complexity of the pfcount command remains O(1).

After this change even trying to add elements at maximum speed using a pipeline of 32 elements with 50 simultaneous clients, PFCOUNT was able to perform as well as any other O(1) command with very small constant times.

source code analysis

Next, let's walk through the exact flow of the pfadd and pfcount commands in the source code. Before that, we need to understand the HyperLogLog header structure and how a HyperLogLog object is created.

HyperLogLog header structure

struct hllhdr {
    char magic[4];      /* "HYLL" */
    uint8_t encoding;   /* HLL_DENSE or HLL_SPARSE. */
    uint8_t notused[3]; /* Reserved for future use, must be zero. */
    uint8_t card[8];    /* Cached cardinality, little endian. */
    uint8_t registers[]; /* Data bytes. */
};

Creating HyperLogLog Objects

#define HLL_P 14 /* The greater is P, the smaller the error. */
#define HLL_REGISTERS (1<<HLL_P) /* With P=14, 16384 registers. */
#define HLL_SPARSE_XZERO_MAX_LEN 16384


#define HLL_SPARSE_XZERO_SET(p,len) do { \
    int _l = (len)-1; \
    *(p) = (_l>>8) | HLL_SPARSE_XZERO_BIT; \
    *((p)+1) = (_l&0xff); \
} while(0)

/* Create an HLL object. We always create the HLL using sparse encoding.
 * This will be upgraded to the dense representation as needed. */
robj *createHLLObject(void) {
    robj *o;
    struct hllhdr *hdr;
    sds s;
    uint8_t *p;
    int sparselen = HLL_HDR_SIZE +
                    (((HLL_REGISTERS+(HLL_SPARSE_XZERO_MAX_LEN-1)) /
                     HLL_SPARSE_XZERO_MAX_LEN)*2);
    int aux;

    /* Populate the sparse representation with as many XZERO opcodes as
     * needed to represent all the registers. */
    aux = HLL_REGISTERS;
    s = sdsnewlen(NULL,sparselen);
    p = (uint8_t*)s + HLL_HDR_SIZE;
    while(aux) {
        int xzero = HLL_SPARSE_XZERO_MAX_LEN;
        if (xzero > aux) xzero = aux;
        HLL_SPARSE_XZERO_SET(p,xzero);
        p += 2;
        aux -= xzero;
    }
    serverAssert((p-(uint8_t*)s) == sparselen);

    /* Create the actual object. */
    o = createObject(OBJ_STRING,s);
    hdr = o->ptr;
    memcpy(hdr->magic,"HYLL",4);
    hdr->encoding = HLL_SPARSE;
    return o;
}

Here sparselen = HLL_HDR_SIZE + 2, because initially all buckets have a count of 0, so a single XZERO instruction covers all 16384 registers. The rest of the flow is straightforward: the storage used is the sparse representation described earlier, and the created object is essentially a string object, which is why string commands can operate on HyperLogLog objects.

PFADD command

/* PFADD var ele ele ele ... ele => :0 or :1 */
void pfaddCommand(client *c) {
    robj *o = lookupKeyWrite(c->db,c->argv[1]);
    struct hllhdr *hdr;
    int updated = 0, j;

    if (o == NULL) {
        /* Create the key with a string value of the exact length to
         * hold our HLL data structure. sdsnewlen() when NULL is passed
         * is guaranteed to return bytes initialized to zero. */
        o = createHLLObject();
        dbAdd(c->db,c->argv[1],o);
        updated++;
    } else {
        if (isHLLObjectOrReply(c,o) != C_OK) return;
        o = dbUnshareStringValue(c->db,c->argv[1],o);
    }
    /* Perform the low level ADD operation for every element. */
    for (j = 2; j < c->argc; j++) {
        int retval = hllAdd(o, (unsigned char*)c->argv[j]->ptr,
                               sdslen(c->argv[j]->ptr));
        switch(retval) {
        case 1:
            updated++;
            break;
        case -1:
            addReplySds(c,sdsnew(invalid_hll_err));
            return;
        }
    }
    hdr = o->ptr;
    if (updated) {
        signalModifiedKey(c->db,c->argv[1]);
        notifyKeyspaceEvent(NOTIFY_STRING,"pfadd",c->argv[1],c->db->id);
        server.dirty++;
        HLL_INVALIDATE_CACHE(hdr);
    }
    addReply(c, updated ? shared.cone : shared.czero);
}

The PFADD command first checks whether the key exists; if not, it creates a new HyperLogLog object. If the key does exist, it calls the isHLLObjectOrReply() function to verify that the object really is a HyperLogLog, mainly by checking that the magic number, the storage encoding and the header length are all correct.

Only after these checks does it call the hllAdd() function to add the elements. hllAdd() is very simple: based on the storage encoding it decides whether to call hllDenseAdd() or hllSparseAdd().

For the dense storage structure, adding an element simply compares the old and the new count value of the target bucket and replaces the old value if the new one is greater.
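
Roughly, the dense add path looks like the following (a simplified sketch in the spirit of Redis' hllDenseSet, reusing the register macros shown earlier; the real function has a few more details):

/* Update bucket 'index' to 'count' if the new value is larger.
 * Returns 1 if the register was updated, 0 otherwise. */
int hllDenseSet(uint8_t *registers, long index, uint8_t count) {
    uint8_t oldcount;

    HLL_DENSE_GET_REGISTER(oldcount,registers,index);
    if (count > oldcount) {
        HLL_DENSE_SET_REGISTER(registers,index,count);
        return 1;
    } else {
        return 0;
    }
}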

And the sparse storage structure is a bit more complicated:

  1. Determine whether the object needs to be converted to dense storage; if not, continue, otherwise convert it to dense storage first and perform the add there;
  2. Locate the byte segment that needs to be modified, by looping over the opcodes and working out whether the range of buckets each one covers includes the target bucket;
  3. Once the bucket is located, if it is already a VAL whose count is greater than the count to be written, return 0; if its count is smaller, update it;
  4. If it is a ZERO of length 1, it can simply be replaced by a VAL carrying the new count;
  5. In any other case, the existing opcode has to be split up.

PFCOUNT command

/* PFCOUNT var -> approximated cardinality of set. */
void pfcountCommand(client *c) {
    robj *o;
    struct hllhdr *hdr;
    uint64_t card;

    /* Case 1: multi-key keys, cardinality of the union.
     *
     * When multiple keys are specified, PFCOUNT actually computes
     * the cardinality of the merge of the N HLLs specified. */
    if (c->argc > 2) {
        uint8_t max[HLL_HDR_SIZE+HLL_REGISTERS], *registers;
        int j;

        /* Compute an HLL with M[i] = MAX(M[i]_j). */
        memset(max,0,sizeof(max));
        hdr = (struct hllhdr*) max;
        hdr->encoding = HLL_RAW; /* Special internal-only encoding. */
        registers = max + HLL_HDR_SIZE;
        for (j = 1; j < c->argc; j++) {
            /* Check type and size. */
            robj *o = lookupKeyRead(c->db,c->argv[j]);
            if (o == NULL) continue; /* Assume empty HLL for non existing var.*/
            if (isHLLObjectOrReply(c,o) != C_OK) return;

            /* Merge with this HLL with our 'max' HHL by setting max[i]
             * to MAX(max[i],hll[i]). */
            if (hllMerge(registers,o) == C_ERR) {
                addReplySds(c,sdsnew(invalid_hll_err));
                return;
            }
        }

        /* Compute cardinality of the resulting set. */
        addReplyLongLong(c,hllCount(hdr,NULL));
        return;
    }

    /* Case 2: cardinality of the single HLL.
     *
     * The user specified a single key. Either return the cached value
     * or compute one and update the cache. */
    o = lookupKeyWrite(c->db,c->argv[1]);
    if (o == NULL) {
        /* No key? Cardinality is zero since no element was added, otherwise
         * we would have a key as HLLADD creates it as a side effect. */
        addReply(c,shared.czero);
    } else {
        if (isHLLObjectOrReply(c,o) != C_OK) return;
        o = dbUnshareStringValue(c->db,c->argv[1],o);

        /* Check if the cached cardinality is valid. */
        hdr = o->ptr;
        if (HLL_VALID_CACHE(hdr)) {
            /* Just return the cached value. */
            card = (uint64_t)hdr->card[0];
            card |= (uint64_t)hdr->card[1] << 8;
            card |= (uint64_t)hdr->card[2] << 16;
            card |= (uint64_t)hdr->card[3] << 24;
            card |= (uint64_t)hdr->card[4] << 32;
            card |= (uint64_t)hdr->card[5] << 40;
            card |= (uint64_t)hdr->card[6] << 48;
            card |= (uint64_t)hdr->card[7] << 56;
        } else {
            int invalid = 0;
            /* Recompute it and update the cached value. */
            card = hllCount(hdr,&invalid);
            if (invalid) {
                addReplySds(c,sdsnew(invalid_hll_err));
                return;
            }
            hdr->card[0] = card & 0xff;
            hdr->card[1] = (card >> 8) & 0xff;
            hdr->card[2] = (card >> 16) & 0xff;
            hdr->card[3] = (card >> 24) & 0xff;
            hdr->card[4] = (card >> 32) & 0xff;
            hdr->card[5] = (card >> 40) & 0xff;
            hdr->card[6] = (card >> 48) & 0xff;
            hdr->card[7] = (card >> 56) & 0xff;
            /* This is not considered a read-only command even if the
             * data structure is not modified, since the cached value
             * may be modified and given that the HLL is a Redis string
             * we need to propagate the change. */
            signalModifiedKey(c->db,c->argv[1]);
            server.dirty++;
        }
        addReplyLongLong(c,card);
    }
}

If the cardinality of multiple HyperLogLogs is requested, the objects have to be merged. The merge combines all the HyperLogLog objects into a single object named max, which uses a dense (internal raw) representation; for each register, the loop compares the count values from the merged objects and keeps the larger one in max.

When computing the cardinality of a single HyperLogLog object, the command first checks whether the cardinality cache in the object's header is still valid; if it is, the cached value is returned directly. If it is no longer valid, the cardinality has to be recomputed and the cache updated, which is why the PFCOUNT command is not treated as a read-only command.

Recommended Tools

A recommended tool to help understand the principles of HyperLogLog: /blog/. Take a look if you are interested.

GEO

Features:

  • The underlying use of geo is actually zset, where the value encoded based on latitude and longitude is used as the score of the set element
  • In a nutshell: converting 3D to 2D and then to 1D
  • Redis uses the widely used GeoHash encoding method, which is based on the principle of "bisecting intervals and encoding them as intervals".
  • By interleaving the longitude bits into the even positions and the latitude bits into the odd positions, geohash codes that are adjacent in the zset are, for the most part, also adjacent geographically.

Similarities to zset

A GEO key needs to store each member together with its longitude and latitude, and must be able to sort by them, so the structure is very similar to the Redis zset. The only difference is that a zset has a single score while GEO has both a longitude and a latitude, so if the longitude and latitude can be folded into one score, the problem is solved. That is exactly what Redis does: GEO is just a thin layer of encapsulation on top of zset, so strictly speaking GEO is not a new Redis data type.

GEO's hash encoding method

To compare longitudes and latitudes efficiently, Redis uses the widely adopted GeoHash encoding, whose principle is "bisect the interval, encode the interval". GeoHash is a geocoding method invented by Gustavo Niemeyer in 2008 that encodes a geographic location as a short string of letters and digits. It is a hierarchical spatial data structure that subdivides space into grid-shaped buckets, and it is one of the many applications of so-called Z-order (space-filling) curves.

When GeoHash encoding a set of latitude and longitude, the latitude and longitude are encoded separately, and then the latitude and longitude are combined into a final encoding.

First, look at how longitude and latitude are encoded individually, taking (116.37, 39.86) as an example and starting with the longitude 116.37.

  • The first bisection splits the longitude range [-180,180] into a left partition [-180,0) and a right partition [0,180]; 116.37 falls into the right partition, so the first bit of the encoding is 1.
  • The interval [0,180] is then split into [0,90) and [90,180]; 116.37 still falls into the right partition, so the second bit is also 1. In the third bisection, [90,180] is split into [90,135) and [135,180]; 116.37 falls into the left partition, so the third bit is 0.
  • Continuing this way, after 5 bisections the longitude 116.37 lies in the interval [112.5, 123.75] and its 5-bit code is 11010.

Latitude is encoded the same way, except that its range is [-90, 90]; following the same process, the 5-bit code for the latitude 39.86 is 10111.

The codes just computed for the longitude and latitude (116.37, 39.86) are 11010 and 10111. They are combined by interleaving: even positions take longitude bits and odd positions take latitude bits, i.e. position 0 is longitude bit 0 (1), position 1 is latitude bit 0 (1), position 2 is longitude bit 1 (1), position 3 is latitude bit 1 (0), and so on, giving the final code 1110011101, as shown below.

With GeoHash encoding, a longitude/latitude pair (116.37, 39.86), which could not be expressed as a single weight score, can be represented by the single value 1110011101 and stored as the score of the Sorted Set member. Finally, the binary value is grouped 5 bits at a time and base32-encoded.
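
A tiny illustration (not Redis code) of the interleaving step, reproducing the example above:

#include <stdio.h>

int main(void) {
    const char *lon = "11010"; /* 5-bit code of longitude 116.37 */
    const char *lat = "10111"; /* 5-bit code of latitude 39.86 */
    char geo[11];

    for (int i = 0; i < 5; i++) {
        geo[2*i]   = lon[i];   /* even positions take longitude bits */
        geo[2*i+1] = lat[i];   /* odd positions take latitude bits */
    }
    geo[10] = '\0';
    printf("%s\n", geo);       /* prints 1110011101 */
    return 0;
}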

The final result obtained is a set of geohash values for latitude and longitude.

Geolocation 2D to 1D

The previous section described how GeoHash is computed, but only the "what", not the "why". Why encode longitude and latitude separately? Why interleave the two bit strings into one? This section tries to answer that.

As shown in the figure below, when the binary codes are laid out in space, dividing the space into four blocks gives the order 00 in the lower left corner, 01 in the upper left corner, 10 in the lower right corner and 11 in the upper right corner, which traces a Z-shaped curve. When each block is recursively subdivided into smaller sub-blocks, the ordering is self-similar (fractal): every sub-block traces a Z as well. This kind of curve is called a Peano space-filling curve.

The advantage of this kind of space-filling curve is that it maps a two-dimensional space onto a one-dimensional curve (strictly, one of fractal dimension), and for the most part codes that are close to each other correspond to points that are close in space. The biggest drawback of the Peano curve, however, is its sudden jumps: some codes are adjacent yet the points are far apart, for example 0111 and 1000 are adjacent codes but geographically distant.


So, to avoid query inaccuracy problem, we can query 4 or 8 squares around the square where the given latitude and longitude are located at the same time.

Source code

Essentially, Redis GEO is a wrapper around geohash. The geohash-specific code is not listed here (you can read it yourself); below is an outline of the main GEO flows in Redis.

  • The geoadd command, which shows how a geohash is stored in Redis:
/* GEOADD key [CH] [NX|XX] long lat name [long2 lat2 name2 ... longN latN nameN] */
void geoaddCommand(client *c) {
    int xx = 0, nx = 0, longidx = 2;
    int i;
    /* Parsing Optional Parameters */
    /* Parse optional parameters */
    while (longidx < c->argc) {
        char *opt = c->argv[longidx]->ptr;
        if (!strcasecmp(opt,"nx")) nx = 1;
        else if (!strcasecmp(opt,"xx")) xx = 1;
        else if (!strcasecmp(opt,"ch")) {}
        else break;
        longidx++;
    }
    if ((c->argc - longidx) % 3 || (xx && nx)) {
        /* The remaining arguments must come in longitude/latitude/member triples */
        addReplyErrorObject(c,shared.syntaxerr);
        return;
    }
    /* Build the zadd argument array */
    int elements = (c->argc - longidx) / 3;
    int argc = longidx+elements*2; /* ZADD key [CH] [NX|XX] score ele ... */
    robj **argv = zcalloc(argc*sizeof(robj*));
    argv[0] = createRawStringObject("zadd",4);
    for (i = 1; i < longidx; i++) {
        argv[i] = c->argv[i];
        incrRefCount(argv[i]);
    }
    /* The arguments come in groups of 3; parse each longitude/latitude/member
     * triple from the argument list and append it to the zadd argument array */
    for (i = 0; i < elements; i++) {
        double xy[2];
        if (extractLongLatOrReply(c, (c->argv+longidx)+(i*3),xy) == C_ERR) {
            for (i = 0; i < argc; i++)
                if (argv[i]) decrRefCount(argv[i]);
            zfree(argv);
            return;
        }
        /* Convert the longitude/latitude pair into a score */
        GeoHashBits hash;
        geohashEncodeWGS84(xy[0], xy[1], GEO_STEP_MAX, &hash);
        GeoHashFix52Bits bits = geohashAlign52Bits(hash);
        robj *score = createObject(OBJ_STRING, sdsfromlonglong(bits));
        robj *val = c->argv[longidx + i * 3 + 2];
        argv[longidx+i*2] = score;
        argv[longidx+1+i*2] = val;
        incrRefCount(val);
    }
    /* Rewrite the client command into the zadd command format and execute it */
    replaceClientCommandVector(c,argc,argv);
    zaddCommand(c);
}

GEO storage is thus just a wrapper around zset, and the underlying structure of zset is the skip list, which was covered above.

  • The general execution flow of georadius (with a lot of detailed code removed)
void georadiusGeneric(client *c, int srcKeyIndex, int flags) {
    robj *storekey = NULL;
    int storedist = 0; /* 0 for STORE, 1 for STOREDIST. */
    /* Look up the zobj corresponding to the key */
    robj *zobj = NULL;
    if ((zobj = lookupKeyReadOrReply(c, c->argv[srcKeyIndex], shared.emptyarray)) == NULL ||
        checkType(c, zobj, OBJ_ZSET)) {
        return;
    }
    /* Parse the longitude/latitude values in the request */
    int base_args;
    GeoShape shape = {0};
    if (flags & RADIUS_COORDS) {
    /*
     * Parsing of the various mandatory parameters; detail code omitted.
     * Mainly parses the center point and the radius.
     */
    }
    /* Parse all optional parameters. */
    int withdist = 0, withhash = 0, withcoords = 0;
    int frommember = 0, fromloc = 0, byradius = 0, bybox = 0;
    int sort = SORT_NONE;
    int any = 0; /* any=1 means a limited search, stop as soon as enough results were found. */
    long long count = 0; /* Max number of results to return. 0 means unlimited. */
    if (c->argc > base_args) {
    /*
     * Parsing of the optional parameters; detail code omitted.
     */
    }
    /* Get all neighbor geohash boxes for our radius search:
     * the 9 geo areas (the center box and its 8 neighbors) */
    GeoHashRadius georadius = geohashCalculateAreasByShapeWGS84(&shape);
    /* Create the geoArray that will hold the results */
    geoArray *ga = geoArrayCreate();
    /* Scan the 9 areas for points that satisfy the condition;
     * matches are added to the geoArray */
    membersOfAllNeighbors(zobj, georadius, &shape, ga, any ? count : 0);
    /* If there are no matches, return an empty reply */
    if (ga->used == 0 && storekey == NULL) {
        addReply(c,shared.emptyarray);
        geoArrayFree(ga);
        return;
    }
    long result_length = ga->used;
    long returned_items = (count == 0 || result_length < count) ?
                          result_length : count;
    long option_length = 0;
    /*
     * Subsequent option handling, e.g. sorting, STORE ...
     */
    // Free the space used by the geoArray
    geoArrayFree(ga);
}

The code above omits a lot of detail; interested readers can look at the full source themselves (src/ in the Redis source tree). Even so, the overall flow of georadius is very clear:

  1. Parses the request parameters.
  2. Calculate the geohash and 8 neighbors where the target coordinates are located.
  3. Find all point sets in zset that satisfy the distance constraints in these 9 regions.
  4. Handles subsequent logic such as sorting.
  5. Clean up temporary storage space.

Interview questions column

The Java interview questions column is now online; feel free to visit.

  • If you don't know how to write a resume, or how to present the projects on it;
  • If there is something you are not sure whether to put on your resume;
  • If there are some open-ended questions you don't know how to answer;

Then feel free to message me privately, and I will help in any way I can.