I. Master-Slave Replication
1. Master-slave relationship
Redis is said to be highly reliable, which means two things: losing as little data as possible and interrupting service as little as possible. AOF and RDB guarantee the former; for the latter, Redis keeps copies of the same data on multiple instances at the same time. To keep these copies consistent, Redis provides a master-slave model with read-write separation, as shown in the figure below.
2. Master-slave replication - full
When multiple Redis instances are started, they can form a master-slave relationship with each other using the replicaof command (slaveof before Redis 5.0). For example, to make instance 1 (ip: 172.16.19.3) the master of instance 2 (ip: 172.16.19.5), run the following on instance 2:
replicaof 172.16.19.3 6379
Once the relationship is established, the first synchronization of data proceeds in three phases:
(1) The slave library sends a psync command to the master library to request synchronization. The command carries two parameters: the master library's runID (a random ID generated when a Redis instance starts) and the replication progress offset. For the first replication, the runID is set to ? and the offset to -1. The master library responds with a FULLRESYNC reply (indicating a first, full replication) carrying its own runID and current offset, which it returns to the slave library.
(2) The master library executes the bgsave command to generate an RDB file and sends it to the slave library, which first empties its current database and then loads the RDB file. The master library is not blocked while this happens and can keep accepting write requests, so to keep the master and slave consistent, it records all write operations received after the RDB file was generated in a dedicated in-memory replication buffer.
(3) After the master library finishes sending the RDB file, it sends the write operations accumulated in the replication buffer to the slave library, which re-executes them.
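As a minimal sketch (reusing the example IPs above and assuming both instances listen on port 6379), the relationship can be set up and the first synchronization observed from the slave side like this:
# on instance 2 (172.16.19.5), point it at the master
redis-cli -h 172.16.19.5 -p 6379 replicaof 172.16.19.3 6379
# check the replication state; master_link_status turns to "up" once the
# RDB has been loaded and command propagation has started
redis-cli -h 172.16.19.5 -p 6379 INFO replication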
3. Master-slave replication - cascade
In full replication, the master library needs to complete two time-consuming operations: generating the RDB file and transferring it. If there are many slave libraries, the main thread is kept busy forking child processes to generate RDB files, and the fork operation blocks the main thread from processing normal requests, so the master library responds more slowly to applications. In addition, transferring the RDB files also takes up the master's network bandwidth, which puts further pressure on its resources.
We can use a "master-slave-slave" model to cascade the work of generating and transferring RDB files from the master onto a slave, reducing the pressure on the master. Simply put, when deploying a master-slave cluster, we can manually select a slave library (for example, one with a larger memory configuration) to act as the master of other slaves, and have those other slaves run:
replicaof <Selected Slave IP> 6379
to establish the relationship, as in the sketch below.
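A minimal sketch of the cascade, assuming 172.16.19.7 is the slave selected to relay replication and 172.16.19.9 is another slave hanging off it (both addresses are hypothetical, port 6379 assumed):
# 172.16.19.7 replicates directly from the master
redis-cli -h 172.16.19.7 -p 6379 replicaof 172.16.19.3 6379
# 172.16.19.9 replicates from the selected slave instead of the master, so the
# master only generates and transfers one RDB for this branch of the tree
redis-cli -h 172.16.19.9 -p 6379 replicaof 172.16.19.7 6379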
Once full replication is complete, the master and slave libraries keep a long-lived network connection between them. The master library uses this connection to synchronize subsequent write commands to the slave library as it receives them. This process is known as long-connection-based command propagation, and it avoids the overhead of repeatedly establishing connections.
4. Network problems
If the network between the master and slave libraries is interrupted, they resynchronize after reconnecting using incremental replication: only the commands the master received during the outage are sent to the slave. During the outage, the master writes these commands into the replication buffer and also into repl_backlog_buffer. repl_backlog_buffer is a ring buffer in which the master keeps track of how far it has written and the slave keeps track of how far it has read.
Initially, the two positions are the same. As the master library keeps receiving new write operations, its write position in the buffer gradually moves away from the starting point; this distance is measured by an offset (the master's write position is recorded in master_repl_offset, the slave's read position in slave_repl_offset). The more the master writes, the larger master_repl_offset becomes.
When the connection between the master and slave libraries is restored, the slave first sends a psync command carrying its current slave_repl_offset to the master. The master compares master_repl_offset with slave_repl_offset and sends only the commands in the gap between the two, bringing the slave's data back in sync.
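Both offsets can be observed on the master with INFO replication, for example (master IP reused from above):
# master_repl_offset is the master's write position; each connected slave's
# read position appears in the slave0/slave1/... lines as offset=...
redis-cli -h 172.16.19.3 -p 6379 INFO replication | grep -E 'master_repl_offset|slave0'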
It should be emphasized that because repl_backlog_buffer is a ring buffer, the master keeps writing after the buffer is full, overwriting operations written earlier. If the slave reads too slowly, operations it has not yet read may be overwritten by the master's newer writes, which leads to data inconsistency between the master and the slave.
We can adjust the repl_backlog_size parameter to set the buffer size. The rough formula is: buffer_space_size = master write command rate * average operation size - master-to-slave network transfer rate * average operation size. In practice, we then multiply this by a safety factor and use the result as repl_backlog_size, to cope with unexpected request pressure.
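As a purely illustrative sizing (all numbers are hypothetical): if the master writes 2,000 operations per second at roughly 2KB each while the network delivers about 1,000 operations per second to the slave, the backlog grows by about (2000 - 1000) * 2KB = 2MB for each second the slave lags; applying a generous safety factor might land on 16MB. The value can be changed at runtime:
# 16777216 bytes = 16MB, chosen only for illustration
redis-cli -h 172.16.19.3 -p 6379 CONFIG SET repl-backlog-size 16777216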
If the number of concurrent requests is very large, then in addition to increasing repl_backlog_size appropriately, you should consider using a sliced cluster to share the request pressure on a single master library.
II. Sliced clusters
When using RDB persistence, Redis forks a child process to generate the snapshot. The time the fork takes is positively correlated with the amount of data Redis holds, and the fork itself blocks the main thread while it executes. The larger the data set, the longer the fork blocks the main thread and the slower Redis responds. You can use the INFO command to check the latest_fork_usec metric, which shows how long the most recent fork took (see the example below). So when the data volume keeps growing, simply adding memory is not a good idea; a sliced cluster should be used instead.
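A quick way to check the metric (example IP reused from above; the field lives in the Stats section of INFO):
# latest_fork_usec is the duration of the most recent fork, in microseconds
redis-cli -h 172.16.19.3 -p 6379 INFO stats | grep latest_fork_usec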
A sliced cluster, also called a sharded cluster, means starting multiple Redis instances to form a cluster and then splitting the received data into multiple parts according to certain rules, with each part stored on one instance. For example, if you split 25GB of data into 5 equal parts (the split does not have to be even) and use 5 instances to store them, each instance only needs to hold 5GB. When an instance generates an RDB file for 5GB of data, the data volume is much smaller, and forking the child process generally does not block the main thread for long.
The two approaches, scaling up memory and sliced clusters, correspond to vertical scaling (scale up) and horizontal scaling (scale out):
(1) Vertical scaling: upgrade the resource configuration of a single Redis instance, including adding memory capacity, adding disk capacity, or using a more powerful CPU.
(2) Horizontal scaling: increase the number of Redis instances.
The advantage of vertical scaling is that it is simple and straightforward to implement, but it is limited by the blocking problems and hardware costs associated with increasing data volumes. Compared to vertical scaling, horizontal scaling is a more scalable solution. To save more data, you only need to increase the number of Redis instances, which is relatively more complex to manage.
We then need to address two major issues:
(1) How is the data distributed among multiple instances after slicing?
(2) How does the client determine on which instance the data it wants to access is located?
Redis sliced clusters are usually implemented with Redis Cluster, which uses hash slots (referred to simply as slots below) to handle the mapping between data and instances. A sliced cluster has 16384 hash slots in total, which are analogous to data partitions, and each key-value pair is mapped to a hash slot based on its key: first the CRC16 algorithm is applied to the key to compute a 16-bit value, then that value is taken modulo 16384, giving a number in the range 0~16383, and each of those numbers corresponds to one hash slot (see the example below).
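You do not have to compute the slot by hand; any cluster node will report it, for example (node address reused from the examples above):
# returns an integer in 0~16383: the slot that CRC16(key) mod 16384 maps to
redis-cli -h 172.16.19.3 -p 6379 CLUSTER KEYSLOT hello:key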
When you create a cluster with redis-cli's --cluster create command, Redis automatically distributes the slots evenly across the cluster instances, with 16384/N slots per instance. You can also use the cluster meet command to manually connect the instances into a cluster and then use the cluster addslots command to specify the hash slots on each instance. For example, suppose a simplified cluster with only 5 hash slots, assigned to 3 instances according to their memory as in the following figure:
redis-cli -h 172.16.19.3 -p 6379 cluster addslots 0 1
redis-cli -h 172.16.19.4 -p 6379 cluster addslots 2 3
redis-cli -h 172.16.19.5 -p 6379 cluster addslots 4
After computing the CRC16 values of key1 and key2 and taking them modulo the total number of slots (5 in this example), the results map the two keys to instances 1 and 3 respectively, which solves the data distribution problem. (Note: when allocating hash slots manually, all 16384 slots must be assigned, otherwise the Redis cluster will not work.)
After the cluster is created, each Redis instance sends its hash slot information to the other instances it is connected to, spreading the slot allocation information across the cluster. When a client connects to any cluster instance, that instance sends it the hash slot allocation information, and the client caches it locally. When the client later reads or writes a key-value pair, it first computes the key's hash slot and then sends the request directly to the instance that owns that slot.
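A client (or you, when debugging) can fetch the current allocation from any node; a sketch using redis-cli:
# lists each slot range together with the node (and its replicas) serving it
redis-cli -h 172.16.19.3 -p 6379 CLUSTER SLOTS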
However, the mapping between instances and hash slots is not static in a cluster; the two most common cases where it changes are:
(1) In a cluster, instances are added or deleted, and Redis needs to reallocate hash slots
(2) For load balancing, Redis needs to redistribute the hash slots over all instances
Instances can exchange messages with each other to obtain the latest hash slot allocation, but clients cannot actively perceive these changes. Redis Cluster therefore provides a redirection mechanism: when a client sends a read or write to an instance that does not hold the corresponding data, that instance returns a MOVED response containing the address of the instance that now owns the slot. The client then redirects the request to that instance and updates its locally cached slot mapping, for example:
GET hello:key
(error) MOVED 13320 172.16.19.5:6379
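When testing by hand, redis-cli can follow these redirections automatically if started in cluster mode; a sketch:
# -c makes redis-cli react to MOVED/ASK replies by reconnecting to the indicated node
redis-cli -c -h 172.16.19.3 -p 6379 GET hello:key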
In practice, the slot's data may be in the middle of being migrated: only part of the data in the slot being accessed (Slot 2 in the figure below) has been moved to instance 3, while the rest is still on instance 2. In this case, the client receives an ASK error instead:
GET hello:key
(error) ASK 13320 172.16.19.5:6379
This means that the requested key-value pair lives in hash slot 13320, which is being migrated to instance 172.16.19.5. The client needs to first send an ASKING command to 172.16.19.5, which tells that instance to allow the command the client sends next to be executed, and then send the GET command to read the data. An example is shown in the figure:
Slot 2 is being migrated from Instance 2 to Instance 3; key1 and key2 have already been migrated, while key3 and key4 are still on Instance 2. When the client requests key2 from Instance 2, it receives an ASK reply from Instance 2, and it then sends an ASKING command followed by the GET to Instance 3 to read key2. Note that unlike the MOVED response, the ASK reply does not update the hash slot allocation cached by the client: if the client requests data in Slot 2 again, it still sends the request to Instance 2 first and repeats the steps above, as in the sketch below.
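On the migration target (172.16.19.5:6379 in the ASK example above), the client then sends roughly this sequence; note that the ASKING permission only applies to the command that immediately follows it:
ASKING
GET hello:key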