
Redis cluster slot migration transformation practice

Published: 2024-09-12 10:54:23

Author: Xu Xingbao, vivo Internet Storage Team

Redis clusters often need to be scaled in or out horizontally while online. In practice we found that service latency jitters severely during slot migration, to the point where it is clearly visible to the business side. To address this problem, we optimized and reworked the native Redis cluster slot migration function.

I. Background

Redis clusters are widely used in Internet companies. It is well known that clustering a service breaks through the capacity ceiling of a single node and brings benefits in scale, availability, and scalability. While operating Redis clusters, however, we found that horizontal scaling operations involving data migration repeatedly drew complaints from the business side about increased Redis request latency; an expansion operation even caused an availability failure on a cluster node, which in turn interrupted the migration and left a slot's data split between nodes. This caused great trouble for operations colleagues and seriously affected the stability of online services.

II. Analysis of issues

2.1 Introduction to native migration

A Redis cluster uses a decentralized architecture: each node maintains its own view of the cluster topology and stores its own shard of the data, and nodes coordinate and notify each other of changes through the gossip protocol. For data management, the Redis cluster uses a virtual hash slot partitioning scheme: keys are mapped by a hash function to integer slots in the range 0 to 16383. Each node is responsible only for the key-value data mapped to the slots it owns, so the slot is the basic unit of data management in a Redis cluster, and the cluster's ability to scale horizontally is built on the slot dimension, as shown in the following diagram.

In the migration steps shown in the figure above, steps 1-2 mark the migration state of the slot on both ends so that its data can still be accessed during the migration; steps 3-4 are the core of the migration and are repeated under the scheduling of step 5 until all key-value data of the migrating slot has been moved to the target node; step 6 is performed after the data transfer is complete and mainly broadcasts a cluster message to update the slot ownership in the cluster topology.
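
For concreteness, the sketch below drives this native procedure from an operations script with redis-py. It is a minimal illustration only: the node addresses, slot number, and batch size are placeholders, and a real tool such as redis-cli --cluster reshard performs the same steps with proper error handling.

```python
import redis

SLOT = 1234                                       # hypothetical slot being migrated
src = redis.Redis(host="10.0.0.1", port=6379)     # migration source (placeholder address)
dst = redis.Redis(host="10.0.0.2", port=6379)     # migration destination (placeholder address)

def node_id(conn: redis.Redis) -> str:
    nid = conn.execute_command("CLUSTER MYID")
    return nid.decode() if isinstance(nid, bytes) else nid

src_id, dst_id = node_id(src), node_id(dst)

# Steps 1-2: mark the slot state on both ends so ask-move works during migration.
dst.execute_command("CLUSTER SETSLOT", SLOT, "IMPORTING", src_id)
src.execute_command("CLUSTER SETSLOT", SLOT, "MIGRATING", dst_id)

# Steps 3-5: repeatedly fetch a batch of keys in the slot and MIGRATE them
# until the slot is empty on the source node.
while True:
    keys = src.execute_command("CLUSTER GETKEYSINSLOT", SLOT, 100)
    if not keys:
        break
    src.execute_command("MIGRATE", "10.0.0.2", 6379, "", 0, 5000, "KEYS", *keys)

# Step 6: assign the slot to the destination; the topology change then propagates.
dst.execute_command("CLUSTER SETSLOT", SLOT, "NODE", dst_id)
src.execute_command("CLUSTER SETSLOT", SLOT, "NODE", dst_id)
```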

Since a normal migration is a continuous process, the data of the migrating slot is inevitably "split" across the two ends of the migration, and this state persists for as long as the slot migration is in progress. To keep the migrating slot readable and writable during this period, the Redis cluster implements the ask-move mechanism shown in the figure below. A request for data in the migrating slot first goes to the migration source node according to the cluster topology; if the data is found on the source node, the request is processed normally; if it is not found, the source node replies to the client with an ASK {ip}:{port} redirection.

After receiving this reply, a Redis smart client retries the command on the node indicated in the reply. However, since the target node does not yet own the migrating slot, the smart client first sends an ASKING command to the destination node to ensure that its next request for data in the migrating slot will be accepted and processed. Because native migration works at key granularity, and the data of any given key exists either on the source node or on the destination node, the ask-move mechanism allows the Redis cluster to guarantee the consistency and integrity of data access during migration.
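
A minimal sketch of this client-side handling with redis-py is shown below; the addresses are placeholders and the error parsing is simplified, since the exact exception type depends on the client version.

```python
import redis
from redis.exceptions import ResponseError

try:
    from redis.exceptions import AskError        # available in redis-py 4.x and later
except ImportError:                              # older clients: fall back to string matching
    AskError = ()

def get_during_migration(node: redis.Redis, key: str):
    """Fetch a key, following a single ASK redirection if its slot is migrating."""
    try:
        return node.get(key)
    except ResponseError as err:
        if not (isinstance(err, AskError) or str(err).startswith("ASK")):
            raise
        # The redirection carries "<slot> <ip>:<port>"; retry once on the importing node.
        host, port = str(err).split()[-1].rsplit(":", 1)
        target = redis.Redis(host=host, port=int(port))
        pipe = target.pipeline(transaction=False)
        pipe.execute_command("ASKING")           # permit one access to the still-importing slot
        pipe.get(key)
        return pipe.execute()[-1]
```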

2.2 Analysis of migration issues

(1) Latency analysis
From the native Redis cluster migration steps described above, we can see that native migration works at key granularity: the core logic of cluster data migration is to repeatedly scan the source node for data belonging to the migrating slot and send it to the destination node. At the micro level, migrating a single key involves the following operations on the server side (a client-side approximation is sketched after the list):

  • Serialize the key-value pair data to be migrated;
  • Send serialized packets over a network connection;
  • Wait for a reply (the target will not return until it has received the packet and loaded it successfully);
  • Delete the local residual copy and free up memory.
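
The sketch below approximates these four steps from the client side with DUMP/RESTORE/DEL using redis-py (addresses are placeholders). Inside the server, MIGRATE performs the equivalent serialize-send-wait-delete sequence synchronously on the worker thread, which is exactly where the latency cost comes from.

```python
import redis

src = redis.Redis(host="10.0.0.1", port=6379)    # placeholder addresses
dst = redis.Redis(host="10.0.0.2", port=6379)

def migrate_one_key(key: bytes) -> None:
    payload = src.dump(key)                      # 1. serialize the key-value pair
    if payload is None:
        return                                   # key disappeared before migration
    dst.restore(key, 0, payload, replace=True)   # 2-3. send and wait for the load to finish
    src.delete(key)                              # 4. drop the local copy to free memory
```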

These steps include several operations that tie up the worker thread. First, serializing the data is very CPU-intensive, and the thread is held even longer when the key being migrated is large, which is hard to tolerate in a single-threaded Redis service. Second, sending the data to the target node blocks synchronously waiting for the result, and the migration destination replies to the source node only after it has deserialized the data and written it into its keyspace. Note that these steps are looped continuously for the duration of the migration, and they all run on the worker thread, during which normal requests cannot be processed. The result is a sustained spike in service response latency, which can be verified in the slowlog monitoring data: a large number of migrate and restore commands are captured during the migration period.
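
A quick way to confirm this on a live cluster is to pull the slow log from a migrating node and count the migration commands. The snippet below is a minimal redis-py sketch; the address is a placeholder and the entry field names follow current redis-py, which returns raw bytes by default.

```python
import redis

node = redis.Redis(host="10.0.0.1", port=6379)   # placeholder address of a migrating node
entries = node.slowlog_get(128)                  # most recent slow-log entries
migration_cmds = [e for e in entries
                  if e["command"].lower().startswith((b"migrate", b"restore"))]
print(f"{len(migration_cmds)} of {len(entries)} slow-log entries are migration commands")
```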

(2) ask-move overhead
Normally, the data of each migrating slot is distributed across both ends of the migration for a period of time. The ask-move mechanism keeps data access consistent during this period, but it is easy to see that it multiplies the number of network accesses for a single request and also adds some overhead on the client side. In addition, for scenarios where users access multiple keys of the same slot in one shot, such as Lua scripts or pipelines, support in current cluster smart clients is limited, and such requests may fail with errors during the migration period. Finally, note that the ask-move mechanism is triggered only on the master nodes at the two ends of the migration; slave nodes cannot guarantee consistent results during the migration, which is very unfriendly to users who rely on read-write separation to access cluster data.

(3) Topology change overhead
To reduce the impact of the ask-move mechanism on requests during migration, native migration normally moves only one slot at a time. As a result, every completed slot triggers a topology update of the cluster nodes, and every topology update prompts business clients to send requests to refresh their view of the cluster topology. These topology refresh requests are computationally expensive and return large result sets, which greatly increases the processing overhead on the nodes and causes sudden spikes in the latency of normal service requests. For clusters with many connections and many nodes in particular, a burst of concentrated topology refresh requests can easily exhaust a node's computing resources and congest its network, triggering all kinds of service anomalies and alarms.
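
The refresh itself is easy to reproduce: after noticing a change, each smart client re-pulls the full slot map, for example with CLUSTER SLOTS. The minimal redis-py sketch below (placeholder address) shows the call that every connected client repeats after every topology change; on a large cluster the reply is sizeable, which is why a refresh storm is expensive.

```python
import redis

node = redis.Redis(host="10.0.0.1", port=6379)    # placeholder address
slot_map = node.execute_command("CLUSTER SLOTS")  # one entry per contiguous slot range
print(f"{len(slot_map)} slot ranges returned; every client repeats this after each change")
```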

(4) No high availability during migration
The slot migration state exists only on the master nodes at the two ends of the migration; their slave nodes are unaware of it. If a node fails during the migration, the migration is interrupted and stale slot state is left behind, and requests for data in the migrating slot can no longer trigger the ask-move mechanism correctly. For example, if the migration source node fails and a slave node takes over, the new master has no migration state, so requests for the migrating slot cannot trigger an ASK reply. A write request for data that has already been moved to the target node will simply create a new key on the new master, splitting the data between the two nodes; similarly, a read request for data that has already been moved cannot be guaranteed to return the correct result.

III. Optimization plan

3.1 Reflections on the direction of optimization

Analyzing the native data migration mechanism shows that the migration operation involves many synchronous blocking steps that occupy the worker thread for long stretches, as well as frequent topology refreshes, both of which keep pushing request latency up. Could the synchronous operations that block the worker thread simply be handed off to an asynchronous thread? Such a change would be very risky: native migration guarantees correct data access during migration precisely because those interfaces are synchronous. Making them asynchronous would require introducing concurrency control and coordinating in-flight requests on the migrating data with the slave nodes, and it still would not solve the topology change overhead. Therefore vivo's in-house Redis abandoned the native key-granularity migration logic and, based on real expansion needs in production, adopted a migration scheme built on master-slave synchronization: the migration target node disguises itself as a slave of the migration source node and receives the data through the replication protocol.

3.2 Implementation principle

Redis master-slave synchronization is the process by which data is replicated from a master node to its slave nodes; it improves the availability of a Redis cluster and avoids problems such as single points of failure and data loss. Synchronization comes in two forms, full and partial. The slave node sends its replication ID and offset to the master. On first synchronization it must go through the full synchronization path: the master sends an RDB baseline file and then propagates incremental commands. On subsequent synchronizations the master checks the offset in the slave's request to decide whether partial synchronization is possible, and prefers it in order to keep the synchronization overhead low. Since the master keeps processing new command requests during synchronization, the slave is always dynamically catching up with the master, and under normal circumstances the master continuously forwards write commands to the slave.
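
The handshake at the heart of this mechanism can be sketched with a raw socket, as below. This is a simplified illustration only: it assumes no AUTH is required, omits the REPLCONF options a real replica sends first, and stops after reading the FULLRESYNC header.

```python
import socket

def request_full_sync(host: str = "10.0.0.1", port: int = 6379) -> str:
    s = socket.create_connection((host, port))
    # "PSYNC ? -1": we know no replication ID and no offset, so ask for a full sync.
    s.sendall(b"*3\r\n$5\r\nPSYNC\r\n$1\r\n?\r\n$2\r\n-1\r\n")
    buf = b""
    while b"\r\n" not in buf:          # the master may send keepalive newlines first
        buf += s.recv(1024)
    s.close()
    # Expected header: "+FULLRESYNC <replid> <offset>", then the RDB payload follows.
    return buf.split(b"\r\n", 1)[0].strip().decode()
```

A replica that already has a replication ID and offset sends those instead of `? -1`; if the master's backlog still covers that offset, the master answers `+CONTINUE` and streams only the missing commands, which is the partial synchronization path mentioned above.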

Based on this synchronization mechanism, we designed and implemented the Redis cluster data migration function shown in the figure below. The biggest difference between migrating data and ordinary synchronization is that normally only part of the source node's slot data needs to be migrated, and the target node does not need a full copy of the source node's data, so reusing the synchronization mechanism as-is would incur unnecessary overhead. To solve this we made some targeted changes to the relevant logic. First, in the synchronization command exchange we added slot information for the migration scenario, so that the migration source node knows which slots should be migrated to which node. Second, we reorganized the RDB file by slot order and recorded the start offset of each slot's data as metadata at a fixed position at the end of the RDB file, so that during the RDB transfer step of a migration the data segments of the target slots can be located in the file quickly.
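
The exact on-disk layout of this modified RDB is not spelled out here, so the following is a purely hypothetical sketch of the idea: append a per-slot offset index plus a fixed-size footer to the end of the file, then use it to seek directly to the slots being migrated. The field layout and helper names are invented for illustration.

```python
import json
import struct

def append_slot_index(path: str, slot_offsets: dict) -> None:
    """Append {slot: start_offset} metadata plus a fixed-size footer to the file."""
    blob = json.dumps(slot_offsets).encode()
    with open(path, "ab") as f:
        f.write(blob)
        f.write(struct.pack(">Q", len(blob)))    # footer: length of the index blob

def locate_slot(path: str, slot: int) -> int:
    """Return the byte offset where the given slot's data segment starts."""
    with open(path, "rb") as f:
        f.seek(-8, 2)                            # read the fixed-size footer
        (blob_len,) = struct.unpack(">Q", f.read(8))
        f.seek(-(8 + blob_len), 2)
        index = json.loads(f.read(blob_len))
    return index[str(slot)]
```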

3.3 Effect of the changes

(1) Low latency impact
For a slot migration operation, the overhead falls mainly on the migration source and destination. With the new slot migration based on the master-slave synchronization mechanism, the source node's main cost lies in generating the RDB and sending network packets, which normally has little impact on request latency. The destination node, however, has to receive and load fairly large RDB file fragments while still serving normal requests, so it can no longer behave like an ordinary slave that saves the whole payload to a local file and then loads it in one blocking pass. The new slot migration therefore reworks the data-loading process on the destination node: as shown in the figure below, the destination loads data incrementally at the granularity of the network packets it receives, and each time loads only the complete elements contained in that packet, so that composite data types can be loaded field by field. This greatly reduces the impact of migrating multi-element, large-key data on access latency. The design keeps the original simple single-threaded architecture while effectively controlling the latency impact: all data modifications still happen on the worker thread, and no concurrency control is needed. With this change, the latency impact of migrating a large key on the destination node is essentially eliminated.
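
The following is a hypothetical sketch of that packet-granularity loading loop. The length-prefixed framing and the apply_element callback are inventions for illustration, not the real wire format; the point is that only complete elements are applied per packet and control returns to the event loop in between.

```python
import struct

def parse_one_element(buf: bytes):
    """Hypothetical framing: 4-byte big-endian length followed by the element bytes."""
    if len(buf) < 4:
        return None, 0
    (size,) = struct.unpack(">I", buf[:4])
    if len(buf) < 4 + size:
        return None, 0
    return buf[4:4 + size], 4 + size

class IncrementalSlotLoader:
    """Loads whatever complete elements each packet contains and defers the rest."""

    def __init__(self, apply_element):
        self.buffer = b""
        self.apply_element = apply_element       # e.g. insert one hash field into the keyspace

    def on_packet(self, packet: bytes) -> None:
        self.buffer += packet
        while True:
            element, consumed = parse_one_element(self.buffer)
            if element is None:
                break                            # partial element: wait for the next packet
            self.apply_element(element)
            self.buffer = self.buffer[consumed:]
        # Control returns to the event loop here, so normal client commands are
        # processed between packets instead of blocking on a whole-file load.
```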

(2) Stable data access
During the new slot migration, the data being migrated still lives on the source node and requests for it continue to be processed there, so user requests never trigger the ask-move redirection. Users therefore do not need to worry about inconsistency caused by read-write separation, and commands wrapped in transactions or pipelines will not fail en masse during migration. Once the migration completes, the migrated slot data remaining on the source node becomes residual data that will never be accessed again. Cleanup of this residual data is designed to proceed step by step in serverCron, and the amount of data cleaned per pass is controlled by a parameter that can be tuned as needed, so the impact of cleanup on normal service requests is fully controllable.
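
A hypothetical sketch of this bounded, cron-driven cleanup is shown below; the class and parameter names are illustrative, not the actual implementation.

```python
from collections import deque

class ResidualCleaner:
    def __init__(self, residual_keys, delete_key, cleanup_batch: int = 64):
        self.pending = deque(residual_keys)   # keys left behind by finished migrations
        self.delete_key = delete_key          # callback wrapping the real delete operation
        self.cleanup_batch = cleanup_batch    # tunable knob: work done per cron tick

    def cron_tick(self) -> int:
        """Called periodically (serverCron-style); removes at most cleanup_batch keys."""
        removed = 0
        while self.pending and removed < self.cleanup_batch:
            self.delete_key(self.pending.popleft())
            removed += 1
        return removed
```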

(3) Few topology changes
To limit the impact of the ask-move mechanism on normal requests, the native migration moves only one slot at a time and initiates a topology change notification to the cluster nodes to switch that slot's owner as soon as it finishes. The number of topology changes therefore grows with the number of migrated slots, and after each change every client sends a command to refresh its topology. The topology query is computationally expensive, and when many such queries arrive at once they strain the node's resources. The new slot migration synchronizes data per node and can migrate multiple slots, or even all of the source node's data, at the same time; the ownership of all these slots is then switched in a single topology change, which greatly reduces the impact of topology refreshes.

(4) Support for high availability
Cluster data migration is a long-running process that can take hours, during which the service may experience all kinds of anomalies. A normal Redis cluster has a failover mechanism whereby slave nodes detect a master failure and take over its role. The new slot migration handles this availability problem by synchronizing the slot migration state to the slave nodes, so that if a node involved in the migration fails over, its slave can continue the data migration in place of the old master. This keeps the migration process highly available, avoids manual intervention, and greatly simplifies operations.

IV. Functional Test Comparison

To verify the effect of the reworked migration function and compare the impact of the in-house and native migration on request response, we deployed a native cluster and an in-house cluster with identical topologies on three physical machines of the same configuration, and ran migration tests with hash values of two sizes, 100 KB and 1 MB, moving about 5 GB of data between nodes in each round. The main goal was to compare the impact of the data transfer on node service latency before and after the change, so no background traffic was applied to the cluster nodes during the test; node latency was sampled by pinging each node 10 times per second. The latency monitoring data for the source and destination nodes during the migration is shown below (vertical axis unit: ms).
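
The latency probe can be reproduced with a few lines of redis-py (the address is a placeholder): send PING ten times per second and record the round-trip time in milliseconds.

```python
import time
import redis

node = redis.Redis(host="10.0.0.1", port=6379)   # placeholder address of the node under test
samples = []
for _ in range(600):                             # roughly a minute of samples
    start = time.perf_counter()
    node.ping()
    samples.append((time.perf_counter() - start) * 1000.0)
    time.sleep(0.1)                              # 10 probes per second
print(f"max={max(samples):.2f}ms avg={sum(samples)/len(samples):.2f}ms")
```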

Comparing the latency monitoring data above for native and in-house cluster slot migration, the request response latency of the nodes at both ends stays very smooth while the in-house slot migration is moving data, which demonstrates the advantage and value of rebuilding the Redis cluster slot migration function on the master-slave replication principle.

V. Summary and outlook

The native Redis cluster scaling functions migrate data at key granularity. Large keys occupy the worker thread for a long time, which drives up the latency of normal requests and can even prevent a node from answering heartbeat packets for so long that it is judged offline, putting the stability of the Redis cluster at risk. The new slot migration function, built by adapting the synchronization mechanism, significantly reduces the impact of data migration on user access latency and improves the stability and operational efficiency of online Redis clusters. At the same time, the new slot migration still has some issues: for example, it subjects nodes to frequent bgsave pressure, and node memory consumption rises during migration. We will continue to optimize and refine these specific issues in the future.