(ix) Redis Sentinel Mechanism and Clustering

In master-slave replication, if a slave fails, clients can keep sending requests to the master or to the other slaves. But what if the master fails? Read requests can still be served by the slaves, but write requests can no longer be processed. This is where the sentinel mechanism comes in, to solve three problems:
(1) Is the master really down?
(2) Which slave should be selected as the new master?
(3) How are the slaves and the clients notified about the new master?

A sentinel is a Redis process that runs in a special mode alongside the master and slave instances. It is responsible for 3 main tasks: monitoring, master selection, and notification:
(1) Monitoring: The sentinel process periodically sends PING commands to the master and all slaves to check whether they are still online. If an instance does not respond to the sentinel's PING within a specified period of time, the sentinel marks it as "offline". If the instance that goes offline is the master, the sentinel automatically starts the process of switching to a new master.
(2) Master selection: After the master goes down, the sentinel selects one of the slaves according to certain rules and promotes it to be the new master. Once this step is completed, the cluster has a master again.
(3) Notification: The sentinel sends the connection information of the new master to the other slaves so that they can execute the replicaof command, connect to the new master, and replicate data from it. At the same time, the sentinel notifies the clients of the new master's connection information so that they can send their requests to the new master.

Among these three tasks, notification is relatively simple: the sentinel only needs to send the new master's information to the slaves and clients so that they can connect to it, and no decision-making logic is involved. In the monitoring and master selection tasks, by contrast, the sentinel has to make two decisions:
(1) In the monitoring task, the sentinel has to determine whether the master is really offline.
(2) In the master selection task, the sentinel has to decide which slave to choose as the new master.

Monitoring
An instance can be in one of two offline states, "subjective offline" and "objective offline". When the response to a PING command times out, the sentinel first marks the instance as "subjective offline". If the instance is a slave, this has little impact. If it is the master, a master-slave switch would follow, and once a switch is initiated, the subsequent master selection and notification steps incur additional computation and communication overhead. To avoid unnecessary overhead, we have to pay special attention to misjudgment. A misjudgment is when the master is not actually offline but the sentinel mistakenly thinks it is. This generally happens when the cluster network is under heavy load, the network is congested, or the master itself is under heavy load.
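
The timeout check behind "subjective offline" can be sketched as follows. This is only an illustration of the idea, not Sentinel's actual code: the function name is_subjectively_down and its parameters are hypothetical, and timestamps are assumed to be plain millisecond integers.

```python
def is_subjectively_down(last_reply_ms, now_ms, down_after_ms):
    """Return True if no valid PING reply arrived within down_after_ms.

    last_reply_ms: time of the last valid reply from the instance (hypothetical field)
    now_ms:        current time in milliseconds
    down_after_ms: timeout, playing the role of down-after-milliseconds
    """
    return now_ms - last_reply_ms > down_after_ms


# A master that last replied 31 seconds ago, with a 30 second timeout,
# would be marked "subjective offline" by this sentinel.
master_is_sdown = is_subjectively_down(last_reply_ms=0, now_ms=31_000, down_after_ms=30_000)
```

Note that this is the view of a single sentinel only, which is exactly why a congested network can lead to a misjudgment.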

To reduce the probability of misjudgment, sentinels are usually deployed as a group of multiple instances, also known as a sentinel cluster. Only when the majority of the sentinels judge the master to be "subjective offline" is the master marked as "objective offline". The principle is that the minority defers to the majority. Simply put, with N sentinel instances, at least N/2 + 1 of them must judge the master to be "subjective offline" before the master is finally judged to be "objective offline". This reduces the probability of misjudgment and avoids the unnecessary master-slave switches that a misjudgment would cause. (The number of "subjective offline" judgments required can also be set by the Redis administrator.)
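
The majority rule can be illustrated with a short sketch. The function names and the optional quorum parameter are assumptions made for this example. The quorum parameter stands in for the administrator-configurable threshold mentioned above, and N/2 is taken as integer division.

```python
def majority(n_sentinels):
    """Smallest number of sentinels that forms a majority: N/2 + 1 (integer division)."""
    return n_sentinels // 2 + 1


def is_objectively_down(votes, quorum=None):
    """Decide "objective offline" from per-sentinel "subjective offline" votes.

    votes:  list of bool, one entry per sentinel (True = judged subjective offline)
    quorum: optional administrator-defined threshold; defaults to the majority
    """
    needed = majority(len(votes)) if quorum is None else quorum
    return sum(votes) >= needed


# With 3 sentinels, 2 votes are needed by default.
two_of_three = is_objectively_down([True, True, False])
one_of_three = is_objectively_down([True, False, False])
```

With 3 sentinels, a single sentinel that loses its network path to the master cannot trigger a switch on its own.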

Master selection
When selecting a master, the sentinel must first make sure that the candidate slave is still online, and it must also check the slave's past network connection status. If a slave keeps losing its connection to the master, its network is unreliable and it should be filtered out. The specific rule uses the configuration item down-after-milliseconds * 10. down-after-milliseconds is the maximum connection timeout after which we consider the master and a slave to be disconnected: if the two have not been in contact over the network within down-after-milliseconds milliseconds, they are considered disconnected. If this happens more than 10 times, the slave's network is in poor condition and the slave is not suitable as the new master.
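
The filtering step described above might look like the following sketch. The Slave class and its fields are hypothetical bookkeeping invented for this example. Only the threshold of 10 disconnections comes from the text.

```python
from dataclasses import dataclass

MAX_DISCONNECTS = 10  # threshold described in the text


@dataclass
class Slave:
    name: str
    online: bool
    # Number of times the link to the master stayed down longer than
    # down-after-milliseconds (hypothetical counter).
    disconnect_count: int


def eligible_slaves(slaves):
    """Keep slaves that are online and have not disconnected more than 10 times."""
    return [s for s in slaves if s.online and s.disconnect_count <= MAX_DISCONNECTS]


candidates = eligible_slaves([
    Slave("slave-a", online=True, disconnect_count=0),
    Slave("slave-b", online=False, disconnect_count=0),   # offline: filtered out
    Slave("slave-c", online=True, disconnect_count=11),   # unstable link: filtered out
])
```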

The remaining slaves are then scored in rounds according to 3 rules. As soon as one slave has the highest score in a round, it becomes the new master and the selection process ends there. If no single slave has the highest score, the process continues to the next round.
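
The round structure can be sketched as below. The scoring rules are deliberately left as pluggable functions, because this section does not spell out the 3 rules. The fields round1 and round2 in the usage example are placeholders, not real Sentinel attributes.

```python
def select_master(candidates, rules):
    """Run one scoring round per rule and return the first unique top scorer.

    candidates: list of slave records that passed the filtering step
    rules:      list of functions mapping a slave to a score (higher is better)
    Returns None if no round produces a single highest-scoring slave.
    """
    remaining = list(candidates)
    for rule in rules:
        if not remaining:
            return None
        scores = [rule(slave) for slave in remaining]
        best = max(scores)
        # Only the slaves tied for the highest score go on to the next round.
        remaining = [s for s, score in zip(remaining, scores) if score == best]
        if len(remaining) == 1:
            return remaining[0]
    return None


# Placeholder scores: s1 and s2 tie in round 1, s2 wins in round 2.
slaves = [
    {"name": "s1", "round1": 1, "round2": 100},
    {"name": "s2", "round1": 1, "round2": 250},
    {"name": "s3", "round1": 0, "round2": 900},
]
rules = [lambda s: s["round1"], lambda s: s["round2"]]
chosen = select_master(slaves, rules)
```

Here s3 drops out in the first round despite its high second-round score, which shows that earlier rules take precedence over later ones.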