[Miscellaneous Talks] Primary-backup architecture and how a leader is elected

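Why do we need a primary-backup architecture?

To keep a service highly available, the system must not become completely unavailable because a single node fails. A primary-backup architecture ensures that when the primary node fails, a backup node can take over quickly and continue serving requests.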

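Why not simply let multiple nodes serve requests together?

Unlike stateless application services, these nodes maintain and store data. To keep the data continuous and consistent, writes have to be handled centrally by one node. If several nodes processed the data at the same time, the data could become inconsistent or corrupted.

Examples: MySQL replication, Kafka partition replicas, and Redis replication.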

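How does a primary-backup architecture work?

A primary-backup architecture usually has two roles: Leader (the primary node) and Follower (the backup node). The Leader handles all client requests and maintains the data, while the Followers periodically sync data from the Leader. When the Leader fails or becomes unavailable, the system has to run a leader election, whose purpose is to pick a new Leader from the existing Followers so that the system keeps running.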

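What is the core work of leader election?

The core tasks of leader election can be summarized as follows:

Monitor the Leader's state: continuously check the health of the Leader node.
Elect a new Leader: when the Leader goes down, elect a new Leader from the Followers.
Notify the other Followers: once the new Leader is chosen, tell the other Followers to switch to it and sync data from it.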
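
The three tasks above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Node` and `Cluster` classes, the `alive` flag that stands in for real health checks, and the "most up-to-date node wins" policy are all hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    name: str
    alive: bool = True              # stand-in for a real health check
    synced_offset: int = 0          # how much data this node has replicated
    leader: Optional[str] = None    # which Leader this node currently follows


@dataclass
class Cluster:
    nodes: Dict[str, Node] = field(default_factory=dict)
    leader: Optional[str] = None

    def leader_is_healthy(self) -> bool:
        # Task 1: monitor the Leader's state.
        return self.leader is not None and self.nodes[self.leader].alive

    def elect(self) -> Optional[str]:
        # Task 2: elect a new Leader among the live nodes.
        # Assumed policy: highest synced offset wins, ties go to the largest name.
        candidates = [n for n in self.nodes.values() if n.alive]
        if not candidates:
            return None
        return max(candidates, key=lambda n: (n.synced_offset, n.name)).name

    def notify(self, new_leader: str) -> None:
        # Task 3: tell every live node which Leader to sync from.
        self.leader = new_leader
        for node in self.nodes.values():
            if node.alive:
                node.leader = new_leader

    def failover_if_needed(self) -> Optional[str]:
        if self.leader_is_healthy():
            return self.leader
        new_leader = self.elect()
        if new_leader is None:
            self.leader = None
        else:
            self.notify(new_leader)
        return new_leader
```

Calling `failover_if_needed()` repeatedly plays the role of the monitoring loop; a real system would drive it from heartbeats or session events rather than a flag.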

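Who is responsible for leader election?

Leader election is usually handled in one of two ways:

Third-party coordinator: an independent third-party system such as ZooKeeper or etcd coordinates the election. These tools can monitor node state and take charge of the election.
Peer nodes: without a third-party coordinator, the nodes of the primary-backup group coordinate among themselves, usually with the Followers running an algorithm such as Paxos or Raft to choose the new Leader.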

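Third-party coordinator: how does ZooKeeper elect a leader?

ZooKeeper uses a heartbeat mechanism (session timeout detection) to track whether each service node is running. At startup, each service node creates an ephemeral sequential znode in ZooKeeper, and the node with the smallest sequence number becomes the Leader. Each node watches the znode immediately before its own. If that znode disappears (an ephemeral znode is deleted when its owner fails) and no smaller znode remains, the watching node becomes the Leader and notifies the other Followers to sync from it.

This scheme has no "voting" in the traditional sense: succession simply follows the sequence numbers, and the node with the smallest number becomes the Leader.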

// Description from the official website
///doc/current/

Leader Election
A simple way of doing leader election with ZooKeeper is to use the SEQUENCE|EPHEMERAL flags when creating znodes that represent "proposals" of clients. The idea is to have a znode, say "/election", such that each znode creates a child znode "/election/guid-n_" with both flags SEQUENCE|EPHEMERAL. With the sequence flag, ZooKeeper automatically appends a sequence number that is greater than anyone previously appended to a child of "/election". The process that created the znode with the smallest appended sequence number is the leader.

That's not all, though. It is important to watch for failures of the leader, so that a new client arises as the new leader in the case the current leader fails. A trivial solution is to have all application processes watching upon the current smallest znode, and checking if they are the new leader when the smallest znode goes away (note that the smallest znode will go away if the leader fails because the node is ephemeral). But this causes a herd effect: upon a failure of the current leader, all other processes receive a notification, and execute getChildren on "/election" to obtain the current list of children of "/election". If the number of clients is large, it causes a spike on the number of operations that ZooKeeper servers have to process. To avoid the herd effect, it is sufficient to watch for the next znode down on the sequence of znodes. If a client receives a notification that the znode it is watching is gone, then it becomes the new leader in the case that there is no smaller znode. Note that this avoids the herd effect by not having all clients watching the same znode.
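
The recipe quoted above can be mimicked with a small in-memory model. This sketch does not talk to a real ZooKeeper server and uses no client library; the `ElectionDir` class and its method names are hypothetical stand-ins for the children of "/election".

```python
from typing import Dict, Optional


class ElectionDir:
    """In-memory stand-in for the children of an "/election" znode."""

    def __init__(self) -> None:
        self._next_seq = 0
        self._children: Dict[int, str] = {}   # sequence number -> client id

    def join(self, client_id: str) -> int:
        # Mimics creating a SEQUENCE|EPHEMERAL child: numbers only grow.
        seq = self._next_seq
        self._next_seq += 1
        self._children[seq] = client_id
        return seq

    def leave(self, seq: int) -> None:
        # Mimics the ephemeral znode vanishing when its client fails.
        self._children.pop(seq, None)

    def leader(self) -> Optional[str]:
        # The client that owns the smallest sequence number is the leader.
        if not self._children:
            return None
        return self._children[min(self._children)]

    def watch_target(self, seq: int) -> Optional[int]:
        # Each client watches only the closest smaller sequence number,
        # so one failure notifies one client instead of the whole herd.
        smaller = [s for s in self._children if s < seq]
        return max(smaller) if smaller else None
```

A client whose `watch_target` returns `None` has no smaller znode ahead of it, which is exactly the condition under which it is the leader.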

The problem is that the data on the Follower that takes over may not be up to date.

A variant: electing a controller using ZooKeeper

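Kafka uses ZooKeeper to elect a Controller, and the Controller then manages the election of partition leaders. It can additionally check the sync offsets of the partition replicas in order to select the most up-to-date replica as the partition leader.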
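
The idea of preferring the most up-to-date replica can be sketched as follows. This is not Kafka's actual controller code; the function name `pick_partition_leader`, its parameters, and the broker names in the usage note are assumptions made purely for illustration.

```python
from typing import Dict, Optional, Set


def pick_partition_leader(
    replica_offsets: Dict[str, int],
    live_replicas: Set[str],
) -> Optional[str]:
    """Return the live replica with the highest replicated offset."""
    candidates = {
        replica: offset
        for replica, offset in replica_offsets.items()
        if replica in live_replicas
    }
    if not candidates:
        return None
    # max() keeps the first maximum it sees, so iterating in sorted order
    # breaks offset ties in favour of the smallest replica name.
    return max(sorted(candidates), key=lambda replica: candidates[replica])
```

For example, with offsets `{"broker-1": 120, "broker-2": 150}` and both replicas alive, the sketch picks `"broker-2"`; if only `"broker-1"` is alive, it picks `"broker-1"` even though its data is older.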

Implementations based on a third-party coordinator are simple, but the coordinator becomes a single point of failure: if ZooKeeper itself goes down, a new leader cannot be elected in time.

Peer nodes: how does the Raft protocol elect a leader?

Raft's leader election does not rely on a third-party coordinator. Instead, the service nodes reach the decision by communicating with each other. Several key questions come up in the process:

1. How can a Follower node determine whether the Leader node is healthy?

Raft uses a heartbeat mechanism: the Leader periodically sends heartbeat signals (usually AppendEntries RPCs) to the Followers to show that it is still alive. If a Follower does not receive a heartbeat from the Leader within a certain period of time, it assumes that the Leader may have failed and initiates an election.
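
A minimal sketch of the Follower side of this check is shown below. The `FollowerTimer` class, its method names, and the timeout range are illustrative assumptions; the randomized timeout is a detail from the Raft design (it reduces the chance that several Followers start elections at the same moment), not something stated in the text above.

```python
import random
from typing import Optional


class FollowerTimer:
    """Tracks when a Follower last heard from the Leader."""

    def __init__(
        self,
        now: float,
        min_timeout: float = 0.15,   # seconds, assumed value
        max_timeout: float = 0.30,   # seconds, assumed value
        rng: Optional[random.Random] = None,
    ) -> None:
        self._rng = rng or random.Random()
        self._min = min_timeout
        self._max = max_timeout
        self.last_heartbeat = now
        self.timeout = self._rng.uniform(self._min, self._max)

    def reset(self, now: float) -> None:
        # Call this whenever a heartbeat (AppendEntries) arrives.
        # A fresh random timeout is drawn on every reset.
        self.last_heartbeat = now
        self.timeout = self._rng.uniform(self._min, self._max)

    def should_start_election(self, now: float) -> bool:
        # No heartbeat for a full timeout: assume the Leader may be down.
        return now - self.last_heartbeat >= self.timeout
```

Time is passed in explicitly so the logic can be exercised without sleeping; a real node would feed it a monotonic clock.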

2. How is a new Leader elected after the Leader node fails?

When the Leader fails or can no longer provide service, the Followers start voting to elect a new Leader:

How nodes vote: a Follower that starts an election becomes a Candidate and votes for itself. In Raft every node keeps a term number (term), and each new election starts a new term. Nodes compare term numbers to decide whether to grant a vote to another node.
Voting rules: each node casts at most one vote per term, and only for a Candidate whose term is not lower than its own. The Candidate that collects votes from more than half of the nodes becomes the new Leader.

On "voting"

How does a Follower node know which nodes are in the cluster?

A Follower learns about the other nodes in the cluster through the Leader, which passes cluster membership information to the Followers via AppendEntries RPCs. When there is no Leader, a Follower runs the election using the most recent membership information it has.

How do nodes canvass for votes?

A node asks the other nodes for support by sending a vote request (RequestVote RPC). Each Candidate sends this request to the other nodes for the term in which it is running. If the requester's term is not lower than the receiver's, the receiver has not already voted for someone else in that term, and the requester's log is at least as up to date as the receiver's, the receiver grants the vote.

How does a node determine that it has the most votes?

A node does not need to know how the other nodes voted. Raft guarantees that a node becomes the Leader once it has the support of a majority (more than half) of the nodes. During an election, a node only has to confirm that it has received more than half of the votes.
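
The vote-granting and majority rules discussed in this section can be sketched as below. The class and function names (`VoteRequest`, `NodeState`, `handle_request_vote`, `has_majority`) are hypothetical, and the log comparison follows the rule described in the Raft paper (compare the last log term first, then the last log index); it is a simplified model, not a full Raft implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteRequest:
    term: int
    candidate_id: str
    last_log_index: int
    last_log_term: int


@dataclass
class NodeState:
    current_term: int = 0
    voted_for: Optional[str] = None
    last_log_index: int = 0
    last_log_term: int = 0


def handle_request_vote(state: NodeState, req: VoteRequest) -> bool:
    """Decide whether this node grants its vote to the requesting Candidate."""
    if req.term < state.current_term:
        return False                      # request from a stale term
    if req.term > state.current_term:
        state.current_term = req.term     # a newer term resets the vote
        state.voted_for = None
    if state.voted_for not in (None, req.candidate_id):
        return False                      # only one vote per term
    # The Candidate's log must be at least as up to date as ours.
    candidate_log = (req.last_log_term, req.last_log_index)
    own_log = (state.last_log_term, state.last_log_index)
    if candidate_log < own_log:
        return False
    state.voted_for = req.candidate_id
    return True


def has_majority(votes: int, cluster_size: int) -> bool:
    # Strictly more than half of all nodes, counting the Candidate's own vote.
    return votes > cluster_size // 2
```

In a five-node cluster a Candidate therefore needs three votes including its own, and in a four-node cluster two votes are not enough.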

3. What happens when the original Leader node comes back? Will there be two Leader nodes?

Raft uses the term number (term) to prevent two Leaders from coexisting. When the original Leader resumes operation, it discovers that its term is out of date (that is, the other nodes have already elected a new Leader), so it automatically becomes a Follower and accepts the new Leader's control.

Term number: the term number is incremented at each election. Every node keeps track of the current term number, and the Leader with the larger term number is considered the legitimate one. When the original Leader recovers, it sees that its term number is older, recognizes that it is no longer the Leader, and turns into a Follower.
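
The step-down behaviour can be illustrated with a few lines of code. The `RaftNode` class and its method names are assumptions made for this sketch; it models only the term comparison and leaves out logs, timers, and networking.

```python
from dataclasses import dataclass


@dataclass
class RaftNode:
    name: str
    role: str = "follower"      # "leader", "candidate" or "follower"
    current_term: int = 0

    def observe_term(self, remote_term: int) -> None:
        # A message carrying a higher term proves a newer election happened,
        # so this node adopts that term and becomes a Follower.
        if remote_term > self.current_term:
            self.current_term = remote_term
            self.role = "follower"

    def accepts_leader(self, leader_term: int) -> bool:
        # Requests from a Leader with an older term are rejected.
        self.observe_term(leader_term)
        return leader_term >= self.current_term
```

If a recovered Leader at term 3 contacts a node that is already at term 4, `accepts_leader(3)` returns `False`; once the old Leader sees term 4 in any reply, `observe_term(4)` turns it back into a Follower.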

Does the Raft protocol solve the Byzantine generals problem?

No. Raft assumes that the nodes in the cluster are deployed and controlled by a trusted party, and that these nodes follow the protocol rules without acting maliciously.

Summary

Primary-backup architectures are not limited to databases and caching systems; many microservice architectures, message queues, and distributed storage systems also rely on them for high availability. As the technology matures, modern primary-backup systems are adopting smarter leader election mechanisms so that service can be restored quickly, even under extreme conditions.