Lease Mechanisms Explained

Overview

Under a lease mechanism, the node that holds a lease has the right to operate on a predefined set of objects for the duration of the lease. In more detail:

A lease is a commitment granted by an authorizer (the issuer) for a fixed period of time.
Once the authorizer has issued a lease, it must honor the commitment for as long as the lease has not expired, regardless of whether the recipient actually received the lease and regardless of what state the recipient is in afterwards.
The recipient may rely on the lease for the duration of its validity period. Once the lease expires, the authorizer is no longer bound by the commitment, and the recipient must apply for a new lease to keep using it.
Whether a lease is still valid can be determined by a version number, a time period, or a fixed point in time.
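
The three ways of checking validity can be sketched as a small data structure. This is only an illustration: the `Lease` class and its field names are hypothetical and do not come from any particular system, and times are plain numbers in seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lease:
    """Hypothetical lease record; each optional field is one validity rule."""
    holder: str
    issued_at: float                    # clock reading when the lease was granted
    duration: Optional[float] = None    # validity as a time period
    expires_at: Optional[float] = None  # validity as a fixed point in time
    version: Optional[int] = None       # validity tied to a data version

    def is_valid(self, now: float, current_version: Optional[int] = None) -> bool:
        """Return True only if every rule set on this lease still holds."""
        if self.duration is not None and now >= self.issued_at + self.duration:
            return False
        if self.expires_at is not None and now >= self.expires_at:
            return False
        if self.version is not None and current_version != self.version:
            return False
        return True
```

A lease created with `duration=10.0` at `issued_at=100.0` is valid at time 105 and invalid from time 110 onward. A lease created with `version=7` stays valid only while the object it covers is still at version 7.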

An analogy with the delegation of authority in a company may help. A company has a board of directors, a CEO, a CTO, and a CFO. The board delegates different management powers to the CEO, CTO, and CFO for a fixed period of time. During that period, any relevant matter goes directly to the CEO, CTO, or CFO and does not have to pass through the board, because the board has already authorized them to handle it. The board cannot break this agreement within the period, so the CEO, CTO, and CFO can exercise their powers as agreed. When the agreed period ends, the parties decide whether to renew or terminate the arrangement.

Issues addressed by the lease mechanism

1. State changes of nodes in distributed systems

Many distributed systems use a master/standby design. The master node generally manages the cluster, handles data writes, and synchronizes the data to the standby nodes. The standby nodes serve read requests from users, and when the master node goes down, a new master is elected from among the standby nodes so that the system can keep serving.

So how do the nodes in a cluster determine each other's state? The usual answer is a heartbeat mechanism. Suppose there are three nodes, Server-1, Server-2, and Server-3, which are replicas of each other. Server-1 is the master node, and Server-2 and Server-3 are standby nodes. A fourth node, Server-Electer, is responsible for monitoring node state. When it detects that the master node is abnormal, it elects a new master from the standby nodes to keep the cluster serving.

Server-Electer exchanges heartbeats with the other nodes at regular intervals. If it receives no heartbeat from a node for longer than a certain period, it considers that node abnormal. This works well while the network between the nodes is healthy, but it causes problems during a network partition (abnormal network communication between nodes in the cluster). If Server-Electer receives no heartbeat from the master node, the cause may be a failure of the master node itself, or it may be a failure of the network between Server-Electer and the master node.

Suppose communication between Server-Electer and Server-2 and Server-3 is still normal. Server-Electer will then elect a new master from the two standby nodes, say Server-2. The cluster now has two master nodes, which is called the dual-master problem. Once Server-1's network is restored, the standby node Server-3 receives data synchronization requests from both masters, Server-1 and Server-2, and its data becomes inconsistent.
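
The weakness of pure heartbeat detection can be seen in a minimal sketch. The `HeartbeatMonitor` class below is hypothetical and takes the current time as an argument so the behavior is easy to follow; it is an illustration, not a production failure detector.

```python
class HeartbeatMonitor:
    """Suspects a node if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # node name -> time of the last heartbeat received

    def on_heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def suspects(self, now: float) -> list:
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)
```

If Server-1 is alive but partitioned from the monitor, its heartbeats stop arriving and it ends up in `suspects()` exactly as if it had crashed. The monitor has no way to tell the two cases apart, which is what makes an immediate re-election unsafe.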

How can the dual-master problem be handled? The lease mechanism offers a good solution. In a lease-based implementation, the election node issues leases to the other nodes, and a node is considered able to serve normally only while it holds a valid lease. In the example, the three working nodes Server-1, Server-2, and Server-3 still report their state to the election node Server-Electer through heartbeats. When the election node receives a heartbeat, it sends a lease back to that working node. The lease confirms the node's state and allows the node to exercise the rights granted by the lease and to serve external requests for as long as the lease is valid.

The election node Server-Electer can then give the master node Server-1 a special lease that marks it as the master. If a network partition or some other problem occurs and the election node needs to switch masters, it only has to wait for the previous master's lease to expire and then issue a new lease to the newly elected master. Even if the previous master's network is later restored, the other nodes will see that its lease has expired and will not recognize it as the master.
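
The failover rule described above (wait for the old master lease to expire, then issue a new one) can be sketched as follows. The `Elector` class and its method names are hypothetical, clock error is ignored, and the current time is passed in explicitly.

```python
class Elector:
    """Grants the master lease and never hands it to a second node
    while the previous holder's lease may still be valid."""

    def __init__(self, lease_duration: float):
        self.lease_duration = lease_duration
        self.master = None
        self.lease_expires_at = 0.0

    def grant_or_renew(self, node: str, now: float) -> bool:
        """Grant the master lease to `node`, or renew it if `node` holds it."""
        lease_free = self.master is None or now >= self.lease_expires_at
        if node == self.master or lease_free:
            self.master = node
            self.lease_expires_at = now + self.lease_duration
            return True
        return False  # another node's lease is still valid, so wait it out

    def is_master(self, node: str, now: float) -> bool:
        return node == self.master and now < self.lease_expires_at
```

With a 10-second lease granted to Server-1 at time 0, an attempt to promote Server-2 at time 5 is refused, while the same attempt at time 10 succeeds. At no point do two nodes hold a valid master lease simultaneously.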

2. Distributed caching

In distributed systems, frequently accessed data is often cached on the client to speed up reads. When the user reads data, the client reads from its local cache first. If the data is not in the cache, the client fetches the latest data from the server and updates the local cache.

However, this scheme has a cache consistency problem, and there are two common solutions. The first is polling: each time the client reads data, it first asks the server whether the cached data is still the latest, and if not, it fetches the latest data from the server. With this solution every read requires a round trip to the server, which increases the load on the server and reduces the benefit of the cache.

The second is invalidation: when the server changes the data, it first notifies the clients that their cached copy is no longer valid, so that they reload it. The problem with this approach is that the server has to maintain the state of all clients and notify all of them on every update. This increases the complexity and operational burden of the server. In addition, if a client cannot be reached, it never learns of the modification, which leaves stale data on the client side.

So how can the lease mechanism solve the cache consistency problem? The server grants a lease to the caching client when the client fetches data from it. During the lease's validity period, the client reads the data from its local cache. If the server wants to change the data, it must first obtain the consent of every client that holds a lease on that piece of data, and only then may it modify the data. If the client receives no modification request from the server within the validity period, the content of its cache is guaranteed to be up to date. If it receives a modification request within the validity period and agrees to it, it must discard the cached copy and reload it. After the lease expires, a client that still wants to read from the cache must obtain a new lease. This process is called renewal.

This way, the client can be sure that the data in its cache is up to date for the term of the lease. Leases also tolerate network partitions: if a client crashes or the network fails, the server simply waits for that client's lease to expire before performing the modification. If the server itself fails and loses all client information, it only needs to know the maximum lease duration and can safely modify the data once that period has passed. Unlike the invalidation approach, the server only has to remember the clients that still hold leases.
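
The server side of this protocol might look like the sketch below. Everything here is an assumption made for illustration: the `LeasingServer` class, its method names, and the single fixed lease duration. Network messages are replaced by direct method calls, and time is passed in explicitly.

```python
class LeasingServer:
    """Server that refuses to modify a key while any client may still
    be trusting a cached copy of it."""

    def __init__(self, lease_duration: float):
        self.lease_duration = lease_duration
        self.data = {}
        self.leases = {}  # key -> {client: lease expiry time}

    def read(self, client: str, key: str, now: float):
        """Return (value, lease_expiry) and record the lease for this client."""
        expires_at = now + self.lease_duration
        self.leases.setdefault(key, {})[client] = expires_at
        return self.data.get(key), expires_at

    def approve(self, client: str, key: str) -> None:
        """The client agreed to the change and dropped its cached copy."""
        self.leases.get(key, {}).pop(client, None)

    def try_write(self, key: str, value, now: float) -> bool:
        """Apply the write only if no unexpired lease on `key` remains."""
        live = {c: t for c, t in self.leases.get(key, {}).items() if t > now}
        self.leases[key] = live  # forget expired leases
        if live:
            return False
        self.data[key] = value
        return True
```

A client that answers the modification request is removed through `approve`. A client that has crashed or is unreachable never answers, and the write simply succeeds on the first `try_write` after that client's lease has expired.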

3. Relieving pressure on master nodes

In a distributed system, metadata is often maintained on the master node. To access data, a user first queries the metadata on the master node to find the data node where the data is stored, and then goes to that data node to access the data. Every client request therefore has to fetch metadata from the master node, which puts excessive pressure on it.

To solve this problem, the metadata can be cached on the client, with the lease mechanism ensuring that the master node's copy and the client's copy stay the same for the lease's validity period. When the client accesses data, it first looks in its local cache. If the metadata is not in the local cache, the client looks it up on the master node and updates both the cache and the lease information. This reduces the pressure on the master node.
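
A client-side cache along these lines could be sketched as below. The `MetadataCache` class is hypothetical, and so is the `lookup` callable that stands in for a request to the master node. For simplicity the sketch assumes the master grants the same fixed lease duration on every lookup.

```python
class MetadataCache:
    """Client-side metadata cache that trusts an entry only while its
    lease is still valid."""

    def __init__(self, lookup, lease_duration: float):
        self.lookup = lookup  # hypothetical callable: path -> data node name
        self.lease_duration = lease_duration
        self.entries = {}     # path -> (location, lease expiry time)

    def locate(self, path: str, now: float) -> str:
        entry = self.entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit under a valid lease, master not contacted
        # Cache miss or expired lease: ask the master and record a new lease.
        location = self.lookup(path)
        self.entries[path] = (location, now + self.lease_duration)
        return location
```

With a 30-second lease, repeated calls to `locate` for the same path within 30 seconds reach the master only once. The first call after the lease expires goes to the master again.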

Clock synchronization for lease mechanisms

1. The issuer's clock is faster than the receiver's clock

If the issuer's clock is faster than the receiver's clock, the receiver will still consider the lease valid after the issuer considers it expired. The commitment is then no longer honored while the receiver still relies on it, which affects the correctness of the system. A common practice is to make the issuer's validity period slightly longer than the receiver's. As long as the difference is larger than the clock error, the validity of the lease is not affected.

2. The issuer's clock is slower than the receiver's clock

If the issuer's clock is slower than the receiver's clock, the issuer still considers the lease valid after the receiver considers it expired. The receiver can work around this by applying for the lease again before it expires.
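
Both cases can be checked with a little arithmetic. The sketch below models each clock as the true time plus a constant offset, which is an assumption made for illustration, and the function names are hypothetical. It tests whether the issuer keeps honoring the lease at least as long as the receiver keeps using it.

```python
def true_expiry(local_deadline: float, clock_offset: float) -> float:
    """True time at which a clock running `clock_offset` seconds ahead
    (negative means behind) reads `local_deadline`."""
    return local_deadline - clock_offset


def issuer_outlasts_receiver(deadline: float, issuer_offset: float,
                             receiver_offset: float, margin: float) -> bool:
    """The issuer waits until `deadline + margin` on its own clock;
    the receiver stops using the lease at `deadline` on its own clock."""
    issuer_stops = true_expiry(deadline + margin, issuer_offset)
    receiver_stops = true_expiry(deadline, receiver_offset)
    return issuer_stops >= receiver_stops
```

With a deadline of 100 and an issuer clock 2 seconds fast, no margin lets the issuer drop the lease 2 seconds before the receiver stops using it. A margin of 3 seconds, which exceeds the clock error, closes the gap. With an issuer clock 2 seconds slow, the check passes even without a margin.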

Application of the lease mechanism

1. The lease mechanism in HDFS

In HDFS, while one client is writing to a file, other clients are not allowed to write to that file, in order to guarantee data consistency. How does HDFS accomplish this? The answer is the lease mechanism. When a client wants to write to an HDFS file, it first obtains a write lease for that file from the HDFS service. Only the client holding the lease is allowed to write to the file, and write requests from any other client are rejected. When the client has finished writing, it releases the lease.
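
The single-writer rule can be illustrated with a toy lease table. This is not the HDFS API: the `FileLeaseManager` class and its methods are hypothetical, the file contents live in a plain dictionary, and lease expiry is left out to keep the sketch short.

```python
class FileLeaseManager:
    """Toy write-lease table: at most one writer per file path."""

    def __init__(self):
        self.writers = {}  # path -> client currently holding the write lease

    def acquire(self, path: str, client: str) -> bool:
        """Take the lease if it is free; True if `client` now holds it."""
        holder = self.writers.setdefault(path, client)
        return holder == client

    def write(self, path: str, client: str, data: bytes, store: dict) -> None:
        """Append `data` to the file, rejecting clients without the lease."""
        if self.writers.get(path) != client:
            raise PermissionError(f"{client} does not hold the write lease on {path}")
        store.setdefault(path, []).append(data)

    def release(self, path: str, client: str) -> None:
        """Give the lease back once the write is complete."""
        if self.writers.get(path) == client:
            del self.writers[path]
```

A second client calling `acquire` on the same path gets `False` until the first client calls `release`, and any `write` it attempts in the meantime raises `PermissionError`.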

2. Lease mechanisms in Eureka

Eureka provides service registration and service discovery. Its roles are divided into the server side (EurekaServer) and the client side. Clients are the service instances registered with the registry (EurekaServer), and they are further divided into service providers and service consumers. A service consumer obtains a service provider's address from the registry and invokes the service. At startup, a service provider first registers its information with the EurekaServer and then keeps sending renewal requests to indicate that it is functioning properly. If the EurekaServer receives no renewal request from an instance for an extended period of time, it removes that service instance from the service list.
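
The register, renew, and evict cycle can be sketched with a small in-memory registry. This is not Eureka's API: the `Registry` class, its method names, and the lease duration are assumptions chosen for illustration only.

```python
class Registry:
    """Toy service registry where every registration is a lease that
    the instance must keep renewing."""

    def __init__(self, lease_duration: float):
        self.lease_duration = lease_duration
        self.instances = {}  # instance id -> (address, lease expiry time)

    def register(self, instance_id: str, address: str, now: float) -> None:
        self.instances[instance_id] = (address, now + self.lease_duration)

    def renew(self, instance_id: str, now: float) -> bool:
        """Extend the lease; False means the instance must register again."""
        if instance_id not in self.instances:
            return False
        address, _ = self.instances[instance_id]
        self.instances[instance_id] = (address, now + self.lease_duration)
        return True

    def evict_expired(self, now: float) -> list:
        """Remove and return the ids of instances whose lease ran out."""
        expired = [i for i, (_, t) in self.instances.items() if now >= t]
        for i in expired:
            del self.instances[i]
        return expired

    def lookup(self) -> dict:
        """What a service consumer would see: instance id -> address."""
        return {i: addr for i, (addr, _) in self.instances.items()}
```

An instance that stops renewing disappears from `lookup()` after the next `evict_expired` call that runs past its expiry time, while an instance that keeps renewing stays listed.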

Characteristics of the lease mechanism

Issuing a lease only requires that the network can communicate in one direction, and the same issuer can send a lease to the receiver repeatedly. Even if the issuer occasionally fails to deliver a lease, it can solve the problem simply by issuing the lease again.
Machine downtime has little effect on the lease mechanism. An issuer that is down usually cannot change the commitments it made earlier, so the correctness of its leases is not affected. After the issuer recovers, if it can restore its previous lease information, it can continue to honor its lease commitments. If it cannot restore the lease information, it only needs to wait for one maximum lease timeout period, after which all previously issued leases have expired.
The lease mechanism relies on expiration times, which requires clock synchronization between the issuer and the receiver.
In an actual implementation, releasing the resources held by the issuer or the master node after a lease expires also has to be considered.
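
The recovery rule for a restarted issuer can be written down as a small helper. The function name and parameters are hypothetical, and the sketch assumes that no lease is ever granted for longer than `max_lease_duration`.

```python
def earliest_safe_time(restarted_at, max_lease_duration, recovered_expiries=None):
    """Earliest time at which a restarted issuer can be sure that no lease
    granted before the crash is still valid."""
    if recovered_expiries is not None:
        # Lease table survived the crash: wait only for leases still running.
        return max([restarted_at, *recovered_expiries])
    # Lease table lost: assume a maximum-length lease was granted just before
    # the crash, which cannot have happened later than the restart.
    return restarted_at + max_lease_duration
```

An issuer that restarts at time 100 with a maximum lease duration of 30 and no surviving lease table has to wait until time 130. If it recovers a table showing that the last outstanding lease expires at 115, it only has to wait until 115.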