Author: Li Fan, vivo Internet Server Team
In this paper, we start by tracing the emergence of the time wheel algorithm, describing the queue-based implementation of timed tasks that preceded it and the shortcomings of that implementation. We then introduce the idea behind the time wheel algorithm and its data structure, and discuss the data structures, advantages, and disadvantages of three time wheel models.
Next, we introduce the time wheel algorithm in the Dubbo framework and walk through its main implementation in Dubbo.
Finally, starting from the optimization of a service architecture in one of our projects, we describe the flaws in the current design and present an optimized design based on a delayed MQ, provided by a middleware team, that internally uses the time wheel algorithm.
Chapter 1 Timed Tasks and Time Wheel Algorithm Development
1.1 Emergence of the time wheel algorithm
In computer programs, a timer is used to schedule a given task to run at a specified point in time. The time wheel algorithm is a clever algorithm for implementing such a delay facility (timer). It first appeared in 1997 in an IEEE journal paper by George Varghese and Anthony Lauck titled "Hashed and Hierarchical Timing Wheels: Efficient Data Structures for Implementing a Timer Facility". The paper points out that conventional algorithms used to implement operating-system timer modules require O(n) time to start and maintain a timer, which is an enormous overhead when the number of timers n is large. It proposes and demonstrates that timers can be started, stopped, and maintained in O(1) time using a data structure of circular buckets, and it introduces two ways of partitioning the time interval: the first hashes all timer intervals into specific slots on the timing wheel, and the second composes timing wheels of multiple granularities into a hierarchical structure that covers a larger time range. Both structures are described in detail in Chapter 2.
1.2 Queue-Based Timed Task Execution Models
In the world of computers, the value of an algorithm is maximized only when the problem it solves is large-scale. Before introducing the time wheel algorithm, it is worth describing another implementation of timed tasks: queue-based timed tasks. The queue data structure is used extensively in both operating systems and programming languages such as Java, and will not be discussed further in this paper.
The following describes in detail the thread model, the types of timed tasks, and the data structure of the task queue:
(1) Thread Model
- User thread: responsible for registering timed tasks.
- Polling thread: responsible for scanning the task queue for tasks that meet the execution conditions; for example, when a task's scheduled time has been reached, the polling thread takes the task out of the queue and hands it to the asynchronous thread pool for processing.
- Asynchronous thread pool: dedicated to executing tasks.
(2) Timed tasks
Timed tasks mainly fall into two categories: one-shot timed tasks (such as Dubbo's timeout check) and repeating timed tasks. Both are easy to understand. A one-shot timed task runs either at a specified future moment or after a fixed interval from now, corresponding to the concepts of absolute time and relative time.
A repeating timed task is simply a one-shot task executed multiple times: in the thread model above, once a repeating task completes one execution, it is put back into the task queue.
(3) Task queue data structure
Starting from the simplest data structure, suppose we choose a basic queue or, for the convenience of adding and removing tasks, a doubly linked list as the task queue, and give each task in the queue a timestamp field. What problems does this implementation strategy create?
The biggest problem lies in querying. Given a number of tasks in the queue, to find the pending tasks whose scheduled time has arrived, the polling thread must scan every task, so the time complexity of this data structure is O(n); moreover, most polls are empty, since most tasks have not yet reached their execution time. This efficiency is almost unacceptable.
To improve query efficiency, we can start from the data structure and use an ordered queue. In computing, ordering significantly improves traversal efficiency: the polling thread traverses the timed task queue from head to tail and can stop as soon as it finds a task whose execution timestamp has not yet arrived.
But maintaining order also has a cost: enqueueing a task into an ordinary task queue is only O(1), while inserting into an ordered task queue costs O(log n) with a heap, or O(n) with a sorted linked list. Alternatively, we can borrow the idea of partitioning and split the task queue into n sub-queues traversed by multiple threads; with fully concurrent execution, the problem size per thread shrinks to 1/n of the original. However, using many threads also reduces CPU efficiency.
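To make this model concrete, here is a minimal sketch in Java (illustrative only; the class and field names are invented for this example): a user thread registers tasks, a polling thread scans an ordered queue keyed by execution timestamp, and due tasks are handed to an asynchronous thread pool.

```java
import java.util.concurrent.*;

// Minimal sketch of the queue-based model: one polling thread, an ordered
// task queue keyed by execution time, and an async pool that runs due tasks.
public class QueueBasedScheduler {
    // Ordered queue: the head is always the task with the earliest deadline.
    private final PriorityBlockingQueue<TimedTask> queue =
            new PriorityBlockingQueue<>(11, (a, b) -> Long.compare(a.executeAt, b.executeAt));
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    static class TimedTask {
        final long executeAt;          // absolute execution time in millis
        final Runnable body;
        TimedTask(long executeAt, Runnable body) { this.executeAt = executeAt; this.body = body; }
    }

    // Called by user threads to register a task with a relative delay.
    public void schedule(Runnable body, long delayMillis) {
        queue.offer(new TimedTask(System.currentTimeMillis() + delayMillis, body));
    }

    // The polling thread: because the queue is ordered, it can stop at the
    // first task whose deadline has not arrived yet.
    public void startPolling() {
        Thread poller = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                TimedTask head = queue.peek();
                long now = System.currentTimeMillis();
                if (head != null && head.executeAt <= now) {
                    queue.poll();
                    workers.execute(head.body);   // hand over to the async pool
                } else {
                    try { Thread.sleep(10); } catch (InterruptedException e) { return; }
                }
            }
        }, "poller");
        poller.setDaemon(true);
        poller.start();
    }
}
```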
In summary, a timed task framework needs the following:
- A strict and efficient data structure: tasks should not be stored in a plain queue, otherwise polling can never be efficient.
- A simple concurrency model: CPU threads are valuable and should not be consumed in large numbers.
The time wheel algorithm remedies the defects of the queue-based timed task execution model above, so its idea has been widely adopted across Internet technology: Linux crontab, as well as Dubbo, Netty, Quartz, Akka, ZooKeeper, and Kafka, which are commonly used in Java development, almost all adopt the idea of the time wheel algorithm in their timed task scheduling.
It is worth mentioning that in Dubbo, to improve fault tolerance, many places need task scheduling that executes only once; for example, the consumer needs to know whether each RPC call has timed out. In Dubbo's initial implementation, all pending results (DefaultFuture) were put into a collection, and a timed task scanned all the futures at regular intervals to determine whether they had timed out. The logic is simple, but it wastes performance. Later, Dubbo borrowed from Netty and introduced the time wheel.
Chapter 2: The Idea of the Time Wheel Algorithm and Its Application Scenarios
2.1 Introduction to the Time Wheel
The time wheel is essentially a task scheduling model that uses thread resources efficiently: large batches of tasks are consolidated into a single scheduler, which schedules and manages them in a unified way, giving very high scheduling efficiency for timed tasks, delayed tasks, and similar events.
At the heart of the time wheel algorithm is that the polling thread described in Chapter 1 is no longer responsible for traversing all tasks, but only the time scale. The algorithm works like a hand sweeping around a clock face: when the hand reaches a tick that has tasks attached (a task list), all tasks on that list are executed, dramatically reducing the extra scanning work.
In Chapter 1 we argued that an efficient timed task framework needs both a strict and efficient data structure and a simple concurrency model; the time wheel model has both.
Based on the idea of the time wheel algorithm, many time wheel models have since emerged. Three are popular: the simple time wheel model, the time wheel model with rounds, and the hierarchical time wheel model. These three models are introduced below.
2.2 Time wheel model
2.2.1 Simple time wheel model
The simple time wheel model no longer uses a queue as its data structure, but an array plus linked lists (a classic combination), as shown in the figure below. Because the wheel is backed by an array, the linked list of timed tasks can be located directly by subscript, so adding, deleting, and executing a timed task are all O(1).
Obviously, this simple time wheel solves the traversal inefficiency of the task queue. When the polling thread reaches a given time tick, it simply executes all tasks in the task list attached to that tick (usually handing them to the asynchronous thread pool), without having to check whether every task's timestamp has been reached.
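A minimal sketch of this structure, assuming one slot per tick and a single polling thread calling tick(); it is illustrative only, not code from any framework mentioned here.

```java
import java.util.LinkedList;
import java.util.List;

// Simple time wheel: an array of slots, each slot holding a linked list of tasks.
// A task whose delay (in ticks) exceeds the wheel size cannot be represented.
public class SimpleTimeWheel {
    private final List<Runnable>[] slots;
    private int currentSlot = 0;

    @SuppressWarnings("unchecked")
    public SimpleTimeWheel(int slotCount) {
        slots = new List[slotCount];
        for (int i = 0; i < slotCount; i++) slots[i] = new LinkedList<>();
    }

    // O(1): locate the slot by subscript and append to its linked list.
    public void add(Runnable task, int delayTicks) {
        if (delayTicks >= slots.length) throw new IllegalArgumentException("delay beyond wheel span");
        slots[(currentSlot + delayTicks) % slots.length].add(task);
    }

    // Called once per tick by the polling thread: execute everything in the slot.
    public void tick() {
        List<Runnable> due = slots[currentSlot];
        for (Runnable r : due) r.run();   // in practice, hand to an async pool
        due.clear();
        currentSlot = (currentSlot + 1) % slots.length;
    }
}
```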
By increasing the number of slots, the time granularity can be refined and a larger time span obtained, but this implementation has serious drawbacks:
- When the time granularity is small, the time span is large, and there are few tasks, polling the time slots becomes inefficient.
- When the time granularity is small, the number of time slots is large, and there are few tasks, the memory occupied by the many empty slots is wasted.
2.2.2 Time wheel model with rounds
Analogous to the idea of a circular array, a time wheel with rounds was later designed; its structure is shown in the following figure:
As shown in Figure 2.2.2, expire represents the expiration time and round represents the number of remaining rotations of the wheel before the task is executed. In other words, when the pointer reaches a bucket, we cannot simply execute all the tasks under that bucket as the simple single-layer time wheel does; instead, we must traverse the linked list under the bucket and check whether the wheel's rotation count has reached the round value stored in each node.
This structure significantly reduces the number of ticks required, i.e., it remedies the memory waste of the simple time wheel, which needs many time slots even when there are few tasks.
However, this structure does not reduce the amount of traversal the polling thread must do, so it is still relatively inefficient.
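A minimal sketch of the same wheel with a round counter on each node (again illustrative only): a task's delay is split into a number of full rotations and a slot offset, and each tick must traverse the slot's list to find the nodes whose round counter has reached zero.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Time wheel with rounds: a task's delay is split into (rounds, slot offset).
public class RoundTimeWheel {
    static class Node {
        int remainingRounds;   // full rotations left before this task is due
        final Runnable task;
        Node(int rounds, Runnable task) { this.remainingRounds = rounds; this.task = task; }
    }

    private final List<Node>[] slots;
    private int currentSlot = 0;

    @SuppressWarnings("unchecked")
    public RoundTimeWheel(int slotCount) {
        slots = new List[slotCount];
        for (int i = 0; i < slotCount; i++) slots[i] = new LinkedList<>();
    }

    public void add(Runnable task, int delayTicks) {
        int rounds = delayTicks / slots.length;
        int slot = (currentSlot + delayTicks) % slots.length;
        slots[slot].add(new Node(rounds, task));
    }

    // Unlike the simple wheel, each tick must traverse the slot's list
    // and only run the nodes whose round counter has reached zero.
    public void tick() {
        Iterator<Node> it = slots[currentSlot].iterator();
        while (it.hasNext()) {
            Node n = it.next();
            if (n.remainingRounds == 0) { n.task.run(); it.remove(); }
            else n.remainingRounds--;
        }
        currentSlot = (currentSlot + 1) % slots.length;
    }
}
```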
2.2.3 Hierarchical time wheel model
The hierarchical time wheel is another improvement on the simple time wheel. Its design can be compared to an everyday clock, with three levels for hours, minutes, and seconds, whose wheels have 24, 60, and 60 ticks respectively, so only 144 ticks are needed to represent a whole day. The advantage of this representation is that extending the representable time span by a multiplicative factor requires only a constant-level increase in the number of ticks; for example, adding 30 ticks to the 144 that represent one day is enough to represent a whole month at the same precision.
A hierarchical time wheel works by having the lower-level wheel drive the higher-level wheel. The arrows in the figure show a task being "demoted" (cascaded down). For example, for a task to be executed at 8:40:00 on day 2: when the day wheel reaches tick 2, the task is moved down into the hour wheel's slot at tick 8; when the hour wheel reaches tick 8, the task is moved down again into the minute wheel's slot at tick 40; and when the lowest-level wheel finally reaches that slot, all tasks in the slot are executed.
In terms of time complexity, compared with the time wheel with rounds, this kind of wheel no longer traverses a slot to compare round values; the tasks in the due slot are simply taken out and executed in full.
In terms of space complexity, the hierarchical time wheel uses the idea of raising the dimension: each level of time granularity corresponds to one wheel, and multiple wheels cooperate in a cascade.
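The sketch below illustrates the cascading ("demotion") idea with just two levels, a seconds wheel and a minutes wheel; a real implementation would add more levels, drive the ticks from a clock, and handle delays beyond the top wheel's span. All names are invented for this example.

```java
import java.util.LinkedList;
import java.util.List;

// Two-level hierarchical wheel: a 60-slot seconds wheel driven once per second,
// and a 60-slot minutes wheel whose tasks are demoted when their minute arrives.
public class TwoLevelTimeWheel {
    static class Delayed {
        final long dueSecond;    // absolute due time, in seconds since the wheel started
        final Runnable task;
        Delayed(long dueSecond, Runnable task) { this.dueSecond = dueSecond; this.task = task; }
    }

    private final List<Delayed>[] seconds = newWheel(60);
    private final List<Delayed>[] minutes = newWheel(60);
    private long nowSecond = 0;   // advanced by tickSecond()

    @SuppressWarnings("unchecked")
    private static List<Delayed>[] newWheel(int n) {
        List<Delayed>[] w = new List[n];
        for (int i = 0; i < n; i++) w[i] = new LinkedList<>();
        return w;
    }

    public void add(Runnable task, long delaySeconds) {
        if (delaySeconds >= 3600) throw new IllegalArgumentException("beyond this sketch's span");
        long due = nowSecond + delaySeconds;
        Delayed d = new Delayed(due, task);
        if (delaySeconds < 60) {
            seconds[(int) (due % 60)].add(d);          // fits in the fine-grained wheel
        } else {
            minutes[(int) ((due / 60) % 60)].add(d);   // parked in the coarse wheel
        }
    }

    // Called once per second by the driving thread.
    public void tickSecond() {
        nowSecond++;
        if (nowSecond % 60 == 0) cascade();            // the seconds wheel wrapped around
        List<Delayed> slot = seconds[(int) (nowSecond % 60)];
        for (Delayed d : slot) d.task.run();
        slot.clear();
    }

    // Demote the tasks of the current minute into the seconds wheel.
    private void cascade() {
        List<Delayed> slot = minutes[(int) ((nowSecond / 60) % 60)];
        for (Delayed d : slot) seconds[(int) (d.dueSecond % 60)].add(d);
        slot.clear();
    }
}
```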
2.3 Introduction to Time Wheel Application Scenarios
As an efficient scheduling model, the time wheel is widely used in various scenarios; the common ones are as follows:
(1) Timer
Time wheels are commonly used to implement timers that can perform specific tasks at specified times. Timers can be used for periodic tasks, timeout tasks, etc., such as polling I/O events, periodically refreshing the cache, and cleaning up garbage data at regular intervals.
(2) Load Balancing
The time wheel can be used to implement load balancing algorithms, distributing requests to different servers to avoid overloading any single server. The allocation policy can be adjusted dynamically according to server load to achieve dynamic load balancing.
(3) Event-driven
The time wheel can be used to implement an event-driven model, assigning events to different processors to improve concurrent processing capability. The events can be I/O events, timing events, user events, and so on; the allocation strategy can be adjusted dynamically according to the type and priority of the events to achieve an efficient event-driven model.
(4) Database management
The time wheel can be used in database management to allocate data to different storage devices and improve read/write efficiency. The data allocation strategy can be adjusted dynamically according to the type, size, and access frequency of the data to achieve efficient database management.
(5) Other applications
Time wheel can also be used for a number of other applications, such as message queuing, task scheduling, network traffic control, etc. The specific application depends on the specific needs and scenarios.
Chapter 3 Application and Implementation of the Time Wheel in Dubbo
3.1 Application of Time Wheel in Dubbo
In Dubbo's design, when a client calls a server, the call is timed; if the call times out, this is detected and the request can be retried. In Dubbo's original implementation, all pending results (DefaultFuture) were put into a collection, and a timed task scanned all the futures at regular intervals to determine, one by one, whether they had timed out.
The logic is simple, but it wastes performance. Later, Dubbo borrowed from Netty and introduced the time wheel: tasks are managed by the time wheel and scheduled by a dedicated thread.
3.2 Time Wheel Implementation in Dubbo
The time wheel implementation in Dubbo consists of one main class and three interfaces:
First is the Timer interface, the core scheduling interface, mainly used for one-shot background scheduling. We only introduce the newTimeout method here: it hands a task to the scheduler for execution, and its first parameter is of type TimerTask, i.e., the task to be executed.
Next is the TimerTask interface, which has only one method, run, whose parameter type is Timeout. Note that the return type of the Timer interface's newTimeout method above is also Timeout; the Timeout passed into run here is the object returned by newTimeout.
The relationship between a Timeout object and a TimerTask object is similar to that between the Future returned by a thread pool and the task submitted to it.
Finally, there is the Timeout interface, which represents the handle of a scheduled task. It has several methods whose purposes are clear from their names, so they are not repeated here.
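Simplified sketches of the three interfaces are shown below. They follow Netty's io.netty.util.Timer family, from which Dubbo's org.apache.dubbo.common.timer package was adapted; the method sets are abbreviated, so consult the source for the full definitions.

```java
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Scheduling entry point: hand a task to the scheduler for one-shot execution.
interface Timer {
    Timeout newTimeout(TimerTask task, long delay, TimeUnit unit);
    Set<Timeout> stop();   // stop the timer and return the tasks not yet executed
}

// The task itself; the Timeout handed in is the handle returned by newTimeout.
interface TimerTask {
    void run(Timeout timeout) throws Exception;
}

// Handle of a scheduled task, analogous to the Future returned by a thread pool.
interface Timeout {
    Timer timer();
    TimerTask task();
    boolean isExpired();
    boolean isCancelled();
    boolean cancel();
}
```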
Together, these interfaces logically form a task scheduling system. Next comes the core of that system, the implementation of the time wheel scheduler: HashedWheelTimer.
A close look at the class comment reveals that this timer does not provide precise timing: on each tick (that is, in each time slot of the wheel), it merely checks whether any TimerTask's expected execution time has fallen behind the current time and, if so, executes it. The precision of task execution can be improved by making the time slots finer.
The default tick duration is 100 milliseconds. In most network applications, I/O timeouts need not be precise; a 5-second timeout, for example, may fire slightly late, so there is usually no need to change this default.
The class maintains a data structure called the "wheel", which is the time wheel itself. Simply put, the wheel is a hash table whose hash function is the task's deadline: the hash function places each task into the time slot it belongs to, so that as time advances and we enter a given slot, the tasks in that slot have reached their execution time.
This avoids checking, slot by slot, whether every task needs to be executed. In HashedWheelTimer's constructor, the most important call is the createWheel method. Ignoring the basic parameter checks, its main flow first normalizes the number of time slots: the value passed to the constructor is rounded up to the smallest power of two greater than or equal to it. Why this is done, and exactly how, can be studied in the source code by interested readers.
Next, the time slot array is created, and finally each element of the array is initialized.
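The following sketch shows the idea behind the normalization: rounding the slot count up to a power of two lets the slot index later be computed with a cheap bitmask instead of a modulo. This is a simplified illustration, not Dubbo's exact code.

```java
public final class WheelSizing {
    // Round ticksPerWheel up to the smallest power of two >= the requested value.
    static int normalizeTicksPerWheel(int ticksPerWheel) {
        int normalized = 1;
        while (normalized < ticksPerWheel) {
            normalized <<= 1;
        }
        return normalized;
    }

    // With a power-of-two wheel length, the slot for a given tick is a cheap bitmask
    // instead of a modulo: slot = tick & (wheelLength - 1).
    static int slotFor(long tick, int wheelLength) {
        return (int) (tick & (wheelLength - 1));
    }
}
```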
The following introduces the newTimeout method, whose main role is to add a pending task to the scheduler. Again ignoring the basic parameter checks, the main flow is as follows (a sketch follows this list):
- First, the count of tasks waiting to be scheduled is incremented; if the maximum limit is exceeded, the count is decremented again and an exception is thrown.
- Second, the start method is called to start the time wheel.
- Third, the deadline of the current task is computed, with overflow protection.
- Finally, a Timeout is constructed and put into the pending queue.
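A condensed, self-contained sketch of that flow is shown below; the names (PendingTimeout, pendingCount, startTime, timeouts) only approximate the real fields of HashedWheelTimer.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Self-contained sketch of the newTimeout flow described above.
public class NewTimeoutSketch {
    static class PendingTimeout {
        final Runnable task;
        final long deadlineNanos;    // relative to startTime
        PendingTimeout(Runnable task, long deadlineNanos) { this.task = task; this.deadlineNanos = deadlineNanos; }
    }

    private final AtomicLong pendingCount = new AtomicLong();
    private final long maxPending = 10_000;
    private volatile long startTime = System.nanoTime();   // set by the worker thread in the real code
    private final Queue<PendingTimeout> timeouts = new ConcurrentLinkedQueue<>();

    public PendingTimeout newTimeout(Runnable task, long delay, TimeUnit unit) {
        long pending = pendingCount.incrementAndGet();      // step 1: +1 the counter
        if (pending > maxPending) {
            pendingCount.decrementAndGet();                  // step 1: -1 again and reject
            throw new RejectedExecutionException("too many pending timeouts");
        }
        // step 2: the real implementation calls start() here to launch the wheel if needed.

        long deadline = System.nanoTime() + unit.toNanos(delay) - startTime;   // step 3
        if (delay > 0 && deadline < 0) deadline = Long.MAX_VALUE;              // overflow protection

        PendingTimeout t = new PendingTimeout(task, deadline);                  // step 4
        timeouts.add(t);   // the worker thread later moves it into a wheel slot
        return t;
    }
}
```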
Let's expand on the important start method. First, the worker's running state is read: if it is in the initialized state, it is updated to started and the workerThread is started; other states are handled accordingly. The method then waits for the workerThread to finish initializing startTime (which happens in the worker's run method). The reason for waiting is that startTime is also used after start returns inside newTimeout; without this wait, the task's deadline calculation would be wrong.
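A minimal sketch of this handshake, assuming (as in Netty's implementation, from which Dubbo's was adapted) an atomic state field and a CountDownLatch that the worker counts down once startTime is published:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the start()/worker handshake: start() launches the worker thread once
// and then blocks until the worker has published startTime.
public class StartHandshakeSketch {
    private static final int INIT = 0, STARTED = 1, SHUTDOWN = 2;

    private final AtomicInteger workerState = new AtomicInteger(INIT);
    private final CountDownLatch startTimeInitialized = new CountDownLatch(1);
    private volatile long startTime;
    private final Thread workerThread = new Thread(this::workerRun, "wheel-worker");

    public void start() throws InterruptedException {
        if (workerState.compareAndSet(INIT, STARTED)) {
            workerThread.start();                  // only the first caller starts the thread
        } else if (workerState.get() == SHUTDOWN) {
            throw new IllegalStateException("cannot be started once stopped");
        }
        // Wait until the worker has initialized startTime; otherwise callers of
        // newTimeout would compute deadlines against an uninitialized base time.
        while (startTime == 0) {
            startTimeInitialized.await();
        }
    }

    private void workerRun() {
        startTime = System.nanoTime();
        if (startTime == 0) startTime = 1;         // 0 is reserved to mean "not initialized"
        startTimeInitialized.countDown();          // release any thread blocked in start()
        // ... the tick loop would follow here ...
    }
}
```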
So far we have covered the main flow of adding a task to HashedWheelTimer; next comes the internal workings of the time wheel.
First is HashedWheelTimer's inner class Worker; the main flow of its run method is as follows:
1. Initialize startTime, corresponding to the start method above. After initialization, a CountDownLatch is used to notify the waiting thread to proceed.
2. While the timer is in the started state, keep advancing the tick. Each advance breaks down into:
- Wait for the next tick to arrive.
- When the tick arrives, compute the slot of the time wheel corresponding to the tick (a modulo operation).
- Process the queue of cancelled tasks.
- Get the current time slot and move the tasks in the pending task queue into their slots.
- Execute the tasks in the current time slot.
3. If the time wheel has been stopped, perform the following:
- Clear the unprocessed task schedules from all time slots.
- Drain the pending task queue and add the tasks that were not cancelled to the unprocessed collection.
- Process the queue of cancelled tasks.
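A greatly simplified, self-contained sketch of the shape of this loop (cancellation handling, the shutdown drain, and precise tick timing are omitted; names are approximations):

```java
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified worker loop: wait for the next tick, move newly registered tasks from
// the pending queue into slots, then run the due tasks in the current slot.
public class WorkerLoopSketch {
    static class Task { final long dueTick; final Runnable body;
        Task(long dueTick, Runnable body) { this.dueTick = dueTick; this.body = body; } }

    private final List<Task>[] wheel;
    private final long tickMillis;
    final Queue<Task> pending = new ConcurrentLinkedQueue<>();   // filled by newTimeout callers
    private final AtomicBoolean started = new AtomicBoolean(true);
    private long startTime;
    private long tick;

    @SuppressWarnings("unchecked")
    public WorkerLoopSketch(int slots, long tickMillis) {
        wheel = new List[slots];
        for (int i = 0; i < slots; i++) wheel[i] = new LinkedList<>();
        this.tickMillis = tickMillis;
    }

    public void run() throws InterruptedException {
        startTime = System.currentTimeMillis();
        while (started.get()) {
            waitForNextTick();                                   // 2a: wait for the tick to arrive
            int slot = (int) (tick % wheel.length);              // 2b: slot for this tick (modulo)
            // 2c: the real worker processes the cancelled-task queue here.
            transferPendingToSlots();                            // 2d: pending queue -> wheel slots
            expire(wheel[slot]);                                 // 2e: run due tasks in this slot
            tick++;
        }
        // 3: on shutdown, the real worker drains all slots and the pending queue
        //    into an "unprocessed" collection instead of running them.
    }

    private void waitForNextTick() throws InterruptedException {
        long deadline = startTime + (tick + 1) * tickMillis;
        long sleep = deadline - System.currentTimeMillis();
        if (sleep > 0) Thread.sleep(sleep);
    }

    private void transferPendingToSlots() {
        Task t;
        while ((t = pending.poll()) != null) {
            // The real method also computes remainingRounds; see the placement sketch below.
            wheel[(int) (Math.max(t.dueTick, tick) % wheel.length)].add(t);
        }
    }

    private void expire(List<Task> slot) {
        slot.removeIf(t -> {
            if (t.dueTick <= tick) { t.body.run(); return true; }
            return false;                                        // not yet due (a later round)
        });
    }
}
```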
Let's focus on the step in the started state that gets the current time slot and moves tasks from the pending queue into slots, i.e., the transferTimeoutsToBuckets method. Its steps are as follows (the loop is limited to a finite number of iterations per tick, to prevent a very large pending queue from making a single transfer take too long):
- Take the first task from the pending task queue and validate it.
- Compute the tick at which the task is due, based on its deadline.
- Set the number of remaining rounds for this task (you can see here that Dubbo uses the "time wheel with rounds" introduced in 2.2.2).
- Take the larger of the computed tick and the current tick, and take it modulo the wheel size to get the slot index.
- Add the task to the corresponding slot.
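The placement arithmetic can be sketched as follows (simplified; the real method also handles cancelled tasks and caps the number of transfers per tick):

```java
public final class BucketPlacement {
    // Compute which slot a pending task belongs to and how many full rotations remain.
    // wheelLength must be a power of two so that the bitmask is equivalent to a modulo.
    static long[] place(long deadline, long tickDuration, long currentTick, int wheelLength) {
        long calculated = deadline / tickDuration;                        // tick at which the task is due
        long remainingRounds = (calculated - currentTick) / wheelLength;  // full rounds still to wait
        long ticks = Math.max(calculated, currentTick);                   // never place into a past tick
        int slotIndex = (int) (ticks & (wheelLength - 1));                // modulo via bitmask
        return new long[] { slotIndex, remainingRounds };
    }
}
```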
Summary: in this section we briefly introduced Dubbo's time wheel implementation, covering the main flow of adding a task to the scheduler and the internal workings of the wheel.
If you are interested, you can study the source code; it contains many clever pieces of design, such as the initialization of startTime and the inter-thread communication performed once initialization completes. These design ideas are very instructive for beginners like me.
Chapter 4 Perspectives on the Application of the Time Wheel Algorithm
When I had just started working, I designed a service called the download center, which handles the export and download of data files in the project. Its real purpose is to prevent an excessive number of asynchronous threads from affecting the core businesses; extracting this function into its own service relieves pressure on the core businesses.
The initial design of the download center took concurrent requests into account, as well as the memory overflow problems caused by oversized files. Beyond various mitigations, the overall idea was: anticipating particularly large files, first persist a task record, then let a background thread pool execute these tasks slowly, using the task records to actively pull data, generate files, and so on.
The model is similar to the boss-worker pattern in Netty: the BOSS thread periodically checks the database for unconsumed tasks and assigns them to a pool of worker threads to consume, as shown in Figure 4.1.
Figure 4.1 Schematic diagram of the application service task consumption pattern before modification
While the current design does a good job of preventing problems such as memory overflow, it has some drawbacks:
- If the number of users grows, the service can be scaled horizontally, but the database used to persist task records will become a bottleneck.
- Even though it is not difficult for the BOSS threads to avoid consuming a task twice, the efficiency of querying pending tasks will drop sharply.
- The whole service depends too heavily on the BOSS thread.
So we considered alternatives to this boss-worker model. One approach that comes to mind is an MQ message queue: information about the tasks to be executed is posted to the MQ queue and then consumed by the service. This approach solves the three problems above and has the additional advantage of preserving task order.
However, the memory overflow problem still exists, so we considered limiting the number of threads concurrently consuming tasks. If this number is exceeded, a task is not consumed and is instead re-delivered to the MQ queue. Here a better approach is to add a delay when re-delivering a task to the MQ queue, to prevent the same task from being re-delivered over and over in a tight loop. The overall flow is shown in Figure 4.2:
In the figure, the green module is the task scheduling module of the overall system design; it schedules the timed tasks, and a time wheel is used to manage these timed tasks in a unified way.
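As an illustration only, the consumer-side delay could be implemented with a hashed wheel timer such as Netty's (Dubbo's copy works the same way). The MqClient interface and its redeliver method below are hypothetical, not the actual middleware API:

```java
import io.netty.util.HashedWheelTimer;
import io.netty.util.Timer;

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical consumer-side sketch: if the concurrency limit is reached, schedule a
// delayed re-delivery on the time wheel instead of re-posting the task immediately.
public class DelayedRedeliverySketch {
    private final Timer wheel = new HashedWheelTimer();        // default: 100 ms ticks
    private final Semaphore concurrencyLimit = new Semaphore(8);
    private final MqClient mq;                                 // hypothetical MQ client

    interface MqClient { void redeliver(String taskId); }      // assumed API, for illustration

    public DelayedRedeliverySketch(MqClient mq) { this.mq = mq; }

    public void onMessage(String taskId, Runnable work) {
        if (concurrencyLimit.tryAcquire()) {
            try { work.run(); } finally { concurrencyLimit.release(); }
        } else {
            // Too many tasks in flight: re-deliver after a delay managed by the wheel.
            wheel.newTimeout(timeout -> mq.redeliver(taskId), 5, TimeUnit.SECONDS);
        }
    }
}
```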
For both short-delay and long-delay messages, we want the delay to be as accurate as possible; for long-delay messages we also want persistence, i.e., staging, so that when a message is about to expire, it can be taken out again and delivered.
Persisting such long-delay messages, however, runs into the same bottleneck we encountered with periodically fetching tasks from the database, as shown in Figure 4.1.
We would prefer a mature framework that provides (1) persistence of long-delay tasks and (2) task scheduling. Among the MQ offerings of the middleware platform group, we found that it currently supports a delayed MQ feature covering both capabilities. The delayed MQ architecture is shown in the following diagram:
The flow of sending and processing a specific delayed message is shown below:
In fact, this delayed MQ implementation, with scheduling implemented by a time wheel and persistence backed by MongoDB, is exactly what we expected and meets our needs.
Summary
In this paper, the time wheel algorithm is introduced starting from the origin of timed tasks and of the algorithm itself. The idea behind the time wheel is elaborated in detail, along with three common time wheel models: the simple time wheel, the time wheel with rounds, and the hierarchical time wheel, together with their corresponding data structures.
Then the application of the time wheel model in Dubbo is introduced as an example, and the main implementation of the algorithm in Dubbo is described from the source code.
Finally, we introduce a small module the author has worked on, analyze the bottlenecks currently encountered by its functionality, and give ideas for optimizing the current design with the help of a delayed MQ built on the time wheel algorithm.
References:
George Varghese and Anthony Lauck. Hashed and Hierarchical Timing Wheels: Efficient Data Structures for Implementing a Timer Facility. IEEE/ACM Transactions on Networking, 1997.