
Interviewer: Why is `limit 1000000,10` slow? How do you optimize it?

Published: 2024-09-05 19:58:57

In MySQL, the larger the X value in a `limit X, Y` query, the slower the query. For example:

  • `limit 0,10`: the query takes around 20 milliseconds.
  • `limit 1000000,10`: the query can take around 15 seconds (15,000 milliseconds) or even longer.

So the larger the value of X in `limit`, the slower the query becomes.

This is the classic deep-pagination problem in MySQL. Why does `limit` get slower the deeper the page goes? And how can we speed it up?

Why is LIMIT getting slower and slower?

When a database serves a `LIMIT x, y` paging query, performance degrades as x grows. This is because the database must scan and then skip x records before it can return the y requested results. As x increases, so does the number of records scanned and discarded, and performance suffers accordingly.

For example, `limit 1000000,10` scans 1,000,010 rows of data and then discards the first 1,000,000 of them, so the query is very slow.
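The scan-and-discard behavior can be illustrated with a toy sketch (plain Python, not MySQL internals): to serve `LIMIT x, y`, a storage engine walking rows in index order still has to produce the first x rows before it can return the y requested ones.

```python
def limit_query(rows, offset, count):
    """Naive LIMIT offset, count: scan in order, discard the first `offset` rows."""
    scanned = 0
    result = []
    for row in rows:              # rows arrive in index order
        scanned += 1
        if scanned > offset:      # the first `offset` rows are read, then thrown away
            result.append(row)
            if len(result) == count:
                break
    return result, scanned

rows = range(1, 1_000_011)        # pretend these are ~1M indexed rows
page, scanned = limit_query(rows, 1_000_000, 10)
print(page[0], page[-1], scanned)  # 1000001 1000010 1000010
```

Even though only 10 rows come back, 1,000,010 rows were scanned, which is exactly why the deep page is slow.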

Optimization techniques

There are two typical optimizations for MySQL deep paging:

  1. Starting ID Positioning Method: Use the ID of the last query as the ID of the starting query.
  2. Index Coverage + Subqueries

1. Starting ID positioning method

The starting ID positioning method specifies a starting ID for the LIMIT query: the last ID returned by the previous query. For example, if the last row of the previous page has ID 6800000, the next query starts scanning from 6800001 and skips the first 6,800,000 rows outright, which makes it very efficient. The SQL looks like this:

select name, age, gender
from person
where id > 6800000 -- key condition: start after the last queried ID
order by id limit 10;

where the id field is the primary key field of the table.

Why is the starting ID query efficient?

This query uses the last ID of the previous query as its starting ID. Because that ID can be located directly in the primary key index, MySQL only needs to walk forward along the doubly linked list that connects the leaf nodes of the B+ tree (the underlying data structure of the primary key index) to fetch the next rows, so the query is very efficient.

For example, if the last result of the previous query is 9, then the next query only needs to traverse the N rows after 9 to produce its result.
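The idea can be sketched end to end with SQLite's built-in `sqlite3` module (the SQL shape is the same in MySQL; the `person` table and its contents here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 1001)])

# Deep page via OFFSET: the engine scans and discards the first 500 rows.
offset_page = conn.execute(
    "SELECT id, name FROM person ORDER BY id LIMIT 10 OFFSET 500").fetchall()

# Same page via starting-ID (keyset) pagination: jump straight past the
# last ID of the previous page using the primary key index.
last_id = 500
keyset_page = conn.execute(
    "SELECT id, name FROM person WHERE id > ? ORDER BY id LIMIT 10",
    (last_id,)).fetchall()

print(keyset_page == offset_page)  # True: identical rows, far fewer scanned
```

Both queries return the same page of rows, but the keyset form seeks directly to `last_id` in the index instead of scanning and discarding the rows before it.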

Advantages and disadvantages analysis

This approach only suits sequential, page-by-page access, such as the infinite-scroll (waterfall) feeds you see when browsing news in a mobile app.

However, if the user jumps between pages, for example from page 1 straight to page 250, this implementation breaks down, because we do not know the last ID of page 249.

2. Index coverage + subqueries

In that case, we can combine a covering index with a subquery to improve query efficiency. The implementation is as follows.

Let's say that our unoptimized SQL looks like this:

select name, age, gender
from person
order by createtime desc 
limit 1000000,10;

In the SQL above, the createtime field is indexed, yet the query is still slow: it fetches 1,000,010 complete rows, which means reading a large number of index pages and performing frequent back-to-table lookups (fetching the full row from the clustered index for each match), so execution is inefficient.

At this point, we can do the following optimization:

SELECT p1.name, p1.age, p1.gender
FROM person p1
JOIN (
    SELECT id FROM person ORDER BY createtime DESC LIMIT 1000000, 10
) AS p2 ON p1.id = p2.id;

Compared with the pre-optimization SQL, the optimized version eliminates the frequent back-to-table lookups: the subquery selects only the primary key id, which the createtime index can satisfy on its own (a covering index). The subquery first narrows the result down to a small set of primary key IDs, and only those ten rows are then fetched from the table, which greatly improves query efficiency.

A covering index is a query-optimization technique in which the database engine can obtain all the data a query needs directly from the index, without going back to the table (that is, without accessing the clustered primary key index or the actual data rows) for additional information. This reduces disk I/O and thus improves query performance.
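Here is a runnable sketch of the covering-index-plus-subquery pattern, again using SQLite's `sqlite3` module for illustration (the table, data, and index name are invented; MySQL's execution details differ, but the SQL shape is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE person (
    id INTEGER PRIMARY KEY, name TEXT, age INTEGER,
    gender TEXT, createtime INTEGER)""")
conn.executemany(
    "INSERT INTO person (id, name, age, gender, createtime) VALUES (?,?,?,?,?)",
    [(i, f"user{i}", 20 + i % 40, "m" if i % 2 else "f", 1_700_000_000 + i)
     for i in range(1, 2001)])
# Secondary index on createtime; it also covers id (the row pointer),
# so the subquery below never needs to touch the table itself.
conn.execute("CREATE INDEX idx_createtime ON person (createtime)")

# Optimized shape: the subquery resolves the 10 target IDs from the index
# alone, and only those 10 rows are then looked up in the table.
optimized = conn.execute("""
    SELECT p1.name, p1.age, p1.gender
    FROM person p1
    JOIN (SELECT id FROM person ORDER BY createtime DESC LIMIT 10 OFFSET 1000)
         AS p2 ON p1.id = p2.id
    ORDER BY p1.createtime DESC""").fetchall()

# Naive shape for comparison: fetches full rows for the whole scan.
naive = conn.execute("""
    SELECT name, age, gender FROM person
    ORDER BY createtime DESC LIMIT 10 OFFSET 1000""").fetchall()

print(optimized == naive)  # True: same page, cheaper plan
```

Both statements return the same page; the optimized form simply defers the full-row lookups until only ten IDs remain.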

Post-lesson Reflections

What other deep pagination optimization tools do you know? Feel free to leave your answers in the comments section.

This article has been included in my interview mini-site, which contains modules such as Redis, JVM, Concurrency, MySQL, Spring, Spring MVC, Spring Boot, Spring Cloud, MyBatis, Design Patterns, Message Queuing and more.