Location>code7788 >text

Cross-pagination query after challenging database and table division

Popularity:342 ℃/2025-02-18 16:29:02

Imagine you have a library (single library and single table) with all books placed on the bookshelf in order. When you are looking for books 100-110, you can get it by counting directly to the 100th. But after the library's books exploded, the curator decided:

  1. Distribution: Divide the book into 10 rooms (10 databases)

  2. Table: Each room is divided into 20 bookshelves (20 tables)
    Each bookshelves only have books with specific rules (such as taking the model by ID:ID % 200

The problem is here
When the user asks to "arrange in reverse order of time and display the 1000-1010th data":

  • Bookshelf in each room are sorted independently

  • Can't directly know where the 1000th data in the global area is

Solution:

Imagine you manage 10 express sorting stations (distribution and table), each site has its own parcel number. Now we need to find out the 1000-1010 packages issued in the country:

  • Stupid method: Let all sites send the first 1010 packages to the headquarters, and select the required ones after the headquarters is sorted (global sorting method)

  • Smart way

    1. First ask the shipping time of the 1010th parcel on each site

    2. Time to find the 1010th parcel in the country

    3. Let each site send only packages later than this time and sort them again (quadratic query method)

  • The lazy but effective way: Copy all the package information to the headquarters’ smart blackboard (Elasticsearch) and check the blackboard at any time!

The following are the solutions, advantages, disadvantages and practices:

plan Applicable scenarios Difficulty in technology implementation shortcoming
Global sorting method Small data volume, few shards Low Poor performance, deep paging explosion
Quadratic query method There are indexes in the sorting field, and there are many shards middle Need two inquiries
Cursor paging method There are globally ordered fields (such as ID, timestamp) Low Random page jump is not supported
Search engine assistance High-frequency complex query, tolerating second-level delay high

High maintenance costs and data delays

(1) Global sorting method (simple but inefficient)

Implementation steps

  1. Each shard is executed independentlyORDER BY create_time DESC LIMIT 0, 1010

  2. Summarize the results of all shards into a central node

  3. The central node sorts all data globally, and then selects items 1000-1010.

shortcoming

  • The more shards, the more exponentially the data transmission volume increases

  • Performance disasters when deep paging (offset is large) (for example: offset=1 million)

(2) Quadratic query method (optimization performance)

Core idea: Avoid full data transmission, narrow the scope in two queries

Step Example(Query Articles 1000-1010):

  1. First query: Each shard returns the first 1010 data piecesMinimum time stamp

    • For example, the minimum timestamp of shard 1 is T1, and shard 2 is T2...

  2. Take the 1010th smallest global timestamp in all shards T_global

  3. Second query: Query of each fragmentcreate_time >= T_globalData of

  4. After summarizing, sort the final result again

advantage

  • Significantly reduce data transmission

  • Suitable for the case where the sorting field has an index (such as a timestamp)

(3) Cursor paging method (avoid deep paging)

Core idea: Use continuous and unique fields (such as self-increment ID, timestamp) as cursors

Advantages

  • Avoid completelyoffsetPerformance issues

  • Naturally suitable for library and table division, each fragment needs to be filtered according to the cursor conditions

limit

  • There must be a globally ordered and unique field

  • Users cannot jump directly to any page number (such as page 1000)

(4)Search Engine Assistance (Ultimate Weapon)

Applicable scenarios: High frequency complex paging query
Implementation method

  1. Synchronize the data of the database and tables to Elasticsearch/Solr in real time

  2. All paging queries go directly to the search engine

  3. Search engine internal maintenance global sorting

Advantages

  • Completely block the complexity of library and tables

  • Support complex condition filtering + high concurrent query

cost

  • Additional systems are required to be maintained, and data consistency delays are increased

Suggestions in actual development

  1. Priority to cursor paging

    • For example, the waterfall flow page of the App will only continue to decline and do not need to jump to pages

  2. Limit maximum paging depth

    • For example, you can view the first 100 pages at most to prevent malicious deep paging attacks

  3. Combined with business optimization

    • When e-commerce sorts by price, the price can be segmented in advance (0-100 yuan, 100-200 yuan...)

  4. The ultimate solution

    • The combination of library and table + Elasticsearch, sacrifices certain real-time quality for high-performance query