
Optimization techniques for a large table of tens of millions

Views: 807 / Published 2025-03-31 10:06:47

Hello everyone, I'm Su San, good to see you again.

 

Preface

Big-table optimization is a well-worn topic, but as business scale grows, someone always gets burned by it.

Many people's databases perform well at first and queries run smoothly, but once a table reaches tens of millions of rows, performance problems start to surface: queries slow down, writes stall, pagination crawls, and the database occasionally goes down outright.

At that point you may wonder: has the database hit its limit? Is it time to buy stronger hardware?

In fact, in many cases the root cause is simply that the table has not been optimized properly.

Today, let's start from the nature of the problem and analyze, step by step, the common performance bottlenecks of large tables and how to optimize them. I hope you find it helpful.

1 Why are big tables slow?

Before optimizing, first understand the root cause of large-table performance problems: why does the database slow down as the data volume grows?

1.1 Disk IO Bottleneck

The data in a large table lives on disk, and a database query usually involves reading data blocks.

When the amount of data is large, a single query may require reading a large amount of data from multiple disk blocks, and the disk read and write speed will directly limit the query performance.

For example:

Suppose there is an orders table holding 50 million rows, and you want to query a user's 10 most recent orders:

SELECT * FROM orders WHERE user_id = 123 ORDER BY order_time DESC LIMIT 10;

Without an index, the database scans the entire table and then sorts the results, so performance is bound to be poor.

1.2 Index invalid or no index

If a query does not hit an index, the database performs a full table scan: it reads every row in the table, one by one.

With tens of millions of rows, this operation is extremely resource-intensive, and performance drops sharply.

For example, suppose you write the following query condition:

SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';

The DATE() function here forces the database to compute DATE(order_time) for every row, so the index on order_time cannot be used.

1.3 Paging performance degraded

Paginated queries are common on large tables, but deep pagination (say, past page 100) causes performance problems.

Even if you only need 10 rows, the database still has to scan past all the preceding records first.

For example:

Querying the 10 rows on page 1000:

SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;

This SQL makes the database read the first 9,990 rows, throw them away, and then return the next 10.

The larger the page number, the worse the query performs.

1.4 Lock contention

In high-concurrency scenarios, multiple threads inserting, deleting, updating, and querying the same table at the same time compete for row locks or table locks, and that contention hurts performance.

2 Overall idea of ​​performance optimization

The essence of performance optimization is to reduce unnecessary IO, computation, and lock contention; the goal is to make the database do as little useless work as possible.

The overall optimization idea can be summarized as follows:

  1. Design the table structure sensibly: avoid unnecessary fields, and split data out wherever it can be split.
  2. Make indexes efficient: design a reasonable index structure and avoid index failure.
  3. Optimize the SQL: use precise query conditions and minimize full table scans.
  4. Split databases and tables: reduce the data volume of any single table through horizontal and vertical splitting.
  5. Cache and go asynchronous: reduce direct pressure on the database.

Next, let's go through them one by one.

3 Table structure design optimization

The table structure is the foundation of database performance optimization; an unreasonable design leads to query and storage problems later on.

3.1 Keep field types lean

A field's type determines its storage size and affects query performance.

  • Use INT instead of BIGINT when the value range allows.
  • Use VARCHAR(100) instead of TEXT when lengths are bounded.
  • Store times in TIMESTAMP or DATETIME; don't store them in CHAR or VARCHAR.

For example:

-- Not recommended
CREATE TABLE orders (
  id BIGINT,
  user_id BIGINT,
  order_status VARCHAR(255),
  remarks TEXT
);

-- After optimization
CREATE TABLE orders (
  id BIGINT,
  user_id INT UNSIGNED,
  order_status TINYINT,  -- status represented as an enumeration
  remarks VARCHAR(500)   -- limit the maximum length
);

This saves storage space and is more efficient when querying.
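To make the savings concrete, here is a rough back-of-the-envelope sketch (an illustration, not a benchmark): narrowing one column from BIGINT (8 bytes) to INT (4 bytes) on a 50-million-row table frees roughly 190 MB of row data alone, before counting the extra copies stored in every index that includes the column.

```java
// Back-of-the-envelope: megabytes saved by narrowing one column across all rows.
class StorageSavings {
    static long savedMegabytes(long rows, long bytesSavedPerRow) {
        return rows * bytesSavedPerRow / (1024 * 1024);
    }

    public static void main(String[] args) {
        // BIGINT (8 bytes) -> INT (4 bytes) on 50 million rows
        System.out.println(savedMegabytes(50_000_000L, 8 - 4) + " MB"); // prints "190 MB"
    }
}
```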

If you are interested in table design, you can check out my earlier article 《18 Catch Rules for Table Design》, which gives a detailed introduction.

3.2 Table splitting: vertical splitting and horizontal splitting

Vertical split

When there are too many fields in the table and some fields are not frequently queried, the table can be split into multiple small tables according to business logic.

Example: split the order table into two tables, orders_basic and orders_details.

-- Basic information table
CREATE TABLE orders_basic (
  id BIGINT PRIMARY KEY,
  user_id INT UNSIGNED,
  order_time TIMESTAMP
);

-- Details table
CREATE TABLE orders_details (
  id BIGINT PRIMARY KEY,
  remarks VARCHAR(500),
  shipping_address VARCHAR(255)
);

Horizontal split

When the amount of data in a single table is too large, it can be split into multiple tables according to certain rules.

Example: Suppose we split the order table horizontally by user ID:

orders_0  -- holds orders with user_id % 2 = 0
orders_1  -- holds orders with user_id % 2 = 1

After splitting, the amount of data in each table will be greatly reduced, and the query performance will be significantly improved.
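The routing logic for such a split is tiny. Below is a minimal sketch (the helper name OrderTableRouter is invented for illustration) that maps a user_id to the physical table under the user_id % 2 rule above; the application or sharding middleware would use it to decide which table to query.

```java
// Picks the physical table for a given user_id under the % 2 split rule.
class OrderTableRouter {
    static String tableFor(long userId) {
        return "orders_" + (userId % 2);
    }

    public static void main(String[] args) {
        System.out.println(tableFor(123)); // odd id  -> orders_1
        System.out.println(tableFor(42));  // even id -> orders_0
    }
}
```

The same shape generalizes to N shards by replacing the literal 2 with the shard count, as long as that count stays stable (resharding is the painful part).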

4 Index optimization

Indexing is the sharpest tool in database performance optimization, yet many people use indexes incorrectly, so performance drops instead of improving.

4.1 Create a suitable index

Create indexes on frequently queried fields, such as primary keys, foreign keys, and fields used in query conditions.

Example:

CREATE INDEX idx_user_id_order_time ON orders (user_id, order_time DESC);

The composite index above simultaneously accelerates queries that filter on user_id and sort by order_time.

4.2 Avoid index failure

  • Don't apply functions or arithmetic to indexed fields
    Mistake:

    SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';

    Optimization:

    SELECT * FROM orders WHERE order_time >= '2023-01-01 00:00:00'
      AND order_time < '2023-01-02 00:00:00';
  • Watch out for implicit type conversion
    Mistake:

    SELECT * FROM orders WHERE user_id = '123';

    Optimization:

    SELECT * FROM orders WHERE user_id = 123;

    Matching the literal's type to the column's type is the safe habit; the truly dangerous direction is comparing a string column against a numeric literal, which forces a cast on every row and defeats the index.

If you are interested in index-failure problems, you can check out my earlier article 《Talking About 10 Scenarios of Index Failure — It's Such a Pit》, which gives a detailed introduction.

5 SQL optimization

5.1 Reduce query fields

Query only the fields you need and avoid SELECT *.

-- Mistake
SELECT * FROM orders WHERE user_id = 123;

-- Optimization
SELECT id, order_time FROM orders WHERE user_id = 123;

5.2 Pagination optimization

For deep pagination, use a cursor ("keyset") approach to avoid scanning too much data: remember the boundary value from the previous page and start the next page from there.

-- Deep pagination (poor performance)
SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;

-- Optimization: use a cursor (the boundary value from the previous page)
SELECT * FROM orders WHERE order_time < '2023-01-01 12:00:00'
  ORDER BY order_time DESC LIMIT 10;
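The idea behind the cursor query can be seen in a few lines of code. The in-memory sketch below (the names are my own, not a real pagination library) filters by "key smaller than the last one seen" and takes the next page, which is exactly what the WHERE order_time < … LIMIT 10 query does with an index on order_time:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Keyset ("cursor") pagination over keys sorted in descending order:
// each page starts strictly after the last key of the previous page,
// so no rows are read and thrown away, unlike OFFSET-based paging.
class KeysetPagination {
    static List<Integer> page(List<Integer> sortedDesc, Integer afterKey, int size) {
        return sortedDesc.stream()
                .filter(k -> afterKey == null || k < afterKey) // WHERE key < cursor
                .limit(size)                                   // LIMIT size
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(10, 9, 8, 7, 6, 5, 4, 3, 2, 1);
        List<Integer> page1 = page(keys, null, 3);                        // [10, 9, 8]
        List<Integer> page2 = page(keys, page1.get(page1.size() - 1), 3); // [7, 6, 5]
        System.out.println(page2); // prints "[7, 6, 5]"
    }
}
```

The trade-off: keyset pagination cannot jump straight to an arbitrary page number, but for "next page" navigation its cost stays constant no matter how deep the user scrolls.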

If you are interested in SQL optimization, you can check out my earlier article 《Talk About 15 Tips for SQL Optimization》, which gives a detailed introduction.

6 Splitting databases and tables

6.1 Horizontal sharding across databases and tables

When splitting a single table still cannot meet performance needs, the data can be spread across multiple databases by splitting the database as well.

Common sharding rules:

  • Modulo on the user ID.
  • Partition by time.

If you are interested in splitting databases and tables, you can check out my earlier article 《Alibaba Second-Round Interview: Why Do We Split Databases and Tables?》, which gives a detailed introduction.

7 Cache and asynchronous

7.1 Use Redis to cache hotspot data

Frequently queried data can be stored in Redis to reduce direct access to the database.

Example:

// Read from the cache first (a Jedis-style Redis client and a db helper are assumed here)
String result = redis.get("orders:user:123");
if (result == null) {
    // Cache miss: query the database, then backfill the cache
    result = db.query("SELECT * FROM orders WHERE user_id = 123");
    redis.setex("orders:user:123", 3600, result); // cache for 1 hour
}

7.2 Use message queues to handle write operations asynchronously

Under high write concurrency, you can put write operations into a message queue (such as Kafka) and then write them to the database asynchronously in batches, reducing the pressure on the database.
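A minimal sketch of the batching side (the class name and size threshold are invented for illustration; in production the queue would be Kafka and flushBatch would issue one multi-row INSERT):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Buffers incoming writes and hands them out in batches,
// standing in for a Kafka consumer that batch-inserts into the database.
class BatchWriter {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final int batchSize;

    BatchWriter(int batchSize) {
        this.batchSize = batchSize;
    }

    void submit(String order) {
        queue.add(order); // producer side: enqueue instead of hitting the DB directly
    }

    // Drains up to batchSize items; a real consumer would turn this
    // list into a single multi-row INSERT statement.
    List<String> flushBatch() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch, batchSize);
        return batch;
    }

    public static void main(String[] args) {
        BatchWriter writer = new BatchWriter(3);
        for (int i = 1; i <= 5; i++) {
            writer.submit("order-" + i);
        }
        System.out.println(writer.flushBatch().size()); // prints 3
        System.out.println(writer.flushBatch().size()); // prints 2
    }
}
```

The cost of this decoupling is eventual consistency: a read immediately after submit may not yet see the row in the database, which the business must be able to tolerate.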

If you are interested in Kafka's quirks, you can check out my earlier article 《Some Unusual Pitfalls I've Stepped on with Kafka over Two Years》, which gives a detailed introduction.

8 Practical cases

question:

The order table of an e-commerce system holds 50 million rows, and when users query order details, the page takes more than 10 seconds to load.

Solution:

  1. Vertically split the order table: move the order-details fields into a separate table.
  2. Create a composite index: index user_id and order_time together.
  3. Use a Redis cache: cache the last 30 days of orders in Redis.
  4. Optimize pagination: replace deep LIMIT pagination with a search_after-style cursor.

Summary

Large table performance optimization is a systematic project that requires comprehensive consideration from table structure, index, SQL to architectural design.

Tens of millions of rows sounds huge, but with reasonable splitting, index design, and caching strategies, the database can handle it with ease.

Most importantly, choose optimization strategies based on the characteristics of your business, and never blindly chase "high-end" solutions.

Hope these experiences can help you!

 
