Preface
Optimizing large tables is a well-worn topic, yet as business scale grows, people still keep falling into the same traps.
Many developers' databases perform well at first and queries are smooth. But once a table grows to tens of millions of rows, performance problems start to surface: slow queries, stalled writes, laggy pagination, and even the occasional outright outage.
At this point you may wonder: is the database just not good enough? Do I need to upgrade to stronger hardware?
In fact, in many cases the root cause is simply a lack of optimization.
Today we will start from the nature of the problem, working through the common performance bottlenecks of large tables and how to remove them, step by step.
I recently open-sourced a mall project based on Spring Boot + Vue + uniapp. It has many technical highlights; you are welcome to visit it and give it a star: [/dvsusan/susan_mall]
1. Why are large tables slow?
Before optimizing, first understand the root cause of large-table performance problems: why does a huge amount of data make the database slow?
1. Disk IO bottleneck
The data of large tables is stored on disk, and database queries usually involve reading data blocks.
When the amount of data is large, a single query may need to read a large amount of data from multiple disk blocks, and the read and write speed of the disk will directly limit the query performance.
Example:
Suppose there is an orders table storing 50 million rows, and you want to query a user's 10 most recent orders:
SELECT * FROM orders WHERE user_id = 123 ORDER BY order_time DESC LIMIT 10;
Without an index, the database has to scan the whole table and then sort the result, which is guaranteed to be slow.
2. Missing or ineffective indexes
If a query does not hit an index, the database falls back to a full table scan (Full Table Scan), reading every row in the table one by one.
This kind of operation consumes a lot of resources when dealing with tens of millions of data, and performance will drop sharply.
Example:
For example, if you write a condition like this when querying:
SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';
The DATE() function here forces the database to evaluate the expression on the order_time field of every record, so the index cannot be used.
3. Degraded paging performance
Paging queries are a very common scenario for large tables, but deep paging (say, beyond page 100) causes performance problems.
Even if you only need 10 rows, the database still has to scan past all the preceding rows first.
Example:
Querying the 10 rows on page 1,000:
SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;
This SQL makes the database read the first 9,990 rows, throw them away, and return only the next 10.
As the page number grows, query performance gets steadily worse.
4. Lock contention
In high-concurrency scenarios, many threads inserting, deleting, updating, and querying the same table at once contend for row locks or table locks, which degrades performance.
2. The overall idea of performance optimization
The essence of performance optimization is reducing unnecessary IO, computation, and lock contention; the goal is to make the database do as little wasted work as possible.
The overall idea of optimization can be summarized as the following points:
- Table structure design must be reasonable: Try to avoid unnecessary fields, and split the data if it can be split.
- Indexing needs to be efficient: Design a reasonable index structure to avoid index failure.
- SQL needs to be optimized: Query conditions are precise and full table scans are minimized.
- Shard databases and tables: Reduce the amount of data in a single table through horizontal and vertical splitting.
- Caching and asynchronization: Reduce direct pressure on the database.
Next, we will expand one by one.
3. Table structure design optimization
Table structure is the basis for database performance optimization. An unreasonably designed table structure will lead to subsequent query and storage performance problems.
1. Streamline field types
The type of field determines storage size and query performance.
- Use INT where it suffices; don't reach for BIGINT by default.
- Use VARCHAR(100) instead of TEXT where possible.
- For time fields, use TIMESTAMP or DATETIME; do not store times in CHAR or VARCHAR columns.
Example:
-- Not recommended
CREATE TABLE orders (
id BIGINT,
user_id BIGINT,
order_status VARCHAR(255),
remarks TEXT
);
-- After optimization
CREATE TABLE orders (
id BIGINT,
user_id INT UNSIGNED,
order_status TINYINT, -- the status is represented by an enumeration
remarks VARCHAR(500) -- limit the maximum length
);
This saves storage space and makes querying more efficient.
2. Table splitting: vertical splitting and horizontal splitting
Vertical splitting
When there are too many fields in the table, and some fields are not frequently queried, the table can be split into multiple small tables according to business logic.
Example:
Split the order table into two tables: orders_basic and orders_details.
-- Basic information table
CREATE TABLE orders_basic (
id BIGINT PRIMARY KEY,
user_id INT UNSIGNED,
order_time TIMESTAMP
);
-- Details table
CREATE TABLE orders_details (
id BIGINT PRIMARY KEY,
remarks VARCHAR(500),
shipping_address VARCHAR(255)
);
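After a vertical split, the hot order-list path reads only the narrow orders_basic table, and the wide columns are joined in just for the detail page. A runnable sketch, using SQLite in place of MySQL (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_basic   (id INTEGER PRIMARY KEY, user_id INT, order_time TEXT);
CREATE TABLE orders_details (id INTEGER PRIMARY KEY, remarks TEXT, shipping_address TEXT);
INSERT INTO orders_basic   VALUES (1, 123, '2023-01-01 12:00:00');
INSERT INTO orders_details VALUES (1, 'gift wrap', '42 Example Road');
""")

# Hot path: the order list touches only the narrow basic table
listing = conn.execute(
    "SELECT id, order_time FROM orders_basic WHERE user_id = 123").fetchall()

# Detail page: join the wide table in only when a single order is opened
detail = conn.execute(
    "SELECT b.id, b.order_time, d.remarks, d.shipping_address "
    "FROM orders_basic b JOIN orders_details d ON d.id = b.id "
    "WHERE b.id = 1").fetchone()
print(listing, detail)
```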
Horizontal splitting
When the amount of data in a single table is too large, it can be split into multiple tables according to certain rules.
Example:
Suppose we split the orders table horizontally by user ID:
orders_0 -- save orders with user_id % 2 = 0
orders_1 -- save orders with user_id % 2 = 1
After splitting, the amount of data in each table is greatly reduced, and query performance will be significantly improved.
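The application then needs a routing step that maps each user_id to its physical table. A minimal sketch of the modulo rule above (the function name is my own):

```python
def route_table(user_id: int, shards: int = 2) -> str:
    """Map a user to a physical table via user_id modulo the shard count."""
    return f"orders_{user_id % shards}"

# All of one user's orders land in the same shard, so single-user
# queries never have to touch the other tables.
print(route_table(123))  # orders_1
print(route_table(4))    # orders_0
```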
4. Index optimization
Indexes are the sharpest tool for database performance optimization, but many people use them poorly and end up making performance worse instead of better.
1. Create appropriate indexes
Create indexes for frequently queried fields, such as primary keys, foreign keys, and query condition fields.
Example:
CREATE INDEX idx_user_id_order_time ON orders (user_id, order_time DESC);
This composite index speeds up queries that filter on user_id and sort by order_time.
2. Avoid index failure
- Do not apply functions or arithmetic to indexed fields.
Wrong: SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';
Optimized:
SELECT * FROM orders WHERE order_time >= '2023-01-01 00:00:00' AND order_time < '2023-01-02 00:00:00';
- Watch out for implicit type conversions.
Wrong: SELECT * FROM orders WHERE user_id = '123';
Optimized:
SELECT * FROM orders WHERE user_id = 123;
5. SQL optimization
1. Reduce query fields
Query only the fields you actually need and avoid SELECT *.
-- mistake
SELECT * FROM orders WHERE user_id = 123;
-- optimization
SELECT id, order_time FROM orders WHERE user_id = 123;
2. Pagination optimization
For deep paging, use a cursor (keyset pagination) instead of a large OFFSET, so the database does not scan rows only to discard them.
-- Deep paging (poor performance)
SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;
-- Optimization: use cursors
SELECT * FROM orders WHERE order_time < '2023-01-01 12:00:00'
ORDER BY order_time DESC LIMIT 10;
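Here is a runnable sketch of this keyset approach against a small SQLite table (standing in for MySQL): each page resumes from the order_time of the last row of the previous page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_time TEXT)")
conn.executemany("INSERT INTO orders (order_time) VALUES (?)",
                 [(f"2023-01-01 {h:02d}:00:00",) for h in range(24)])

def page(last_seen_time, size=10):
    # Keyset ("seek") pagination: resume right after the previous page's last row,
    # instead of skipping an ever-growing OFFSET
    return conn.execute(
        "SELECT id, order_time FROM orders WHERE order_time < ? "
        "ORDER BY order_time DESC LIMIT ?", (last_seen_time, size)).fetchall()

first = page("9999-12-31")     # first page: sentinel later than any real row
second = page(first[-1][1])    # next page starts from the last seen order_time
print(first[0][1], second[0][1])   # 2023-01-01 23:00:00 2023-01-01 13:00:00
```

Note that if order_time can repeat, a real cursor should also include a unique tiebreaker such as id in both the ORDER BY and the WHERE condition.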
6. Splitting databases and tables (sharding)
1. Horizontal sharding across databases and tables
When a single table still cannot meet performance requirements after splitting, the data can be spread across multiple databases and tables.
Common sharding rules for databases and tables:
- Take the modulo by user ID.
- Partition by time.
7. Caching and Asynchronization
1. Use Redis to cache hot data
High-frequency query data can be stored in Redis to reduce direct access to the database.
Example (a sketch of the cache-aside pattern; jedis is a Jedis client instance and orderDao is a hypothetical data-access object):
// 1. Try the cache first
String result = jedis.get("orders:user:123");
if (result == null) {
    // 2. Cache miss: load from the database (hypothetical DAO call)
    result = orderDao.queryOrdersByUserId(123);
    // 3. Cache the result for 1 hour
    jedis.setex("orders:user:123", 3600, result);
}
2. Use message queue to process write operations asynchronously
Under heavy write concurrency, write operations can be pushed to a message queue (such as Kafka) and then written to the database asynchronously in batches, reducing the direct pressure on the database.
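The shape of this pattern can be sketched without Kafka: producers only enqueue and return, while a single consumer drains the queue and flushes in batches. The sketch below stands in queue.Queue for the message broker and a plain list for the database:

```python
import queue
import threading

write_q = queue.Queue()
db_rows = []  # stand-in for the database table

def writer():
    # Single consumer: drain the queue and "write" in batches of up to 100
    while True:
        batch = [write_q.get()]                # block until something arrives
        while not write_q.empty() and len(batch) < 100:
            batch.append(write_q.get())
        db_rows.extend(batch)                  # one batched INSERT in real life
        for _ in batch:
            write_q.task_done()

threading.Thread(target=writer, daemon=True).start()

# Producers (e.g. web request handlers) just enqueue and return immediately
for i in range(250):
    write_q.put({"order_id": i})

write_q.join()       # wait until every queued write has been flushed
print(len(db_rows))  # 250
```

With a real broker, the consumer typically polls a batch of records and writes them with one multi-row INSERT, which is far cheaper than one transaction per row.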
8. A practical case
Problem:
The order table of an e-commerce system stores 50 million records. When users query order details, the page loading time exceeds 10 seconds.
Solution:
- Split the order table vertically: move the order-detail fields into a separate table.
- Create a composite index: index user_id and order_time together.
- Use a Redis cache: cache the last 30 days of orders in Redis.
- Optimize pagination: use search_after-style keyset pagination instead of LIMIT for deep pages.
9. Summary
Large table performance optimization is a systematic project that requires all-round consideration from table structure, index, SQL to architecture design.
The amount of data in the tens of millions may seem huge, but through reasonable splitting, index design and caching strategies, the database can handle it easily.
Most importantly, choose optimization strategies that fit your business characteristics; do not blindly chase "high-end" solutions.
Hope these experiences can help you!
One last thing: if this article helped or inspired you, consider following my WeChat public account of the same name, "Su San talks about technology". Your support is my biggest motivation to keep writing.