Preface
Optimizing large tables is a well-worn topic, yet as business scale grows, people still keep falling into the same traps.
Many developers' databases perform well at first and queries are smooth. But once a table grows to tens of millions of rows, performance problems start to surface: slow queries, stalled writes, laggy pagination, and even the occasional outright outage.
At this point you may wonder: is the database just not good enough? Do I need to upgrade to stronger hardware?
In fact, in many cases the root cause is simply a lack of optimization.
Today we will start from the nature of the problem, working through the common performance bottlenecks of large tables and how to remove them, step by step.
I recently open-sourced a mall project based on Spring Boot + Vue + uniapp. It has many technical highlights; you are welcome to visit it and give it a star: [/dvsusan/susan_mall]
1. Why are large tables slow?
Before optimizing, first understand the root cause of large-table performance problems: why does a huge amount of data make the database slow?
1. Disk IO bottleneck
The data of large tables is stored on disk, and database queries usually involve reading data blocks.
When the amount of data is large, a single query may need to read a large amount of data from multiple disk blocks, and the read and write speed of the disk will directly limit the query performance.
Example:
Suppose there is an orders table storing 50 million rows, and you want to query a user's 10 most recent orders:
SELECT * FROM orders WHERE user_id = 123 ORDER BY order_time DESC LIMIT 10;
Without an index, the database has to scan the whole table and then sort the result, which is guaranteed to be slow.
2. Missing or ineffective indexes
If a query does not hit an index, the database falls back to a full table scan (Full Table Scan), reading every row in the table one by one.
This kind of operation consumes a lot of resources when dealing with tens of millions of data, and performance will drop sharply.
Example:
For example, if you write a condition like this when querying:
SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';
The DATE() function here forces the database to evaluate the expression on the order_time field of every record, so the index cannot be used.
3. Degraded paging performance
Paging queries are a very common scenario for large tables, but deep paging (say, beyond page 100) causes performance problems.
Even if you only need 10 rows, the database still has to scan past all the preceding rows first.
Example:
Querying the 10 rows on page 1,000:
SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;
This SQL makes the database read the first 9,990 rows, throw them away, and return only the next 10.
As the page number grows, query performance gets steadily worse.
4. Lock contention
In high-concurrency scenarios, many threads inserting, deleting, updating, and querying the same table at once contend for row locks or table locks, which degrades performance.
2. The overall idea of performance optimization
The essence of performance optimization is reducing unnecessary IO, computation, and lock contention; the goal is to make the database do as little wasted work as possible.
The overall idea of optimization can be summarized as the following points:
- Table structure design must be reasonable: Try to avoid unnecessary fields, and split the data if it can be split.
- Indexing needs to be efficient: Design a reasonable index structure to avoid index failure.
- SQL needs to be optimized: Query conditions are precise and full table scans are minimized.
- Shard databases and tables: Reduce the amount of data in a single table through horizontal and vertical splitting.
- Caching and asynchronization: Reduce direct pressure on the database.
Next, we will expand one by one.
3. Table structure design optimization
Table structure is the basis for database performance optimization. An unreasonably designed table structure will lead to subsequent query and storage performance problems.
1. Streamline field types
The type of field determines storage size and query performance.
- Use INT where it suffices; don't reach for BIGINT by default.
- Use VARCHAR(100) instead of TEXT where possible.
- For time fields, use TIMESTAMP or DATETIME; do not store times in CHAR or VARCHAR columns.
Example:
-- Not recommended
CREATE TABLE orders (
id BIGINT,
user_id BIGINT,
order_status VARCHAR(255),
remarks TEXT
);
-- After optimization
CREATE TABLE orders (
id BIGINT,
user_id INT UNSIGNED,
order_status TINYINT, -- the status is represented by an enumeration
remarks VARCHAR(500) -- limit the maximum length
);
This saves storage space and makes querying more efficient.
2. Table splitting: vertical splitting and horizontal splitting
Vertical splitting
When there are too many fields in the table, and some fields are not frequently queried, the table can be split into multiple small tables according to business logic.
Example:
Split the order table into two tables: orders_basic and orders_details.
-- Basic information table
CREATE TABLE orders_basic (
id BIGINT PRIMARY KEY,
user_id INT UNSIGNED,
order_time TIMESTAMP
);
-- Details table
CREATE TABLE orders_details (
id BIGINT PRIMARY KEY,
remarks VARCHAR(500),
shipping_address VARCHAR(255)
);
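After a vertical split, the hot order-list path reads only the narrow orders_basic table, and the wide columns are joined in just for the detail page. A runnable sketch, using SQLite in place of MySQL (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_basic   (id INTEGER PRIMARY KEY, user_id INT, order_time TEXT);
CREATE TABLE orders_details (id INTEGER PRIMARY KEY, remarks TEXT, shipping_address TEXT);
INSERT INTO orders_basic   VALUES (1, 123, '2023-01-01 12:00:00');
INSERT INTO orders_details VALUES (1, 'gift wrap', '42 Example Road');
""")

# Hot path: the order list touches only the narrow basic table
listing = conn.execute(
    "SELECT id, order_time FROM orders_basic WHERE user_id = 123").fetchall()

# Detail page: join the wide table in only when a single order is opened
detail = conn.execute(
    "SELECT b.id, b.order_time, d.remarks, d.shipping_address "
    "FROM orders_basic b JOIN orders_details d ON d.id = b.id "
    "WHERE b.id = 1").fetchone()
print(listing, detail)
```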
Horizontal splitting
When the amount of data in a single table is too large, it can be split into multiple tables according to certain rules.
Example:
Suppose we split the orders table horizontally by user ID:
orders_0 -- save orders with user_id % 2 = 0
orders_1 -- save orders with user_id % 2 = 1
After splitting, the amount of data in each table is greatly reduced, and query performance will be significantly improved.
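The application then needs a routing step that maps each user_id to its physical table. A minimal sketch of the modulo rule above (the function name is my own):

```python
def route_table(user_id: int, shards: int = 2) -> str:
    """Map a user to a physical table via user_id modulo the shard count."""
    return f"orders_{user_id % shards}"

# All of one user's orders land in the same shard, so single-user
# queries never have to touch the other tables.
print(route_table(123))  # orders_1
print(route_table(4))    # orders_0
```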
4. Index optimization
Indexes are the sharpest tool for database performance optimization, but many people use them poorly and end up making performance worse instead of better.
1. Create appropriate indexes
Create indexes for frequently queried fields, such as primary keys, foreign keys, and query condition fields.
Example:
CREATE INDEX idx_user_id_order_time ON orders (user_id, order_time DESC);
This composite index speeds up queries that filter on user_id and sort by order_time.
2. Avoid index failure
- Do not apply functions or arithmetic to indexed fields.
Wrong: SELECT * FROM orders WHERE DATE(order_time) = '2023-01-01';
Optimized:
SELECT * FROM orders WHERE order_time >= '2023-01-01 00:00:00' AND order_time < '2023-01-02 00:00:00';
- Watch out for implicit type conversions.
Wrong: SELECT * FROM orders WHERE user_id = '123';
Optimized:
SELECT * FROM orders WHERE user_id = 123;
5. SQL optimization
1. Reduce query fields
Query only the fields you actually need and avoid SELECT *.
-- mistake
SELECT * FROM orders WHERE user_id = 123;
-- optimization
SELECT id, order_time FROM orders WHERE user_id = 123;
2. Pagination optimization
For deep paging, use a cursor (keyset pagination) instead of a large OFFSET, so the database does not scan rows only to discard them.
-- Deep paging (poor performance)
SELECT * FROM orders ORDER BY order_time DESC LIMIT 9990, 10;
-- Optimization: use cursors
SELECT * FROM orders WHERE order_time < '2023-01-01 12:00:00'
ORDER BY order_time DESC LIMIT 10;
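Here is a runnable sketch of this keyset approach against a small SQLite table (standing in for MySQL): each page resumes from the order_time of the last row of the previous page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_time TEXT)")
conn.executemany("INSERT INTO orders (order_time) VALUES (?)",
                 [(f"2023-01-01 {h:02d}:00:00",) for h in range(24)])

def page(last_seen_time, size=10):
    # Keyset ("seek") pagination: resume right after the previous page's last row,
    # instead of skipping an ever-growing OFFSET
    return conn.execute(
        "SELECT id, order_time FROM orders WHERE order_time < ? "
        "ORDER BY order_time DESC LIMIT ?", (last_seen_time, size)).fetchall()

first = page("9999-12-31")     # first page: sentinel later than any real row
second = page(first[-1][1])    # next page starts from the last seen order_time
print(first[0][1], second[0][1])   # 2023-01-01 23:00:00 2023-01-01 13:00:00
```

Note that if order_time can repeat, a real cursor should also include a unique tiebreaker such as id in both the ORDER BY and the WHERE condition.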
6. Splitting databases and tables (sharding)
1. Horizontal sharding across databases and tables
When a single table still cannot meet performance requirements after splitting, the data can be spread across multiple databases and tables.
Common sharding rules for databases and tables:
- Take the modulo by user ID.
- Partition by time.
7. Caching and Asynchronization
1. Use Redis to cache hot data
High-frequency query data can be stored in Redis to reduce direct access to the database.
Example (a sketch of the cache-aside pattern; jedis is a Jedis client instance and orderDao is a hypothetical data-access object):
// 1. Try the cache first
String result = jedis.get("orders:user:123");
if (result == null) {
    // 2. Cache miss: load from the database (hypothetical DAO call)
    result = orderDao.queryOrdersByUserId(123);
    // 3. Cache the result for 1 hour
    jedis.setex("orders:user:123", 3600, result);
}
2. Use message queue to process write operations asynchronously
Under heavy write concurrency, write operations can be pushed to a message queue (such as Kafka) and then written to the database asynchronously in batches, reducing the direct pressure on the database.
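The shape of this pattern can be sketched without Kafka: producers only enqueue and return, while a single consumer drains the queue and flushes in batches. The sketch below stands in queue.Queue for the message broker and a plain list for the database:

```python
import queue
import threading

write_q = queue.Queue()
db_rows = []  # stand-in for the database table

def writer():
    # Single consumer: drain the queue and "write" in batches of up to 100
    while True:
        batch = [write_q.get()]                # block until something arrives
        while not write_q.empty() and len(batch) < 100:
            batch.append(write_q.get())
        db_rows.extend(batch)                  # one batched INSERT in real life
        for _ in batch:
            write_q.task_done()

threading.Thread(target=writer, daemon=True).start()

# Producers (e.g. web request handlers) just enqueue and return immediately
for i in range(250):
    write_q.put({"order_id": i})

write_q.join()       # wait until every queued write has been flushed
print(len(db_rows))  # 250
```

With a real broker, the consumer typically polls a batch of records and writes them with one multi-row INSERT, which is far cheaper than one transaction per row.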
8. A practical case
Problem:
The order table of an e-commerce system stores 50 million records. When users query order details, the page loading time exceeds 10 seconds.
Solution:
- Split the order table vertically: move the order-detail fields into a separate table.
- Create a composite index: index user_id and order_time together.
- Use a Redis cache: cache the last 30 days of orders in Redis.
- Optimize pagination: use search_after-style keyset pagination instead of LIMIT for deep pages.
9. Summary
Large table performance optimization is a systematic project that requires all-round consideration from table structure, index, SQL to architecture design.
The amount of data in the tens of millions may seem huge, but through reasonable splitting, index design and caching strategies, the database can handle it easily.
Most importantly, choose optimization strategies that fit your business characteristics; do not blindly chase "high-end" solutions.
Hope these experiences can help you!
One last thing: if this article helped or inspired you, consider following my WeChat public account of the same name, "Su San talks about technology". Your support is my biggest motivation to keep writing.