
How to Optimize SQL Server for Large Data Volumes


Hello everyone, I am V Brother. I have covered plenty of databases, and a reader asked me to cover SQL Server too. Well, V Brother will be an obedient porter, so today let's talk about SQL Server.

In SQL Server, as the volume of data grows, database performance can suffer: queries slow down, response times stretch out, and other problems appear. To cope with large data volumes, here are some commonly used optimization strategies with worked examples. This piece ran to about 15,000 words as I wrote it, and original content isn't easy, so like first and read after, and build that good habit:

1. Index optimization

  • Creating indexes: indexing can significantly improve query speed, especially in WHERE, JOIN, and ORDER BY clauses. Create appropriate indexes for frequently queried fields (especially filter condition fields).
  • Choosing the right index type: use clustered indexes (Clustered Index) and non-clustered indexes (Non-clustered Index) to optimize query performance. Clustered indexes suit sorting and range queries, while non-clustered indexes suit queries on a single column or a combination of columns.
  • Avoid excessive indexing: While indexes can improve query performance, too many indexes can increase the cost of update, insert, and delete operations, so balance the number of indexes with performance.

In SQL Server, index optimization is an important tool for improving query performance. Here is a concrete business scenario: suppose we have a sales order system with an Orders table, and index optimization must be driven by the different query requirements.

Business scenario

  • Query requirement 1: look up order information by CustomerID and OrderDate.
  • Query requirement 2: look up all related orders by ProductID.
  • Query requirement 3: look up the details of a single order (via OrderID).

Based on these requirements, we will create indexes on the Orders table and show how to choose the right index type.

1. Create the Orders table

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY, -- primary key index, automatically creates a clustered index
    CustomerID INT, -- customer ID
    OrderDate DATETIME, -- Order Date
    ProductID INT, -- Product ID
    TotalAmount DECIMAL(18, 2), -- total amount of the order
    Status VARCHAR(20) -- Order Status
);

2. Creation of indexes

2.1. Creating a Clustered Index

A clustered index is usually created on the primary key or a unique constraint. It stores the rows in index order, so the clustered index on OrderID makes lookups by OrderID fast.

-- OrderID is the primary key, and a clustered index is created on it by default,
-- so in this case there is no need to create an additional clustered index

2.2. Creating a Non-clustered Index

For the queries that filter on the combination of CustomerID and OrderDate, we can create a composite non-clustered index. This speeds up queries that filter on both CustomerID and OrderDate.

CREATE NONCLUSTERED INDEX idx_Customer_OrderDate
ON Orders (CustomerID, OrderDate);
  • Usage scenario: this index speeds up queries filtered by CustomerID and OrderDate, especially when the volume of order data is large.

2.3. Creating a Single-Column Non-clustered Index

For query requirement 2, where we need to find all related orders by ProductID, we can create a single-column non-clustered index on ProductID. This will improve query efficiency.

CREATE NONCLUSTERED INDEX idx_ProductID
ON Orders (ProductID);
  • Usage Scenarios: When querying all orders related to a product, query performance can be significantly improved with this index.

3. Remove redundant indexes

If a particular query frequently accesses several columns on which we have created multiple single-column indexes, performance may suffer: every extra non-clustered index adds overhead to insert and update operations. To avoid this, periodically check for and remove redundant indexes.

Suppose we find that ProductID and CustomerID often appear together in query conditions. We can then consider dropping the idx_ProductID index and creating a combined index instead.

-- Remove redundant single-column indexes
DROP INDEX idx_ProductID ON Orders;

4. Query optimization

Now, given the following queries, here is how the created indexes optimize query performance.

4.1. Query by CustomerID and OrderDate

-- uses the idx_Customer_OrderDate index
SELECT OrderID, ProductID, TotalAmount
FROM Orders
WHERE CustomerID = 1001 AND OrderDate BETWEEN '2024-01-01' AND '2024-12-31';

4.2. Query by ProductID

-- using the idx_ProductID index
SELECT OrderID, CustomerID, TotalAmount
FROM Orders
WHERE ProductID = 500;

4.3. Querying specific order details

-- Query by OrderID, using the default clustered index
SELECT CustomerID, ProductID, TotalAmount, Status
FROM Orders
WHERE OrderID = 123456;

5. Cautions

  • Index maintenance cost: while indexes can significantly improve query performance, they must also be maintained whenever an INSERT, UPDATE, or DELETE operation runs, which raises the cost of those operations. So do not create too many indexes; tune them to the actual query workload.
  • Covering indexes: where possible, create covering indexes, i.e., indexes that contain all the columns a query needs, so the query avoids key lookups back to the base table and runs faster (see the sketch below).
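
For query requirement 1, a minimal covering-index sketch (the index name is illustrative) uses INCLUDE to carry the selected columns in the index leaf pages:

CREATE NONCLUSTERED INDEX idx_Customer_OrderDate_Covering
ON Orders (CustomerID, OrderDate)
INCLUDE (ProductID, TotalAmount);

With this index, the query in section 4.1 can be answered entirely from the index, with no lookup back into the clustered index.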

Summary

By creating suitable indexes on the Orders table, we can significantly improve query performance. When optimizing indexes, weigh the query requirements, the index types (clustered vs. non-clustered), and the number of indexes against their maintenance cost.

2. Query optimization

  • Optimizing SQL queries: make each SQL query as efficient as possible. Avoid SELECT *; select only the columns you need, avoid repeated computation, and minimize subqueries.
  • Using the execution plan: use the execution plan tool in SQL Server Management Studio (SSMS) to view a query's plan and analyze and optimize its bottlenecks.
  • Avoiding complex nested queries: complex subqueries may cause performance problems; consider replacing them with joins (JOIN).

Query optimization is the process of improving query performance by writing SQL statements carefully and using indexes well. Based on the business scenario above, we will use the order system's Orders table to show several common query optimization approaches.

Business scenario

Suppose we have a sales order system whose Orders table includes the following fields:

  • OrderID: Order ID, primary key.
  • CustomerID: Customer ID.
  • OrderDate: Order Date.
  • ProductID: Product ID.
  • TotalAmount: Total amount of the order.
  • Status: Order status (e.g. paid, unpaid, etc.).

We have the following query requirements:

  1. Queries all orders for a particular customer over a certain period of time.
  2. Queries the sales of a product across all orders.
  3. Queries the details of an order.
  4. Query order information for multiple customers.

1. Query optimization: query orders by CustomerID and OrderDate

Query Request:

Queries all orders for a particular customer over a certain period of time.

Query statement:

SELECT OrderID, ProductID, TotalAmount, Status
FROM Orders
WHERE CustomerID = 1001
  AND OrderDate BETWEEN '2024-01-01' AND '2024-12-31';

Optimization Recommendations:

  • Index optimization: create a composite index on CustomerID and OrderDate, since this is a common query pattern. The composite index speeds up queries that filter on both fields.
CREATE NONCLUSTERED INDEX idx_Customer_OrderDate
ON Orders (CustomerID, OrderDate);

Execution plan optimization:

  • Use SET STATISTICS IO ON or SSMS's actual execution plan to inspect the plan and confirm that the query uses the index (SQL Server has no MySQL-style EXPLAIN statement); a sketch follows.
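
A minimal check, assuming the query from above: turn on I/O statistics, run the query, and look for an index seek on idx_Customer_OrderDate in the plan and a low logical-reads count in the Messages tab:

SET STATISTICS IO ON;

SELECT OrderID, ProductID, TotalAmount
FROM Orders
WHERE CustomerID = 1001
  AND OrderDate BETWEEN '2024-01-01' AND '2024-12-31';

SET STATISTICS IO OFF;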

2. Query optimization: query all related orders by ProductID

Query Request:

Queries all orders for a particular product.

Query statement:

SELECT OrderID, CustomerID, TotalAmount, Status
FROM Orders
WHERE ProductID = 500;

Optimization Recommendations:

  • Index optimization: create an index on ProductID, because this field is often used as a filter condition.
CREATE NONCLUSTERED INDEX idx_ProductID
ON Orders (ProductID);

Execution plan optimization:

  • Ensure that the query can use the idx_ProductID index and avoid a full table scan.

3. Query optimization: query details of an order

Query Request:

Queries the details of an order.

Query statement:

SELECT CustomerID, ProductID, TotalAmount, Status
FROM Orders
WHERE OrderID = 123456;

Optimization Recommendations:

  • Index optimization: because OrderID is the primary key, SQL Server automatically creates a clustered index on it. A query that filters on OrderID will use that clustered index directly.
-- Clustered indexes are automatically created, no need to create additional ones

Execution plan optimization:

  • Ensure that the query touches only one row by seeking on the OrderID primary key index.

4. Query optimization: query order information for multiple customers

Query Request:

Query order information for multiple customers.

Query statement:

SELECT OrderID, CustomerID, ProductID, TotalAmount, Status
FROM Orders
WHERE CustomerID IN (1001, 1002, 1003);

Optimization Recommendations:

  • Index optimization: create an index on CustomerID to quickly filter out the target customers' orders.
CREATE NONCLUSTERED INDEX idx_CustomerID
ON Orders (CustomerID);

Execution plan optimization:

  • Ensure that the IN clause uses the idx_CustomerID index to optimize the query.

5. Query optimization: avoid SELECT *

Query Request:

Query all fields (not recommended, usually used for debugging or checking table structure).

Query statement:

SELECT * FROM Orders;

Optimization Recommendations:

  • Select only the required columns: avoid SELECT *; explicitly list the fields the query needs so unnecessary columns are not read.
SELECT OrderID, CustomerID, TotalAmount FROM Orders;

6. Query optimization: use JOIN for multi-table queries

Query Request:

Query a customer's order information together with the related product information. Suppose there is a Products table containing ProductID and ProductName.

Query statement:

SELECT o.OrderID, o.TotalAmount, p.ProductName
FROM Orders o
JOIN Products p ON o.ProductID = p.ProductID
WHERE o.CustomerID = 1001
  AND o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31';

Optimization Recommendations:

  • Index optimization: create a composite index on Orders (CustomerID, OrderDate, ProductID) and an index on Products (ProductID) to speed up the JOIN query.
CREATE NONCLUSTERED INDEX idx_Orders_Customer_OrderDate_Product
ON Orders (CustomerID, OrderDate, ProductID);

CREATE NONCLUSTERED INDEX idx_Products_ProductID
ON Products (ProductID);

Execution plan optimization:

  • Ensure that the execution plan uses the JOIN-related indexes and avoids full table scans.

7. Query Optimization: Paged Queries

Query Request:

Queries customer orders within a certain time period and implements the paging feature.

Query statement:

SELECT OrderID, CustomerID, TotalAmount, Status
FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY OrderDate
OFFSET 0 ROWS FETCH NEXT 20 ROWS ONLY;

Optimization Recommendations:

  • Index optimization: make sure OrderDate has a suitable index so the sort can be satisfied efficiently.
  • Use OFFSET and FETCH to implement paging and avoid loading a large amount of data at once (a keyset alternative is sketched after the index definition below).
CREATE NONCLUSTERED INDEX idx_OrderDate
ON Orders (OrderDate);
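
OFFSET-based paging still reads and discards all skipped rows, so deep pages get slower. A keyset (seek) pagination sketch, assuming @LastOrderDate and @LastOrderID hold the last row of the previous page (the values here are illustrative):

DECLARE @LastOrderDate DATETIME = '2024-06-15';
DECLARE @LastOrderID INT = 100200;

SELECT TOP (20) OrderID, CustomerID, TotalAmount, Status
FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
  AND (OrderDate > @LastOrderDate
       OR (OrderDate = @LastOrderDate AND OrderID > @LastOrderID))
ORDER BY OrderDate, OrderID;

Because the WHERE clause seeks directly to the start of the page, each page costs roughly the same regardless of depth.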

8. Avoid too many subqueries

Query Request:

Queries the total amount of orders for a particular customer over a period of time.

Query statement:

SELECT CustomerID, 
       (SELECT SUM(TotalAmount) FROM Orders WHERE CustomerID = 1001 AND OrderDate BETWEEN '2024-01-01' AND '2024-12-31') AS TotalSpent
FROM Customers
WHERE CustomerID = 1001;

Optimization Recommendations:

  • Avoid subqueries: where possible, replace a subquery in the SELECT list with a JOIN or GROUP BY to improve efficiency.
SELECT o.CustomerID, SUM(o.TotalAmount) AS TotalSpent
FROM Orders o
WHERE o.CustomerID = 1001
  AND o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY o.CustomerID;

Summary

By optimizing SQL query statements, using indexes wisely, and reducing unnecessary operations, we can significantly improve query performance. Specific practices include:

  • Create appropriate indexes (single column and compound).
  • Optimize query statements; avoid SELECT * and excessive subqueries.
  • Use appropriate paging techniques and JOINs to optimize multi-table queries.
  • Analyze query execution plans to ensure efficient query execution.

These optimizations help SQL Server maintain efficient query performance in the face of large amounts of data.

3. Data partitioning and sharding

  • Table partitioning: for very large tables, consider table partitioning. Partitioning splits the data into multiple physical segments based on some condition (e.g., time or ID ranges), so queries touch only the relevant partitions instead of scanning the whole table.
  • Horizontal splitting (sharding): spread data across multiple independent tables or databases, usually based on some rule (e.g., region or date). Each table holds a subset of the data and can be queried more efficiently.

Data partitioning (Partitioning) and sharding are key means of optimizing database performance, especially with large data volumes. Partitioning or sharding effectively reduces query and write pressure and improves data access efficiency. The following code examples, based on a business scenario, show how to use partitioning and sharding to optimize SQL Server performance.

Business scenario

Suppose we have an order system whose Orders table records all order information. As order volume grows, querying and maintaining a single table becomes increasingly difficult, so we use partitioning and sharding to optimize database performance.

1. Data Partitioning

Data partitioning is a logical split within a single table: a large table is divided into multiple physical segments (partitions) according to some rule (e.g., a time range or value interval). Each partition can be managed independently, and queries can be confined to specific partitions, which improves query performance.

Business need

  • Partition the Orders table by order date (OrderDate) so that queries can quickly locate orders within a specific time range.

Steps:

  1. Create a partition function and a partition scheme.
  2. Apply the partitioning to the Orders table.

Creating a Partition Function (Partition Function)

-- CREATE PARTITION FUNCTION: Partition by year
CREATE PARTITION FUNCTION OrderDatePartitionFunc (DATE)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');

This partition function splits the data into intervals based on the order date (OrderDate), with each interval covering one year.

Creating a Partition Scheme (Partition Scheme)

-- Creating a partition scheme: applying partition functions to physical file groups
CREATE PARTITION SCHEME OrderDatePartitionScheme
AS PARTITION OrderDatePartitionFunc
TO ([PRIMARY], [FG_2023], [FG_2024], [FG_2025]);

This scheme assigns a physical filegroup to each partition (e.g., PRIMARY, FG_2023, and so on).

Creating a partitioned table

-- Creating a partitioned table:Application partitioning scheme
CREATE TABLE Orders
(
    OrderID INT,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20),
    -- on a partitioned table, the partitioning column must be part of the unique key
    CONSTRAINT PK_Orders PRIMARY KEY (OrderID, OrderDate)
)
ON OrderDatePartitionScheme (OrderDate);

The Orders table is now partitioned on the OrderDate field, and its data is distributed across the physical filegroups by date.

Query Optimization

-- Query the 2024 orders; only the matching partition is accessed, improving query efficiency
SELECT OrderID, CustomerID, ProductID, TotalAmount
FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-12-31';

By partitioning, the query will only scan the data in the relevant partition, thus increasing the speed of the query.
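
To verify the layout, SQL Server's $PARTITION function maps a value to its partition number; a quick sketch for counting rows per partition:

SELECT $PARTITION.OrderDatePartitionFunc(OrderDate) AS PartitionNumber,
       COUNT(*) AS RowsInPartition
FROM Orders
GROUP BY $PARTITION.OrderDatePartitionFunc(OrderDate);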

2. Data Sharding

Sharding splits data horizontally across multiple physical tables, each storing a portion of the rows. Common strategies include range-based and hash-based sharding. Sharding can significantly improve query performance, but it requires managing multiple tables and their relationships.

Business need

  • Shard the Orders table by CustomerID, assigning rows to different tables based on customer ID.
  • Customer IDs are evenly distributed, so we can use a hash-based sharding strategy.

Steps:

  1. Create multiple shard tables.
  2. Handle the sharding logic at the application layer.

Creating shard tables

Suppose we decide to hash Orders rows by CustomerID into 4 tables. The 4 shard tables can be created as follows:

-- Create shard table Orders_1
CREATE TABLE Orders_1
(
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

-- Create shard table Orders_2
CREATE TABLE Orders_2
(
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

-- Create shard table Orders_3
CREATE TABLE Orders_3
(
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

-- Create shard table Orders_4
CREATE TABLE Orders_4
(
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

Sharding logic

At the application layer, we need shard-routing logic that uses a hash value to decide which table data should be inserted into or queried from.

-- Example: choose the shard table by the hash of CustomerID
DECLARE @CustomerID INT = 1001;
DECLARE @TableSuffix INT;

-- Use a hash algorithm to determine the table
SET @TableSuffix = @CustomerID % 4;

-- insert data
IF @TableSuffix = 0
BEGIN
    INSERT INTO Orders_1 (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
    VALUES (123456, 1001, '2024-01-01', 101, 150.00, 'Paid');
END
ELSE IF @TableSuffix = 1
BEGIN
    INSERT INTO Orders_2 (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
    VALUES (123457, 1002, '2024-01-02', 102, 250.00, 'Pending');
END
ELSE IF @TableSuffix = 2
BEGIN
    INSERT INTO Orders_3 (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
    VALUES (123458, 1003, '2024-01-03', 103, 350.00, 'Shipped');
END
ELSE
BEGIN
    INSERT INTO Orders_4 (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
    VALUES (123459, 1004, '2024-01-04', 104, 450.00, 'Delivered');
END

Query logic

In order to query the orders of a particular customer, we also need to decide at the application level which sub-table to query:

-- Enquire about a customer's order
DECLARE @CustomerID INT = 1001;
DECLARE @TableSuffix INT;
SET @TableSuffix = @CustomerID % 4;

-- Query Data
IF @TableSuffix = 0
BEGIN
    SELECT * FROM Orders_1 WHERE CustomerID = @CustomerID;
END
ELSE IF @TableSuffix = 1
BEGIN
    SELECT * FROM Orders_2 WHERE CustomerID = @CustomerID;
END
ELSE IF @TableSuffix = 2
BEGIN
    SELECT * FROM Orders_3 WHERE CustomerID = @CustomerID;
END
ELSE
BEGIN
    SELECT * FROM Orders_4 WHERE CustomerID = @CustomerID;
END
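
As the number of shards grows, the IF/ELSE chains above become unwieldy. A hedged alternative sketch using dynamic SQL to build the table name from the same hash (sp_executesql keeps CustomerID parameterized):

DECLARE @CustomerID INT = 1001;
DECLARE @TableName SYSNAME = CONCAT('Orders_', (@CustomerID % 4) + 1); -- suffix 0 maps to Orders_1, etc.
DECLARE @Sql NVARCHAR(MAX) =
    N'SELECT OrderID, OrderDate, TotalAmount, Status FROM ' + QUOTENAME(@TableName)
    + N' WHERE CustomerID = @Cust;';

EXEC sp_executesql @Sql, N'@Cust INT', @Cust = @CustomerID;

In practice this routing usually lives in the application's data access layer rather than in T-SQL.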

3. Choosing between partitioning and sharding

  • Partitioning: suits a table that is physically partitioned but remains a single logical table. For example, partitioning by time (e.g., order date) effectively improves the performance of time-range queries.
  • Sharding: suits particularly large data volumes; the data is split across multiple tables to reduce the query pressure on any single table, usually via hash or range sharding.

Summary

  • Partitioning lets you split a large table logically and touch only the relevant partitions at query time, improving performance.
  • Sharding splits the data horizontally into multiple physical tables and is usually used for extremely large data volumes.
  • Implementing partitioning and table splitting in SQL Server requires comprehensive consideration of table design, index design, and query strategy to ensure data access efficiency and ease of maintenance.

4. Data archiving

  • Archiving old data: data that is no longer queried frequently can be archived to a separate history table or database, reducing the burden on the primary database. Keeping only recent data in the main table improves query performance.
  • Compression of old data: Archived data can be stored using compression techniques to save storage space.

Data archiving is the process of removing historical data that is no longer frequently accessed from the primary database and storing it in an archiving system or table, thus improving the performance of the primary database. Data archiving is typically used for older data, historical records, and other data that is no longer active but needs to be retained.

business scenario

Suppose we have an order system whose Orders table records all order information. Order data grows dramatically over time, but in practice orders beyond a certain age are queried far less often. To improve database performance, we decided to move order data older than one year out of the main table and into an archive table.

Steps:

  1. Create the main table (Orders) and the archive table (ArchivedOrders).
  2. Periodically move order data older than one year from the Orders table to the ArchivedOrders table.
  3. Ensure that queries on archived data do not affect the performance of the primary table.

1. Creating master and archive tables

-- Creating the Master Order Table Orders
CREATE TABLE Orders
(
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

-- Creating an archive table ArchivedOrders
CREATE TABLE ArchivedOrders
(
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
);

2. Archiving operations (moving orders older than 1 year to the archive table)

To move expired orders to the archive table periodically, you can use a scheduled task, such as a SQL Server Agent job, to perform the operation.

-- Move order data older than 1 year from the Orders table to the ArchivedOrders table
INSERT INTO ArchivedOrders (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
SELECT OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status
FROM Orders
WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

-- Delete the order data older than 1 year from the Orders table
DELETE FROM Orders
WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

This code inserts the rows of the Orders table whose OrderDate is more than one year before the current date into the ArchivedOrders table, and then deletes those rows from the Orders table.

3. Scheduled archiving tasks (using SQL Server Agent)

We can use SQL Server Agent to create a scheduled job that performs the data archiving operation at regular intervals, for example once a day, archiving order data older than one year:

-- Create a SQL Server Agent job to perform the archive operation
USE msdb;
GO

EXEC sp_add_job
    @job_name = N'ArchiveOldOrders';
GO

EXEC sp_add_jobstep
    @job_name = N'ArchiveOldOrders',
    @step_name = N'ArchiveOrdersStep',
    @subsystem = N'TSQL',
    @command = N'
        INSERT INTO ArchivedOrders (OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status)
        SELECT OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status
        FROM Orders
        WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

        DELETE FROM Orders
        WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());
    ',
    @database_name = N'VGDB';
GO

-- Set up the job schedule, for example run it once a day
EXEC sp_add_schedule
    @schedule_name = N'ArchiveOrdersDaily',
    @enabled = 1,
    @freq_type = 4, -- daily
    @freq_interval = 1, -- run once per day
    @active_start_time = 0;
GO

EXEC sp_attach_schedule
    @job_name = N'ArchiveOldOrders',
    @schedule_name = N'ArchiveOrdersDaily';
GO

-- Start the job
EXEC sp_start_job @job_name = N'ArchiveOldOrders';
GO

4. Access to archived data

Archived data can still be queried, without affecting the query performance of the main table. To find a customer's order history, query the archive table:

-- Query a customer's order history
SELECT OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status
FROM ArchivedOrders
WHERE CustomerID = 1001
ORDER BY OrderDate DESC;
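
If some reports need to span both active and archived orders, a view can present them as one set; a minimal sketch (the view name AllOrders is illustrative):

CREATE VIEW AllOrders AS
SELECT OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status
FROM Orders
UNION ALL
SELECT OrderID, CustomerID, OrderDate, ProductID, TotalAmount, Status
FROM ArchivedOrders;

Routine queries should keep hitting Orders directly so they stay fast; the view is for the occasional cross-range report.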

5. Optimization and caveats

  • Archiving strategy: pick a timeframe that fits the business (e.g., 3 months, 6 months, or 1 year), and adjust the WHERE condition to change the archiving rule.
  • performance optimization: Periodic archive operations can reduce the burden on the primary table and improve query performance. Regular deletion of old data also reduces the storage space of the master table.
  • Backup and recovery of archived data: Archived data also needs to be backed up regularly and be able to be restored when needed. Ensure that the archive table also includes an adequate backup policy.

6. Another option for archiving and cleansing data: soft deletion

In some cases, data is not completely deleted from the database after archiving, but rather marked as "archived" or "deleted". The advantage of this approach is that data can be restored at any time without loss.

-- Add an Archived flag to the Orders table
ALTER TABLE Orders
ADD Archived BIT DEFAULT 0;

-- Mark data as archived
UPDATE Orders
SET Archived = 1
WHERE OrderDate < DATEADD(YEAR, -1, GETDATE());

-- Query unarchived data
SELECT * FROM Orders WHERE Archived = 0;

-- Query archived data
SELECT * FROM Orders WHERE Archived = 1;

With this method, archived orders remain in the main table, and the Archived field distinguishes archived from active orders (a filtered-index sketch follows).
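
To keep the common "active only" queries fast under this scheme, a filtered index is one option; a sketch, assuming most queries filter on Archived = 0 (the index name is illustrative):

-- Index only the unarchived rows
CREATE NONCLUSTERED INDEX idx_Orders_Active
ON Orders (OrderDate)
WHERE Archived = 0;

The index stays small because it covers only unarchived rows, so queries with Archived = 0 in the predicate seek into a compact structure.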

Summary

Data archiving operations are an effective strategy for managing databases with large data volumes. By periodically migrating historical data from the primary database tables to the archive tables, database query performance can be significantly improved, while ensuring that historical data is retained for future querying and auditing.

5. Storage and hardware optimization

  • Disk I/O Optimization: Database performance is limited by disk I/O, especially when working with large amounts of data. Using SSD storage provides faster I/O performance than traditional hard disk drives (HDDs).
  • Increase memory: Increasing SQL Server's memory allows for a larger database buffer pool, which reduces disk I/O and improves query performance.
  • Using RAID Configuration: Use RAID 10 or other RAID configurations to ensure efficient and reliable data reads and writes.

Storage and hardware optimization is a key part of improving database performance, especially in large-scale data processing environments. Performance can be significantly improved through reasonable hardware resource allocation, storage structure optimization, and database configuration. In the following, we will explain how to optimize SQL Server at the storage and hardware level for an order system of an e-commerce platform.

Business Scenarios:

Suppose you have an e-commerce platform where order data is stored in SQL Server and the increasing number of orders is leading to degradation of query performance. In this scenario, we can perform storage and hardware optimization in the following ways.

Optimization Strategies:

  1. Disk I/O Optimization

    • Use SSDs to replace traditional hard disk drives (HDDs) to increase read and write speeds.
    • Store data files, log files, and temporary files on separate physical disks.
  2. Table and Index Storage

    • Use appropriate storage formats and file organization, such as partitioned tables and table compression.
    • Place frequently accessed tables and indexes on high-performance disks.
  3. Hardware resourcing

    • Increase memory to support more data caching and reduce disk access.
    • Use multi-core CPUs to increase the processing power of concurrent queries.
  4. data compression

    • Enable data compression in SQL Server to reduce disk space usage and improve I/O performance.

1. Create tables and optimize storage

First, we create the Orders table with a clustered index on the OrderID column.

-- Create the Orders table and optimize its storage
CREATE TABLE Orders
(
    OrderID INT PRIMARY KEY CLUSTERED, -- clustered index on OrderID
    CustomerID INT,
    OrderDate DATETIME,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20)
)
ON [PRIMARY]
WITH (DATA_COMPRESSION = PAGE); -- enable data page compression to save space

-- Create a non-clustered index for query optimization
CREATE NONCLUSTERED INDEX idx_OrderDate
ON Orders(OrderDate)
WITH (DATA_COMPRESSION = PAGE); -- also enable data compression

By specifying DATA_COMPRESSION = PAGE, we enable SQL Server's data compression feature to save storage space and improve disk I/O performance. PAGE compression is more aggressive than ROW compression and suits large data tables.

2. Partition table optimization

As order data keeps growing, we can partition the order table. Splitting the data into partitions by the OrderDate column narrows the scan range at query time and improves query efficiency.

-- Create Partition Functions
CREATE PARTITION FUNCTION pf_OrderDate (DATETIME)
AS RANGE RIGHT FOR VALUES ('2022-01-01', '2023-01-01', '2024-01-01');

-- Create partition scheme
CREATE PARTITION SCHEME ps_OrderDate
AS PARTITION pf_OrderDate
TO ([PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY]); -- map all partitions to PRIMARY for simplicity

-- Create the partitioned table
CREATE TABLE Orders
(
    OrderID INT,
    CustomerID INT,
    OrderDate DATETIME,
    ProductID INT,
    TotalAmount DECIMAL(10, 2),
    Status VARCHAR(20),
    -- on a partitioned table, the partitioning column must be part of the unique key
    CONSTRAINT PK_Orders PRIMARY KEY (OrderID, OrderDate)
)
ON ps_OrderDate(OrderDate); -- partition by the OrderDate column

In this code, the OrderDate column divides the table into yearly partitions (e.g., the 2022, 2023, and 2024 order data). Queries over a specific time range perform better because SQL Server scans only the relevant partitions rather than the whole table.

3. Hardware optimization

3.1. Ensuring the use of SSD disks

SSDs read and write far faster than traditional hard disks, so storing the database's primary data files, log files, and tempdb on separate disks (preferably SSDs) can improve performance.

-- Store the SQL Server data files (.mdf/.ndf) on an SSD
-- Store the log files (.ldf) on a separate SSD
-- Store the tempdb files on their own SSD

3.2. Configuring SQL Server Memory

Configure SQL Server's maximum server memory so that more data can be cached in memory, reducing disk I/O. The following shows how to set the maximum memory:

-- Viewing Current Memory Settings
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';

-- Set the maximum memory to 16 GB
EXEC sp_configure 'max server memory (MB)', 16384;
RECONFIGURE;

With proper memory configuration, SQL Server can cache more data in memory, resulting in fewer disk accesses and faster query responses.

3.3. Configuring SQL Server Parallel Processing

If the server has a multi-core CPU, you can increase the processing power of multi-threaded queries by setting SQL Server to allow more parallel query operations.

-- View current parallelism configuration
EXEC sp_configure 'max degree of parallelism';

-- Set to 4 to allow up to 4 CPUs to process a query in parallel
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

4. Disk I/O optimization: separate storage of data files, log files and temporary files

Disk I/O is one of the bottlenecks in database performance. To improve database performance, it is best to store data files, log files, and temporary files on separate physical disks.

-- Data files (.mdf) stored on disk A
-- Log files (.ldf) stored on disk B
-- tempdb files stored on disk C

5. Data backup and recovery optimization

Ensure that you back up your data on a regular basis and use incremental backups, differential backups, etc. to reduce the disk load during backups.

-- Make a full backup
BACKUP DATABASE VGDB TO DISK = 'D:\Backups\VGDB_full.bak';

-- Perform a differential backup
BACKUP DATABASE VGDB TO DISK = 'D:\Backups\VGDB_diff.bak' WITH DIFFERENTIAL;

-- Perform a transaction log backup
BACKUP LOG VGDB TO DISK = 'D:\Backups\VGDB_log.trn';

With this approach, data can be restored quickly in the event of a system crash while reducing the impact on hard disk I/O performance during the backup process.

6. Monitoring and maintenance

Regularly monitor SQL Server performance and adjust hardware and storage as needed. Use SQL Server's dynamic management views (DMVs) to monitor I/O performance, query execution plans, index usage, and more.

-- Check disk I/O status
SELECT * FROM sys.dm_io_virtual_file_stats(NULL, NULL);

-- View query execution plan cache
SELECT * FROM sys.dm_exec_query_stats;

-- View current index usage
SELECT * FROM sys.dm_db_index_usage_stats;
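
Beyond the per-object views above, aggregate wait statistics show where the server spends its time overall; a small sketch (the counters accumulate since the last service restart):

-- Top waits since the last restart
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

High PAGEIOLATCH_* waits, for example, would point back at the disk I/O recommendations in this section.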

Summary

The performance of a SQL Server database can be significantly improved through storage and hardware optimizations. Key optimizations include using SSD disks, storing data files, log files, and temporary files separately, enabling data compression, using partitioned tables to improve query efficiency, and adjusting memory and parallel processing configurations. Regular maintenance and monitoring can also help you identify performance bottlenecks and make adjustments accordingly.

6. Optimization of database parameters and configuration

  • Adjusting the maximum number of concurrent connections: ensure SQL Server is configured with enough user connections so that performance does not degrade when many clients connect (see the sketch after this list).
  • Setting appropriate memory limits: configure enough memory for SQL Server (max server memory) to avoid memory pressure or excessive paging to disk.
  • Automatically updating statistics: ensure SQL Server keeps query statistics up to date (AUTO_UPDATE_STATISTICS) so the query optimizer can pick the best execution plans.
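
For the connection limit, a quick sketch using the 'user connections' server option (0, the default, means SQL Server manages connections dynamically; set a hard cap only for a specific reason; the 500 below is an assumed example, not a recommendation):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- View the current setting (0 = dynamically managed)
EXEC sp_configure 'user connections';

-- Example: cap connections at 500
-- EXEC sp_configure 'user connections', 500;
-- RECONFIGURE;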

Database parameter and configuration optimization is an important step in ensuring optimal database system performance. In high concurrency and high load scenarios, a reasonable configuration can significantly improve database performance and reduce response time and latency. The following is a complete code example of how to improve performance by optimizing database parameters and configuration based on a business scenario of an e-commerce platform order system.

Business Scenarios:

Assuming that the e-commerce platform has a very high volume of orders and the system processes millions of orders per day, database performance and responsiveness are critical to the proper operation of the system. To ensure database performance, parameter and configuration optimization in SQL Server is critical.

Optimization Strategies:

  1. Adjusting Memory Configuration: Reduce disk I/O by configuring SQL Server to use more memory to cache data.
  2. Setting Maximum Parallelism: Adjusts the parallel query processing power of SQL Server based on the number of CPU cores.
  3. Optimize disk and storage configurations: Ensure that log files, data files and temporary files are stored separately.
  4. Enable automatic database optimization: Ensure that the database is capable of automating tasks such as defragmentation and updating statistical information.
  5. Adjustment of the transaction log and recovery mode: Ensure that the database can be quickly recovered in the event of a failure.

1. Adjustment of memory configuration

Memory allocation optimization is a key part of improving SQL Server performance. By increasing the maximum memory for SQL Server, you can ensure that query operations do not cause performance problems due to disk I/O bottlenecks.

-- View the current maximum memory configuration
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)';

-- Set the maximum memory to 16 GB
EXEC sp_configure 'max server memory (MB)', 16384; -- 16 GB
RECONFIGURE;

In the above code, we set SQL Server's maximum memory to 16 GB. Properly configured memory improves query performance and reduces disk access.

2. Setting maximum parallelism

SQL Server can utilize multiple CPU cores for parallel query processing. By setting the degree of parallelism appropriately, you can improve the processing capability of large queries.

-- View current max degree of parallelism setting
EXEC sp_configure 'max degree of parallelism';

-- Set max degree of parallelism to 4 (for a machine with a 4-core CPU)
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

With this setting, SQL Server can utilize up to 4 CPU cores for parallel processing during queries. If your server has more cores, you can adjust this parameter accordingly.

3. Adjustment of the transaction log and recovery model

For e-commerce platforms, transaction log optimization is critical. Ensure that log files are processed efficiently when large-scale transactional operations are performed, and that recovery patterns match business requirements.

-- View the recovery model of the database
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'VGDB';

-- Set the recovery model to simple
ALTER DATABASE VGDB
SET RECOVERY SIMPLE;

For databases that do not require point-in-time recovery via log backups, the simple recovery model limits log file growth and relieves disk I/O pressure.

4. Configuring automatic database optimization

Ensure that the database is able to perform automatic optimization tasks, such as rebuilding indexes and updating statistics, on a regular basis. Regular optimization improves database query performance and avoids fragmentation problems.

-- Enable automatic statistics updates (a database-level option, not an sp_configure setting)
ALTER DATABASE VGDB SET AUTO_UPDATE_STATISTICS ON;

-- Enable automatic statistics creation
ALTER DATABASE VGDB SET AUTO_CREATE_STATISTICS ON;

With automatic statistics updates and creation enabled, SQL Server builds execution plans from current statistics, reducing the burden on the query optimizer.

5. Configuration of disks and storage

Make sure that SQL Server's data files, log files, and temporary files are stored on separate disks, and in particular, store log files and data files on high-speed disks (such as SSDs).

-- Store data files (.mdf) on disk A (SSD)
-- Store log files (.ldf) on disk B (SSD)
-- Store tempdb files on disk C (SSD)

By storing data files, log files, and temporary files on separate disks, you can avoid disk I/O contention and improve overall database performance.

6. Enabling database compression

For e-commerce platforms that need to store large amounts of data, enabling data compression can reduce storage space and improve query performance, especially on disk I/O.

-- Enable table compression
ALTER TABLE Orders REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);

-- Enable index compression
ALTER INDEX ALL ON Orders REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE);

By enabling data compression, we can effectively save storage space, reduce disk I/O operations, and increase query speed.

7. Configuring automated maintenance tasks

SQL Server provides automated maintenance tasks, such as index rebuilding, database defragmentation, etc. These tasks can be automated through SQL Server Agent timed tasks to keep the database running efficiently.

-- Create a recurring job that performs the index rebuild task
EXEC sp_add_job @job_name = 'RebuildIndexes', @enabled = 1;
EXEC sp_add_jobstep @job_name = 'RebuildIndexes',
    @step_name = 'RebuildIndexStep',
    @subsystem = 'TSQL',
    @command = 'ALTER INDEX ALL ON Orders REBUILD',
    @retry_attempts = 3,
    @retry_interval = 5;

-- Set the job schedule: run daily at 2:00 a.m.
EXEC sp_add_schedule @schedule_name = 'RebuildIndexSchedule',
    @enabled = 1,
    @freq_type = 4,
    @freq_interval = 1,
    @active_start_time = 20000;

EXEC sp_attach_schedule @job_name = 'RebuildIndexes', @schedule_name = 'RebuildIndexSchedule';

This job runs at 2:00 a.m. every day and rebuilds all indexes on the Orders table, avoiding the query performance degradation caused by index fragmentation.

8. Enable timely log backups

For production environments, especially e-commerce platforms, it is critical to ensure that log backups are performed in a timely manner. Enabling log backups ensures fast recovery in case of database failure.

-- Setting up a transaction log backup
BACKUP LOG VGDB TO DISK = 'D:\Backups\VGDB_log.trn';

By performing regular transaction log backups, you can ensure that in the event of a failure, the database can be restored to an up-to-date state.

9. Enabling database caching

SQL Server caches data pages and execution plans; you can inspect the cache and, for testing, reset it.

-- View how many data pages are cached per database
SELECT DB_NAME(database_id) AS DatabaseName, COUNT(*) AS CachedPages
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id;

-- Force the caches to be cleared (only for testing)
DBCC FREEPROCCACHE;     -- clear the plan cache
DBCC DROPCLEANBUFFERS;  -- drop clean pages from the buffer pool

In day-to-day operations we don't recommend clearing the cache often, but it can be useful when testing performance optimizations.

Summary

Database performance of an e-commerce platform can be significantly improved by optimizing the configuration and parameters of SQL Server. Key optimization measures include adjusting memory and parallelism, optimizing disk storage and logging configurations, enabling data compression, performing automatic database optimization tasks on a regular basis, configuring database compression and regular backups. Reasonable configurations are made based on business requirements and hardware resources to ensure that the database can run stably and efficiently in a highly concurrent, high-load environment.

7. Batch data processing

  • Batch insert/update operations: when working with large amounts of data, use batch inserts or updates instead of processing one row at a time. This significantly speeds up data loading.
  • Avoiding large transactions: do not wrap huge data modifications in a single large transaction, which causes lock contention, log file growth, and other problems. Operate in small batched transactions instead.

Batch data processing is inevitable in large-scale applications, especially in business scenarios like e-commerce platforms, financial systems, etc., which usually require large-volume orders, user information processing, and so on. Batch operation can significantly improve the efficiency of data processing, but also requires careful design to ensure performance and stability.

Business Scenarios:

Suppose in the e-commerce platform, order information needs to be processed in batch, such as updating order status in batch, deleting invalid orders in batch, inserting order data in batch, and so on. By designing appropriate batch operations, it can effectively reduce the number of database accesses for a single operation and improve the responsiveness of the system.

Optimization plan:

  1. Batch insert data: use BULK INSERT or multi-row INSERT INTO statements to avoid the performance bottleneck of many separate inserts (a BULK INSERT sketch appears after the first example below).
  2. Batch update data: use a single UPDATE operation to modify multiple records at once.
  3. Batch delete data: delete expired orders or invalid user information in bulk.

The following are code examples of specific SQL Server batch data processing.

1. Batch insertion of data

Batch insertion reduces the time overhead of many individual inserts by inserting multiple rows with one INSERT INTO statement.

Example: Batch insertion of order data

-- Suppose the Orders table structure is: OrderID INT, CustomerID INT, OrderDate DATETIME, OrderStatus VARCHAR(20)
DECLARE @OrderData TABLE (OrderID INT, CustomerID INT, OrderDate DATETIME, OrderStatus VARCHAR(20));

-- Insert the order data into a table variable
INSERT INTO @OrderData (OrderID, CustomerID, OrderDate, OrderStatus)
VALUES
    (1, 101, '2024-11-01', 'Pending'),
    (2, 102, '2024-11-02', 'Shipped'),
    (3, 103, '2024-11-03', 'Delivered'),
    (4, 104, '2024-11-04', 'Cancelled');

-- Batch insert the data into the Orders table
INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderStatus)
SELECT OrderID, CustomerID, OrderDate, OrderStatus
FROM @OrderData;

In this example, we first insert the data into the table variable @OrderData and then use an INSERT INTO ... SELECT statement to insert it into the Orders table in one operation. This approach greatly reduces the number of database round trips.
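
For loading data from an external file, the BULK INSERT mentioned above is usually faster still; a hedged sketch, assuming a CSV file at D:\Data\orders.csv whose columns match the table and whose first row is a header:

BULK INSERT Orders
FROM 'D:\Data\orders.csv'  -- the file path is an assumption
WITH (
    FIELDTERMINATOR = ',',   -- column separator
    ROWTERMINATOR = '\n',    -- row separator
    FIRSTROW = 2,            -- skip the header row
    BATCHSIZE = 10000,       -- commit every 10,000 rows
    TABLOCK                  -- table lock enables faster, minimally logged loads
);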

2. Bulk update of data

Batch update operations are typically used to modify certain fields in multiple records to avoid multiple individual updates.

Example: Batch Update Order Status

Assuming that you need to bulk update the status of all unshipped orders to "Shipped", you can do this with the following SQL:

-- Bulk Update Order Status
UPDATE Orders
SET OrderStatus = 'Shipped'
WHERE OrderStatus = 'Pending' AND OrderDate < '2024-11-01';

This operation updates all eligible records at once, avoiding performance problems associated with multiple individual update operations.

3. Bulk deletion of data

In some scenarios, we need to batch delete some expired or invalid data. For example, deleting expired orders that are more than 30 days old.

Example: Batch Delete Expired Orders

-- Delete expired orders
DELETE FROM Orders
WHERE OrderDate < DATEADD(DAY, -30, GETDATE()) AND OrderStatus = 'Completed';

In this example, we are deleting all completed orders with order dates older than 30 days. This batch deletion is much more efficient than deleting them one by one.

4. Optimization of batch processing logic

Sometimes the data volume of a batch operation is so large that processing it in one statement causes performance problems or lock contention. In that case, execute the operation in batches to reduce the load on the system.

Example: Processing order data by batch

DECLARE @BatchSize INT = 1000;

-- Loop, updating one batch at a time, until no qualifying rows remain
WHILE 1 = 1
BEGIN
    -- Update up to 1000 rows per batch
    UPDATE TOP (@BatchSize) Orders
    SET OrderStatus = 'Shipped'
    WHERE OrderStatus = 'Pending' AND OrderDate < '2024-11-01';

    -- Stop when the last batch updated nothing
    IF @@ROWCOUNT = 0 BREAK;
END

By processing in batches (1,000 records at a time), you avoid the performance bottlenecks and lock contention caused by updating a huge number of rows in a single statement. This suits cases where a large number of records must be updated.

5. Use of transactions to ensure data consistency

For batch operations, it is often necessary to use transactions to ensure data consistency, i.e. either all succeed or all fail.

Example: Inserting Orders in Bulk and Using Transactions

BEGIN TRANSACTION;

BEGIN TRY
    -- Suppose the Orders table structure is: OrderID INT, CustomerID INT, OrderDate DATETIME, OrderStatus VARCHAR(20)
    DECLARE @OrderData TABLE (OrderID INT, CustomerID INT, OrderDate DATETIME, OrderStatus VARCHAR(20));

    -- Batch insertion of order data
    INSERT INTO @OrderData (OrderID, CustomerID, OrderDate, OrderStatus)
    VALUES
        (5, 105, '2024-11-05', 'Pending'),
        (6, 106, '2024-11-06', 'Pending');

    INSERT INTO Orders (OrderID, CustomerID, OrderDate, OrderStatus)
    SELECT OrderID, CustomerID, OrderDate, OrderStatus
    FROM @OrderData;

    -- Commit the transaction
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Error handling and rolling back transactions
    ROLLBACK TRANSACTION;
    PRINT 'Error occurred: ' + ERROR_MESSAGE();
END CATCH;

In this example, the bulk insert operation is included in a transaction, ensuring that the insert operation is atomic, i.e., either all succeed or all fail. If an error occurs during execution, the transaction is rolled back to avoid data inconsistencies.

Summary

Batch data processing is an effective means to improve the performance of SQL Server, especially in business scenarios such as e-commerce platforms with huge amounts of data. Through the rational use of batch insert, batch update and batch delete operations, you can significantly improve the processing efficiency of the database and reduce the number of I/O operations and lock contention in the database. When performing batch operations, remember to ensure data consistency through transactions, and batch processing can further optimize the processing performance of large-scale data.

8. Cleaning up useless data

  • Deleting expired data: periodically purging outdated or no-longer-needed data reduces database size and query complexity.
  • Cleaning up fragmentation: as data is inserted and deleted, table and index fragmentation grows and hurts performance. Periodically rebuild or reorganize indexes to reduce fragmentation (a sketch of the fragmentation check follows this list).
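
A small sketch for the fragmentation check, using the sys.dm_db_index_physical_stats DMV; the 10%/30% thresholds below are the common rule of thumb, not a hard rule:

-- Find indexes in the current database with noticeable fragmentation
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10;

-- Rule of thumb: REORGANIZE between roughly 10% and 30%, REBUILD above 30%
ALTER INDEX ALL ON Orders REORGANIZE;
-- ALTER INDEX ALL ON Orders REBUILD;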

Cleaning useless data is a common task in database maintenance, especially when dealing with historical data, expired records or redundant data. Regular cleaning of useless data not only saves storage space, but also improves database performance and avoids unnecessary impact of useless data on queries, indexes, and so on.

Business Scenarios:

Suppose our e-commerce platform generates a large number of order records every year. To keep the Orders table from growing too large, and to stop order records that are no longer used (e.g., orders placed 3 years ago) from occupying a lot of storage, we need to clean up this expired order data periodically.

Optimization plan:

  1. Delete expired data: Order data older than a certain period of time (e.g. 3 years old) is periodically deleted.
  2. Archiving expired data: Move expired order data to a history table or external storage that retains the necessary historical information.

Code examples

1. Periodic deletion of obsolete data

Suppose our Orders table has an OrderDate field recording when the order was created and an OrderStatus field recording the order status. We can purge completed or cancelled orders from more than 3 years ago on a monthly basis.

-- Delete orders completed or cancelled more than 3 years ago
DELETE FROM Orders
WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
    AND OrderStatus IN ('Completed', 'Cancelled');

In this example, DATEADD(YEAR, -3, GETDATE()) computes the date 3 years before today; all orders older than that with a status of 'Completed' or 'Cancelled' are deleted.

2. Periodic archiving of outdated data

If deleting the data doesn't meet business needs, you can archive it instead, for example by moving orders older than 3 years into the ArchivedOrders table.

-- Move completed or cancelled orders older than 3 years into the ArchivedOrders table
INSERT INTO ArchivedOrders (OrderID, CustomerID, OrderDate, OrderStatus)
SELECT OrderID, CustomerID, OrderDate, OrderStatus
FROM Orders
WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
    AND OrderStatus IN ('Completed', 'Cancelled');

-- Delete the archived rows from the main table
DELETE FROM Orders
WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
    AND OrderStatus IN ('Completed', 'Cancelled');

The eligible order rows are first inserted into the ArchivedOrders table and then deleted from the Orders table. This keeps the main table lean, reduces storage pressure, and preserves the historical data.

3. Use of triggers to automatically clean up useless data

To automate cleanup, you can use database triggers (Trigger), for example to check for expired data on each insert and clean it up when found. Note that a trigger fires per DML statement rather than on a schedule, so for truly periodic cleanup the job scheduler in section 5 below is usually the better fit.

-- Create a trigger that cleans up orders older than 3 years after inserts and updates
CREATE TRIGGER CleanOldOrders
ON Orders
AFTER INSERT, UPDATE
AS
BEGIN
    -- Delete completed or cancelled orders older than 3 years
    DELETE FROM Orders
    WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
        AND OrderStatus IN ('Completed', 'Cancelled');
END;

This trigger fires each time the Orders table is inserted into or updated, automatically checking for and cleaning up expired orders.

4. Batch cleansing of useless data

If the amount of order data is very large, direct deletion may cause performance bottlenecks or database locking issues. In this case, the data can be deleted in batches to reduce the load of a single delete operation.

DECLARE @BatchSize INT = 1000;

-- Loop, deleting one batch at a time, until nothing qualifies
WHILE 1 = 1
BEGIN
    -- Delete up to 1000 rows per batch
    DELETE TOP (@BatchSize) FROM Orders
    WHERE OrderDate < DATEADD(YEAR, -3, GETDATE())
        AND OrderStatus IN ('Completed', 'Cancelled');

    -- Stop when the last batch deleted nothing
    IF @@ROWCOUNT = 0 BREAK;
END

By deleting in batches, each pass removes only a limited number of records, reducing the impact on database performance and avoiding long-held table locks.

5. Periodic cleansing of useless data using the job scheduler

If you are using SQL Server, you can use the job scheduler (SQL Server Agent) to perform periodic cleanup tasks. First, you can create a stored procedure to perform a data cleanup operation.

CREATE PROCEDURE CleanOldOrders
AS
BEGIN
    DELETE FROM Orders
    WHERE OrderDate < DATEADD(YEAR, -3, GETDATE()) 
        AND OrderStatus IN ('Completed', 'Cancelled');
END;

Then, set up a periodic job in SQL Server Management Studio (e.g., run the procedure every day at midnight), which ensures that useless data is cleaned up periodically.
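The same job can also be created in T-SQL through the msdb scheduling procedures. A minimal sketch (the job, schedule, and database names here are illustrative):

USE msdb;

-- Create the job and a step that runs the cleanup procedure
EXEC sp_add_job @job_name = N'CleanOldOrdersJob';
EXEC sp_add_jobstep @job_name = N'CleanOldOrdersJob',
    @step_name = N'Run cleanup procedure',
    @subsystem = N'TSQL',
    @command = N'EXEC dbo.CleanOldOrders;',
    @database_name = N'ecommerce'; -- database name is illustrative

-- Schedule it daily at midnight and attach the schedule to the job
EXEC sp_add_schedule @schedule_name = N'DailyMidnight',
    @freq_type = 4,         -- daily
    @freq_interval = 1,     -- every 1 day
    @active_start_time = 0; -- 00:00:00 (HHMMSS)
EXEC sp_attach_schedule @job_name = N'CleanOldOrdersJob',
    @schedule_name = N'DailyMidnight';

-- Register the job on the local server
EXEC sp_add_jobserver @job_name = N'CleanOldOrdersJob';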

To summarize.

Cleaning up useless data not only saves storage space but also improves database performance. Depending on actual business requirements, we can delete, archive, or batch-process the data. For tables with large data volumes in particular, batch cleanup plus regular job scheduling effectively reduces the burden on the system.

9. Use of caches

  • Cache common query results: For high-frequency queries, the results can be cached in memory to avoid going to the database for each query.
  • Application layer caching: Use a caching system such as Redis or Memcached to reduce the frequency of database accesses by caching some frequently used data in memory.

In real business systems, caching is a common way to improve performance, especially for frequently accessed hot data: keeping it in a cache reduces the number of database queries, relieves pressure on the database, and improves response speed.

business scenario

Suppose we have an e-commerce platform where users frequently query the basic information of a product (e.g., price, inventory, description, etc.) while browsing the product details. Since the product information changes less and the query requests are frequent, caching the product information can effectively improve the performance of the system.

We use Redis as a caching database. A common practice is that when querying a product, we first check if the details of the product exist in the cache, and if they do, we return the cached data directly; if they don't exist in the cache, we query them from the database and store the query results in the cache for next time's use.

solution

  1. Use Redis to store product information.
  2. Set an appropriate expiration time (TTL, Time To Live) so cached data does not go stale indefinitely.
  3. Use an appropriate cache update strategy (e.g., update the cache every time you update the product information).

code example

1. Setting up the Redis Cache

First, use a Redis client library (such as redis-py) to connect to the Redis service. Suppose the product information table is Products, with fields ProductID, ProductName, Price, Stock, and Description.

# Install the Redis client
pip install redis

2. Product query and caching logic

import redis
import mysql.connector
import json

# Connect to Redis
redis_client = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)

# Connect to the MySQL database
def get_db_connection():
    return mysql.connector.connect(
        host="localhost",
        user="root",
        password="password",
        database="ecommerce"
    )

# Get product details
def get_product_details(product_id):
    # Check the cache first
    cached_product = redis_client.get(f"product:{product_id}")

    if cached_product:
        print("Getting product information from the cache")
        return json.loads(cached_product)  # deserialize the cached JSON string

    # Cache miss: query the database
    print("Getting product information from the database")
    connection = get_db_connection()
    cursor = connection.cursor(dictionary=True)
    cursor.execute("SELECT * FROM Products WHERE ProductID = %s", (product_id,))
    product = cursor.fetchone()

    # If the product exists, cache it in Redis
    if product:
        # default=str serializes Decimal/datetime columns; cache for 1 hour
        redis_client.setex(f"product:{product_id}", 3600, json.dumps(product, default=str))
    cursor.close()
    connection.close()

    return product

# Update product information and refresh the cache
def update_product_details(product_id, name, price, stock, description):
    # Update the database
    connection = get_db_connection()
    cursor = connection.cursor()
    cursor.execute("""
        UPDATE Products
        SET ProductName = %s, Price = %s, Stock = %s, Description = %s
        WHERE ProductID = %s
    """, (name, price, stock, description, product_id))
    connection.commit()
    cursor.close()
    connection.close()

    # Refresh the cache
    updated_product = {
        "ProductID": product_id,
        "ProductName": name,
        "Price": price,
        "Stock": stock,
        "Description": description
    }
    redis_client.setex(f"product:{product_id}", 3600, json.dumps(updated_product))  # cache for 1 hour

# Example: query product 101
product_info = get_product_details(101)
print(product_info)

# Example: update product 101
update_product_details(101, "New Product Name", 199.99, 50, "Updated description")

Code Description

  1. Connecting to Redis and MySQL: use redis-py to connect to Redis and mysql-connector-python to connect to the MySQL database.
  2. Querying a product: in the get_product_details method, we first check the Redis cache for the product information. If it exists in the cache, we return the cached data directly; if not, we query it from the MySQL database and cache the result in Redis.
  3. Updating product information: when product information changes (e.g., updates to name, price, or stock), we update the Redis cache at the same time as the database, so the cached data stays current.
  4. Cache expiration: the setex method stores the product information in Redis with an expiration time (TTL), so stale entries eventually drop out of the cache.

further optimization

  1. Cache penetration: besides checking the cache on each query, add protection against cache penetration: if a product does not exist in the database, cache a placeholder (e.g., a null marker) with a short TTL so repeated lookups for missing IDs do not keep hitting the database.
  2. Cache eviction strategy: Redis supports several eviction policies (e.g., LRU, LFU); configure the Redis instance according to actual business needs so hot data stays in the cache.
  3. Asynchronous cache updates: in highly concurrent scenarios, synchronous cache updates can become a performance problem. Queues and asynchronous processing can be used to batch or defer cache updates and avoid updating the cache too frequently.

To summarize.

By using Redis cache, the e-commerce platform can effectively improve the performance of querying product information and reduce the burden on the database. According to business requirements, we can further optimize the caching strategy and update mechanism.

10. Parallel queries and concurrency

  • Enabling parallel queries: SQL Server can use multiple CPU cores to process a single query in parallel. Tuning the parallel-query settings (e.g., max degree of parallelism) can improve query performance, especially when working with large amounts of data.
  • Optimized locking strategy: Ensure the database has a sound locking strategy and avoid long lock contention; row-level locks can be used instead of table-level locks to reduce blocking (see the sketch after this list).
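On the locking side, a minimal T-SQL sketch (the database name is illustrative): row-versioning lets readers avoid blocking writers, and a lock hint keeps a hot update at row granularity.

-- Enable row-versioning so readers don't block writers (database name is illustrative)
ALTER DATABASE VGDB SET READ_COMMITTED_SNAPSHOT ON;

-- Hint a row-level lock on a hot update to reduce blocking
UPDATE Orders WITH (ROWLOCK)
SET OrderStatus = 'Completed'
WHERE OrderID = 12345;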

In highly concurrent scenarios, the use of parallel queries can significantly improve the speed of data queries. The core idea of parallel querying is to split a complex query into multiple subtasks and utilize multiple CPU cores to process these subtasks at the same time, thus improving the overall query performance. Concurrency refers to switching between multiple tasks to make more efficient use of the CPU, and in some scenarios, higher performance can be achieved by concurrently executing multiple query tasks.

business scenario

Suppose we have an e-commerce platform in which a large amount of order data is stored. When users query the order data, it may involve complex query operations such as joining multiple tables and filtering multiple conditions. In order to improve query performance, we can optimize for different query tasks by parallel querying and concurrency.

For example, when querying order data, the query conditions include order status, order date range, and user ID. We split the query into multiple parallel queries, querying different conditions and then combining the results.

solution

  1. Parallel queries: Split the query task into multiple subtasks and execute each subtask in parallel using multithreading or multiprocessing.
  2. Concurrent queries: Use asynchronous IO or thread pooling to concurrently execute multiple query operations.

We'll use Python's concurrent.futures library to implement parallel queries and a MySQL database to perform the query operations.

code example

1. Parallel queries

We divide the query condition into multiple parts and perform the query operations in parallel. For example, querying orders with status Completed and orders with status Pending in parallel.

# Install the MySQL client library
pip install mysql-connector-python

import mysql.connector
from concurrent.futures import ThreadPoolExecutor
import time

# Connect to the MySQL database
def get_db_connection():
    return mysql.connector.connect(
        host="localhost",
        user="root",
        password="123123",
        database="VGDB"
    )

# Perform a query: fetch orders with the specified status
def query_orders_by_status(status):
    connection = get_db_connection()
    cursor = connection.cursor(dictionary=True)
    query = "SELECT * FROM Orders WHERE OrderStatus = %s"
    cursor.execute(query, (status,))
    result = cursor.fetchall()
    cursor.close()
    connection.close()
    return result

# Execute parallel queries
def fetch_orders():
    statuses = ['Completed', 'Pending']  # the order statuses we need to query
    # Use ThreadPoolExecutor to run the queries in parallel
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Submit one query task per status
        futures = [executor.submit(query_orders_by_status, status) for status in statuses]
        # Collect the query results
        results = [future.result() for future in futures]

    return results

# Example: run the queries
if __name__ == "__main__":
    start_time = time.time()
    orders = fetch_orders()
    print("Query results:", orders)
    print(f"Query duration: {time.time() - start_time} seconds")

Code Description

  1. query_orders_by_status: This method performs a database query for orders with the specified status.
  2. fetch_orders: This method uses ThreadPoolExecutor to execute multiple query tasks in parallel. Here, the order statuses Completed and Pending are each submitted as a task to the thread pool for parallel querying.
  3. ThreadPoolExecutor: We create a thread pool with a maximum of 2 worker threads and use submit to queue each query task. Each query executes in a separate thread.
  4. future.result(): Gets the result returned by each parallel query task.

2. Concurrent queries

When the database queries do not depend on each other, we can execute them concurrently via asynchronous IO or multithreading.

import asyncio
import mysql.connector
import time

# Asynchronously query the database
async def query_orders_by_status_async(status, loop):
    # Run the blocking database query in the default thread pool executor
    result = await loop.run_in_executor(None, query_orders_by_status, status)
    return result

# Perform a query: fetch orders with the specified status (get_db_connection as defined above)
def query_orders_by_status(status):
    connection = get_db_connection()
    cursor = connection.cursor(dictionary=True)
    query = "SELECT * FROM Orders WHERE OrderStatus = %s"
    cursor.execute(query, (status,))
    result = cursor.fetchall()
    cursor.close()
    connection.close()
    return result

# Asynchronous concurrent queries
async def fetch_orders_concurrently():
    loop = asyncio.get_running_loop()
    statuses = ['Completed', 'Pending', 'Shipped']  # query orders with multiple statuses
    tasks = [query_orders_by_status_async(status, loop) for status in statuses]
    orders = await asyncio.gather(*tasks)  # wait for all tasks to complete
    return orders

# Example: execute the concurrent queries
if __name__ == "__main__":
    start_time = time.time()
    orders = asyncio.run(fetch_orders_concurrently())
    print(f"Query duration: {time.time() - start_time} seconds")

Code Description

  1. query_orders_by_status_async: This method uses loop.run_in_executor to run the blocking database query asynchronously. In this way, we can execute multiple queries concurrently even though each database query is a blocking operation.
  2. asyncio.gather: Combines multiple asynchronous tasks and waits for all of them to complete before returning the results.
  3. asyncio.run: Starts the event loop and executes the asynchronous queries.

further optimization

  1. Thread pool size: Tune ThreadPoolExecutor's max_workers parameter to the workload. If there are very many tasks, the pool can be enlarged appropriately, but not so much that it degrades overall system performance.
  2. Connection pool: For database operations, a database connection pool can be used to manage connections. This avoids creating a new database connection for each query and improves performance.
  3. Paged queries: If the query results are very large, page through them to reduce the amount of data per query and further improve performance.

summarize

  • parallel query: Query performance can be significantly improved by splitting the query task into multiple subtasks and processing them in parallel.
  • Concurrent queries: For concurrent execution between multiple query tasks, no need to wait for each query task to complete one by one, which can speed up the overall query speed.

By combining parallel query and concurrent query strategies, we can significantly improve the query response speed of e-commerce platforms or other business systems, especially in highly concurrent environments, to ensure the efficiency of the system.

11. SQL Server instance optimization

  • Restarting a SQL Server instance periodically: If SQL Server runs for a long time, problems such as excessive cache growth or memory leaks can appear; a periodic restart can help free resources and restore performance.
  • Enable compression: SQL Server provides data compression, which saves storage space and can improve query performance, especially for read-heavy workloads (see the sketch below).
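As a minimal sketch of enabling compression (using the Orders table from the examples; whether ROW or PAGE compression pays off depends on the workload, so estimate first):

-- Estimate the space savings before compressing
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo',
    @object_name = 'Orders',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';

-- Rebuild the table with page compression
ALTER TABLE Orders REBUILD WITH (DATA_COMPRESSION = PAGE);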

SQL Server instance optimization is an important aspect of improving overall database performance. In large-scale business systems, the performance of SQL Server often directly affects the response speed and stability of the whole application. Instance optimization includes reasonable allocation of hardware resources, optimization of SQL Server configuration parameters, memory and I/O management, query optimization and monitoring.

Suppose we have a high-volume online e-commerce platform containing a large amount of product, order, and user data. We need to optimize the SQL Server instance to ensure efficient query performance, stable transaction processing, and fast data reads.

1. Hardware configuration optimization

The performance of a SQL Server instance depends heavily on the configuration of the underlying hardware, especially resources such as memory, CPU, and disk.

  • Memory (RAM): SQL Server is a memory-intensive application. The more memory available, the higher the cache hit rate and the better the query performance.
  • CPU: More CPU cores can handle more concurrent requests.
  • Disk: SSDs outperform traditional hard drives in disk I/O, especially for large read and write operations.

2. SQL Server configuration optimization

SQL Server provides a number of configuration parameters to tune the behavior of the instance, which can be used to optimize performance.

Example of Configuration Parameters

  • max degree of parallelism: Controls the parallelism of SQL Server queries. The query efficiency of a multi-core CPU system can be improved by setting the degree of parallelism appropriately.
  • max server memory: Limit the maximum amount of memory used by SQL Server to prevent SQL Server from consuming too much memory and causing operating system performance degradation.
  • cost threshold for parallelism: Sets a cost threshold for query execution so that SQL Server will use parallel execution only if the cost of the query exceeds that value.

3. Index optimization

Indexes are key to improving query performance, and can be created for frequently queried fields based on business scenarios. However, too many indexes can affect the performance of insert, update and delete operations, so you need to find a balance between query performance and maintenance costs.
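One way to find that balance is to check how often each index is actually read versus maintained; a sketch against the index-usage DMV:

-- Indexes with many updates but few reads are candidates for removal
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY s.user_updates DESC;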

4. Query optimization

For large business systems, query optimization is especially important. Optimizing queries can reduce the burden on the database and improve response speed.

business scenario

Suppose the e-commerce platform needs to handle a large amount of order data, and the query often involves joining multiple tables, such as querying all the orders of a user in a certain time period. We can improve the query speed by optimizing the SQL query.

code example

1. Setting SQL Server Instance Configuration Parameters

In a SQL Server instance, we can set some basic optimization parameters with the following T-SQL statements:

-- Enable advanced options so the settings below can be changed
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Set maximum memory usage to 16 GB (value is in MB)
EXEC sp_configure 'max server memory', 16384;
RECONFIGURE;

-- Set the maximum degree of parallelism for an 8-core CPU
EXEC sp_configure 'max degree of parallelism', 8;
RECONFIGURE;

-- Set the cost threshold for parallelism to 10
EXEC sp_configure 'cost threshold for parallelism', 10;
RECONFIGURE;

2. Query optimization

To improve query performance, you can use the following techniques while querying:

  • Avoid SELECT * and select only the fields you need.
  • Use JOIN instead of subqueries to avoid unnecessary nested queries.
  • Create appropriate indexes to speed up queries.
  • Reduce the amount of data in a single query using paged queries (a paging sketch follows the example below).

The following is an example of an optimized query:

-- Suppose we need to query a user's orders within a date range; the optimized SQL query
-- (column names are illustrative)
SELECT o.OrderID, o.OrderDate, o.TotalAmount, o.OrderStatus, u.UserName
FROM Orders o
JOIN Users u ON o.UserID = u.UserID
WHERE o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
  AND o.UserID = 12345
ORDER BY o.OrderDate DESC;
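For very large result sets, the same query can be paged with OFFSET ... FETCH so each request returns only a limited slice (the page size and offset here are illustrative):

-- Return the third page of results, 50 rows per page
SELECT o.OrderID, o.OrderDate, o.TotalAmount, o.OrderStatus, u.UserName
FROM Orders o
JOIN Users u ON o.UserID = u.UserID
WHERE o.OrderDate BETWEEN '2024-01-01' AND '2024-12-31'
  AND o.UserID = 12345
ORDER BY o.OrderDate DESC
OFFSET 100 ROWS FETCH NEXT 50 ROWS ONLY;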

3. Index optimization

To optimize the query, we can create indexes on the Orders table's UserID and OrderDate fields:

-- Create an index on the UserID column
CREATE INDEX idx_user_id ON Orders(UserID);

-- Create an index on the OrderDate column
CREATE INDEX idx_order_date ON Orders(OrderDate);

-- Create a composite index on the combination of UserID and OrderDate
CREATE INDEX idx_user_order_date ON Orders(UserID, OrderDate);

Note that the composite index already serves lookups on UserID alone, so in practice idx_user_id is largely redundant and can be dropped to reduce write overhead.

4. Database backup and maintenance

Regularly backing up and maintaining the database ensures that the system remains efficient under high load. Regular database optimization tasks include:

  • Back up data.
  • Update statistics.
  • Rebuild indexes.

The following is an example of a periodic index rebuild:

-- Rebuild all table indexes
ALTER INDEX ALL ON Orders REBUILD;
ALTER INDEX ALL ON Users REBUILD;
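The statistics refresh mentioned above can be scripted the same way; a minimal sketch:

-- Update statistics on the Orders table with a full scan
UPDATE STATISTICS Orders WITH FULLSCAN;

-- Or refresh statistics for all tables in the current database
EXEC sp_updatestats;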

5. Using SQL Server's Performance Monitoring Tools

SQL Server provides a number of performance monitoring tools to help identify performance bottlenecks. For example, SQL Server Profiler and Dynamic Management Views (DMVs) help us monitor the performance of a SQL Server instance in real time and tune it accordingly.

-- View the current resource usage of a SQL Server instance
SELECT * FROM sys.dm_exec_requests;

-- View the memory usage of a SQL Server instance.
SELECT * FROM sys.dm_os_memory_clerks;

-- View disk I/O usage of a SQL Server instance
SELECT * FROM sys.dm_io_virtual_file_stats(NULL, NULL);
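Beyond these resource snapshots, the plan-cache DMVs can surface the most expensive statements; a sketch listing the top CPU consumers:

-- Top 10 cached statements by average CPU time
SELECT TOP 10
    qs.total_worker_time / qs.execution_count AS avg_cpu_time,
    qs.execution_count,
    st.text AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_cpu_time DESC;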

To summarize.

  1. Hardware Optimization: Properly configure CPU, memory, and disks to improve the performance of your SQL Server instance.
  2. Instance Configuration Optimization: Optimize performance by configuring SQL Server parameters such as memory limits, parallelism, and so on.
  3. Index Optimization: Rational design of index structure to improve query efficiency.
  4. Query Optimization: Use efficient SQL query statements to avoid unnecessary computation and I/O operations.
  5. Regular maintenance and backup: Regular database maintenance and backup to ensure stable system operation.

By optimizing the SQL Server instance, the performance of the database can be significantly improved to ensure that the e-commerce platform remains highly responsive despite high concurrency and high load.

Finally

The above 11 optimization schemes are for your reference. Optimizing SQL Server database performance has to start from many angles: hardware configuration, database structure, query optimization, index management, table partitioning, parallel processing, and so on. Through reasonable indexing, query optimization, data partitioning, and other techniques, you can maintain good performance as data volume grows. At the same time, maintain and clean the database regularly to keep it running efficiently. Follow V brother, who loves programming; V will be your technical porter.