Single 48TB size SQL Server database backup causes log files to be untruncated

SQL Server version: SQL Server 2019
contexts
In a 48T-sized single-database environment with simple recovery mode and a log file size limit of 600 G. Performing a full backup takes more than 12 hours, resulting in the log file not being able to be truncated and reaching the upper limit, and subsequent transactions cannot be written normally, rendering the entire database unavailable.

problematic phenomenon
The virtual log files (VLFs) in the LDF log files are all active, resulting in the logs not being truncated properly. As the log file size reaches the 600G limit, subsequent transaction writes fail, causing database operations to stall.

Ideas for investigation
The troubleshooting ideas are generally the following:
1, large transactions lead to the log can not be truncated: may be uncommitted large transactions prevent the log truncation.
2、Special environments: such as replication, mirroring, availability group (AG), change data capture (CDC) and other scenarios, the standby side of the anomaly will lead to the main side can not intercept the log.
3. Failure to back up transaction logs in a timely manner: In full recovery mode, failure to back up transaction logs in a timely manner results in log file growth.
4. Database Recovery Time Setting: Modified database recovery time may lead to checkpoint delay, thus prolonging the log file truncation time.

inspection step
Do a case-by-case check along the following lines
1, confirm the database recovery mode: the database is a simple recovery mode, excluding the problem of transaction log backup (Idea 3).
2, check the operating environment: the database is a stand-alone mode, excluding replication, mirroring, AG and other scenarios (Idea 2).
3, check long time transaction: use `DBCC OPENTRAN` check, did not find a long time to run the transaction, to rule out the big transaction problem (idea 1).
4, check the database recovery settings: `TARGET_RECOVERY_TIME` and `recovery interval` for the default value, to rule out the recovery time problem (Idea 4).
5. Check for blockage: No blockage was found.
6. Check SQL Agent jobs: no other jobs were found running except for full backups.
7, check the write logic: communication with the developers learned that the database for the daily batch data write, log write volume of more than 400G.

Analysis of the cause of the problem
In this we need to start from the principle of database full backup, dissect the details of the backup, analyze the reasons, in general, database full backup is divided into two stages:
1. Phase 1: Record the LSN at the beginning of the backup to generate a snapshot backup.
2. Phase 2: At the end of the snapshot backup, record the latest LSN and write the transaction log between these two LSNs to the backup.
Due to the large database, the backup took more than 12 hours to complete, resulting in the backup operation remaining in phase one and not being able to move to phase two. During this time, transactions in the log file could not be truncated, and the amount of transactions in the 12-hour period exceeded 600GB, resulting in the LDF log file being populated.

Optimization Recommendations
The final solution was to turn on the ADR (Accelerated Database Recovery) feature of SQL Server 2019, and the problem was solved with only a small growth in the log file during a full backup.
Steps to Enable ADR
Make sure you are using SQL Server 2019 or SQL Server 2022, as ADR was introduced starting with SQL Server 2019.

--Enabling ADR for databases
ALTER DATABASE [YourDatabaseName]
SET ACCELERATED_DATABASE_RECOVERY = ON;
GO


--Checking ADR Status
SELECT name, is_accelerated_database_recovery_on
FROM 
WHERE name = 'YourDatabaseName';

--Close ADR
ALTER DATABASE [YourDatabaseName]
SET ACCELERATED_DATABASE_RECOVERY = OFF;
GO

Here are a few ideas for optimization:
1、Multi-file backup: Change the default single-file backup to multi-file backup, which can improve the backup efficiency and shorten the backup time. (Previously introduced an inconspicuous function of SQL Server's backup file splitting)
2、Upgrade SQL Server version: Upgrade to SQL Server 2019 or SQL Server 2022, enable ADR (Accelerated Database Recovery) function, and realize timely log truncation through SLOG.
3, business split: although unrealistic, but from the business split can also reduce the amount of transactions in a single database.

summarize

In SQL Server 2019, a 48TB database was taking too long to back up, causing the log file to fail to truncate and reach its limit, preventing transaction writes.
This article describes a variety of solutions including multi-file split backups, enabling database ADR functionality.
Eventually enabling the database ADR feature solved the problem. The important thing to note here is to always try to use the latest database version, such as SQL Server 2019 or SQL Server 2022, for example.
Ensuring that you have access to the latest features can get rid of a lot of hassle.

Supplementary: Principles of Full Database Backup
During a full backup, even if the database is in simple recovery mode, the backup still copies the logs of uncommitted transactions. For long-running transactions, the backup will contain enough log information to undo these uncommitted transactions.
Therefore, even if the database has a small MDF file, the log file (LDF) may result in a very large backup file.
This is almost the same principle as MySQL's Xtrabackup, which generates a start LSN at the beginning of the backup and an end LSN at the end, and if there are uncommitted transactions, MySQL 8.0'sundoxxThe file will be very large

reference article
/en-us/sql/relational-databases/accelerated-database-recovery-concepts?view=sql-server-ver16
/sqlservertip/5971/accelerated-database-recovery-in-sql-server-2019/
/ricos-note/accelerated-database-recovery-a7f0d30b1e0

This article is copyrighted and may not be reproduced without the author's permission.