Location>code7788 >text

Detailed explanation of database MVCC

Popularity:782 ℃/2025-03-20 21:36:22

MVCC

1. Basic introduction

Database: MySQL. [Many mainstream databases use MVCC, such as MySQL's InnoDB engine, PostgreSQL, Oracle]

MVCC, full name Multi-Version Concurrency Control, is multi-version concurrent control. It is a concurrent control method in the database management system.

The basic principles of MVCC:It is implemented by saving historical versions of the data, soRead operations will not be blocked by write operations, and write operations will not be blocked by read operations.. In this way, the concurrency performance is improved. For example, when a transaction starts reading data, it will see a snapshot of the database at a certain point in time, and then even if other transactions modify the data, the transaction still sees the old version of the data until it commits or rolls back.

Different isolation levels (such as read submitted, repeatable read, serialized) are implemented in MVCC differently. For example, at the read committed level, the latest snapshot is obtained every read, while the repeatable read is to determine the snapshot at the beginning of the transaction, and subsequent reads are based on this snapshot, thus avoiding the problem of non-repeatable reads.

  • Current Reading and Snapshot Reading
  • Current reading: operations such as select lock in share mode, select for update; update, insert, delete (exclusive lock) are all current readings, why is it called current reading? It is the latest version of the record. When reading, it must ensure that other concurrent transactions cannot modify the current record, and will lock the read records.
  • Snapshot reading: LikeThe unlocked select operation is to read the snapshot, i.e. non-blocking read without locking; the premise of snapshot reading is that the isolation level is not the serial level, and snapshot reading at the serial level will degenerate into the current read; the reason for snapshot reading is based on the consideration of improving concurrency performance. The implementation of snapshot reading is based on multi-version concurrency control, that is, MVCC. It can be considered that MVCC is a variant of row lock, but in many cases, it avoids locking operations and reduces overhead; since it is based on multiple versions, that is, what the snapshot reading may read is not necessarily the latest version of the data, but may be the previous historical version

MVCC is for implementationRead-write conflict without locking, and this reading refers to snapshot reading, not current reading. Current reading is actually a lock operation, which is the implementation of pessimistic locking.

Current Reading <==> Pessimistic Lock; Snapshot Reading <=> Optimistic Lock?

Implementation principle

A secret that the database doesn't know.

In addition to our customized fields, each row record also has fields such as DB_TRX_ID, DB_ROLL_PTR, DB_ROW_ID, etc. that are implicitly defined by the database.

  • DB_TRX_ID: 6byte, most recently modified (modify/insert) transaction ID: record creates this record/last modified transaction ID ---------[Transaction ID
  • DB_ROLL_PTR: 7byte, rollback pointer, pointing to the previous version of this record (stored in rollback segment) -----------------[Roll back the pointer
  • DB_ROW_ID: 6byte, implicit self-increment ID (hidden primary key). If the data table does not have a primary key, InnoDB will automatically generate a clustered index with DB_ROW_ID ---------[Hide primary key
  • There is actually a hidden field for deleting flags. The record is updated or deleted does not mean that it is really deleted, but that the deleting flag has changed --------[Delete Mark]

① MVCC core mechanism

  • Version chain: Each row of data passes through hidden fieldsDB_TRX_ID(Transaction ID) andDB_ROLL_PTR(Rolling back pointer) Maintain multiple versions.
  • ReadView: A "snapshot" is generated when the transaction starts, recording the ID list of the currently active transaction, which is used to judge the visibility of the data version.
  • Transaction isolation level: Under different isolation levels, the generation rules of ReadView are different (for example, "Read Committed" generates a new snapshot for each read, and "Repeatable" uses snapshots at the start of the transaction).

②Example

There is such a table with records.

id name age DB_TRX_ID DB_ROLL_PTR
1 Alice 25 100 0x0000

Initial version by transactiontxid=100Submit generation;DB_ROLL_PTRPoint to the old version (initiallyNULL)。

At this time, two transactions came.

  • Transaction A (txid=200):implementUPDATE user SET age=26 WHERE id=1
  • Transaction B (txid=201): Execute before transaction A is submittedSELECT * FROM user WHERE id=1

Transaction A updates data, InnoDB engine will create a new version for this line of data and modify itDB_TRX_ID=200DB_ROLL_PTRPoint to the old version (the version committed by transaction 100).

At this time, the transaction A has not been submitted yet, the new version (V2) has not been submitted yet, and the old version (V1) still exists.

id | name | age | DB_TRX_ID | DB_ROLL_PTR 【This is the new version V2】
 ---|-------|-----|-----------|------------
 1 | Alice | 26 | 200 | 0x1234 (point to old version V1)
                                  丨丨
                                  丨丨
                                   ↓

 id | name | age | DB_TRX_ID | DB_ROLL_PTR 【This is the old version V1】
 ---|-------|-----|-----------|------------
 1 | Alice | 25 | 100 | 0x0000

Then, transaction B reads the data, and a transaction B will generate aReadView, record the ID list of the currently active transaction. Assume that only transaction A (txid=200) has not been committed at this time, and the active transaction list is[200]

Transaction B judges the visibility of the data version based on ReadView:

  • rule: If the data version isDB_TRX_ID Less than the current transactiontxid, and the transaction has been submitted, then it can be seen.
  • The latest version of the current data is V2 (DB_TRX_ID=200), but the ReadView of transaction B showstxid=200is active (not committed), so V2 is not visible.
  • Transaction B rolls the pointer along the rollback (DB_ROLL_PTR) Found the old version V1 (DB_TRX_ID=100),judge100 < 201And it has been submitted, so the data of V1 is returned:

Therefore, the transaction B reads as follows:

id | name  | age 
---|-------|-----
1  | Alice | 26  

You can see that the above read and write seems to have no locking. .

③Visibility rules

The logic for InnoDB to determine whether the data version is visible is as follows:

  1. If the data versionDB_TRX_ID Less thanThe current transactiontxidandThe transaction has been submitted → visible, [How do you think if it has been submitted? Look at the active list of transactions].
  2. If the data versionDB_TRX_ID equalThe current transactiontxid→ Visible (transactions can see their own modifications).
  3. If the data versionDB_TRX_ID Greater thanThe current transactiontxid→ Not visible (belongs to future modifications).
  4. If the data versionDB_TRX_IDIn the transaction's ReadView active list → Not visible (the transaction has not been committed yet).

MVCC realizes read and write without blocking and high concurrency by maintaining multi-version data and ReadView mechanism:

  1. Write operation: Generate a new version and does not affect the read operation of other transactions.
  2. Read operation: Read old version based on snapshots without locking.
  3. Submitted version: It is visible to new transactions, but the old transactions still see historical versions.

This mechanism ensures transaction isolation while greatly improving database concurrency performance.

3. Supplementary knowledge points

In MySQLundo logredo logandbinlogThey are three core logs, each with different responsibilities, and jointly guarantee the transactionality, durability and high availability of the database.

The following three are explanations for reference deepseek.

① Undo Log (rollback log)

effect

  • Transaction rollback: Record the old value before data modification, and is used to restore the original data when transaction rollback.
  • MVCC (multi-version concurrent control): Provide historical version data and support Consistent Non-locking Read.

Working mechanism

  • When executedINSERTUPDATEorDELETEInnoDB saves the modified data to undo log.
  • Transaction rollback: Through undo log reverse operation, restore the data to its state before modification.
  • MVCC implementation: When other transactions read data, if the current version is not visible (if not committed), the visible historical version is found through undo log.

See mvcc above.

② Binlog (binary log)

effect

  • Master-slave copy: Record all database modification operations for data synchronization from the master library to the slave library.
  • Data recovery: Supports recovery of data by point-time (Point-in-Time Recovery, PITR).

Working mechanism

  • Logical log: Records the logical changes of SQL statements or rows (depending onbinlog_formatSTATEMENTROWMIXED)。
  • Write timing: Write to binlog after transaction is committed (unlike redo log).
  • Brush disk control:Depend onsync_binlogParameters control the frequency of the brush disk.

Let me give you an example:

-- Transaction C: Delete a row of data
 BEGIN;
 DELETE FROM users WHERE id = 1;
 COMMIT;

 -- After submission, binlog records the DELETE statement (or row change).  Read binlog from the library and playback to maintain data consistency.

③ Redo Log (redo log)

effect

  • Crash recovery: Ensure the persistence of transactions (Durability). Even if the database crashes, committed transaction modifications are not lost.
  • Write-Ahead Logging (WAL): Before modifying the data,Record redo logs to disk

Working mechanism

  • When committing a transaction, all modified physical page changes are first recorded to the redo log (sequential writing, high performance).
  • Flashing strategy:Depend oninnodb_flush_log_at_trx_commitcontrol:
    • 1(Default): Flash disk every time you submit it, ensuring that the crash recovery does not lose data.
    • 0or2: Delayed brushing, higher performance but may lose some data.
  • Crash recovery: After restarting, replay the modifications of unwritten data files through redo log.

Let me give you an example:

-- Transaction B: Insert a new data
 BEGIN;
 INSERT INTO users (id, name) VALUES (2, 'Bob');
 COMMIT;

 -- When submitting, redo log records the physical page modification of the insert operation, and the data may not be written to disk after that.
 -- If it crashes at this time, restore the insertion operation according to the redo log after restarting.

Take the update operation as an example:

  1. Transaction execution: Before modifying data, record the old value inundo log;In the end, you still need to modify the disk data, and record the modified physical page locationredo log
  2. Transaction commit:redo logTagged asprepareStatus and swipe the disk. WritebinlogAnd brush the disk. Willredo logTagged ascommitStatus (two-stage submission to ensure consistency).
  3. Crash recovery: If the crash occurs before binlog writes, the transaction rolls back (via undo log). If binlog has been written, it will be modified according to redo log replay.

1. Why do you need redo log and binlog?

  • redo logIt is the physical log of the InnoDB engine layer, ensuring crash recovery and persistence.
  • binlogIt is the logical log of the MySQL Server layer, which supports master-slave replication and cross-engine data recovery.
  • The two are combined to ensure data consistency through "two-stage submission".

2. Will undo log be deleted?

  • Undo log can be cleaned when a transaction is committed and no other transactions require access to the old version of data.
  • Long transactions can cause undo log stacking ("version bloating") and affect performance.

3. The writing order of binlog and redo log?

  • When committing a transaction, write redo log (prepare) first → then binlog → finally write redo log (commit).

  • This is to ensure consistency between binlog and redo log when crash recovery (two-stage commit).

  • undo log: Guarantee the atomicity of transactions and MVCC.

  • redo log: Ensure the persistence and crash recovery of transactions.

  • binlog: Ensure data replication and logical recovery.

end.Reference

refer to:/jelly12345/p/