MySQL chapter - the three major logs

Summary

undo log (rollback log): a log generated by the InnoDB storage engine layer. It implements the atomicity of transactions and is mainly used for transaction rollback and MVCC;
redo log (redo log): a log generated by the InnoDB storage engine layer. It implements the durability of transactions and is mainly used for crash recovery, for example after a power failure;
binlog (archive log): a log generated at the Server layer, mainly used for data backup and master-slave replication.

Rollback log (undo log)

Purpose

It preserves the version of the data from before the transaction's change, which can be used for rollback and so guarantees atomicity.
It is one of the key building blocks of reads under multi-version concurrency control (MVCC), also known as non-locking reads. MVCC is implemented through the Read View plus the undo log version chain; see MVCC snapshot reads for details.
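
The idea of "Read View + undo log version chain" can be sketched as follows. This is a simplified model, not InnoDB's actual data structures: the field names (creator, active, min_active, next_trx_id) and the sample transaction ids are assumptions made for illustration.

```python
def is_visible(trx_id, view):
    """Return True if a row version written by trx_id is visible to the view."""
    if trx_id == view["creator"]:
        return True                      # a transaction sees its own changes
    if trx_id < view["min_active"]:
        return True                      # committed before the view was created
    if trx_id >= view["next_trx_id"]:
        return False                     # started after the view was created
    return trx_id not in view["active"]  # visible only if already committed


def snapshot_read(version_chain, view):
    # version_chain lists (trx_id, value) pairs, newest first; in InnoDB the
    # older versions are reached by following the undo log.
    for trx_id, value in version_chain:
        if is_visible(trx_id, view):
            return value
    return None


# Hypothetical history of one row: written by transactions 10, 20 and 30.
chain = [(30, "c"), (20, "b"), (10, "a")]
# View created by transaction 25 while transaction 20 was still active.
view = {"creator": 25, "active": {20, 25}, "min_active": 20, "next_trx_id": 26}
result = snapshot_read(chain, view)
print(result)
```

The read skips the versions from transactions 30 and 20 and falls back to the older version reached through the chain, without taking any lock.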

Content

The undo log is a logical log: when an undo is executed, the data is only restored logically to its state before the transaction, rather than by operating on physical pages, which is different from the redo log.

Whenever the InnoDB engine operates on a record (update, delete, insert), it records all the information needed for rollback in the undo log, for example:

When inserting a record, write down the primary key value of the record, so that on rollback it is enough to delete the record with that primary key;
When deleting a record, write down the full contents of the record, so that on rollback it is enough to insert a record with those contents back into the table;
When updating a record, write down the old values of the updated columns, so that on rollback it is enough to update those columns back to their old values.
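
The three cases above can be sketched with a toy in-memory table. This is only an illustration of the "record the inverse operation" idea; the function names and the tuple layout of the undo records are hypothetical and have nothing to do with InnoDB's on-disk format.

```python
def insert(table, undo_log, pk, row):
    table[pk] = dict(row)
    undo_log.append(("insert", pk))                  # the primary key is enough


def delete(table, undo_log, pk):
    undo_log.append(("delete", pk, table.pop(pk)))   # keep the whole old row


def update(table, undo_log, pk, changes):
    old = {col: table[pk][col] for col in changes}   # old values of changed columns
    undo_log.append(("update", pk, old))
    table[pk].update(changes)


def rollback(table, undo_log):
    # Apply the inverse operations, newest first.
    while undo_log:
        record = undo_log.pop()
        kind, pk = record[0], record[1]
        if kind == "insert":
            del table[pk]                 # undo an insert by deleting
        elif kind == "delete":
            table[pk] = record[2]         # undo a delete by re-inserting
        else:
            table[pk].update(record[2])   # undo an update with the old values


table = {1: {"name": "alice", "age": 30}}
undo_log = []
insert(table, undo_log, 2, {"name": "bob", "age": 25})
update(table, undo_log, 1, {"name": "xiaolin"})
delete(table, undo_log, 2)
rollback(table, undo_log)
print(table)
```

After the rollback the table is back to its single original row.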

When is it generated?

Before a transaction modifies data, MySQL first records the pre-update data in the undo log, which is used to restore the data if the transaction is rolled back. Writing the undo log also generates redo log records, which protect the undo log itself.

When is it flushed to disk?

The undo log follows the same flushing strategy as data pages: both rely on the redo log for durability. Generating undo log records is therefore accompanied by generating redo log records, much like the mechanism that protects the durability of transactions.

Undo pages live in the buffer pool, and modifications to undo pages are recorded in the redo log. The redo log is flushed every second and also when a transaction commits. Both data pages and undo pages rely on this mechanism for durability; see below for details.

Redo log

Purpose

Ensures that transactions are durable.
- To prevent data loss on power failure, when a record needs to be updated, the InnoDB engine first updates the page in memory (marking it as a dirty page) and then records the modification of this page as a redo log record; at that point the update is considered complete. In other words, the redo log exists to prevent the loss of dirty pages in the Buffer Pool.
- When the MySQL service restarts, the changes are redone from the redo log, which gives transactions their durability.
Turns write operations from "random writes" into "sequential writes", improving the performance of MySQL writes to disk.

Content

The redo log is a physical log: it records modifications to physical data pages, and its records are written sequentially to the redo log files. In addition, after an undo page is modified in memory, the redo log record for that undo modification must also be written.

Difference between the redo log and the undo log:

The redo log records the state of the data after the transaction's change, i.e. the value after the update;
The undo log records the state of the data before the transaction's change, i.e. the value before the update.

When is it generated?

The redo log is generated after the transaction starts. It is not written only at commit time; redo records start being written to the redo log file while the transaction is still executing.

If a crash occurs before the transaction commits, the transaction is rolled back via the undo log after restart; if a crash occurs after the transaction commits, the transaction is restored via the redo log after restart.
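
The two recovery cases can be sketched as a redo pass followed by an undo pass. This is a toy model under stated assumptions: log records are plain tuples, pages are dictionary entries, and the recover function is a made-up name, not InnoDB's recovery code.

```python
def recover(disk_pages, redo_log, undo_log, committed):
    pages = dict(disk_pages)
    # Redo pass: replay every logged change to rebuild the state that
    # existed in memory at the moment of the crash.
    for txn, key, new_value in redo_log:
        pages[key] = new_value
    # Undo pass: roll back the changes of transactions that never
    # committed, newest first.
    for txn, key, old_value in reversed(undo_log):
        if txn not in committed:
            pages[key] = old_value
    return pages


disk_pages = {"a": 1, "b": 2}                  # pages as last flushed to disk
redo_log = [("T1", "a", 10), ("T2", "b", 20)]  # values after each update
undo_log = [("T1", "a", 1), ("T2", "b", 2)]    # values before each update
committed = {"T1"}                             # T2 had not committed at the crash

recovered = recover(disk_pages, redo_log, undo_log, committed)
print(recovered)
```

T1's committed change survives through the redo log, while T2's change is removed again through the undo log.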

The redo log has to be written to disk, and the data has to be written to disk too, so why write both?

Writing the redo log is an append operation, so the disk I/O is sequential, whereas writing data requires locating the write position first, so the disk I/O is random. Sequential writes are much more efficient than random writes, so writing the redo log to disk costs less.

When is it flushed to disk?

The redo log generated while a transaction executes is not written straight to disk either, since that would cause a large number of I/O operations, and disk is much slower than memory.

The redo log has its own buffer, the redo log buffer, whose size is set by innodb_log_buffer_size (16 MB by default). Each redo log record is first written to the redo log buffer and persisted to disk later.

The contents of the redo log buffer are flushed to disk in the following cases:

When MySQL shuts down normally;
When the amount of data written to the redo log buffer exceeds half of the buffer's memory space;
Every second, when InnoDB's background thread persists the redo log buffer to disk;
Each time a transaction commits, when the redo log cached in the redo log buffer is persisted directly to disk (this behavior is controlled by the innodb_flush_log_at_trx_commit parameter).

Thus the redo log buffer is not necessarily written to the redo log file at commit time; it is written gradually from the moment the transaction begins.

Even if a transaction has not committed yet, the InnoDB storage engine still flushes the redo log buffer to the redo log file every second.

This is important to know, because it largely explains why even a very large transaction takes very little time to commit.
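
The flush triggers listed above can be sketched with a toy buffer. The class, its method names, and the size units are hypothetical; the sketch only shows that every trigger ends in the same flush of the in-memory records.

```python
class RedoLogBuffer:
    """Toy redo log buffer; sizes are in arbitrary units."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = []    # records still only in memory
        self.used = 0        # space taken by pending records
        self.on_disk = []    # stands in for the redo log file

    def flush(self):
        self.on_disk.extend(self.pending)
        self.pending = []
        self.used = 0

    def append(self, record, size):
        self.pending.append(record)
        self.used += size
        if self.used > self.capacity / 2:   # more than half of the buffer is used
            self.flush()

    # The remaining triggers all end in the same flush.
    def on_commit(self):
        self.flush()

    def on_background_tick(self):
        self.flush()

    def on_shutdown(self):
        self.flush()


buf = RedoLogBuffer(capacity=100)
buf.append("r1", 30)              # 30 <= 50: stays in memory
after_first = list(buf.on_disk)
buf.append("r2", 30)              # 60 > 50: the half-full trigger flushes r1 and r2
after_second = list(buf.on_disk)
buf.append("r3", 10)
buf.on_commit()                   # the commit trigger flushes r3
print(after_first, after_second, buf.on_disk)
```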

Redo log files

By default there are two redo log files, named ib_logfile0 and ib_logfile1 (MySQL 8.0.30 and later keep the redo log files in an #innodb_redo directory instead).

The redo log file group is written in a circular fashion. The InnoDB storage engine writes ib_logfile0 first, switches to ib_logfile1 when ib_logfile0 is full, and switches back to ib_logfile0 when ib_logfile1 is full as well, which makes the group equivalent to a ring.

write pos and checkpoint both move clockwise;
The portion from write pos to checkpoint (the red portion of the figure) is free and is used to record new update operations;
The portion from checkpoint to write pos (the blue portion of the figure) holds the records of dirty data pages that have not been flushed to disk yet.

Therefore, if write pos catches up with checkpoint, the redo log files are full. MySQL cannot perform new updates and is blocked until dirty pages are flushed to disk and the checkpoint moves forward, freeing space.
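
The relationship between write pos and checkpoint can be sketched as a ring with two monotonically increasing counters. The class below is a hypothetical model with arbitrary size units, not how InnoDB tracks LSNs internally.

```python
class CircularRedoLog:
    def __init__(self, capacity):
        self.capacity = capacity
        self.write_pos = 0     # total amount ever written
        self.checkpoint = 0    # total amount whose dirty pages are already flushed

    def free_space(self):
        # Space between write pos and checkpoint, going clockwise.
        return self.capacity - (self.write_pos - self.checkpoint)

    def write(self, size):
        if size > self.free_space():
            return False       # would overwrite unflushed records: must wait
        self.write_pos += size
        return True

    def advance_checkpoint(self, size):
        # Called after dirty pages have been flushed to disk.
        self.checkpoint = min(self.checkpoint + size, self.write_pos)

    def offset(self, pos):
        # Physical position inside the ring for a logical position.
        return pos % self.capacity


log = CircularRedoLog(capacity=100)
first = log.write(60)            # fits: 60 of 100 used
blocked = log.write(50)          # only 40 free: write pos would pass checkpoint
log.advance_checkpoint(30)       # dirty pages flushed, 30 units reclaimed
retry = log.write(50)            # 70 free now, so the write fits
print(first, blocked, retry, log.offset(log.write_pos))
```

The second write is refused until the checkpoint advances, which corresponds to the blocked state described above.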

Binary log (binlog)

Purpose

Replication: in master-slave replication, the slave replays the master's binlog to stay synchronized with the master.
Point-in-time recovery of the database, i.e. backup and restore.

Content

The binlog has 3 formats, STATEMENT, ROW, and MIXED (STATEMENT was the default before MySQL 5.7.7; ROW has been the default since), with the following differences:

STATEMENT: every SQL statement that modifies data is recorded in the binlog (this amounts to recording logical operations, so in this format the binlog can be called a logical log), and the slave in master-slave replication replays the SQL statements. However, STATEMENT suffers from the dynamic function problem: with functions such as UUID() or SYSDATE(), the result on the master differs from the result on the slave, and such ever-changing functions make the replicated data inconsistent (NOW() itself is replicated safely, because the binlog records the statement's timestamp);
ROW: records what each row of data was finally changed to (in this format the binlog cannot be called a logical log), so the dynamic function problem of STATEMENT does not arise. The drawback of ROW is that the result of every row change is recorded: a bulk update statement produces one record per updated row, which can make the binlog file very large, whereas STATEMENT would record only the single update statement;
MIXED: combines the STATEMENT and ROW modes, automatically choosing between them depending on the situation.

Note: besides the dynamic function issue, the log format also affects how automatically maintained time columns such as update_time are replicated. update_time is commonly declared with ON UPDATE CURRENT_TIMESTAMP, so the column is updated automatically. Under master-slave replication:
If the log format is STATEMENT, the slave replays the SQL statement itself, so the column is evaluated again on the slave during replay. It matches the master only because MySQL records the statement's timestamp in the binlog event; time functions not covered by that timestamp, such as SYSDATE(), can still differ;
If the log format is ROW, the binlog records what the row was finally changed to, so the slave's data is identical to the master's.
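
The dynamic function problem can be sketched by replaying the same change in both formats. Everything here is a stand-in: run_statement plays the role of an UPDATE that calls UUID(), and the two lists play the role of a STATEMENT-format and a ROW-format binlog.

```python
import uuid


def run_statement(table, row_id):
    # Stands in for: UPDATE t SET token = UUID() WHERE id = row_id
    table[row_id] = str(uuid.uuid4())


master = {1: None}
run_statement(master, 1)

# STATEMENT format: the binlog holds the statement, and the slave re-executes it.
statement_binlog = [("run_statement", 1)]
slave_stmt = {1: None}
for _, row_id in statement_binlog:
    run_statement(slave_stmt, row_id)     # generates a different UUID

# ROW format: the binlog holds the resulting row image, and the slave copies it.
row_binlog = [(1, master[1])]
slave_row = {1: None}
for row_id, value in row_binlog:
    slave_row[row_id] = value

print("statement replay matches master:", slave_stmt == master)
print("row replay matches master:", slave_row == master)
```

Replaying the statement produces a new random value on the slave, while replaying the row image reproduces the master's value exactly.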

When is it generated?

When a transaction commits, all of its SQL statements (one transaction may contain several SQL statements) are recorded in the binlog in the configured format, in a single write.

The binlog records all changes to table structures and all modifications to table data. It does not record query operations such as SELECT and SHOW.

The obvious difference from the redo log is that the binlog is append-only: when a file is full, a new file is created and writing continues there; earlier logs are never overwritten, so the binlog keeps the full history. The redo log is written circularly: its space has a fixed size, writing wraps around to the beginning once it is full, and it only keeps the records of dirty pages that have not been flushed to disk yet.

That is, if the entire database is accidentally deleted, the data can only be recovered from the binlog files, because the circular writes of the redo log erase older records.

Master-Slave Replication Implementation

MySQL's master-slave replication relies on the binlog, which records all changes made on MySQL and is stored on disk in binary form. Replication is the process of transferring the binlog data from the master to the slave.

This process is generally asynchronous, i.e. the thread executing transactions on the master does not wait for the thread replicating the binlog to finish synchronizing.

The master-slave replication process of a MySQL cluster is as follows:

Write binlog: after receiving the client's request to commit a transaction, the master first writes the binlog, then commits the transaction and updates the data in the storage engine, and once the transaction is complete returns an "Operation Successful" response to the client.
Synchronize binlog: the slave creates a dedicated I/O thread that connects to the master's log dump thread to receive the master's binlog, writes the binlog events to the relay log, and then returns a "Replication Successful" response to the master.
Replay binlog: the slave creates a thread for replaying the binlog, which reads the relay log and replays the binlog to update the data in the storage engine, eventually making the master and slave data consistent.
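
The three steps above can be sketched with plain lists standing in for the binlog and the relay log. The function names are hypothetical and the threads are modeled as ordinary function calls, so this only shows the order of events and the asynchronous gap between master and slave.

```python
def master_commit(binlog, data, key, value):
    binlog.append((key, value))    # step 1: write the binlog first
    data[key] = value              # then commit in the storage engine
    return "Operation Successful"


def io_thread(binlog, relay_log):
    # Step 2: copy the events the slave has not received yet into the relay log.
    new_events = binlog[len(relay_log):]
    relay_log.extend(new_events)
    return len(new_events)


def sql_thread(relay_log, data, applied):
    # Step 3: replay relay log events that have not been applied yet.
    for key, value in relay_log[applied:]:
        data[key] = value
    return len(relay_log)


binlog, relay_log = [], []
master_data, slave_data = {}, {}

master_commit(binlog, master_data, "id=1", "xiaolin")
master_commit(binlog, master_data, "id=2", "bob")
lagging = dict(slave_data)         # the master did not wait for the slave

received = io_thread(binlog, relay_log)
applied = sql_thread(relay_log, slave_data, 0)
print(lagging, received, applied, slave_data == master_data)
```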

When is it flushed to disk?

The flush timing differs from the redo log, which can be flushed every second even if the transaction has not committed. A transaction's binlog cannot be split: no matter how large the transaction is (e.g. many statements), it must be written in one piece. If a transaction's binlog were split, the slave would execute it as several separate transactions, which would break atomicity.

Like the redo log, the binlog has a corresponding cache, called the binlog cache. The binlog cache is written to the binlog file when the transaction commits.

The write in the figure means writing the log to the binlog file without persisting the data to disk: the data still sits in the file system's page cache, so write is relatively fast because it involves no disk I/O.
The fsync in the figure is the operation that persists the data to disk. It involves disk I/O, so frequent fsync calls cause high disk I/O.

MySQL provides the sync_binlog parameter to control how often the binlog is flushed to disk:

sync_binlog = 0 means that each transaction commit only does a write, without fsync, and leaves it to the operating system to decide when to persist the data to disk;
sync_binlog = 1 means that each transaction commit does a write and then immediately executes fsync;
sync_binlog = N (N > 1) means that each transaction commit does a write, but fsync is executed only after N transactions have accumulated.
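
The effect of the three settings can be sketched by counting fsync calls for a given number of commits. The function below is a hypothetical model of the policy described above, not MySQL code; it ignores any flushing the operating system does on its own.

```python
def count_fsyncs(sync_binlog, commits):
    """Return (fsync calls issued, commits written but not yet fsynced)."""
    fsyncs = 0
    unsynced = 0
    for _ in range(commits):
        unsynced += 1                       # every commit does a write
        if sync_binlog > 0 and unsynced >= sync_binlog:
            fsyncs += 1                     # fsync after N accumulated commits
            unsynced = 0
    return fsyncs, unsynced


for setting in (0, 1, 100):
    print(setting, count_fsyncs(setting, 1050))
```

The second number in each result is the number of transactions whose binlog would be at risk if the host rebooted abnormally at that moment.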

With sync_binlog = 0 (the default before MySQL 5.7.7; later versions default to 1), MySQL issues no forced flush to disk. This gives the best performance but also the highest risk: if the host reboots abnormally, binlog data that has not been persisted to disk yet is lost.

Setting sync_binlog to 1 is the safest option but costs the most performance. Even if the host reboots abnormally, at most the binlog of one transaction is lost, and data already persisted to disk is unaffected; the price is a large impact on write performance.

If you can tolerate the risk of losing the binlog of a small number of transactions, sync_binlog is usually set to a value between 100 and 1000 to improve write performance.

Two-phase commit

After a transaction commits, both the redo log and the binlog must be persisted to disk. These are two independent pieces of logic, so a half-successful state is possible, leaving the two logs logically inconsistent, as follows:

If MySQL crashes after the redo log has been flushed to disk but before the binlog has been written, then after restart the data is restored from the redo log, but the binlog has no record of the change. Later backups and master-slave synchronization based on the binlog miss this change;
If MySQL crashes after the binlog has been flushed to disk but before the redo log has been written, the transaction is invalid after crash recovery because the redo log was never written, yet the binlog contains the update statement. In a master-slave architecture the binlog is replicated to the slave, the slave executes the update statement, and its value becomes inconsistent with the master's.

Two-phase commit splits the commit of a single transaction into two phases, the "Prepare" phase and the "Commit" phase.

The whole process

The transaction commit process has two phases: writing the redo log is split into two steps, prepare and commit, with writing the binlog in between, as follows:

Prepare phase: write the XID (the ID of the internal XA transaction) to the redo log, set the transaction state in the redo log to prepare, and then persist the redo log to disk (this is what innodb_flush_log_at_trx_commit = 1 does);
Commit phase: write the XID to the binlog and persist the binlog to disk (this is what sync_binlog = 1 does), then call the engine's commit interface to set the redo log state to commit. This state does not need to be persisted to disk; writing it to the file system's page cache is enough, because once the binlog has been written to disk successfully, the transaction is considered successfully executed even if the redo log state is still prepare.

In a nutshell, when the transaction commits, the redo log first enters the prepare state, then the binlog is written, and after that succeeds the redo log enters the commit state.
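
The decision made during crash recovery can be sketched as a lookup of the XID in the binlog. This is a minimal model under stated assumptions: the function name and the string states are hypothetical, and the rule that a prepared transaction without a binlog entry is rolled back follows from the reasoning above rather than from code shown in this article.

```python
def recover_transaction(xid, redo_state, binlog_xids):
    """Decide what to do with one transaction found in the redo log.

    redo_state:  "prepare" or "commit", the state recorded in the redo log
    binlog_xids: set of XIDs found in the binlog on disk
    """
    if redo_state == "commit":
        return "commit"
    if redo_state == "prepare":
        # The binlog is the deciding record: if it holds the XID, both logs
        # agree and the transaction counts as committed; otherwise the
        # transaction is rolled back so that both logs stay consistent.
        return "commit" if xid in binlog_xids else "rollback"
    raise ValueError("unknown redo log state: " + redo_state)


binlog_xids = {101}
crashed_after_binlog = recover_transaction(101, "prepare", binlog_xids)
crashed_before_binlog = recover_transaction(102, "prepare", binlog_xids)
print(crashed_after_binlog, crashed_before_binlog)
```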

Summary: how the three logs work together

Once the optimizer has chosen the lowest-cost execution plan, the executor starts the update operation according to that plan.

The process of updating a record with UPDATE t_user SET name = 'xiaolin' WHERE id = 1; is as follows.

Check whether the record is in the buffer pool. The executor is responsible for the actual execution and calls the storage engine's interface to search the primary key index tree for the row with id = 1:
- If the data page containing the row with id = 1 is already in the buffer pool, the record is returned directly to the executor for updating;
- If the record is not in the buffer pool, the data page is read from disk into the buffer pool and the record is returned to the executor.
Check whether the record already has the value to be written. After the executor gets the clustered index record, it compares the record before the update with the record after the update:
- If they are the same, the rest of the update process is skipped;
- If they differ, both the pre-update and post-update records are passed as parameters to the InnoDB layer, which actually performs the update.
Open a transaction, record the undo log, and record the redo log for the undo modification. Before the InnoDB layer updates the record, it first records the corresponding undo log: since this is an update operation, the old values of the updated columns must be written down. The undo log record is written to an Undo page in the Buffer Pool, and after the Undo page is modified in memory, the corresponding redo log record is written.
Mark the page as dirty and write the redo log. When the InnoDB layer updates the record, it first updates the page in memory (marking it as dirty) and then writes the change to the redo log; at that point the update is considered complete. To reduce disk I/O, dirty pages are not written to disk immediately; a background thread writes them to disk at an appropriate time. This is the WAL (write-ahead logging) technique: MySQL write operations are not applied to disk immediately, but first written to the redo log, and the modified rows are written to disk later.
At this point, one record has been updated.
Record the binlog. After an update statement finishes executing, the binlog for the statement is recorded. At this point it is only saved in the binlog cache and is not flushed to the binlog file on disk; when the transaction commits, all the binlog generated during the transaction is flushed to disk together.
The transaction commits, and the redo log and binlog are flushed to disk.
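
The flow above can be traced with a toy model in which every log is a plain list. The state dictionary, the function names, and the record layout are all hypothetical; the sketch only mirrors the order of the steps, not InnoDB's real page or log formats.

```python
def update_row(state, pk, column, new_value):
    pool, disk = state["buffer_pool"], state["disk"]
    if pk not in pool:                            # load the page on a buffer pool miss
        pool[pk] = dict(disk[pk])
    row = pool[pk]
    if row[column] == new_value:                  # value unchanged: nothing to do
        return False
    state["undo_log"].append((pk, column, row[column]))     # old value first
    row[column] = new_value                       # update memory, mark the page dirty
    state["dirty"].add(pk)
    state["redo_log"].append((pk, column, new_value))       # WAL: log before disk
    state["binlog_cache"].append((pk, column, new_value))   # cached until commit
    return True


def commit(state):
    # At commit, the cached binlog is written out in one piece.
    state["binlog_file"].extend(state["binlog_cache"])
    state["binlog_cache"] = []


state = {
    "disk": {1: {"name": "old"}},
    "buffer_pool": {}, "dirty": set(),
    "undo_log": [], "redo_log": [],
    "binlog_cache": [], "binlog_file": [],
}
changed = update_row(state, 1, "name", "xiaolin")
repeated = update_row(state, 1, "name", "xiaolin")   # same value: skipped
commit(state)
print(changed, repeated, state["disk"][1], state["binlog_file"])
```

After the commit the page on disk still holds the old value: the dirty page is written back later, and the redo log is what protects the change in the meantime.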
