This post analyzes the execution flow of a SQL statement in MySQL: how a query flows through MySQL internally, and how an update statement is carried out.
Before the analysis, I'll walk you through MySQL's architecture. Knowing which components MySQL consists of and what their roles are helps us understand and solve such problems.
I. MySQL Architecture Analysis
1.1 Overview of MySQL Basic Architecture
Below is a brief architectural diagram of MySQL, from which you can clearly see how a user's SQL statement is executed inside MySQL.
To help you understand the diagram, let's first briefly describe the basic roles of the components it involves; they are described in more detail in Section 1.2.
- Connector: handles authentication and permissions (when logging in to MySQL).
- Query cache: when a query statement is executed, the cache is checked first (removed in MySQL 8.0 because the feature proved to be of little use).
- Analyzer: if the cache is not hit, the SQL statement goes through the analyzer, which, frankly, works out what your SQL statement is trying to do and then checks whether it is syntactically correct.
- Optimizer: chooses what MySQL considers the optimal execution plan.
- Executor: executes the statement and returns the data obtained from the storage engine.
Simply put, MySQL is divided into a Server tier and a Storage Engine tier:
- Server layer: mainly includes the connector, query cache, analyzer, optimizer, executor, etc. All cross-storage-engine functionality is implemented in this layer, such as stored procedures, triggers, views and functions, as well as a general-purpose logging module, the binlog module.
- Storage engine layer: mainly responsible for storing and reading data. It uses a replaceable, plugin-based architecture and supports InnoDB, MyISAM, Memory and other storage engines; the InnoDB engine also has its own logging module, the redo log. The most commonly used storage engine today is InnoDB, which has been the default storage engine since MySQL 5.5.
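If you want to see which engines your own server offers, a quick check (not part of the execution flow itself, just a pointer) is:

-- Lists the storage engines this server supports; the Support column shows
-- DEFAULT for the default engine, and Transactions shows which engines support transactions.
SHOW ENGINES;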
1.2 Introduction to Basic Server Tier Components
1) Connectors
The connector mainly handles authentication and permission-related functions, like a very high-level gatekeeper.
It is mainly responsible for logging users into the database and authenticating them, which includes verifying the account and password, permissions, and other checks. Once the account and password are verified, the connector looks up all of the user's permissions in the permission tables, and subsequent permission checks for this connection rely on the permissions read at that moment. In other words, as long as the connection stays open, the user is not affected even if an administrator modifies the user's permissions.
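As a small illustration (the reader account below is hypothetical, and tb_student is the example table used later in this post), this is roughly how the permissions the connector reads at login can be granted and inspected:

-- Hypothetical account, for illustration only.
CREATE USER 'reader'@'%' IDENTIFIED BY 'a_strong_password';
GRANT SELECT ON test.tb_student TO 'reader'@'%';
-- The privileges the connector reads when 'reader' logs in; changing them
-- afterwards does not affect connections that are already open.
SHOW GRANTS FOR 'reader'@'%';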
2) Query Cache (Removed in MySQL 8.0)
The query cache is primarily used to cache the SELECT statement we execute and the result set of that statement.
After the connection is established, when a query statement is executed, the cache is checked first. MySQL checks whether this SQL has been executed before; results are cached in memory as key-value pairs, where the key is the query statement and the value is its result set. If the cached key is hit, the result is returned directly to the client; if not, MySQL carries out the subsequent steps and caches the result for the next call. Of course, even when a cached query is returned, the user's permissions on the tables involved are still checked.
Using the query cache is generally not recommended, because cache invalidation can be very frequent in real-world business scenarios: any update to a table empties all the query caches on that table. For rarely updated data, the cache is fine.
So, we generally don't recommend going with query caching in most cases.
The query cache feature was removed in MySQL 8.0. Officially, the feature was considered to have too few practical use cases, so it was simply dropped.
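On versions before 8.0 you can still inspect the query cache yourself; a minimal sketch (the exact variables and values depend on your server configuration):

-- Only meaningful on MySQL 5.7 and earlier; the query cache no longer exists in 8.0.
SHOW VARIABLES LIKE 'query_cache%';  -- query_cache_type, query_cache_size, ...
SHOW STATUS LIKE 'Qcache%';          -- Qcache_hits, Qcache_inserts, Qcache_lowmem_prunes, ...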
3) Analyzer
If MySQL doesn't hit the cache, the statement goes to the analyzer, which is mainly used to work out what the SQL statement is meant to do. The analysis is divided into several steps:
Step 1: Lexical analysis. First, the keywords are extracted, such as select, the table to be queried, the field names to be queried, the query conditions, and so on. After these operations, we move on to the second step.
Step 2: Syntax analysis. The main purpose is to determine whether the SQL you entered is correct and conforms to MySQL's grammar.
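For example (a deliberately broken statement; the exact error text varies by version), a typo that fails syntax analysis is rejected before anything is executed:

-- 'FORM' is not a valid keyword here, so syntax analysis fails:
SELECT * FORM tb_student;
-- ERROR 1064 (42000): You have an error in your SQL syntax; check the manual ...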
After these two steps, MySQL is ready to execute, but how exactly, and what is the best way to do it? This is where the optimizer comes in.
4) Optimizer
The optimizer's role is to choose what it considers the optimal execution plan (sometimes it may not actually be optimal; that goes beyond the depth of this article), for example which index to use when several are available, or which join order to choose in a multi-table query.
It can be said that after the optimizer, exactly how the statement will be executed is fixed.
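A quick way to see the plan the optimizer settled on is EXPLAIN; a minimal sketch, where t and idx_col are placeholder names rather than anything from this post:

-- possible_keys lists the candidate indexes; key shows the one the optimizer actually chose.
EXPLAIN SELECT * FROM t WHERE idx_col = 'some_value';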
5) Executor
After the execution plan is selected, MySQL is ready to execute. Before doing so, the executor first checks whether the user has the required privileges; if not, it returns an error message. If the user does have them, the executor calls the storage engine's interface and returns the result of that call.
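For example, reusing the hypothetical reader account from the connector section (the exact error text depends on the connecting host), a statement that fails this check is refused with error 1142:

-- 'reader' was only granted SELECT on test.tb_student, so an UPDATE is refused:
UPDATE test.tb_student SET age = 20 WHERE name = 'Zhang San';
-- ERROR 1142 (42000): UPDATE command denied to user 'reader'@'localhost' for table 'tb_student'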
II. Analysis of statements
2.1 Query Statements
Having said all that, how exactly is a SQL statement executed? Our SQL statements can be divided into two kinds: queries and updates (insert, modify, delete). Let's analyze a query statement first. The statement is as follows:
select * from tb_student A where A.age = '18' and A.name = 'Zhang San';
In conjunction with the above description, let's analyze the execution flow of this statement:
- First check whether the user has the privilege for this statement. If not, an error message is returned directly. If the privilege exists, then (before MySQL 8.0) the query cache is checked first, using this SQL statement as the key to look for a result in memory. If there is one, it is returned directly; if not, execution moves on to the next step.
- The analyzer performs lexical analysis to extract the key elements of the SQL statement. For the statement above, it extracts that this is a select query, that the table to query is named tb_student, that all columns are needed, and that the query conditions are age = '18' and name = 'Zhang San' on this table. It then checks whether the statement has syntax errors, for example whether the keywords are correct; if the check passes, execution moves on to the next step.
- Next, the optimizer determines the execution plan. For the SQL statement above there are two possible plans: a. First query the student table for students named 'Zhang San', then check whether their age is 18. b. First find the students who are 18 years old, then filter for the ones named 'Zhang San'. The optimizer selects the plan it considers most efficient according to its own optimization algorithm (which, as noted, is not always truly the best), confirms the execution plan, and gets ready to execute (a concrete EXPLAIN sketch follows after this list).
- A permission check is performed; if you lack permission, an error message is returned. If you have permission, the executor calls the storage engine's interface and returns the engine's execution result.
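To make the optimizer step concrete, here is a sketch; the table definition and index names below are assumptions for illustration, not something defined in this post:

-- Assumed layout of the example table, with one index per candidate plan.
CREATE TABLE tb_student (
  id   INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(32),
  age  INT,
  KEY idx_name (name),  -- supports plan a: look up by name first
  KEY idx_age  (age)    -- supports plan b: look up by age first
) ENGINE = InnoDB;
-- The key column of the output shows which index, and therefore which plan, the optimizer picked.
EXPLAIN SELECT * FROM tb_student A WHERE A.age = '18' AND A.name = 'Zhang San';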
2.2 Update Statements
The above is the execution process of a query SQL. Now let's look at how an update statement is executed. The SQL statement is as follows:
update tb_student A set A.age = '19' where A.name = 'Zhang San';
Let's change Zhang San's age (in a real database you certainly wouldn't store age as a field, or the technical lead would have your head). This statement basically follows the same flow as the query above, but executing an update also requires writing logs, which brings in the logging modules. MySQL's own logging module is the binlog (archive log), which can be used by all storage engines; the InnoDB engine we commonly use additionally comes with its own logging module, the redo log. We'll explore the flow of this statement under InnoDB. The flow is as follows:
- First look up the row for Zhang San. The query cache is not used, because an update statement invalidates all query caches associated with that table.
- Then take the query result, change the age to 19, and call the engine's API to write this row of data. The InnoDB engine saves the data in memory and at the same time records the redo log; the redo log enters the prepare state, and the engine then tells the executor that execution is complete and the transaction can be committed at any time.
- On receiving this notification, the executor writes the binlog and then calls the engine's interface to set the redo log to the commit state (the flush-related settings for both logs are shown after this list).
- Update complete.
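Two real MySQL system variables control when these two logs are flushed to disk; setting both to 1, as sketched below, is a common durability-first configuration, though whether it suits your workload is a separate trade-off:

-- Flush and sync the InnoDB redo log at every transaction commit.
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
-- Sync the binlog to disk at every transaction commit.
SET GLOBAL sync_binlog = 1;
-- Inspect the current values.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
SHOW VARIABLES LIKE 'sync_binlog';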
At this point some readers will surely ask: why use two logging modules? Wouldn't one be enough?
This is because MySQL did not have the InnoDB engine at the very beginning (InnoDB was contributed to MySQL as a plugin by another company); MySQL's own engine was MyISAM. But as we know, the redo log is unique to the InnoDB engine, and other storage engines don't have it, which means there is no crash-safe capability (crash-safe means that even if the database restarts abnormally, previously committed records are not lost), and the binlog can only be used for archiving.
It's not that using only one logging module is impossible; it's just that the InnoDB engine supports transactions through the redo log. Then some readers will ask: fine, we use two logging modules, but does it have to be this complicated? Why does the redo log need a prepare (pre-commit) state? Let's use proof by contradiction to explain why it's done this way.
- Write the redo log and commit directly, then write the binlog. Suppose the machine crashes after the redo log is written but before the binlog is written. After a restart, the machine can recover the data through the redo log, but the binlog has no record of it, so the data will be missing when the machine is backed up later, and master-slave replication will also lose this data.
- Write the binlog first, then the redo log. If the machine reboots abnormally after the binlog is written, there is no redo log, so the machine cannot recover this record, yet the binlog contains it. For the same reason as above, the data becomes inconsistent.
With the redo log's two-phase commit it is different: writing the binlog and then committing the redo log prevents the problems above and thus guarantees data consistency. So the question becomes: is there an extreme case? Suppose the redo log is in the prepare state and the binlog has already been written; what happens if there is an abnormal restart at that moment?
This depends on MySQL's processing mechanism, which is as follows:
- Determine if the redo log is complete, and if it is, commit it immediately.
- If the redo log is only in the prepare state rather than the commit state, MySQL checks whether the corresponding binlog is complete. If it is complete, the redo log is committed; if not, the transaction is rolled back (see the binlog inspection sketch below).
This solves the problem of data consistency.
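How does MySQL judge whether the binlog of a transaction is complete? A committed InnoDB transaction ends with an Xid event in the binlog, which you can inspect yourself (the binlog file name below is a placeholder; use whatever SHOW BINARY LOGS reports on your server):

-- List the binlog files, then look at the events of one of them;
-- crash recovery commits a prepared redo log entry only if the matching
-- transaction in the binlog is complete (i.e., ends with its Xid event).
SHOW BINARY LOGS;
SHOW BINLOG EVENTS IN 'binlog.000001' LIMIT 20;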
III. Summary
- MySQL is mainly divided into a Server layer and an engine layer. The Server layer mainly includes the connector, query cache, analyzer, optimizer and executor, plus a logging module (binlog) that can be shared by all execution engines; the redo log belongs to InnoDB only.
- The engine layer is plugin-based; the main engines currently include InnoDB, MyISAM, Memory and so on.
- The execution flow of a query statement is as follows: permission check (if the cache is hit) ---> query cache ---> analyzer ---> optimizer ---> permission check ---> executor ---> engine
- The execution flow of an update statement is as follows: analyzer ---> permission check ---> executor ---> engine ---> redo log (prepare state) ---> binlog ---> redo log (commit state)
IV. References
- MySQL in Action 45 Lectures
- MySQL 5.6 Reference Manual./doc/refman/5.6/en/