Core concepts
- Primary Key Index / Secondary Index
- Clustered Indexes/Non-Clustered Indexes
- Back to Table/Index Override
- index downward thrust
- Union Index/Leftmost Union Match
- prefix index
- explain
I. [Index definition]
1. Index Definition
Beyond the data, the database system maintainsData structures that satisfy specific lookup algorithmsThese data structures are organized asReferencing (pointing to) data in some way, so that advanced lookup algorithms can be implemented on these data structures.This data structure, the index.
2. Data structure of the index
- B-tree / B+-tree (mysql's innodb engine chooses B+-tree as the data structure for indexing by default)
- HASH table
- ordered array
3. Select B + tree instead of B tree as an index
- B-tree data structure: record records are stored in the nodes of the tree
- Data structure of a B+ tree: record records are stored only in the leaf nodes of the tree
- Assuming a data size of 1KB and an index size of 16B, the database is stored using disk data pages, and the default size of a disk page is 16K. the same three times IO.
- B-tree can fetch 16*16*16=4096 data
- B+ tree is able to fetch 1000*1000*1000 = 1 billion pieces of data
II. [Type of index]
1. Primary key indexes and secondary indexes
-
primary key index: The leaf nodes of the index are data rows
-
secondary index: The leaf node of the index is the KEY field plus the primary key index, so it is the primary key value that is first looked up through the secondary index query, and then InnoDB finds the corresponding data block through the primary key index based on the primary key value looked up.
-
innodb's primary index file directly on the row of data, known as clustered indexes, secondary indexes point to the reference to the primary key
-
In myisam, both primary and secondary indexes, point to physical rows (disk locations).
2. Clustered indexes and non-clustered indexes
- A clustered index is an algorithm that reorganizes the actual data on disk to be sorted by the value of a specified column or columns. Characterized by storing the data in the same order as the index order. In general, the primary key will create a clustered index by default, and a table is only allowed to exist a clustered index (reason: once the data is stored, the order can only be one) The figure above shows that innodb's primary and secondary indexes are all clustered indexes
- The leaf nodes of a non-clustered index are pointers to data records as opposed to the leaf clicks of a clustered index which are data records. The biggest difference between a non-clustered index and a clustered index is that the order of the data records is not the same as the index, and therefore the order of the data records in the data
3. Clustered index advantages and disadvantages
-
Advantage: No need to backtrack when there are fewer entries based on the primary key (the data is right under the primary key node).
-
Disadvantage: Frequent page splitting if irregular data insertion is encountered.
III. [Derivation of the concept of indexing]
1. Back to the table
The concept of table return involves the difference between queries with primary key indexes and those with non-primary key indexes
- If the statement is
select * from T where ID=500
,That is, for a primary key query, only the ID tree needs to be searched. - If the statement is
select * from T where k=5
, i.e., a non-primary key index query, would require aFirst search the k index tree to get the value of ID as 500, then go to the ID index tree and search again. - The process of returning from a non-primary key index to a primary key index is called table return.
Queries based on non-primary key indexes require an additional scan of the index tree.Therefore, we should try to use primary key queries in our application. And from the point of view of storage space, since the leaf nodes of the non-primary key index tree store the value of the primary key, then, it should beConsider making the fields of the primary key as short as possible so that the smaller the leaf nodes of the non-primary key index are, the less space the non-primary key index takes up.In general, it is recommended to create a self-incrementing primary key so that the non-primary key index takes up the least amount of space.
2. Index Override
- If one of the conditions in the where clause is a non-primary key index, then the queryFirst locate the primary key index through the non-primary key index (the primary key is located in a leaf node of the non-primary key index search tree); and then locate the query through the primary key index. The process of returning to the primary key index tree in this process is called table return.
- But when ourThe query is the primary key value, then you can provide the query result directly without going back to the table. In other words.In this query, the non-primary key index has "covered" our query requirements, so it is called a covering index.
- Covering indexes allow you to get the query results directly from the secondary indexes without having to go back to the primary indexes for another queryThis is why it is possible to improve performance by reducing the number of searches (no need to go back to the table from the secondary index tree to the clustered index tree), or by reducing IO operations (more nodes can be loaded from disk at once via the secondary index tree).
3. Union index
A joint index is one that indexes multiple columns on a table.
Scene one:
The union index (a, b) isSort by a, b (first by a, then by b if a is the same). Therefore, the following statement can be used directly with a union index to get results (in fact, the leftmost prefix principle is used)
select … from xxx where a=xxx;
select … from xxx where a=xxx order by b;
The following statements, on the other hand, cannot be used with a union query:
select … from xxx where b=xxx;
Scene two:
For a union index (a, b, c), the following statement also gets the result directly from the union index:
select … from xxx where a=xxx order by b;
select … from xxx where a=xxx and b=xxx order by c;
The following statement does not work and requires a filesort sort operation.
select … from xxx where a=xxx order by c;
Summary:
Taking the joint index (a,b,c) as an example, creating such an index is equivalent to creating three indexes, index a, ab, and abc.One index topping three is certainly a good thing; after all, each additional index increases the overhead of write operations and disk space.
4. Leftmost Matching Principle
- The leftmost prefix principle can be appreciated from the joint index example above.
- More than just the full definition of an index, indexes can be utilized to speed up retrieval as long as they satisfy a leftmost prefix. This leftmost prefix can be the leftmost N fields of a union index or the leftmost M characters of a string index.Use the "leftmost prefix" principle of indexes to locate records and avoid duplicate index definitions.
- Therefore, based on the leftmost prefix principle, it is crucial that we consider how to order the fields within the index when defining a union index! The criterion for evaluation is the index's ability to be reused, for example, theWhen there is already an index on the (a,b) field, there is usually no need to create a separate index on a.
5. Index push down
MySQL 5.6 introduced index push-down optimization.You can do judgment on the fields contained in the index first during the index traversal process to filter out the records that do not meet the conditions and reduce the number of words back to the table.
- tabulate
CREATE TABLE `test` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'self-addressing primary key',
`age` int(11) NOT NULL DEFAULT '0',
`name` varchar(255) CHARACTER SET utf8 NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `idx_name_age` (`name`,`age`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-
SELECT * from user where name like 'narrate%'
Leftmost match principle, hit idx_name_age index -
SELECT * from user where name like 'narrate%' and age=20
- Prior to version 5.6, indexing by name (which at this point was ignoring the
age=20
This condition), match 2 records, then find the corresponding 2 ids. after returning to the table, filter by age=20 - After version 5.6, index push down will be added, after matching 2 pieces of data based on name, it will not be ignored at this time
age=20
condition, it will be filtered based on age before going back to the table. This is index push down, which can reduce the amount of data back to the table and increase query performance
- Prior to version 5.6, indexing by name (which at this point was ignoring the
6. Prefix Index
(coll.) fail (a student)When the index is a very long sequence of characters, the index will be memory intensive and very slow, this is when prefix indexing is used. Often it is possible to index the first few characters instead of the full value to save space and get good performance. The so-calledA prefix index is one that uses the first few letters of the index as an index, but to reduce the duplication rate of indexes, indexes we must also determine the duplication rate of prefix indexes.
- First calculate the uniqueness percentage of the current string field:
select 1.0*count(distinct name)/count(*) from test
- In calculating the percentage of uniqueness for different prefixes:
-
select 1.0*count(distinct left(name,1))/count(*) from test
The percentage of name strings indexed by the first prefix. -
select 1.0*count(distinct left(name,2))/count(*) from test
Take the first two of the name string as a percentage of the prefix indexes - ...
-
- (coll.) fail (a student)
left(str, n)
of n is not increasing significantly, at which point n can be chosen as the number of prefix index intercepts - Creating Indexes
alter table test add key(name(n));
IV. [View index]
When we add the index, how to check the index? Or how do we troubleshoot when the execution of a statement is particularly slow?
explain is typically used to see if an index is in effect.
After we get the logs of the slow query, we look at the logs and observe that those statement executions are slow queries, and we execute them again by adding explain before that statement.Instead of executing the statement, explain sets a flag on the query that causes it to return information about each step in the execution plan when the query is executed.It returns one or more lines of information showing the execution of each part of the plan and the order of execution.
explain Execution statementImportant fields returned:
- type: show is the search method(full table scan or index scan)
- key: the index field used, or null if it is not used
explaintype field:
- ALL: Full Table Scan
- index: index full scan
- range: index range scan
- ref: Scanning with non-unique indexes
- eq_ref: Scanning with unique indexes