
Senior Engineer Interviews - Database


1. Redis related interview questions

1.1 Why does Redis execute so fast?

(1) Pure in-memory operation: Redis keeps all data in memory, so reads and writes happen directly in memory, and memory access is far faster than disk access. This lets Redis process reads and writes at speeds close to the hardware limit.
(2) Single-threaded model: Redis processes client requests in a single thread. This may sound inefficient, but it avoids the overhead of frequent thread context switches and lock contention. Since each request executes very quickly, a single thread can handle a large number of concurrent requests.
(3) I/O multiplexing: Redis uses I/O multiplexing, so a single thread can monitor many client connections at once and perform actual I/O only when a network event (such as a client sending a request) occurs. This uses CPU resources effectively and avoids unnecessary waiting.
(4) Efficient data structures: Redis provides a variety of efficient data structures, such as hash tables and sorted sets. Their implementations are heavily optimized, which makes operations on these structures very fast.

1.2 Does Redis execute in a single thread or multiple threads? Are there thread-safety issues? Why?

Redis versions prior to 6.0 ran in a single thread. All client request processing, command execution, and data read and write operations were done in a single main thread. The purpose of this design is to prevent lock contention and context switching performance overheads in a multi-threaded environment, thus ensuring performance in highly concurrent scenarios.

Starting with version 6.0, Redis introduced multi-threading, but only at the network I/O level: worker threads handle the network read/write phase, while command execution still happens in the main thread, so no two threads ever execute commands at the same time.

As for thread safety: at the service level, Redis Server itself is a thread-safe key-value database. Commands executed against Redis Server need no extra synchronization and do not suffer from thread-safety issues.

1.3 What business scenarios is Redis used for in practice?

Redis is used in practice in a wide variety of business scenarios. Some common examples:

Caching: as an in-memory key-value database, Redis is first and foremost a data cache. It provides key expiration and key eviction policies, so it is used for caching in many places.

Leaderboards: many websites have leaderboards, such as JD.com's monthly sales rankings or lists of newly uploaded products ordered by time. Redis provides the sorted set (zset) data type, which makes these rankings easy to implement.

Distributed sessions: in cluster mode, a session service is usually built around an in-memory database such as Redis; sessions are no longer managed by the application container but by the session service backed by the in-memory database.

Distributed locks: in highly concurrent scenarios, the setnx feature of Redis can be used to implement distributed locks.

 

 

1.4 What are the common data types used in Redis?

String: the simplest type; it can hold any data, such as text or binary data. Common uses are storing session information, cached values, and counters: incr increments an integer by 1 and decr decrements it by 1 (see the sketch after this list).
List: an ordered collection of string elements; it supports insertion and removal at both ends and can be used as a queue or a stack.
Hash: used to store objects, similar to an associative array; each hash contains fields and their associated values. Common uses are storing session information, product detail pages, and shopping carts: a cart maps naturally onto a hash, with the user's unique id as the dictionary key and product ids and quantities stored as the values.
Set: an unordered collection of unique members. A common use is the follow/follower feature: storing the people who follow me and the people I follow in sets guarantees there are no duplicates.
Sorted Set (zset): a Set with an extra sort attribute, the score. Common uses are leaderboards and follow lists that need to be displayed in sorted order.
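
As a quick illustration of these types, here is a minimal sketch using the Python redis-py client (the client setup, key names and values here are illustrative assumptions, not part of the original answer):

import redis  # redis-py client, assumed installed

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# String: cache a value and keep a counter with INCR/DECR
r.set("page:home", "<html>...</html>")
r.incr("visits")          # integer +1
r.decr("visits")          # integer -1

# List: push/pop at both ends, usable as a queue or a stack
r.lpush("tasks", "t1", "t2")
r.rpop("tasks")

# Hash: a shopping cart keyed by user id, field = product id, value = quantity
r.hset("cart:1001", mapping={"prod:1": 2, "prod:9": 1})

# Set: follow relationships, duplicates are ignored automatically
r.sadd("follows:1001", "2002", "2003")

# Sorted set: leaderboard ordered by score
r.zadd("rank:monthly", {"item:1": 350, "item:2": 120})
r.zrevrange("rank:monthly", 0, 9, withscores=True)  # top 10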

 

1.5 How are sorted sets implemented under the hood?

Before Redis 7, sorted sets were implemented with ziplist (compressed list) + skiplist: when the set has fewer than 128 elements and every member is shorter than 64 bytes, a ziplist is used for storage; otherwise a skiplist is used.

From Redis 7 onwards, sorted sets use listpack (a compact list) + skiplist.

 

1.6 What is a skip list? Why use a skip list?

 

A skip list (skiplist) is a data structure that trades space for time. A linked list cannot be binary-searched, so the skip list borrows the idea of database indexing: key nodes are extracted from the list into an index level, key nodes are extracted from that index level again, and so on, forming multiple index layers. Because the indexes take extra space, the more index levels are added, the more memory is used.

For a plain singly linked list, even if its data is ordered, finding an element means traversing from head to tail, so lookups are slow, with O(N) time complexity.

After adding index layers, fewer nodes need to be traversed to locate a target, so lookups become faster: the time complexity drops from O(N) to O(log N). This is the classic space-for-time trade-off.
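
To make the structure concrete, here is a simplified, illustrative skip list in Python (a toy sketch of the search idea only; Redis' real C implementation in t_zset.c is considerably more involved):

import random

class Node:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * level  # one forward pointer per index level

class SkipList:
    MAX_LEVEL = 4

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)

    def insert(self, value):
        level = 1
        while random.random() < 0.5 and level < self.MAX_LEVEL:
            level += 1  # randomly promote the node into higher index levels
        node = Node(value, level)
        cur = self.head
        for i in range(self.MAX_LEVEL - 1, -1, -1):
            while cur.forward[i] and cur.forward[i].value < value:
                cur = cur.forward[i]
            if i < level:                      # link the node at this level
                node.forward[i] = cur.forward[i]
                cur.forward[i] = node

    def search(self, value):
        cur = self.head
        # start at the top index level, move right, then drop down a level
        for i in range(self.MAX_LEVEL - 1, -1, -1):
            while cur.forward[i] and cur.forward[i].value < value:
                cur = cur.forward[i]
        cur = cur.forward[0]
        return cur is not None and cur.value == value

sl = SkipList()
for v in (1, 3, 4, 7, 9, 12):
    sl.insert(v)
print(sl.search(7), sl.search(8))  # True False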

 

1.7 How to implement distributed locking using Redis?

Distributed locking in Redis can be implemented with the setnx (set if not exists) command: if setnx creates the key successfully, the lock is acquired; otherwise locking fails. For example:

127.0.0.1:6379> setnx lock true
(integer) 1    # lock acquired successfully
# business logic ...

If we try to acquire the lock again, only the first attempt succeeds:

127.0.0.1:6379> setnx lock true   # 1st lock attempt
(integer) 1
127.0.0.1:6379> setnx lock true   # 2nd lock attempt
(integer) 0

 

As the output shows, whether the return value is 1 tells us whether the lock was acquired.

Releasing Distributed Locks

127.0.0.1:6379> del lock
(integer) 1    # lock released

However, implementing a distributed lock with a bare setnx lock true has a deadlock problem: setnx does not set an expiration time, so if the lock is forgotten and never deleted, or the thread holding it crashes, the lock stays occupied forever.

Solving Deadlock Problems

The deadlock problem is solved by setting a timeout: when the timeout expires the lock is released automatically, so no deadlock can occur. In other words, setnx must work together with expire. Since Redis 2.6.12, this can be done atomically: a single command performs both setnx and expire.

Here EX sets the expiration time, and NX means "only set the key if it does not already exist", which decides whether the lock can be acquired.
Therefore, the most straightforward way to implement a distributed lock in Redis is SET key value EX timeout NX, as sketched below.
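
A minimal sketch of this approach using the Python redis-py client (the key name, token and timeout are illustrative assumptions):

import uuid
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def acquire_lock(name, timeout=10):
    token = str(uuid.uuid4())          # unique value identifying this lock holder
    # SET key value NX EX timeout: create the key only if it does not exist,
    # with an expiration, in a single atomic command
    if r.set(name, token, nx=True, ex=timeout):
        return token
    return None

def release_lock(name, token):
    # only delete the lock if we still own it, so we never release someone else's lock
    # (in practice a Lua script is used to make this check-and-delete atomic)
    if r.get(name) == token:
        r.delete(name)

token = acquire_lock("lock:order:42")
if token:
    try:
        pass  # business logic ...
    finally:
        release_lock("lock:order:42", token)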

 

1.8 What is Redis' data persistence solution?

Redis is an in-memory database: once the power fails or the server process exits, all data in memory is lost, so Redis needs persistence.

Redis persistence is the mechanism of saving data to disk: data is kept on permanent storage media and restored from it at a specific time.

Redis provides two persistence mechanisms:
RDB (Redis DataBase): stores the resulting data, focusing on the data itself (snapshots)

AOF (Append Only File): stores the operations, focusing on the process that produced the data (commands)

RDB vs AOF triggering methods, advantages and disadvantages:
RDB's trigger method:

Manual trigger: Manually generate a snapshot with the command (save, bgsave)

Automatic trigger: snapshots are generated automatically according to the configured parameters.

Drawbacks:

Snapshots are taken at intervals and cannot back up data in real time, so more data may be lost.

Forking a child process to dump the data can be very time-consuming when the dataset is large, and the server may briefly stop serving clients.

Pros:

1. Recover data faster

2. The backup file is roughly the size of the original in-memory data and does not take up extra space.

How AOF is triggered

1. Manual trigger

Via the bgrewriteaof command: re-runs AOF persistence to generate a new AOF file (triggers a rewrite)

2. Automatic triggering

By default, Redis does not enable AOF (it uses RDB persistence by default), so AOF needs to be turned on in the configuration file, as sketched below.
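
As a rough illustration, the equivalent settings can be applied at runtime with the Python redis-py client; the matching redis.conf directives are noted in the comments (values are examples, not recommendations):

import redis

r = redis.Redis(host="localhost", port=6379)

# RDB auto trigger: equivalent to "save 900 1 300 10 60 10000" in redis.conf,
# i.e. snapshot if 1 change in 900s, 10 changes in 300s, or 10000 changes in 60s
r.config_set("save", "900 1 300 10 60 10000")

# Enable AOF at runtime (equivalent to "appendonly yes" in redis.conf)
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")   # fsync policy: always / everysec / no

# Manual triggers
r.bgsave()          # RDB snapshot in a background child process
r.bgrewriteaof()    # rewrite/compact the AOF file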

 

Advantages and disadvantages of AOF

Pros:

High data security, not easy to lose data

The AOF file saves all write operations in an organized and readable manner

Drawbacks:

The AOF method generates large file sizes

Data recovery is slower than RDB

1.9 Redis Interview Questions - Cache Penetration, Cache Breakdown, Cache Avalanche

1 Penetration: the requested key exists neither in the cache nor in the database ("the emperor's new clothes"); mitigate with a blacklist / cached null value or a Bloom filter.

2 Breakdown: a hot key expires and a burst of concurrent requests hits the database directly; mitigate by warming the key up in advance.

3 Avalanche: a large number of keys expire at the same time; mitigate by staggering expiration times so keys do not all expire together. (A sketch of these mitigations follows.)
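
A rough sketch of these mitigations in Python with redis-py (the function name, TTL values and the db_query callable are illustrative assumptions):

import json
import random
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_with_cache(key, db_query):
    val = r.get(key)
    if val is not None:
        return None if val == "" else json.loads(val)

    # breakdown: only one caller rebuilds a missing hot key at a time
    if not r.set("mutex:" + key, "1", nx=True, ex=5):
        return None  # (a real system would retry or sleep briefly here)
    try:
        row = db_query(key)
        if row is None:
            # penetration: cache an empty marker with a short TTL so repeated
            # lookups for a non-existent key do not hit the database
            r.set(key, "", ex=60)
            return None
        # avalanche: add random jitter so keys do not all expire together
        r.set(key, json.dumps(row), ex=600 + random.randint(0, 120))
        return row
    finally:
        r.delete("mutex:" + key)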

 

1.91 Redis Expired Key Deletion Policies

1) Lazy deletion: let keys expire naturally, but every time a key is fetched from the keyspace, check whether it has expired; if it has, delete it, otherwise return it.

2) Periodic deletion: at regular intervals the program scans the database and deletes expired keys found in it. The algorithm decides how many expired keys to delete and how many databases to check.

3) Memory eviction policy: when the Redis in-memory dataset grows beyond a certain size, a data eviction policy (maxmemory-policy) kicks in.

1.92 Redis Master-Slave Synchronization Mechanism

The steps of a full synchronization are as follows:

1. The slave server sends the synchronization command sync to the master server;

2. When the master database receives the synchronization command, it executes the bgsave command, generates an rdb file in the background, and uses a buffer to record all write commands executed from now on;

3. When the master server finishes executing the bgsave command, the master server sends the rdb file generated by the bgsave command to the slave server;

4. The slave server receives the rdb file and loads it into memory; after that, the master server sends the commands it buffered in the meantime, and the slave server executes them.

5. After the above processing is complete, and after that the master database executes each write command, it will send the executed write command to the slave database.

1.93 How to Keep Redis and MySQL Data Consistent

1. On a cache miss, lock the key before querying the database; then query the database, put the result into the cache, and unlock. This avoids cache breakdown, where a missing key in Redis causes a large number of threads to query the database concurrently.

2. When a double delete is needed, lock the key first, then delete the cache, update the database, put the new data into the cache, and finally unlock. This keeps the cache and the database consistent.

3. The lock key must be given an expiration time to avoid deadlocks caused by crashes. A sketch of this flow follows.
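
A minimal sketch of this flow in Python (redis-py assumed; update_db and load_db stand in for the real database calls and are assumptions, not part of the original text):

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def update_with_cache(key, new_value, update_db, load_db):
    # lock the key first; the expiration avoids deadlock if this process crashes
    if not r.set("lock:" + key, "1", nx=True, ex=10):
        raise RuntimeError("key is being updated by someone else")
    try:
        r.delete(key)                         # 1. delete the cache
        update_db(new_value)                  # 2. update the database
        fresh = load_db()                     # 3. reload and refill the cache
        r.set(key, json.dumps(fresh), ex=600)
    finally:
        r.delete("lock:" + key)               # 4. unlock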

1.94 Redis cluster

Redis provides a variety of cluster modes to accommodate high availability and horizontal scaling requirements in different scenarios. The following are Redis cluster modes:

Master-Slave mode:

In this mode, there is a master node that handles write requests, while the slave nodes replicate the data from the master and provide read services.

Advantages: Simple to implement, can realize data redundancy, and improve system performance through read-write separation.

Disadvantages: failover has to be done manually and a master failure is not handled automatically; there is no automatic data partitioning (sharding), so horizontal scaling is difficult.

Sentinel mode:

Sentinel is a high availability solution provided by Redis that monitors the status of master and slave nodes and automatically completes failover in the event of a master node failure.

Benefits: Solves the problem of manual failover in master-slave mode and provides automated monitoring and failure recovery mechanisms.

Cons: Although it adds automation over master-slave mode, it still doesn't support automatic data partitioning and the complexity of management and configuration increases as the number of nodes increases.

Redis Cluster mode:

Redis Cluster is an officially supported distributed solution that uses data sharding to spread data across multiple nodes.

Advantages: truly distributed storage; every node can serve reads and writes, giving good horizontal scalability; built-in automatic data sharding, failure detection, and failover.

Disadvantages: more complex than the other modes, requiring more network resources and configuration management; clients must support cluster features; cross-slot operations may involve multiple nodes and add complexity.

1.95 How Redis stores data and the structure of the stored data

memory structure

 

 

 

Dictionary (dict) implementation

The key-value mapping of the entire Redis database is implemented through dict: keys are always strings, while values can be of various types.

The KV organization in Redis is implemented with dictionaries; in addition, when a hash object has more than 512 entries or a single string inside it is longer than 64 bytes, that hash object is also implemented with a dictionary.

dict consists of hash tables (dictht) plus hash nodes (dictEntry). There are two hash tables: normally only ht[0] is used and ht[1] is empty; during a rehash, ht[0] holds the pre-rehash data while ht[1] receives new data and the entries migrated from ht[0].

(1) The key string is run through a hash function to produce a 64-bit integer;

(2) The same string always produces the same 64-bit integer when passed through the hash function;

(3) Taking an integer modulo a power of two can be turned into a bitwise operation. sizemask is size - 1, an optimization used by the dictionary: the hash table normally determines a slot via hash(key) % size = index, and sizemask lets hash(key) % size be rewritten as hash(key) & sizemask, replacing the division with a binary AND and speeding up execution. This optimization requires the array length to be a power of two (2^n). A quick check of this equivalence follows.
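
A quick check of this equivalence in Python (the hash values and sizes below are arbitrary examples):

# when size is a power of two, hash % size equals hash & (size - 1)
size = 16                 # 2^4
sizemask = size - 1       # 0b1111

for h in (7, 64, 1023, 2**63 + 12345):
    assert h % size == h & sizemask

# a non power-of-two size breaks the equivalence:
print(100 % 12, 100 & 11)   # 4 and 0 -- not equal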

 

hash conflict

A hash conflict occurs when different keys hash to the same value and are therefore mapped to the same slot. In a hash table, each key is mapped by the hash function to a bucket (slot) and stored at the corresponding position.

 

Since the size of a hash table is finite and the number of keys may be infinite, hash conflicts are inevitable.

We measure the degree of hash conflict with the load factor: LoadFactor = used / size, where used is the number of elements stored in the array and size is the array length.
The smaller the load factor, the fewer the conflicts; the larger it is, the more conflicts there are. Redis uses a load factor of 1.

 

expansion

  • Expansion occurs when the load factor > 1; the expansion rule is to double the size;
  • If a fork is in progress (during an RDB save, AOF rewrite, or RDB-AOF hybrid persistence), expansion is postponed;
  • However, if at that point the load factor > 5, lookup efficiency has degraded badly, so the table is expanded immediately; this interacts with the copy-on-write mechanism;

In copy-on-write, when a copy of data needs to be modified, the actual copy operation is not performed immediately, but a new copy of that data is created as the modification occurs. This avoids modifying the original data, thus maintaining data consistency and integrity.
The core idea of copy-on-write: copy data content only when you have to;

 

 

shrinkage

If the load factor < 0.1, the table is shrunk; the shrink rule is to shrink to exactly the smallest power of two (2^n) that can hold the used elements.

Understanding "exactly": if the array stores 9 elements, the smallest power of two that can hold them is 2^4, i.e. 16.

 

Why isn't the shrink threshold a load factor just below 1?
Because shrinking as soon as the load factor drops below 1 would cause frequent expansion and contraction; both involve memory allocation, and frequent memory operations would make the process allocation/IO intensive.

 

Progressive rehash

Both expansion and contraction change the mapping (the table size changes), so both trigger a rehash.
When the hashtable has many elements (Redis databases can hold a lot of data), you cannot rehash everything into ht[1] at once: that would occupy Redis for a long time and other commands would not be served. That is why progressive rehash is needed.

Rehash step:
Elements in ht[0] are migrated to ht[1] by running their keys through the hash function again to produce a 64-bit integer and then taking it modulo the length of ht[1].

Progressive rules (a toy sketch follows below):
1) Divide and conquer: the rehash work is spread over the subsequent add, delete, update and lookup operations, each of which migrates a little.
2) A timer also performs at most one millisecond of rehash at a time, migrating 100 buckets per step.
3) No further expansion or contraction happens while a progressive rehash is in progress.
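
A toy sketch of the idea in Python (greatly simplified and purely illustrative: two tables, and every operation migrates one bucket; Redis' real dict.c also bounds timer-driven rehash to roughly one millisecond and handles key updates, which this sketch does not):

class ProgressiveDict:
    def __init__(self, size=4):
        self.ht0 = [[] for _ in range(size)]   # ht[0]: buckets of (key, value) pairs
        self.ht1 = None                          # ht[1]: only exists during a rehash
        self.rehash_idx = -1                     # -1 means no rehash in progress

    def _used(self):
        tables = [self.ht0] + ([self.ht1] if self.ht1 else [])
        return sum(len(bucket) for t in tables for bucket in t)

    def _rehash_step(self):
        if self.rehash_idx < 0:
            return
        # migrate exactly one bucket from ht0 to ht1
        for k, v in self.ht0[self.rehash_idx]:
            self.ht1[hash(k) % len(self.ht1)].append((k, v))
        self.ht0[self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(self.ht0):     # migration finished: swap tables
            self.ht0, self.ht1, self.rehash_idx = self.ht1, None, -1

    def put(self, key, value):
        self._rehash_step()                      # piggyback a little migration
        # start a rehash when the load factor would exceed 1
        if self.rehash_idx < 0 and self._used() + 1 > len(self.ht0):
            self.ht1 = [[] for _ in range(len(self.ht0) * 2)]   # double the size
            self.rehash_idx = 0
        # during a rehash, new data always goes into ht1
        table = self.ht1 if self.rehash_idx >= 0 else self.ht0
        table[hash(key) % len(table)].append((key, value))

    def get(self, key):
        self._rehash_step()
        for table in [self.ht0] + ([self.ht1] if self.ht1 else []):
            for k, v in table[hash(key) % len(table)]:
                if k == key:
                    return v
        return None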

2. MySQL related interview questions

2.1  What is a database transaction?

A database transaction is a series of operations executed as a single logical unit of work. Transactions have the ACID attributes of Atomicity, Consistency, Isolation, and Durability. This means that operations within a transaction either all succeed or all fail, maintain data integrity, and operate independently of other transactions.

2.2 What is the difference between InnoDB and MyISAM in MySQL?

InnoDB supports transactions, row-level locking, and foreign keys, and is suited to scenarios that require high concurrency and transaction processing. MyISAM does not support transactions or row-level locking, but it is fast for reads and suits query-intensive scenarios.

 

2.3 How to optimize MySQL queries?

Ways to optimize MySQL queries include using appropriate indexes, avoiding functions in WHERE clauses, choosing appropriate data types, using LIMIT statements to reduce the amount of data, avoiding full table scans, and designing table structures wisely.

 

2.4 Why the B+ Tree

  • Hash indexes, while providing O(1) complexity queries, do not support range queries and sorting well, which can eventually lead to full table scans.
  • B-trees are capable of storing data at non-leaf nodes, but can lead to more random IO when querying continuous data.
  • All leaf nodes of a B+ tree are linked to each other by pointers, so sequential (range) traversal can be done with less random IO.

2.5 What is covered indexing and index push down?

Covering index:

If an index already "covers" all the columns a query needs, the query is said to use a covering index.
Covering indexes reduce the number of tree searches (the query does not need to go back to the primary table) and can significantly improve query performance, so they are a common optimization.
Index condition pushdown:

MySQL 5.6 introduced index condition pushdown: during an index traversal, conditions on columns contained in the index are evaluated first, filtering out rows that do not match and reducing the number of back-to-table lookups.

2.6 What operations can cause an index to fail?

Using a left or left-and-right fuzzy match on an indexed column, i.e. LIKE '%xx' or LIKE '%xx%': both cause the index to fail, because it is unclear at which index value the comparison should start, so the query falls back to a full table scan.
Applying a function or expression to an indexed column: the index stores the original column values, not the values computed by the function, so the index cannot be used.
Implicit type conversion on an indexed column is equivalent to applying a function to it, with the same effect.
With OR in the WHERE clause, a full table scan happens as soon as any of the condition columns is not indexed.

2.7 What is the difference between MySQL redo log and binlog?

The redo log is an InnoDB engine-level physical log with a fixed size, written in a circular fashion and recording only changes not yet flushed to disk; the binlog is a MySQL server-level logical log, written by appending, that keeps the full history of changes (see 2.8 for the consequences for crash recovery).

2.8 Why does redo log have crash-safe capabilities that binlog cannot replace?

The first point is that redo logs ensure that innoDB can determine what data has been flushed and what data has not.

One big difference between the redo log and the binlog is that one is written in a circular fashion and the other by appending. The redo log only records changes that have not yet been flushed to disk; anything already flushed is removed from it, and it is a log file of limited size. The binlog is an append-only log that keeps the full history.
When the database crashes and you want to recover data that was written to memory but not yet flushed to disk, the binlog cannot do it: although it holds the full history, it carries no marker that lets InnoDB tell which of that data has already been flushed and which has not.
The redo log is different: once data is flushed to disk it is erased from the redo log (because it is written circularly), so after restarting the database you can simply replay everything still in the redo log into memory.
The second point is that if the redo log write fails, the operation has failed and the transaction cannot be committed.

redo log Every time an update operation is completed, it must be written to the log, and if the write fails, the operation fails and the transaction cannot be committed.
The internal structure of redo log is page-based and records the changes in field values on this page, so as long as the redo log is read for replay after a crash, the data can be recovered.
That's why redo log has crash-safe capabilities and binlog does not.


2.9 How is unflushed data restored to memory after a database crash?

Unpersisted data falls into several scenarios, based on the two-phase commit of the redo log and binlog:

The change buffer is written, the redo log is fsync'd but not committed, and the binlog is not fsync'd to disk: this part of the data is lost.
The change buffer is written, the redo log is fsync'd (prepare) but not committed, and the binlog is fsync'd to disk: the redo log is first brought to the committed state from the binlog, and then the change buffer is recovered from the redo log.
The change buffer is written and both the redo log and binlog are fsync'd: recovery happens directly from the redo log.

 

2.91 What is two-phase commit?

MySQL splits the redo log write into two steps, prepare and commit, with the binlog write in between; this is called "two-phase commit".

Two-phase commit keeps the two logs logically consistent: the redo log is used to recover physical data that had not been flushed when the host failed, while the binlog is used for backup operations. They are two independent entities, and keeping them consistent requires a distributed-transaction-style protocol.

Why is two-phase commit needed?

Without two-phase commit, the following can happen:
If the redo log is written first and a crash follows before the binlog is written, restoring from the binlog backup later misses that update and is inconsistent with the current data.
If the binlog is written first and a crash follows, the transaction is invalid because the redo log was never written, so restoring from that binlog backup later yields inconsistent data.
Two-phase commit guarantees that the redo log and binlog stay consistent with each other; only when the two log files agree logically can they be used with confidence.
When recovering data, if the redo log record is in the commit state, the binlog must also have been written successfully, so the data is recovered directly; if the redo log is still in the prepare state, the corresponding binlog transaction is checked for completeness to decide whether to commit or roll back (see the sketch below).
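
The recovery rule can be written as a small decision sketch (pure illustration in Python; redo_record and binlog_has_complete_event are hypothetical stand-ins for what InnoDB actually inspects during recovery):

def recover(redo_record, binlog_has_complete_event):
    # decide what to do with one transaction found in the redo log after a crash
    if redo_record.state == "commit":
        return "apply"          # binlog must also be complete -> redo the change
    if redo_record.state == "prepare":
        # the crash happened between the two phases: trust the binlog
        if binlog_has_complete_event(redo_record.xid):
            return "apply"      # binlog was written -> commit the transaction
        return "rollback"       # binlog missing -> roll the transaction back
    return "ignore"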

2.92 Difference between CHAR and VARCHAR?

  1. CHAR and VARCHAR types differ in terms of storage and retrieval
  2. The CHAR column length is fixed to the length declared when the table was created, and the range of length values is 1 to 255
  3. When CHAR values are stored, they are padded with spaces to a specific length, and trailing spaces are removed when retrieving CHAR values

 

2.93 Difference between clustered and non-clustered indexes

Stored differently:

Clustered indexes store data on disk in the order of the index, so the data storage and index storage of clustered indexes are mixed together; whereas non-clustered indexes store the index and data separately.

Uniqueness is different:

Clustered indexes must be unique because they store data in index order and if two pieces of data have the same index value, they will be indistinguishable; whereas non-clustered indexes may or may not be unique.

The query efficiency is different:

For clustered indexes, the query efficiency is often higher than non-clustered indexes, because clustered indexes store data together, and the query can locate the required data rows faster; while for non-clustered indexes, the query needs to look up the indexes first, and then according to the indexes to find the corresponding data rows, so the query efficiency is relatively low.

Inserting data is differently efficient:

For clustered indexes, since the data is stored in the order of the index, it may be necessary to move the existing data when inserting new data, so the efficiency of inserting data is low; while for non-clustered indexes, only the index needs to be updated when inserting data, so the efficiency is relatively high.
It should be noted that a table can only have one clustered index, because the data can only be stored in one order; and there can be multiple non-clustered indexes to meet different query requirements. When designing a database, you need to choose different index types according to specific application scenarios and query requirements.

 

2.94 What is MVCC?

MVCC, or Multi-Version Concurrency Control. It is a method of concurrency control, generally in database management systems, which implements concurrent access to databases, and in programming languages, which implements transactional memory.
 
In layman's terms, multiple versions of data exist in the database at the same time: not multiple versions of the whole database, but multiple versions of a single record. When a transaction operates on such a record, it reads the record's hidden transaction-version column (trx_id), compares it with its own transaction ID, and, according to the transaction's isolation level, decides which version of the data to read.

 

The Read Committed and Repeatable Read isolation levels are implemented on top of MVCC. Compared with the blunt approach of simply taking locks, MVCC is a better way to handle read-write conflicts and can significantly improve database concurrency.

Key Knowledge Points for MVCC Implementation

Transaction version number: each time a transaction is opened, it obtains a self-incrementing transaction ID from the database; the IDs reflect the order in which transactions started. This is the transaction version number.

Hidden Fields: For the InnoDB storage engine, each row of records has two hidden columns trx_id, roll_pointer, and a third hidden primary key column row_id if there is no primary key and non-NULL unique key in the table.

 

undo log: the rollback log, which records the data as it was before modification. Before a table record is modified, the old data is copied into the undo log; if the transaction is rolled back, the data can be restored from the undo log.

Think of it this way: when you DELETE a record, the undo log records a corresponding INSERT; when you UPDATE a record, it records a reverse UPDATE that can restore the old values.

What is the purpose of undo log?

  1. Atomicity and consistency are guaranteed when transactions are rolled back.
  2. for MVCC snapshot reads.

Version chain: when multiple transactions operate on the same row in parallel, each transaction's modification produces a new version of the row; the versions are linked together through the rollback pointer (roll_pointer) into a linked list, which is called the version chain.

 

 

In fact, by using the version chain, we can see the relationship between the transaction version number, the table's hidden columns, and the undo log. Let's analyze it a little more.

 

  1. Suppose there is a core_user table containing one row, with id 1 and name "Sun Quan":

 

Now open a transaction A and execute update core_user set name = "Cao Cao" where id = 1 on the core_user table. The following happens:
First, the transaction obtains a transaction ID, say 100.
The pre-modification data of the core_user row is copied into the undo log.
The row with id = 1 in core_user is modified, changing name to "Cao Cao".
The modified row's hidden trx_id is set to the current transaction's version number, and its roll_pointer is pointed at the address of the old data in the undo log.

 

Snapshot read and current read

Snapshot read: reads a visible (possibly older) version of the record. Plain SELECT statements without locks are snapshot reads, for example:

select * from core_user where id > 2;

Current read: reads the latest version of the record; explicitly locked reads are current reads, for example:

select * from core_user where id > 2 for update;
select * from account where id>2 lock in share mode;

 

Read View

What is a Read View? It is the read view generated when a transaction executes a SQL statement; in InnoDB, a SQL statement obtains a Read View before it executes.
What is a Read View used for? Mainly for visibility judgment: deciding which versions of the data are visible to the current transaction.
How does the Read View support this judgment? Let's look at its important properties.

m_ids: the IDs of the read/write transactions that are active (not yet committed) in the system at the moment the Read View is generated, stored as a list.
min_limit_id: the smallest transaction ID among the read/write transactions active when the Read View is generated, i.e. the minimum value in m_ids.
max_limit_id: the ID that the system would assign to the next new transaction at the moment the Read View is generated.
creator_trx_id: the transaction ID of the transaction that created this Read View.
The Read View visibility rules are as follows (a small code sketch follows this list):

If the data version's trx_id < min_limit_id, the transaction that produced this version committed before the Read View was generated (transaction IDs are incremented), so this version is visible to the current transaction.
If trx_id >= max_limit_id, the transaction that produced this version started after the Read View was generated, so this version is not visible to the current transaction.
If min_limit_id <= trx_id < max_limit_id, there are three cases:
(1) If m_ids contains trx_id, that transaction had not yet committed when the Read View was generated; but if trx_id equals creator_trx_id, the data was produced by the current transaction itself, so it is visible.
(2) If m_ids contains trx_id and trx_id is not equal to creator_trx_id, the version was produced by a transaction that was still uncommitted when the Read View was generated and is not the current transaction's own change, so it is not visible.
(3) If m_ids does not contain trx_id, the transaction that produced this version had already committed before the Read View was generated, so its modification is visible to the current transaction.
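
These rules can be written down directly as a small helper (an illustrative Python sketch; the field names mirror the Read View properties listed above):

from dataclasses import dataclass

@dataclass
class ReadView:
    m_ids: set            # ids of transactions still active (uncommitted) at creation time
    min_limit_id: int     # smallest id in m_ids
    max_limit_id: int     # id the next new transaction would receive
    creator_trx_id: int   # transaction that created this view

    def is_visible(self, trx_id: int) -> bool:
        if trx_id < self.min_limit_id:
            return True                     # committed before the view was created
        if trx_id >= self.max_limit_id:
            return False                    # started after the view was created
        if trx_id == self.creator_trx_id:
            return True                     # the current transaction's own change
        return trx_id not in self.m_ids     # visible only if already committed

view = ReadView(m_ids={101, 103}, min_limit_id=101, max_limit_id=105, creator_trx_id=103)
print(view.is_visible(99), view.is_visible(101), view.is_visible(103), view.is_visible(106))
# prints: True False True False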

MVCC implementation principle analysis

Based on MVCC, the process of querying a record is:

  1. Get the transaction's own version number, i.e., the transaction ID
  2. Get Read View
  3. The data obtained from the query is then compared with the transaction version number in Read View.
  4. If the Read View visibility rules are not met, then a historical snapshot in the Undo log is required.
  5. Finally return the data that conforms to the rule

InnoDB implements MVCC through Read View + undo log: the undo log keeps the historical snapshots, and the Read View visibility rules determine whether the current version of the data is visible to the transaction.

 

 

2.95 What new features does MySQL 8.0 add?

1. MySQL 8.0 refactored the redo log, removing the previous locking mechanism and using intervals to ensure data consistency. This improvement makes redo log writing more efficient and improves overall operational efficiency. MySQL 8.0 also introduced the Link_buf data structure, making the whole module lock-free and further improving performance. This lock-free refactoring allows different threads to write to the redo log buffer in parallel, improving the database's concurrent performance.

2. MySQL 8.0 adds new features such as hidden indexes and descending indexes. Hidden indexes can be used to test the impact of removing an index on query performance and help administrators find the best indexing strategy. Descending indexes improve efficiency for queries that sort in descending order. These indexing optimizations give MySQL a significant improvement in query performance.

 

3. With the rise of non-relational databases and data stores, MySQL 8.0 has also been optimized for NoSQL support. It no longer relies on schema and implements NoSQL functionality in a more flexible way to meet the diverse needs of users in data processing. This allows MySQL to better adapt to different application scenarios, providing a wider range of data storage and query options.

4. New in MySQL 8.0 is the SET PERSIST command, which allows users to persist a configuration setting to a file in the data directory. This simplifies configuration management by letting previous settings survive a database restart, making it easier for administrators to manage and adjust configuration and improving maintainability.

Using the SET PERSIST command, for example:

SET PERSIST max_connections = 500;
MySQL saves this setting to a file in the data directory, reads that file on the next startup, and applies the persisted value on top of the default configuration.

5. MySQL 8.0 adds support for window functions such as ROW_NUMBER(), RANK(), DENSE_RANK(), and so on. Window functions allow users to perform more complex computational tasks by windowing a data set in a query. This gives MySQL a significant boost in data processing and analysis. With window functions, users can easily calculate moving averages, cumulative sums, and more to achieve more advanced data analysis needs.

The window function is a bit like SUM(), COUNT() and other aggregate functions, but instead of combining the results of multiple rows into a single row, it puts the results back into multiple rows. In other words, the window function does not need GROUP BY.

 

3. MongoDB related interview questions

3.1 Briefly describe what MongoDB is.

MongoDB is a database based on distributed file storage, written in C++, designed to provide scalable, high-performance data storage for web applications. It sits between relational and non-relational databases and is the most feature-rich, most relational-like of the non-relational databases. MongoDB supports a very loose data structure, a JSON-like format called BSON, so it can store fairly complex data types. Its most important feature is a very powerful query language whose syntax somewhat resembles object-oriented query languages: it can implement most of the single-table query functionality of a relational database, and it also supports building indexes on the data.


3.2. What are the most basic differences between MySQL and MongoDB?

There are many differences between MySQL and MongoDB; here are some of the most basic ones:

Database type: MySQL is a relational database, while MongoDB is a non-relational database.
Data storage method: MySQL supports multiple storage engines with different storage formats, while MongoDB stores data in a JSON-like document format.
Query language: MySQL uses traditional SQL statements for querying, while MongoDB has its own way of querying (JavaScript-like functions).
Indexing: MySQL can index columns in a table, while MongoDB can index any attribute.
Scalability: MySQL, while also scalable, requires more work, whereas MongoDB is a distributed file-storage based database that can easily scale to large amounts of data and high concurrency.
Latency: Since MongoDB has low latency for write operations, it is ideal for real-time applications, while MySQL has relatively high latency.
Transactions: MySQL has full transaction support, while MongoDB historically did not support multi-document transactions (they were added in MongoDB 4.0).
Data schema: MySQL requires predefined fields, while MongoDB is a dynamic schema where documents in the same collection do not need to have the same fields and structure.

3.3 What makes MongoDB the best NoSQL database?

There are several main reasons why MongoDB is the best NoSQL database:

Document-Oriented Storage: MongoDB uses document-oriented storage, which means that it can directly store data objects without splitting the data into multiple fields like relational databases. This storage approach makes MongoDB more flexible and efficient when dealing with complex data structures.
High Performance: MongoDB has excellent performance, especially when dealing with large amounts of data and highly concurrent access. It uses a binary protocol to read and write data quickly and supports indexing and query optimization to further improve query efficiency.
High Availability: MongoDB has high availability and can perform data replication and backup between multiple nodes to ensure data reliability and fault tolerance. In addition, MongoDB supports automatic sharding and horizontal scaling, which makes it easy to scale database capacity and performance.
Easily Scalable: MongoDB is easily scalable, making it easy to add nodes to handle more data and requests. This is useful for applications that need to handle large-scale data and highly concurrent access.
Rich query language: MongoDB uses a JavaScript-like query language that makes complex query operations easy to perform. The language is powerful and easy to use and can satisfy a wide variety of data retrieval needs.
To summarize, MongoDB's high performance, high availability, easy scalability and rich query language make it one of the best NoSQL databases.

3.4 Briefly describe MongoDB internals?

The internals of MongoDB consist of the following main components:

Database: MongoDB is a document-based database that allows you to create multiple databases. Each database has its own files and indexes for storing and retrieving data.
Collections: collections are the containers that store documents in MongoDB. A collection can contain many documents; each document is a set of key-value pairs, where keys are strings and values can be of various data types.
Documents: A document is the basic unit of data in MongoDB, it is a collection of key-value pairs. Keys are strings and values can be of various data types, including other documents, arrays, dates, booleans, etc.
Index: An index is a data structure used in MongoDB to speed up queries. It can be created based on the values of one or more fields, enabling query operations to quickly find documents that fulfill conditions.
Replication: MongoDB supports data replication, which allows you to copy data from one node to another. This helps to increase the availability and reliability of data.
Sharding: MongoDB supports data sharding, which allows you to spread the data in a collection across multiple nodes in order to increase the processing power and storage capacity of the data.
Query Language: MongoDB uses a JavaScript-like query language for retrieving and manipulating data. This language is very flexible and can be used to perform a variety of complex query operations.
In a nutshell, the internal structure of MongoDB is a document-based storage model, which improves data processing power and storage capacity through indexing, replication and sharding. At the same time, MongoDB also provides a rich query language and API interfaces to facilitate application development and integration for developers.
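
To make the concepts above concrete, here is a small sketch using the Python pymongo driver (the connection string, database, collection and field names are illustrative assumptions):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]                      # database
users = db["users"]                      # collection

# documents are key-value structures (BSON), with no predefined schema
users.insert_one({"name": "alice", "age": 25, "tags": ["vip"]})

# an index speeds up queries on the indexed field
users.create_index("age")

# JavaScript-like query documents
for doc in users.find({"age": {"$gt": 20}}).limit(10):
    print(doc["name"], doc["age"])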