Database Series: Master-Slave Latency Optimization

Database Series: MySQL Slow Query Analysis and Performance Optimization
Database Series: MySQL Index Optimization Summary (Comprehensive Edition)
Database Series: Data Field Changes under High Concurrency
Database Series: Overriding Indexes and Circumventing Table Returns
Database Series: Database High Availability and Lossless Scaling
Database Series: Improving Performance with Highly Distinctive Index Columns
Database Series: Prefix Indexes and Index Length Tradeoffs
Database Series: MySQL Engines MyISAM and InnoDB Comparison
Database Series: Implementing High Concurrency Control under InnoDB
Database Series: 4 Isolation Levels for Transactions
Database Series: Difference between snapshot reads under RR and RC
Database Series: Introduction to MySQL InnoDB Locking Mechanisms
Database Series: What locks are used for different MySQL operations?
Database Series: A Look at the Industry's Mainstream MySQL Data Middleware
Database Series: Paging Performance Issues with Huge Data Tables
Database Series: Mainstream library and table middleware introduction (graphic summary)
Data Fission, Database High Availability Architecture Design Practice

1 Index to this paper

1 What is MySQL Master-Slave Replication?

Master-slave replication, refers to the process of creating a database environment (called a slave database) that is exactly the same as the master database and replicates the operational behavior of the master database, so how do you ensure the consistency of data between the slave database and the master database?That is, the DDL and DML operation logs of the master database are synchronized to the slave database, and then these logs are re-executed on the slave database to ensure the consistency of the data in the slave database and the master database.

1.1 Why Master-Slave Replication?

1, in the complex business operations, there are often operations that lead to lock rows or even lock the table, if the read and write are not decoupled, it will very much affect the running of the business, the use of master-slave replication, so that the master library is responsible for the write, from the library is responsible for the read, the responsibility for a more single, better performance.
Even if there is a table lock scenario in the master, you can keep your business running by reading the slaves.

2, to ensure that the hot backup of data, the main library downtime can replace the main library in a timely manner to protect the availability of business.

3, the evolution of the architecture: the expansion of business, I / O access frequency increases, a single machine can not meet the master-slave replication can be done to do multi-library program to reduce the frequency of disk I / O access to improve the performance of a single machine I / O .

4, in essence, is also the concept of partition, master-slave replication, read-write separation that is the process of pressure splitting.

5, read-write ratio also affects the entire split mode, read-write ratio, the higher the master-slave library ratio should be higher in order to ensure the balance of read and write, in order to ensure better operational performance. The master-slave allocation strategy under the read-write ratio is as follows:

Read/write ratio (approximate)	main library	slaves
50：50	1	1
66.6：33.3	1 2
80：20	1	4
-- --	-- --	-- --

1.2 Why are there master-slave delays?

When replication is initiated on the slave library, an I/O thread is first created to connect to the master library, the master library then creates a Binlog Dump thread to read the database events and send them to the I/O thread, which gets the event data and updates it to the relay log Relay Log on the slave library, and after that, the SQL thread on the slave library reads the updated database events in the relay log Relay Log and apply it, as shown in the following figure:

To break it down there are several steps:
1. The MySQL master database records data changes (insert, delet, update) as event logs in the binary log table (binlog) when the transaction is committed.
2. There is a working thread binlog dump thread on the master repository that sends the contents of the binlog to the relay log in the relay log of the slave repository.
3, the slave library according to the relay log relay log redo data change operations, through logical replication to achieve data consistency between the master library and the slave library.
4. MySQL accomplishes data replication between master and slave libraries through three threads, where the binlog dump thread runs on the master, and the I/O and SQL threads run on the slave. A master with multiple slaves creates a binlog dump thread for each slave that connects to the master.

2 Analysis of the causes of master-slave delay

Let's start by understanding the following concepts and talking about what is a master-slave delay?

Master-slave latency usually refers to the time delay experienced by a slave server (Slave) in receiving and applying data changes on the master server (Master) in a master-slave replication architecture of a database. Specifically, when the data on the Master server changes, these changes need to be synchronized to the Slave server through the replication mechanism, and the time required for the Slave server to process these changes and complete data synchronization constitutes the so-called latency.
The direct consequence is that the results of the data queried from the library and the main library are inconsistent, we often query the process of querying data will query the old data, it may be the result of this reason!

MySQL master-slave replication, read-write separation is our commonly used database architecture, but in the scenario of large concurrency and large data changes, master-slave latency will be more serious.
The essential reason for the delay is that when the system TPS concurrency is high, the number of DMLs generated by the master repository (which also contains a portion of DDLs) is more than the Slave can handle with one Sql thread, and the efficiency is reduced.
We see that this sql thread is a single thread, so his ability to redo RelayLog is limited.

There are some other reasons, for example:

Excessive load on the master server: When the master server needs to process a large number of data changes, it may generate a higher load, resulting in slower generation and transmission of data changes, which increases the synchronization latency of the slave servers.
Excessive load on slave servers: When a slave server receives and applies changes to data on the master server, if its own load is too high, it may cause processing to slow down, resulting in synchronization delays.
network latency: An unstable network connection or insufficient bandwidth between the master and slave servers may also result in slower transmission of data changes, thereby increasing synchronization latency.
Differences in machine performance: Different hardware configurations of the master and slave servers, such as performance differences in CPU, memory, disks, etc., may also cause synchronization delays to occur.
MySQL misconfiguration: For example, if the master server's binary log (binlog) is set too large, or the slave server's relay log is not configured properly, this can lead to slower processing and thus synchronization delays.
A large number of large query statements from the repository generate lock waits

3 Master-Slave Delay Optimization Scheme

3.1 Optimal system configuration

Optimize system configuration (system level, link tier, storage engine tier) so that the database is in the optimal state: maximum number of connections, number of errors allowed, timeout allowed, pool_size, log_size, etc., and ensure the expansion of memory, CPU, and storage space (hardware part).

inverted pyramid ruleTelling us that this piece is often overlooked but essential.

If MySQL is deployed on a linux system, the operating system parameters can be adjusted appropriately to optimize MySQL performance, and the following are appropriate adjustments to Linux kernel parameters.

 1 # TIME_WAIT timeout, default is 60s
 2 net.ipv4.tcp_fin_timeout = 30
 3 # Increase the number of queues supported by tcp, increase the queue length to accommodate more waiting connections.
 4 net.ipv4.tcp_max_syn_backlog = 65535
 5 # Reduce resource recovery when disconnecting.
 6 net.ipv4.tcp_max_tw_buckets = 8000
 7 net.ipv4.tcp_tw_reuse = 1
 8 net.ipv4.tcp_tw_recycle = 1
 9 net.ipv4.tcp_fin_timeout = 10
10 # Limit of open files
11 *soft nofile 65535
12 *hard nofile 65535

After MySQL version 5.5+, the default storage engine is InnoDB, we list some of the parameters that may affect the database performance.
Default values for public parameters:

1 max_connections = 151
2 # Maximum number of simultaneous connections, it is recommended to set the maximum number of connections is about 80% of the maximum number of connections, the general default value of 151, you can make appropriate adjustments.
3 sort_buffer_size = 2M
4 # Query sorting buffer size, only for order by and group by role, it is recommended to increase to 16M.
5 open_files_limit = 1024
6 # Limit the number of open files, if show global status like 'open_files' to see the value is equal to or greater than the value of open_files_limit, the program will not be able to connect to the database or stuck!

InnoDB parameter defaults:

 1 innodb_buffer_pool_size = 128M
 2 # Index and data buffer size, it is recommended to set about 70% of the physical memory (this assumes that this server is only used as a Mysql database server)
 3 innodb_buffer_pool_instances = 1
 4 # Number of buffer pool instances, recommended setting 4 or 8.
 5 innodb_flush_log_at_trx_commit = 1
 6 # Key parameters, 0 represents about every second to write to the log and synchronize to disk, database failure will lose about 1 second of transaction data. 1 for every SQL after the execution of writing to the log and synchronized to disk, I / O overhead, after the execution of the SQL to wait for the log to read and write, inefficient. 2 represents only write the log to the system cache, and then synchronized to the disk every second, efficient, if the server fails, before the loss of transaction data. If the server fails, the transaction data will be lost. The data security requirements are not very high recommended setting 2, high performance, the effect is obvious after the modification.
 7 sync_binlog=1
 8
 9 innodb_file_per_table = ON
10 # Whether shared tablespace, 5.7 + version default ON, shared tablespace idbdata file is increasing, affecting certain I/O performance. It is recommended to turn on the independent tablespace mode, each table's index and data exist in their own independent tablespace, you can realize a single table to move in different databases.
11 innodb_log_buffer_size = 8M
12 # The size of the log buffer, because the log is refreshed once every second, so generally do not need to exceed 16M

3.2 Rational partitioning of the database layer

Database partitioning is an eternal topic, and master-slave latency is in part a result of a single database master service operating too frequently, making theSingle-threaded SQL threads are overwhelming. The database can be appropriately split functionally to share the pressure.

Database splitting can be found in this article of mine, "Thedatabases and tables", which will not be repeated on this side.

3.3 Response after slave synchronization is complete

If your business time permits, you can write to the master, make sure that the data are synchronized to the slave before returning this data write success, of course, if there are multiple slaves, you must also ensure that each slave write success. Of course.This solution is extremely performance and time consuming, will directly reduce your system throughput, and is not recommended.

3.4 Appropriate introduction of caching

You can introduce redis or other nosql databases to store business data that we often have master-slave latency. While I am writing to the database, we write another copy to redis.

The first thing we can do is to check if the data is available in redis, if so we can read the data directly from redis. When the data is really synchronized to the database, then remove the data from redis. The figure below:

This side also need to pay attention to two points, very important yo, interview must ask:

1, although to some extent to alleviate the problem of delay, but if you encounter high concurrency, the frequent deletion of Redis is also unreasonable, so you need to combine with the scenario of comprehensive consideration, such as regular deletion of the cache.

2, high concurrency may exist in the case of slave has not been synchronized, there are new values written in, this time Master --> Slave is still in the queue, but the Cache has been updated. So if you delete Redis, you may mistakenly delete the latest cache values, resulting in reading the old data.

As in the above figure, a value was updated 1, 2, 3, master-slave synchronization in order, just after the synchronization of 1, the Cache was updated to 3, this time if the Cache is deleted, the read request will go to the slave library to read, read 1, the data will be inconsistent for a short period of time.

So this place also need to pay attention to, you can at the same time the unique key (such as the primary key) also do save, delete before making a judgment to avoid accidental deletion. Or simply do not delete the cache in real time, low peak period to deal with.

3.5 Multi-threaded replay of RelayLog

MySQL uses a single thread to replay RelayLog, so can you do the solution on this, for example, using multiple threads to replay RelayLog in parallel, you can shorten the time. But this is a test for data consistency.

need to consider how to split RelayLog.To enable multiple database instances and multiple threads to replay RelayLog in parallel, there will be no inconsistencies. For example, if RelayLog contains a record of these three statements awarding credit to a student, you don't know what the result will turn out to be. It could be 806 or even 721.

1 update t_score set score = 721 where stu_code=374532;
2 update t_score set score = 806 where stu_code=374532;
3 update t_score set score = 899 where stu_code=374532;

The solution is:

Write operations on the same library table use the same thread to replay RelayLog; write operations on different library tables can be concurrently replayed RelayLog with multiple threads concurrently.

Design a hash algorithm, hash(db-name) % thread-num, table name hash and then modulo the number of threads, it can be easily done, the same library table write operations, by the same replay thread serial execution, to achieve the purpose of improving efficiency.

This is actually a form of partitioning thinking, similar to splitting the database directly above.

3.6 Small number of read operations directly connected to the main repository

If there is not a lot of business, do not do master-slave separation. Since the master-slave delay is caused by the slave synchronization write library is not timely, then we can also have a master-slave delay in the place to change the way to read the library, from the original read slave library to read the master library. Of course, this will also increase the complexity of some of the logic of the code.
This side should be noted that the amount of business directly read the main library should not be more, and is read real-time consistency of the business has a rigid demand to do so. Otherwise, it deviates from the purpose of read-write separation.

3.7 Appropriate flow restriction, downgrading

Any server has a throughput limit, and no program can carry an unlimited amount of traffic from users. So we have to estimate what is the upper limit of traffic that our server can carry.
After reaching this limit, it is necessary to take the three major killing strokes of caching, limiting and degrading to cope with our traffic. This is also the fundamental treatment to deal with master-slave latency.

3.8 Higher versions of MySQL support multi-threaded replication

Multi-threaded replication (also known as parallel replication) was first supported in MySQL version 5.6, and after MySQL 5.7, the ability to parallelize replication based on GTID was provided. In this version 5.6, the default is single-threaded replication, but you can enable multithreaded replication by configuring the parameter slave_parallel_workers.
To enable multithreaded replication, follow these steps:

1. Make sure your MySQL version is 5.6+.

2. Modify multithreading configuration
Modify the MySQL (or) configuration file to set the slave_parallel_workers parameter on the slave server to the desired number of worker threads, for example:

[mysqld]
slave_parallel_workers = 8

3. Restart the MySQL service for the configuration to take effect.

4. Verify that multithreaded replication is enabled:

SHOW VARIABLES LIKE 'slave_parallel_workers';

If the return value is greater than 0, multi-threaded replication is enabled and the specified number of threads can be used to apply log events.

4 Summary

Above mentioned a variety of options are open to discussion, each method has disadvantages and advantages, according to the actual situation of the selection.