A few years ago, one of our MySQL databases suddenly ran into replica latency and CPU spikes. How do you track down the specific cause? You might say that SQL full audit was already enabled on the Tencent Cloud MySQL instance, recording every executed SQL statement, so the cause can easily be found through methods such as:
1, Efficiently analyzing which SQL statements account for the highest actual QPS and TPS
But if SQL full audit is not enabled, how do you go about finding the cause? Of course you can look at the slow query log, but the slow query log does not record all SQL, so it is hard to pin down and analyze which specific statements caused the problem.
So, when SQL full audit is not enabled on a MySQL database (not every instance can have full audit turned on, and on self-built MySQL the audit data is also awkward to store and review), what can you do to keep more information for analyzing the cause?
When I took over the company's database operations, the development team had already written a shell script that saved the output of show full processlist every 10 seconds, keeping a record of the SQL executing at that moment to a file. Later that proved not to be enough, so long transactions and table locks were added as well. The shell scripts are as follows:
1, CPU spiking high:
The monitoring shell script:
Every 10 seconds it saves the currently running queries to a text file:
#!/bin/bash
# Snapshot the current queries and long transactions every 10 seconds for one minute
# (the script itself is launched once per minute by crontab).
cd $(dirname $0)
sql="show full processlist"
ip=172.16.0.15
start_time=$(date +%s)
let "check_time=start_time+60"
now=$start_time
while [ $now -lt $check_time ]; do
    # one directory per day/hour/minute, one file pair per snapshot
    log_dir=/root/monitor/$(date +"%Y-%m-%d")/$(date +"%H")/$(date +"%M")
    mkdir -p $log_dir
    log_file=$log_dir/$(date +"%Y-%m-%d-%H-%M-%S")
    # current queries
    echo $sql | mysql -udba_op -ppwd -h$ip mysql > $log_file.master
    # long transactions and in-use (locked) tables
    mysql -udba_op -ppwd -h$ip mysql < /root/shell/long_tran.sql > $log_file.master_tran
    echo "next"
    sleep 10
    now=$(date +%s)
done
The long_tran.sql script:
-- Transactions running for 1 second or longer, with a ready-made kill statement,
-- followed by any tables currently held open with locks.
select timestampdiff(SECOND, trx_started, now()) diffsecond,
       b.id,
       b.user,
       concat('kill ', b.id, ' ;') killsql,
       IFNULL(b.info, a.trx_query) runsql
from information_schema.innodb_trx a
left join information_schema.PROCESSLIST b
       on a.trx_mysql_thread_id = b.id
where timestampdiff(SECOND, trx_started, now()) >= 1
order by 1 desc
limit 20;
show open tables where in_use >= 1;
Crontab schedules the script and cleans up logs older than 10 days:
*/1 * * * * sh /root/shell/sg_monitor_master.sh > /dev/null 2>&1
1 1 * * * find /root/monitor/ -name "20*" -type d -mtime +10 -exec rm -rf {} \; > /dev/null 2>&1
Every 10 seconds this produces two files: a .master file (the processlist snapshot) and a .master_tran file (long transactions and locked tables).
Combining the slow query log with these two files, you can see what SQL was executing at the moment of the problem and identify the cause.
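If you do not want to read the snapshots one by one, here is a minimal sketch for scanning them; the 2020-08-18 11:2x window and the 5-second threshold are only example values:

# list queries that had already been running for more than 5 seconds
# in every processlist snapshot taken during the problem minutes
for f in /root/monitor/2020-08-18/11/2*/*.master; do
    echo "== $f =="
    awk -F'\t' '$5 == "Query" && $6+0 > 5' "$f"   # columns: Id User Host db Command Time State Info
done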
2, Sudden replica delay
The company was doing online education at the time, so a replica lagging by tens of seconds had a big impact on the business, and leadership would receive alert text messages and pings in the DingTalk group. We had to find out what was causing the replica delay. One approach is to look at the monitoring files and captured SQL described above; in addition, you can parse the binlog for that time period to find the specific SQL and see which statements caused the lag. The specific steps:
1, Download the binlog covering the problem period from Tencent Cloud to a local machine
2, Parse the binlog for the problem time range, as follows:
mysqlbinlog --no-defaults -vv --base64-output=decode-rows --start-datetime='2020-08-18 11:20:00' --stop-datetime='2020-08-18 12:03:00' mysql-bin.000203 >
3, After parsing the binlog, open the file with notepad++ and carefully read through the SQL it contains and analyze it; this is painstaking work. If you are checking a specific table, you can filter with grep, as follows (a sketch for ranking tables by write volume follows these examples):
1, grep user_0 -A 10 -B 10 >/data/61.txt
2, mysqlbinlog --start-datetime="2019-07-08 9:32:00" --stop-datetime="2019-07-09 11:20:00" -v --base64-output=DECODE-ROWS binlog.000001 | grep -A4 'INSERT' >
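A minimal sketch of a further filtering step; the file name parsed_binlog.sql is an assumption, substitute whatever file you redirected the mysqlbinlog output into. It counts the Table_map entries in the decoded binlog so you can see which table receives the most row events inside the delay window:

# rank tables by number of row events in the parsed window
grep -oE 'Table_map: `[^`]+`\.`[^`]+`' parsed_binlog.sql | sort | uniq -c | sort -rn | head -20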
4, Combine the above information, hand the offending SQL to the development team, and have them adjust the business logic.
3, Record the original SQL
While I was at the company, the technical VP asked for the original SQL of every change to be recorded, so that the big data team could read the binlog and keep the original statements; in a future audit you could then use the original SQL to find out who had changed online business data. At the same time, developers were asked to copy him by email on every SQL statement that modified business data. Looking into it, MySQL does have such a parameter: binlog_rows_query_log_events.
With the default configuration, a ROW-format binary log only records the data changes themselves, not the statements that caused them. For auditing or bug analysis, however, you need to know which SQL statements caused the changes, so MySQL provides binlog_rows_query_log_events to control whether the original SQL is also stored in the binary log. When enabled, it looks like this:
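A minimal sketch, reusing the dba_op account and 172.16.0.15 host from the monitoring script above. binlog_rows_query_log_events is a dynamic global variable (MySQL 5.6+), so it can be switched on without a restart; add it to my.cnf as well so it survives one:

# turn the parameter on and verify it
mysql -udba_op -ppwd -h172.16.0.15 -e "SET GLOBAL binlog_rows_query_log_events = ON;"
mysql -udba_op -ppwd -h172.16.0.15 -e "SHOW GLOBAL VARIABLES LIKE 'binlog_rows_query_log_events';"

# afterwards, mysqlbinlog -vv prints a Rows_query event containing the original
# statement (as a commented line) just before the decoded row images
mysqlbinlog --no-defaults -vv --base64-output=decode-rows mysql-bin.000203 | grep -B 2 -A 2 'Rows_query'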
Later the parameter was enabled on the master, and the big data team recorded the SQL executed there so we could track who had carelessly updated online business data. In practice the data was never actually used for an investigation; instead, the big data team reported that the recorded SQL took up too much space, and the retention period had to be shortened.