A few years ago, one of our MySQL databases suddenly ran into replica latency and CPU spikes. How do you track down the specific cause? You might say that SQL full audit was already enabled on the Tencent Cloud MySQL instance, recording every executed SQL statement, so the cause can easily be found through methods such as:
1, Efficiently analyzing which SQL statements account for the highest actual QPS and TPS
But if SQL full audit is not enabled, how do you go about finding the cause? Of course you can look at the slow query log, but the slow query log does not record all SQL, so it is hard to pin down and analyze which specific statements caused the problem.
So, when SQL full audit is not enabled on a MySQL database (not every instance can have full audit turned on, and on self-built MySQL the audit data is also awkward to store and review), what can you do to keep more information for analyzing the cause?
When I took over the company's database operations, the development team had already written a shell script that saved the output of show full processlist every 10 seconds, keeping a record of the SQL executing at that moment to a file. Later that proved not to be enough, so long transactions and table locks were added as well. The shell scripts are as follows:
1, CPU spiking high:
The monitoring shell script:
Every 10 seconds it saves the currently running queries to a text file:
#!/bin/bash
# Snapshot the current queries and long transactions every 10 seconds for one minute
# (the script itself is launched once per minute by crontab).
cd $(dirname $0)
sql="show full processlist"
ip=172.16.0.15
start_time=$(date +%s)
let "check_time=start_time+60"
now=$start_time
while [ $now -lt $check_time ]; do
    # one directory per day/hour/minute, one file pair per snapshot
    log_dir=/root/monitor/$(date +"%Y-%m-%d")/$(date +"%H")/$(date +"%M")
    mkdir -p $log_dir
    log_file=$log_dir/$(date +"%Y-%m-%d-%H-%M-%S")
    # current queries
    echo $sql | mysql -udba_op -ppwd -h$ip mysql > $log_file.master
    # long transactions and in-use (locked) tables
    mysql -udba_op -ppwd -h$ip mysql < /root/shell/long_tran.sql > $log_file.master_tran
    echo "next"
    sleep 10
    now=$(date +%s)
done
The long_tran.sql script:
-- Transactions running for 1 second or longer, with a ready-made kill statement,
-- followed by any tables currently held open with locks.
select timestampdiff(SECOND, trx_started, now()) diffsecond,
       b.id,
       b.user,
       concat('kill ', b.id, ' ;') killsql,
       IFNULL(b.info, a.trx_query) runsql
from information_schema.innodb_trx a
left join information_schema.PROCESSLIST b
       on a.trx_mysql_thread_id = b.id
where timestampdiff(SECOND, trx_started, now()) >= 1
order by 1 desc
limit 20;
show open tables where in_use >= 1;
Crontab schedules the script and cleans up logs older than 10 days:
*/1 * * * * sh /root/shell/sg_monitor_master.sh > /dev/null 2>&1
1 1 * * * find /root/monitor/ -name "20*" -type d -mtime +10 -exec rm -rf {} \; > /dev/null 2>&1
Every 10 seconds this produces two files: a .master file (the processlist snapshot) and a .master_tran file (long transactions and locked tables).
Combining the slow query log with these two files, you can see what SQL was executing at the moment of the problem and identify the cause.
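If you do not want to read the snapshots one by one, here is a minimal sketch for scanning them; the 2020-08-18 11:2x window and the 5-second threshold are only example values:

# list queries that had already been running for more than 5 seconds
# in every processlist snapshot taken during the problem minutes
for f in /root/monitor/2020-08-18/11/2*/*.master; do
    echo "== $f =="
    awk -F'\t' '$5 == "Query" && $6+0 > 5' "$f"   # columns: Id User Host db Command Time State Info
done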
2, Sudden replica delay
The company was doing online education at the time, so a replica lagging by tens of seconds had a big impact on the business, and leadership would receive alert text messages and pings in the DingTalk group. We had to find out what was causing the replica delay. One approach is to look at the monitoring files and captured SQL described above; in addition, you can parse the binlog for that time period to find the specific SQL and see which statements caused the lag. The specific steps:
1, Download the binlog covering the problem period from Tencent Cloud to a local machine
2, Parse the binlog for the problem time range, as follows:
mysqlbinlog --no-defaults -vv --base64-output=decode-rows --start-datetime='2020-08-18 11:20:00' --stop-datetime='2020-08-18 12:03:00' mysql-bin.000203 >
3, After parsing the binlog, open the file with notepad++ and carefully read through the SQL it contains and analyze it; this is painstaking work. If you are checking a specific table, you can filter with grep, as follows (a sketch for ranking tables by write volume follows these examples):
1, grep user_0 -A 10 -B 10 >/data/61.txt
2, mysqlbinlog --start-datetime="2019-07-08 9:32:00" --stop-datetime="2019-07-09 11:20:00" -v --base64-output=DECODE-ROWS binlog.000001 | grep -A4 'INSERT' >
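A minimal sketch of a further filtering step; the file name parsed_binlog.sql is an assumption, substitute whatever file you redirected the mysqlbinlog output into. It counts the Table_map entries in the decoded binlog so you can see which table receives the most row events inside the delay window:

# rank tables by number of row events in the parsed window
grep -oE 'Table_map: `[^`]+`\.`[^`]+`' parsed_binlog.sql | sort | uniq -c | sort -rn | head -20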
4, Combine the above information, hand the offending SQL to the development team, and have them adjust the business logic.
3, Record the original SQL
While I was at the company, the technical VP asked for the original SQL of every change to be recorded, so that the big data team could read the binlog and keep the original statements; in a future audit you could then use the original SQL to find out who had changed online business data. At the same time, developers were asked to copy him by email on every SQL statement that modified business data. Looking into it, MySQL does have such a parameter: binlog_rows_query_log_events.
With the default configuration, a ROW-format binary log only records the data changes themselves, not the statements that caused them. For auditing or bug analysis, however, you need to know which SQL statements caused the changes, so MySQL provides binlog_rows_query_log_events to control whether the original SQL is also stored in the binary log. When enabled, it looks like this:
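A minimal sketch, reusing the dba_op account and 172.16.0.15 host from the monitoring script above. binlog_rows_query_log_events is a dynamic global variable (MySQL 5.6+), so it can be switched on without a restart; add it to my.cnf as well so it survives one:

# turn the parameter on and verify it
mysql -udba_op -ppwd -h172.16.0.15 -e "SET GLOBAL binlog_rows_query_log_events = ON;"
mysql -udba_op -ppwd -h172.16.0.15 -e "SHOW GLOBAL VARIABLES LIKE 'binlog_rows_query_log_events';"

# afterwards, mysqlbinlog -vv prints a Rows_query event containing the original
# statement (as a commented line) just before the decoded row images
mysqlbinlog --no-defaults -vv --base64-output=decode-rows mysql-bin.000203 | grep -B 2 -A 2 'Rows_query'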
Later the parameter was enabled on the master, and the big data team recorded the SQL executed there so we could track who had carelessly updated online business data. In practice the data was never actually used for an investigation; instead, the big data team reported that the recorded SQL took up too much space, and the retention period had to be shortened.