I recently spent time on an OB performance stress-testing project for a core system at a financial institution, and ran into plenty of problems along the way; for two weeks we were on site working overtime until one or two in the morning every day. Truly exhausting. 🤢🤢🤢🤢🤢🤢
I actually had my expectations set before joining the OB world: things would be intense, and getting off work at 8 or 9 p.m. would be acceptable. I never expected that on a real project I would be leaving in the small hours every single day; I honestly worry about when I might drop dead. 😂😂😂
One thing OB is widely criticized for is the gv$ob_sql_audit view: if the database is executing a large number of transactions, the view's contents get flushed out very quickly, and querying it for analysis is painfully slow. Even a statement with a simple predicate filter can take minutes to return, which makes it hard to work out which transaction in a business interface contains the slow SQL.
Although the ob_sql_audit_percentage parameter can be adjusted to control how much gv$ob_sql_audit retains, under a sustained high-concurrency stress test its contents are still refreshed away quickly. So we need to persist the gv$ob_sql_audit data into a regular table and add appropriate indexes, which lets us analyze the server-side SQL of different business-interface scenarios and pinpoint exactly which stage is slow, so tuning can be targeted.
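For reference, ob_sql_audit_percentage is a cluster-level parameter that caps how much tenant memory the SQL audit records may occupy; raising it only delays the flush, it does not prevent it. A minimal sketch (run from the sys tenant; the value 10 is just an illustration):
ALTER SYSTEM SET ob_sql_audit_percentage = 10;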
So I went on site to understand the situation and spent half a day writing and debugging the extract_sqlaudit_proc procedure to capture the server-side data for me. This procedure went on to save me a great deal of effort when analyzing performance across different scenarios.
1. The xz_sql_audit table and the extract_sqlaudit_proc procedure
Key note: this approach only works on Oracle-mode tenants (the code below is Oracle-flavored PL/SQL).
-- Create the analysis table
CREATE TABLE XZ_SQL_AUDIT(
C_ID NUMBER(38), -- unique ID
C_RQ_TIME TIMESTAMP, -- request time
C_BATCH_ID NUMBER(38), -- insertion batch
C_NAME VARCHAR2(100), -- scenario name
C_FLAG CHAR(2), -- S = pull a single transaction, M = pull multiple transactions (stress test)
SVR_IP VARCHAR2(46) ,
SVR_PORT NUMBER(38) ,
REQUEST_ID NUMBER(38) ,
SQL_EXEC_ID NUMBER(38) ,
TRACE_ID VARCHAR2(128) ,
SID NUMBER(38) ,
CLIENT_IP VARCHAR2(46) ,
CLIENT_PORT NUMBER(38) ,
TENANT_ID NUMBER(38) ,
EFFECTIVE_TENANT_ID NUMBER(38) ,
TENANT_NAME VARCHAR2(64) ,
USER_ID NUMBER(38) ,
USER_NAME VARCHAR2(64) ,
USER_GROUP NUMBER(38) ,
USER_CLIENT_IP VARCHAR2(46) ,
DB_ID NUMBER(38) ,
DB_NAME VARCHAR2(128) ,
SQL_ID VARCHAR2(32) ,
QUERY_SQL CLOB ,
PLAN_ID NUMBER(38) ,
AFFECTED_ROWS NUMBER(38) ,
RETURN_ROWS NUMBER(38) ,
PARTITION_CNT NUMBER(38) ,
RET_CODE NUMBER(38) ,
QC_ID NUMBER(38) ,
DFO_ID NUMBER(38) ,
SQC_ID NUMBER(38) ,
WORKER_ID NUMBER(38) ,
EVENT VARCHAR2(64) ,
P1TEXT VARCHAR2(64) ,
P1 NUMBER(38) ,
P2TEXT VARCHAR2(64) ,
P2 NUMBER(38) ,
P3TEXT VARCHAR2(64) ,
P3 NUMBER(38) ,
WAIT_CLASS_ID NUMBER(38) ,
WAIT_CLASS# NUMBER(38) ,
WAIT_CLASS VARCHAR2(64) ,
STATE VARCHAR2(19) ,
WAIT_TIME_MICRO NUMBER(38) ,
TOTAL_WAIT_TIME_MICRO NUMBER(38) ,
TOTAL_WAITS NUMBER(38) ,
RPC_COUNT NUMBER(38) ,
PLAN_TYPE NUMBER(38) ,
IS_INNER_SQL NUMBER(38) ,
IS_EXECUTOR_RPC NUMBER(38) ,
IS_HIT_PLAN NUMBER(38) ,
REQUEST_TIME NUMBER(38) ,
ELAPSED_TIME NUMBER(38) ,
NET_TIME NUMBER(38) ,
NET_WAIT_TIME NUMBER(38) ,
QUEUE_TIME NUMBER(38) ,
DECODE_TIME NUMBER(38) ,
GET_PLAN_TIME NUMBER(38) ,
EXECUTE_TIME NUMBER(38) ,
APPLICATION_WAIT_TIME NUMBER(38) ,
CONCURRENCY_WAIT_TIME NUMBER(38) ,
USER_IO_WAIT_TIME NUMBER(38) ,
SCHEDULE_TIME NUMBER(38) ,
ROW_CACHE_HIT NUMBER(38) ,
BLOOM_FILTER_CACHE_HIT NUMBER(38) ,
BLOCK_CACHE_HIT NUMBER(38) ,
DISK_READS NUMBER(38) ,
RETRY_CNT NUMBER(38) ,
TABLE_SCAN NUMBER(38) ,
CONSISTENCY_LEVEL NUMBER(38) ,
MEMSTORE_READ_ROW_COUNT NUMBER(38) ,
SSSTORE_READ_ROW_COUNT NUMBER(38) ,
DATA_BLOCK_READ_CNT NUMBER(38) ,
DATA_BLOCK_CACHE_HIT NUMBER(38) ,
INDEX_BLOCK_READ_CNT NUMBER(38) ,
INDEX_BLOCK_CACHE_HIT NUMBER(38) ,
BLOCKSCAN_BLOCK_CNT NUMBER(38) ,
BLOCKSCAN_ROW_CNT NUMBER(38) ,
PUSHDOWN_STORAGE_FILTER_ROW_CNT NUMBER(38) ,
REQUEST_MEMORY_USED NUMBER(38) ,
EXPECTED_WORKER_COUNT NUMBER(38) ,
USED_WORKER_COUNT NUMBER(38) ,
SCHED_INFO VARCHAR2(16384) ,
PS_CLIENT_STMT_ID NUMBER(38) ,
PS_INNER_STMT_ID NUMBER(38) ,
TX_ID NUMBER(38) ,
SNAPSHOT_VERSION NUMBER(38) ,
REQUEST_TYPE NUMBER(38) ,
IS_BATCHED_MULTI_STMT NUMBER(38) ,
OB_TRACE_INFO VARCHAR2(4096) ,
PLAN_HASH NUMBER(38) ,
PARAMS_VALUE CLOB ,
RULE_NAME VARCHAR2(256) ,
TX_INTERNAL_ROUTING NUMBER ,
TX_STATE_VERSION NUMBER(38) ,
FLT_TRACE_ID VARCHAR2(1024) ,
NETWORK_WAIT_TIME NUMBER(38),
PRIMARY KEY(C_ID,C_BATCH_ID))
PARTITION BY RANGE(C_BATCH_ID)
(
PARTITION P10 VALUES LESS THAN (10),
PARTITION P20 VALUES LESS THAN (20),
PARTITION P30 VALUES LESS THAN (30),
PARTITION P40 VALUES LESS THAN (40),
PARTITION P50 VALUES LESS THAN (50),
PARTITION P60 VALUES LESS THAN (60),
PARTITION P70 VALUES LESS THAN (70),
PARTITION P80 VALUES LESS THAN (80),
PARTITION P90 VALUES LESS THAN (90),
PARTITION P100 VALUES LESS THAN (100),
PARTITION P110 VALUES LESS THAN (110),
PARTITION P120 VALUES LESS THAN (120),
PARTITION P130 VALUES LESS THAN (130),
PARTITION P140 VALUES LESS THAN (140),
PARTITION P150 VALUES LESS THAN (150),
PARTITION P160 VALUES LESS THAN (160),
PARTITION P170 VALUES LESS THAN (170),
PARTITION P180 VALUES LESS THAN (180),
PARTITION P190 VALUES LESS THAN (190),
PARTITION P200 VALUES LESS THAN (200)
);
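One caveat on the DDL above: only range partitions up to C_BATCH_ID < 200 are pre-created, so inserts will fail once the batch counter reaches 200. Extending the range ahead of time is standard Oracle-mode DDL, e.g.:
ALTER TABLE XZ_SQL_AUDIT ADD PARTITION P210 VALUES LESS THAN (210);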
CREATE OR REPLACE PROCEDURE extract_sqlaudit_proc(
p_NAME VARCHAR2, -- scenario name
p_FLAG CHAR, -- S = pull a single transaction, M = pull multiple transactions (stress test)
p_START_TIME VARCHAR2, -- start time
p_END_TIME VARCHAR2 -- end time
)
/*
extract_sqlaudit_proc
Author:small to sharp
Time:2024-08-01
Purpose: for pulling gv$ob_sql_audit view data
*/
IS
V_NAME VARCHAR2(100) := UPPER(p_NAME); -- scenario name
V_FLAG CHAR(2) := UPPER(p_FLAG); -- S = pull a single transaction, M = pull multiple transactions (stress test)
V_MAX_BATCH INT;
V_START_TIME VARCHAR2(100) := p_START_TIME; -- start time
V_END_TIME VARCHAR2(100) := p_END_TIME; -- end time
/*
V_NAME: scenario name
V_FLAG: S = pull a single transaction, M = pull multiple transactions (stress test)
V_MAX_BATCH: batch number recorded for this run
*/
V_SQL VARCHAR2(32767); -- dynamic SQL buffer (the full column list exceeds 4000 bytes)
V_FLAG_EXCEPTION EXCEPTION;
BEGIN
IF V_FLAG != 'S' AND V_FLAG != 'M' THEN
RAISE V_FLAG_EXCEPTION;
END IF;
-- Take the current maximum batch number and add 1
SELECT NVL(MAX(C_BATCH_ID), 0) + 1 INTO V_MAX_BATCH FROM XZ_SQL_AUDIT;
-- Convert incoming values to string timestamp format.
V_START_TIME := TO_CHAR(TO_TIMESTAMP(V_START_TIME, 'YYYY-MM-DD HH24:MI:SS:FF6'), 'YYYY-MM-DD HH24:MI:SS:FF6');
V_END_TIME := TO_CHAR(TO_TIMESTAMP(V_END_TIME, 'YYYY-MM-DD HH24:MI:SS:FF6'), 'YYYY-MM-DD HH24:MI:SS:FF6');
V_SQL := '
insert /*+ ENABLE_PARALLEL_DML PARALLEL(100) */ into XZ_SQL_AUDIT
select (:1 * 10000000000) + i, scn_to_timestamp(x.request_time * 1000), :2, :3, :4,
x.SVR_IP, x.SVR_PORT, x.REQUEST_ID, x.SQL_EXEC_ID, x.TRACE_ID,
x.SID, x.CLIENT_IP, x.CLIENT_PORT, x.TENANT_ID, x.EFFECTIVE_TENANT_ID,
x.TENANT_NAME, x.USER_ID, x.USER_NAME, x.USER_GROUP, x.USER_CLIENT_IP,
x.DB_ID, x.DB_NAME, x.SQL_ID, x.QUERY_SQL, x.PLAN_ID,
x.AFFECTED_ROWS, x.RETURN_ROWS, x.PARTITION_CNT, x.RET_CODE, x.QC_ID,
x.DFO_ID, x.SQC_ID, x.WORKER_ID, x.EVENT, x.P1TEXT, x.P1,
x.P2TEXT, x.P2, x.P3TEXT, x.P3,
x.WAIT_CLASS_ID, x.WAIT_CLASS#, x.WAIT_CLASS, x.STATE,
x.WAIT_TIME_MICRO, x.TOTAL_WAIT_TIME_MICRO, x.TOTAL_WAITS,
x.RPC_COUNT, x.PLAN_TYPE, x.IS_INNER_SQL, x.IS_EXECUTOR_RPC, x.IS_HIT_PLAN,
x.REQUEST_TIME, x.ELAPSED_TIME, x.NET_TIME, x.NET_WAIT_TIME,
x.QUEUE_TIME, x.DECODE_TIME, x.GET_PLAN_TIME, x.EXECUTE_TIME,
x.APPLICATION_WAIT_TIME, x.CONCURRENCY_WAIT_TIME, x.USER_IO_WAIT_TIME,
x.SCHEDULE_TIME, x.ROW_CACHE_HIT, x.BLOOM_FILTER_CACHE_HIT, x.BLOCK_CACHE_HIT,
x.DISK_READS, x.RETRY_CNT, x.TABLE_SCAN, x.CONSISTENCY_LEVEL,
x.MEMSTORE_READ_ROW_COUNT, x.SSSTORE_READ_ROW_COUNT,
x.DATA_BLOCK_READ_CNT, x.DATA_BLOCK_CACHE_HIT,
x.INDEX_BLOCK_READ_CNT, x.INDEX_BLOCK_CACHE_HIT,
x.BLOCKSCAN_BLOCK_CNT, x.BLOCKSCAN_ROW_CNT, x.PUSHDOWN_STORAGE_FILTER_ROW_CNT,
x.REQUEST_MEMORY_USED, x.EXPECTED_WORKER_COUNT, x.USED_WORKER_COUNT,
x.SCHED_INFO, x.PS_CLIENT_STMT_ID, x.PS_INNER_STMT_ID, x.TX_ID,
x.SNAPSHOT_VERSION, x.REQUEST_TYPE, x.IS_BATCHED_MULTI_STMT,
x.OB_TRACE_INFO, x.PLAN_HASH, x.PARAMS_VALUE, x.RULE_NAME,
x.TX_INTERNAL_ROUTING, x.TX_STATE_VERSION, x.FLT_TRACE_ID, x.NETWORK_WAIT_TIME
from (select /*+ PARALLEL(100) query_timeout(50000000000) */
rownum i, a.*
from GV$OB_SQL_AUDIT a
where 1 = 1
-- The query_sql not like filters exclude tables you do not need, e.g. remote
-- scheduled tasks fired by other microservices during the stress test; confirm
-- the list with the business side in advance and add more "and query_sql not like"
-- lines as required.
and query_sql not like ''%table_name_1%'' -- excluded tables
and query_sql not like ''%table_name_2%''
and query_sql not like ''%table_name_3%''
and query_sql not like ''%table_name_4%''
and query_sql not like ''%table_name_5%''
and query_sql not like ''%ob_sql_audit%'' -- exclude queries against ob_sql_audit itself
and TENANT_NAME <> ''sys''
and query_sql not like ''%extract_sqlaudit_proc%'' -- exclude calls to extract_sqlaudit_proc
) x ';
-- Append the time-window filter
V_SQL := V_SQL || ' where to_char(scn_to_timestamp((request_time * 1000)),''YYYY-MM-DD hh24:mi:ss:ff6'') >= ' || ''''
|| V_START_TIME || '''' || ' AND ' ||
' to_char(scn_to_timestamp((request_time * 1000)),''YYYY-MM-DD hh24:mi:ss:ff6'') <= '
|| '''' || V_END_TIME || '''';
EXECUTE IMMEDIATE V_SQL USING V_MAX_BATCH, V_MAX_BATCH, V_NAME, V_FLAG;
COMMIT;
DBMS_OUTPUT.PUT_LINE(p_START_TIME || ' ~ ' || p_END_TIME ||
' : query XZ_SQL_AUDIT for this window with C_BATCH_ID = ' || V_MAX_BATCH);
/*
-- debug code
dbms_output.put_line(V_NAME);
dbms_output.put_line(V_MAX_BATCH);
dbms_output.put_line(V_START_TIME);
dbms_output.put_line(V_END_TIME);
dbms_output.put_line(V_SQL);
*/
EXCEPTION
WHEN V_FLAG_EXCEPTION THEN
DBMS_OUTPUT.PUT_LINE('Invalid value for the p_FLAG parameter. S = pull a single transaction, M = pull multiple transactions (stress test)');
END;
/
begin
yzj.extract_sqlaudit_proc(
P_NAME => 'OB 4522 Scenario: Internet Banking Borrowing 200 Concurrent', -- name of the scenario under stress test
p_FLAG => 'm', -- s = pull a single transaction, m = pull multiple transactions (stress test)
p_START_TIME => '2024-08-10 18:00:00', -- stress test start time
p_END_TIME => '2024-08-10 18:59:00' -- stress test end time
);
end;
/
The table is partitioned to suit OB's characteristics and the large volume of data that has to be kept; c_batch_id is its core field.
Each execution of extract_sqlaudit_proc increments c_batch_id by 1 before saving. For example, after the call above, querying select * from xz_sql_audit where c_batch_id = 1; returns all the data captured for the period 2024-08-10 18:00:00 ~ 2024-08-10 18:59:00. Pretty convenient, right? 😁😁😁😁😁😁
Tip: when running this from a black-screen obclient session, it is best to set serveroutput on in advance; when the save finishes, the procedure prints which c_batch_id to use for that time window.
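In obclient that looks roughly like this (the output line below is a sketch of what DBMS_OUTPUT prints, not verbatim):
SET SERVEROUTPUT ON;
-- call extract_sqlaudit_proc as above, then expect output like:
-- 2024-08-10 18:00:00 ~ 2024-08-10 18:59:00 : query XZ_SQL_AUDIT for this window with C_BATCH_ID = 1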
2. Analyzing case one through the xz_sql_audit table
Some of the important event intervals during sql execution are as follows:
SELECT /*+ PARALLEL(80) */
a.TX_ID tx_id, -- transaction ID
b.gg sql_cnt, -- number of SQL statements in the transaction
round(max(a.ELAPSED_TIME)) max_elapsed_time, -- maximum total execution time
round(avg(a.ELAPSED_TIME)) avg_elapsed_time, -- average total execution time
round(min(a.ELAPSED_TIME)) min_elapsed_time, -- minimum total execution time
round(max(a.net_time)) max_net_time, -- maximum RPC receive time
round(max(a.net_wait_time)) max_net_wait_time, -- maximum request receive time
round(max(a.QUEUE_TIME)) max_queue_time, -- maximum queue time
round(max(a.DECODE_TIME)) max_decode_time, -- maximum dequeue/decode time
round(max(a.get_plan_time)) max_get_plan_time, -- maximum plan acquisition time
round(max(a.execute_time)) max_execute_time, -- maximum execution time
round(max(a.RETRY_CNT)) max_retry_cnt, -- maximum retries
round(max(a.RPC_COUNT)) max_rpc_count, -- maximum RPC count
round(avg(a.net_time)) avg_net_time,
round(avg(a.net_wait_time)) avg_net_wait_time,
round(avg(a.QUEUE_TIME)) avg_queue_time,
round(avg(a.DECODE_TIME)) avg_decode_time,
round(avg(a.get_plan_time)) avg_get_plan_time,
round(avg(a.execute_time)) avg_execute_time,
round(avg(a.RETRY_CNT)) avg_retry_cnt,
round(avg(a.RPC_COUNT)) avg_rpc_count,
round(min(a.net_time)) min_net_time,
round(min(a.net_wait_time)) min_net_wait_time,
round(min(a.QUEUE_TIME)) min_queue_time,
round(min(a.DECODE_TIME)) min_decode_time,
round(min(a.get_plan_time)) min_get_plan_time,
round(min(a.execute_time)) min_execute_time,
round(min(a.RETRY_CNT)) min_retry_cnt,
round(min(a.RPC_COUNT)) min_rpc_count
FROM XZ_SQL_AUDIT a
INNER JOIN (SELECT COUNT(to_char(substr(query_sql, 1, 2000))) gg,
C_BATCH_ID,
TX_ID
FROM XZ_SQL_AUDIT GROUP BY C_BATCH_ID, TX_ID) b
ON a.TX_ID = b.TX_ID AND a.C_BATCH_ID = b.C_BATCH_ID
WHERE a.C_BATCH_ID = 2 -- change to the corresponding c_batch_id
GROUP BY a.TX_ID, b.gg
ORDER BY 3 DESC;
This SQL analyzes the server-side execution path of the SQL inside each transaction, showing in which stage the top SQL actually spends its time: queuing, the network, plan acquisition, or execution itself. Each specific problem can then be analyzed on its own terms.
The fields in the SQL are laid out in execution order, left to right as in the figure above, so project staff can immediately locate the time-consuming step.
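As a follow-up, here is a minimal sketch that turns the stage fields into percentages per SQL_ID, assuming (roughly) that the stage times account for most of ELAPSED_TIME; the aliases are illustrative:
select sql_id,
count(1) cnt,
round(avg(elapsed_time)) avg_elapsed,
round(avg(queue_time) * 100 / nullif(avg(elapsed_time), 0), 2) queue_pct,
round(avg(get_plan_time) * 100 / nullif(avg(elapsed_time), 0), 2) get_plan_pct,
round(avg(execute_time) * 100 / nullif(avg(elapsed_time), 0), 2) execute_pct
from xz_sql_audit
where c_batch_id = 2 -- change to the corresponding c_batch_id
group by sql_id
order by avg(elapsed_time) desc;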
3. Analyzing case two through the xz_sql_audit table
During the stress test I ran into a strange problem: QPS dipped into a trough every 2 minutes, and TPS response time spiked at the same moment.
The following SQL locates the SQL traffic in a window of roughly 15 seconds around a QPS/TPS trough.
select /*+ PARALLEL(62) */
to_char(a.c_rq_time,'YYYY-MM-DD HH24:MI:SS') c_date,
case when a.plan_type = 0 then 'inner plan' when a.plan_type = 1 then 'local plan' when a.plan_type = 2 then 'remote plan' when a.plan_type = 3 then 'distributed plan' else null end plan_type,
count(1) cnt
from XZ_SQL_AUDIT a
where a.C_BATCH_ID=2 and c_rq_time between to_timestamp('2024-08-06 19:22:00','YYYY-MM-DD HH24:MI:SS') and to_timestamp('2024-08-06 19:22:15','YYYY-MM-DD HH24:MI:SS')
group by to_char(a.c_rq_time,'YYYY-MM-DD HH24:MI:SS'),a.plan_type
order by 2,1;
You can see that in the 19:22:00 second the SQL still received 54008 requests, but in the very next second it dropped straight to 14028.
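To find the trough seconds in the first place, a simple per-second bucketing over the whole batch works; a minimal sketch:
select to_char(c_rq_time, 'YYYY-MM-DD HH24:MI:SS') c_date,
count(1) cnt
from xz_sql_audit
where c_batch_id = 2 -- change to the corresponding c_batch_id
group by to_char(c_rq_time, 'YYYY-MM-DD HH24:MI:SS')
order by 1;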
The final wall I hit in this case: running benchmarkSQL from a machine in the same network segment showed no problem at all, while launching benchmarkSQL from a stress-test machine in a different network segment reproduced the phenomenon perfectly.
So the QPS troughs and TPS response spikes may have been caused by crossing network segments or by network instability; in truth the root cause was never conclusively pinned down, only narrowed to the network as the direction of investigation.
Even so, this way of analyzing SQL traffic provides useful corroboration: when you hit a problem you can rule things out across multiple dimensions with multiple tools, and finally converge on the ultimate root cause.
4. Analyzing case three through the xz_sql_audit table
The customer's core system is a Java microservices architecture, with eight application servers and hundreds of virtual machines running various auxiliary services behind them.
At the application level, transactions are managed with the TCC distributed transaction model plus a transaction-compensation mechanism (the application layer provides the rollback scheme, so the database itself is never rolled back).
The advantages and disadvantages of this distributed transaction model are as follows:
- Pros:
- Business-layer logic is mostly implemented with simple point-lookup SQL rather than complex SQL.
- Cons:
- The transaction count is huge, and a single transaction contains a great many point-lookup SQL statements.
- The rollback logic is provided by the application; if a TCP connection breaks mid-transaction, data can end up inconsistent.
- The business logic is very complex and tests how well developers understand the business.
A distributed transaction model like TCC delivers awesome performance when run on a centralized database. 😁😁😁😁😁😁😁😁
Because the transaction volume is so large, it is hard to work out from the raw gv$ob_sql_audit view what SQL a single transaction of a given scenario actually consists of, and without knowing the model of a single business-interface transaction there is no way to optimize that interface.
This is how I worked with the business people on my end:
- Ask the business developers to initiate a single-transaction task for the given scenario.
- Have them provide the transaction start and end times, ideally down to the microsecond, or to the millisecond if that is not possible.
- Then I use extract_sqlaudit_proc to collect the corresponding window. For example, if the transaction task initiated by the business developer started and ended at 2024-08-10 20:01:01.123 ~ 2024-08-10 20:01:01.500 (a millisecond-level transaction), I widen the window by 1 second on each side and collect all the SQL in that period:
begin
yzj.extract_sqlaudit_proc(
P_NAME => 'OB 4522 Scenario: Internet Banking Borrowing Single Transaction', -- scenario name
p_FLAG => 's', -- s = pull a single transaction, m = pull multiple transactions (stress test)
p_START_TIME => '2024-08-10 20:01:00.123', -- transaction start time, widened by 1 second
p_END_TIME => '2024-08-10 20:01:02.123' -- transaction end time, widened by 1 second
);
end;
/
- You also need to agree with the business developers on which SQL statement marks the start of the transaction and which marks the end (whatever those statements happen to be), ideally with flags that identify the operations belonging to this transaction.
- Finally, the following SQL analyzes how many transactions the capture contains and which SQL statements make up each transaction.
SELECT *
FROM (SELECT TO_CHAR(b.trans_start_time, 'YYYY-MM-DD HH24:MI:SS.FF6') trans_start_time, -- single-transaction start time
TO_CHAR(b.trans_end_time, 'YYYY-MM-DD HH24:MI:SS.FF6') trans_end_time, -- single-transaction end time
ROW_NUMBER() OVER (PARTITION BY a.tx_id ORDER BY a.c_rq_time) sql_exec_order, -- SQL execution order within the transaction
a.tx_id, -- transaction ID
TO_CHAR(a.c_rq_time, 'YYYY-MM-DD HH24:MI:SS.FF6') sql_request_time, -- SQL request execution time
TO_CHAR(SUBSTR(a.query_sql, 1, 2000)) sql_text, -- SQL statement
a.net_time, -- RPC receive time
a.net_wait_time, -- request receive time
a.QUEUE_TIME, -- queue time
a.DECODE_TIME, -- dequeue/decode time
a.get_plan_time, -- get-plan time
a.execute_time, -- execute time
a.RETRY_CNT, -- retry count
a.RPC_COUNT, -- RPC count
a.ELAPSED_TIME -- total execution time
FROM xz_sql_audit a
INNER JOIN (SELECT MIN(c_rq_time) trans_start_time,
MAX(c_rq_time) trans_end_time
FROM xz_sql_audit
WHERE c_batch_id = 6 -- change c_batch_id to match your own environment
AND (query_sql LIKE '%insert%in_log%' -- SQL that starts the transaction
OR query_sql LIKE '%update%in_log%') -- SQL that ends the transaction
) b
ON a.c_rq_time >= b.trans_start_time
AND a.c_rq_time <= b.trans_end_time
AND a.c_batch_id = 6);
5. Concluding remarks
The extract_sqlaudit_proc procedure does the heavy lifting of pulling gv$ob_sql_audit data into the xz_sql_audit table, but analyzing that data for the project still comes down to your understanding of gv$ob_sql_audit and your own SQL skills.
You need enough SQL skill to write whatever analysis queries each scenario calls for, rather than being limited to one or two canned analyses.
One more point: for whatever analysis SQL you write, just add the appropriate indexes to xz_sql_audit to speed it up; with enough machine resources you can run the analysis with more parallelism, far faster than querying gv$ob_sql_audit directly, which greatly improves productivity. See the sketch below.
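A minimal sketch, assuming your analysis mostly filters on c_batch_id together with tx_id or sql_id (the index names are illustrative):
CREATE INDEX IDX_XZ_SQL_AUDIT_BT ON XZ_SQL_AUDIT (C_BATCH_ID, TX_ID) LOCAL;
CREATE INDEX IDX_XZ_SQL_AUDIT_BS ON XZ_SQL_AUDIT (C_BATCH_ID, SQL_ID) LOCAL;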
I'm still tinkering with an obproxy log-analysis script: pass the start and end times of the xz_sql_audit data into the script, load obproxy's logs as structured rows into an obproxy_audit table, and then a single SQL can show how much of a business transaction's time is spent in the server layer versus the proxy layer. 🤞🤞🤞🤞🤞🤞🤞