
Using the extract_sqlaudit_proc stored procedure to analyze OB performance issues


Recently I was on site at a financial institution doing OB performance stress testing for a core-system project. We hit a lot of problems along the way, and for two weeks I worked until one or two in the morning every single day. Genuinely exhausting. 🤢🤢🤢🤢🤢🤢
Before joining OB work I had braced myself mentally: fine, it's grind culture, knocking off at 8 or 9 pm is acceptable. I did not expect that on a real project I'd be leaving in the small hours every day; I honestly worry about keeling over one of these days. 😂😂😂
One of OB's most criticized points is the gv$ob_sql_audit view: when the database is executing a large number of transactions, the view's contents get flushed out very quickly, and querying it for analysis is painfully slow. Even a statement with a simple predicate filter can take several minutes to return, which makes it hard to tell which transaction of a business interface contains the slow SQL.
The ob_sql_audit_percentage parameter can be tuned to control how much gv$ob_sql_audit retains, but under sustained high-concurrency stress testing the contents are refreshed away just as fast. So we need to persist the gv$ob_sql_audit data into a data table and add appropriate indexes, so that for each business scenario we can analyze exactly which stage the server-side SQL is slow in and tune accordingly.
After getting a feel for the situation on site, I spent half a day writing and debugging the extract_sqlaudit_proc stored procedure to capture the server-side data for me. It went on to save me a great deal of effort analyzing performance across different scenarios.


1. The xz_sql_audit table and the extract_sqlaudit_proc stored procedure

Key note: this only works on Oracle-mode tenants.

-- Creating an Analysis Table
CREATE TABLE XZ_SQL_AUDIT(
 C_ID                             NUMBER(38),       -- unique ID
 C_RQ_TIME                        TIMESTAMP,        -- request time
 C_BATCH_ID                       NUMBER(38),       -- insertion batch
 C_NAME                           VARCHAR2(100),    -- scenario name
 C_FLAG                           CHAR(2),          -- S: pull a single transaction, M: pull multiple transactions (stress test)
 SVR_IP                           VARCHAR2(46)      ,
 SVR_PORT                         NUMBER(38)        ,
 REQUEST_ID                       NUMBER(38)        ,
 SQL_EXEC_ID                      NUMBER(38)        ,
 TRACE_ID                         VARCHAR2(128)     ,
 SID                              NUMBER(38)        ,
 CLIENT_IP                        VARCHAR2(46)      ,
 CLIENT_PORT                      NUMBER(38)        ,
 TENANT_ID                        NUMBER(38)        ,
 EFFECTIVE_TENANT_ID              NUMBER(38)        ,
 TENANT_NAME                      VARCHAR2(64)      ,
 USER_ID                          NUMBER(38)        ,
 USER_NAME                        VARCHAR2(64)      ,
 USER_GROUP                       NUMBER(38)        ,
 USER_CLIENT_IP                   VARCHAR2(46)      ,
 DB_ID                            NUMBER(38)        ,
 DB_NAME                          VARCHAR2(128)     ,
 SQL_ID                           VARCHAR2(32)      ,
 QUERY_SQL                        CLOB              ,
 PLAN_ID                          NUMBER(38)        ,
 AFFECTED_ROWS                    NUMBER(38)        ,
 RETURN_ROWS                      NUMBER(38)        ,
 PARTITION_CNT                    NUMBER(38)        ,
 RET_CODE                         NUMBER(38)        ,
 QC_ID                            NUMBER(38)        ,
 DFO_ID                           NUMBER(38)        ,
 SQC_ID                           NUMBER(38)        ,
 WORKER_ID                        NUMBER(38)        ,
 EVENT                            VARCHAR2(64)      ,
 P1TEXT                           VARCHAR2(64)      ,
 P1                               NUMBER(38)        ,
 P2TEXT                           VARCHAR2(64)      ,
 P2                               NUMBER(38)        ,
 P3TEXT                           VARCHAR2(64)      ,
 P3                               NUMBER(38)        ,
 WAIT_CLASS_ID                    NUMBER(38)        ,
 WAIT_CLASS#                      NUMBER(38)        ,
 WAIT_CLASS                       VARCHAR2(64)      ,
 STATE                            VARCHAR2(19)      ,
 WAIT_TIME_MICRO                  NUMBER(38)        ,
 TOTAL_WAIT_TIME_MICRO            NUMBER(38)        ,
 TOTAL_WAITS                      NUMBER(38)        ,
 RPC_COUNT                        NUMBER(38)        ,
 PLAN_TYPE                        NUMBER(38)        ,
 IS_INNER_SQL                     NUMBER(38)        ,
 IS_EXECUTOR_RPC                  NUMBER(38)        ,
 IS_HIT_PLAN                      NUMBER(38)        ,
 REQUEST_TIME                     NUMBER(38)        ,
 ELAPSED_TIME                     NUMBER(38)        ,
 NET_TIME                         NUMBER(38)        ,
 NET_WAIT_TIME                    NUMBER(38)        ,
 QUEUE_TIME                       NUMBER(38)        ,
 DECODE_TIME                      NUMBER(38)        ,
 GET_PLAN_TIME                    NUMBER(38)        ,
 EXECUTE_TIME                     NUMBER(38)        ,
 APPLICATION_WAIT_TIME            NUMBER(38)        ,
 CONCURRENCY_WAIT_TIME            NUMBER(38)        ,
 USER_IO_WAIT_TIME                NUMBER(38)        ,
 SCHEDULE_TIME                    NUMBER(38)        ,
 ROW_CACHE_HIT                    NUMBER(38)        ,
 BLOOM_FILTER_CACHE_HIT           NUMBER(38)        ,
 BLOCK_CACHE_HIT                  NUMBER(38)        ,
 DISK_READS                       NUMBER(38)        ,
 RETRY_CNT                        NUMBER(38)        ,
 TABLE_SCAN                       NUMBER(38)        ,
 CONSISTENCY_LEVEL                NUMBER(38)        ,
 MEMSTORE_READ_ROW_COUNT          NUMBER(38)        ,
 SSSTORE_READ_ROW_COUNT           NUMBER(38)        ,
 DATA_BLOCK_READ_CNT              NUMBER(38)        ,
 DATA_BLOCK_CACHE_HIT             NUMBER(38)        ,
 INDEX_BLOCK_READ_CNT             NUMBER(38)        ,
 INDEX_BLOCK_CACHE_HIT            NUMBER(38)        ,
 BLOCKSCAN_BLOCK_CNT              NUMBER(38)        ,
 BLOCKSCAN_ROW_CNT                NUMBER(38)        ,
 PUSHDOWN_STORAGE_FILTER_ROW_CNT  NUMBER(38)        ,
 REQUEST_MEMORY_USED              NUMBER(38)        ,
 EXPECTED_WORKER_COUNT            NUMBER(38)        ,
 USED_WORKER_COUNT                NUMBER(38)        ,
 SCHED_INFO                       VARCHAR2(16384)   ,
 PS_CLIENT_STMT_ID                NUMBER(38)        ,
 PS_INNER_STMT_ID                 NUMBER(38)        ,
 TX_ID                            NUMBER(38)        ,
 SNAPSHOT_VERSION                 NUMBER(38)        ,
 REQUEST_TYPE                     NUMBER(38)        ,
 IS_BATCHED_MULTI_STMT            NUMBER(38)        ,
 OB_TRACE_INFO                    VARCHAR2(4096)    ,
 PLAN_HASH                        NUMBER(38)        ,
 PARAMS_VALUE                     CLOB              ,
 RULE_NAME                        VARCHAR2(256)     ,
 TX_INTERNAL_ROUTING              NUMBER            ,
 TX_STATE_VERSION                 NUMBER(38)        ,
 FLT_TRACE_ID                     VARCHAR2(1024)    ,
 NETWORK_WAIT_TIME                NUMBER(38),       
PRIMARY KEY(C_ID,C_BATCH_ID))
PARTITION BY RANGE(C_BATCH_ID)
(
PARTITION P10 VALUES LESS THAN (10),
PARTITION P20 VALUES LESS THAN (20),
PARTITION P30 VALUES LESS THAN (30),
PARTITION P40 VALUES LESS THAN (40),
PARTITION P50 VALUES LESS THAN (50),
PARTITION P60 VALUES LESS THAN (60),
PARTITION P70 VALUES LESS THAN (70),
PARTITION P80 VALUES LESS THAN (80),
PARTITION P90 VALUES LESS THAN (90),
PARTITION P100 VALUES LESS THAN (100),
PARTITION P110 VALUES LESS THAN (110),
PARTITION P120 VALUES LESS THAN (120),
PARTITION P130 VALUES LESS THAN (130),
PARTITION P140 VALUES LESS THAN (140),
PARTITION P150 VALUES LESS THAN (150),
PARTITION P160 VALUES LESS THAN (160),
PARTITION P170 VALUES LESS THAN (170),
PARTITION P180 VALUES LESS THAN (180),
PARTITION P190 VALUES LESS THAN (190),
PARTITION P200 VALUES LESS THAN (200)
);
CREATE OR REPLACE PROCEDURE extract_sqlaudit_proc(
    p_NAME       VARCHAR2,  -- scenario name
    p_FLAG       CHAR,      -- S: pull a single transaction, M: pull multiple transactions (stress test)
    p_START_TIME VARCHAR2,  -- start time
    p_END_TIME   VARCHAR2   -- end time
)
  /*
    extract_sqlaudit_proc
    Author: small to sharp
    Time: 2024-08-01
    Purpose: pull gv$ob_sql_audit view data into xz_sql_audit
  */
    IS
    V_NAME       VARCHAR2(100) := UPPER(p_NAME);   -- scenario name
    V_FLAG       CHAR(2)       := UPPER(p_FLAG);   -- S: pull a single transaction, M: pull multiple transactions (stress test)
    V_MAX_BATCH  INT;                              -- batch number recorded for this run
    V_START_TIME VARCHAR2(100) := p_START_TIME;    -- start time
    V_END_TIME   VARCHAR2(100) := p_END_TIME;      -- end time
    V_SQL        VARCHAR2(4000);
    V_FLAG_EXCEPTION EXCEPTION;
BEGIN

    IF V_FLAG != 'S' AND V_FLAG != 'M' THEN
        RAISE V_FLAG_EXCEPTION;
    END IF;

    -- Current maximum batch number plus 1
    SELECT NVL(MAX(C_BATCH_ID), 0) + 1 INTO V_MAX_BATCH FROM XZ_SQL_AUDIT;
    -- Normalize the incoming values to string timestamp format
    V_START_TIME := TO_CHAR(TO_TIMESTAMP(V_START_TIME, 'YYYY-MM-DD HH24:MI:SS:FF6'), 'YYYY-MM-DD HH24:MI:SS:FF6');
    V_END_TIME   := TO_CHAR(TO_TIMESTAMP(V_END_TIME, 'YYYY-MM-DD HH24:MI:SS:FF6'), 'YYYY-MM-DD HH24:MI:SS:FF6');

    V_SQL := '
        insert /*+ ENABLE_PARALLEL_DML PARALLEL(100) */ into XZ_SQL_AUDIT
        select (:1 * 10000000000) + i, scn_to_timestamp(x.request_time * 1000), :2, :3, :4,
        x.SVR_IP, x.SVR_PORT, x.REQUEST_ID, x.SQL_EXEC_ID, x.TRACE_ID, x.SID,
        x.CLIENT_IP, x.CLIENT_PORT, x.TENANT_ID, x.EFFECTIVE_TENANT_ID, x.TENANT_NAME,
        x.USER_ID, x.USER_NAME, x.USER_GROUP, x.USER_CLIENT_IP, x.DB_ID, x.DB_NAME,
        x.SQL_ID, x.QUERY_SQL, x.PLAN_ID, x.AFFECTED_ROWS, x.RETURN_ROWS, x.PARTITION_CNT,
        x.RET_CODE, x.QC_ID, x.DFO_ID, x.SQC_ID, x.WORKER_ID, x.EVENT,
        x.P1TEXT, x.P1, x.P2TEXT, x.P2, x.P3TEXT, x.P3,
        x.WAIT_CLASS_ID, x.WAIT_CLASS#, x.WAIT_CLASS, x.STATE,
        x.WAIT_TIME_MICRO, x.TOTAL_WAIT_TIME_MICRO, x.TOTAL_WAITS, x.RPC_COUNT, x.PLAN_TYPE,
        x.IS_INNER_SQL, x.IS_EXECUTOR_RPC, x.IS_HIT_PLAN, x.REQUEST_TIME, x.ELAPSED_TIME,
        x.NET_TIME, x.NET_WAIT_TIME, x.QUEUE_TIME, x.DECODE_TIME, x.GET_PLAN_TIME, x.EXECUTE_TIME,
        x.APPLICATION_WAIT_TIME, x.CONCURRENCY_WAIT_TIME, x.USER_IO_WAIT_TIME, x.SCHEDULE_TIME,
        x.ROW_CACHE_HIT, x.BLOOM_FILTER_CACHE_HIT, x.BLOCK_CACHE_HIT, x.DISK_READS,
        x.RETRY_CNT, x.TABLE_SCAN, x.CONSISTENCY_LEVEL,
        x.MEMSTORE_READ_ROW_COUNT, x.SSSTORE_READ_ROW_COUNT,
        x.DATA_BLOCK_READ_CNT, x.DATA_BLOCK_CACHE_HIT, x.INDEX_BLOCK_READ_CNT, x.INDEX_BLOCK_CACHE_HIT,
        x.BLOCKSCAN_BLOCK_CNT, x.BLOCKSCAN_ROW_CNT, x.PUSHDOWN_STORAGE_FILTER_ROW_CNT,
        x.REQUEST_MEMORY_USED, x.EXPECTED_WORKER_COUNT, x.USED_WORKER_COUNT, x.SCHED_INFO,
        x.PS_CLIENT_STMT_ID, x.PS_INNER_STMT_ID, x.TX_ID, x.SNAPSHOT_VERSION, x.REQUEST_TYPE,
        x.IS_BATCHED_MULTI_STMT, x.OB_TRACE_INFO, x.PLAN_HASH, x.PARAMS_VALUE, x.RULE_NAME,
        x.TX_INTERNAL_ROUTING, x.TX_STATE_VERSION, x.FLT_TRACE_ID, x.NETWORK_WAIT_TIME
        from (select /*+ PARALLEL(100) query_timeout(50000000000) */
              rownum i, a.* from GV$OB_SQL_AUDIT a
        -- the not-like predicates exclude tables you do not need, e.g. remote scheduled tasks
        -- fired by other microservices during the stress test; confirm the list with the
        -- business side in advance and keep appending ''and query_sql not like'' as needed
        where query_sql not like ''%table_name_1%'' -- excluded tables
        and query_sql not like ''%table_name_2%''
        and query_sql not like ''%table_name_3%''
        and query_sql not like ''%table_name_4%''
        and query_sql not like ''%table_name_5%''
        and query_sql not like ''%ob_sql_audit%'' -- exclude queries against ob_sql_audit itself
        and query_sql not like ''%extract_sqlaudit_proc%'' -- exclude calls to extract_sqlaudit_proc
        and TENANT_NAME <> ''sys''
        ) x ';
    -- Append the time-window filter
    V_SQL := V_SQL || ' where to_char(scn_to_timestamp((request_time * 1000)),''YYYY-MM-DD hh24:mi:ss:ff6'') >= '
        || '''' || V_START_TIME || '''' || ' AND '
        || ' to_char(scn_to_timestamp((request_time * 1000)),''YYYY-MM-DD hh24:mi:ss:ff6'') <= '
        || '''' || V_END_TIME || '''';

    EXECUTE IMMEDIATE V_SQL USING V_MAX_BATCH, V_MAX_BATCH, V_NAME, V_FLAG;
    COMMIT;

    DBMS_OUTPUT.PUT_LINE(p_START_TIME || ' ~ ' || p_END_TIME ||
                         ' saved; query XZ_SQL_AUDIT for this window with C_BATCH_ID: ' || V_MAX_BATCH);

    /*
     -- debug code
     dbms_output.put_line(V_NAME);
     dbms_output.put_line(V_MAX_BATCH);
     dbms_output.put_line(V_START_TIME);
     dbms_output.put_line(V_END_TIME);
     dbms_output.put_line(V_SQL);
    */
EXCEPTION
    WHEN V_FLAG_EXCEPTION THEN
        DBMS_OUTPUT.PUT_LINE('Invalid value for the p_FLAG parameter. S: pull a single transaction, M: pull multiple transactions (stress test)');

END;
/
begin
yzj.extract_sqlaudit_proc(
    p_NAME => 'OB 4522 Scenario: Internet Banking Borrowing 200 Concurrent', -- name of the scenario being stress tested
    p_FLAG => 'm', -- s: pull a single transaction, m: pull multiple transactions (stress test)
    p_START_TIME => '2024-08-10 18:00:00', -- stress test start time
    p_END_TIME => '2024-08-10 18:59:00' -- stress test end time
);
end;
/


This table is designed as a partitioned table to suit OB's characteristics and the large volume of data that needs to be saved. c_batch_id is the core column of the table.
Each execution of extract_sqlaudit_proc automatically increments c_batch_id by 1 when storing. For example, after calling the procedure above, querying select * from xz_sql_audit where c_batch_id = 1; returns all the data captured for the period 2024-08-10 18:00:00 ~ 2024-08-10 18:59:00. Pretty convenient, right? 😁😁😁😁😁😁
Tip: when running from the obclient command line, it's best to set serveroutput on; in advance, so that after the save the procedure prints which c_batch_id this time window was stored under.
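As a quick sanity check, a query like the following (a minimal sketch that uses nothing beyond the table above) lists what each batch holds:

-- list collected batches with their scenario name, time span and row counts
select c_batch_id, c_name, c_flag,
       min(c_rq_time) first_request,
       max(c_rq_time) last_request,
       count(*) rows_collected
from xz_sql_audit
group by c_batch_id, c_name, c_flag
order by c_batch_id;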


2. Analyzing case one through the xz_sql_audit table

The following SQL breaks down, per transaction, the important stage intervals during SQL execution:

SELECT /*+ PARALLEL(80) */
       a.TX_ID                      tx_id,
       b.gg                         sql_count,
       round(max(a.ELAPSED_TIME))   max_elapsed_time,
       round(avg(a.ELAPSED_TIME))   avg_elapsed_time,
       round(min(a.ELAPSED_TIME))   min_elapsed_time,
       round(max(a.net_time))       max_rpc_recv_time,
       round(max(a.net_wait_time))  max_req_recv_time,
       round(max(a.QUEUE_TIME))     max_queue_time,
       round(max(a.DECODE_TIME))    max_dequeue_decode_time,
       round(max(a.get_plan_time))  max_get_plan_time,
       round(max(a.execute_time))   max_execute_time,
       round(max(a.RETRY_CNT))      max_retry_cnt,
       round(max(a.RPC_COUNT))      max_rpc_count,
       round(avg(a.net_time))       avg_rpc_recv_time,
       round(avg(a.net_wait_time))  avg_req_recv_time,
       round(avg(a.QUEUE_TIME))     avg_queue_time,
       round(avg(a.DECODE_TIME))    avg_dequeue_decode_time,
       round(avg(a.get_plan_time))  avg_get_plan_time,
       round(avg(a.execute_time))   avg_execute_time,
       round(avg(a.RETRY_CNT))      avg_retry_cnt,
       round(avg(a.RPC_COUNT))      avg_rpc_count,
       round(min(a.net_time))       min_rpc_recv_time,
       round(min(a.net_wait_time))  min_req_recv_time,
       round(min(a.QUEUE_TIME))     min_queue_time,
       round(min(a.DECODE_TIME))    min_dequeue_decode_time,
       round(min(a.get_plan_time))  min_get_plan_time,
       round(min(a.execute_time))   min_execute_time,
       round(min(a.RETRY_CNT))      min_retry_cnt,
       round(min(a.RPC_COUNT))      min_rpc_count
FROM XZ_SQL_AUDIT a inner join
     (SELECT COUNT(to_char(substr(query_sql,1,2000))) gg,
             C_BATCH_ID,
             TX_ID
      FROM XZ_SQL_AUDIT GROUP BY C_BATCH_ID, TX_ID) b
     ON a.TX_ID = b.TX_ID and a.C_BATCH_ID = b.C_BATCH_ID
WHERE a.C_BATCH_ID = 2 -- change to the corresponding c_batch_id
GROUP BY a.TX_ID, b.gg ORDER BY 3 DESC;

This SQL analyzes the server-side execution path of the SQL inside each transaction, showing in which stage the top SQL actually burns its time: queuing, the network, fetching the plan, or the execution itself. Each case then gets analyzed on its own terms.
The columns are laid out left to right in execution order, so project staff can immediately locate the time-consuming step.
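Once a slow transaction stands out, a follow-up along these lines (an illustrative sketch; the :slow_tx_id bind is just a placeholder for a TX_ID found above) lists its statements in execution order:

select to_char(c_rq_time, 'HH24:MI:SS.FF6') req_time,
       elapsed_time, queue_time, get_plan_time, execute_time,
       to_char(substr(query_sql, 1, 200)) sql_text
from xz_sql_audit
where c_batch_id = 2 -- change to the corresponding c_batch_id
  and tx_id = :slow_tx_id -- placeholder: the TX_ID spotted above
order by c_rq_time;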


3. Analyzing case two through the xz_sql_audit table

I ran into a strange problem during the stress test: QPS dipped into a trough every 2 minutes, and TPS response time spiked at the same moment.
(Screenshots: QPS trough, TPS trough, TPS response-time spike.)
The following SQL locates the SQL traffic in a window of roughly 15 seconds around a QPS/TPS trough:

select /*+ PARALLEL(62) */
       to_char(a.c_rq_time, 'YYYY-MM-DD HH24:MI:SS') c_date,
       case when a.plan_type = 0 then 'internal plan'
            when a.plan_type = 1 then 'local plan'
            when a.plan_type = 2 then 'remote plan'
            when a.plan_type = 3 then 'distributed plan'
            else null end plan_type,
       count(1) cnt
from XZ_SQL_AUDIT a
where a.C_BATCH_ID = 2
  and c_rq_time between to_timestamp('2024-08-06 19:22:00', 'YYYY-MM-DD HH24:MI:SS')
                    and to_timestamp('2024-08-06 19:22:15', 'YYYY-MM-DD HH24:MI:SS')
group by to_char(a.c_rq_time, 'YYYY-MM-DD HH24:MI:SS'), a.plan_type
order by 2, 1;


You can see that in the 19:22:00 second the SQL still had 54008 accesses, but the very next second it dropped straight to 14028.
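To find the trough seconds in the first place, a plain per-second count over the whole batch works (a sketch in the same style as the query above):

-- statements per second across the whole batch; troughs show up as sudden dips
select to_char(c_rq_time, 'YYYY-MM-DD HH24:MI:SS') sec,
       count(*) cnt
from xz_sql_audit
where c_batch_id = 2
group by to_char(c_rq_time, 'YYYY-MM-DD HH24:MI:SS')
order by 1;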
What finally nailed this case down: running benchmarkSQL from a machine in the same network segment showed no problem at all, while launching benchmarkSQL from a stress-test machine in a different segment for the cross-segment application reproduced the phenomenon perfectly.
So the QPS troughs and TPS response spikes were probably caused by crossing network segments, or by network instability; honestly it was never conclusively ruled out, the investigation simply converged on the network.
Even so, this way of analyzing SQL traffic corroborates from another angle: when you hit a problem you rule things out dimension by dimension, by multiple means, and finally settle on the root cause.


4. Analyzing case three through the xz_sql_audit table

The customer's core system is a Java microservices architecture, with eight application servers and hundreds of virtual machines running various auxiliary services behind them.
Transaction management at the application level uses the TCC distributed transaction model with a transaction-compensation mechanism (the application layer provides the rollback path, so the database itself does not have to roll back).
The advantages and disadvantages of this distributed transaction model are as follows:

  • Pros:
    • Business-layer logic is mostly implemented with simple point-select SQL rather than complex SQL.
  • Drawbacks:
    • The transaction volume is huge, and a single transaction contains a very large number of point-select SQL statements.
    • The rollback logic is application-provided, which can leave data inconsistent if a TCP connection breaks mid-transaction.
    • The business logic is very complex and tests the developers' grasp of the business.

A distributed transaction model like TCC performs brilliantly on a centralized database. 😁😁😁😁😁😁😁😁
Because the transaction count is so large, it is hard to work out from the raw gv$ob_sql_audit view what SQL a single transaction of a given scenario consists of, and without knowing the single-transaction model of a business interface there is no way to optimize that interface.
Here is how I worked it out with the business side:

  1. Have a business developer initiate a single-transaction task for the given scenario.
  2. Have the business person provide the transaction start and end times, ideally down to the microsecond, or the millisecond if that's not possible.
  3. Then I use extract_sqlaudit_proc to collect around that point in time. For example, if a transaction task initiated by business development starts and ends at 2024-08-10 20:01:01.123 ~ 2024-08-10 20:01:01.500 (a millisecond-scale transaction), I widen the window by roughly a second on each side so the collection captures all SQL in that period, as in the call below.
begin
yzj.extract_sqlaudit_proc(
    p_NAME => 'OB 4522 Scenario: Internet Banking Borrowing Single Transaction', -- scenario name
    p_FLAG => 's', -- s: pull a single transaction, m: pull multiple transactions (stress test)
    p_START_TIME => '2024-08-10 20:01:00.123', -- transaction start time, widened by 1 second
    p_END_TIME => '2024-08-10 20:01:02.123' -- transaction end time, widened by 1 second
);
end;
/
  4. You also need to pin down, together with business development, the SQL statement that marks the start of the transaction and the one that marks its end (which tables they touch, or whatever the SQL looks like), ideally with flags identifying the operations that belong to this transaction.
  5. Finally, the following SQL analyzes the transaction count of the single-transaction run and the SQL statements contained in each transaction.
SELECT *
FROM (SELECT TO_CHAR(b.trans_start_time, 'YYYY-MM-DD HH24:MI:SS.FF6')      trans_start_time,
             TO_CHAR(b.trans_end_time, 'YYYY-MM-DD HH24:MI:SS.FF6')        trans_end_time,
             ROW_NUMBER() OVER (PARTITION BY a.tx_id ORDER BY a.c_rq_time) sql_order_in_tx,
             a.tx_id                                                       tx_id,
             TO_CHAR(a.c_rq_time, 'YYYY-MM-DD HH24:MI:SS.FF6')             sql_request_time,
             TO_CHAR(SUBSTR(a.query_sql, 1, 2000))                         sql_text,
             a.net_time                                                    rpc_recv_time,
             a.net_wait_time                                               req_recv_time,
             a.QUEUE_TIME                                                  queue_time,
             a.DECODE_TIME                                                 dequeue_decode_time,
             a.get_plan_time                                               get_plan_time,
             a.execute_time                                                execute_time,
             a.RETRY_CNT                                                   retry_cnt,
             a.RPC_COUNT                                                   rpc_count,
             a.ELAPSED_TIME                                                total_elapsed_time
      FROM xz_sql_audit a
               INNER JOIN (SELECT MIN(c_rq_time) trans_start_time,
                                  MAX(c_rq_time) trans_end_time
                           FROM xz_sql_audit
                           WHERE c_batch_id = 6 -- change c_batch_id to match your own environment
                             AND (
                                       query_sql LIKE '%insert%in_log%' -- the SQL that starts the transaction
                                   OR query_sql LIKE '%update%in_log%') -- the SQL that ends the transaction
                          ) b
                          ON a.c_rq_time >= b.trans_start_time AND a.c_rq_time <= b.trans_end_time AND a.c_batch_id = 6);
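To then see where such a single transaction spends its time overall, a rollup like this works (again a sketch; :tx_id is a placeholder for one transaction ID from the result above):

select count(*) sql_cnt,
       sum(elapsed_time) total_elapsed,
       sum(queue_time) total_queue,
       sum(get_plan_time) total_get_plan,
       sum(execute_time) total_execute
from xz_sql_audit
where c_batch_id = 6
  and tx_id = :tx_id; -- placeholder transaction ID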


5. Concluding remarks

The extract_sqlaudit_proc procedure helps by pulling the gv$ob_sql_audit data into the xz_sql_audit table, but analyzing that data for the project still comes down to how well you understand gv$ob_sql_audit and your own SQL skills.
It takes real SQL skills to write the analysis queries each scenario needs, rather than being limited to one or two canned scenarios.
One more point: whatever analysis SQL you write, just add the matching index on xz_sql_audit to speed it up. With enough machine resources, run the analysis SQL with more parallelism; it is far faster than querying gv$ob_sql_audit directly and greatly improves working efficiency.
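For example (hypothetical index names, matching the per-transaction and time-window queries used above; local indexes because the table is range-partitioned by c_batch_id):

CREATE INDEX idx_xz_audit_batch_tx ON xz_sql_audit (c_batch_id, tx_id) LOCAL;
CREATE INDEX idx_xz_audit_batch_time ON xz_sql_audit (c_batch_id, c_rq_time) LOCAL;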
I'm still tinkering with an obproxy log-analysis script: pass in the start and end times of the xz_sql_audit data, turn obproxy's logs into structured rows in an obproxy_audit table, and later a single SQL can show how much time a business transaction spends in the server layer versus the proxy layer. 🤞🤞🤞🤞🤞🤞🤞