I. Background
During performance stress testing, a bottleneck was found on one of the interfaces. We wanted a tool that could locate the bottleneck, ideally down to the specific slow method.
II. Introduction to cProfile
cProfile is a module in the Python standard library for profiling Python programs. It reports detailed information such as how many times each function is called and how long each call takes, which helps developers identify slow methods for performance optimization, making it a good fit for the requirement above.
In addition, Python ships a built-in profile module with the same functionality as cProfile. cProfile is written in C, so it has higher performance and lower overhead, which makes it suitable for performance-sensitive environments such as production. profile is a pure Python implementation with relatively higher overhead, but because it is written in Python it is easy to understand and modify, which makes it suitable for learning.
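Because the two modules expose the same interface, switching between them is usually just a matter of changing the import. A minimal sketch (slow_sum is only an illustrative function, not part of either module):

import cProfile  # Swap for "import profile as cProfile" to use the pure-Python profiler instead.

def slow_sum(n):
    # Deliberately naive loop, used only to produce measurable work.
    total = 0
    for i in range(n):
        total += i
    return total

# run() accepts a statement string and prints the same statistics table in both modules.
cProfile.run("slow_sum(1_000_000)")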
III. Methods of use
cProfile can be used in three ways: hard-coded into the code; loaded when the Python application starts; or run through an IDE (PyCharm). Method 3 is recommended for development environments because it is easy to use and the results are graphically rich; method 2 is recommended for production environments because it is non-intrusive to the code.
1. Hard-coded in code
Sample code:
import cProfile

def my_function():
    # Some code to profile
    pass

profiler = cProfile.Profile()
profiler.enable()
my_function()
profiler.disable()
profiler.print_stats()
Execution results:
2 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 :3(my_function)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Result Field Description:
ncalls: number of function calls.
tottime: the total time spent in the function itself, excluding time spent calling subfunctions.
percall: tottime divided by ncalls.
cumtime: the total time spent in this function and all its subfunctions.
percall: cumtime divided by the number of primitive calls.
filename:lineno(function): the name of the file where the function is located, the line number and the function name.
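In practice the print_stats() output can be long, so the pstats module from the standard library is often used to sort and truncate it. Below is a minimal sketch building on the example above; the sort key "cumulative" and the 10-row limit are just example choices:

import cProfile
import pstats

def my_function():
    # Some code to profile
    pass

profiler = cProfile.Profile()
profiler.enable()
my_function()
profiler.disable()

# Sort by cumulative time (cumtime) and print only the 10 most expensive entries.
stats = pstats.Stats(profiler)
stats.strip_dirs().sort_stats("cumulative").print_stats(10)

# The same statistics can also be written to a .prof file for later analysis.
stats.dump_stats("my_function.prof")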
2. Load the cProfile module on Python application startup
Sample code:
python -m cProfile my_script.py # Method 1: print the results to the console.
python -m cProfile -o result.prof my_script.py # Method 2: save the results to the specified .prof file (result.prof here is just an example name).
The snakeviz package (installed with pip install snakeviz) can then be used to analyze the .prof file. After running snakeviz result.prof, the results are served by an embedded web server and can be accessed at a URL such as http://127.0.0.1:8080/snakeviz/.
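If no browser is available, the same .prof file can also be inspected directly with the standard-library pstats module. A minimal sketch (result.prof matches the example file name above; "my_function" is just an illustrative name pattern):

import pstats

# Load the profile saved by "python -m cProfile -o result.prof my_script.py".
stats = pstats.Stats("result.prof")

# Show the 20 entries with the largest cumulative time (cumtime).
stats.strip_dirs().sort_stats("cumulative").print_stats(20)

# print_callers()/print_callees() show who calls a function and what it calls in turn.
stats.print_callees("my_function")

The standard library also ships a small interactive browser for the same file: python -m pstats result.prof.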
3. Running via IDE (PyCharm)
Usage:
The application is started from the menu [Run] > [Profile 'app'] (where app is the name of the run configuration; the same name is used below). After the application finishes executing and stops, the corresponding panel shows the call statistics and the call chain.
Call statistics:
The header "Name" indicates the module or function being called; "Call Count" indicates the number of times it has been called; "Time (ms)" indicates the time spent and the percentage, the unit of time is milliseconds. The unit of time is milliseconds.
Click on a table header column name to sort that column.
In the call statistics, select the cell in the "name" column, right-click and select "Navigate to Source" or "Show on Call Graph "to open the source code or the corresponding call chain and location.
Call chain:
In addition, launching the program via the menu [Run] > [Concurrency Diagram 'app'] shows thread and asynchronous (asyncio) concurrency calls, as shown below:
IV. Relevant configuration items
1. cProfile
[root@test bin]# python3 -m cProfile -h
Usage: cProfile.py [-o output_file_path] [-s sort] [-m module | scriptfile] [arg] ...
Options:
  -h, --help            show this help message and exit
  -o OUTFILE, --outfile=OUTFILE
                        Save stats to <outfile>  # Write the analysis results to the specified file.
  -s SORT, --sort=SORT  Sort order when printing to stdout, based on pstats.Stats class  # Specify how the printed results are sorted, e.g. by time, cumulative, calls, etc.
  -m                    Profile a library module  # Profile a module rather than a script file.
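For example, combining these options (my_script.py, my_module, and result.prof are placeholder names):

python -m cProfile -s cumulative my_script.py        # Print results to the console, sorted by cumulative time.
python -m cProfile -o result.prof my_script.py arg1  # Save results to result.prof; extra arguments are passed to the script.
python -m cProfile -o result.prof -m my_module       # Profile a library module instead of a script file.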
2. snakeviz
[root@test bin]# snakeviz --help
usage: snakeviz [-h] [-v] [-H ADDR] [-p PORT] [-b BROWSER_PATH] [-s] filename
Start SnakeViz to view a Python profile.
positional arguments:
filename Python profile to view
options:
-h, --help show this help message and exit
  -v, --version         show program's version number and exit
  -H ADDR, --hostname ADDR   hostname to bind to (default: 127.0.0.1)  # Specifies the hostname to bind to; the default 127.0.0.1 is the local host only.
  -p PORT, --port PORT  port to bind to; if this port is already in use a free port will be selected automatically (default: 8080)  # Specifies the port to bind to; if it is already in use, a free port is chosen automatically. The default is 8080.
  -b BROWSER_PATH, --browser BROWSER_PATH   name of webbrowser to launch as described in the documentation of Python's webbrowser module: https://docs.python.org/3/library/webbrowser.html  # Specifies which browser to launch, following Python's webbrowser module documentation.
  -s, --server          start SnakeViz in server-only mode--no attempt will be made to open a browser  # Start SnakeViz in server-only mode without trying to open a browser; useful on servers without a graphical environment or browser.
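For example (result.prof and port 9000 are placeholder values):

snakeviz result.prof                         # Serve the profile locally and open the default browser.
snakeviz -H 0.0.0.0 -p 9000 -s result.prof   # Server-only mode, reachable from other hosts on port 9000.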
V. Examples of use in production environments
The production environment is CentOS 7.9.2009 (Core) with kernel 5.15.81, running the DB-GPT controller subsystem in a 4-core, 4 GB container.
1. Steps for use
(1) Start the application with the cProfile module loaded at startup: /usr/local/bin/python3.10 -m cProfile -o <result file>.prof /usr/local/bin/dbgpt start controller & (the -o option specifies the .prof file the analysis results will be written to).
(2) Run the stress test against the relevant interfaces.
(3) Stop the application normally to generate the performance analysis result file (the .prof file specified by -o). Note: the analysis results are only written out after the program stops normally. There are two conventional approaches: for a background daemon process, send kill -2 {application PID}; for a foreground process, exit with Ctrl + C.
(4) Run snakeviz -H 0.0.0.0 -s <result file>.prof to analyze the result file (-s starts server-only mode without trying to open a browser, since servers usually do not have one; -H 0.0.0.0 listens on all network interfaces). After it starts successfully, it prints an accessible URL that can be opened from an external or local browser. The full command sequence is sketched below.
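Putting the steps together, the command sequence looks roughly like the following sketch (dbgpt_controller.prof is a placeholder result file name; the pgrep lookup is just one way to find the PID):

# (1) Start the application with cProfile attached, saving results to a .prof file.
/usr/local/bin/python3.10 -m cProfile -o dbgpt_controller.prof /usr/local/bin/dbgpt start controller &

# (2) Run the stress test against the target interfaces (e.g. with JMeter), then
# (3) stop the application normally so that the result file is written out.
kill -2 $(pgrep -f "dbgpt start controller")

# (4) Serve the result file; the printed URL can be opened from a local or external browser.
snakeviz -H 0.0.0.0 -s dbgpt_controller.prof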
2. Analysis of results
Use an external or local browser to access the URL address generated by snakeviz. The result is as follows:
Result Description:
(1) The result consists of two parts: the graph at the top and the table below. The graph shows the call relationships, time consumption, and percentages of the selected method and its sub-methods; the table lists every method with its total number of calls (ncalls), the total time spent in the method itself (tottime), the average time per call spent in the method itself (percall), the total time spent in the method and its sub-methods (cumtime), the average time per call spent in the method and its sub-methods (percall), and the file in which the method is defined together with its line number.
Instructions for use:
(1) Every column of the table can be sorted in ascending or descending order. When a row is selected, the graph at the top of the page automatically shows the call relationships, time consumption, and percentages of that method and its sub-methods.
(2) Click any segment in the graph to view the call relationships, time consumption, and percentages of the corresponding method and its sub-methods.
Analysis recommendations:
(1) Sort the cumtime column in descending order, select the entry point of the code, and drill down step by step to locate the bottleneck.
(2) Switch to the Sunburst style graph to see the percentage of time spent in each method.
3. Evaluating the performance impact of loading cProfile
We used JMeter to compare the performance of the Python application with and without the cProfile module loaded, to determine how much loading cProfile in a production environment affects performance. The results are as follows:
Configuration | JMeter load-test threads | CPU utilization | Throughput | Average response time
---|---|---|---|---
CASE 1: application without cProfile | 20 | close to 100% of a single core | 527 | 36 ms
CASE 2: application with cProfile loaded | 20 | close to 100% of a single core | 395 | 49 ms
From the table above, we can see that after loading cProfile the application throughput decreases by about 25% and the average response time increases by 13 ms, which has a noticeable impact on performance.
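The absolute numbers depend on the workload, but the order of magnitude of the overhead can also be estimated locally with a small benchmark before deciding whether to enable cProfile online. A rough sketch (cpu_bound_work and the iteration counts are arbitrary stand-ins, not the DB-GPT workload):

import cProfile
import time

def cpu_bound_work():
    # Arbitrary CPU-bound work standing in for one request's processing.
    return sum(i * i for i in range(50_000))

def timed(label, fn, runs=200):
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s for {runs} runs")

# Baseline without profiling.
timed("without cProfile", cpu_bound_work)

# The same workload with the profiler enabled.
profiler = cProfile.Profile()
profiler.enable()
timed("with cProfile", cpu_bound_work)
profiler.disable()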
VI. Problems encountered
1. kill -15 {application PID} fails to generate the performance analysis result file
Because cProfile only handles the interrupt signal (SIGINT), no performance analysis result file is generated when kill -15 sends a SIGTERM signal.
Workaround: use kill -2 {application PID}.
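When the profiler is hard-coded into the application (method 1 above), another option is to install a SIGTERM handler that flushes the statistics before exiting, so that kill -15 also produces a result file. A minimal sketch (handle_sigterm and dbgpt_controller.prof are illustrative names, not part of cProfile itself):

import cProfile
import signal
import sys

profiler = cProfile.Profile()
profiler.enable()

def handle_sigterm(signum, frame):
    # Stop profiling and write out the collected statistics before the process exits.
    profiler.disable()
    profiler.dump_stats("dbgpt_controller.prof")
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

# ... application code runs here ...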
VII. Summary
(1) cProfile generates detailed performance distributions and call chains, making it ideal as a tool for analyzing and locating performance bottlenecks in Python applications.
(2) Because generating the performance analysis results requires stopping the application and profiling has a large performance cost (about a 25% drop in throughput), it is generally not recommended to use cProfile directly in the production environment. Instead, traffic replication can be used to copy production traffic to a test or pre-production environment, which allows locating real performance bottlenecks without affecting the online business.