I. Background
During performance stress testing, a bottleneck was found on one of the interfaces. We wanted a tool that could locate the bottleneck, ideally down to the specific slow method.
II. Introduction to cProfile
cProfile is a module in the Python standard library for profiling Python programs. It reports detailed information such as how many times each function is called and how long each call takes, which helps developers identify slow methods for performance optimization, making it a good fit for the requirement above.
In addition, Python ships a built-in profile module with the same functionality as cProfile. cProfile is written in C, so it has higher performance and lower overhead, which makes it suitable for performance-sensitive environments such as production. profile is a pure Python implementation with relatively higher overhead, but because it is written in Python it is easy to understand and modify, which makes it suitable for learning.
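Because the two modules expose the same interface, switching between them is usually just a matter of changing the import. A minimal sketch (slow_sum is only an illustrative function, not part of either module):

import cProfile  # Swap for "import profile as cProfile" to use the pure-Python profiler instead.

def slow_sum(n):
    # Deliberately naive loop, used only to produce measurable work.
    total = 0
    for i in range(n):
        total += i
    return total

# run() accepts a statement string and prints the same statistics table in both modules.
cProfile.run("slow_sum(1_000_000)")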
III. Methods of use
cProfile can be used in three ways: hard-coded into the code; loaded when the Python application starts; or run through an IDE (PyCharm). Method 3 is recommended for development environments because it is easy to use and the results are graphically rich; method 2 is recommended for production environments because it is non-intrusive to the code.
1. Hard-coded in code
Sample code:
import cProfile

def my_function():
    # Some code to profile
    pass

profiler = cProfile.Profile()
profiler.enable()
my_function()
profiler.disable()
profiler.print_stats()
Execution results:
2 function calls in 0.000 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 :3(my_function)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Result Field Description:
ncalls: number of function calls.
tottime: the total time spent in the function itself, excluding time spent calling subfunctions.
percall: tottime divided by ncalls.
cumtime: the total time spent in this function and all its subfunctions.
percall: cumtime divided by the number of primitive calls.
filename:lineno(function): the name of the file where the function is located, the line number and the function name.
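In practice the print_stats() output can be long, so the pstats module from the standard library is often used to sort and truncate it. Below is a minimal sketch building on the example above; the sort key "cumulative" and the 10-row limit are just example choices:

import cProfile
import pstats

def my_function():
    # Some code to profile
    pass

profiler = cProfile.Profile()
profiler.enable()
my_function()
profiler.disable()

# Sort by cumulative time (cumtime) and print only the 10 most expensive entries.
stats = pstats.Stats(profiler)
stats.strip_dirs().sort_stats("cumulative").print_stats(10)

# The same statistics can also be written to a .prof file for later analysis.
stats.dump_stats("my_function.prof")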
2. Load the cProfile module on Python application startup
Sample code:
python -m cProfile my_script.py # Method 1: print the results to the console.
python -m cProfile -o result.prof my_script.py # Method 2: save the results to the specified .prof file (result.prof here is just an example name).
The snakeviz package (installed with pip install snakeviz) can then be used to analyze the .prof file. After running snakeviz result.prof, the results are served by an embedded web server and can be accessed at a URL such as http://127.0.0.1:8080/snakeviz/.
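If no browser is available, the same .prof file can also be inspected directly with the standard-library pstats module. A minimal sketch (result.prof matches the example file name above; "my_function" is just an illustrative name pattern):

import pstats

# Load the profile saved by "python -m cProfile -o result.prof my_script.py".
stats = pstats.Stats("result.prof")

# Show the 20 entries with the largest cumulative time (cumtime).
stats.strip_dirs().sort_stats("cumulative").print_stats(20)

# print_callers()/print_callees() show who calls a function and what it calls in turn.
stats.print_callees("my_function")

The standard library also ships a small interactive browser for the same file: python -m pstats result.prof.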
3. Running via IDE (PyCharm)
Usage:
The application is started from the menu [Run] > [Profile 'app'] (where app is the name of the run configuration; the same name is used below). After the application finishes executing and stops, the corresponding panel shows the call statistics and the call chain.
Call statistics:
The header "Name" indicates the module or function being called; "Call Count" indicates the number of times it has been called; "Time (ms)" indicates the time spent and the percentage, the unit of time is milliseconds. The unit of time is milliseconds.
Click on a table header column name to sort that column.
In the call statistics, select the cell in the "name" column, right-click and select "Navigate to Source" or "Show on Call Graph "to open the source code or the corresponding call chain and location.
Call chain:
In addition, launching the program via the menu [Run] > [Concurrency Diagram 'app'] shows thread and asynchronous (asyncio) concurrency calls, as shown below:
IV. Relevant configuration items
1. cProfile
[root@test bin]# python3 -m cProfile -h
Usage: cProfile.py [-o output_file_path] [-s sort] [-m module | scriptfile] [arg] ...
Options:
  -h, --help            show this help message and exit
  -o OUTFILE, --outfile=OUTFILE
                        Save stats to <outfile>  # Write the analysis results to the specified file.
  -s SORT, --sort=SORT  Sort order when printing to stdout, based on pstats.Stats class  # Specify how the printed results are sorted, e.g. by time, cumulative, calls, etc.
  -m                    Profile a library module  # Profile a module rather than a script file.
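For example, combining these options (my_script.py, my_module, and result.prof are placeholder names):

python -m cProfile -s cumulative my_script.py        # Print results to the console, sorted by cumulative time.
python -m cProfile -o result.prof my_script.py arg1  # Save results to result.prof; extra arguments are passed to the script.
python -m cProfile -o result.prof -m my_module       # Profile a library module instead of a script file.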
2. snakeviz
[root@test bin]# snakeviz --help
usage: snakeviz [-h] [-v] [-H ADDR] [-p PORT] [-b BROWSER_PATH] [-s] filename
Start SnakeViz to view a Python profile.
positional arguments:
filename Python profile to view
options:
-h, --help show this help message and exit
  -v, --version         show program's version number and exit
  -H ADDR, --hostname ADDR   hostname to bind to (default: 127.0.0.1)  # Specifies the hostname to bind to; the default 127.0.0.1 is the local host only.
  -p PORT, --port PORT  port to bind to; if this port is already in use a free port will be selected automatically (default: 8080)  # Specifies the port to bind to; if it is already in use, a free port is chosen automatically. The default is 8080.
  -b BROWSER_PATH, --browser BROWSER_PATH   name of webbrowser to launch as described in the documentation of Python's webbrowser module: https://docs.python.org/3/library/webbrowser.html  # Specifies which browser to launch, following Python's webbrowser module documentation.
  -s, --server          start SnakeViz in server-only mode--no attempt will be made to open a browser  # Start SnakeViz in server-only mode without trying to open a browser; useful on servers without a graphical environment or browser.
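For example (result.prof and port 9000 are placeholder values):

snakeviz result.prof                         # Serve the profile locally and open the default browser.
snakeviz -H 0.0.0.0 -p 9000 -s result.prof   # Server-only mode, reachable from other hosts on port 9000.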
V. Examples of use in production environments
The production environment is CentOS 7.9.2009 (Core) with kernel 5.15.81, running the DB-GPT controller subsystem in a 4-core, 4 GB container.
1. Steps for use
(1) Start the application with the cProfile module loaded at startup: /usr/local/bin/python3.10 -m cProfile -o <result file>.prof /usr/local/bin/dbgpt start controller & (the -o option specifies the .prof file the analysis results will be written to).
(2) Run the stress test against the relevant interfaces.
(3) Stop the application normally to generate the performance analysis result file (the .prof file specified by -o). Note: the analysis results are only written out after the program stops normally. There are two conventional approaches: for a background daemon process, send kill -2 {application PID}; for a foreground process, exit with Ctrl + C.
(4) Run snakeviz -H 0.0.0.0 -s <result file>.prof to analyze the result file (-s starts server-only mode without trying to open a browser, since servers usually do not have one; -H 0.0.0.0 listens on all network interfaces). After it starts successfully, it prints an accessible URL that can be opened from an external or local browser. The full command sequence is sketched below.
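Putting the steps together, the command sequence looks roughly like the following sketch (dbgpt_controller.prof is a placeholder result file name; the pgrep lookup is just one way to find the PID):

# (1) Start the application with cProfile attached, saving results to a .prof file.
/usr/local/bin/python3.10 -m cProfile -o dbgpt_controller.prof /usr/local/bin/dbgpt start controller &

# (2) Run the stress test against the target interfaces (e.g. with JMeter), then
# (3) stop the application normally so that the result file is written out.
kill -2 $(pgrep -f "dbgpt start controller")

# (4) Serve the result file; the printed URL can be opened from a local or external browser.
snakeviz -H 0.0.0.0 -s dbgpt_controller.prof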
2. Analysis of results
Use an external or local browser to access the URL address generated by snakeviz. The result is as follows:
Result Description:
(1) The result consists of two parts: the graph at the top and the table below. The graph shows the call relationships, time consumption, and percentages of the selected method and its sub-methods; the table lists every method with its total number of calls (ncalls), the total time spent in the method itself (tottime), the average time per call spent in the method itself (percall), the total time spent in the method and its sub-methods (cumtime), the average time per call spent in the method and its sub-methods (percall), and the file in which the method is defined together with its line number.
Instructions for use:
(1) Every column of the table can be sorted in ascending or descending order. When a row is selected, the graph at the top of the page automatically shows the call relationships, time consumption, and percentages of that method and its sub-methods.
(2) Click any segment in the graph to view the call relationships, time consumption, and percentages of the corresponding method and its sub-methods.
Analysis recommendations:
(1) Sort the cumtime column in descending order, select the entry point of the code, and drill down step by step to locate the bottleneck.
(2) Switch to the Sunburst style graph to see the percentage of time spent in each method.
3. Evaluating the performance impact of loading cProfile
We used JMeter to compare the performance of the Python application with and without the cProfile module loaded, to determine how much loading cProfile in a production environment affects performance. The results are as follows:
Configuration | JMeter load-test threads | CPU utilization | Throughput | Average response time
---|---|---|---|---
CASE 1: application without cProfile | 20 | close to 100% of a single core | 527 | 36 ms
CASE 2: application with cProfile loaded | 20 | close to 100% of a single core | 395 | 49 ms
From the table above, we can see that after loading cProfile the application throughput decreases by about 25% and the average response time increases by 13 ms, which has a noticeable impact on performance.
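The absolute numbers depend on the workload, but the order of magnitude of the overhead can also be estimated locally with a small benchmark before deciding whether to enable cProfile online. A rough sketch (cpu_bound_work and the iteration counts are arbitrary stand-ins, not the DB-GPT workload):

import cProfile
import time

def cpu_bound_work():
    # Arbitrary CPU-bound work standing in for one request's processing.
    return sum(i * i for i in range(50_000))

def timed(label, fn, runs=200):
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s for {runs} runs")

# Baseline without profiling.
timed("without cProfile", cpu_bound_work)

# The same workload with the profiler enabled.
profiler = cProfile.Profile()
profiler.enable()
timed("with cProfile", cpu_bound_work)
profiler.disable()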
VI. Problems encountered
1. kill -15 {application PID} fails to generate the performance analysis result file
Because cProfile only handles the interrupt signal (SIGINT), no performance analysis result file is generated when kill -15 sends a SIGTERM signal.
Workaround: use kill -2 {application PID}.
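When the profiler is hard-coded into the application (method 1 above), another option is to install a SIGTERM handler that flushes the statistics before exiting, so that kill -15 also produces a result file. A minimal sketch (handle_sigterm and dbgpt_controller.prof are illustrative names, not part of cProfile itself):

import cProfile
import signal
import sys

profiler = cProfile.Profile()
profiler.enable()

def handle_sigterm(signum, frame):
    # Stop profiling and write out the collected statistics before the process exits.
    profiler.disable()
    profiler.dump_stats("dbgpt_controller.prof")
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

# ... application code runs here ...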
VII. Summary
(1) cProfile generates detailed performance distributions and call chains, making it ideal as a tool for analyzing and locating performance bottlenecks in Python applications.
(2) Because generating the performance analysis results requires stopping the application and profiling has a large performance cost (about a 25% drop in throughput), it is generally not recommended to use cProfile directly in the production environment. Instead, traffic replication can be used to copy production traffic to a test or pre-production environment, which allows locating real performance bottlenecks without affecting the online business.