Author: SkyXZ
CSDN: SkyXZ~ - CSDN Blog
Blog Park: SkyXZ - Blog Park
Host environment: WSL2-Ubuntu22.04+Cuda12.6, D-Robotics-OE 1.2.8, Ubuntu20.04 GPU Docker
End-side device environment: RDK X5-Server-3.1.0
After buying the RDK X5, are you still only using it the way you would a Raspberry Pi? Want to deploy deep learning models but don't know where to start with the BPU? You finally found the OE delivery package and the Model Zoo, but have no idea what they are for? I know you're in a hurry, but hold on for a moment! Follow this tutorial and in 30 minutes you will understand how to quantize and deploy a model and leave the beginner stage of RDK model quantization and deployment behind!!! First, the reference materials and documents for this tutorial:
- D-Robotics RDK user manual: 1. Quick Start | RDK DOC
- D-Robotics X5 algorithm toolchain: X5 algorithm toolchain release announcement
- RDK Model Zoo introduction manual: 4.3.1 ModelZoo Overview | RDK DOC
- RDK Model Zoo repository: /D-Robotics/rdk_model_zoo
1. Introduction to algorithm tool chain and environment installation
Currently, the models we train on GPUs are usually stored in floating-point format, because floating point offers higher precision and flexibility during training. For edge devices, however, the compute and storage that floating-point models require far exceed what the hardware can carry, so the AI acceleration chips on most edge devices only support INT8 fixed-point models (the common precision for embedded processors), and the BPU on our X5 is no exception. We therefore need to convert our trained floating-point model into a fixed-point model, a process called model quantization. D-Robotics has developed the D-Robotics algorithm toolchain for its processors, which lets you quantize a floating-point model into a fixed-point model conveniently and quickly and deploy it on D-Robotics processors!!!
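To make the idea of quantization concrete, here is a tiny, purely illustrative Python sketch of symmetric INT8 linear quantization; this is my own toy example, not the toolchain's actual algorithm, which also uses calibration data to choose per-layer scales.
import numpy as np

# a toy float32 weight tensor, like what a GPU-trained model stores
w = np.array([0.12, -0.53, 0.99, -1.40, 0.07], dtype=np.float32)

# symmetric linear quantization: choose a scale so the largest magnitude maps to 127
scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / scale), -128, 127).astype(np.int8)

# the int8 values are what a fixed-point accelerator stores and computes with;
# multiplying back by the scale recovers an approximation of the original weights
w_restored = w_int8.astype(np.float32) * scale
print(w_int8)          # [  11  -48   90 -127    6]
print(w_restored - w)  # small quantization error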
Below we introduce how to install the algorithm toolchain. Since the D-Robotics algorithm toolchain currently only runs on Linux, everyone must first make sure their development machine meets the requirements below and has WSL2-Ubuntu installed (for details see: Say goodbye to virtual machines! WSL2 installation and configuration tutorial!!! - SkyXZ - Blog Park) or Ubuntu in a virtual machine. Since the toolchain is officially provided as a Docker image, the exact Ubuntu version is not very important.
(1) Install Docker and NVIDIA Container Toolkit
If you have not yet installed Docker (see: Get Docker | Docker Docs) and the NVIDIA Container Toolkit (D-Robotics officially requires version 1.13.1-1.13.5; for installation details see: Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit 1.17.3 documentation), I will walk you through the whole process from scratch. First comes Docker: we start by removing any Docker already installed on the system and installing some necessary dependencies:
# Remove any existing installation; if it complains that nothing is installed, that's fine, just ignore it
sudo apt-get remove docker docker-engine containerd runc
#Download necessary dependencies
sudo apt install apt-transport-https ca-certificates curl software-properties-common gnupg lsb-release
By default we assume no proxy is available, so all our sources use domestic mirrors. After adding Alibaba's GPG key and Alibaba's APT source, we can install the latest version of Docker directly with APT.
# step 1 Add Alibaba GPG Key
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# step 2 Add Alibaba Docker APT source
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/] /docker-ce/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt // > /dev/null
# step 3 Update
sudo apt update
sudo apt-get update
# step 4 Download Docker
sudo apt install docker-ce docker-ce-cli
# step 5 Verify Docker installation
sudo docker version #View Docker version
sudo systemctl status docker #Verify Docker running status
If `docker version` prints its output and the service is shown as running, our Docker installation is complete. Next we add the current user to the docker group, so that the docker command can be used normally without root privileges or sudo:
sudo groupadd docker
sudo gpasswd -a ${USER} docker
sudo service docker restart
But we are not done yet: if you run `docker run hello-world` now, there is a high chance you will keep getting a network error like the following. This is because the default Docker registry cannot currently be reached directly from mainland China, so we need a third-party registry mirror. I have collected some commonly used Docker mirrors; you only need to add them to the /etc/docker/daemon.json file:
# step 1 Create or edit /etc/docker/daemon.json
sudo nano /etc/docker/daemon.json
# step 2 Paste the registry mirrors you want to use into the file, in this format:
{
    "registry-mirrors": [
        "https://<mirror-address-1>",
        "https://<mirror-address-2>",
        "https://<mirror-address-3>"
    ]
}
# step 3 Reload the configuration file and restart docker
sudo systemctl daemon-reload
sudo systemctl restart docker
# step 4 View the Docker configuration to check whether the configuration is successful
sudo docker info
After running `docker info` you can see that the terminal prints the registry mirrors we just added. Now run `docker run hello-world` again and you will see that Docker successfully pulls the image and prints "Hello from Docker!".
With Docker installed, let's install the NVIDIA Container Toolkit (if your computer has no GPU or you are using a virtual machine such as VMware, you can skip this step, since you cannot access a GPU there anyway). This component is a set of tools provided by NVIDIA; once it is installed we can use the GPU inside Docker with GPU acceleration. NVIDIA's documentation is very detailed, so we simply follow its steps to install and configure.
Similar to the previous Docker, we need to add the official source of Nvidia. After adding it, we can directly use APT to install it.
# step 1 Configure the production repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# step 2 Update
sudo apt-get update
# step 3 Install using APT
sudo apt-get install -y nvidia-container-toolkit # Without a proxy this step may take a while
Then we configure the NVIDIA Container Runtime for Docker. This part is very simple and only requires two commands:
sudo nvidia-ctk runtime configure --runtime=docker # Use nvidia-ctk to modify the /etc/docker/daemon.json file
sudo systemctl restart docker #Restart the Docker daemon process
Finally, enter the following command to verify the configuration. If output like the following appears, the NVIDIA Container Toolkit installation is complete!!!
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
(2) Configure and use the D-Robotics algorithm tool chain
Okay, if there are no problems after completing the above process, it means that we have now completed all the pre-configuration! Then we can start configuring our algorithm tool chain. First, we download the OE delivery package of RDK (the latest version as of the release of the article is V1.2.8) and the corresponding Docker image
# Download OE-v1.2.8 delivery package
wget -c ftp://x5ftp@/OpenExplorer/v1.2.8_release/horizon_x5_open_explorer_v1.2.8-py310_20240926. --ftp-password=x5ftp@123$%
# Download either the CPU or the GPU version of the Docker image below; you only need one of the two
#Ubuntu20.04 CPU Docker image
wget -c ftp://x5ftp@/OpenExplorer/v1.2.8_release/docker_openexplorer_ubuntu_20_x5_cpu_v1.2. --ftp-password=x5ftp@123$%
#Ubuntu20.04 GPU Docker image
wget -c ftp://x5ftp@/OpenExplorer/v1.2.8_release/docker_openexplorer_ubuntu_20_x5_gpu_v1.2. --ftp-password=x5ftp@123$%
# X5 algorithm toolchain user development documentation (download as needed)
wget -c ftp://x5ftp@/OpenExplorer/v1.2.8_release/x5_doc-v1.2. --ftp-password=x5ftp@123$%
# Checksums (download and use as needed)
wget -c ftp://x5ftp@/OpenExplorer/v1.2.8_release/ --ftp-password=x5ftp@123$%
Since the Docker image file is large, we need to wait a while. After the download finishes, run `ls` and you will see the two files. We extract the OE delivery package with the following command:
tar -xvf horizon_x5_open_explorer_v1.2.8-py310_20240926. # Extract the OE delivery package
After extraction completes we enter the OE package. Its structure is as follows: it is divided into two top-level folders, `package` and `samples`. `package` mainly contains the board-side and host-side development environments for the RDK series; since we use the Docker image we can leave this folder alone. Let's mainly look at `samples`, which is divided into three folders. The third one, `model_zoo`, is a soft link to `ai_toolchain/model_zoo` inside the second folder. The first folder, `ai_benchmark`, is an AI benchmark sample package officially provided by D-Robotics for evaluating the performance and accuracy of classification, detection and segmentation models; it supports single-frame latency evaluation as well as dual-core multi-threaded scheduling performance evaluation. With this package we can check whether a model meets performance requirements and verify the accuracy of the quantized model, but generally speaking, if we use the official YOLO series without fine-tuning we don't need to pay too much attention to this part.
Then comes the highlight, `ai_toolchain`, the model toolchain. From the structure diagram below we can see that it mainly consists of the model quantization and conversion examples, the model training examples, and the model runtime examples. We will introduce its specific usage in Section 3.
After looking through the OE delivery package, we start importing the Docker image. Since this Docker image depends on the OE package to run, we need to set the Docker mapping paths first, and then we can import the image from the tar package:
#Everyone can modify it according to their own path
export version=v1.2.8
export ai_toolchain_package_path=/path/OE/horizon_x5_open_explorer_v1.2.8-py310_20240926 # Please modify the path yourself
export dataset_path=/path/OE/dataset #Please modify the path yourself. If there is no dataset, please create it yourself.
#Import image
docker load < docker_openexplorer_ubuntu_20_x5_gpu_v1.2.
Since the image is fairly large, importing will take a while; just be patient. Then we enter the following command to start the Docker image.
sudo docker run -it --rm --gpus all --shm-size=15g -v "$ai_toolchain_package_path":/open_explorer -v "$dataset_path":/data openexplorer/ai_toolchain_ubuntu_20_x5_gpu:v1.2.8-py310
Then run the `hb_mapper` command inside the image; if the following printout appears, our environment installation is complete~~
Small tip: you can add the following alias line to `~/.bashrc`, after which you can simply type `RDK_Ai_Toolchain` in the terminal to open the toolchain instead of remembering such a long command.
alias RDK_Ai_Toolchain="sudo docker run -it --rm --gpus all --shm-size=15g -v "$ai_toolchain_package_path":/open_explorer -v "$dataset_path":/data openexplorer/ai_toolchain_ubuntu_20_x5_gpu:v1.2.8-py310"
At this point, our D-Robotics toolchain environment has been fully installed and configured!!!
2. Introduction to Model Zoo
For students who have just received an RDK board, I don't think we can skip the newly launched D-Robotics Model Zoo and jump straight into the RDK algorithm toolchain, so our X5 model quantization, conversion and deployment tutorial starts with the Model Zoo. As the name implies, the Model Zoo is a "zoo of models": an open-source community algorithm case repository maintained by the D-Robotics developer community. According to the official description, it contains a variety of models that can be deployed directly on the board (such as the YOLO series, FCOS, ResNet, PaddleOCR, etc.), suitable for many scenarios and highly versatile, carefully selected and optimized for fields including but not limited to image classification, object detection, semantic segmentation and natural language processing. It provides a series of efficient, already quantized .bin models that can run directly after conversion, together with C++/Python and Jupyter examples for users.
So how do we use this warehouse? We first pull Model Zoo from Github. We can see the project structure of Model Zoo as shown in the figure:
git clone https://github.com/D-Robotics/rdk_model_zoo.git # Pull the Model Zoo
Under the main folder there are bilingual Chinese/English READMEs and a `resource` folder holding the README images, plus the folder we care about most, `demo`, which contains all the officially supported models organized by task: object detection in `detect`, image classification in `classification`, keypoint detection in `Pose`, and so on. Taking the object detection folder `detect` as an example, you can see the many officially supported model series; opening the YOLOv5 folder, you can find the official C++/Jupyter deployment examples as well as the officially converted model files and the PTQ configuration files used for model quantization.
I believe everyone should now have a basic understanding of the Model Zoo. Next we will use Yolov5-V2.0 as an example to introduce how to convert a model.
3. Model Quantization Example Tutorial
Now we officially start using the toolchain. We take the official YOLOv5-V2.0 as an example and explain the relevant concepts along the way while completing the model conversion. The process follows the official description in `rdk_model_zoo/demos/detect/YOLOv5/README_cn.md` in the Model Zoo. First we pull the official YOLOv5-V2.0 source code and download the official model weights:
git clone https://github.com/ultralytics/yolov5.git # Clone the repository
cd yolov5 # Enter the repository
git checkout v2.0 # Switch to the v2.0 tag
git branch # Check: if "* (HEAD detached at v2.0)" appears, the switch is complete
# I use the official 80-class weights for the demonstration. If you have a trained model, skip this step and simply use your own model.
wget https://github.com/ultralytics/yolov5/releases/download/v2.0/yolov5s.pt -O yolov5s_tag2.0.pt # Download the official model weights
Since our BPU needs 4-dimensional NHWC output, i.e. (batch_size, height, width, channels), while the YOLOv5 source code uses the PyTorch framework and therefore outputs NCHW, i.e. (batch_size, channels, height, width), we need to modify the output part of the model so that the trained .pt file has the correct output format when exported to ONNX. Open yolov5/models/yolo.py and locate the Detect forward function at around line 22. Since we only need to modify the output head when exporting to ONNX and keep it as-is during training, it is recommended not to delete the original code but to comment it out as shown in my screenshot. We can replace it with the following code:
def forward(self, x):
    return [self.m[i](x[i]).permute(0, 2, 3, 1).contiguous() for i in range(self.nl)]
Then we use the official YOLO model export tool models/export.py. We first copy this file out of yolov5/models/ into the repository root:
cp ./models/export.py .
Then we open this file. Since we only need to export an ONNX model, we delete the TorchScript export at around line 32 and the CoreML export at around line 60, keeping only the ONNX export part. In the ONNX export section we also choose the opset version and add an onnx-simplifier step to do some graph optimization and constant folding.
PS: Each ONNX operator (convolution, activation, matrix multiplication, etc.) has specific versions, and the opset version refers to the operator set version of ONNX being used. Our RDK series currently only supports opset 10 and opset 11, so we need to specify version 11.
try:
    import onnx
    from onnxsim import simplify

    print('\nStarting ONNX export with onnx %s...' % onnx.__version__)
    f = opt.weights.replace('.pt', '.onnx')  # filename
    model.fuse()  # only for ONNX
    torch.onnx.export(model, img, f, verbose=False, opset_version=11, input_names=['images'],
                      output_names=['small', 'medium', 'big'])

    # Checks
    onnx_model = onnx.load(f)  # load onnx model
    onnx.checker.check_model(onnx_model)  # check onnx model
    print(onnx.helper.printable_graph(onnx_model.graph))  # print a human readable model

    # simplify
    onnx_model, check = simplify(
        onnx_model,
        dynamic_input_shape=False,
        input_shapes=None)
    assert check, 'assert check failed'
    onnx.save(onnx_model, f)
    print('ONNX export success, saved as %s' % f)
except Exception as e:
    print('ONNX export failure: %s' % e)
If you find these modifications troublesome, you can directly copy the complete version I have already modified below and replace the original content of export.py:
import argparse

import torch

from models.common import *
from utils import google_utils

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='./yolov5s_tag2.0.pt', help='weights path')
    parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size')
    parser.add_argument('--batch-size', type=int, default=1, help='batch size')
    opt = parser.parse_args()
    opt.img_size *= 2 if len(opt.img_size) == 1 else 1  # expand
    print(opt)

    img = torch.zeros((opt.batch_size, 3, *opt.img_size))  # image size(1,3,320,192) iDetection

    google_utils.attempt_download(opt.weights)
    model = torch.load(opt.weights, map_location=torch.device('cpu'))['model'].float()
    model.eval()
    model.model[-1].export = True  # set Detect() layer export=True
    y = model(img)  # dry run

    try:
        import onnx
        from onnxsim import simplify

        print('\nStarting ONNX export with onnx %s...' % onnx.__version__)
        f = opt.weights.replace('.pt', '.onnx')  # filename
        model.fuse()  # only for ONNX
        torch.onnx.export(model, img, f, verbose=False, opset_version=11, input_names=['images'],
                          output_names=['small', 'medium', 'big'])

        # Checks
        onnx_model = onnx.load(f)  # load onnx model
        onnx.checker.check_model(onnx_model)  # check onnx model
        print(onnx.helper.printable_graph(onnx_model.graph))  # print a human readable model

        # simplify
        onnx_model, check = simplify(
            onnx_model,
            dynamic_input_shape=False,
            input_shapes=None)
        assert check, 'assert check failed'
        onnx.save(onnx_model, f)
        print('ONNX export success, saved as %s' % f)
    except Exception as e:
        print('ONNX export failure: %s' % e)
After completing these modifications, we can export the .pt model we trained as an ONNX model (this assumes you have already set up the YOLOv5 conda environment):
python3 export.py --weights ./yolov5s_tag2.0.pt
Then we can start quantizing the model! We copy the exported .onnx model into our OE package:
cp ./yolov5s_tag2.0.onnx /path/to/OE # Adjust the source and destination paths to your own setup
Tips: To keep files organized, I created a new Model folder in the OE package to manage my own model projects; I recommend everyone adopt this approach.
Next we start the officially provided algorithm toolchain Docker image as described above and first check our ONNX model. Here we need the official D-Robotics command `hb_mapper checker`; its usage is as follows:
hb_mapper checker --model-type ${model_type} \
--march ${march} \
--proto ${proto} \
--model ${caffe_model/onnx_model} \
--input-shape ${input_node} ${input_shape} \
--output ${output}
# --model-type is used to specify the model type for checking the input. Currently, only caffe or onnx is supported.
# --march is used to specify the D-Robotics processor architecture to target. The available values are bernoulli2, bayes and bayes-e;
#   RDK X3 is set to bernoulli2, RDK Ultra is set to bayes, RDK X5 is set to bayes-e
# --proto This parameter is only valid when model-type specifies caffe, and the value is the prototxt file name of the Caffe model.
# --model When model-type is specified as caffe, the value is the caffemodel file name of the Caffe model.
# When model-type is specified as onnx, the value is the name of the ONNX model file
# --input-shape optional parameter, explicitly specify the input shape of the model
# The value is {input_name} {NxHxWxC/NxCxHxW}, and input_name and shape are separated by spaces.
# For example, the model input name is data1 and the input shape is [1,224,224,3], then the configuration should be --input-shape data1 1x224x224x3
# If the shape configured here is inconsistent with the shape information in the model, the configuration here shall prevail.
Following the official introduction to this command, we enter the command below to check our model. There will be a long output, from which we can also see that the X5 BPU supports all operators of Yolov5-2.0; in other words, every computation of the model can run on the X5 BPU.
#Modify --model parameters according to your own model path
hb_mapper checker --model-type onnx --march bayes-e --model /path/to/model
If this step shows no problems, we can start converting the model. The D-Robotics algorithm toolchain uses a PTQ scheme and likewise provides a command that automatically completes the conversion from a floating-point model to a D-Robotics hybrid heterogeneous model; after this stage we obtain a model that can run on the D-Robotics processor. Let's first look at the official description of the command:
PS: PTQ (Post-Training Quantization) is a technique that converts an already-trained model into a low-precision (e.g. 8-bit integer) representation without retraining, reducing the model's storage and compute overhead; the quantized model speeds up inference and shrinks the model size while trying to preserve its accuracy.
# Disable fast-perf mode
hb_mapper makertbin --config ${config_file} \
--model-type ${model_type}
# Enable fast-perf mode
hb_mapper makertbin --fast-perf --model ${caffe_model/onnx_model} --model-type ${model_type} \
--proto ${caffe_proto} \
--march ${march}
# --help Display help information and exit
# -c, --config The configuration file for model compilation is in yaml format, and the file name uses the .yaml suffix.
# --model-type is used to specify the model type of conversion input. Currently, it supports setting caffe or onnx.
# --fast-perf Turn on the fast-perf mode. After this mode is turned on, a bin model with the highest performance that can run on the board side will be generated during the conversion process.
# If fast-perf mode is enabled, the following configuration is required
# --model Caffe or ONNX floating point model file
# --proto is used to specify the Caffe model prototxt file
# --march BPU microarchitecture, if using RDK X3, set it to bernoulli2, if using RDK Ultra, set it to bayes, if using RDK X5, set it to bayes-e
We can see that this command requires us to provide a model compilation configuration file. In this file we configure the parameters related to model conversion, such as the data preprocessing method used by the original floating-point training framework, the mean subtracted from the images, the image preprocessing scale factor, compiler-related parameters and other necessary settings. If you are using a model series from the D-Robotics Model Zoo, the official PTQ configuration file is already provided and stored in each model's folder; generally we only need to modify the `onnx_model` model path, the `march` architecture and the `cal_data_dir` calibration set path to match our own environment and board.
At this point some of you will ask: oops! What if the model I want to use is not in the Model Zoo? How do I write these parameters myself? Don't worry, D-Robotics has also prepared PTQ template files for different devices and model formats (Caffe, ONNX). In the last part of the linked document, 8.5 Algorithm toolchain category | RDK DOC, there are model quantization yaml templates for RDK X3, RDK X5 and RDK Ultra; take them as needed. And then you will ask: huh!? What are all the parameters in this YAML file for? How should I configure them? Don't be impatient: the "model conversion yaml configuration parameters" part of the official documentation, Detailed explanation of PTQ principles and steps | RDK DOC, introduces them in great detail. Just note that all four parameter groups must be present in the configuration file; individual parameters are divided into optional and required, and optional parameters do not need to be configured.
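To give you a feel for it, here is a trimmed sketch of the handful of fields that usually need editing. The field names follow the publicly documented PTQ yaml templates, while the paths and prefix below are placeholders for my own project, so treat it as an orientation aid rather than a complete configuration; the file from the Model Zoo already contains the full set of parameter groups.
model_parameters:
  onnx_model: './yolov5s_tag2.0.onnx'          # path to the exported ONNX model
  march: 'bayes-e'                             # bernoulli2 for X3, bayes for Ultra, bayes-e for X5
  working_dir: './output'                      # where the converted model is written
  output_model_file_prefix: 'yolov5s_640x640_nv12'
calibration_parameters:
  cal_data_dir: './calibration_data'           # folder containing the prepared calibration samples
  calibration_type: 'default'
  # preprocess_on: True                        # optional: let the toolchain resize the samples itself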
Let's continue with the model conversion walkthrough. From the above we know that the Yolov5-2.0 we are using is included in the official Model Zoo, so we can directly use the PTQ configuration file the official provides. We first copy it from the Model Zoo into our OE package:
# Adjust to your own setup. Copy the YAML into the OE package so it can be accessed inside Docker; using the same directory as the model is recommended
cp /path/to/demos/detect/YOLOv5/ptq_yamls/yolov5_detect_bayese_640x640_nv12.yaml /path/to/OE
Then we modify the model path and architecture in the YAML file, change the output path to suit our needs, and so on. At this point we find that we also need to prepare calibration data, which is used to calibrate the model during the conversion from floating point to fixed point. This is also simple: the calibration samples are simply taken from the training set or validation set used when training the model, so we only need to copy roughly 100 images into the OE package. The official configuration also provides a parameter, `preprocess_on`, that enables automatic processing of the calibration images: when it is turned on, the toolchain uses skimage to read the calibration set and automatically resize it to the input node size. (Although this parameter is very convenient, it is still recommended to read the official user manual and the examples in the OE package and write your own data-processing code, for example along the lines of the sketch below.)
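For reference, here is a minimal calibration-data preparation sketch of my own; it is not the official preprocessing script. It assumes a 640x640 input, RGB training data and `cal_data_dir` pointing at the output folder below; check the preprocessing examples shipped in the OE package and keep the layout and data type consistent with your own yaml before relying on it.
import os
import cv2
import numpy as np

src_dir = "calibration_images"   # ~100 images taken from your training/validation set
dst_dir = "calibration_data"     # the folder referenced by cal_data_dir in the yaml
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    img = cv2.imread(os.path.join(src_dir, name))
    if img is None:
        continue
    img = cv2.resize(img, (640, 640))            # resize to the model input size
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # match the RGB order used in training
    data = img.astype(np.uint8)                  # HWC, uint8 in this sketch
    data.tofile(os.path.join(dst_dir, os.path.splitext(name)[0] + ".rgb"))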
After modifying and adding everything we need, we can start the model conversion. Enter the following command in the Docker environment and wait a while; if no error is reported, our conversion has succeeded! After the conversion an `output` folder is produced, containing the model we converted~
hb_mapper makertbin --model-type onnx --config yolov5_detect_bayese_640x640_nv12.yaml
Although the model has been converted, to be safe we still need to do a visual inspection and an input/output check of the model. First enter the following command on the command line; the toolchain will automatically generate a visual structure diagram of the converted .bin model in the `hb_perf_result` folder:
hb_perf /path/to/model #Change to your own model path
After checking that there are no errors, we can start to check the input and output of our model. Enter the following command. The tool chain will print the basic information of the input and output of our model.
hrt_model_exec model_info --model_file /path/to/model #Modify to your own model path
At this point, if there are no problems with the model's structure, input and output, it means that our model conversion is complete! ! !
4. Model deployment application examples
Next comes the model deployment step everyone is most concerned about and curious about!!! In the past RDK only supported the C++ model deployment interface, but with the release of the Model Zoo, Python inference and deployment code is provided as well!!!
- Reference manual:Model Inference Interface Description (TODO: Add C++ example) | RDK DOC
Since the Model Zoo already provides sample code for each model with detailed comments, you can first use the official code examples on our RDK X5 development board to test whether your model conversion succeeded. The corresponding code lives in the `cpp` folder inside each model's directory in the Model Zoo. Open the C++ source file in it, modify only the model path, the number of classes, the macro definitions of the label names and the path of the test image, then compile it and run the resulting executable to see the recognition results:
mkdir build && cd build
cmake ..
make
./main
Pay attention! The above operations are all done on the board! You can also take a look at this file to understand the model deployment process.
(1) Complete the CMakeLists.txt
Next I will use the single-class Yolov5-V2.0 model that I trained and converted earlier as an example to walk you through deploying C++ model inference from scratch. The RDK model inference APIs fall into six main categories: obtaining inference library information, model loading and release, obtaining model information, model inference, model memory operations, and model preprocessing. These six categories of APIs also correspond to the six steps our inference code should follow. First, we create the CMake file.
#step 1 Set project and version minimum requirements
cmake_minimum_required(VERSION 2.8)
project(rdk_yolov8_detect)
#step 2 Set C++ standards
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
#step 3 Set the compilation type
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release)
endif()
message(STATUS "Build type: ${CMAKE_BUILD_TYPE}")
#step 4 Set compilation options
set(CMAKE_CXX_FLAGS_DEBUG " -Wall -Werror -g -O0 ")
set(CMAKE_C_FLAGS_DEBUG " -Wall -Werror -g -O0 ")
set(CMAKE_CXX_FLAGS_RELEASE " -Wall -Werror -O3 ")
set(CMAKE_C_FLAGS_RELEASE " -Wall -Werror -O3 ")
# Dependency settings
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wl,-unresolved-symbols=ignore-in-shared-libs")
#step 5 Add external dependency packages
find_package(OpenCV REQUIRED)# OpenCV
#step 6 Set the RDK BPU library path
set(DNN_PATH "/usr/include/dnn") # BPU header file path
set(DNN_LIB_PATH "/usr/lib/") # BPU library file path
#step 7 Add header file path
include_directories(
${DNN_PATH}
${OpenCV_INCLUDE_DIRS}
)
#step 8 Add library file path
link_directories(
${DNN_LIB_PATH}
)
#step 9 Add source files
add_executable(main
main.cc # the C++ source file we create below
)
#step 10 Link dependent libraries
target_link_libraries(main
${OpenCV_LIBS} # OpenCV library
dnn # RDK BPU library
pthread # thread library
rt # real-time library
dl # dynamic link library
)
#step 11 Installation target
install(TARGETS main
RUNTIME DESTINATION bin
)
(2) Complete header file import and macro definition
After writing the CMake file, we can start writing our C++ code!!! We first create the source file (named main.cc here to match the CMakeLists above) and include some necessary headers:
// C/C++ Standard Libraries
#include <iostream> //Input and output streams
#include <vector> // vector container
#include <algorithm> // algorithm library
#include <chrono> // Time related functions
#include <iomanip> // Input and output format control
// Third Party Libraries
#include <opencv2/opencv.hpp> // OpenCV main header file
#include <opencv2/dnn/dnn.hpp> // OpenCV deep learning module
// RDK BPU libDNN API
#include "dnn/hb_dnn.h" //BPU basic functions
#include "dnn/hb_dnn_ext.h" // BPU extension function
#include "dnn/plugin/hb_dnn_layer.h" // BPU layer definition
#include "dnn/plugin/hb_dnn_plugin.h" // BPU plug-in
#include "dnn/hb_sys.h" // BPU system functions
Then, to make the code more standardized and complete, we define the detection parameters: we use macro definitions to configure the model path, the number of classes, the confidence thresholds and so on. We also add an error-checking macro so that we can tell whether each API call succeeded. Finally, since the display requirements differ between debugging and normal operation, we add two more macros, `DETECT_MODE` and `ENABLE_DRAW`, used respectively to switch between single-image and real-time inference and to enable or disable the drawing and display functions.
// Error checking macro
#define RDK_CHECK_SUCCESS(value, errmsg) \
do \
{ \
auto ret_code = value; \
if (ret_code != 0) \
{ \
std::cout << errmsg << ", error code:" << ret_code; \
return ret_code; \
} \
} while (0);
//Default parameter definition
#define DEFAULT_MODEL_PATH "/root/Deep_Learning/YOLOv5/models/tennis_detect_640x640_bayese_.bin" //Model path
#define DEFAULT_CLASSES_NUM 1 //Model category
#define DEFAULT_NMS_THRESHOLD 0.45f //NMS threshold, default 0.45
#define DEFAULT_SCORE_THRESHOLD 0.25f // Confidence threshold, default 0.25
#define DEFAULT_NMS_TOP_K 300 //Number of first K frames selected by NMS, default 300
#define DEFAULT_FONT_SIZE 1.0f // Font size for drawing labels, default 1.0
#define DEFAULT_FONT_THICKNESS 1.0f // Font thickness of drawing labels, default 1.0
#define DEFAULT_LINE_SIZE 2.0f // Line width for drawing rectangular box, default 2.0
#define DETECT_MODE 0 //Selection of inference mode 0 for single picture, 1 for real-time detection
#define ENABLE_DRAW 0 // 1: enable drawing, 0: disable drawing
(3) BPU detection class encapsulation
We encapsulate the inference code into a BPU_Detect class with three main interfaces, `Init()`, `Detect()` and `Release()`, used respectively to initialize the BPU and the model, run detection, and release resources. To implement these three main functions we also create several internal helper functions, `LoadModel()`, `GetModelInfo()`, `PreProcess()`, `Inference()`, `PostProcess()`, `DrawResults()` and `PrintResults()`, used respectively to load the model, obtain model information, preprocess images, run model inference, post-process, draw the results, and print the results in a formatted way.
class BPU_Detect {
public:
BPU_Detect(const std::string& model_path = DEFAULT_MODEL_PATH,
int classes_num = DEFAULT_CLASSES_NUM,
float nms_threshold = DEFAULT_NMS_THRESHOLD,
float score_threshold = DEFAULT_SCORE_THRESHOLD,
int nms_top_k = DEFAULT_NMS_TOP_K,
int d_mode = DETECT_MODE);
~BPU_Detect(); // Destructor
bool Init(); // Initialize BPU and model
bool Detect(const cv::Mat& input_img, cv::Mat& output_img); //Perform detection
bool Release(); // Release resources
private:
bool LoadModel(); // Load model
void GetModelInfo(); // Get model information
bool PreProcess(const cv::Mat& input_img); // Image preprocessing
bool Inference(); // Model inference
bool PostProcess(); // Post-processing
void DrawResults(cv::Mat& img); // Draw results
void PrintResults() const; // Print detection results
//Member variables (arranged according to constructor initialization order)
std::string model_path_; //Model file path
int classes_num_; // Number of categories
float nms_threshold_; // NMS threshold
float score_threshold_; // Confidence threshold
int nms_top_k_; //The maximum number of frames retained by NMS
bool is_initialized_; // Initialization status flag
float font_size_; // draw text size
float font_thickness_; // draw text thickness
float line_size_; // draw line thickness
};
We start by completing our constructor and destructor: the constructor takes in all the values from our macro definitions and sets up our small, medium and large anchors, and the destructor releases our resources.
PS: what are anchors? In computer vision, especially object detection, anchors are a set of predefined bounding boxes used to match objects in the input image. Their sizes, shapes and positions are usually fixed before model training in order to handle targets of different scales. In general, an anchor can be regarded as a "reference box": it covers a certain area of the image in advance, and the model then predicts the location and size of the actual target relative to these predefined boxes.
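Before the C++ below, a small Python illustration (separate from the deployment code) may help show what an anchor does during decoding; it assumes the standard YOLOv5 decode formulas and uses made-up prediction values together with (10, 13), the first small-object anchor from the constructor below.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# made-up raw outputs (tx, ty, tw, th) for one cell of the 80x80 small-object head
tx, ty, tw, th = 0.3, -0.1, 0.2, 0.5
cx, cy, stride = 10, 25, 8            # grid cell indices and the head's stride
anchor_w, anchor_h = 10.0, 13.0       # the first small-object anchor pair

# the network predicts offsets relative to the grid cell and scales relative to the anchor
bx = (sigmoid(tx) * 2.0 - 0.5 + cx) * stride
by = (sigmoid(ty) * 2.0 - 0.5 + cy) * stride
bw = (sigmoid(tw) * 2.0) ** 2 * anchor_w
bh = (sigmoid(th) * 2.0) ** 2 * anchor_h
print(bx, by, bw, bh)  # box centre and size in input-image pixels
With that picture in mind, the C++ members below simply store these anchor pairs for the three detection heads.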
//Add private class member variables
class BPU_Detect {
private:
std::vector<std::string> class_names_; // Category names
std::vector<std::pair<double, double>> s_anchors_;
std::vector<std::pair<double, double>> m_anchors_;
std::vector<std::pair<double, double>> l_anchors_;
}
//Constructor implementation
BPU_Detect::BPU_Detect(const std::string& model_path,
int classes_num,
float nms_threshold,
float score_threshold,
int nms_top_k)
: model_path_(model_path),
classes_num_(classes_num),
nms_threshold_(nms_threshold),
score_threshold_(score_threshold),
nms_top_k_(nms_top_k),
is_initialized_(false),
font_size_(DEFAULT_FONT_SIZE),
font_thickness_(DEFAULT_FONT_THICKNESS),
line_size_(DEFAULT_LINE_SIZE) {
class_names_ = {CLASSES_LIST}; // Initialize class names
std::vector<float> anchors = {10.0, 13.0, 16.0, 30.0, 33.0, 23.0,
30.0, 61.0, 62.0, 45.0, 59.0, 119.0,
116.0, 90.0, 156.0, 198.0, 373.0, 326.0};//Initialize anchors
//Set small, medium, large anchors
for(int i = 0; i < 3; i++) {
s_anchors_.push_back({anchors[i*2], anchors[i*2+1]});
m_anchors_.push_back({anchors[i*2+6], anchors[i*2+7]});
l_anchors_.push_back({anchors[i*2+12], anchors[i*2+13]});
}
}
// Destructor implementation
BPU_Detect::~BPU_Detect() {
if(is_initialized_) {
Release();
}
}
(4) Complete the private LoadModel() function
We then start implementing our `LoadModel()` private member function. From the official API manual we can see that there are two ways to load a model: from files, or from memory. Comparatively, the `FromFiles` approach is slower because of file I/O but its code is simple, and since the model file is stored independently it is better suited to development and debugging. The `FromDDR` approach reads directly from memory, so it is faster and suits embedded systems or scenarios that require fast loading, but the code is more complex, closer to the way TensorRT loads models. The two APIs are described as follows:
/**
* Creates and initializes Horizon DNN Networks from file list
* @param[out] packedDNNHandle Horizon DNN handle, pointing to multiple models
* @param[in] modelFileNames path to the model file
* @param[in] modelFileCount number of model files
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNInitializeFromFiles(hbPackedDNNHandle_t *packedDNNHandle,
char const **modelFileNames,
int32_t modelFileCount);
/**
* Creates and initializes Horizon DNN Networks from memory
* @param[out] packedDNNHandle Horizon DNN handle, pointing to multiple models
* @param[in] modelData pointer to the model file
* @param[in] modelDataLengths The length of model data
* @param[in] modelDataCount The number of model data
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNInitializeFromDDR(hbPackedDNNHandle_t *packedDNNHandle,
const void **modelData,
int32_t *modelDataLengths,
int32_t modelDataCount);
We can see that both APIs take the model as input and return the model handle through the `hbPackedDNNHandle_t` type, so before using either function we first need to use `hbPackedDNNHandle_t` to create a private member variable `packed_dnn_handle_`. Since both loading methods are fairly common, we introduce the use of both APIs here.
We start with the simpler `FromFiles` API. Since we imported the model path through a macro definition earlier, we need a character pointer variable here to hold the model path address, and then we call the model loading API through our error-checking macro:
//Add private class member variables
class BPU_Detect {
private:
hbPackedDNNHandle_t packed_dnn_handle_;
}
// Method One FromFiles
const char* model_file_name = model_path_.c_str(); //Get the file path character pointer
RDK_CHECK_SUCCESS(
hbDNNInitializeFromFiles(&packed_dnn_handle_, &model_file_name, 1),
"Initialize model from file failed");//Call the model loading API
Next we introduce how to use the API that reads the model from memory. The core of this step is getting the file into memory: we first open our model file with the C standard library, move the file pointer to the end to obtain the file size, and then allocate memory for the model with `malloc`. After reading the model data into memory and verifying that the model file was read completely, we prepare the model data array and the length array and call the RDK API that initializes the model from memory. As you can see, this process is much more involved than the previous API.
//Add private class member variables
class BPU_Detect {
private:
hbPackedDNNHandle_t packed_dnn_handle_;
}
FILE* fp = fopen(model_path_.c_str(), "rb"); // Open the model file
if (!fp) {
std::cout << "Failed to open model file: " << model_path_ << std::endl;
return false;
}
// Get file size:
fseek(fp, 0, SEEK_END); // 1. Move the file pointer to the end
size_t model_size = static_cast<size_t>(ftell(fp)); // 2. Get the current position (i.e. file size)
fseek(fp, 0, SEEK_SET); // 3. Reset the file pointer to the beginning
// Allocate memory for model data
void* model_data = malloc(model_size);
if (!model_data) {
std::cout << "Failed to allocate memory for model data" << std::endl;
fclose(fp);
return false;
}
//Read model data into memory
size_t read_size = fread(model_data, 1, model_size, fp);
fclose(fp);
// Verify that the file has been read completely
if (read_size != model_size) {
std::cout << "Failed to read model data, expected " << model_size
<< " bytes, but got " << read_size << " bytes" << std::endl;
free(model_data);
return false;
}
// Prepare model data array and length array
const void* model_data_array[] = {model_data};
int32_t model_data_length[] = {static_cast<int32_t>(model_size)};
// Initialize the model from memory using the BPU API
RDK_CHECK_SUCCESS(
hbDNNInitializeFromDDR(&packed_dnn_handle_, model_data_array, model_data_length, 1),
"Initialize model from DDR failed");
// Release temporarily allocated memory
free(model_data);
With that, our `LoadModel()` is done!!! We add one more macro definition to select the model loading method; the complete code is as follows:
#define LOAD_FROM_DDR 0 // 0: Load model from file, 1: Load model from memory
//Two implementations of loading models
bool BPU_Detect::LoadModel() {
#if LOAD_FROM_DDR
//Read model data from file to memory
auto read_start = std::chrono::high_resolution_clock::now();
FILE* fp = fopen(model_path_.c_str(), "rb");
if (!fp) {
std::cout << "Failed to open model file: " << model_path_ << std::endl;
return false;
}
// Get file size
fseek(fp, 0, SEEK_END);
size_t model_size = static_cast<size_t>(ftell(fp));
fseek(fp, 0, SEEK_SET);
// Allocate memory and read model data
void* model_data = malloc(model_size);
if (!model_data) {
std::cout << "Failed to allocate memory for model data" << std::endl;
fclose(fp);
return false;
}
size_t read_size = fread(model_data, 1, model_size, fp);
fclose(fp);
if (read_size != model_size) {
std::cout << "Failed to read model data, expected " << model_size
<< " bytes, but got " << read_size << " bytes" << std::endl;
free(model_data);
return false;
}
//Load model from memory
auto init_start = std::chrono::high_resolution_clock::now();
const void* model_data_array[] = {model_data};
int32_t model_data_length[] = {static_cast<int32_t>(model_size)};
RDK_CHECK_SUCCESS(
hbDNNInitializeFromDDR(&packed_dnn_handle_, model_data_array, model_data_length, 1),
"Initialize model from DDR failed");
// release memory
free(model_data);
#else
//Load model from file
const char* model_file_name = model_path_.c_str();
RDK_CHECK_SUCCESS(
hbDNNInitializeFromFiles(&packed_dnn_handle_, &model_file_name, 1),
"Initialize model from file failed");
#endif
return true;
}
(5) Complete the private GetModelInfo() function
We continue to introduce ourGetModelInfo()
function, this function is used to obtain theGet model informationIncluding the model name list, model handle, input information, output information and other basic information of the model. By consulting the official API manual, we can see that there are nine APIs for obtaining this part of the model information, namely:
- `hbDNNGetModelNameList()`: gets the name list and the number of the models pointed to by `packedDNNHandle`
/**
* Get model names from given packed handle
* @param[out] modelNameList model name list
* @param[out] modelNameCount number of model names
* @param[in] packedDNNHandle Horizon DNN handle, pointing to multiple models
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNGetModelNameList(char const ***modelNameList,
int32_t *modelNameCount,
hbPackedDNNHandle_t packedDNNHandle);
- `hbDNNGetModelHandle()`: obtains the handle of one model from the model list pointed to by `packedDNNHandle`; the returned `dnnHandle` can be used by the caller across functions and threads
/**
* Get DNN Network handle from packed Handle with given model name
* @param[out] dnnHandle DNN handle, pointing to a model
* @param[in] packedDNNHandle DNN handle, pointing to multiple models
* @param[in] modelName model name
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNGetModelHandle(hbDNNHandle_t *dnnHandle,
hbPackedDNNHandle_t packedDNNHandle,
char const *modelName);
- `hbDNNGetInputCount()`: gets the number of input tensors of the model pointed to by `dnnHandle`
/**
* Get input count
* @param[out] inputCount The number of model input tensors
* @param[in] dnnHandle DNN handle, pointing to a model
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNGetInputCount(int32_t *inputCount, hbDNNHandle_t dnnHandle);
- `hbDNNGetInputName()`: gets the name of a model input tensor pointed to by `dnnHandle`
/**
* Get model input name
* @param[out] name The name of the model input tensor
* @param[in] dnnHandle DNN handle, pointing to a model
* @param[in] inputIndex The number of the model input tensor
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNGetInputName(char const **name, hbDNNHandle_t dnnHandle,
int32_t inputIndex);
- `hbDNNGetInputTensorProperties()`: gets the properties of a specific input tensor of the model pointed to by `dnnHandle`
/**
* Get input tensor properties
* @param[out] properties input tensor information
* @param[in] dnnHandle DNN handle, pointing to a model
* @param[in] inputIndex The number of the model input tensor
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNGetInputTensorProperties(hbDNNTensorProperties *properties,
hbDNNHandle_t dnnHandle,
int32_t inputIndex);
- `hbDNNGetOutputCount()`: gets the number of output tensors of the model pointed to by `dnnHandle`
/**
* Get output count
* @param[out] outputCount The number of model output tensors
* @param[in] dnnHandle DNN handle, pointing to a model
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNGetOutputCount(int32_t *outputCount, hbDNNHandle_t dnnHandle);
- `hbDNNGetOutputName()`: gets the name of a model output tensor pointed to by `dnnHandle`
/**
* Get model output name
* @param[out] name The name of the model output tensor
* @param[in] dnnHandle DNN handle, pointing to a model
* @param[in] outputIndex The number of the model output tensor
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNGetOutputName(char const **name, hbDNNHandle_t dnnHandle,
int32_t outputIndex);
- `hbDNNGetOutputTensorProperties()`: gets the properties of a specific output tensor of the model pointed to by `dnnHandle`
/**
* Get output tensor properties
* @param[out] properties output tensor information
* @param[in] dnnHandle DNN handle, pointing to a model
* @param[in] outputIndex The number of the model output tensor
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNGetOutputTensorProperties(hbDNNTensorProperties *properties,
hbDNNHandle_t dnnHandle,
int32_t outputIndex);
In this step, however, we only need five of these APIs to obtain the basic information of our model. First we use `hbDNNGetModelNameList()` to get the number of models packed into the .bin model we loaded. Since we know we are only using YOLOv5, detecting more than one packed model in the converted .bin means something is wrong with it. So, following the API's requirements, we first create two variables to receive the model list and the model count, then call the API and check whether the model count is correct. The specific code implementation is as follows:
//Add private class member variables
class BPU_Detect {
private:
const char* model_name_;// model name
}
// Get the model name list and quantity
const char** model_name_list; //Create model list variables
int model_count = 0; //Create model packaging quantity variable
RDK_CHECK_SUCCESS(
hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle_),
"hbDNNGetModelNameList failed");
if(model_count > 1) {
std::cout << "Model count: " << model_count << std::endl;
std::cout << "Please check the model count!" << std::endl;
return false;
}
model_name_ = model_name_list[0];
After checking that the model list has no problems, we can obtain the `dnnHandle` handle that the caller can use across functions and threads. Following the API's requirements, we first use `hbDNNHandle_t` to create a private member variable, and then we can call the API directly:
//Add private class member variables
class BPU_Detect {
private:
hbDNNHandle_t dnn_handle_;//Model handle
}
// Get model handle
RDK_CHECK_SUCCESS(
hbDNNGetModelHandle(&dnn_handle_, packed_dnn_handle_, model_name_),
"hbDNNGetModelHandle failed");
After creating the model handle, we can obtain the input information! This part involves two APIs: `hbDNNGetInputCount`, which obtains the number of model inputs, and `hbDNNGetInputTensorProperties`, which obtains the properties of a model input tensor. Again, since we are using a YOLOv5 detection model, our model should have exactly one input; if there are multiple inputs, something is wrong with the model. We also find that `hbDNNGetInputTensorProperties` outputs a structure of type `hbDNNTensorProperties`. Looking at its definition, it is a nested structure that contains the `hbDNNTensorShape`, `hbDNNQuantiShift`, `hbDNNQuantiScale` and `hbDNNQuantiType` structures, which together describe the input tensor precisely. The structure definitions and the meaning of each member are as follows:
typedef struct {
int32_t dimensionSize[HB_DNN_TENSOR_MAX_DIMENSIONS];//Indicates the size of each dimension of the tensor. HB_DNN_TENSOR_MAX_DIMENSIONS indicates the maximum number of dimensions that the tensor can have.
int32_t numDimensions;//The number of dimensions of the tensor, indicating how many dimensions the tensor is
} hbDNNTensorShape;
typedef struct {
int32_t shiftLen;//Offset length during quantization, indicating the amount of offset data
uint8_t *shiftData; //Pointer to offset data. These data are usually used to shift tensor data during the quantization process.
} hbDNNQuantiShift;
typedef struct {
int32_t scaleLen;//The length of the scaling factor, indicating how many scaling factors there are
float *scaleData; //Pointer to scaling factor data, usually used to adjust the size of tensor data during quantization
int32_t zeroPointLen;//The length of the zero point, indicating the number of zero point data
int8_t *zeroPointData;//Pointer to zero point data, which is used to adjust the zero point of the tensor during the quantization process
} hbDNNQuantiScale;
typedef enum {
NONE, //No quantification
SHIFT, //Use displacement quantization
SCALE//Use scaling quantization
} hbDNNQuantiType;
typedef struct {
hbDNNTensorShape validShape; //The valid shape of the tensor, indicating the true size of the tensor
hbDNNTensorShape alignedShape; //The aligned shape of the tensor, indicating the aligned tensor size
int32_t tensorLayout;//Tensor layout, indicating how data is organized in memory
int32_t tensorType;//The data type of the tensor, indicating the data type of the elements in the tensor
hbDNNQuantiShift shift;//Offset information in quantization
hbDNNQuantiScale scale;//Scaling information in quantization
hbDNNQuantiType quantiType; //Quantization type, indicating whether quantization uses displacement, scaling or no quantization
int32_t quantizeAxis;//The axis of quantization, indicating in which dimension the quantization operation is applied
int32_t alignedByteSize;//The aligned byte size, indicating the size of the tensor after alignment in memory
int32_t stride[HB_DNN_TENSOR_MAX_DIMENSIONS];//The stride of each dimension represents the element interval of each dimension of the tensor and supports the maximum number of dimensions HB_DNN_TENSOR_MAX_DIMENSIONS
} hbDNNTensorProperties;
Having understood these structures, we can define our variables based on their fields. Because we know our model is single-input, that the input data should be NV12 with an NCHW layout, and that the valid shape of the input tensor should be (1, 3, H, W), we can use this knowledge to perform some input safety checks after obtaining the input information with the API. So we first add the necessary private member variables, then call the two APIs `hbDNNGetInputCount` and `hbDNNGetInputTensorProperties` to obtain the input information, and finally perform safety checks on the returned input count and input tensor:
//Add private class member variables
class BPU_Detect {
private:
//Model input parameters
int input_h_;//Input height
int input_w_;//Input width
hbDNNTensorProperties input_properties_; //Input tensor properties
}
// Get input information
int32_t input_count = 0;
RDK_CHECK_SUCCESS(
hbDNNGetInputCount(&input_count, dnn_handle_),
"hbDNNGetInputCount failed");
RDK_CHECK_SUCCESS(
hbDNNGetInputTensorProperties(&input_properties_, dnn_handle_, 0),
"hbDNNGetInputTensorProperties failed");
/*-------------------- The following are model safety checks --------------------*/
//Check the number of model inputs
if(input_count > 1){
std::cout << "Model input node is greater than 1, please check!" << std::endl;
return false;
}
//Check the input type of the model
if(input_properties_.tensorType == HB_DNN_IMG_TYPE_NV12){
std::cout << "Input tensor type: HB_DNN_IMG_TYPE_NV12" << std::endl;
}
else{
std::cout << "The input tensor type is not HB_DNN_IMG_TYPE_NV12, please check!" << std::endl;
return false;
}
//Check the input data layout of the model
if(input_properties_.tensorLayout == HB_DNN_LAYOUT_NCHW){
std::cout << "Input tensor data layout: HB_DNN_LAYOUT_NCHW" << std::endl;
}
else{
std::cout << "The input tensor data layout is not HB_DNN_LAYOUT_NCHW, please check!" << std::endl;
return false;
}
// Check the valid shape of the model input Tensor data
input_h_ = input_properties_.validShape.dimensionSize[2];
input_w_ = input_properties_.validShape.dimensionSize[3];
if (input_properties_.validShape.numDimensions == 4)
{
std::cout << "The input size is: (" << input_properties_.validShape.dimensionSize[0];
std::cout << ", " << input_properties_.validShape.dimensionSize[1];
std::cout << ", " << input_h_;
std::cout << ", " << input_w_ << ")" << std::endl;
}
else
{
std::cout << "The input size is not (1,3,640,640), please check!" << std::endl;
return false;
}
Now that the input has been obtained and checked, how could our output fall behind? We use `hbDNNGetOutputCount` to get the number of outputs; since we know YOLOv5 should have three outputs, we can check the model's outputs here. After getting the output count, we use `hbDNNTensor` to create a private member variable `output_tensors_`, and we can then use this type to allocate memory for the model's outputs:
//Add private class member variables
class BPU_Detect {
private:
hbDNNTensor* output_tensors_;//Output tensor array
}
//Model output quantity check
int32_t output_count = 0;
RDK_CHECK_SUCCESS(
hbDNNGetOutputCount(&output_count, dnn_handle_),
"hbDNNGetOutputCount failed");
//Allocate output tensor memory
output_tensors_ = new hbDNNTensor[output_count];
But there is one very important step that needs to be completed here. Since YOLOv5 has 3 output heads corresponding to 3 feature-map scales, we also need to ensure that the model's outputs are used in the order small objects (8x downsampling) -> medium objects (16x downsampling) -> large objects (32x downsampling). To accomplish this, we first define an output-order array `output_order_[3]`, then manually initialize the default output order and define the feature-map size and channel count we expect for each output. We can then loop over each expected output scale; whenever the actual feature-map size and channel count match what we expect, we record the correct output order:
//Add private class member variables
class BPU_Detect {
private:
int output_order_[3];//Output order mapping
};
//Initialize default order
output_order_[0] = 0; // Default 1st output
output_order_[1] = 1; // Default 2nd output
output_order_[2] = 2; // Default 3rd output
// Define the desired output feature map size and number of channels
int32_t expected_shapes[3][3] = {
{H_8, W_8, 3 * (5 + classes_num_)}, // Small target feature map: H/8 x W/8
{H_16, W_16, 3 * (5 + classes_num_)}, // Medium target feature map: H/16 x W/16
{H_32, W_32, 3 * (5 + classes_num_)} // Large target feature map: H/32 x W/32
};
// Iterate through each desired output scale
for(int i = 0; i < 3; i++) {
// Traverse the actual output nodes
for(int j = 0; j < 3; j++) {
hbDNNTensorProperties output_properties;//Get the properties of the current output node
RDK_CHECK_SUCCESS(
hbDNNGetOutputTensorProperties(&output_properties, dnn_handle_, j),
"Get output tensor properties failed");
// Get the actual feature map size and number of channels
int32_t actual_h = output_properties.validShape.dimensionSize[1];
int32_t actual_w = output_properties.validShape.dimensionSize[2];
int32_t actual_c = output_properties.validShape.dimensionSize[3];
// If actual size and number of channels match expected
if(actual_h == expected_shapes[i][0] &&
actual_w == expected_shapes[i][1] &&
actual_c == expected_shapes[i][2]) {
output_order_[i] = j; // Record the correct output order
break;
}
}
}
At this point our GetModelInfo() function is complete! ! ! The full code is as follows:
// Get model information implementation
bool BPU_Detect::GetModelInfo() {
// Get the list of model names
const char** model_name_list;
int model_count = 0;
RDK_CHECK_SUCCESS(
hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle_),
"hbDNNGetModelNameList failed");
if(model_count > 1) {
std::cout << "Model count: " << model_count << std::endl;
std::cout << "Please check the model count!" << std::endl;
return false;
}
model_name_ = model_name_list[0];
// Get model handle
RDK_CHECK_SUCCESS(
hbDNNGetModelHandle(&dnn_handle_, packed_dnn_handle_, model_name_),
"hbDNNGetModelHandle failed");
// Get input information
int32_t input_count = 0;
RDK_CHECK_SUCCESS(
hbDNNGetInputCount(&input_count, dnn_handle_),
"hbDNNGetInputCount failed");
RDK_CHECK_SUCCESS(
hbDNNGetInputTensorProperties(&input_properties_, dnn_handle_, 0),
"hbDNNGetInputTensorProperties failed");
if(input_count > 1){
std::cout << "Model input node is greater than 1, please check!" << std::endl;
return false;
}
if(input_properties_.tensorType == HB_DNN_IMG_TYPE_NV12){
std::cout << "Input tensor type: HB_DNN_IMG_TYPE_NV12" << std::endl;
}
else{
std::cout << "The input tensor type is not HB_DNN_IMG_TYPE_NV12, please check!" << std::endl;
return false;
}
if(input_properties_.tensorLayout == HB_DNN_LAYOUT_NCHW){
std::cout << "Input tensor data layout: HB_DNN_LAYOUT_NCHW" << std::endl;
}
else{
std::cout << "The input tensor data layout is not HB_DNN_LAYOUT_NCHW, please check!" << std::endl;
return false;
}
// Get input size
input_h_ = input_properties_.validShape.dimensionSize[2];
input_w_ = input_properties_.validShape.dimensionSize[3];
if (input_properties_.validShape.numDimensions == 4)
{
std::cout << "The input size is: (" << input_properties_.[0];
std::cout << ", " << input_properties_.[1];
std::cout << ", " << input_h_;
std::cout << ", " << input_w_ << ")" << std::endl;
}
else
{
std::cout << "The input size is not (1,3,640,640), please check!" << std::endl;
return false;
}
// Get the output information and adjust the output order
int32_t output_count = 0;
RDK_CHECK_SUCCESS(
hbDNNGetOutputCount(&output_count, dnn_handle_),
"hbDNNGetOutputCount failed");
//Allocate output tensor memory
output_tensors_ = new hbDNNTensor[output_count];
// =============== Adjust output header sequence mapping ===============
// YOLOv5 has 3 output heads, corresponding to 3 different scales of feature maps.
// Need to ensure that the output order is: small target (8x downsampling) -> medium target (16x downsampling) -> large target (32x downsampling)
//Initialize default order
output_order_[0] = 0; // Default 1st output
output_order_[1] = 1; // Default 2nd output
output_order_[2] = 2; // Default 3rd output
// Define the desired output feature map size and number of channels
int32_t expected_shapes[3][3] = {
{H_8, W_8, 3 * (5 + classes_num_)}, // Small target feature map: H/8 x W/8
{H_16, W_16, 3 * (5 + classes_num_)}, // Medium target feature map: H/16 x W/16
{H_32, W_32, 3 * (5 + classes_num_)} // Large target feature map: H/32 x W/32
};
// Iterate through each desired output scale
for(int i = 0; i < 3; i++) {
// Traverse the actual output nodes
for(int j = 0; j < 3; j++) {
// Get the properties of the current output node
hbDNNTensorProperties output_properties;
RDK_CHECK_SUCCESS(
hbDNNGetOutputTensorProperties(&output_properties, dnn_handle_, j),
"Get output tensor properties failed");
// Get the actual feature map size and number of channels
int32_t actual_h = output_properties.validShape.dimensionSize[1];
int32_t actual_w = output_properties.validShape.dimensionSize[2];
int32_t actual_c = output_properties.validShape.dimensionSize[3];
// If actual size and number of channels match expected
if(actual_h == expected_shapes[i][0] &&
actual_w == expected_shapes[i][1] &&
actual_c == expected_shapes[i][2]) {
//Record the correct output sequence
output_order_[i] = j;
break;
}
}
}
//Print out sequence mapping information
std::cout << "\n============ Output Order Mapping ============" << std::endl;
std::cout << "Small object (1/" << 8 << "): output[" << output_order_[0] << "]" << std::endl;
std::cout << "Medium object (1/" << 16 << "): output[" << output_order_[1] << "]" << std::endl;
std::cout << "Large object (1/" << 32 << "): output[" << output_order_[2] << "]" << std::endl;
std::cout << "==========================================\ n" << std::endl;
return true;
}
(6) Complete the private PreProcess() function
Next we can complete the model's preprocessing function. Image preprocessing is essentially just image resizing plus format conversion, so this part is relatively simple and I will go through it a little faster. We use the letterbox method for resizing. As we all know, OpenCV has an image-scaling function called resize that can change the image size directly. However, because its implementation is simple and brute-force, it changes the aspect ratio whenever the source and target aspect ratios differ, which distorts the image: a non-square photo resized straight to 640x640, for example, ends up visibly stretched (see the one-line sketch below for contrast).
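For reference, the distortion-prone direct approach is just the single call below (a minimal fragment for contrast only, reusing the class members introduced later; it is not part of the final preprocessing code):
// Naive approach: squeeze the image to input_w_ x input_h_ in one step.
// Any image whose aspect ratio differs from the target gets stretched.
cv::resize(input_img, resized_img_, cv::Size(input_w_, input_h_));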
When we use the LetterBox method instead, the picture is not distorted, because LetterBox keeps the aspect ratio of the original image and scales it proportionally: once the long side has been resized to the required length, the leftover area along the short side is padded with gray, so the original aspect ratio is preserved.
So next we implement the image preprocessing with LetterBox. The specific code is below; the core idea is to scale the image proportionally to fit the target size while keeping the original aspect ratio, center it within the target dimensions, and fill the empty areas with a neutral color (such as (127, 127, 127)). This way we avoid distortion when the image is scaled and ensure its aspect ratio stays the same.
//Add private class member variables
class BPU_Detect {
private:
float x_scale_; //X-direction scaling ratio
float y_scale_; // Y direction scaling ratio
int x_shift_; // X direction offset
int y_shift_; // Y direction offset
cv::Mat resized_img_; // Scaled image
hbDNNTensor input_tensor_; //Input tensor
};
//Use letterbox method for preprocessing
x_scale_ = std::min(1.0f * input_h_ / input_img.rows, 1.0f * input_w_ / input_img.cols);
y_scale_ = x_scale_;
int new_w = input_img.cols * x_scale_;
x_shift_ = (input_w_ - new_w) / 2;
int x_other = input_w_ - new_w - x_shift_;
int new_h = input_img.rows * y_scale_;
y_shift_ = (input_h_ - new_h) / 2;
int y_other = input_h_ - new_h - y_shift_;
cv::resize(input_img, resized_img_, cv::Size(new_w, new_h));
cv::copyMakeBorder(resized_img_, resized_img_, y_shift_, y_other,
x_shift_, x_other, cv::BORDER_CONSTANT, cv::Scalar(127, 127, 127));
After resizing the image, we use OpenCV's cvtColor to convert it from BGR to planar YUV420 (I420); the actual NV12 interleaving is done in the next step when we fill the input tensor:
//Convert to NV12 format
cv::Mat yuv_mat;
cv::cvtColor(resized_img_, yuv_mat, cv::COLOR_BGR2YUV_I420);
After these image operations we have to start preparing the input data for the model! Next we need to convert the processed image data into the input format the model accepts. To do this we first allocate memory for the input tensor and copy the processed image data (in YUV format) into it, so the model can access and use the data correctly. This involves the API hbSysAllocCachedMem; let's look at its description and the structure definitions involved:
/**
* Allocate cachable system memory
* @param[out] mem
* @param[in] size
* @return 0 if success, return defined error code otherwise
*/
int32_t hbSysAllocCachedMem(hbSysMem *mem, uint32_t size);
typedef struct {
hbSysMem sysMem[4];
hbDNNTensorProperties properties;
} hbDNNTensor;
typedef struct {
uint64_t phyAddr;
void *virAddr;
uint32_t memSize;
} hbSysMem;
According to the API, we first need an hbSysMem structure, which describes a block of memory by its physical address (phyAddr), virtual address (virAddr) and size (memSize). Next we call hbSysAllocCachedMem to allocate memory for the input tensor; the allocated memory is cacheable, which means the hardware can access it directly while processing the data without frequent swapping with main memory. hbDNNTensor is the structure used to store a complete tensor: it contains several hbSysMem blocks that describe the different parts of the data, and an hbDNNTensorProperties member that stores the tensor's attributes such as its shape, data type and quantization information.
We first allocate cached memory for the input tensor with hbSysAllocCachedMem. sysMem[0] is the block that holds the YUV data, and its size is the memory the YUV image needs, i.e. 3 * input_h_ * input_w_ / 2: in the YUV420 layout the Y, U and V planes are stored separately, the Y plane takes input_h_ * input_w_ bytes, and the U and V planes each take a quarter of that, for 1.5 bytes per pixel in total. We then copy the processed YUV image data from yuv_mat into ynv12, where ynv12 is the virtual address of the memory allocated by hbSysAllocCachedMem, interleaving the U and V components so that the buffer ends up in the NV12 format the model requires. Finally, once the data is ready, we call the hbSysFlushMem function to flush the memory cache. The specific implementation code is as follows:
// Prepare the input tensor
hbSysAllocCachedMem(&input_tensor_.sysMem[0], int(3 * input_h_ * input_w_ / 2));
uint8_t* yuv = yuv_mat.ptr<uint8_t>();
uint8_t* ynv12 = (uint8_t*)input_tensor_.sysMem[0].virAddr;
// Calculate the height and width of the UV part, and the size of the Y part
int uv_height = input_h_ / 2;
int uv_width = input_w_ / 2;
int y_size = input_h_ * input_w_;
//Copy the Y component data to the input tensor
memcpy(ynv12, yuv, y_size);
// Get the UV component position in NV12 format
uint8_t* nv12 = ynv12 + y_size;
uint8_t* u_data = yuv + y_size;
uint8_t* v_data = u_data + uv_height * uv_width;
//Write U and V components alternately into NV12 format
for(int i = 0; i < uv_width * uv_height; i++) {
*nv12++ = *u_data++;
*nv12++ = *v_data++;
}
//Clear the memory cache to ensure that the data is ready for use by the model
hbSysFlushMem(&input_tensor_.sysMem[0], HB_SYS_MEM_CACHE_CLEAN);//Clear the cache to ensure data synchronization
At this point our PreProcess() function is complete! ! !
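Putting the snippets above together, the whole PreProcess() looks roughly like this (a sketch assembled only from the pieces shown above; the member variables are the ones declared earlier, and details of the real implementation may differ slightly):
// Preprocessing implementation (letterbox resize + BGR -> NV12 conversion)
bool BPU_Detect::PreProcess(const cv::Mat& input_img) {
    // Letterbox: scale proportionally, then pad the short side with gray
    x_scale_ = std::min(1.0f * input_h_ / input_img.rows, 1.0f * input_w_ / input_img.cols);
    y_scale_ = x_scale_;
    int new_w = input_img.cols * x_scale_;
    x_shift_ = (input_w_ - new_w) / 2;
    int x_other = input_w_ - new_w - x_shift_;
    int new_h = input_img.rows * y_scale_;
    y_shift_ = (input_h_ - new_h) / 2;
    int y_other = input_h_ - new_h - y_shift_;
    cv::resize(input_img, resized_img_, cv::Size(new_w, new_h));
    cv::copyMakeBorder(resized_img_, resized_img_, y_shift_, y_other,
                       x_shift_, x_other, cv::BORDER_CONSTANT, cv::Scalar(127, 127, 127));
    // Convert BGR to planar YUV420 (I420)
    cv::Mat yuv_mat;
    cv::cvtColor(resized_img_, yuv_mat, cv::COLOR_BGR2YUV_I420);
    // Allocate the input tensor memory and repack I420 into NV12
    hbSysAllocCachedMem(&input_tensor_.sysMem[0], int(3 * input_h_ * input_w_ / 2));
    uint8_t* yuv = yuv_mat.ptr<uint8_t>();
    uint8_t* ynv12 = (uint8_t*)input_tensor_.sysMem[0].virAddr;
    int uv_height = input_h_ / 2;
    int uv_width = input_w_ / 2;
    int y_size = input_h_ * input_w_;
    memcpy(ynv12, yuv, y_size);                       // Y plane is copied as-is
    uint8_t* nv12 = ynv12 + y_size;                   // start of the interleaved UV plane
    uint8_t* u_data = yuv + y_size;                   // I420 U plane
    uint8_t* v_data = u_data + uv_height * uv_width;  // I420 V plane
    for (int i = 0; i < uv_width * uv_height; i++) {  // interleave U and V -> NV12
        *nv12++ = *u_data++;
        *nv12++ = *v_data++;
    }
    // Flush the cache so the BPU sees the freshly written data
    hbSysFlushMem(&input_tensor_.sysMem[0], HB_SYS_MEM_CACHE_CLEAN);
    return true;
}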
(7) Complete the private Inference() function
Now let's move on to the inference part. After consulting the user manual, we mainly need the following two APIs: hbDNNInfer is used to execute our model inference, while hbDNNWaitTaskDone waits for the inference task to complete or time out, i.e. it blocks until the task finishes or the specified timeout is exceeded.
/**
*DNN inference
* @param[out] taskHandle: return a pointer represent the task if success, otherwise nullptr
Returns a pointer representing the task. If successful, returns a pointer to the task handle. On failure, returns nullptr.
* @param[out] output: pointer to the output tensor array, the size of array should be equal to $(`hbDNNGetOutputCount`)
Pointer to an array of output tensors. The size of the array should be equal to the number returned by hbDNNGetOutputCount.
* @param[in] input: input tensor array, the size of array should be equal to $(`hbDNNGetInputCount`)
Pointer to the input tensor array. The size of the array should be equal to the number returned by hbDNNGetInputCount.
* @param[in] dnnHandle: pointer to the dnn handle
DNN handle, used to identify the model used by the inference task
* @param[in] inferCtrlParam: infer control parameters
Inference control parameters, used to set some configuration items during the inference process (such as whether to use acceleration, inference mode, etc.)
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNInfer(hbDNNTaskHandle_t *taskHandle, hbDNNTensor **output,
hbDNNTensor const *input, hbDNNHandle_t dnnHandle,
hbDNNInferCtrlParam *inferCtrlParam);
/**
* Wait util task completed or timeout.
* @param[in] taskHandle: pointer to the task
* @param[in] timeout: timeout of milliseconds
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNWaitTaskDone(hbDNNTaskHandle_t taskHandle, int32_t timeout);
Next let's look at how the hbDNNInferCtrlParam *inferCtrlParam parameter is defined and how it should be passed in:
#define HB_DNN_INITIALIZE_INFER_CTRL_PARAM(param) \
{ \
(param)->bpuCoreId = HB_BPU_CORE_ANY; \
(param)->dspCoreId = HB_DSP_CORE_ANY; \
(param)->priority = HB_DNN_PRIORITY_LOWEST; \
(param)->more = false; \
(param)->customId = 0; \
(param)->reserved1 = 0; \
(param)->reserved2 = 0; \
}
typedef struct {
int32_t bpuCoreId; // BPU core ID, used to specify which BPU core the inference task is executed on
int32_t dspCoreId; // DSP core ID, used to specify which DSP core the inference task is executed on
int32_t priority; // Priority of the inference task
int32_t more; // Whether there are more inference tasks, usually set to false
int64_t customId; // Custom ID, which can be used to identify inference tasks
int32_t reserved1; // Reserved field, not used yet
int32_t reserved2; // Reserved field, not used yet
} hbDNNInferCtrlParam;
Having understood the above, we can start writing our inference part! Let's first finish some preparatory work before running inference. We create an inference task handle task_handle_ of type hbDNNTaskHandle_t, which identifies a single inference task and makes task management easier, and initialize it to nullptr to make sure it is empty before the task starts. For each output tensor we first get its properties and then allocate memory according to its aligned size (alignedByteSize). The allocation is done with hbSysAllocCachedMem, which was introduced earlier; this guarantees every output tensor gets a cache-backed buffer of the right size, so that later data processing does not go out of bounds or hit access errors. So our code is as follows:
//Add private class member variables
class BPU_Detect {
private:
hbDNNTaskHandle_t task_handle_; // Inference task handle
};
//Initialize the task handle to nullptr
task_handle_ = nullptr;
//Initialize input tensor attributes
input_tensor_.properties = input_properties_;
// Get the output tensor attributes
for(int i = 0; i < 3; i++) {
hbDNNTensorProperties output_properties;
RDK_CHECK_SUCCESS(
hbDNNGetOutputTensorProperties(&output_properties, dnn_handle_, i),
"Get output tensor properties failed");
output_tensors_[i].properties = output_properties;
// Allocate memory for output
int out_aligned_size = output_properties.alignedByteSize;
RDK_CHECK_SUCCESS(
hbSysAllocCachedMem(&output_tensors_[i].sysMem[0], out_aligned_size),
"Allocate output memory failed");
}
With the preparatory work done, we can start the inference! ! ! We first create an hbDNNInferCtrlParam inference-control parameter and initialize it with the officially provided HB_DNN_INITIALIZE_INFER_CTRL_PARAM macro, then call hbDNNInfer to execute the inference and use hbDNNWaitTaskDone to wait for the inference task to complete:
hbDNNInferCtrlParam infer_ctrl_param;
HB_DNN_INITIALIZE_INFER_CTRL_PARAM(&infer_ctrl_param);
RDK_CHECK_SUCCESS(
hbDNNInfer(&task_handle_, &output_tensors_, &input_tensor_, dnn_handle_, &infer_ctrl_param),
"Model inference failed");
RDK_CHECK_SUCCESS(
hbDNNWaitTaskDone(task_handle_, 0),
"Wait task done failed");
At this point our Inference() function is complete! ! ! The full code is as follows:
//Inference implementation
bool BPU_Detect::Inference() {
//Initialize the task handle to nullptr
task_handle_ = nullptr;
//Initialize input tensor attributes
input_tensor_.properties = input_properties_;
// Get the output tensor attributes
for(int i = 0; i < 3; i++) {
hbDNNTensorProperties output_properties;
RDK_CHECK_SUCCESS(
hbDNNGetOutputTensorProperties(&output_properties, dnn_handle_, i),
"Get output tensor properties failed");
output_tensors_[i].properties = output_properties;
// Allocate memory for output
int out_aligned_size = output_properties.alignedByteSize;
RDK_CHECK_SUCCESS(
hbSysAllocCachedMem(&output_tensors_[i].sysMem[0], out_aligned_size),
"Allocate output memory failed");
}
hbDNNInferCtrlParam infer_ctrl_param;
HB_DNN_INITIALIZE_INFER_CTRL_PARAM(&infer_ctrl_param);
RDK_CHECK_SUCCESS(
hbDNNInfer(&task_handle_, &output_tensors_, &input_tensor_, dnn_handle_, &infer_ctrl_param),
"Model inference failed");
RDK_CHECK_SUCCESS(
hbDNNWaitTaskDone(task_handle_, 0),
"Wait task done failed");
return true;
}
(8) Complete the private ProcessFeatureMap() function
We still need to complete the post-processing helper ProcessFeatureMap. This function extracts the detection bounding boxes and their scores from one output feature map of the network and stores them for the subsequent NMS (non-maximum suppression) step. First, we check the quantization type (quantiType) of the output tensor: if it is not NONE, we print an error message and return, because the inference here assumes the output data is unquantized floating point; quantized data would have to be handled differently.
if (output_tensor.properties.quantiType != NONE) {
std::cout << "Output tensor quantization type should be NONE!" << std::endl;
return;
}
Then, to make sure the data read from memory is up to date, we call the hbSysFlushMem function with HB_SYS_MEM_CACHE_INVALIDATE. This synchronizes the buffer with main memory and prevents inconsistent reads caused by the cache.
/**
* Flush cachable system memory
* @param[in] mem
* @return 0 if success, return defined error code otherwise
*/
int32_t hbSysFlushMem(hbSysMem *mem, int32_t flag);
hbSysFlushMem(&output_tensor.sysMem[0], HB_SYS_MEM_CACHE_INVALIDATE);
Then we take output_tensor.sysMem[0].virAddr, the data address of the output tensor, and cast it to float*; this pointer refers to the raw data produced by the model inference:
auto* raw_data = reinterpret_cast<float*>(output_tensor.sysMem[0].virAddr);
We then use nested for loops to traverse every position (height x width) of the output feature map. Each position holds prediction data, including the bounding-box center coordinates, width and height, and the class scores, and each anchor (anchors) represents one possible target shape:
for(int h = 0; h < height; h++) {
for(int w = 0; w < width; w++) {
for(const auto& anchor : anchors) {
For each location we first read the current prediction data (bounding-box position, class scores, etc.) and then filter by the location's confidence (cur_raw[4], roughly the probability that an object exists there). If the confidence is below the preset threshold (conf_thres_raw), we skip this position:
if(cur_raw[4] < conf_thres_raw) continue;
Next we find the largest class score among all classes (cur_raw[5] to cur_raw[classes_num_ + 4]), i.e. the class the current anchor most likely belongs to:
int cls_id = 5;
int end = classes_num_ + 5;
for(int i = 6; i < end; i++) {
if(cur_raw[i] > cur_raw[cls_id]) {
cls_id = i;
}
}
After finding the maximum class score, we can compute the final score of the current anchor. It is the product of the anchor's confidence and the maximum class score, each mapped through the sigmoid function (the raw outputs are logits). Detections whose final score is below score_threshold_ are filtered out:
float score = 1.0f / (1.0f + std::exp(-cur_raw[4])) *
1.0f / (1.0f + std::exp(-cur_raw[cls_id]));
if(score < score_threshold_) continue;
Finally we decode the exact position and size of the bounding box. Applying sigmoid to the raw outputs, we recover the center coordinates (cur_raw[0], cur_raw[1]) and the width and height (cur_raw[2], cur_raw[3]) with the YOLOv5 decoding formula, scale them back to the input-image size using the stride and the anchor dimensions, and then save the resulting box and its score into the containers of the corresponding class (after shifting cls_id down by 5 so it indexes the class rather than the raw score slot): bboxes_ (which stores the positions of all detection boxes) and scores_ (which stores the corresponding scores):
cls_id -= 5; // Shift from score index to class index
float stride = input_h_ / height;
float center_x = ((1.0f / (1.0f + std::exp(-cur_raw[0]))) * 2 - 0.5f + w) * stride;
float center_y = ((1.0f / (1.0f + std::exp(-cur_raw[1]))) * 2 - 0.5f + h) * stride;
float bbox_w = std::pow((1.0f / (1.0f + std::exp(-cur_raw[2]))) * 2, 2) * anchor.first;
float bbox_h = std::pow((1.0f / (1.0f + std::exp(-cur_raw[3]))) * 2, 2) * anchor.second;
float bbox_x = center_x - bbox_w / 2.0f;
float bbox_y = center_y - bbox_h / 2.0f;
bboxes_[cls_id].push_back(cv::Rect2d(bbox_x, bbox_y, bbox_w, bbox_h));
scores_[cls_id].push_back(score);
At this point our ProcessFeatureMap() function is complete! ! ! The full code is as follows:
// Feature map processing auxiliary function
void BPU_Detect::ProcessFeatureMap(hbDNNTensor& output_tensor,
int height, int width,
const std::vector<std::pair<double, double>>& anchors,
float conf_thres_raw) {
// Check the quantization type
if (output_tensor.properties.quantiType != NONE) {
std::cout << "Output tensor quantization type should be NONE!" << std::endl;
return;
}
// refresh memory
hbSysFlushMem(&output_tensor.sysMem[0], HB_SYS_MEM_CACHE_INVALIDATE);
//Get the output data pointer
auto* raw_data = reinterpret_cast<float*>(output_tensor.sysMem[0].virAddr);
// Traverse each position of the feature map
for(int h = 0; h < height; h++) {
for(int w = 0; w < width; w++) {
for(const auto& anchor : anchors) {
// Get prediction data for the current location
float* cur_raw = raw_data;
raw_data += (5 + classes_num_);
// Conditional probability filtering
if(cur_raw[4] < conf_thres_raw) continue;
// Find the maximum class probability
int cls_id = 5;
int end = classes_num_ + 5;
for(int i = 6; i < end; i++) {
if(cur_raw[i] > cur_raw[cls_id]) {
cls_id = i;
}
}
// Calculate final score
float score = 1.0f / (1.0f + std::exp(-cur_raw[4])) *
1.0f / (1.0f + std::exp(-cur_raw[cls_id]));
// score filter
if(score < score_threshold_) continue;
cls_id -= 5;
// decode bounding box
float stride = input_h_ / height;
float center_x = ((1.0f / (1.0f + std::exp(-cur_raw[0]))) * 2 - 0.5f + w) * stride;
float center_y = ((1.0f / (1.0f + std::exp(-cur_raw[1]))) * 2 - 0.5f + h) * stride;
float bbox_w = std::pow((1.0f / (1.0f + std::exp(-cur_raw[2]))) * 2, 2) * anchor.first;
float bbox_h = std::pow((1.0f / (1.0f + std::exp(-cur_raw[3]))) * 2, 2) * anchor.second;
float bbox_x = center_x - bbox_w / 2.0f;
float bbox_y = center_y - bbox_h / 2.0f;
//Save test results
bboxes_[cls_id].push_back(cv::Rect2d(bbox_x, bbox_y, bbox_w, bbox_h));
scores_[cls_id].push_back(score);
}
}
}
}
(9) Complete the private PostProcess() function
After inference is done, it is of course time for post-processing. Our post-processing has three main steps: clear the previous results, process the output feature maps, and run NMS (non-maximum suppression) per class. Before each round of inference and post-processing we first clear the previously stored detection results: bboxes_ stores the detected bounding boxes, scores_ stores the score of each bounding box, and indices_ stores the box indices kept after NMS for each class. We then resize these containers according to the number of classes of the detection task (classes_num_) so they can hold the results of every class, and convert the preset score_threshold_ into its raw (pre-sigmoid) form conf_thres_raw. (PS: the model outputs logits, so score = sigmoid(raw); inverting this gives raw = -log(1/score - 1), which means comparing the raw output against conf_thres_raw is equivalent to comparing sigmoid(raw) against score_threshold_ while saving a sigmoid per anchor.) The specific code is as follows:
//Add private class member variables
class BPU_Detect {
private:
//Storage of detection results
std::vector<std::vector<cv::Rect2d>> bboxes_; // Bounding boxes for each category
std::vector<std::vector<float>> scores_; // Score for each category
std::vector<std::vector<int>> indices_; // Index after NMS
// YOLOv5 anchors information
std::vector<std::pair<double, double>> s_anchors_; // Small target anchors
std::vector<std::pair<double, double>> m_anchors_; // Medium target anchors
std::vector<std::pair<double, double>> l_anchors_; // Large target anchors
};
bboxes_.clear(); // Clear bounding boxes
scores_.clear(); // Clear scores
indices_.clear(); // Clear indexes
bboxes_.resize(classes_num_); // Adjust the size of the bounding box array according to the number of classes
scores_.resize(classes_num_); // Adjust the size of the score array according to the number of categories
indices_.resize(classes_num_); // Adjust the size of the index array according to the number of categories
float conf_thres_raw = -log(1 / score_threshold_ - 1);
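A quick numerical sanity check of this inversion, as a standalone sketch (the 0.25 below is simply the tutorial's DEFAULT_SCORE_THRESHOLD; any threshold in (0, 1) behaves the same way):
#include <cmath>
#include <cassert>

int main() {
    float score_threshold = 0.25f;                                      // same value as DEFAULT_SCORE_THRESHOLD
    float conf_thres_raw  = -std::log(1.0f / score_threshold - 1.0f);   // -log(3) ≈ -1.0986
    float recovered       = 1.0f / (1.0f + std::exp(-conf_thres_raw));  // sigmoid(conf_thres_raw)
    assert(std::fabs(recovered - score_threshold) < 1e-5f);             // sigmoid(-log(1/t - 1)) == t
    return 0;
}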
Since object-detection models use multi-scale outputs, with each scale responsible for targets of a different size, we now call the ProcessFeatureMap helper we defined earlier once for each scale (using the output_order_ mapping established in GetModelInfo):
// Process the output of three scales
ProcessFeatureMap(output_tensors_[output_order_[0]], H_8, W_8, s_anchors_, conf_thres_raw);
ProcessFeatureMap(output_tensors_[output_order_[1]], H_16, W_16, m_anchors_, conf_thres_raw);
ProcessFeatureMap(output_tensors_[output_order_[2]], H_32, W_32, l_anchors_, conf_thres_raw);
Finally we use the cv::dnn::NMSBoxes function to suppress duplicate boxes for each class based on the box scores, their overlap (IoU) and the configured thresholds, which gives us the kept bounding-box indices of each class (indices_):
for(int i = 0; i < classes_num_; i++) {
cv::dnn::NMSBoxes(bboxes_[i], scores_[i], score_threshold_,
nms_threshold_, indices_[i], 1.f, nms_top_k_);
}
At this point our PostProcess() function is complete! ! ! The full code is as follows:
// Post-processing implementation
bool BPU_Detect::PostProcess() {
//Clear the last result
bboxes_.clear();
scores_.clear();
indices_.clear();
//Resize
bboxes_.resize(classes_num_);
scores_.resize(classes_num_);
indices_.resize(classes_num_);
float conf_thres_raw = -log(1 / score_threshold_ - 1);
// Process the output of three scales
ProcessFeatureMap(output_tensors_[output_order_[0]], H_8, W_8, s_anchors_, conf_thres_raw);
ProcessFeatureMap(output_tensors_[output_order_[1]], H_16, W_16, m_anchors_, conf_thres_raw);
ProcessFeatureMap(output_tensors_[output_order_[2]], H_32, W_32, l_anchors_, conf_thres_raw);
// Perform NMS for each category
for(int i = 0; i < classes_num_; i++) {
cv::dnn::NMSBoxes(bboxes_[i], scores_[i], score_threshold_,
nms_threshold_, indices_[i], 1.f, nms_top_k_);
}
return true;
}
(10) Complete the private DrawResults() function
Next we complete our drawing helper DrawResults! Since whether the result boxes need to be drawn differs between development and debugging, we first create a macro definition that switches drawing on or off:
#define ENABLE_DRAW 0 // Drawing switch: 0-disable, 1-enable
Since this part is pure OpenCV we will not describe it in detail; we simply traverse the post-NMS detection results of each class. The only thing to note is that the image was resized with the LetterBox method during preprocessing, so we have to use x_shift_, y_shift_, x_scale_ and y_scale_ to transform the coordinates back into the original image space:
float x1 = (bboxes_[cls_id][idx].x - x_shift_) / x_scale_;
float y1 = (bboxes_[cls_id][idx].y - y_shift_) / y_scale_;
float x2 = x1 + (bboxes_[cls_id][idx].width) / x_scale_;
float y2 = y1 + (bboxes_[cls_id][idx].height) / y_scale_;
Finally, the complete code of the DrawResults function is as follows:
// Drawing result implementation
void BPU_Detect::DrawResults(cv::Mat& img) {
#if ENABLE_DRAW
for(int cls_id = 0; cls_id < classes_num_; cls_id++) {
if(!indices_[cls_id].empty()) {
for(size_t i = 0; i < indices_[cls_id].size(); i++) {
int idx = indices_[cls_id][i];
float x1 = (bboxes_[cls_id][idx].x - x_shift_) / x_scale_;
float y1 = (bboxes_[cls_id][idx].y - y_shift_) / y_scale_;
float x2 = x1 + (bboxes_[cls_id][idx].width) / x_scale_;
float y2 = y1 + (bboxes_[cls_id][idx].height) / y_scale_;
float score = scores_[cls_id][idx];
// draw bounding box
cv::rectangle(img, cv::Point(x1, y1), cv::Point(x2, y2),
cv::Scalar(255, 0, 0), line_size_);
// draw labels
std::string text = class_names_[cls_id] + ": " +
std::to_string(static_cast<int>(score * 100)) + "%";
cv::putText(img, text, cv::Point(x1, y1 - 5),
cv::FONT_HERSHEY_SIMPLEX, font_size_,
cv::Scalar(0, 0, 255), font_thickness_, cv::LINE_AA);
}
}
}
#endif
//Print test results
PrintResults();
}
(11) Complete the private PrintResults() function
Only PrintResults is left, and there is not much to say about it: we just use for loops to print the model's detection results in an orderly way. The only thing to know is that the data in indices_ is stored per class (cls_id), and each class holds the indices of all detection boxes that survived NMS for that class. So the complete code is as follows:
//Print detection results implementation
void BPU_Detect::PrintResults() const {
//Print the overall information of the test results
int total_detections = 0;
for(int cls_id = 0; cls_id < classes_num_; cls_id++) {
total_detections += indices_[cls_id].size();
}
std::cout << "\n============ Detection Results ============" << std::endl;
std::cout << "Total detections: " << total_detections << std::endl;
for(int cls_id = 0; cls_id < classes_num_; cls_id++) {
if(!indices_[cls_id].empty()) {
std::cout << "\nClass: " << class_names_[cls_id] << std::endl;
std::cout << "Number of detections: " << indices_[cls_id].size() << std::endl;
std::cout << "Details:" << std::endl;
for(size_t i = 0; i < indices_[cls_id].size(); i++) {
int idx = indices_[cls_id][i];
float x1 = (bboxes_[cls_id][idx].x - x_shift_) / x_scale_;
float y1 = (bboxes_[cls_id][idx].y - y_shift_) / y_scale_;
float x2 = x1 + (bboxes_[cls_id][idx].width) / x_scale_;
float y2 = y1 + (bboxes_[cls_id][idx].height) / y_scale_;
float score = scores_[cls_id][idx];
//Print detailed information of each detection frame
std::cout << " Detection " << i + 1 << ":" << std::endl;
std::cout << " Position: (" << x1 << ", " << y1 << ") to (" << x2 << ", " << y2 << ")" << std:: endl;
std::cout << " Confidence: " << std::fixed << std::setprecision(2) << score * 100 << "%" << std::endl;
}
}
}
std::cout << "========================================\n" << std::endl;
}
At this point we have completed all the private auxiliary functions and can start completing the three public functions! ! !
(12) Complete the public Init() function
We first complete our initialization function. In the initialization phase we only need to load the model and obtain and check the model information, so we simply call our LoadModel and GetModelInfo functions:
if(!LoadModel()) {
std::cout << "Failed to load model!" << std::endl;
return false;
}
if(!GetModelInfo()) {
std::cout << "Failed to get model info!" << std::endl;
return false;
}
Finally, we add the initialization flag and time output to complete our initialization function! ! ! The complete code is as follows:
// Initialization function implementation
bool BPU_Detect::Init() {
if(is_initialized_) {
std::cout << "Already initialized!" << std::endl;
return true;
}
auto init_start = std::chrono::high_resolution_clock::now();
if(!LoadModel()) {
std::cout << "Failed to load model!" << std::endl;
return false;
}
if(!GetModelInfo()) {
std::cout << "Failed to get model info!" << std::endl;
return false;
}
is_initialized_ = true;
auto init_end = std::chrono::high_resolution_clock::now();
float init_time = std::chrono::duration_cast<std::chrono::microseconds>(init_end - init_start).count() / 1000.0f;
std::cout << "\n============ Model Loading Time ============" << std::endl;
std::cout << "Total init time: " << std::fixed << std::setprecision(2) << init_time << " ms" << std::endl;
std::cout << "==========================================\n " << std::endl;
return true;
}
(13) Complete the public Detect() function
Then we complete our Detect function. We first check whether initialization has succeeded:
if(!is_initialized_) {
std::cout << "Please initialize first!" << std::endl;
return false;
}
Then we call, in turn, the PreProcess preprocessing function, the Inference inference function and the PostProcess post-processing function, and finally our DrawResults function:
if(!PreProcess(input_img)) {
return false;
}
if(!Inference()) {
return false;
}
if(!PostProcess()) {
return false;
}
DrawResults(output_img);
Finally, we add the timing output to complete our detection function! ! ! The complete code is as follows:
// Detection function implementation
bool BPU_Detect::Detect(const cv::Mat& input_img, cv::Mat& output_img) {
if(!is_initialized_) {
std::cout << "Please initialize first!" << std::endl;
return false;
}
auto total_start = std::chrono::high_resolution_clock::now();
#if ENABLE_DRAW
input_img.copyTo(output_img);
#endif
// Preprocessing time statistics
auto preprocess_start = std::chrono::high_resolution_clock::now();
if(!PreProcess(input_img)) {
return false;
}
auto preprocess_end = std::chrono::high_resolution_clock::now();
float preprocess_time = std::chrono::duration_cast<std::chrono::microseconds>(preprocess_end - preprocess_start).count() / 1000.0f;
//Inference time statistics
auto infer_start = std::chrono::high_resolution_clock::now();
if(!Inference()) {
return false;
}
auto infer_end = std::chrono::high_resolution_clock::now();
float infer_time = std::chrono::duration_cast<std::chrono::microseconds>(infer_end - infer_start).count() / 1000.0f;
// Post-processing time statistics
auto postprocess_start = std::chrono::high_resolution_clock::now();
if(!PostProcess()) {
return false;
}
auto postprocess_end = std::chrono::high_resolution_clock::now();
float postprocess_time = std::chrono::duration_cast<std::chrono::microseconds>(postprocess_end - postprocess_start).count() / 1000.0f;
// Draw result time statistics
auto draw_start = std::chrono::high_resolution_clock::now();
DrawResults(output_img);
auto draw_end = std::chrono::high_resolution_clock::now();
float draw_time = std::chrono::duration_cast<std::chrono::microseconds>(draw_end - draw_start).count() / 1000.0f;
//Total time statistics
auto total_end = std::chrono::high_resolution_clock::now();
float total_time = std::chrono::duration_cast<std::chrono::microseconds>(total_end - total_start).count() / 1000.0f;
//Print time statistics
std::cout << "\n============ Time Statistics ============" << std::endl;
std::cout << "Preprocess time: " << std::fixed << std::setprecision(2) << preprocess_time << " ms" << std::endl;
std::cout << "Inference time: " << std::fixed << std::setprecision(2) << infer_time << " ms" << std::endl;
std::cout << "Postprocess time: " << std::fixed << std::setprecision(2) << postprocess_time << " ms" << std::endl;
std::cout << "Draw time: " << std::fixed << std::setprecision(2) << draw_time << " ms" << std::endl;
std::cout << "Total time: " << std::fixed << std::setprecision(2) << total_time << " ms" << std::endl;
std::cout << "FPS: " << std::fixed << std::setprecision(2) << 1000.0f / total_time << std::endl;
std::cout << "======================================\n" << std::endl;
return true;
}
(14) Complete the public Release() function
The last thing we need to complete is our resource release function. We first check whether our function has been initialized. If not, there is no need to release the resource:
if(!is_initialized_) {
return true;
}
Then we check whether our inference task has finished. If the task handle is still valid, we need to use hbDNNReleaseTask to release the inference task; the explanation of this API is as follows:
/**
* Release a task and its related resources. If the task has not been executed then it will be canceled,
* and if the task has not been finished then it will be stopped.
* This interface will return immediately, and all operations will run in the background
* @param[in] taskHandle: pointer to the task
* @return 0 if success, return defined error code otherwise
*/
int32_t hbDNNReleaseTask(hbDNNTaskHandle_t taskHandle);
So our code only needs to call this function and then set the task handle pointer to nullptr:
if(task_handle_) {
hbDNNReleaseTask(task_handle_);
task_handle_ = nullptr;
}
Finally we use the hbSysFreeMem API to free the input and output memory in turn, and then release the model itself with hbDNNRelease:
/**
* Free mem
* @param[in] mem
* @return 0 if success, return defined error code otherwise
*/
int32_t hbSysFreeMem(hbSysMem *mem);
// Release input memory
if(input_tensor_.sysMem[0].virAddr) {
hbSysFreeMem(&(input_tensor_.sysMem[0]));
}
// Release output memory
for(int i = 0; i < 3; i++) {
if(output_tensors_ && output_tensors_[i].sysMem[0].virAddr) {
hbSysFreeMem(&(output_tensors_[i].sysMem[0]));
}
}
if(output_tensors_) {
delete[] output_tensors_;
output_tensors_ = nullptr;
}
// release model
if(packed_dnn_handle_) {
hbDNNRelease(packed_dnn_handle_);
packed_dnn_handle_ = nullptr;
}
Finally, we add some details and complete our resource release function! ! ! The complete code is as follows:
//Release resource implementation
bool BPU_Detect::Release() {
if(!is_initialized_) {
return true;
}
// Release task
if(task_handle_) {
hbDNNReleaseTask(task_handle_);
task_handle_ = nullptr;
}
try {
// Release input memory
if(input_tensor_.sysMem[0].virAddr) {
hbSysFreeMem(&(input_tensor_.sysMem[0]));
}
// Release output memory
for(int i = 0; i < 3; i++) {
if(output_tensors_ && output_tensors_[i].sysMem[0].virAddr) {
hbSysFreeMem(&(output_tensors_[i].sysMem[0]));
}
}
if(output_tensors_) {
delete[] output_tensors_;
output_tensors_ = nullptr;
}
// release model
if(packed_dnn_handle_) {
hbDNNRelease(packed_dnn_handle_);
packed_dnn_handle_ = nullptr;
}
} catch(const std::exception& e) {
std::cout << "Exception during release: " << () << std::endl;
}
is_initialized_ = false;
return true;
}
(15) Implement the Main function
The tutorial is coming to an end here. Next we only need to implement the logic that calls the class and runs inference to finish this section. The current code has not been optimized and the inference has not reached its best performance yet; please look forward to the dedicated optimization tutorial to be released next year! ! !
Using this detection class is actually very simple: we just create a detector instance, initialize it, feed the image or frame to be detected into detector.Detect(), and finally release the resources! ! !
BPU_Detect detector;
// initialization
if (!detector.Init()) {
std::cout << "Failed to initialize detector" << std::endl;
return -1;
}
if (!detector.Detect(input_img, output_img)) {
std::cout << "Detection failed" << std::endl;
return -1;
}
// Release resources
detector.Release();
Remember the single image and real-time detection macro definitions we added above? We add the judgment of this macro definition and some details to the main function. The complete code is as follows:
int main() {
//Create detector instance
BPU_Detect detector;
// initialization
if (!detector.Init()) {
std::cout << "Failed to initialize detector" << std::endl;
return -1;
}
#if DETECT_MODE == 0
//Single picture detection mode
std::cout << "Single image detection mode" << std::endl;
//Read test image
cv::Mat input_img = cv::imread("/path/to/img");
if (input_img.empty()) {
std::cout << "Failed to load image" << std::endl;
return -1;
}
//Perform detection
cv::Mat output_img;
#if ENABLE_DRAW
if (!detector.Detect(input_img, output_img)) {
std::cout << "Detection failed" << std::endl;
return -1;
}
// save results
cv::imwrite("cpp_result.jpg", output_img);
#else
if (!detector.Detect(input_img, output_img)) {
std::cout << "Detection failed" << std::endl;
return -1;
}
#endif
#else
// Real-time detection mode
std::cout << "Real-time detection mode" << std::endl;
//Open camera
cv::VideoCapture cap(0);
if (!cap.isOpened()) {
std::cout << "Failed to open camera" << std::endl;
return -1;
}
cv::Mat frame, output_frame;
while (true) {
// read a frame
cap >> frame;
if (frame.empty()) {
std::cout << "Failed to read frame" << std::endl;
break;
}
//Execute detection
if (!detector.Detect(frame, output_frame)) {
std::cout << "Detection failed" << std::endl;
break;
}
#if ENABLE_DRAW
//display results
cv::imshow("Real-time Detection", output_frame);
// Press 'q' to exit
if (cv::waitKey(1) == 'q') {
break;
}
#endif
}
#if ENABLE_DRAW
// Release the camera
cap.release();
cv::destroyAllWindows();
#endif
#endif
// Release resources
detector.Release();
return 0;
}
The complete code below is for reference only:
// Standard C++ library
#include <iostream> //Input and output streams
#include <vector> // vector container
#include <algorithm> // algorithm library
#include <chrono> // Time related functions
#include <iomanip> // Input and output format control
// OpenCV library
#include <opencv2/opencv.hpp> // OpenCV main header file
#include <opencv2/dnn/dnn.hpp> // OpenCV deep learning module
// Horizon RDK BPU API
#include "dnn/hb_dnn.h" //BPU basic functions
#include "dnn/hb_dnn_ext.h" // BPU extension function
#include "dnn/plugin/hb_dnn_layer.h" // BPU layer definition
#include "dnn/plugin/hb_dnn_plugin.h" // BPU plug-in
#include "dnn/hb_sys.h" // BPU system functions
// Error checking macro definition
#define RDK_CHECK_SUCCESS(value, errmsg) \
do \
{ \
auto ret_code = value; \
if (ret_code != 0) \
{ \
std::cout << errmsg << ", error code:" << ret_code; \
return ret_code; \
} \
} while (0);
//Default parameter definitions related to models and detection
#define DEFAULT_MODEL_PATH "/root/Deep_Learning/YOLOv5/models/tennis_detect_640x640_bayese_.bin" //Default model path
#define DEFAULT_CLASSES_NUM 1 //Default number of categories
#define CLASSES_LIST "tennis_ball" // Category name
#define DEFAULT_NMS_THRESHOLD 0.45f // Non-maximum suppression threshold
#define DEFAULT_SCORE_THRESHOLD 0.25f // Confidence threshold
#define DEFAULT_NMS_TOP_K 300 //The maximum number of frames reserved by NMS
#define DEFAULT_FONT_SIZE 1.0f // Drawing text size
#define DEFAULT_FONT_THICKNESS 1.0f // Draw text thickness
#define DEFAULT_LINE_SIZE 2.0f // Draw line thickness
//Run mode selection
#define DETECT_MODE 0 // Detection mode: 0-single picture, 1-real-time detection
#define ENABLE_DRAW 0 // Drawing switch: 0-disable, 1-enable
#define LOAD_FROM_DDR 1 //Model loading method: 0-load from file, 1-load from memory
// Feature map scale definition (based on the multiple relationship of the input size)
#define H_8 (input_h_ / 8) // 1/8 of the input height
#define W_8 (input_w_ / 8) // 1/8 of the input width
#define H_16 (input_h_ / 16) // 1/16 of the input height
#define W_16 (input_w_ / 16) // 1/16 of the input width
#define H_32 (input_h_ / 32) // 1/32 of the input height
#define W_32 (input_w_ / 32) // 1/32 of the input width
// BPU target detection class
class BPU_Detect {
public:
//Constructor: initialize detector parameters
// @param model_path: model file path
// @param classes_num: Number of detection categories
// @param nms_threshold: NMS threshold
// @param score_threshold: Confidence threshold
// @param nms_top_k: The maximum number of frames retained by NMS
BPU_Detect(const std::string& model_path = DEFAULT_MODEL_PATH,
int classes_num = DEFAULT_CLASSES_NUM,
float nms_threshold = DEFAULT_NMS_THRESHOLD,
float score_threshold = DEFAULT_SCORE_THRESHOLD,
int nms_top_k = DEFAULT_NMS_TOP_K);
// Destructor: release resources
~BPU_Detect();
//Main functional interface
bool Init(); // Initialize BPU and model
bool Detect(const cv::Mat& input_img, cv::Mat& output_img); //Perform target detection
bool Release(); // Release all resources
private:
// Internal utility function
bool LoadModel(); // Load model file
bool GetModelInfo(); // Get the input and output information of the model
bool PreProcess(const cv::Mat& input_img); // Image preprocessing (resize and format conversion)
bool Inference(); //Perform model inference
bool PostProcess(); // Post-processing (NMS, etc.)
void DrawResults(cv::Mat& img); // Draw detection results on the image
void PrintResults() const; // Print detection results to the console
// Feature map processing auxiliary function
// @param output_tensor: output tensor
// @param height, width: feature map size
// @param anchors: anchor boxes corresponding to the scale
// @param conf_thres_raw: original confidence threshold
void ProcessFeatureMap(hbDNNTensor& output_tensor,
int height, int width,
const std::vector<std::pair<double, double>>& anchors,
float conf_thres_raw);
//Member variables (arranged according to constructor initialization order)
std::string model_path_; //Model file path
int classes_num_; // Number of categories
float nms_threshold_; // NMS threshold
float score_threshold_; // Confidence threshold
int nms_top_k_; //The maximum number of frames retained by NMS
bool is_initialized_; // Initialization status flag
float font_size_; // draw text size
float font_thickness_; // draw text thickness
float line_size_; // draw line thickness
// BPU related variables
hbPackedDNNHandle_t packed_dnn_handle_; // Packed model handle
hbDNNHandle_t dnn_handle_; // Model handle
const char* model_name_; // model name
// Input and output tensors
hbDNNTensor input_tensor_; //Input tensor
hbDNNTensor* output_tensors_; // Output tensor array
hbDNNTensorProperties input_properties_; //Input tensor properties
//Task related
hbDNNTaskHandle_t task_handle_; // Inference task handle
//Model input parameters
int input_h_; //Input height
int input_w_; // input width
//Storage of detection results
std::vector<std::vector<cv::Rect2d>> bboxes_; // Bounding boxes for each category
std::vector<std::vector<float>> scores_; // Score for each category
std::vector<std::vector<int>> indices_; // Index after NMS
//Image processing parameters
float x_scale_; //X-direction scaling ratio
float y_scale_; // Y direction scaling ratio
int x_shift_; // X direction offset
int y_shift_; // Y direction offset
cv::Mat resized_img_; // Scaled image
// YOLOv5 anchors information
std::vector<std::pair<double, double>> s_anchors_; // small target anchors
std::vector<std::pair<double, double>> m_anchors_; // Medium target anchors
std::vector<std::pair<double, double>> l_anchors_; // Large target anchors
// Output processing
int output_order_[3]; // Output order mapping
std::vector<std::string> class_names_; // Category name list
};
//Constructor implementation
BPU_Detect::BPU_Detect(const std::string& model_path,
int classes_num,
float nms_threshold,
float score_threshold,
int nms_top_k)
: model_path_(model_path),
classes_num_(classes_num),
nms_threshold_(nms_threshold),
score_threshold_(score_threshold),
nms_top_k_(nms_top_k),
is_initialized_(false),
font_size_(DEFAULT_FONT_SIZE),
font_thickness_(DEFAULT_FONT_THICKNESS),
line_size_(DEFAULT_LINE_SIZE) {
//Initialize category name
class_names_ = {CLASSES_LIST};
//Initialize anchors
std::vector<float> anchors = {10.0, 13.0, 16.0, 30.0, 33.0, 23.0,
30.0, 61.0, 62.0, 45.0, 59.0, 119.0,
116.0, 90.0, 156.0, 198.0, 373.0, 326.0};
//Set small, medium, large anchors
for(int i = 0; i < 3; i++) {
s_anchors_.push_back({anchors[i*2], anchors[i*2+1]});
m_anchors_.push_back({anchors[i*2+6], anchors[i*2+7]});
l_anchors_.push_back({anchors[i*2+12], anchors[i*2+13]});
}
}
// Destructor implementation
BPU_Detect::~BPU_Detect() {
if(is_initialized_) {
Release();
}
}
// Initialization function implementation
bool BPU_Detect::Init() {
if(is_initialized_) {
std::cout << "Already initialized!" << std::endl;
return true;
}
auto init_start = std::chrono::high_resolution_clock::now();
if(!LoadModel()) {
std::cout << "Failed to load model!" << std::endl;
return false;
}
if(!GetModelInfo()) {
std::cout << "Failed to get model info!" << std::endl;
return false;
}
is_initialized_ = true;
auto init_end = std::chrono::high_resolution_clock::now();
float init_time = std::chrono::duration_cast<std::chrono::microseconds>(init_end - init_start).count() / 1000.0f;
std::cout << "\n============ Model Loading Time ============" << std::endl;
std::cout << "Total init time: " << std::fixed << std::setprecision(2) << init_time << " ms" << std::endl;
std::cout << "==========================================\n " << std::endl;
return true;
}
//Load model implementation
bool BPU_Detect::LoadModel() {
// Record the starting point of the total loading time
auto load_start = std::chrono::high_resolution_clock::now();
#if LOAD_FROM_DDR
// Used to record the time of reading model data from the file
float read_time = 0.0f;
#endif
// Used to record the time of model initialization
float init_time = 0.0f;
#if LOAD_FROM_DDR
// =============== Read model from file to memory ===============
auto read_start = std::chrono::high_resolution_clock::now();
//Open model file
FILE* fp = fopen(model_path_.c_str(), "rb");
if (!fp) {
std::cout << "Failed to open model file: " << model_path_ << std::endl;
return false;
}
// Get file size:
fseek(fp, 0, SEEK_END); // 1. Move the file pointer to the end
size_t model_size = static_cast<size_t>(ftell(fp)); // 2. Get the current position (i.e. file size)
fseek(fp, 0, SEEK_SET); // 3. Reset the file pointer to the beginning
// Allocate memory for model data
void* model_data = malloc(model_size);
if (!model_data) {
std::cout << "Failed to allocate memory for model data" << std::endl;
fclose(fp);
return false;
}
//Read model data into memory
size_t read_size = fread(model_data, 1, model_size, fp);
fclose(fp);
// Calculate file reading time
auto read_end = std::chrono::high_resolution_clock::now();
read_time = std::chrono::duration_cast<std::chrono::microseconds>(read_end - read_start).count() / 1000.0f;
// Verify that the file has been read completely
if (read_size != model_size) {
std::cout << "Failed to read model data, expected " << model_size
<< " bytes, but got " << read_size << " bytes" << std::endl;
free(model_data);
return false;
}
// =============== Initialize model from memory ===============
auto init_start = std::chrono::high_resolution_clock::now();
// Prepare model data array and length array
const void* model_data_array[] = {model_data};
int32_t model_data_length[] = {static_cast<int32_t>(model_size)};
// Initialize the model from memory using the BPU API
RDK_CHECK_SUCCESS(
hbDNNInitializeFromDDR(&packed_dnn_handle_, model_data_array, model_data_length, 1),
"Initialize model from DDR failed");
// Release temporarily allocated memory
free(model_data);
// Calculate model initialization time
auto init_end = std::chrono::high_resolution_clock::now();
init_time = std::chrono::duration_cast<std::chrono::microseconds>(init_end - init_start).count() / 1000.0f;
#else
// =============== Initialize the model directly from the file ===============
auto init_start = std::chrono::high_resolution_clock::now();
// Get the model file path
const char* model_file_name = model_path_.c_str();
// Initialize model from file using BPU API
RDK_CHECK_SUCCESS(
hbDNNInitializeFromFiles(&packed_dnn_handle_, &model_file_name, 1),
"Initialize model from file failed");
// Calculate model initialization time
auto init_end = std::chrono::high_resolution_clock::now();
init_time = std::chrono::duration_cast<std::chrono::microseconds>(init_end - init_start).count() / 1000.0f;
#endif
// =============== Calculate and print total time statistics ===============
auto load_end = std::chrono::high_resolution_clock::now();
float total_load_time = std::chrono::duration_cast<std::chrono::microseconds>(load_end - load_start).count() / 1000.0f;
//Print time statistics
std::cout << "\n============ Model Loading Details ============" << std::endl;
#if LOAD_FROM_DDR
std::cout << "File reading time: " << std::fixed << std::setprecision(2) << read_time << " ms" << std::endl;
#endif
std::cout << "Model init time: " << std::fixed << std::setprecision(2) << init_time << " ms" << std::endl;
std::cout << "Total loading time: " << std::fixed << std::setprecision(2) << total_load_time << " ms" << std::endl;
std::cout << "============================================== \n" << std::endl;
return true;
}
// Get model information implementation
bool BPU_Detect::GetModelInfo() {
// Get the list of model names
const char** model_name_list;
int model_count = 0;
RDK_CHECK_SUCCESS(
hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle_),
"hbDNNGetModelNameList failed");
if(model_count > 1) {
std::cout << "Model count: " << model_count << std::endl;
std::cout << "Please check the model count!" << std::endl;
return false;
}
model_name_ = model_name_list[0];
// Get model handle
RDK_CHECK_SUCCESS(
hbDNNGetModelHandle(&dnn_handle_, packed_dnn_handle_, model_name_),
"hbDNNGetModelHandle failed");
// Get input information
int32_t input_count = 0;
RDK_CHECK_SUCCESS(
hbDNNGetInputCount(&input_count, dnn_handle_),
"hbDNNGetInputCount failed");
RDK_CHECK_SUCCESS(
hbDNNGetInputTensorProperties(&input_properties_, dnn_handle_, 0),
"hbDNNGetInputTensorProperties failed");
if(input_count > 1){
std::cout << "Model input node is greater than 1, please check!" << std::endl;
return false;
}
if(input_properties_.tensorType == HB_DNN_IMG_TYPE_NV12){
std::cout << "Input tensor type: HB_DNN_IMG_TYPE_NV12" << std::endl;
}
else{
std::cout << "The input tensor type is not HB_DNN_IMG_TYPE_NV12, please check!" << std::endl;
return false;
}
if(input_properties_.tensorLayout == HB_DNN_LAYOUT_NCHW){
std::cout << "Input tensor data layout: HB_DNN_LAYOUT_NCHW" << std::endl;
}
else{
std::cout << "The input tensor data layout is not HB_DNN_LAYOUT_NCHW, please check!" << std::endl;
return false;
}
// Get input size
input_h_ = input_properties_.validShape.dimensionSize[2];
input_w_ = input_properties_.validShape.dimensionSize[3];
if (input_properties_.validShape.numDimensions == 4)
{
std::cout << "The input size is: (" << input_properties_.[0];
std::cout << ", " << input_properties_.[1];
std::cout << ", " << input_h_;
std::cout << ", " << input_w_ << ")" << std::endl;
}
else
{
std::cout << "The input size is not (1,3,640,640), please check!" << std::endl;
return false;
}
// Get the output information and adjust the output order
int32_t output_count = 0;
RDK_CHECK_SUCCESS(
hbDNNGetOutputCount(&output_count, dnn_handle_),
"hbDNNGetOutputCount failed");
//Allocate output tensor memory
output_tensors_ = new hbDNNTensor[output_count];
// =============== Adjust output header sequence mapping ===============
// YOLOv5 has 3 output heads, corresponding to 3 different scales of feature maps.
// Need to ensure that the output order is: small target (8x downsampling) -> medium target (16x downsampling) -> large target (32x downsampling)
//Initialize default order
output_order_[0] = 0; // Default 1st output
output_order_[1] = 1; // Default 2nd output
output_order_[2] = 2; // Default 3rd output
// Define the desired output feature map size and number of channels
int32_t expected_shapes[3][3] = {
{H_8, W_8, 3 * (5 + classes_num_)}, // Small target feature map: H/8 x W/8
{H_16, W_16, 3 * (5 + classes_num_)}, // Medium target feature map: H/16 x W/16
{H_32, W_32, 3 * (5 + classes_num_)} // Large target feature map: H/32 x W/32
};
// Iterate through each desired output scale
for(int i = 0; i < 3; i++) {
// Traverse the actual output nodes
for(int j = 0; j < 3; j++) {
// Get the properties of the current output node
hbDNNTensorProperties output_properties;
RDK_CHECK_SUCCESS(
hbDNNGetOutputTensorProperties(&output_properties, dnn_handle_, j),
"Get output tensor properties failed");
// Get the actual feature map size and number of channels
int32_t actual_h = output_properties.validShape.dimensionSize[1];
int32_t actual_w = output_properties.validShape.dimensionSize[2];
int32_t actual_c = output_properties.validShape.dimensionSize[3];
// If actual size and number of channels match expected
if(actual_h == expected_shapes[i][0] &&
actual_w == expected_shapes[i][1] &&
actual_c == expected_shapes[i][2]) {
//Record the correct output sequence
output_order_[i] = j;
break;
}
}
}
//Print out sequence mapping information
std::cout << "\n============ Output Order Mapping ============" << std::endl;
std::cout << "Small object (1/" << 8 << "): output[" << output_order_[0] << "]" << std::endl;
std::cout << "Medium object (1/" << 16 << "): output[" << output_order_[1] << "]" << std::endl;
std::cout << "Large object (1/" << 32 << "): output[" << output_order_[2] << "]" << std::endl;
std::cout << "==========================================\ n" << std::endl;
return true;
}
// Detection function implementation
bool BPU_Detect::Detect(const cv::Mat& input_img, cv::Mat& output_img) {
if(!is_initialized_) {
std::cout << "Please initialize first!" << std::endl;
return false;
}
auto total_start = std::chrono::high_resolution_clock::now();
#if ENABLE_DRAW
input_img.copyTo(output_img);
#endif
// Preprocessing time statistics
auto preprocess_start = std::chrono::high_resolution_clock::now();
if(!PreProcess(input_img)) {
return false;
}
auto preprocess_end = std::chrono::high_resolution_clock::now();
float preprocess_time = std::chrono::duration_cast<std::chrono::microseconds>(preprocess_end - preprocess_start).count() / 1000.0f;
//Inference time statistics
auto infer_start = std::chrono::high_resolution_clock::now();
if(!Inference()) {
return false;
}
auto infer_end = std::chrono::high_resolution_clock::now();
float infer_time = std::chrono::duration_cast<std::chrono::microseconds>(infer_end - infer_start).count() / 1000.0f;
// Post-processing time statistics
auto postprocess_start = std::chrono::high_resolution_clock::now();
if(!PostProcess()) {
return false;
}
auto postprocess_end = std::chrono::high_resolution_clock::now();
float postprocess_time = std::chrono::duration_cast<std::chrono::microseconds>(postprocess_end - postprocess_start).count() / 1000.0f;
// Draw result time statistics
auto draw_start = std::chrono::high_resolution_clock::now();
DrawResults(output_img);
auto draw_end = std::chrono::high_resolution_clock::now();
float draw_time = std::chrono::duration_cast<std::chrono::microseconds>(draw_end - draw_start).count() / 1000.0f;
//Total time statistics
auto total_end = std::chrono::high_resolution_clock::now();
float total_time = std::chrono::duration_cast<std::chrono::microseconds>(total_end - total_start).count() / 1000.0f;
//Print time statistics
std::cout << "\n============ Time Statistics ============" << std::endl;
std::cout << "Preprocess time: " << std::fixed << std::setprecision(2) << preprocess_time << " ms" << std::endl;
std::cout << "Inference time: " << std::fixed << std::setprecision(2) << infer_time << " ms" << std::endl;
std::cout << "Postprocess time: " << std::fixed << std::setprecision(2) << postprocess_time << " ms" << std::endl;
std::cout << "Draw time: " << std::fixed << std::setprecision(2) << draw_time << " ms" << std::endl;
std::cout << "Total time: " << std::fixed << std::setprecision(2) << total_time << " ms" << std::endl;
std::cout << "FPS: " << std::fixed << std::setprecision(2) << 1000.0f / total_time << std::endl;
std::cout << "======================================\n" << std::endl;
return true;
}
// Preprocessing implementation
bool BPU_Detect::PreProcess(const cv::Mat& input_img) {
//Use letterbox method for preprocessing
x_scale_ = std::min(1.0f * input_h_ / input_img.rows, 1.0f * input_w_ / input_img.cols);
y_scale_ = x_scale_;
int new_w = input_img.cols * x_scale_;
x_shift_ = (input_w_ - new_w) / 2;
int x_other = input_w_ - new_w - x_shift_;
int new_h = input_img.rows * y_scale_;
y_shift_ = (input_h_ - new_h) / 2;
int y_other = input_h_ - new_h - y_shift_;
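// Worked example (assuming a 1920x1080 camera frame and the 640x640 model input above):
// x_scale_ = min(640/1080, 640/1920) ≈ 0.333, the resized image is about 640x360,
// and roughly 140 rows of gray padding are added to the top and bottom (y_shift_ ≈ 140)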
cv::resize(input_img, resized_img_, cv::Size(new_w, new_h));
cv::copyMakeBorder(resized_img_, resized_img_, y_shift_, y_other,
x_shift_, x_other, cv::BORDER_CONSTANT, cv::Scalar(127, 127, 127));
//Convert to NV12 format
cv::Mat yuv_mat;
cv::cvtColor(resized_img_, yuv_mat, cv::COLOR_BGR2YUV_I420);
// Prepare to input tensor
hbSysAllocCachedMem(&input_tensor_.sysMem[0], int(3 * input_h_ * input_w_ / 2));
uint8_t* yuv = yuv_mat.ptr<uint8_t>();
uint8_t* ynv12 = (uint8_t*)input_tensor_.sysMem[0].virAddr;
// Calculate the height and width of the UV part, and the size of the Y part
int uv_height = input_h_ / 2;
int uv_width = input_w_ / 2;
int y_size = input_h_ * input_w_;
//Copy the Y component data to the input tensor
memcpy(ynv12, yuv, y_size);
// Get the UV component position in NV12 format
uint8_t* nv12 = ynv12 + y_size;
uint8_t* u_data = yuv + y_size;
uint8_t* v_data = u_data + uv_height * uv_width;
//Write U and V components alternately into NV12 format
for(int i = 0; i < uv_width * uv_height; i++) {
*nv12++ = *u_data++;
*nv12++ = *v_data++;
}
//Clear the memory cache to ensure that the data is ready for use by the model
hbSysFlushMem(&input_tensor_.sysMem[0], HB_SYS_MEM_CACHE_CLEAN);//Clear the cache to ensure data synchronization
return true;
}
//Inference implementation
bool BPU_Detect::Inference() {
//Initialize the task handle to nullptr
task_handle_ = nullptr;
//Initialize input tensor attributes
input_tensor_.properties = input_properties_;
// Get the output tensor attributes
for(int i = 0; i < 3; i++) {
hbDNNTensorProperties output_properties;
RDK_CHECK_SUCCESS(
hbDNNGetOutputTensorProperties(&output_properties, dnn_handle_, i),
"Get output tensor properties failed");
output_tensors_[i].properties = output_properties;
// Allocate memory for output
int out_aligned_size = output_properties.alignedByteSize;
RDK_CHECK_SUCCESS(
hbSysAllocCachedMem(&output_tensors_[i].sysMem[0], out_aligned_size),
"Allocate output memory failed");
}
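// Note: these output buffers are re-allocated on every Inference() call and only freed in Release();
// for a continuous video loop they could instead be allocated once during initialization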
hbDNNInferCtrlParam infer_ctrl_param;
HB_DNN_INITIALIZE_INFER_CTRL_PARAM(&infer_ctrl_param);
RDK_CHECK_SUCCESS(
hbDNNInfer(&task_handle_, &output_tensors_, &input_tensor_, dnn_handle_, &infer_ctrl_param),
"Model inference failed");
RDK_CHECK_SUCCESS(
hbDNNWaitTaskDone(task_handle_, 0),
"Wait task done failed");
return true;
}
// Post-processing implementation
bool BPU_Detect::PostProcess() {
//Clear the last result
bboxes_.clear();
scores_.clear();
indices_.clear();
//Resize
bboxes_.resize(classes_num_);
scores_.resize(classes_num_);
indices_.resize(classes_num_);
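// Convert the score threshold into a raw (pre-sigmoid) logit threshold:
// sigmoid(x) >= thr  <=>  x >= -log(1/thr - 1), so most anchors can be rejected without computing sigmoid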
float conf_thres_raw = -log(1 / score_threshold_ - 1);
// Process the output of three scales
ProcessFeatureMap(output_tensors_[0], H_8, W_8, s_anchors_, conf_thres_raw);
ProcessFeatureMap(output_tensors_[1], H_16, W_16, m_anchors_, conf_thres_raw);
ProcessFeatureMap(output_tensors_[2], H_32, W_32, l_anchors_, conf_thres_raw);
// Perform NMS for each category
for(int i = 0; i < classes_num_; i++) {
cv::dnn::NMSBoxes(bboxes_[i], scores_[i], score_threshold_,
nms_threshold_, indices_[i], 1.f, nms_top_k_); // eta = 1.f disables adaptive threshold decay
}
return true;
}
//Print detection results implementation
void BPU_Detect::PrintResults() const {
//Print the overall information of the test results
int total_detections = 0;
for(int cls_id = 0; cls_id < classes_num_; cls_id++) {
total_detections += indices_[cls_id].size();
}
std::cout << "\n============ Detection Results ============" << std::endl;
std::cout << "Total detections: " << total_detections << std::endl;
for(int cls_id = 0; cls_id < classes_num_; cls_id++) {
if(!indices_[cls_id].empty()) {
std::cout << "\nClass: " << class_names_[cls_id] << std::endl;
std::cout << "Number of detections: " << indices_[cls_id].size() << std::endl;
std::cout << "Details:" << std::endl;
for(size_t i = 0; i < indices_[cls_id].size(); i++) {
int idx = indices_[cls_id][i];
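// Map the box from the letterboxed model coordinates back to the original image:
// subtract the padding offset, then divide by the resize scale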
float x1 = (bboxes_[cls_id][idx].x - x_shift_) / x_scale_;
float y1 = (bboxes_[cls_id][idx].y - y_shift_) / y_scale_;
float x2 = x1 + (bboxes_[cls_id][idx].width) / x_scale_;
float y2 = y1 + (bboxes_[cls_id][idx].height) / y_scale_;
float score = scores_[cls_id][idx];
//Print detailed information of each detection frame
std::cout << " Detection " << i + 1 << ":" << std::endl;
std::cout << " Position: (" << x1 << ", " << y1 << ") to (" << x2 << ", " << y2 << ")" << std:: endl;
std::cout << " Confidence: " << std::fixed << std::setprecision(2) << score * 100 << "%" << std::endl;
}
}
}
std::cout << "========================================\n" << std::endl;
}
// Drawing result implementation
void BPU_Detect::DrawResults(cv::Mat& img) {
#if ENABLE_DRAW
for(int cls_id = 0; cls_id < classes_num_; cls_id++) {
if(!indices_[cls_id].empty()) {
for(size_t i = 0; i < indices_[cls_id].size(); i++) {
int idx = indices_[cls_id][i];
float x1 = (bboxes_[cls_id][idx].x - x_shift_) / x_scale_;
float y1 = (bboxes_[cls_id][idx].y - y_shift_) / y_scale_;
float x2 = x1 + (bboxes_[cls_id][idx].width) / x_scale_;
float y2 = y1 + (bboxes_[cls_id][idx].height) / y_scale_;
float score = scores_[cls_id][idx];
// draw bounding box
cv::rectangle(img, cv::Point(x1, y1), cv::Point(x2, y2),
cv::Scalar(255, 0, 0), line_size_);
// draw labels
std::string text = class_names_[cls_id] + ": " +
std::to_string(static_cast<int>(score * 100)) + "%";
cv::putText(img, text, cv::Point(x1, y1 - 5),
cv::FONT_HERSHEY_SIMPLEX, font_size_,
cv::Scalar(0, 0, 255), font_thickness_, cv::LINE_AA);
}
}
}
#endif
//Print test results
PrintResults();
}
// Feature map processing auxiliary function
void BPU_Detect::ProcessFeatureMap(hbDNNTensor& output_tensor,
int height, int width,
const std::vector<std::pair<double, double>>& anchors,
float conf_thres_raw) {
// Check the quantization type
if (output_tensor.properties.quantiType != NONE) {
std::cout << "Output tensor quantization type should be NONE!" << std::endl;
return;
}
// refresh memory
hbSysFlushMem(&output_tensor.sysMem[0], HB_SYS_MEM_CACHE_INVALIDATE);
//Get the output data pointer
auto* raw_data = reinterpret_cast<float*>(output_tensor.sysMem[0].virAddr);
// Traverse each position of the feature map
for(int h = 0; h < height; h++) {
for(int w = 0; w < width; w++) {
for(const auto& anchor : anchors) {
// Get prediction data for the current location
float* cur_raw = raw_data;
raw_data += (5 + classes_num_);
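// Each anchor at each grid cell occupies (5 + classes_num_) consecutive floats:
// [tx, ty, tw, th, objectness, class_0 ... class_{N-1}]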
// Conditional probability filtering
if(cur_raw[4] < conf_thres_raw) continue;
// Find the maximum class probability
int cls_id = 5;
int end = classes_num_ + 5;
for(int i = 6; i < end; i++) {
if(cur_raw[i] > cur_raw[cls_id]) {
cls_id = i;
}
}
// Calculate final score
float score = 1.0f / (1.0f + std::exp(-cur_raw[4])) *
1.0f / (1.0f + std::exp(-cur_raw[cls_id]));
// score filter
if(score < score_threshold_) continue;
cls_id -= 5;
// decode bounding box
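// YOLOv5 decode: center = (2*sigmoid(t_xy) - 0.5 + grid) * stride, size = (2*sigmoid(t_wh))^2 * anchor_wh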
float stride = input_h_ / height;
float center_x = ((1.0f / (1.0f + std::exp(-cur_raw[0]))) * 2 - 0.5f + w) * stride;
float center_y = ((1.0f / (1.0f + std::exp(-cur_raw[1]))) * 2 - 0.5f + h) * stride;
float bbox_w = std::pow((1.0f / (1.0f + std::exp(-cur_raw[2]))) * 2, 2) * anchor.first; // anchor.first: anchor width
float bbox_h = std::pow((1.0f / (1.0f + std::exp(-cur_raw[3]))) * 2, 2) * anchor.second; // anchor.second: anchor height
float bbox_x = center_x - bbox_w / 2.0f;
float bbox_y = center_y - bbox_h / 2.0f;
//Save test results
bboxes_[cls_id].push_back(cv::Rect2d(bbox_x, bbox_y, bbox_w, bbox_h));
scores_[cls_id].push_back(score);
}
}
}
}
//Release resource implementation
bool BPU_Detect::Release() {
if(!is_initialized_) {
return true;
}
// Release task
if(task_handle_) {
hbDNNReleaseTask(task_handle_);
task_handle_ = nullptr;
}
try {
// Release input memory
if(input_tensor_.sysMem[0].virAddr) {
hbSysFreeMem(&(input_tensor_.sysMem[0]));
}
// Release output memory
for(int i = 0; i < 3; i++) {
if(output_tensors_ && output_tensors_[i].sysMem[0].virAddr) {
hbSysFreeMem(&(output_tensors_[i].sysMem[0]));
}
}
if(output_tensors_) {
delete[] output_tensors_;
output_tensors_ = nullptr;
}
// release model
if(packed_dnn_handle_) {
hbDNNRelease(packed_dnn_handle_);
packed_dnn_handle_ = nullptr;
}
} catch(const std::exception& e) {
std::cout << "Exception during release: " << () << std::endl;
}
is_initialized_ = false;
return true;
}
//Modify main function
int main() {
//Create detector instance
BPU_Detect detector;
// initialization
if (!detector.Init()) {
std::cout << "Failed to initialize detector" << std::endl;
return -1;
}
#if DETECT_MODE == 0
//Single picture detection mode
std::cout << "Single image detection mode" << std::endl;
//Read test image
cv::Mat input_img = cv::imread("/root/Deep_Learning/YOLOv5/imgs/tennis_1_frame_0001.jpg");
if (input_img.empty()) {
std::cout << "Failed to load image" << std::endl;
return -1;
}
//Execute detection
cv::Mat output_img;
#if ENABLE_DRAW
if (!detector.Detect(input_img, output_img)) {
std::cout << "Detection failed" << std::endl;
return -1;
}
// save results
cv::imwrite("cpp_result.jpg", output_img);
#else
if (!detector.Detect(input_img, output_img)) {
std::cout << "Detection failed" << std::endl;
return -1;
}
#endif
#else
// Real-time detection mode
std::cout << "Real-time detection mode" << std::endl;
//Open camera
cv::VideoCapture cap(0);
if (!cap.isOpened()) {
std::cout << "Failed to open camera" << std::endl;
return -1;
}
cv::Mat frame, output_frame;
while (true) {
// read a frame
cap >> frame;
if (frame.empty()) {
std::cout << "Failed to read frame" << std::endl;
break;
}
//Execute detection
if (!detector.Detect(frame, output_frame)) {
std::cout << "Detection failed" << std::endl;
break;
}
#if ENABLE_DRAW
//display results
cv::imshow("Real-time Detection", output_frame);
// Press 'q' to exit
if (cv::waitKey(1) == 'q') {
break;
}
#endif
}
#if ENABLE_DRAW
// Release the camera
cap.release();
cv::destroyAllWindows();
#endif
#endif
// Release resources
detector.Release();
return 0;
}