- Python Environment Configuration
- Conversion
- Quantization
- Compilation
- References
In the previous blog, we explored how to use Python and the pre-compiled models from the hailo_model_zoo to perform object detection on the Hailo NPU. This blog takes an in-depth look at how to convert and optimize a custom trained model into a .hef model that can run efficiently on the Hailo NPU.
Python Environment Configuration
To compile a custom model into a .hef model, the Hailo Dataflow Compiler (DFC) tool needs to be installed. Log in to Hailo's website at /developer-zone/software-downloads, find the .whl file matching your Python version, and follow the steps below to create a virtual environment and install the necessary packages:
conda create -n hailo-model python=3.10 # Creating a Virtual Environment
conda activate hailo-model # Activate the virtual environment
sudo apt install libgraphviz-dev
pip install hailo_dataflow_compiler-3.29.0-py3-none-linux_x86_64.whl # Install the Hailo Dataflow Compiler Python package
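If the installation succeeds, the hailo_sdk_client package used throughout the rest of this blog should be importable from the new environment. A minimal sanity check in Python:
from hailo_sdk_client import ClientRunner  # Import fails if the DFC package is not installed correctly
print("Hailo Dataflow Compiler is ready")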
Converting a custom model to a .hef model requires three steps (a condensed sketch of the whole flow is shown after the list):
- Convert the TensorFlow or ONNX model to a Hailo Archive model (.har).
- Quantize the .har model.
- Compile it to a Hailo Executable File model (.hef).
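At a glance, these three steps correspond to the ClientRunner calls that the following sections walk through in detail. The outline below is only a sketch; the file names are placeholders and the intermediate .har files saved in the detailed steps are omitted:
from hailo_sdk_client import ClientRunner
import numpy as np

runner = ClientRunner(hw_arch="hailo8l")
# 1. Convert the ONNX model to a Hailo Archive (.har) representation
runner.translate_onnx_model(model="yolov8n.onnx", net_name="yolov8n")
# 2. Quantize the model using a calibration dataset (prepared later in this blog)
runner.optimize(np.load("calib_set.npy"))
# 3. Compile to a Hailo Executable File (.hef)
with open("yolov8n.hef", "wb") as f:
    f.write(runner.compile())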
Conversion
Both TensorFlow and ONNX models can be converted. Here the yolov8n ONNX model is used as an example. First, import the packages and define the relevant variables.
from hailo_sdk_client import ClientRunner
import os
import cv2
import numpy as np
input_size = 640 # size of model inputs
chosen_hw_arch = "hailo8l" # Architecture of the Hailo hardware to be used, in this case Hailo-8L.
onnx_model_name = "yolov8n" # model name
onnx_path = "" # Path to the model
hailo_model_har_path = f"{onnx_model_name}_hailo_model.har" # Path to where the converted model is kept
hailo_quantized_har_path = f"{onnx_model_name}_hailo_quantized_model.har" # Save path for quantized model
hailo_model_hef_path = f"{onnx_model_name}.hef" # Save path for compiled model
Then instantiate the ClientRunner class and call the translate_onnx_model() method to perform the conversion.
runner = ClientRunner(hw_arch=chosen_hw_arch)
hn, npz = runner.translate_onnx_model(model=onnx_path, net_name=onnx_model_name) # Convert the ONNX model to HAR
runner.save_har(hailo_model_har_path) # Saving the converted model
When the model structure is relatively simple, the conversion usually succeeds without errors. When the model structure is more complex, it may contain operators that the Hailo NPU does not support, which can cause the conversion to fail; you can check the datasheet on the official website, or see the link in the references below. For example, when converting a YOLOv8 model, the following error message is displayed:
hailo_sdk_client.model_translator.: Parsing failed. The errors found in the graph are:
UnsupportedShuffleLayerError in op /model.22/dfl/Reshape: Failed to determine type of layer to create in node /model.22/dfl/Reshape
Please try to parse the model again, using these end node names: /model.22/Concat_3
There are two solutions when such an error occurs. The first is to use Netron to inspect the model structure based on the error message and modify the original model to remove or replace the operators that the Hailo NPU does not support. The second is to follow the workaround recommended in the error message and bypass the unsupported operators during the conversion, in which case the translate_onnx_model() method needs to be passed additional parameters:
- start_node_names: the names of the nodes in the original model (corresponding to the inputs of the new model) where the conversion starts.
- end_node_names: the names of the nodes in the original model (corresponding to the outputs of the new model) where the conversion stops.
- net_input_shapes: the input shapes at start_node_names, such as the common [b, c, h, w].
The node names can be viewed using Netron, or printed by traversing the model with the following program.
import onnx

onnx_path = ""
model = onnx.load(onnx_path)
print("Input Nodes:")
for input in model.graph.input:
    print(input.name)
print("Output Nodes:")
for output in model.graph.output:
    print(output.name)
print("Nodes:")
for node in model.graph.node:
    print(node.name)
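Continuing from the listing above, the node list can also be filtered by operator type to locate the operator named in the error message, for example the Reshape from the YOLOv8 error:
# Find all nodes of a given operator type, e.g. the Reshape reported in the error above
reshape_nodes = [node.name for node in model.graph.node if node.op_type == "Reshape"]
print(reshape_nodes)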
Based on the above error message, the end node of the conversion should be changed to /model.22/Concat_3. The revised program is as follows:
hn, npz = runner.translate_onnx_model(model=onnx_path, net_name=onnx_model_name, start_node_names=["images"], end_node_names=["/model.22/Concat_3"], net_input_shapes={"images": [1, 3, input_size, input_size]})
The program now executes without errors, but the final compilation step fails because the Hailo NPU does not have enough memory. Looking at the logs output during the conversion:
[info] Translation started on ONNX model yolov8n
[info] Restored ONNX model yolov8n (completion time: 00:00:00.06)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.21)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.22/cv2.0/cv2.0.2/Conv /model.22/cv3.0/cv3.0.2/Conv /model.22/cv2.1/cv2.1.2/Conv /model.22/cv3.1/cv3.1.2/Conv /model.22/cv2.2/cv2.2.2/Conv /model.22/cv3.2/cv3.2.2/Conv.
...
The log suggests changing the end nodes of the conversion to /model.22/cv2.0/cv2.0.2/Conv, /model.22/cv3.0/cv3.0.2/Conv, /model.22/cv2.1/cv2.1.2/Conv, /model.22/cv3.1/cv3.1.2/Conv, /model.22/cv2.2/cv2.2.2/Conv, and /model.22/cv3.2/cv3.2.2/Conv, that is, cutting the model off before the NMS processing. A check of the Hailo developer forums confirms that the Hailo NPU cannot perform NMS operations, so this part has to run on the CPU. Hailo's GitHub repository provides the end node names for the mainstream models; see the link in the references below for details. The program was eventually modified to:
hn, npz = runner.translate_onnx_model(model=onnx_path, net_name=onnx_model_name, start_node_names=["images"], end_node_names=["/model.22/cv2.0/cv2.0.2/Conv", "/model.22/cv3.0/cv3.0.2/Conv", "/model.22/cv2.1/cv2.1.2/Conv", "/model.22/cv3.1/cv3.1.2/Conv", "/model.22/cv2.2/cv2.2.2/Conv", "/model.22/cv3.2/cv3.2.2/Conv"], net_input_shapes={"images": [1, 3, input_size, input_size]})
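Because the conversion now stops before the NMS stage, the detections produced by the compiled model have to be decoded and filtered on the host CPU at inference time. As a rough illustration only (not the actual YOLOv8 decoding of the six output tensors, which is more involved), a plain-NumPy NMS step could look like this:
import numpy as np

def nms(boxes, scores, iou_threshold=0.45):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    # Returns the indices of the boxes that are kept.
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # Process boxes from highest to lowest score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current box with the remaining ones
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap the current one too much
        order = order[1:][iou <= iou_threshold]
    return keep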
Quantization
Model quantization is the process of converting the weights and activation values (outputs) of a deep learning model from high-precision floating-point numbers (e.g., float32) to low-precision data types (e.g., int8) in order to reduce the model's storage requirements, speed up inference, and reduce power consumption. This step is particularly important when deploying deep learning models on edge devices. Post-training quantization is used here, i.e., quantization is performed directly on the already trained model without retraining or fine-tuning, at the cost of a possible small loss of accuracy.
The calibration dataset used for quantization needs to be prepared first. It is mainly used to determine the quantization parameters so that quantization affects model performance as little as possible. The quality of the calibration dataset directly affects the final performance of the quantized model, so it should cover as many data variations as possible to ensure that the quantized model generalizes well under different conditions. The calibration dataset does not require labels; it is only used to collect statistics of the activation values at each layer, such as the minimum, maximum, mean, and standard deviation. These statistics determine how best to map floating-point numbers to integers while maintaining model performance, a process that needs only the distributional properties of the input data, not their labels.
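To make the role of these statistics concrete, here is a toy sketch of mapping a tensor to int8 from its observed value range (a simplified symmetric scheme for illustration only; the scheme actually used by the Dataflow Compiler may differ):
import numpy as np

def quantize_int8(x):
    # Map the largest observed magnitude to 127 and round everything else onto the int8 grid
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

activations = np.random.randn(1000).astype(np.float32)  # Stand-in for collected activation values
q, scale = quantize_int8(activations)
print("max abs error:", np.max(np.abs(q.astype(np.float32) * scale - activations)))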
The YOLOv8 model used in this blog was trained on the COCO dataset, which is used below as an example for preparing the calibration dataset.
images_path = "data/images" # Dataset Image Path
dataset_output_path = "calib_set.npy" # Save path after processing is complete
images_list = [img_name for img_name in os.listdir(images_path) if os.path.splitext(img_name)[1] in [".jpg", ".png", ".bmp"]][:1500] # Get the list of image names
calib_dataset = np.zeros((len(images_list), input_size, input_size, 3)) # Initialize the numpy array
for idx, img_name in enumerate(sorted(images_list)):
    img = cv2.imread(os.path.join(images_path, img_name))
    resized = cv2.resize(img, (input_size, input_size)) # Resize the original image to the model input size
    calib_dataset[idx, :, :, :] = np.array(resized)
np.save(dataset_output_path, calib_dataset)
Then instantiate the ClientRunner class and call the optimize() method to perform the quantization.
calib_dataset = np.load(dataset_output_path)
runner = ClientRunner(har=hailo_model_har_path)
runner.optimize(calib_dataset) # Quantize the model
runner.save_har(hailo_quantized_har_path) # Save the quantized model
Parameters can also be set for the quantization process through a model script, for example model_optimization_flavor() to set the optimization and compression levels, and resources_param() to set the amount of resources the model may use. The hailo_model_zoo repository provides parameter scripts for the mainstream models; see the link in the references below. An example program is listed below.
alls_lines = [
'model_optimization_flavor(optimization_level=1, compression_level=2)',
'resources_param(max_control_utilization=0.6, max_compute_utilization=0.6, max_memory_utilization=0.6)',
'performance_param(fps=5)'
]
runner.load_model_script('\n'.join(alls_lines))
runner.optimize(calib_dataset)
Compilation
Finally, use the compile() method to complete the compilation of the model.
runner = ClientRunner(har=hailo_quantized_har_path)
compiled_hef = runner.compile()
with open(hailo_model_hef_path, "wb") as f:
    f.write(compiled_hef)
The full program is below.
from hailo_sdk_client import ClientRunner
import os
import cv2
import numpy as np
input_size = 640 # Dimensions of the model input
chosen_hw_arch = "hailo8l" # Architecture of the Hailo hardware to be used, in this case Hailo-8L
onnx_model_name = "yolov8n" # Name of the model
onnx_path = "" # The path of the model
hailo_model_har_path = f"{onnx_model_name}_hailo_model.har" # Save path of the converted model
hailo_quantized_har_path = f"{onnx_model_name}_hailo_quantized_model.har" # Save path of the quantized model
hailo_model_hef_path = f"{onnx_model_name}.hef" # Save path for compiled models
images_path = "data/images" # Dataset Image Path
# Convert the ONNX model to HAR
runner = ClientRunner(hw_arch=chosen_hw_arch)
hn, npz = runner.translate_onnx_model(model=onnx_path, net_name=onnx_model_name, start_node_names=["images"], end_node_names=["/model.22/cv2.0/cv2.0.2/Conv", "/model.22/cv3.0/cv3.0.2/Conv", "/model.22/cv2.1/cv2.1.2/Conv", "/model.22/cv3.1/cv3.1.2/Conv", "/model.22/cv2.2/cv2.2.2/Conv", "/model.22/cv3.2/cv3.2.2/Conv"], net_input_shapes={"images": [1, 3, input_size, input_size]})
runner.save_har(hailo_model_har_path)
# Calibration data set preparation
images_list = [img_name for img_name in os.listdir(images_path) if os.path.splitext(img_name)[1] in [".jpg", ".png", ".bmp"]][:1500] # Get the list of image names
calib_dataset = np.zeros((len(images_list), input_size, input_size, 3)) # Initialize the numpy array
for idx, img_name in enumerate(sorted(images_list)):
    img = cv2.imread(os.path.join(images_path, img_name))
    resized = cv2.resize(img, (input_size, input_size)) # Resize the original image to the model input size
    calib_dataset[idx, :, :, :] = np.array(resized)
# Quantize the model
runner = ClientRunner(har=hailo_model_har_path)
alls_lines = [
'model_optimization_flavor(optimization_level=1, compression_level=2)',
'resources_param(max_control_utilization=0.6, max_compute_utilization=0.6, max_memory_utilization=0.6)',
'performance_param(fps=5)'
]
runner.load_model_script('\n'.join(alls_lines))
runner.optimize(calib_dataset)
runner.save_har(hailo_quantized_har_path)
# Compile to hef
runner = ClientRunner(har=hailo_quantized_har_path)
compiled_hef = runner.compile()
with open(hailo_model_hef_path, "wb") as f:
    f.write(compiled_hef)
References
- Supported operators - Hailo Community: /t/supported-operators/5046/2
- hailo_model_zoo - GitHub: /hailo-ai/hailo_model_zoo/tree/master/hailo_model_zoo/cfg/networks
- Dataflow Compiler v3.29.0: /developer-zone/documentation/dataflow-compiler-v3-29-0