
Raspberry Pi AI Developer's Guide (II): Real-Time Object Detection with Python and HailoRT, by Joyce Zhang


Contents
  • Python Environment Configuration
  • Implementing Object Detection with a USB Camera
  • References

In the previous blog post, rpicam-apps was configured through a JSON file to run the object detection sample program. While that approach produces working detection, it does not let developers use the detection results directly in their own code. This post therefore takes an in-depth look at how to call the Neural Processing Unit (NPU) through the HailoRT Python API to implement object detection in a Python program.

Python Environment Configuration

The hailo-all package installed in the previous post already contains all the components needed for the Hailo NPU. Depending on your hardware and operating system, however, a separate driver installation or update may be required. For non-Raspberry Pi devices, or when you run into incompatible driver versions, visit Hailo's website at /developer-zone/software-downloads and download the driver that matches your system.

For example, if you are running Ubuntu on the arm64 architecture and need version 4.19.0 of the driver, download the corresponding PCIe driver package and HailoRT package, then run the following commands to complete the installation:

sudo apt purge -y hailo-all # uninstall the existing integrated package
sudo dpkg -i hailort-pcie-driver_4.19.0_all.deb # install the new PCIe driver
sudo dpkg -i hailort_4.19.0_arm64.deb # install HailoRT
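
After installation, you can check that the driver and device are working. A quick sanity check, assuming the hailortcli tool that ships with HailoRT is on your PATH:

hailortcli fw-control identify # should report the device architecture, serial number, and firmware version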

To call the NPU from Python, you also need to install the Python library. The corresponding .whl file can likewise be found on the official Hailo website; then create a virtual environment and install the necessary packages as follows:

conda create -n hailort python=3.10 # create a virtual environment
conda activate hailort # activate the virtual environment
pip install hailort-4.19.0-cp310-cp310-linux_aarch64.whl # install the HailoRT Python package
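
To confirm that the package imports correctly and can see the NPU, a minimal check (assuming the PCIe driver from the previous step is loaded):

python -c "from hailo_platform import Device; print(Device.scan())" # should list at least one Hailo device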

You also need to install OpenCV to process the images. Since OpenCV cannot read the Raspberry Pi's CSI camera, additionally install picamera2 and rpi-libcamera if you need to use one:

pip install opencv-python
pip install picamera2 rpi-libcamera
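
If you do use a CSI camera, frames can be captured with picamera2 instead of OpenCV's VideoCapture. A minimal sketch, in which the 640x480 RGB888 configuration is only an example:

from picamera2 import Picamera2

picam2 = Picamera2()
# Configure a preview stream; RGB888 yields a 3-channel array that OpenCV can consume
picam2.configure(picam2.create_preview_configuration(main={"format": "RGB888", "size": (640, 480)}))
picam2.start()
frame = picam2.capture_array()  # numpy array of shape (480, 640, 3)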

Implementing Object Detection with a USB Camera

To make object detection practical, the live video stream captured by the camera is used as input, and the deep learning model is applied to every frame to recognize objects. Whether or not a Hailo-8 is used for the detection itself, the code follows the same basic steps:

  1. Turn on the camera;
  2. Load the target detection model;
  3. Processes the video stream and displays the results.

A basic code framework is provided here; the two TODOs will be filled in step by step below.

import cv2

# TODO: load the model

# Open the default camera
cap = cv2.VideoCapture(0)

while True:
    # Read a frame
    ret, frame = cap.read()
    if not ret:
        break

    # TODO: run inference

    # Show the frame
    cv2.imshow('Detections', frame)

    # Press 'q' to exit the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the camera and close the window
cap.release()
cv2.destroyAllWindows()

Let's start with the first TODO: loading the model. First, import the necessary HailoRT classes at the top of the file.

import numpy as np
from hailo_platform import HEF, Device, VDevice, InputVStreamParams, OutputVStreamParams, FormatType, HailoStreamInterface, InferVStreams, ConfigureParams

Models that run on the Hailo NPU are .hef files. Hailo's GitHub repository /hailo-ai/hailo_model_zoo provides pre-compiled versions of most mainstream models, which can be downloaded and used directly. Here we use YOLOv8s as a test.
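
Once downloaded, the HEF can be inspected from the command line. A sketch, assuming the file was saved as yolov8s.hef (the actual download link is on the model zoo page):

hailortcli parse-hef yolov8s.hef # prints the network's input and output streams and their shapes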

# COCO dataset labels
class_names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
               'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
               'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
               'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
               'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
               'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
               'hair drier', 'toothbrush']
# Load the YOLOv8s model
hef_path = ''  # path to the downloaded .hef file
hef = HEF(hef_path)

Once the model is loaded, some configuration of the Hailo device is required.

# Initialize the Hailo device
devices = Device.scan()
target = VDevice(device_ids=devices)
# Configure the network group
configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
network_group = target.configure(hef, configure_params)[0]
network_group_params = network_group.create_params()
# Get input and output stream information
input_vstream_info = hef.get_input_vstream_infos()[0]
output_vstream_info = hef.get_output_vstream_infos()[0]
# Create input and output virtual stream parameters
input_vstreams_params = InputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=FormatType.FLOAT32)
output_vstreams_params = OutputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=FormatType.FLOAT32)
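
Before wiring up the camera, it can help to print the stream metadata just obtained and confirm the input resolution the model expects (for YOLOv8s this is typically 640x640x3):

print(f'Input:  {input_vstream_info.name} {input_vstream_info.shape}')
print(f'Output: {output_vstream_info.name} {output_vstream_info.shape}')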

With the first TODO complete, we turn to the second: running inference. Before inference, each frame fed to the model must be resized to the model's input dimensions.

# Preprocess the image; cv2.resize takes (width, height)
resized_frame = cv2.resize(frame, (input_vstream_info.shape[1], input_vstream_info.shape[0]))
input_data = {input_vstream_info.name: np.expand_dims(resized_frame, axis=0).astype(np.float32)}
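
Note that OpenCV delivers frames in BGR channel order, while most YOLO models are trained on RGB input. If your detections look off, converting the frame before resizing may help:

frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # swap BGR -> RGB before feeding the model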

After the image has been resized, inference is run with the infer() method. The tf_nms_format parameter controls the form of the output. With the default value False, results are returned in Hailo format: a list in which each element holds the detections for one class, shaped [number_of_detections, BBOX_PARAMS]. When set to True, results are returned in TensorFlow format, shaped [class_count, BBOX_PARAMS, detections_count]. The code below uses the TensorFlow format.
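
If you instead keep the default Hailo format, the per-class list can be walked directly. A minimal sketch based on the layout just described, where detections_for_frame is a hypothetical name for one frame's output list:

# detections_for_frame: a list with one entry per class, each an array of
# shape [num_detections, BBOX_PARAMS] (hypothetical variable name)
for class_id, class_dets in enumerate(detections_for_frame):
    for det in class_dets:
        x1, y1, x2, y2, confidence = det[:5]  # normalized values, unpacked as in the parsing code below
        if confidence > 0.5:
            print(f'{class_names[class_id]}: {confidence:.2f}')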

# Create the input/output virtual streams and run inference
with InferVStreams(network_group, input_vstreams_params, output_vstreams_params, tf_nms_format=True) as infer_pipeline:
    with network_group.activate(network_group_params):
        output_data = infer_pipeline.infer(input_data)

After inference, the results must be parsed. In either format the BBOX_PARAMS values are normalized, so we compute the ratio of the original frame to the model input, de-normalize the results, and then draw the detection boxes.

colors = np.random.uniform(0, 255, size=(len(class_names), 3))

# Draw the detection boxes from the parsed coordinates
def draw_bboxes(image, bboxes, confidences, class_ids, class_names, colors):
    for i, bbox in enumerate(bboxes):
        x1, y1, x2, y2 = bbox
        label = f'{class_names[class_ids[i]]}: {confidences[i]:.2f}'
        color = colors[class_ids[i]]
        cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
        cv2.putText(image, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

# Scale factors from model input size back to the original frame
scale_x = frame.shape[1] / input_vstream_info.shape[1]
scale_y = frame.shape[0] / input_vstream_info.shape[0]

# Extract box coordinates, class, and confidence, and draw them on the original frame
for key in output_data.keys():
    num_classes, bbox_params, num_detections = output_data[key][0].shape

    boxes = []
    confidences = []
    class_ids = []

    for class_id in range(num_classes):
        for detection_id in range(num_detections):
            bbox = output_data[key][0][class_id, :, detection_id]
            if bbox[4] > 0.5:
                x1, y1, x2, y2, confidence = bbox[:5]

                x1 = int(x1 * input_vstream_info.shape[0] * scale_x)
                y1 = int(y1 * input_vstream_info.shape[1] * scale_y)
                x2 = int(x2 * input_vstream_info.shape[0] * scale_x)
                y2 = int(y2 * input_vstream_info.shape[1] * scale_y)

                print(f'{class_names[class_id]}: {[x1, y1, x2, y2]} {bbox[:5]}')

                boxes.append([x1, y1, x2, y2])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    draw_bboxes(frame, boxes, confidences, class_ids, class_names, colors)

At this point the second TODO is implemented as well. The complete program is as follows:

import cv2
import numpy as np
from hailo_platform import HEF, Device, VDevice, InputVStreamParams, OutputVStreamParams, FormatType, HailoStreamInterface, InferVStreams, ConfigureParams

class_names = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
               'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
               'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
               'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
               'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
               'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
               'hair drier', 'toothbrush']

colors = np.random.uniform(0, 255, size=(len(class_names), 3))

# Draw the detection boxes from the parsed coordinates
def draw_bboxes(image, bboxes, confidences, class_ids, class_names, colors):
    for i, bbox in enumerate(bboxes):
        x1, y1, x2, y2 = bbox
        label = f'{class_names[class_ids[i]]}: {confidences[i]:.2f}'
        color = colors[class_ids[i]]
        cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)
        cv2.putText(image, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)

# Load the YOLOv8s model
hef_path = ''  # path to the downloaded .hef file
hef = HEF(hef_path)
# Initialize the Hailo device
devices = Device.scan()
target = VDevice(device_ids=devices)
# Configure the network group
configure_params = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
network_group = target.configure(hef, configure_params)[0]
network_group_params = network_group.create_params()
# Get input and output stream information
input_vstream_info = hef.get_input_vstream_infos()[0]
output_vstream_info = hef.get_output_vstream_infos()[0]
# Create input and output virtual stream parameters
input_vstreams_params = InputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=FormatType.FLOAT32)
output_vstreams_params = OutputVStreamParams.make_from_network_group(network_group, quantized=False, format_type=FormatType.FLOAT32)

# Use camera 0 as the video source
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Preprocess the image; cv2.resize takes (width, height)
    resized_frame = cv2.resize(frame, (input_vstream_info.shape[1], input_vstream_info.shape[0]))
    input_data = {input_vstream_info.name: np.expand_dims(resized_frame, axis=0).astype(np.float32)}
    # Create the input/output virtual streams and run inference
    with InferVStreams(network_group, input_vstreams_params, output_vstreams_params, tf_nms_format=True) as infer_pipeline:
        with network_group.activate(network_group_params):
            output_data = infer_pipeline.infer(input_data)

    # Scale factors from model input size back to the original frame
    scale_x = frame.shape[1] / input_vstream_info.shape[1]
    scale_y = frame.shape[0] / input_vstream_info.shape[0]

    # Extract box coordinates, class, and confidence, and draw them on the original frame
    for key in output_data.keys():
        num_classes, bbox_params, num_detections = output_data[key][0].shape

        boxes = []
        confidences = []
        class_ids = []

        for class_id in range(num_classes):
            for detection_id in range(num_detections):
                bbox = output_data[key][0][class_id, :, detection_id]
                if bbox[4] > 0.5:
                    x1, y1, x2, y2, confidence = bbox[:5]

                    x1 = int(x1 * input_vstream_info.shape[0] * scale_x)
                    y1 = int(y1 * input_vstream_info.shape[1] * scale_y)
                    x2 = int(x2 * input_vstream_info.shape[0] * scale_x)
                    y2 = int(y2 * input_vstream_info.shape[1] * scale_y)

                    print(f'{class_names[class_id]}: {[x1, y1, x2, y2]} {bbox[:5]}')

                    boxes.append([x1, y1, x2, y2])
                    confidences.append(float(confidence))
                    class_ids.append(class_id)

        draw_bboxes(frame, boxes, confidences, class_ids, class_names, colors)

    cv2.imshow('Detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()

When run, the program opens a window showing the live camera feed with detection boxes and labels drawn on each frame, and prints the detected classes and coordinates to the terminal.

Hailo's GitHub repository also provides other kinds of example applications; for more usage, see /hailo-ai/Hailo-Application-Code-Examples and the official documentation.

References

  1. Hailo Documentation: /developer-zone/documentation/
  2. Hailo Application Code Examples: /hailo-ai/Hailo-Application-Code-Examples