Here are the steps to configure RoCEv2 on Kylin Server V10:
Step 1: Confirm hardware and driver support
Before you begin, make sure the server hardware meets the requirements. RoCEv2 usually requires a Mellanox ConnectX series network card (for example, one driven by mlx5) with the latest OFED driver package installed. You can check the driver status with the following commands:
modinfo mlx5_core # View kernel module information
lspci | grep Mellanox # Confirm network card model
If the driver is not loaded correctly, download the matching driver version from the Mellanox official website and install it.
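The driver check above can also be scripted. The following is a minimal sketch, assuming mlx5_core is built as a loadable module (so it shows up in /proc/modules); the function names mlx5_loaded and report_mlx5 are made up for this example.

```shell
#!/bin/sh
# Return 0 if mlx5_core appears in the module list, 1 otherwise.
# The list file defaults to /proc/modules; the parameter only exists so the
# check can be tried against a saved copy of that file.
mlx5_loaded() {
    modules_file="${1:-/proc/modules}"
    grep -q '^mlx5_core ' "$modules_file"
}

# Print a one-line status for the driver (hypothetical helper).
report_mlx5() {
    if mlx5_loaded "$@"; then
        echo "mlx5_core: loaded"
    else
        echo "mlx5_core: not loaded (try: modprobe mlx5_core)"
    fi
}
```

Calling report_mlx5 with no arguments checks the running system.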
Step 2: Switch the network card to RoCEv2 mode
By default, RDMA may run in RoCEv1 mode (Ethernet layer 2 only), while RoCEv2 runs over layer 3 (UDP/IP), so the mode has to be switched. Use the cma_roce_mode tool to change it (assuming the network card device name is mlx5_1):
cma_roce_mode -d mlx5_1 -p 1 -m 2
Here -m 2 selects RoCEv2. Afterwards, it is recommended to check the kernel log with dmesg | grep RDMA to confirm that the mode switch succeeded.
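As a complementary check, the GID table in sysfs shows which entries on a port are of type RoCE v2. The sketch below assumes the kernel exposes gid_attrs/types under /sys/class/infiniband, as recent kernels do; the function name count_rocev2_gids and the optional sysfs-root parameter are made up for this example. Note that it reports the GID types available on the port, not the RDMA-CM default mode itself.

```shell
#!/bin/sh
# Count GID table entries of type "RoCE v2" for a device and port.
# $1 = device (e.g. mlx5_1), $2 = port (e.g. 1),
# $3 = sysfs root, defaults to /sys (parameter is only for dry runs).
count_rocev2_gids() {
    dev="$1"; port="$2"; root="${3:-/sys}"
    count=0
    for f in "$root/class/infiniband/$dev/ports/$port/gid_attrs/types"/*; do
        [ -e "$f" ] || continue
        # Unused GID slots can fail to read; skip them quietly.
        gid_type=$(cat "$f" 2>/dev/null) || continue
        if [ "$gid_type" = "RoCE v2" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

For example, count_rocev2_gids mlx5_1 1 prints the number of RoCE v2 GIDs on port 1; a result of 0 suggests that the port has no IP address configured or that RoCEv2 is not available.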
Step 3: Configure flow control and priority
RoCEv2 is sensitive to network quality and needs to work together with DCQCN (Data Center Quantized Congestion Notification, the ECN-based congestion control used for RoCEv2) and PFC (Priority Flow Control). Assuming the network card interface name is ens1np0, set the following on the system:
- Turn on ECN for the priority:
Enable ECN on priority 3 (usually used for RoCE traffic):
echo 1 > /sys/class/net/ens1np0/ecn/roce_np/enable/3
echo 1 > /sys/class/net/ens1np0/ecn/roce_rp/enable/3
- Mark CNP messages:
Set the DSCP value and 802.1p priority of congestion notification packets (CNP):
echo 48 > /sys/class/net/ens1np0/ecn/roce_np/cnp_dscp # DSCP=48
echo 6 > /sys/class/net/ens1np0/ecn/roce_np/cnp_802p_prio # 802.1p priority 6
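The ecn/ attributes used in this step are provided by the vendor driver, so a plain echo fails with an unhelpful message when they are missing. The sketch below wraps the same four writes with an existence check. The function names write_attr and apply_roce_ecn, and the optional sysfs-root parameter, are made up for this example.

```shell
#!/bin/sh
# Write value $1 to sysfs attribute $2, with a clear error if the
# attribute is missing or not writable.
write_attr() {
    if [ ! -w "$2" ]; then
        echo "missing or read-only: $2" >&2
        return 1
    fi
    echo "$1" > "$2"
}

# $1 = interface, $2 = RoCE priority, $3 = CNP DSCP, $4 = CNP 802.1p priority,
# $5 = sysfs root, defaults to /sys (parameter is only for dry runs).
apply_roce_ecn() {
    ecn="${5:-/sys}/class/net/$1/ecn"
    write_attr 1 "$ecn/roce_np/enable/$2" &&
    write_attr 1 "$ecn/roce_rp/enable/$2" &&
    write_attr "$3" "$ecn/roce_np/cnp_dscp" &&
    write_attr "$4" "$ecn/roce_np/cnp_802p_prio"
}
```

With the values from this step, the call would be apply_roce_ecn ens1np0 3 48 6 (run as root). The chain stops at the first failing write, so a partial configuration is reported rather than silently applied.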
Step 4: Optimize network card queue scheduling
Use Mellanox's mlnx_qos tool to adjust the QoS policy so that RoCE traffic gets sufficient bandwidth. For example, assign a higher weight to priority 3:
mlnx_qos -i ens1np0 --trust=dscp # Trust DSCP tag
mlnx_qos -i ens1np0 -f 0,0,0,1,0,0,0,0 # Enable PFC in priority 3
mlnx_qos -i ens1np0 -s ets,ets,ets,ets,ets,ets,strict,strict -t 10,10,10,50,10,10,0,0 # Queue weights: one entry per traffic class (8), 50% for TC3
The key to this step is to give the priority 3 queue (which carries RoCEv2 traffic) a larger share of bandwidth so that other traffic cannot crowd it out.
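The -s and -t lists are easy to get wrong by one entry. The sketch below is a small pre-flight check, assuming eight traffic classes and that the bandwidth of the ETS classes must total 100 (the constraint the mlx5 driver enforces as far as I know). The function name check_ets is made up for this example.

```shell
#!/bin/sh
# Check a pair of lists intended for mlnx_qos -s / -t.
# $1 = scheduling type per traffic class (e.g. ets,...,strict)
# $2 = bandwidth percentage per traffic class
# Prints "ok" or a reason, and returns non-zero on failure.
check_ets() {
    awk -v tsa="$1" -v bw="$2" 'BEGIN {
        nt = split(tsa, t, ",")
        nb = split(bw, b, ",")
        if (nt != 8 || nb != 8) {
            print "need 8 entries in each list, got " nt " and " nb
            exit 1
        }
        sum = 0
        for (i = 1; i <= 8; i++) {
            if (t[i] == "ets") sum += b[i]   # only ETS classes count
        }
        if (sum != 100) {
            print "ETS bandwidth sums to " sum ", expected 100"
            exit 1
        }
        print "ok"
    }'
}
```

For example, check_ets ets,ets,ets,ets,ets,ets,strict,strict 10,10,10,50,10,10,0,0 prints ok, while a list with a missing or extra entry is rejected before it reaches the network card.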
Step 5: Configure the switch side
If the server is connected to the switch, make sure the switch configuration is consistent with the network card. For example:
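- Enable DSCP-based PFC on the switch and enable flow control for priority 3, which carries the RoCE traffic. (The DSCP value 48 configured in Step 3 marks CNP packets; under the network card's default DSCP-to-priority mapping it falls in priority 6, not priority 3.)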
- Confirm that the switch's ECN function is enabled and matches the server's DSCP/802.1p mapping.
The specific configuration commands vary according to the switch model, and it is recommended to refer to the switch manufacturer's documentation.
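When matching DSCP values to priorities on both sides, it helps to compute which priority a DSCP value lands in. The sketch below assumes the common default mapping in which each priority covers eight consecutive DSCP values; the switch or the network card may be configured differently, so treat the result as a starting point and compare it with the mapping the devices actually report. The function name dscp_to_prio is made up for this example.

```shell
#!/bin/sh
# Map a DSCP value (0-63) to a priority (0-7), assuming the default
# layout where priority N covers DSCP values 8*N through 8*N+7.
dscp_to_prio() {
    if [ "$1" -lt 0 ] || [ "$1" -gt 63 ]; then
        echo "DSCP must be between 0 and 63" >&2
        return 1
    fi
    echo $(( $1 / 8 ))
}
```

Under this assumption, dscp_to_prio 48 prints 6 and dscp_to_prio 26 prints 3.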
Step 6: Verify the configuration
The last step is to test whether RoCEv2 works properly. The ib_send_bw tool is recommended for bandwidth testing:
Server:
ib_send_bw -d mlx5_1 --report_gbits -F -R
Client:
ib_send_bw -d mlx5_1 --report_gbits -F -R <Server IP>
If you see stable, high bandwidth (such as 25 Gbps or 100 Gbps, depending on the network card model), the configuration is successful. If you see packet loss or low bandwidth, check the network card statistics with ethtool -S ens1np0, or capture packets with Wireshark and analyze the ECN and CNP packets.
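The ethtool -S output is long, so a filter that keeps only the non-zero counters related to drops, pause frames, or congestion marking makes problems easier to spot. The sketch below is one way to do that. The name patterns are assumptions, since counter names are driver-specific, and the function name nonzero_suspects is made up for this example.

```shell
#!/bin/sh
# Read `ethtool -S` output on stdin and print, as name=value, the non-zero
# counters whose names suggest drops, pause frames, or congestion marking.
nonzero_suspects() {
    awk -F': *' '
        /discard|drop|pause|ecn|cnp|out_of_buffer/ && ($2 + 0) != 0 {
            gsub(/^[ \t]+/, "", $1)   # strip the leading indentation
            print $1 "=" $2
        }'
}
```

A typical call would be ethtool -S ens1np0 | nonzero_suspects. Running it before and after an ib_send_bw test and comparing the two outputs shows which counters moved during the test.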
Things to note
- Restart the network service: After the configuration is complete, it is recommended to restart the network service so that the settings take effect:
systemctl restart NetworkManager # or traditional network service
- Kernel parameters: If you use NIC bonding, you need to configure miimon=100 mode=4 (802.3ad dynamic aggregation) in /etc//.
- Firmware upgrade: If you encounter compatibility issues, you may need to upgrade the network card firmware.
With the steps above, you should be able to deploy RoCEv2 on Kylin V10. If you run into problems, first check whether the driver version and the switch configuration match, since that is the most common point of failure.