
Solution to NVMe disks becoming unavailable on CentOS cloud hosts

2025-04-03 19:22:07

This article is shared from the Tianyi Cloud Developer Community article "Solution to the NVMe disk not available in CentOS system cloud host", author: P****n

Problem description

After a Linux cloud host starts using an NVMe disk, unexpectedly slow I/O can cause the I/O operations of the system or its applications on that disk to fail. The system then drops the NVMe disk. After that, the disk no longer appears in lsblk output and subsequent reads and writes fail. The result is system or application errors, or business interruption.

Cause of the problem

The io_timeout parameter of the NVMe driver sets the maximum time the driver tolerates for a single I/O request. Most Linux distributions set it to 30 seconds by default. If an I/O read or write takes longer than this value, the NVMe driver returns an I/O failure. In some cases the system or application can retry the operation. In others the I/O on the NVMe disk simply fails, and the system drops the NVMe disk with the symptoms described above.

To reduce I/O timeout errors on NVMe disks, io_timeout is usually set to the largest value the kernel accepts, which maximizes the tolerance for I/O latency. In newer kernels the maximum value is 4,294,967,295; in earlier kernels it is 255. The NVMe driver is also packaged in different kernel modules depending on the kernel version (in some kernels it is nvme_core.ko). The full parameter name is prefixed with the name of the module that provides it, so it takes one of two forms: nvme.io_timeout or nvme_core.io_timeout.
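Because the parameter name depends on which module provides the driver, it can help to check sysfs before editing GRUB. The following is a minimal shell sketch and not part of the original procedure. The function name nvme_timeout_param is hypothetical. It assumes each module's parameters are exposed under /sys/module/<module>/parameters/, the same location read in the final verification step. It prints the full parameter name together with its current value.

```shell
# Hypothetical helper: find which module exposes io_timeout and print
# "<module>.io_timeout=<current value>". Assumes sysfs is mounted at /sys;
# another root directory can be passed as the first argument for testing.
nvme_timeout_param() {
    sysfs_root="${1:-/sys}"
    # Newer kernels keep the parameter in nvme_core, older ones in nvme.
    for mod in nvme_core nvme; do
        f="$sysfs_root/module/$mod/parameters/io_timeout"
        if [ -r "$f" ]; then
            echo "$mod.io_timeout=$(cat "$f")"
            return 0
        fi
    done
    echo "no NVMe io_timeout parameter found under $sysfs_root/module" >&2
    return 1
}
```

The prefix before "=" is the name to use in GRUB in step 3. On a kernel that uses nvme_core with the 30-second default, the output would be expected to read nvme_core.io_timeout=30.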

Solution

1. Remotely connect to the CentOS cloud host.

2. Run the following command to check whether the NVMe driver is included in the running kernel.

grep -i nvme /boot/config-$(uname -r) | grep -v "^#"

The output is similar to the following. If CONFIG_BLK_DEV_NVME=y is present, the NVMe driver is built into the image's kernel.

CONFIG_NVME_CORE=m
CONFIG_BLK_DEV_NVME=y
CONFIG_BLK_DEV_NVME_SCSI=y
CONFIG_NVME_FABRICS=m
CONFIG_NVME_RDMA=m
CONFIG_NVME_FC=m
CONFIG_NVME_TARGET=m
CONFIG_NVME_TARGET_LOOP=m
CONFIG_NVME_TARGET_RDMA=m
CONFIG_NVME_TARGET_FC=m
CONFIG_NVME_TARGET_FCLOOP=m
CONFIG_NVMEM=y
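The check in step 2 can also be scripted. In the sketch below, the function name nvme_driver_status is hypothetical. It reads only the CONFIG_BLK_DEV_NVME line of the kernel config: "y" means the driver is built into the kernel and "m" means it is built as a loadable module. The pattern is anchored at "=" so that lines such as CONFIG_BLK_DEV_NVME_SCSI do not match.

```shell
# Hypothetical helper: classify how a kernel config provides the NVMe block driver.
# Defaults to the running kernel's config file under /boot, as in step 2.
nvme_driver_status() {
    config="${1:-/boot/config-$(uname -r)}"
    # Take the value after "=" on the exact CONFIG_BLK_DEV_NVME line only.
    value=$(grep '^CONFIG_BLK_DEV_NVME=' "$config" 2>/dev/null | cut -d= -f2)
    case "$value" in
        y) echo "built-in" ;;
        m) echo "module" ;;
        *) echo "absent" ;;   # commented out ("is not set") or no such line
    esac
}
```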

3. Add the NVMe timeout parameters to the GRUB configuration.

1) Run the following command to open the GRUB configuration file.

vi /etc/default/grub

2) Press the i key to enter insert mode. Confirm the full name of the io_timeout parameter and the maximum value your kernel accepts. For example, suppose the full parameter name is nvme_core.io_timeout and the maximum value is 4,294,967,295. In that case, add nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295 inside the quoted value of the GRUB_CMDLINE_LINUX= line. After the parameters are added, the file content is as shown in the figure below:

3) Press Esc to leave insert mode, type :wq, and press Enter to save the file and exit.
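If you prefer not to edit the file interactively, sed can append the parameters without vi. This is a sketch under assumptions:

- The function name add_nvme_timeouts is hypothetical.
- The default parameter string is the nvme_core example from step 2).
- The GRUB_CMDLINE_LINUX value is assumed to be enclosed in double quotes, as is usual in /etc/default/grub.

The function saves a .bak copy first and does nothing if an io_timeout setting is already on that line. If your kernel uses nvme.io_timeout, pass a different parameter string as the second argument.

```shell
# Hypothetical helper: append NVMe timeout parameters to GRUB_CMDLINE_LINUX non-interactively.
# Assumes a double-quoted value, e.g. GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet".
add_nvme_timeouts() {
    grub_file="${1:-/etc/default/grub}"
    params="${2:-nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295}"
    # Leave the file alone if an io_timeout setting is already present (safe to rerun).
    if grep -q '^GRUB_CMDLINE_LINUX=.*io_timeout=' "$grub_file"; then
        return 0
    fi
    # Keep a backup before editing in place (uses GNU sed -i, as shipped with CentOS).
    cp "$grub_file" "$grub_file.bak" || return 1
    # Insert the parameters just before the closing quote of the value.
    sed -i "s/^\(GRUB_CMDLINE_LINUX=\"[^\"]*\)\"/\1 $params\"/" "$grub_file"
}
```

The file still has to be turned into a GRUB configuration with grub2-mkconfig, as described in the next step.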

4) Depending on how the operating system boots, run the matching command below to regenerate the GRUB configuration so that the change takes effect:

Legacy (BIOS) boot

grub2-mkconfig -o /boot/grub2/

UEFI boot

grub2-mkconfig -o /boot/efi/EFI/centos/
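To decide which of the two commands applies, a common check is whether /sys/firmware/efi exists. The kernel typically creates that directory only when the system was booted through UEFI. Below is a minimal sketch; the function name boot_mode is hypothetical, and the directory can be overridden for testing.

```shell
# Hypothetical helper: report the boot mode so the matching grub2-mkconfig command can be chosen.
# Relies on /sys/firmware/efi being present only on UEFI-booted systems.
boot_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "uefi"
    else
        echo "legacy"
    fi
}
```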

5) Run the following command to restart the instance so that the configuration takes effect.

reboot

6) After the instance restarts, run the following command to confirm that the parameters were passed to the kernel.

cat /proc/cmdline

The output is similar to the following:

... nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295
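This check can also be scripted. The hypothetical function below pads the contents of /proc/cmdline with spaces and looks for each parameter as a whole space-separated token. As a result, a longer or different value on the same parameter does not count as a match. The parameter values are the nvme_core example from step 3 and are an assumption; adjust them if your kernel uses nvme.io_timeout.

```shell
# Hypothetical helper: succeed only if both timeout parameters appear on the kernel command line.
cmdline_has_nvme_timeouts() {
    cmdline_file="${1:-/proc/cmdline}"
    # Pad with spaces so each parameter is matched as a complete token.
    line=" $(cat "$cmdline_file") "
    for p in nvme_core.io_timeout=4294967295 nvme_core.admin_timeout=4294967295; do
        case "$line" in
            *" $p "*) ;;          # found, check the next parameter
            *) return 1 ;;        # missing or set to a different value
        esac
    done
    return 0
}
```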

7) Run the following command to confirm that the NVMe driver is using the new I/O timeout value.

cat /sys/module/nvme_core/parameters/io_timeout

The output is similar to the following:

4294967295
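Step 7) reads only io_timeout. The sketch below also checks admin_timeout in the same directory and reports which value, if any, differs. Its assumptions are that the function name is hypothetical and that the module directory and expected value follow the nvme_core example used throughout. On a kernel where the parameters belong to the nvme module, you could pass /sys/module/nvme/parameters as the second argument, assuming that module exposes both parameters there.

```shell
# Hypothetical helper: confirm io_timeout and admin_timeout in sysfs match the value set in GRUB.
verify_nvme_core_timeouts() {
    expected="${1:-4294967295}"
    param_dir="${2:-/sys/module/nvme_core/parameters}"
    for name in io_timeout admin_timeout; do
        # An unreadable or missing file yields an empty value and is reported as a mismatch.
        actual=$(cat "$param_dir/$name" 2>/dev/null || true)
        if [ "$actual" != "$expected" ]; then
            echo "$name is '${actual:-unreadable}', expected $expected" >&2
            return 1
        fi
    done
    echo "io_timeout and admin_timeout are both $expected"
}
```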