Getting Started | Building an AI Model Development Environment

Contents
  • Installation of graphics card drivers and development libraries
    • For Tesla series graphics cards
    • For GeForce series graphics cards
  • Installing CUDA and cuDNN
  • Installing Miniconda
  • Install PyTorch and Transformers
  • Using Modelscope to Download and Load Models
  • PyCharm Project Configuration
  • Model loading and dialog
  • CPU and GPU issues
  • transformers version error
  • TORCH_USE_CUDA_DSA Error

When learning model development, setting up the environment can involve many twists and turns. This article provides some general setup and installation methods so that readers can quickly build an AI model development and debugging environment.

Installation of graphics card drivers and development libraries

This article is only about how to install the NVIDIA graphics card driver.

There are several series of NVIDIA graphics cards; the commonly used ones are the Tesla and GeForce RTX series. Driver installation differs between the two types of cards, and the following sections introduce each separately.

The first step is to check whether the computer correctly recognizes the graphics card and whether a driver is already installed.

Open Device Manager, expand Display adapters, and check the device list for the presence of a graphics card.

If the computer already recognizes the graphics card, you can update to the latest driver through NVIDIA GeForce Experience or another driver management tool.

Or go directly to the official driver page and search for the driver for your graphics card model. NVIDIA official driver search and download page:/drivers/lookup/

For Tesla series graphics cards

For example, after creating a GPU server on a cloud platform such as Azure, a Tesla graphics card may not be recognized when you first start the machine, and you need to install the driver before the graphics card device appears.

For Windows, refer to this link:/zh-CN/azure/virtual-machines/windows/n-series-driver-setup

For Linux, refer to this link:/zh-CN/azure/virtual-machines/linux/n-series-driver-setup

For Windows, the installation is relatively simple: just follow the documentation and download the GRID driver installer.

After installing the driver, run the following command to check the supported CUDA version:

nvidia-smi

As you can see, this driver version supports CUDA versions up to 12.2.

For GeForce series graphics cards

For graphics cards such as the GeForce RTX 4060 Ti or GeForce RTX 4070, you can download the driver installer directly from the official website:

/geforce/drivers/

Generally speaking, home PCs ship with the drivers already installed.

Installing CUDA and cuDNN

CUDA is NVIDIA's parallel computing platform and programming model developed specifically for general-purpose computing on graphics processing units (GPUs). With CUDA, developers can leverage the power of GPUs to dramatically accelerate computing applications.

Simply put, CUDA is a programming model in which the CPU dispatches work and the GPU executes it in parallel. To use CUDA, you need to install its development toolkit.
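
As a small illustration of this division of labor, here is a minimal PyTorch sketch (PyTorch itself is installed later in this article): the Python code running on the CPU dispatches the work, while the matrix multiplication executes in parallel on the GPU.

import torch

# CPU side: create two matrices and copy them into GPU memory
a = torch.rand(1024, 1024).cuda()
b = torch.rand(1024, 1024).cuda()

# GPU side: the multiplication runs as parallel CUDA kernels
c = a @ b
print(c.device)  # cuda:0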

Introduction to CUDA:

/cuda-zone

/zh-cn/blog/cuda-intro-cn/

CUDA installation package download address:/cuda-downloads

After downloading, open the installation package and follow the prompts. The express installation installs to the C drive, while the custom installation lets you choose the location; the express installation is recommended to avoid complications.

After the installation is complete, two new entries are added to the environment variables.

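To confirm that the toolkit is installed and on the PATH, you can run the CUDA compiler's version command:

nvcc --version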

cuDNN is a GPU-accelerated deep learning library. It is distributed as a zip archive.

Download Address:/cudnn-downloads

Open C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\ and locate the version directory, or use the CUDA_PATH environment variable to find the installation directory, then copy and merge the contents of the cuDNN archive into the CUDA directory.

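A minimal PowerShell sketch of this copy-and-merge step, assuming the cuDNN archive was extracted to C:\Downloads\cudnn (a hypothetical path; adjust it to wherever you unpacked the archive):

# $env:CUDA_PATH is set by the CUDA installer; the source path is hypothetical
Copy-Item -Path "C:\Downloads\cudnn\*" -Destination $env:CUDA_PATH -Recurse -Force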

Finally, add the directories such as bin, lib, lib\x64, and libnvvp under the CUDA installation to the Path environment variable.

It is not clear exactly which of these directories are strictly needed, so just add them all.

Installing Miniconda

Miniconda is a Python package and environment manager that can create multiple isolated Python environments on your system.

Download Address:/miniconda/

After installing, search the Start menu for the miniconda3 shortcuts and run one as administrator to open a console. The menu list contains both cmd and powershell shortcut links; the powershell entry is recommended.

Subsequent executions of the conda command should be run as an administrator.

Configure a domestic (Chinese) mirror to accelerate downloads:

conda config --add channels /anaconda/pkgs/free/

Run conda env list to view the default environment installation directory.

If you already have Python installed on your computer and added it to the environment variables, do not also add G:\ProgramData\miniconda3 to the environment variables, as this will cause the two environments to be confused.

If you do not already have Python installed, you can simply add G:\ProgramData\miniconda3 and G:\ProgramData\miniconda3\Scripts to the environment variables.

On the author's computer, the manually installed Python was uninstalled, and only the environment provided by miniconda3 is used.

If Python and pip were installed manually, running the pip command directly installs dependencies in isolation from the miniconda3 environment. To install dependencies into the miniconda3 environment, open the miniconda3 console and run the pip command there, so that the installed packages end up in the miniconda3 environment.

After the dependency packages are installed in one environment, different projects can share the downloaded packages without downloading them again for each project.
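
For example, a dedicated environment for the experiments in this article could be created and activated like this (the environment name llm is arbitrary):

conda create -n llm python=3.10
conda activate llm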

Install PyTorch and Transformers

Flax, PyTorch, and TensorFlow are all deep learning frameworks; under the hood, Transformers can use any of them for model loading, training, and other functions.

PyTorch installation reference documentation:/get-started/locally/

Either the GPU (CUDA) version or the CPU version can be installed; select your options on the page and copy the generated install command.

conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

Then you also need to run a command to install Transformers and some dependent libraries:

pip install protobuf 'transformers>=4.41.2' cpm_kernels 'torch>=2.0' gradio mdtex2html sentencepiece accelerate

This may automatically install the latest version of transformers, which can cause the problems described in later sections.
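
To confirm that the GPU build of PyTorch is active before going further, a quick check like the following can help (the exact versions printed will vary):

import torch
import transformers

print(torch.__version__)          # a '+cpu' suffix indicates a CPU-only build
print(torch.cuda.is_available())  # should print True for the CUDA build
print(transformers.__version__)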

Using Modelscope to Download and Load Models

ModelScope is a Chinese AI model community led by Alibaba Cloud, which provides all kinds of models and datasets as well as development tool libraries. Because Hugging Face can be hard to get started with and requires access to foreign networks, ModelScope is used here to download and load models.

Install modelscope:

pip install modelscope
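
You can verify the installation afterwards with:

pip show modelscope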

PyCharm Project Configuration

PyCharm is the most commonly used Python programming tool, so here's how to configure the miniconda3 environment in PyCharm.

Open PyCharm and add the miniconda3 environment in the settings.

Then create a project where you select a conda-based environment.

Model loading and dialog

Create a Python file in the project directory (the file name is arbitrary, for example main.py).

Paste the following code into the file, then run it. The code automatically downloads the model, loads it, and starts a dialog.

from modelscope import AutoTokenizer, AutoModel, snapshot_download

# Download the model
# "ZhipuAI/chatglm3-6b" is the model repository
# D:/modelscope is the model file cache directory
model_dir = snapshot_download("ZhipuAI/chatglm3-6b", cache_dir="D:/modelscope", revision="v1.0.0")

# Load the model
# float is 32-bit; half() uses 16-bit floats, which cuts memory usage in half
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().cuda()
model = model.eval()

# Start a conversation
response, history = model.chat(tokenizer, "How are you?", history=[])
print(response)
response, history = model.chat(tokenizer, "What should I do if I can't sleep at night?", history=history)
print(response)

"ZhipuAI/chatglm3-6b" refer toZhipuAI warehousechatglm3-6b Models, you can view the various models that have been uploaded by the community through ModelScope:

/models

revision="v1.0.0" specifies the version to download; it matches the repository branch name, and you can specify a different branch name to download a different version.
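
For example, to follow a branch instead of a fixed tag (the branch name master is illustrative; check the repository page for the branches that actually exist):

model_dir = snapshot_download("ZhipuAI/chatglm3-6b", cache_dir="D:/modelscope", revision="master")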

CPU and GPU issues

If you get the following error, you may have installed the CPU version of PyTorch rather than the GPU version.

    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Execute the code:

import torch
print(torch.__version__)

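A CPU-only pip build typically shows a version string ending in +cpu. You can also check CUDA availability directly:

import torch

print(torch.cuda.is_available())  # False means a CPU-only build or a driver problem
print(torch.version.cuda)         # CUDA version PyTorch was built against; None for CPU builds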

As a rule of thumb, if you installed the relevant libraries with pip rather than the conda command, you need to uninstall PyTorch by executing the following commands:

pip uninstall torch torchvision torchaudio
conda uninstall pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

Then run the command to reinstall pytorch:

conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

Re-run the code and it now works fine.

transformers version error

Because the various libraries were installed at their latest versions, some of them may be incompatible, and an error is thrown when executing the following line of code.

response, history = model.chat(tokenizer, "How are you?", history=[])

First the following warning appears, then an error is reported:

Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,

You need to install the transformers version that the model requires, pinning it explicitly:

pip install transformers==4.41.2

After all these twists and turns, it finally works.

TORCH_USE_CUDA_DSA Error

The problem I encountered was presumably caused by insufficient GPU performance; it appeared on an Azure A10 machine but did not occur on the RTX 4060 Ti at home.

But it's also possible that the video card driver doesn't match the CUDA version.

  File "C:\ProgramData\miniconda3\Lib\site-packages\transformers\generation\", line 2410, in _sample
    next_token_scores = logits_processor(input_ids, next_token_logits)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\miniconda3\Lib\site-packages\transformers\generation\logits_process.py", line 98, in __call__
    scores = processor(input_ids, scores)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\xxx\.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 55, in __call__
    if torch.isnan(scores).any() or torch.isinf(scores).any():
       ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
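
As the error message suggests, you can force synchronous kernel launches so that the traceback points at the actual failing call. A minimal sketch: set the variable at the very top of the script, before any CUDA work happens.

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call

# ...then load the model and chat as before; the stack trace now stops at the failing kernel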

It is possible to run the model on the CPU, however.

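To rule out GPU problems, the model can be loaded on the CPU instead; a sketch reusing the earlier loading code. .float() keeps 32-bit precision, since half precision is poorly supported on CPUs, and inference will be much slower:

model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).float()
model = model.eval()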

I also ran a random demo and it worked:

/pytorch/examples/blob/main/mnist/

This may be caused by a mismatch between the CUDA library version and the driver version. First execute the nvidia-smi command to check which CUDA version the graphics card driver supports.

Download and install the corresponding version of CUDA, then re-install cuDNN and set the environment variables.

Finally, the AI environment was successfully set up on the server as well.
