The computer I'm using now was bought in 2021, and at the time I had no need for AI-related programs. For various reasons I chose an AMD graphics card, but today, in 2024, using AI for some of my work is no longer a rare need, so I wanted to try it out, only to find that an AMD card hits a brick wall at every turn. After some research I learned that, through various efforts, AMD cards have made a lot of progress in AI support, but for historical reasons NVIDIA cards are still better supported in this area, so I'm recording here the process of running an AI program with an AMD graphics card on Windows.
These steps can now all be done through an integration package (search for "Eisai Integration Pack" to find it); this post is just a record of my learning process, kept for easy reference.
CUDA
The first thing to mention is CUDA. Unlike CPUs, GPUs excel at parallel computing, and CUDA is NVIDIA's parallel computing platform and programming model: it lets developers program the GPU and run that code on NVIDIA hardware, so the GPU's parallel computing power can be used for more than just displaying graphics. CUDA is proprietary NVIDIA technology, so AMD graphics cards cannot use it. Moreover, thanks to its early start, the CUDA ecosystem is more mature, and many libraries and frameworks, such as TensorFlow and PyTorch, are built on top of it.
The popular AI painting tool Stable Diffusion requires PyTorch - in other words, if an AMD graphics card can support PyTorch, it can run Stable Diffusion.
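To make that dependency concrete, here is a tiny check (a generic sketch, not specific to Stable Diffusion): a stock CUDA build of PyTorch only reports a usable GPU on NVIDIA hardware, which is exactly the wall an AMD card runs into.

import torch

# On an NVIDIA card with a CUDA build of PyTorch this prints True;
# on an AMD card, without the workarounds described below, it prints False
# and everything falls back to the CPU.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))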
Linux
First I learned that AMD cards can run PyTorch on Linux: AMD has released ROCm, an open-source parallel computing platform that lets AMD cards run PyTorch. On Windows, however, ROCm does not support PyTorch, so we need to find another way.
I also considered dual-booting during my research, but doing AI painting from a second system is a bit of a hassle, and I might stop bothering to switch systems after a while, so I preferred to run Stable Diffusion on Windows with the AMD graphics card.
The Linux route is said to have no loss of efficiency, and there is plenty of information about it, so I won't cover it here.
DirectML
In 2023 I came across a project called pytorch-directml, a backend for PyTorch built on DirectML, Microsoft's machine learning acceleration library, which allows PyTorch to run on Windows using AMD graphics cards.
After a lot of hard work by the community, pytorch-directml can now run PyTorch on Windows with an AMD graphics card, which gives us a way to run Stable Diffusion on Windows with an AMD card.
The most famous Stable Diffusion integration package is AUTOMATIC1111/stable-diffusion-webui. Its wiki has a page for AMD cards, but the project itself does not directly support them, so we need to use a fork. Wiki screenshot below:
Jumping to the project via that link, I found the name had changed: it is now lshqqytiger/stable-diffusion-webui-amdgpu, and it used to be called stable-diffusion-webui-directml. As the new name suggests, it is no longer limited to pytorch-directml, which we will come back to later.
Below is a screenshot of GPU-Z showing the card's support for DirectML and CUDA. Although this article has been talking about AMD cards, some Intel graphics cards also support DirectML and could in theory use the same approach, but there are far fewer examples of that, so I won't go into it here.
References:
Accelerating stable-diffusion on windows with AMD graphics via pytorch-directml
ROCm
AMD ROCm is an open software stack of drivers, development tools and APIs that facilitate GPU programming from the underlying kernel to the end-user application. ROCm is optimized for generative AI and HPC applications, and helps to easily migrate existing code.
Its positioning should be similar to CUDA's. You just need to download and install it from the official website; at the moment Stable Diffusion can only use version 5.7, whose download package is named after 2023 Q4.
PyTorch
PyTorch is a fully featured framework for building deep learning models, a type of machine learning commonly used in applications such as image recognition and language processing. It is written in Python, so it is relatively easy for most machine learning developers to learn and use. PyTorch is distinctive in that it fully supports GPUs and uses reverse-mode automatic differentiation, which lets the computational graph be modified dynamically. This makes it a popular choice for rapid experimentation and prototyping.
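As a small generic illustration of those two points (the graph is built as the code runs, and backward() performs reverse-mode automatic differentiation):

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x      # the graph is created on the fly (define-by-run)
y.backward()            # reverse-mode autodiff
print(x.grad)           # dy/dx = 3*x^2 + 2 = 14 at x = 2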
If using the DirectML approach, the torch-directml package needs to be installed:
pip install torch-directml
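Once it is installed, the DirectML device has to be selected explicitly and tensors moved onto it. A minimal sketch, assuming the package installed correctly and following the usage pattern its documentation describes:

import torch
import torch_directml

dml = torch_directml.device()           # the DirectML device (AMD or Intel GPU on Windows)
a = torch.tensor([1.0, 2.0]).to(dml)    # move data onto the GPU via DirectML
b = torch.tensor([3.0, 4.0]).to(dml)
print((a + b).to("cpu"))                # the addition itself runs on the DirectML device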
Python
Since PyTorch has been mentioned repeatedly, here is a little about Python. There are plenty of learning materials, so I won't introduce the language itself; I just want to talk about versioning. It is similar to the front-end world: different versions suit different situations and can do different work, so you need a version-switching tool. The front end uses nvm; for Python the common choice is Conda, although the principle is not quite the same. Conda creates a virtual environment and installs the required Python version inside it, so you can use different Python versions in different projects. If you don't use Python much, MiniConda is enough: a lightweight Conda that contains only the most basic features and doesn't take up much space.
- Download and install: latest-miniconda-installer-links
- Creating a Virtual Environment
In the DirectML scenario above, the Wiki mentions that Python 3.10.6 is recommended.
# Create the virtual environment
conda create -n pytdml python=3.10.6
# Initialize for the first time
conda init
# Activate the virtual environment
conda activate pytdml
# Install pytorch_directml
pip install torch-directml
References:
Enabling PyTorch on Windows with DirectML
ZLUDA
After all this talk about DirectML, the truth is that it is no longer the best option, and DirectML still does not bring out the full performance of AMD cards. In February of this year, ZLUDA released a version for AMD graphics cards that allows them to run CUDA programs, which makes it possible to run Stable Diffusion with an AMD graphics card on Windows.
ZLUDA is an alternative to CUDA for non-NVIDIA GPUs. ZLUDA allows unmodified CUDA applications to run on non-NVIDIA GPUs with near-native performance.
Theoretically, all you need to do is install ROCm, use the CUDA build of PyTorch directly, and replace the corresponding CUDA DLL files with the versions compiled by lshqqytiger, and it will run directly on Windows (a quick check is sketched after the link below).
lshqqytiger's version: lshqqytiger/ZLUDA
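If the replacement works as described, the AMD card should then be visible through PyTorch's ordinary CUDA API. A hedged sanity-check sketch, under the assumption that the swapped DLLs are actually picked up:

import torch

# With ZLUDA's DLLs in place of the CUDA ones, the CUDA build of PyTorch
# should report the AMD GPU as if it were a CUDA device.
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))
x = torch.rand(1024, 1024, device="cuda")
print((x @ x).sum().item())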
Also, the recommended webui is automatic; this version has better support for AMD graphics cards, so it is the one I finally got running. After launching, it starts on port 7860 by default; the first startup takes more than 20 minutes to compile, so wait patiently. The interface is as follows:
I had already mounted models beforehand; if you enter for the first time it may ask you to select one. Add a prompt and generate a random image; preferably don't make the size too large, 512×512 is fine, for reasons that will be covered in another post dedicated to generation:
You can see that the GPU usage has gone up, indicating that the graphics card is already being used for calculations at this point:
Summary
- Installing ROCm
- Installing Python
- Installing PyTorch
- Installing ZLUDA (mostly configuring environment variables)
- Replacing CUDA DLL files
- Running webui (I found out later that the integration package seems to automate PyTorch and so on, but I couldn't verify that since I already had it installed on my end)
This article is mainly a record of my own learning process. For actual generation later on, I plan to use the integration package directly; its features are more complete and it is more convenient than building everything myself, so I can focus on the generation itself rather than wasting too much time on the environment.