GPT-SoVITS Windows Configuration and Inference Notes (for personal use)

These notes are a backup for myself so I can look things up next time. The focus is Windows-side configuration and inference: commands first, plus only the directly practical details.

Environment preparation

System: Windows 10/11
Python: 3.9 (don't use other versions; there are many pitfalls)
GPU (optional): NVIDIA + CUDA 11.8 (it runs without a GPU, just more slowly)
FFmpeg: unzip it and put it in the project root directory (recent GPT-SoVITS releases bundle it, but confirming manually is recommended)

Download and install

Git download (optional):

git clone /RVC-Boss/

Or go directly to the GPT-SoVITS project page, click the "Code" button, and select "Download ZIP". Unzip to D:\GPT-SoVITS (the path is up to you).

After entering the folder, double-click the launcher. On the first run it installs dependencies and downloads the models automatically. What you need to install in advance:

Python 3.9:
```
python --version # Check that it is 3.9
```
FFmpeg (recent GPT-SoVITS releases bundle it, but placing a copy manually is recommended):
Download it, put it in D:\GPT-SoVITS, then verify:
```
ffmpeg -version
```
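
Both checks above can also be done in one go. This is only a minimal sketch for my own convenience, not part of GPT-SoVITS: the function name check_environment is made up here, and it assumes ffmpeg is reachable through PATH, because shutil.which searches PATH.

```python
import shutil
import sys


def check_environment(required=(3, 9)):
    """Report the running Python version and whether ffmpeg can be found."""
    found = sys.version_info[:2]
    return {
        "python_version": "%d.%d" % found,
        # True only when major.minor matches the required version exactly.
        "python_ok": found == tuple(required),
        # shutil.which returns None when the executable is not found.
        "ffmpeg_path": shutil.which("ffmpeg"),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If ffmpeg_path comes back as None even though ffmpeg.exe sits in D:\GPT-SoVITS, the folder is probably just not on PATH for that shell.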

Dependency installation (if the automatic install fails)

This is usually done automatically, but if it gets stuck (for example, because of network problems), run these manually:

python -m venv venv
venv\Scripts\activate
pip install -r  -i /simple

Install PyTorch as needed:

GPU:
```
pip install torch torchvision torchaudio --index-url /whl/cu118
```
(If your CUDA version is different, find the matching link on the official PyTorch website.)

CPU:

pip install torch torchvision torchaudio

Check:

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
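
The same check as a small script that does not crash when torch is missing. A rough sketch: describe_torch is a name I made up, while torch.__version__, torch.cuda.is_available() and torch.cuda.get_device_name() are regular PyTorch calls.

```python
import importlib.util


def describe_torch():
    """Summarize the PyTorch install in one line, even if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    if torch.cuda.is_available():
        # Device 0 is the first visible NVIDIA GPU.
        name = torch.cuda.get_device_name(0)
        return f"torch {torch.__version__}, CUDA available: {name}"
    return f"torch {torch.__version__}, CPU only"


if __name__ == "__main__":
    print(describe_torch())
```

If it reports "CPU only" on a machine with an NVIDIA card, the CPU wheel was probably installed instead of the cu118 one.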

Run WebUI

Double-click the launcher and it prints the address:

http://127.0.0.1:9880
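
Before launching, it can help to see whether something is already listening on that port. A minimal sketch using only the standard library; port_is_free is my own helper name, and the port number is simply the one from these notes.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when a listener accepts the connection.
        return probe.connect_ex((host, port)) != 0


if __name__ == "__main__":
    print("9880 free:", port_is_free(9880))
```

Run it before starting the WebUI. Once the WebUI is up, the same call should report the port as busy.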

Open it in the browser. The interface has several tabs; I only use inference:

Model selection: pick the GPT and SoVITS weights
Audio input: the initial audio (the reference clip)
Text input: what you want it to say
Generate button: click to get the result

Pitfall: on the inference tab, after selecting the model you must check [Enable TTS Inference WebUI], otherwise it won't jump to the speech synthesis page.

Inference steps

Weight placement:
- D:\GPT-SoVITS\GPT_weights
- D:\GPT-SoVITS\SoVITS_weights
  Download source: [Blue Archive] AI voice models for all students (pick your favorite character).
Initial audio: a 3-10 second WAV uploaded to the WebUI; the timbre depends entirely on it.
Enter text: "Test" or something similar, select the weights, click generate, and the audio appears below.
If the text is too long: use the "slicing" function to process it in segments, otherwise it crashes easily.
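
To get a feel for what segmenting long text means, here is a rough sketch that splits at sentence punctuation and packs sentences into chunks. This is not the WebUI's own slicing code; split_text and the max_chars default are my own assumptions for illustration.

```python
import re


def split_text(text, max_chars=50):
    """Pack sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole.
    """
    # Each match is a run of text plus its trailing punctuation marks.
    sentences = re.findall(r"[^.!?。！？]+[.!?。！？]*", text)
    chunks, current = [], ""
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("Hello world. This is a test! Is it long? Yes.", 20):
        print(chunk)
```

Each chunk can then be pasted in and generated separately. Sentences are joined with a space, which suits English; for Chinese text the join would need to drop the space.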

Notes

The timbre follows the initial audio, so choose a clear clip without background noise.
To make it "sing": use UVR5 (included under tools/uvr5/) to separate the vocals, run inference piece by piece, and then splice the pieces together so the result sounds more like the original song.
Inference time: fast on GPU; on CPU expect to wait a few seconds.
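
For the splicing step, the generated pieces can be joined with the standard wave module. A minimal sketch assuming every piece is a PCM WAV with the same channel count, sample width and sample rate; concat_wavs is my own helper name, not something GPT-SoVITS or UVR5 provides.

```python
import wave


def concat_wavs(paths, out_path):
    """Join WAV files end to end; all inputs must share one audio format."""
    params = None
    frames = []
    for path in paths:
        with wave.open(str(path), "rb") as clip:
            fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path} has a different format: {fmt} vs {params}")
            frames.append(clip.readframes(clip.getnframes()))
    if params is None:
        raise ValueError("no input files given")
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(params[0])
        out.setsampwidth(params[1])
        out.setframerate(params[2])
        out.writeframes(b"".join(frames))
```

Pass the pieces in the order they should be heard, for example concat_wavs(["part1.wav", "part2.wav"], "song.wav"), where the file names are placeholders.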

Problems

WebUI cannot be opened: port 9880 is occupied; edit the port setting and change it to something else (for example, 9881).
Dependencies cannot be installed: switch to a different package source, or confirm that Python is 3.9.
Model loading failed: check the weight path and file name; avoid extra spaces and Chinese characters.
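
The path rule can be checked mechanically before blaming the model. A small sketch, deliberately stricter than the note: it flags any space and any non-ASCII character, and path_problems is a name I made up.

```python
from pathlib import Path


def path_problems(path):
    """List reasons a weight path might fail to load, based on the notes above."""
    text = str(path)
    problems = []
    if " " in text:
        problems.append("contains a space")
    if not text.isascii():
        problems.append("contains non-ASCII characters")
    if not Path(text).exists():
        problems.append("does not exist")
    return problems


if __name__ == "__main__":
    for folder in (r"D:\GPT-SoVITS\GPT_weights", r"D:\GPT-SoVITS\SoVITS_weights"):
        print(folder, path_problems(folder) or "looks fine")
```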

Additional notes

Initial audio quality: record a clean clip; a casual phone recording is noisy and gives poor results.
Weight version: the Blue Archive models on Bilibili may be updated. Read the comments before downloading to confirm they are compatible.
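
The 3-10 second length for the initial audio mentioned earlier is easy to verify before uploading. A minimal sketch with the standard wave module, which only reads uncompressed PCM WAV files; both function names are my own.

```python
import wave


def wav_duration_seconds(path):
    """Return the clip length in seconds (frames divided by sample rate)."""
    with wave.open(str(path), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def is_good_length(path, low=3.0, high=10.0):
    """True when the clip falls inside the 3-10 second range from these notes."""
    return low <= wav_duration_seconds(path) <= high
```

This only checks the length. Noise still has to be judged by ear.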