GPT-SoVITS Windows Configuration and Inference Notes (for personal use)
These notes are a backup for my own reference next time. The focus is Windows-side configuration and inference: mostly commands, and practical information only.
Environment preparation
- System: Windows 10/11
- Python: 3.9 (other versions cause many problems)
- GPU (optional): NVIDIA + CUDA 11.8 (it runs without a GPU, just more slowly)
- FFmpeg: unzip it and place it in the project root directory (the latest GPT-SoVITS release bundles it, but it is worth confirming manually)
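A minimal sketch for checking the two prerequisites above from Python before launching anything. The function names and the optional extra folder argument are my own for illustration, not part of GPT-SoVITS; it only uses the standard library.

```python
import shutil
import sys


def python_version_ok(version_info=sys.version_info, wanted=(3, 9)):
    """Return True if the major.minor version matches the wanted one."""
    return tuple(version_info[:2]) == wanted


def find_ffmpeg(extra_dir=None):
    """Return the path of an ffmpeg executable, or None if none is found.

    extra_dir is an optional folder (for example the project root) that is
    searched in addition to PATH. This is an illustrative helper only.
    """
    found = shutil.which("ffmpeg")
    if found is None and extra_dir is not None:
        found = shutil.which("ffmpeg", path=extra_dir)
    return found


if __name__ == "__main__":
    print("Python is 3.9:", python_version_ok())
    # D:\GPT-SoVITS is the example install path used in these notes.
    print("ffmpeg:", find_ffmpeg(r"D:\GPT-SoVITS") or "not found")
```

If ffmpeg is reported as not found, placing it in the project root as described above (or adding it to PATH) should fix it.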
Download and install
Git download (optional):
git clone /RVC-Boss/
Or go directly to the GPT-SoVITS project page, click the "Code" button, and select "Download ZIP". Unzip to D:\GPT-SoVITS (choose any path you like).
After entering the folder, double-click the startup script. On the first run it automatically installs dependencies and downloads models. What you need to install in advance:
- Python 3.9:
python --version # Check that it is 3.9
- FFmpeg (the latest version of GPT-SoVITS bundles it, but placing a copy manually is recommended):
Download it and put it in D:\GPT-SoVITS, then verify:
ffmpeg -version
Dependency installation (if the automatic install fails)
This usually happens automatically, but if it gets stuck (for example, because of network problems), run these commands manually:
python -m venv venv
venv\Scripts\activate
pip install -r -i /simple
Install PyTorch as needed:
- GPU (if your CUDA version differs, find the matching link on the PyTorch official website):
pip install torch torchvision torchaudio --index-url /whl/cu118
- CPU:
pip install torch torchvision torchaudio
Check:
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
Run WebUI
Double-click to launch; the address is shown:
http://127.0.0.1:9880
Open it in the browser. The interface has several tabs; I only use inference:
- Model selection: pick the GPT and SoVITS weights
- Audio input: the initial (reference) audio
- Text input: the text you want spoken
- Generate button: click to get the result
Pitfall: in the inference tab, after selecting the model you must check [Enable TTS Inference WebUI]; otherwise it will not jump to the voice synthesis page.
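If the page does not load, a quick way to tell whether anything is listening on the address is a plain TCP probe. This is a sketch using only the standard library; the function name is my own, and the default port is simply the one from these notes.

```python
import socket


def port_is_open(host="127.0.0.1", port=9880, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "open" if port_is_open() else "closed"
    print("127.0.0.1:9880 is", state)
```

Run it before starting the WebUI: if the port is already open, another program is holding it. Run it after starting: if the port is still closed, the WebUI did not come up.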
Inference steps
- Weight placement: put GPT weights in D:\GPT-SoVITS\GPT_weights and SoVITS weights in D:\GPT-SoVITS\SoVITS_weights.
Download address: 【Blue Archive】AI voice models for all students (pick your favorite character).
- Initial audio: a 3-10 second WAV uploaded to the WebUI; the voice timbre depends entirely on it.
- Enter text: type "Test" or similar, select the weights, and click generate; the audio appears below.
- Long text: use the "slicing" function to process it in segments; otherwise generation easily breaks down.
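To illustrate what processing long text in segments means, here is a rough sketch that groups sentences into chunks by character count. It is not the WebUI's own slicing implementation; the function name and the 50-character default are assumptions picked for illustration.

```python
import re


def slice_text(text, max_chars=50):
    """Group sentences into chunks of roughly max_chars characters.

    A chunk exceeds max_chars only when a single sentence is longer than
    the limit, because sentences are never split in the middle.
    """
    # Split right after sentence-ending punctuation (ASCII and CJK).
    parts = re.split(r"(?<=[.!?。！？])", text)
    sentences = [p.strip() for p in parts if p.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in slice_text("One. Two. Three.", max_chars=9):
        print(chunk)
```

Each chunk can then be synthesized separately and the resulting audio joined in order.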
Notes
- The timbre follows the initial audio, so choose a clear clip without background noise.
- To make it "sing": use UVR5 (included in tools/uvr5/) to separate the vocals, run inference piece by piece, and then splice the pieces together; the result sounds more like the original song.
- Inference time: fast on GPU; on CPU expect to wait a few seconds.
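Since the initial audio should be a 3-10 second WAV, it can help to check the clip length before uploading. A minimal sketch with the standard-library wave module (it reads uncompressed PCM WAV files only); the function names are my own.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path, min_s=3.0, max_s=10.0):
    """Return True if the clip length falls in the 3-10 second range."""
    return min_s <= wav_duration_seconds(path) <= max_s


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py clip.wav  (the script name is hypothetical)
    for clip in sys.argv[1:]:
        print(clip, round(wav_duration_seconds(clip), 2), is_valid_reference(clip))
```

This only checks length; whether the clip is clean and free of background noise still has to be judged by ear.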
Problems
- WebUI cannot be opened: port 9880 is occupied; edit the port setting and change it to another one (for example 9881).
- Dependencies cannot be installed: switch to another package source, or confirm that Python is 3.9.
- Model loading failed: check the weight path and file name; they must not contain extra spaces or Chinese characters.
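For the model loading problem, a small sketch that lists weight files whose names contain whitespace or non-ASCII characters. The helper names are my own, and the folders in the demo are just the example paths from these notes.

```python
from pathlib import Path


def has_risky_name(name):
    """Return True if the name contains whitespace or non-ASCII characters."""
    return any(ch.isspace() or ord(ch) > 127 for ch in name)


def risky_weight_files(folder):
    """Return sorted file names in folder that have risky names.

    Returns an empty list when the folder does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if has_risky_name(p.name))


if __name__ == "__main__":
    for folder in (r"D:\GPT-SoVITS\GPT_weights", r"D:\GPT-SoVITS\SoVITS_weights"):
        print(folder, risky_weight_files(folder))
```

Renaming any file it reports to plain ASCII without spaces is the simplest fix. The same has_risky_name check can be applied to the install path itself.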
Additional notes
- Initial audio quality: record a clean clip; don't record casually with a phone, since the noise makes the results poor.
- Weight version: the Blue Archive models on Bilibili may be updated. Read the comments before downloading to confirm compatibility.