GPT-SoVITS Windows Configuration and Inference Notes (for self-use)

2025-04-09 21:49:05

This is a backup for myself, so I can look it up next time. The focus is Windows-side configuration and inference; code is the core, practical information only.

Environment preparation

  • System: Windows 10/11
  • Python: 3.9 (do not use other versions; there are many pitfalls)
  • GPU (optional): NVIDIA + CUDA 11.8 (it runs without a GPU, just slower)
  • FFmpeg: after unpacking, put it in the project root directory (the latest version bundles it, but confirming manually is recommended)
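
The two prerequisites above can also be checked from a script. This is a minimal sketch, not part of GPT-SoVITS; the helper names python_is_39 and ffmpeg_on_path are made up for illustration, and the FFmpeg check only sees what the current process can resolve (on Windows that usually includes the current directory, so run it from the project root).

```python
import shutil
import sys


def python_is_39(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.9.x."""
    return tuple(version_info[:2]) == (3, 9)


def ffmpeg_on_path():
    """Return the resolved ffmpeg executable path, or None if it is not found."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    # Report what was found instead of failing, since this is only a sanity check.
    print("Python 3.9:", python_is_39(), "(found %d.%d)" % sys.version_info[:2])
    print("FFmpeg:", ffmpeg_on_path() or "not found")
```

If the first line prints False, fix the Python version before touching anything else.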

Download and install

Git download (optional):

git clone /RVC-Boss/

Or go directly to the GPT-SoVITS project page, click the "Code" button, and select "Download ZIP". Unzip it to D:\GPT-SoVITS (the path is up to you).

After entering the folder, double-click the launcher script. On the first run it automatically installs dependencies and downloads models. What you need to install in advance:

  • Python 3.9:
    python --version  # should report 3.9.x
  • FFmpeg (the latest version of GPT-SoVITS bundles it, but placing a copy manually is recommended):
    Download it, put it in D:\GPT-SoVITS, then verify:
    ffmpeg -version
    

Dependency installation (if the automatic install fails)

This is usually done automatically, but if it gets stuck (for example because of network problems), run it manually:

python -m venv venv
venv\Scripts\activate
pip install -r  -i /simple

Install PyTorch as needed:

  • GPU:
    pip install torch torchvision torchaudio --index-url /whl/cu118
    
    (If your CUDA version is different, find the matching link on the PyTorch official website.)
  • CPU:
    pip install torch torchvision torchaudio
    

Check:

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
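
The same check as a small script, which is easier to rerun and does not crash when torch is missing. A sketch only; torch_report is a made-up helper name, and it uses nothing beyond torch.__version__, torch.cuda.is_available() and torch.cuda.get_device_name().

```python
def torch_report():
    """Collect basic PyTorch/CUDA facts; returns a dict even if torch is absent."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "version": None, "cuda": False, "device": "cpu"}
    cuda = torch.cuda.is_available()
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda": cuda,
        # Name of the first GPU when CUDA is usable, otherwise plain "cpu".
        "device": torch.cuda.get_device_name(0) if cuda else "cpu",
    }


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(key, "=", value)
```

If cuda comes back False on a machine with an NVIDIA card, the CPU build of torch was probably installed, or the CUDA version does not match.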

Run WebUI

Double-click the launcher script and it prints the address:

http://127.0.0.1:9880

Open it in the browser. The interface has several tabs, and I only use inference:

  • Model selection: Pick GPT and SoVITS weights
  • Audio input: the initial (reference) audio clip
  • Text input: What you want to say
  • Generate button: Click to get the result

Pitfall: on the inference tab, after selecting the model you must tick [Enable TTS Inference WebUI], otherwise it will not jump to the voice synthesis page.

Inference steps

  1. Weight placement:
    • D:\GPT-SoVITS\GPT_weights
    • D:\GPT-SoVITS\SoVITS_weights
      Download address: 【Blue Archive】AI voice models for all students (pick your favorite character).
  2. Initial audio: a 3-10 second WAV uploaded to the WebUI; the resulting voice (timbre) depends entirely on it.
  3. Enter text: "Test" or anything else, select the weights, click generate, and the audio appears below.
  4. If the text is too long: use the "slicing" function to process it in segments, otherwise it crashes easily.
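
To show what step 4 means by processing in segments, here is a rough sketch that cuts long text at sentence punctuation and groups sentences into chunks. This is not the WebUI's own slicing function; slice_text, the punctuation set, and the max_chars default are all assumptions made for illustration.

```python
import re

# Split right after sentence-ending punctuation (English and Chinese).
# The punctuation set is an assumption for this sketch.
_SENTENCE_END = re.compile(r"(?<=[.!?。！？])\s*")


def slice_text(text, max_chars=50):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for part in slice_text("Hello world. This is a test! Another one?", max_chars=30):
        print(part)
```

Each chunk can then be pasted into the text box and generated separately. Sentences are joined with a space, which suits English; for Chinese text the join character would need changing.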

Notes

  • The voice leans toward the initial audio, so choose a clear clip with no background noise.
  • To make it "sing": use UVR5 (included under tools/uvr5/) to separate the vocals, run inference piece by piece, then splice the pieces together; the result sounds closer to the original song.
  • Inference time: fast on GPU; on CPU you have to wait a few seconds.
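
The 3-10 second guideline for the initial audio can be checked before uploading. A minimal sketch using only the standard-library wave module, which reads PCM WAV files (other encodings raise wave.Error); both helper names are made up for illustration.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def duration_ok(seconds, low=3.0, high=10.0):
    """True when the clip falls inside the 3-10 second range from these notes."""
    return low <= seconds <= high
```

Usage: duration_ok(wav_duration_seconds("ref.wav")), where ref.wav is a placeholder file name. This only checks length; whether the clip is clean still has to be judged by ear.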

Common problems

  • WebUI will not open: port 9880 is occupied; edit the port setting and change it to something else (for example 9881).
  • Dependencies will not install: switch to another package source, or confirm that Python is 3.9.
  • Model loading fails: check the weight path and file name; they must not contain spaces or Chinese characters.
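
Two of the problems above can be confirmed quickly from Python before changing anything. A sketch under assumptions: port_in_use and path_is_plain are made-up helper names, the port check simply tries to connect to 127.0.0.1, and "plain" here means ASCII-only with no spaces.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def path_is_plain(path):
    """Return True if the path is ASCII-only and contains no spaces."""
    return path.isascii() and " " not in path


if __name__ == "__main__":
    print("9880 in use:", port_in_use(9880))
    print("path ok:", path_is_plain(r"D:\GPT-SoVITS\GPT_weights"))
```

If port_in_use(9880) is True while the WebUI is not running, another program holds the port, so pick a different one.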

Additional notes

  • Initial audio quality: record a clean clip; do not record casually on a phone, since the noise makes the result poor.
  • Weight version: the Blue Archive models on Bilibili may be updated. Read the comments before downloading to confirm they are compatible.