How to set up the GPT-SoVITS TTS model
[GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
Features:
- Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.
- Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
- Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, Korean, Cantonese and Chinese.
- WebUI Tools: Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.
Install locally
- Clone the repo:
```bash
git clone https://github.com/RVC-Boss/GPT-SoVITS.git
```
- Set up the virtual environment
```bash
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh
# or install the dependencies from the China PyPI mirror:
# pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
```
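Before downloading models, it can be worth a quick sanity check that the environment works; a minimal sketch, assuming `install.sh` installed a CUDA build of PyTorch:

```bash
# confirm the GPTSoVits environment imports torch and sees the GPU
conda activate GPTSoVits
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```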
- Download the pretrained models into the GPT_SoVITS/pretrained_models subfolder
```bash
# run these inside the GPTSoVits virtual environment
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
# the following command downloads the HuggingFace repo into the local directory
# HuggingFace URL: https://huggingface.co/lj1995/GPT-SoVITS/tree/main
huggingface-cli download --resume-download lj1995/GPT-SoVITS --local-dir /GPT-SoVITS/GPT_SoVITS/pretrained_models
```
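To confirm the download completed, you can list what landed on disk; a minimal check, assuming the `--local-dir` path used above:

```bash
# the folder should contain the pretrained GPT (*.ckpt) and SoVITS (*.pth) weights
ls -lh /GPT-SoVITS/GPT_SoVITS/pretrained_models
find /GPT-SoVITS/GPT_SoVITS/pretrained_models -name "*.pth" -o -name "*.ckpt"
```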
- Upload a human voice sample model
```bash
# for example: this is a Chinese voice sample model
# https://huggingface.co/baicai1145/GPT-SoVITS-STAR/tree/main
# download one of them and upload it to the `GPT-SoVITS/char_model/example001` subfolder
# uncompress the archive to get the `*.pth` and `*.ckpt` weights and a reference voice `*.wav`
huggingface-cli download --resume-download baicai1145/GPT-SoVITS-STAR 三月七.zip --local-dir /GPT-SoVITS/char_model/
```
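A minimal sketch for unpacking the archive, assuming the zip name from the example above and the `example001` target folder:

```bash
# unpack the downloaded character model next to the other models
cd /GPT-SoVITS/char_model
unzip -o 三月七.zip -d example001
ls example001   # expect *.pth, *.ckpt, and a reference *.wav
```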
- Launch the GPT-SoVITS API
```bash
# this command starts the GPT-SoVITS API service on port 9880
# and loads the model weights
python api_v2.py
```
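Once the service is up, you can probe it before wiring in Ngrok; a minimal check, assuming the default 127.0.0.1:9880 bind:

```bash
# a parameterless request should return a quick error response,
# which is enough to confirm the API is listening
curl -s "http://127.0.0.1:9880/tts" | head
```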
- Load the human voice role weights via the API
```bash
# use the previously uploaded human voice model
curl -X GET "https://xxxx.ngrok-free.app/set_gpt_weights?weights_path=char_model/yq/yq-e10.ckpt"
```
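The SoVITS half of the character model is switched the same way via `set_sovits_weights`; a sketch assuming a matching `*.pth` file ships with the model (the `yq-e10.pth` filename here is hypothetical):

```bash
# load the SoVITS weights that pair with the GPT checkpoint above
curl -X GET "https://xxxx.ngrok-free.app/set_sovits_weights?weights_path=char_model/yq/yq-e10.pth"
```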
- Test generating a voice

```bash
curl -X GET "https://xxxx.ngrok-free.app/tts?text=Hello%20There,%20nice%20to%20meet%20you&text_lang=zh&ref_audio_path=char_model/yq/ref/yq01.wav&prompt_lang=zh&top_k=5&top_p=1&temperature=1&text_split_method=cut1&batch_size=1&batch_threshold=0.75&split_bucket=true&speed_factor=1&fragment_interval=0.3&seed=-1&media_type=wav&streaming_mode=false&repetition_penalty=1.35&parallel_infer=true" --output test.wav
```
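The `/tts` endpoint also accepts a JSON body via POST, which avoids hand-encoding the long query string; a sketch with a subset of the same parameters:

```bash
# POST variant of the request above; saves the synthesized audio to out.wav
curl -s -X POST "https://xxxx.ngrok-free.app/tts" \
  -H "Content-Type: application/json" \
  -d '{
        "text": "Hello There, nice to meet you",
        "text_lang": "zh",
        "ref_audio_path": "char_model/yq/ref/yq01.wav",
        "prompt_lang": "zh",
        "text_split_method": "cut1",
        "media_type": "wav",
        "streaming_mode": false
      }' \
  --output out.wav
```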
Use GPT-SoVITS to train your own voice model
- Start the WebUI

```bash
# this should launch the WebUI locally on port 9874
python webui.py
```
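If the server is remote, you may want the WebUI to outlive your SSH session; a minimal sketch using standard shell tools:

```bash
# keep the WebUI running after logout and capture its output
nohup python webui.py > webui.log 2>&1 &
tail -f webui.log   # watch until it reports listening on port 9874
```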
- Launch Ngrok as a reverse proxy for access

```bash
ngrok http --url=xxxxx.ngrok-free.app 9874
```
- Access the WebUI and start training
- Upload the source voice in *.wav format to a server folder and fill that path into the Audio slicer input, for example: `GPT-SoVITS/data/common/six-movies.wav`
- Step by step in the WebUI: in the `0-Fetch dataset` panel, the main change is to set the input path to the folder where the *.wav files are saved
- Uninstall `ctranslate2==4.5.0` for Whisper if you use faster-whisper (see below)
```bash
# reinstall ctranslate2 if the following error appears:
# Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
conda activate GPTSoVits
pip uninstall ctranslate2
pip install ctranslate2==4.4.0
```
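A quick check that the downgrade took effect; minimal, run in the same environment:

```bash
# should print 4.4.0 and import without the libcudnn error
python -c "import ctranslate2; print(ctranslate2.__version__)"
```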
- Check and label in the WebUI: bind Ngrok to port 9871, verify the text and voice match, then submit the text, or click Next Index to continue checking

```bash
ngrok http --url=xxxxx.ngrok-free.app 9871
```
- Go to the `1-GPT-SOVITS-TTS` panel, click the `1Aa-Text` and following buttons in order and wait for each to complete, then finally click `One-click formatting`
- Go to the `1B-Fine-tuned training` panel, start the `1Ba-SoVITS` and `1Bb-GPT` training, and wait for the process to complete
- Go to the `1C-inference` panel, refresh the model paths to select the previously trained `comon-tts` models, and click `Open TTS Inference WebUI`
- Change Ngrok to bind to port 9872
- In the inference WebUI, refresh the model paths again and select the previously trained model
- Test and run inference with the models
- Upload the source voice in .wav format (it can be downloaded from slicer_opt)
- Type the test text
- You may need to download the NLTK data if the text mixes multiple languages
```bash
# download the nltk data
# alternatively, use the mirror: git clone http://gitclone.com/github.com/nltk/nltk_data.git
git clone https://github.com/nltk/nltk_data
cd nltk_data
# delete the existing packages
rm -rf /root/nltk_data/corpora
rm -rf /root/nltk_data/taggers/
# overwrite them with the freshly cloned packages
mv -f ./packages/* /root/nltk_data/
```
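To see where NLTK will actually look for this data; a minimal check, assuming the default per-user search path:

```bash
# /root/nltk_data should appear in the printed search path list
python -c "import nltk; print(nltk.data.path)"
```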
- Uncompress the `*.zip` files in `/root/nltk_data` with the shell loop below, or uncompress only `/root/nltk_data/taggers/averaged_perceptron_tagger_eng.zip` if you use a Chinese and English mix only

```bash
#!/usr/bin/env bash
for zipfile in $(find /root/nltk_data -name "*.zip"); do
    unzip -o "$zipfile" -d "$(dirname "$zipfile")"
done
```
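For the Chinese/English-only case mentioned above, and to verify the tagger resolves afterwards; a minimal sketch:

```bash
# unpack just the tagger needed for mixed Chinese/English text
unzip -o /root/nltk_data/taggers/averaged_perceptron_tagger_eng.zip -d /root/nltk_data/taggers/
# should print the resolved path instead of raising LookupError
python -c "import nltk; print(nltk.data.find('taggers/averaged_perceptron_tagger_eng'))"
```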
Use Ngrok as a reverse proxy for external access
- [Ngrok](https://dashboard.ngrok.com/)
- Install the Ngrok command line interface locally; copy the command from the Ngrok dashboard
- Launch the ngrok client:
```bash
ngrok http http://127.0.0.1:9880
```
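If re-pointing a single tunnel at each port gets tedious, the ngrok agent can start several named tunnels from its config file; a sketch written as a shell heredoc, assuming an ngrok v3 agent at the default Linux config path and a plan that allows simultaneous tunnels:

```bash
# append named tunnels for the ports used in this guide to the agent config
cat >> "$HOME/.config/ngrok/ngrok.yml" <<'EOF'
tunnels:
  webui:
    proto: http
    addr: 9874
  api:
    proto: http
    addr: 9880
EOF
ngrok start --all   # start every tunnel defined in the config file
```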