How to set up the GPT-SoVITS TTS model
[GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
Features:
- Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.
- Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
- Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, Korean, Cantonese and Chinese.
- WebUI Tools: Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.
Install locally
- Clone the repo:
```shell
git clone https://github.com/RVC-Boss/GPT-SoVITS.git
```
- Set up the virtual environment
```shell
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh
# or use the China PyPI mirror for installation:
# pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
```
- Download the pretrained models to the GPT_SoVITS/pretrained_models subfolder
```shell
# run the following inside the GPTSoVits virtual environment
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
# the following command downloads the Hugging Face repo into the local directory
# Hugging Face URL: https://huggingface.co/lj1995/GPT-SoVITS/tree/main
huggingface-cli download --resume-download lj1995/GPT-SoVITS --local-dir /GPT-SoVITS/GPT_SoVITS/pretrained_models
```
- Upload a human voice sample model
```shell
# for example, this is Chinese voice sample model data:
# https://huggingface.co/baicai1145/GPT-SoVITS-STAR/tree/main
# download one of the archives and upload it to the `GPT-SoVITS/char_model/example001` subfolder
# uncompress the files to get the `*.pth` and `*.ckpt` weights and a reference voice `*.wav`
huggingface-cli download --resume-download baicai1145/GPT-SoVITS-STAR 三月七.zip --local-dir /GPT-SoVITS/char_model/
```
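Once the archive is unpacked, a quick sanity check can confirm the expected files are in place. A minimal Python sketch (the required file types follow the list above; `is_complete` is a hypothetical helper and the exact folder layout is an assumption):

```python
from pathlib import Path

def find_model_files(model_dir):
    """Collect the files a character model folder is expected to contain:
    a SoVITS *.pth, a GPT *.ckpt, and at least one reference *.wav."""
    root = Path(model_dir)
    return {
        "sovits": sorted(root.rglob("*.pth")),
        "gpt": sorted(root.rglob("*.ckpt")),
        "ref_audio": sorted(root.rglob("*.wav")),
    }

def is_complete(model_dir):
    """True when every required file type is present somewhere under model_dir."""
    found = find_model_files(model_dir)
    return all(found[key] for key in ("sovits", "gpt", "ref_audio"))
```

For example, `is_complete("GPT-SoVITS/char_model/example001")` should return True once the archive is fully unpacked.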
- Launch the GPT-SoVITS API
```shell
# this command starts the GPT-SoVITS API service on port 9880
# and loads the model weights
python api_v2.py
```
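Before calling the API, it can help to wait until the service is actually listening. A minimal Python sketch (the host and port follow the defaults above; `wait_for_port` is a hypothetical helper, not part of GPT-SoVITS):

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0):
    """Poll until a TCP port accepts connections, returning True on
    success and False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # a successful connect means the server socket is up
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

For example, call `wait_for_port("127.0.0.1", 9880)` before issuing the first request.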
- Load the human voice role weights via the API
```shell
# use the previously uploaded human voice model
curl -X GET "https://xxxx.ngrok-free.app/set_gpt_weights?weights_path=char_model/yq/yq-e10.ckpt"
```
- Test voice generation
```shell
# note: -X must be uppercase, the URL needs quotes, and spaces in `text` should be percent-encoded
curl -X GET "https://xxxx.ngrok-free.app/tts?text=Hello%20There%2C%20nice%20to%20meet%20you&text_lang=zh&ref_audio_path=char_model/yq/ref/yq01.wav&prompt_lang=zh&top_k=5&top_p=1&temperature=1&text_split_method=cut1&batch_size=1&batch_threshold=0.75&split_bucket=true&speed_factor=1&fragment_interval=0.3&seed=-1&media_type=wav&streaming_mode=false&repetition_penalty=1.35&parallel_infer=true"
```
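Query strings this long are error-prone to edit by hand. A small Python sketch that builds the `/tts` URL with proper percent-encoding (the endpoint and parameter names are taken from the curl example above; the base URL is a placeholder and `build_tts_url` is a hypothetical helper):

```python
from urllib.parse import urlencode

def build_tts_url(base_url, text, ref_audio_path,
                  text_lang="zh", prompt_lang="zh", **overrides):
    """Build a /tts request URL with percent-encoded parameters."""
    params = {
        "text": text,
        "text_lang": text_lang,
        "ref_audio_path": ref_audio_path,
        "prompt_lang": prompt_lang,
        "top_k": 5,
        "top_p": 1,
        "temperature": 1,
        "text_split_method": "cut1",
        "batch_size": 1,
        "media_type": "wav",
        "streaming_mode": "false",
    }
    params.update(overrides)  # e.g. speed_factor=1.2, seed=42
    return f"{base_url}/tts?{urlencode(params)}"
```

The resulting URL can be fetched with `curl` or `urllib.request.urlopen`; encoding avoids the broken requests that raw spaces or `&` characters in the text would otherwise cause.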
Use GPT-SoVITS to train your own voice model
- Start the WebUI
```shell
# this launches the WebUI locally on port 9874
python webui.py
```
- Launch Ngrok as a reverse proxy for access
```shell
ngrok http --url=xxxxx.ngrok-free.app 9874
```
- Access the WebUI and start training
- Upload the source voice in *.wav format to a folder on the server and fill its path into the Audio slicer input, for example: `GPT-SoVITS/data/common/six-movies.wav`
- Step by step in the WebUI
- In the `0-Fetch dataset` panel, mainly set the input path to the folder containing the *.wav files and the output folder path
- Uninstall `ctranslate2==4.5.0` for Whisper if you use faster-whisper
```shell
# reinstall ctranslate2 if this error appears:
# Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
conda activate GPTSoVits
pip uninstall ctranslate2
pip install ctranslate2==4.4.0
```
- Check and label in the WebUI: bind Ngrok to port 9871, verify that the text and voice match, then submit the text, or click the next index to continue checking
```shell
ngrok http --url=xxxxx.ngrok-free.app 9871
```
- Go to the `1-GPT-SOVITS-TTS` panel
- Click the `1Aa-Text` (and following) buttons in order and wait for each to complete
- Finally click `One-click formatting`
- Go to the `1B-Fine-tuned training` panel
- Start the `1Ba-SoVITS` and `1Bb-GPT` training
- Wait for the process to complete
- Go to the `1C-inference` panel
- Refresh the model paths and select the previously trained `comon-tts` models
- Start `Open TTS Inference WebUI`
- Change Ngrok to bind to port 9872
- Test the models with inference
- Upload the source voice in wav format (it can be downloaded from `slicer_opt`)
- Type the test text
- You may need to download the NLTK data if the text includes multiple languages
```shell
# download the nltk data
# maybe use this mirror: git clone http://gitclone.com/github.com/nltk/nltk_data.git
git clone https://github.com/nltk/nltk_data
# delete the existing packages
rm -rf /root/nltk_data/corpora
rm -rf /root/nltk_data/taggers/
# override the packages
mv -f ./packages/* /root/nltk_data/
```
- Uncompress the `*.zip` files in `/root/nltk_data` with a shell command, or uncompress only `/root/nltk_data/taggers/averaged_perceptron_tagger_eng.zip` if you use a Chinese and English mix only
```shell
#!/usr/bin/env bash
for zipfile in $(find /root/nltk_data -name "*.zip"); do
  unzip -o "$zipfile" -d "$(dirname "$zipfile")"
done
```
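If `unzip` is not available, the same extraction can be sketched with Python's standard library (assuming the `/root/nltk_data` layout above; `unzip_in_place` is a hypothetical helper):

```python
import zipfile
from pathlib import Path

def unzip_in_place(root):
    """Extract every *.zip under `root` next to its archive,
    mirroring `unzip -o "$zipfile" -d "$(dirname "$zipfile")"`."""
    for archive in Path(root).rglob("*.zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
```

For example, `unzip_in_place("/root/nltk_data")` unpacks all the NLTK packages in place.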
Use Cloudflare Zero Trust Tunnels to access this service from elsewhere
- Create a new Tunnel in `Cloudflare -> Zero Trust -> Network -> Tunnels`
- Install the package for your platform
- Copy the relevant commands from the `Configure -> Overview` panel
- Configure the public hostnames to proxy to your internal host
- Add a public hostname, e.g. `tunnel0.example.com -> http://localhost:8080`
- Specify any additional arguments for this tunnel
Use Ngrok as a reverse proxy to access from outside
- [Ngrok](https://dashboard.ngrok.com/)
- Install the Ngrok command line interface locally (copy the command from the Ngrok dashboard)
- Launch the Ngrok client
```shell
ngrok http http://127.0.0.1:9880
```