How to set up the GPT-SoVITS TTS model
[GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
Features:
- Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.
- Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
- Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, Korean, Cantonese and Chinese.
- WebUI Tools: Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.
Install locally
- Clone the repo:
```shell
git clone https://github.com/RVC-Boss/GPT-SoVITS.git
```
- Set up the virtual environment
```shell
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh
# or use the China PyPI mirror for installation:
# pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
```
- Download the pretrained models to the GPT_SoVITS/pretrained_models subfolder
```shell
# run the following inside the GPTSoVits virtual environment
pip install -U huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
# the following command downloads the Hugging Face repo into the local directory
# Hugging Face URL: https://huggingface.co/lj1995/GPT-SoVITS/tree/main
huggingface-cli download --resume-download lj1995/GPT-SoVITS --local-dir /GPT-SoVITS/GPT_SoVITS/pretrained_models
```
- Upload a human voice sample model
```shell
# for example, this is Chinese voice sample model data:
# https://huggingface.co/baicai1145/GPT-SoVITS-STAR/tree/main
# download one of the archives and upload it to the `GPT-SoVITS/char_model/example001` subfolder
# uncompress the files to get the `*.pth` and `*.ckpt` weights and a reference voice `*.wav`
huggingface-cli download --resume-download baicai1145/GPT-SoVITS-STAR 三月七.zip --local-dir /GPT-SoVITS/char_model/
```
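Once the archive is unpacked, a quick sanity check can confirm the expected files are in place. A minimal Python sketch (the required file types follow the list above; `is_complete` is a hypothetical helper and the exact folder layout is an assumption):

```python
from pathlib import Path

def find_model_files(model_dir):
    """Collect the files a character model folder is expected to contain:
    a SoVITS *.pth, a GPT *.ckpt, and at least one reference *.wav."""
    root = Path(model_dir)
    return {
        "sovits": sorted(root.rglob("*.pth")),
        "gpt": sorted(root.rglob("*.ckpt")),
        "ref_audio": sorted(root.rglob("*.wav")),
    }

def is_complete(model_dir):
    """True when every required file type is present somewhere under model_dir."""
    found = find_model_files(model_dir)
    return all(found[key] for key in ("sovits", "gpt", "ref_audio"))
```

For example, `is_complete("GPT-SoVITS/char_model/example001")` should return True once the archive is fully unpacked.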
- Launch the GPT-SoVITS API
```shell
# this command starts the GPT-SoVITS API service on port 9880
# and loads the model weights
python api_v2.py
```
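Before calling the API, it can help to wait until the service is actually listening. A minimal Python sketch (the host and port follow the defaults above; `wait_for_port` is a hypothetical helper, not part of GPT-SoVITS):

```python
import socket
import time

def wait_for_port(host, port, timeout=60.0):
    """Poll until a TCP port accepts connections, returning True on
    success and False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # a successful connect means the server socket is up
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

For example, call `wait_for_port("127.0.0.1", 9880)` before issuing the first request.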
- Load the human voice role weights via the API
```shell
# use the previously uploaded human voice model
curl -X GET "https://xxxx.ngrok-free.app/set_gpt_weights?weights_path=char_model/yq/yq-e10.ckpt"
```
- Test voice generation
```shell
# note: -X must be uppercase, the URL needs quotes, and spaces in `text` should be percent-encoded
curl -X GET "https://xxxx.ngrok-free.app/tts?text=Hello%20There%2C%20nice%20to%20meet%20you&text_lang=zh&ref_audio_path=char_model/yq/ref/yq01.wav&prompt_lang=zh&top_k=5&top_p=1&temperature=1&text_split_method=cut1&batch_size=1&batch_threshold=0.75&split_bucket=true&speed_factor=1&fragment_interval=0.3&seed=-1&media_type=wav&streaming_mode=false&repetition_penalty=1.35&parallel_infer=true"
```
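Query strings this long are error-prone to edit by hand. A small Python sketch that builds the `/tts` URL with proper percent-encoding (the endpoint and parameter names are taken from the curl example above; the base URL is a placeholder and `build_tts_url` is a hypothetical helper):

```python
from urllib.parse import urlencode

def build_tts_url(base_url, text, ref_audio_path,
                  text_lang="zh", prompt_lang="zh", **overrides):
    """Build a /tts request URL with percent-encoded parameters."""
    params = {
        "text": text,
        "text_lang": text_lang,
        "ref_audio_path": ref_audio_path,
        "prompt_lang": prompt_lang,
        "top_k": 5,
        "top_p": 1,
        "temperature": 1,
        "text_split_method": "cut1",
        "batch_size": 1,
        "media_type": "wav",
        "streaming_mode": "false",
    }
    params.update(overrides)  # e.g. speed_factor=1.2, seed=42
    return f"{base_url}/tts?{urlencode(params)}"
```

The resulting URL can be fetched with `curl` or `urllib.request.urlopen`; encoding avoids the broken requests that raw spaces or `&` characters in the text would otherwise cause.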
Use GPT-SoVITS to train your own voice model
- Start the WebUI
```shell
# this launches the WebUI locally on port 9874
python webui.py
```
- Launch Ngrok as a reverse proxy for access
```shell
ngrok http --url=xxxxx.ngrok-free.app 9874
```
- Access the WebUI and start training
- Upload the source voice in *.wav format to a folder on the server and fill its path into the Audio slicer input, for example: `GPT-SoVITS/data/common/six-movies.wav`
- Step by step in the WebUI
- In the `0-Fetch dataset` panel, mainly set the input path to the folder containing the *.wav files and the output folder path
- Uninstall `ctranslate2==4.5.0` for Whisper if you use faster-whisper
```shell
# reinstall ctranslate2 if this error appears:
# Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so}
conda activate GPTSoVits
pip uninstall ctranslate2
pip install ctranslate2==4.4.0
```
- Check and label in the WebUI: bind Ngrok to port 9871, verify that the text and voice match, then submit the text, or click the next index to continue checking
```shell
ngrok http --url=xxxxx.ngrok-free.app 9871
```
- Go to the `1-GPT-SOVITS-TTS` panel
- Click the `1Aa-Text` (and following) buttons in order and wait for each to complete
- Finally click `One-click formatting`
- Go to the `1B-Fine-tuned training` panel
- Start the `1Ba-SoVITS` and `1Bb-GPT` training
- Wait for the process to complete
- Go to the `1C-inference` panel
- Refresh the model paths and select the previously trained `comon-tts` models
- Start `Open TTS Inference WebUI`
- Change Ngrok to bind to port 9872
- Test the models with inference
- Upload the source voice in wav format (it can be downloaded from `slicer_opt`)
- Type the test text
- You may need to download the NLTK data if the text includes multiple languages
```shell
# download the nltk data
# maybe use this mirror: git clone http://gitclone.com/github.com/nltk/nltk_data.git
git clone https://github.com/nltk/nltk_data
# delete the existing packages
rm -rf /root/nltk_data/corpora
rm -rf /root/nltk_data/taggers/
# override the packages
mv -f ./packages/* /root/nltk_data/
```
- Uncompress the `*.zip` files in `/root/nltk_data` with a shell command, or uncompress only `/root/nltk_data/taggers/averaged_perceptron_tagger_eng.zip` if you use a Chinese and English mix only
```shell
#!/usr/bin/env bash
for zipfile in $(find /root/nltk_data -name "*.zip"); do
  unzip -o "$zipfile" -d "$(dirname "$zipfile")"
done
```
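If `unzip` is not available, the same extraction can be sketched with Python's standard library (assuming the `/root/nltk_data` layout above; `unzip_in_place` is a hypothetical helper):

```python
import zipfile
from pathlib import Path

def unzip_in_place(root):
    """Extract every *.zip under `root` next to its archive,
    mirroring `unzip -o "$zipfile" -d "$(dirname "$zipfile")"`."""
    for archive in Path(root).rglob("*.zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
```

For example, `unzip_in_place("/root/nltk_data")` unpacks all the NLTK packages in place.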
Use Cloudflare Zero Trust Tunnels to access this service from elsewhere
- Create a new Tunnel in `Cloudflare -> Zero Trust -> Network -> Tunnels`
- Install the package for your platform
- Copy the relevant commands from the `Configure -> Overview` panel
- Configure the public hostnames to proxy to your internal host
- Add a public hostname, e.g. `tunnel0.example.com -> http://localhost:8080`
- Specify any additional arguments for this tunnel
Use Ngrok as a reverse proxy to access from outside
- [Ngrok](https://dashboard.ngrok.com/)
- Install the Ngrok command line interface locally (copy the command from the Ngrok dashboard)
- Launch the Ngrok client
```shell
ngrok http http://127.0.0.1:9880
```