GPT-SoVITS
Premium
Few-shot voice cloning with the highest-quality output
About GPT-SoVITS
GPT-SoVITS combines a GPT-style autoregressive model, which predicts semantic speech tokens from text, with a SoVITS (VITS-based) decoder that turns those tokens into audio in the target voice. From just 3-10 seconds of reference audio plus its transcript, it produces natural-sounding speech that closely matches the target speaker. It is particularly strong at cross-lingual synthesis: train on one language and generate speech in another.
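Since the reference clip needs to fall in the 3-10 second window and should come with a transcript, a quick check before uploading can catch problems early. This is a minimal sketch assuming an uncompressed PCM WAV file; it uses only Python's standard `wave` module, and the helper names `clip_duration` and `check_reference_clip` are hypothetical, not part of GPT-SoVITS or this service.

```python
from __future__ import annotations

import wave
from pathlib import Path

# Reference-clip bounds taken from the description above (3-10 seconds).
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration(path: str | Path) -> float:
    """Return the duration of a PCM WAV file in seconds.

    wave.open raises wave.Error for compressed or non-WAV files.
    """
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clip(path: str | Path, transcript: str) -> list[str]:
    """Return a list of problems with a reference clip; empty means it looks usable.

    Hypothetical pre-flight check, not part of GPT-SoVITS itself.
    """
    problems = []
    seconds = clip_duration(path)
    # Both bounds are inclusive in this sketch.
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(
            f"clip is {seconds:.2f}s; expected {MIN_SECONDS:g}-{MAX_SECONDS:g}s"
        )
    # A whitespace-only transcript is treated the same as a missing one.
    if not transcript.strip():
        problems.append("transcript is empty; quality is best when it matches the clip")
    return problems
```

For a 5-second clip with a non-empty transcript, `check_reference_clip` returns an empty list; a 2-second clip yields one message, `clip is 2.00s; expected 3-10s`.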
Key Features
Few-Shot Voice Cloning
Clone a voice from 3-10 seconds of reference audio; providing the clip's transcript gives the best quality.
Cross-Lingual Synthesis
Train on one language and generate speech in Chinese, English, Japanese, Korean, or Cantonese.
Highest Quality
GPT-SoVITS is among the highest-quality voice cloning models available.
Open Source
Fully MIT-licensed, with active community development and extensive documentation.
Technical Specs
- Generation speed: Medium
- Output quality: Excellent
- Voice cloning: Supported
- Languages: 5 (Chinese, English, Japanese, Korean, Cantonese)
- GPU VRAM: 4-8 GB
- Credits per 1,000 characters: 25
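At 25 credits per 1,000 characters, cost scales linearly with text length. The sketch below estimates the cost of a request under three assumptions that the spec does not confirm: every Unicode code point (Python's `len`) counts as one character, including spaces; usage is prorated rather than billed in whole 1,000-character blocks; and the total is rounded up to a whole credit. The function name `estimate_credits` is hypothetical.

```python
CREDITS_PER_1000_CHARS = 25  # rate from the spec list above


def estimate_credits(text: str, rate: int = CREDITS_PER_1000_CHARS) -> int:
    """Estimate credits needed to synthesize `text`.

    Assumptions (not confirmed by the service): each code point counts as one
    character, billing is prorated, and the result is rounded up.
    """
    chars = len(text)
    # Integer ceiling of chars * rate / 1000, avoiding floating-point rounding.
    return (chars * rate + 999) // 1000
```

Under these assumptions, `estimate_credits("a" * 2500)` returns 63 (62.5 rounded up), and a single character still costs 1 credit.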