Qwen3-TTS
Premium
Multilingual TTS with 3-second voice cloning in 10 languages
About Qwen3-TTS
Qwen3-TTS from Alibaba is a 0.6B parameter text-to-speech model that combines high quality with efficient inference. It supports 10 languages and can clone any voice from just 3 seconds of reference audio. Built on the Qwen3 architecture, it produces natural-sounding speech with excellent prosody and pronunciation across all supported languages.
Key Features
3-Second Voice Cloning
Clone a voice from as little as 3 seconds of reference audio.
10 Languages
Chinese, English, Japanese, Korean, French, German, Spanish, Italian, Portuguese, and Russian.
Efficient Inference
At 0.6B parameters, the model is small enough for fast inference while maintaining high-quality output.
Natural Prosody
Built on the Qwen3 architecture for natural-sounding speech with appropriate intonation.
Technical Specs
- Generation Speed: Fast
- Output Quality: Very Good
- Voice Cloning: Supported
- Languages: 10
- GPU VRAM: 4-8 GB
- Credits per 1,000 characters: 25
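As a rough illustration of the credit rate listed above, the sketch below estimates the cost of synthesizing a piece of text. It assumes billing scales linearly with character count; the page does not say whether usage is instead rounded up to whole 1,000-character blocks, so that behavior is offered as an optional flag. The function name `estimate_credits` and its parameters are hypothetical and not part of any Qwen3-TTS API.

```python
import math

# Rate taken from the Technical Specs list: 25 credits per 1,000 characters.
CREDITS_PER_1000_CHARS = 25


def estimate_credits(text: str, round_up_to_block: bool = False) -> float:
    """Estimate the credits needed to synthesize `text`.

    Assumption: cost is proportional to the number of characters.
    If the service bills per started 1,000-character block instead,
    pass round_up_to_block=True.
    """
    n_chars = len(text)
    if round_up_to_block:
        # Charge for every started block of 1,000 characters.
        blocks = math.ceil(n_chars / 1000)
        return float(blocks * CREDITS_PER_1000_CHARS)
    # Pro-rata charge based on the exact character count.
    return n_chars * CREDITS_PER_1000_CHARS / 1000


if __name__ == "__main__":
    script = "Hello, world! " * 200  # 2,800 characters of sample text
    print(f"{len(script)} chars, pro-rata: {estimate_credits(script)} credits")
    print(f"{len(script)} chars, per block: {estimate_credits(script, True)} credits")
```

Under the pro-rata assumption, a 2,500-character script would cost 62.5 credits; under block rounding it would cost 75. Check the actual billing rules before relying on either figure.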