Kokoro
Standard
Lightning-fast, lightweight TTS with natural quality
Speed: Very Fast
Quality: Good
Cloning: No
Languages: 9
About Kokoro
Kokoro is an ultra-lightweight, 82M-parameter TTS model that produces natural-sounding speech with very low latency. It runs near real time even on CPU, which makes it well suited to applications where low latency is critical. Kokoro supports multiple languages and voice blending.
Key Features
Ultra-Lightweight
82M parameters, ~300MB model size. Runs on CPU with minimal resources.
Near Real-Time
Generates speech faster than playback speed, even without GPU acceleration.
Multi-Language
Supports English, French, Spanish, Hindi, Japanese, Chinese, Italian, Portuguese, and Korean.
Voice Blending
Mix two voices together to create unique voice combinations.
Use Cases
Real-time chatbots and virtual assistants
Live streaming text-to-speech
Edge deployment and mobile applications
High-volume batch processing
Frequently Asked Questions
Kokoro is an ultra-lightweight text-to-speech model with only 82 million parameters. Despite its small size, it produces natural-sounding speech across multiple languages at near real-time speed, even on CPU.
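Yes. Kokoro is released under the Apache 2.0 license, covering both the code and the model weights. It can be used freely in commercial applications, subject only to the standard Apache 2.0 conditions, such as keeping the license and attribution notices.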
Kokoro supports English (US and British), French, Spanish, Hindi, Japanese, Chinese, Italian, Portuguese, and Korean.
Kokoro is one of the fastest TTS models available. It generates speech faster than real time even on CPU, making it ideal for interactive applications.
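"Faster than real time" is usually expressed as a real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, so an RTF below 1 means audio is generated faster than it plays back. The sketch below shows how you might measure it. `synthesize` is a hypothetical stand-in for any TTS call that returns samples and a sample rate; it is not Kokoro's actual API, and the 24 kHz rate in the demo is an assumption.

```python
import time
from typing import Callable, Sequence, Tuple

# Hypothetical signature: any TTS function that takes text and returns
# (audio samples, sample rate in Hz). This is NOT Kokoro's actual API.
SynthFn = Callable[[str], Tuple[Sequence[float], int]]


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("audio must be non-empty with a positive sample rate")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize: SynthFn, text: str) -> float:
    """Time one synthesis call and return its real-time factor."""
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


if __name__ == "__main__":
    # Stand-in synthesizer: returns 2 seconds of silence at an assumed 24 kHz.
    def fake_synthesize(text: str) -> Tuple[Sequence[float], int]:
        return [0.0] * 48_000, 24_000

    rtf = measure_rtf(fake_synthesize, "Hello from a real-time factor check.")
    verdict = "faster" if rtf < 1 else "slower"
    print(f"RTF = {rtf:.4f} ({verdict} than real time)")
```

To benchmark a real engine, replace `fake_synthesize` with a wrapper around your actual TTS call, and average over several runs, since the first call often includes model loading.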
No, Kokoro does not support voice cloning. It uses a curated voice library with voice blending capabilities. For voice cloning, use F5-TTS, Chatterbox, StyleTTS2, OpenVoice, or Tortoise.
Kokoro can mix two voices together to create unique combinations. This allows you to create custom voice characteristics without traditional voice cloning.
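This page does not say exactly how blending is implemented. A common approach, assumed here, is to treat each voice as a fixed-length embedding (style) vector and take a weighted average of two voices. The sketch below uses plain Python lists and toy 4-dimensional vectors; `blend_voices` and its `weight_a` parameter are hypothetical names for illustration, not part of any Kokoro API.

```python
from typing import List, Sequence


def blend_voices(voice_a: Sequence[float], voice_b: Sequence[float], weight_a: float = 0.5) -> List[float]:
    """Linearly interpolate two voice embeddings (assumed representation).

    weight_a = 1.0 returns voice_a unchanged; weight_a = 0.0 returns voice_b.
    Both embeddings must have the same length.
    """
    if len(voice_a) != len(voice_b):
        raise ValueError("voice embeddings must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    weight_b = 1.0 - weight_a
    # Element-wise weighted sum of the two vectors.
    return [weight_a * a + weight_b * b for a, b in zip(voice_a, voice_b)]


if __name__ == "__main__":
    # Toy 4-dimensional "voices"; real voice embeddings are much larger.
    warm = [0.2, 0.8, -0.4, 1.0]
    bright = [0.6, 0.0, 0.4, -1.0]
    print(blend_voices(warm, bright, weight_a=0.75))
```

Keeping the weights summing to 1 keeps the blended vector between the two source voices, which is why the sketch takes a single `weight_a` rather than two independent weights.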
Kokoro and Piper are both fast, lightweight models. Kokoro has a more modern architecture and supports voice blending, while Piper offers a larger voice library. Both are excellent choices for real-time applications.
Kokoro is designed to run on CPU with minimal resources; the model is approximately 300MB. No GPU is required, though GPU acceleration is supported for even faster processing.
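The ~300MB figure follows roughly from the parameter count. Assuming the weights are stored as 32-bit floats (4 bytes each; the actual checkpoint format is an assumption here), the arithmetic is:

```latex
\begin{aligned}
\text{size}_{\text{fp32}} &= 82 \times 10^{6}\ \text{parameters} \times 4\ \text{bytes/parameter} \\
&= 328 \times 10^{6}\ \text{bytes} \approx 313\ \text{MiB}
\end{aligned}
```

At 16-bit precision the same weights would take about half that (164 × 10^6 bytes). This counts weights only; runtime memory also includes activations and audio buffers.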
Technical Specs
- Generation Speed: Very Fast
- Output Quality: Good
- Voice Cloning: Not Supported
- Languages: 9
- GPU VRAM: CPU OK (no GPU required)
- Credits per 1,000 characters: 10
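For budgeting batch jobs, the listed rate of 10 credits per 1,000 characters can be turned into a quick estimate. This sketch assumes billing is strictly linear in character count, that every character (including spaces) counts, and that partial credits are rounded up; all three are assumptions, not documented billing rules. `estimate_credits` is a hypothetical helper.

```python
import math

CREDITS_PER_1000_CHARS = 10  # rate from the Technical Specs list


def estimate_credits(text: str, round_up: bool = True) -> float:
    """Estimate credits for synthesizing `text`.

    Assumes every character (spaces included) is billed linearly.
    Rounding partial credits up is an assumption; pass round_up=False
    to get the raw proportional value instead.
    """
    raw = len(text) * CREDITS_PER_1000_CHARS / 1000
    return math.ceil(raw) if round_up else raw


if __name__ == "__main__":
    # A 250,000-character batch at 10 credits per 1,000 characters.
    batch = "x" * 250_000
    print(estimate_credits(batch))
```

Check your account's actual billing rules before relying on the rounding behavior.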