Pocket TTS
Standard
Ultra-lightweight voice cloning that runs in real time on CPU
Speed: Very Fast
Quality: Good
Cloning: Yes
Languages: 2
About Pocket TTS
Pocket TTS by Kyutai is an ultra-lightweight, 100M-parameter text-to-speech model that runs in real time on CPU. Despite its small size, it supports voice cloning from as little as 5 seconds of reference audio. This makes it well suited to edge deployment, mobile applications, and other settings where GPU resources are limited or unavailable. It currently supports English and French.
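One common way to check a "real time on CPU" claim on your own hardware is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The sketch below times any synthesis callable. `synthesize` and `fake_synthesize` are hypothetical stand-ins, not Pocket TTS's actual API, and the 24 kHz sample rate in the usage example is an assumption for illustration only.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Wall-clock synthesis time divided by the duration of the generated audio.

    `synthesize` is a hypothetical stand-in for whatever TTS call you use; it
    must return mono samples at `sample_rate` Hz. RTF < 1.0 means audio is
    produced faster than it takes to play back.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds


if __name__ == "__main__":
    # Hypothetical dummy synthesizer: 2 seconds of silence at an assumed 24 kHz.
    def fake_synthesize(text: str) -> list:
        return [0.0] * 48_000

    rtf = real_time_factor(fake_synthesize, "Hello from the edge.", 24_000)
    print(f"RTF = {rtf:.4f} ({'faster' if rtf < 1 else 'slower'} than real time)")
```

To measure a real model, replace `fake_synthesize` with a function that calls it and returns its samples; averaging several runs after a warm-up call gives a steadier figure.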
Key Features
Ultra-Lightweight
100M parameters; runs in real time on CPU with minimal resources.
Voice Cloning
Clone any voice from just 5 seconds of reference audio, even on CPU.
Real-Time on CPU
No GPU required. Generates speech at real-time speed on standard hardware.
Edge-Ready
Small enough for mobile devices, Raspberry Pi, and embedded systems.
Use Cases
Edge and mobile deployment
Real-time voice assistants on CPU
IoT and embedded devices
Low-resource voice cloning
Frequently Asked Questions
What is Pocket TTS?
Pocket TTS is an ultra-lightweight text-to-speech model from Kyutai with only 100 million parameters. It runs in real time on CPU and supports voice cloning from 5 seconds of audio.
Can I use Pocket TTS commercially?
Pocket TTS is licensed under CC-BY-4.0, which permits commercial use with attribution: you must give appropriate credit to Kyutai, including in commercial applications.
Which languages does Pocket TTS support?
Pocket TTS currently supports English and French. More languages may be added in future releases.
Does Pocket TTS run on CPU?
Yes. With only 100M parameters, Pocket TTS runs at real-time speed on standard CPU hardware. No GPU is needed, which makes it a good fit for edge deployment and mobile applications.
How does Pocket TTS compare to Kokoro?
Both are lightweight and run well on CPU. Pocket TTS supports voice cloning, while Kokoro does not; Kokoro covers more languages (9 versus 2). Choose Pocket TTS if you need lightweight voice cloning, or Kokoro if you need broader language coverage.
How does voice cloning work?
Provide at least 5 seconds of reference audio. Pocket TTS extracts the speaker's characteristics from it and can then generate new speech in that voice. Quality improves with longer references, up to about 10 seconds.
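Because cloning quality depends on the length of the reference clip, it can help to check a clip before using it. The sketch below is a pre-flight check only and does not call Pocket TTS. It uses Python's standard `wave` module, so it reads uncompressed PCM WAV files only (other formats would need a separate decoder). The 5-second minimum and 10-second ceiling come from the FAQ above; the function names are hypothetical.

```python
import sys
import wave
from typing import BinaryIO, Tuple, Union

# Thresholds from the FAQ: 5 s minimum, quality gains reported up to about 10 s.
MIN_REFERENCE_SECONDS = 5.0
MAX_USEFUL_SECONDS = 10.0


def reference_duration(source: Union[str, BinaryIO]) -> float:
    """Duration in seconds of a PCM WAV file, given a path or binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(source: Union[str, BinaryIO]) -> Tuple[bool, float]:
    """Return (long_enough, seconds) for a candidate cloning reference clip."""
    seconds = reference_duration(source)
    return seconds >= MIN_REFERENCE_SECONDS, seconds


if __name__ == "__main__":
    # Usage: python check_reference.py clip1.wav clip2.wav ...
    for path in sys.argv[1:]:
        ok, secs = check_reference(path)
        note = "" if secs <= MAX_USEFUL_SECONDS else " (past ~10 s, gains may level off)"
        print(f"{path}: {secs:.1f}s {'OK' if ok else 'too short'}{note}")
```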
Technical Specs
- Generation Speed: Very Fast
- Output Quality: Good
- Voice Cloning: Supported
- Languages: 2 (English, French)
- GPU VRAM: None required (runs on CPU)
- Credits per 1,000 characters: 10
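A quick way to budget a job is to turn the listed rate into a cost estimate. The sketch below assumes characters are counted with `len(text)` (spaces and punctuation included) and offers two hypothetical billing modes, pro rata or per started 1,000-character block; the spec does not say which one applies, so treat both as assumptions.

```python
import math

# From the Technical Specs list: 10 credits per 1,000 characters.
CREDITS_PER_1000_CHARS = 10


def estimate_credits(text: str, round_up_blocks: bool = False) -> float:
    """Estimate credits for synthesizing `text`.

    Assumptions (not confirmed by the spec): every character counts, and
    billing is either pro rata (default) or charged per started
    1,000-character block (round_up_blocks=True).
    """
    chars = len(text)
    if round_up_blocks:
        return float(math.ceil(chars / 1000) * CREDITS_PER_1000_CHARS)
    return chars * CREDITS_PER_1000_CHARS / 1000
```

For example, a 2,500-character script comes to 25 credits pro rata, or 30 if billing rounds up to whole 1,000-character blocks.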