Pocket TTS

Standard

Ultra-lightweight voice cloning that runs real-time on CPU

Speed: Very Fast
Quality: Good
Voice Cloning: Yes
Languages: 2

About Pocket TTS

Pocket TTS by Kyutai is an ultra-lightweight, 100M-parameter text-to-speech model that runs in real time on CPU. Despite its small size, it supports voice cloning from just 5 seconds of reference audio. That makes it a good fit for edge deployment, mobile applications, and scenarios where GPU resources are limited. It currently supports English and French.

Key Features

Ultra-Lightweight

Only 100M parameters, so it runs in real time on CPU with minimal resources.

Voice Cloning

Clone any voice from just 5 seconds of reference audio, even on CPU.

Real-Time on CPU

No GPU required. Generates speech at real-time speed on standard hardware.

Edge-Ready

Small enough for mobile devices, Raspberry Pi, and embedded systems.

Use Cases

  • Edge and mobile deployment
  • Real-time voice assistants on CPU
  • IoT and embedded devices
  • Low-resource voice cloning

Frequently Asked Questions

What is Pocket TTS?

Pocket TTS is an ultra-lightweight text-to-speech model from Kyutai with only 100 million parameters. It runs in real time on CPU and supports voice cloning from 5 seconds of audio.

Can I use Pocket TTS commercially?

Pocket TTS is licensed under CC-BY-4.0, which allows commercial use with attribution. You must credit Kyutai when using it in commercial applications.

Which languages does Pocket TTS support?

Pocket TTS currently supports English and French. More languages may be added in future releases.

Can Pocket TTS run without a GPU?

Yes! With only 100M parameters, Pocket TTS runs at real-time speed on standard CPU hardware. No GPU is needed, making it ideal for edge deployment and mobile applications.
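
"Real-time speed" is usually expressed as a real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value of 1.0 or less means the model keeps up with playback. The sketch below shows one way to measure it on your own hardware. The `synthesize` callable is a hypothetical stand-in for whatever function runs the model and reports the length of the generated audio in seconds; it is not part of any Pocket TTS API.

```python
import time
from typing import Callable


def real_time_factor(synthesize: Callable[[str], float], text: str) -> float:
    """Return wall-clock synthesis time divided by seconds of audio produced.

    `synthesize` is a hypothetical callable (an assumption for this sketch):
    it runs the TTS engine on `text` and returns the generated audio's
    duration in seconds.
    """
    start = time.perf_counter()
    audio_seconds = synthesize(text)
    elapsed = time.perf_counter() - start
    if audio_seconds <= 0:
        raise ValueError("synthesize() must report a positive audio duration")
    return elapsed / audio_seconds


def is_real_time(rtf: float) -> bool:
    """An RTF of 1.0 or below means synthesis keeps pace with playback."""
    return rtf <= 1.0
```

Averaging the RTF over several sentences of different lengths on the target device gives a more representative figure than a single call.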

How does Pocket TTS compare to Kokoro?

Both are lightweight and run well on CPU. Pocket TTS supports voice cloning, which Kokoro does not. Kokoro supports more languages (9 vs. 2). Choose Pocket TTS if you need lightweight voice cloning, or Kokoro if you need broader language coverage.

How does voice cloning work?

Provide at least 5 seconds of reference audio. Pocket TTS extracts the speaker's characteristics and can generate new speech in that voice. Quality improves with longer references (up to 10 seconds).
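
Before uploading a reference clip, it can help to check that its length falls in the 5 to 10 second range described above. The following is a minimal sketch using only Python's standard `wave` module, so it assumes an uncompressed WAV file; the function names and return labels are illustrative, not part of Pocket TTS.

```python
import wave

MIN_REFERENCE_SECONDS = 5.0   # minimum reference length stated above
MAX_USEFUL_SECONDS = 10.0     # quality gains are only stated up to this length


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def classify_reference_duration(seconds: float) -> str:
    """Label a clip length against the 5 to 10 second guidance."""
    if seconds < MIN_REFERENCE_SECONDS:
        return "too_short"
    if seconds > MAX_USEFUL_SECONDS:
        return "longer_than_needed"
    return "ok"


def check_reference_clip(path: str) -> str:
    """Convenience wrapper: read a WAV file and classify its length."""
    return classify_reference_duration(wav_duration_seconds(path))
```

For example, `check_reference_clip("my_voice.wav")` (a hypothetical file name) returns one of the three labels, which a script can use to prompt for a longer recording or to trim an overly long one.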

Technical Specs

  • Generation Speed: Very Fast
  • Output Quality: Good
  • Voice Cloning: Supported
  • Languages: 2
  • GPU VRAM: Not required (CPU OK)
  • Credits per 1,000 characters: 10
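
To budget a job from the credit rate listed above, the arithmetic can be sketched as follows. This assumes credits scale linearly with character count and that partial credits round up; the page does not state the actual rounding or minimum-charge policy, so treat the result as an estimate. The function name is illustrative.

```python
CREDITS_PER_1000_CHARS = 10  # rate from the spec list above


def estimate_credits(text: str) -> int:
    """Estimate credits for `text`.

    Assumption (not confirmed by the source): cost is prorated by
    character count and any fractional credit is rounded up.
    """
    chars = len(text)
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-chars * CREDITS_PER_1000_CHARS // 1000)
```

Under these assumptions, a 2,500-character script is estimated at 25 credits, since 2,500 × 10 / 1,000 = 25.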

Try Pocket TTS Now

Generate your first audio free. No credit card required.
