Parler-TTS

Premium

Text-Described Voice Generation

Moderate Speed
Very Good Quality
No Cloning
1 Language

About Parler-TTS

Parler-TTS is a unique text-to-speech model that generates voices based on text descriptions. Instead of selecting from pre-defined voices, you describe the voice you want: "A young woman speaks clearly with an American accent" or "An elderly British man speaks slowly in a deep voice." Parler-TTS then generates speech matching your description.

Key Features

Text Descriptions

Generate voices by describing desired characteristics.

Creative Control

Specify age, gender, accent, speed, and speaking style.

Unique Voices

Create voices that do not exist in pre-made libraries.

Natural Output

Generates high-quality, natural-sounding speech.

Efficient

Fast inference for described voice generation.

Open Source

Apache 2.0 licensed for commercial use.

Use Cases

  • Character Voice Design
  • Creative Projects
  • Prototype Voiceovers
  • Game Development
  • Audiobook Characters
  • Custom Voice Creation

Parler-TTS Voices

  • American Female (EN)
  • American Male (EN)
  • British Female (EN)
  • British Male (EN)
  • Calm Voice (EN)
  • Cheerful Voice (EN)
  • Conversational Voice (EN)
  • Female Narrator (EN)
  • Male Narrator (EN)
  • Professional Voice (EN)

Frequently Asked Questions

Parler-TTS is a text-to-speech model that generates voices from text descriptions. Instead of choosing pre-made voices, you describe what you want: "A calm, mature woman with an Australian accent speaking at a moderate pace."

Parler-TTS is open source under the Apache 2.0 license. On TextToSpeechAI, we charge 25 credits per 1,000 characters (Premium tier) for its unique voice generation capabilities.
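As a rough illustration of the pricing above, the sketch below estimates credits for a piece of text. The rate of 25 credits per 1,000 characters comes from this page; the rounding rule (pro-rata, rounded up to a whole credit) and the function name are assumptions, so actual billing may differ.

```python
import math

CREDITS_PER_1000_CHARS = 25  # Premium-tier rate quoted on this page


def estimate_credits(text: str) -> int:
    """Estimate credits for `text`.

    Assumption: billing is pro-rata by character count and rounded up
    to a whole credit. The real rounding rule is not stated on this page.
    """
    if not text:
        return 0
    return math.ceil(len(text) * CREDITS_PER_1000_CHARS / 1000)


if __name__ == "__main__":
    print(estimate_credits("a" * 1000))  # 25
    print(estimate_credits("a" * 1500))  # 38
```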

Parler-TTS primarily supports English. Voice descriptions work best when written in English, and the model can handle a range of English accents (American, British, Australian, etc.).

Describe voice characteristics naturally: "A young woman speaks clearly with a British accent" or "An elderly man with a deep voice speaks slowly and carefully." Include age, gender, accent, speed, and mood.
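If you build descriptions programmatically, a small helper can keep them consistent. The sketch below assembles a sentence from the characteristics listed above (age, gender, accent, speed, mood). The function name, its parameters, and the sentence template are hypothetical, and the a/an choice uses a simple first-letter rule that will not suit every word.

```python
def _article(word: str) -> str:
    """Return "An" before a vowel letter, else "A" (a simple heuristic)."""
    return "An" if word[:1].lower() in {"a", "e", "i", "o", "u"} else "A"


def build_voice_description(age: str, gender: str, accent: str = "",
                            pace: str = "", mood: str = "") -> str:
    """Compose a one-sentence voice description from individual traits."""
    subject = f"{age} {gender}".strip()
    if not subject:
        raise ValueError("age or gender is required")
    sentence = f"{_article(subject)} {subject}"
    if accent:
        sentence += f" with {_article(accent).lower()} {accent} accent"
    sentence += " speaks"
    if pace:
        sentence += f" {pace}"
    if mood:
        sentence += f" in {_article(mood).lower()} {mood} tone"
    return sentence + "."


if __name__ == "__main__":
    print(build_voice_description("young", "woman", accent="British", pace="clearly"))
    # A young woman with a British accent speaks clearly.
```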

Parler-TTS has moderate generation speed, typically 2-5 seconds per sentence on GPU. The voice description processing adds minimal overhead compared to the actual speech generation.
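To get a feel for what this means for a longer script, the sketch below turns the 2-5 seconds per sentence figure into a rough time range. The sentence splitting (on ., ! and ?) and the function name are assumptions for illustration; real timings depend on hardware and sentence length.

```python
import re

SECONDS_PER_SENTENCE = (2.0, 5.0)  # GPU range quoted on this page


def estimate_generation_seconds(text: str) -> tuple[float, float]:
    """Return a (low, high) estimate in seconds for generating `text`.

    Assumption: sentences are separated by '.', '!' or '?'. This is a
    crude split and will miscount abbreviations such as "Dr.".
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    count = len(sentences)
    return (count * SECONDS_PER_SENTENCE[0], count * SECONDS_PER_SENTENCE[1])


if __name__ == "__main__":
    low, high = estimate_generation_seconds("Hello there. How are you? Fine!")
    print(f"about {low:.0f} to {high:.0f} seconds")  # about 6 to 15 seconds
```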

Parler-TTS does not support voice cloning; it generates voices from descriptions rather than copying existing voices. For voice cloning, use StyleTTS2, F5-TTS, OpenVoice, or Tortoise.

Parler-TTS produces very good quality audio. The speech sounds natural with appropriate prosody matching the described characteristics. Quality is comparable to F5-TTS.

Parler-TTS requires 4-8GB of VRAM depending on the model size. The mini version works with 4GB, while the full model benefits from 8GB for optimal performance.

Parler-TTS is Apache 2.0 licensed and can be used commercially. Since voices are generated from descriptions, there are no voice ownership concerns.

Include your voice description in the API request along with your text. Our API processes the description and generates matching speech. You can save favorite descriptions for reuse.
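The shape of such a request might look like the sketch below. The field names ("model", "text", "voice_description", "format") and the "parler-tts" identifier are hypothetical placeholders, not the documented TextToSpeechAI API; only the MP3, WAV, and OGG format options come from this page. Check the API reference for the real schema before using it.

```python
import json

ALLOWED_FORMATS = {"mp3", "wav", "ogg"}  # output formats listed on this page


def build_tts_request(text: str, description: str, audio_format: str = "mp3") -> dict:
    """Build a request payload pairing the text with a voice description.

    All field names here are hypothetical and shown for illustration only.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if not description.strip():
        raise ValueError("description must not be empty")
    audio_format = audio_format.lower()
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")
    return {
        "model": "parler-tts",             # hypothetical model identifier
        "text": text,
        "voice_description": description,  # hypothetical field name
        "format": audio_format,
    }


if __name__ == "__main__":
    payload = build_tts_request(
        "Welcome to the show.",
        "A young woman speaks clearly with an American accent",
        "wav",
    )
    print(json.dumps(payload, indent=2))
```

Keeping favorite descriptions as named constants and passing them to a helper like this is one simple way to reuse them across requests.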

Parler-TTS outputs WAV audio natively. Through TextToSpeechAI, you can request MP3, WAV, or OGG formats with automatic conversion.

Parler-TTS is unique in generating voices from descriptions; no other model offers this. Use it for creative voice design. To replicate an existing voice, use F5-TTS or another cloning model.

Technical Specs

  • Generation Speed: Moderate
  • Output Quality: Very Good
  • Voice Cloning: Not Supported
  • Languages: 1 (English)
  • GPU VRAM: 4-8 GB
  • Credits per 1,000 characters: 25

Try Parler-TTS Now

Generate your first audio free. No credit card required.