Speed: Moderate
Quality: Very Good
Cloning: No
Languages: 1
About Parler-TTS
Parler-TTS is a unique text-to-speech model that generates voices based on text descriptions. Instead of selecting from pre-defined voices, you describe the voice you want: "A young woman speaks clearly with an American accent" or "An elderly British man speaks slowly in a deep voice." Parler-TTS then generates speech matching your description.
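For readers who want to try description-driven synthesis with the open-source model directly, the following is a minimal sketch based on the usage pattern of the `parler_tts` Python package. It assumes that package plus `torch`, `transformers`, and `soundfile` are installed; the checkpoint name `parler-tts/parler-tts-mini-v1` and the function name `synthesize` are assumptions that do not come from this page.

```python
def synthesize(text: str, description: str, out_path: str = "parler_tts_out.wav") -> str:
    """Generate speech for `text` in the voice described by `description`."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    checkpoint = "parler-tts/parler-tts-mini-v1"  # assumed checkpoint name
    device = "cuda:0" if torch.cuda.is_available() else "cpu"

    model = ParlerTTSForConditionalGeneration.from_pretrained(checkpoint).to(device)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    # The description conditions the voice; the prompt is the text to be spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(text, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()

    # Write the waveform at the model's native sampling rate.
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

A call might look like `synthesize("Welcome back.", "An elderly British man speaks slowly in a deep voice.")`. The first call downloads the model weights, so it is noticeably slower than later ones.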
Key Features
- Text Descriptions: Generate voices by describing the characteristics you want.
- Creative Control: Specify age, gender, accent, speed, and speaking style.
- Unique Voices: Create voices that do not exist in pre-made libraries.
- Natural Output: Generates high-quality, natural-sounding speech.
- Efficient: Processing the voice description adds minimal overhead to speech generation.
- Open Source: Licensed under Apache 2.0, which permits commercial use.
Use Cases
- Character Voice Design
- Creative Projects
- Prototype Voiceovers
- Game Development
- Audiobook Characters
- Custom Voice Creation
Parler-TTS Voices
- American Female (EN)
- American Male (EN)
- British Female (EN)
- British Male (EN)
- Calm Voice (EN)
- Cheerful Voice (EN)
- Conversational Voice (EN)
- Female Narrator (EN)
- Male Narrator (EN)
- Professional Voice (EN)
Frequently Asked Questions
Parler-TTS is a text-to-speech model that generates voices from text descriptions. Instead of choosing pre-made voices, you describe what you want: "A calm, mature woman with an Australian accent speaking at a moderate pace."
Parler-TTS is open source under the Apache 2.0 license. On TextToSpeechAI, we charge 25 credits per 1,000 characters (Premium tier) for its voice generation capabilities.
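The quoted rate makes it easy to budget a job ahead of time. The sketch below only does the arithmetic for 25 credits per 1,000 characters. How the service counts characters (for example, whether whitespace or the voice description is included) and how it rounds partial blocks are not stated here, so the function returns the unrounded figure; the name `estimate_credits` is hypothetical.

```python
CREDITS_PER_1000_CHARS = 25  # Premium-tier rate quoted above


def estimate_credits(text: str) -> float:
    """Return the unrounded credit cost of synthesizing `text`.

    Assumption: every character of `text` is billed, and nothing else is.
    """
    return len(text) * CREDITS_PER_1000_CHARS / 1000


if __name__ == "__main__":
    script = "Welcome to the show. " * 40  # 840 characters
    print(f"{len(script)} characters, about {estimate_credits(script)} credits")
```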
Parler-TTS primarily supports English. Voice descriptions work best when written in English, and the model can produce various English accents (American, British, Australian, etc.).
Describe voice characteristics naturally: "A young woman speaks clearly with a British accent" or "An elderly man with a deep voice speaks slowly and carefully." Include age, gender, accent, speed, and mood.
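If you generate many descriptions, it can help to assemble them from the same five attributes instead of writing each one by hand. The following is a small sketch of that idea. The `VoiceSpec` class, the `describe` function, and the sentence template are illustrative assumptions, not a format that Parler-TTS requires.

```python
from dataclasses import dataclass


@dataclass
class VoiceSpec:
    age: str     # e.g. "young", "elderly"
    gender: str  # e.g. "woman", "man"
    accent: str  # e.g. "British", "American"
    speed: str   # e.g. "slowly", "clearly"
    mood: str    # e.g. "calm", "cheerful"


def _article(word: str) -> str:
    """Pick 'a' or 'an' from the first letter (a simple heuristic)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def describe(spec: VoiceSpec) -> str:
    """Build a one-sentence natural-language voice description."""
    subject = f"{_article(spec.age)} {spec.age} {spec.gender}"
    accent = f"{_article(spec.accent)} {spec.accent} accent"
    mood = f"{_article(spec.mood)} {spec.mood} tone"
    sentence = f"{subject} with {accent} speaks {spec.speed} in {mood}."
    return sentence[0].upper() + sentence[1:]


if __name__ == "__main__":
    print(describe(VoiceSpec("elderly", "man", "British", "slowly", "calm")))
```

Keeping the attributes structured makes it straightforward to vary one of them, such as accent, while holding the rest fixed.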
Parler-TTS has moderate generation speed, typically 2-5 seconds per sentence on a GPU. Processing the voice description adds minimal overhead compared to the speech generation itself.
Parler-TTS does not support voice cloning. It generates voices from descriptions rather than cloning existing voices. For voice cloning, use StyleTTS2, F5-TTS, OpenVoice, or Tortoise.
Parler-TTS produces very good quality audio. The speech sounds natural with appropriate prosody matching the described characteristics. Quality is comparable to F5-TTS.
Parler-TTS requires 4-8GB of VRAM depending on the model size. The mini version works with 4GB, while the full model benefits from 8GB for optimal performance.
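The VRAM guidance above can be written as a simple selection rule. This is a sketch that uses only the two thresholds quoted (4GB for the mini version, 8GB for the full model). The function name `pick_variant` is hypothetical, and the returned labels `"mini"` and `"full"` are descriptive strings, not official checkpoint names.

```python
from typing import Optional


def pick_variant(vram_gb: float) -> Optional[str]:
    """Suggest a model size for the given amount of GPU memory in GB.

    Returns None when the GPU is below the 4GB minimum quoted above.
    """
    if vram_gb >= 8:
        return "full"
    if vram_gb >= 4:
        return "mini"
    return None


if __name__ == "__main__":
    for gb in (2, 4, 6, 8, 24):
        print(gb, "GB ->", pick_variant(gb))
```

On a machine with PyTorch installed, total GPU memory in bytes can be read from `torch.cuda.get_device_properties(0).total_memory` and divided by `1024**3` to get the value to pass in.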
Parler-TTS is Apache 2.0 licensed and supports commercial use. Since voices are generated from descriptions, there are no voice ownership concerns.
Include your voice description in the API request along with your text. Our API processes the description and generates matching speech. You can save favorite descriptions for reuse.
Parler-TTS outputs WAV audio natively. Through TextToSpeechAI, you can request MP3, WAV, or OGG formats with automatic conversion.
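A request as described above carries three things: the text, the voice description, and the desired output format. The sketch below builds such a request body and keeps a few saved descriptions for reuse. The TextToSpeechAI API schema is not documented on this page, so every field name here (`model`, `text`, `description`, `format`) and the function name `build_request_body` are hypothetical placeholders; check the actual API reference before sending anything.

```python
import json

# Locally saved descriptions for reuse; the keys are arbitrary nicknames.
SAVED_DESCRIPTIONS = {
    "narrator": "An elderly man with a deep voice speaks slowly and carefully.",
    "assistant": "A young woman speaks clearly with a British accent.",
}

# Output formats listed above for the hosted service.
SUPPORTED_FORMATS = {"mp3", "wav", "ogg"}


def build_request_body(text: str, preset: str, audio_format: str = "mp3") -> str:
    """Return a JSON request body; all field names are hypothetical."""
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {audio_format!r}")
    if preset not in SAVED_DESCRIPTIONS:
        raise KeyError(f"no saved description named {preset!r}")
    payload = {
        "model": "parler-tts",                       # hypothetical field and value
        "text": text,                                # hypothetical field
        "description": SAVED_DESCRIPTIONS[preset],   # hypothetical field
        "format": audio_format,                      # hypothetical field
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_request_body("Chapter one.", "narrator", "wav"))
```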
Parler-TTS is distinctive in generating voices from text descriptions rather than replicating existing ones. Use it for creative voice design; to replicate an existing voice, use F5-TTS or another cloning model.
Technical Specs
- Generation Speed: Moderate
- Output Quality: Very Good
- Voice Cloning: Not Supported
- Languages: 1
- GPU VRAM: 4-8GB
- Credits per 1,000 characters: 25