Dia

Ultra

Dialogue-oriented TTS with voice cloning and nonverbal sounds

Speed: Medium
Quality: Excellent
Voice Cloning: Yes
Languages: 1

About Dia

Dia by Nari Labs is a 1.6B parameter dialogue-focused text-to-speech model. It excels at generating natural conversational speech with support for nonverbal sounds like laughter, sighs, and coughs. Dia supports multi-speaker dialogue generation and voice cloning from 5-10 seconds of reference audio, making it ideal for creating realistic conversations and character voices.

Key Features

Dialogue Generation

Generate natural multi-speaker conversations with distinct voices and turn-taking.

Nonverbal Sounds

Add (laughs), (sighs), (coughs), or (gasps) for natural paralinguistic expression.

Voice Cloning

Clone any voice from 5-10 seconds of reference audio for personalized speech.

Natural Conversation

The 1.6B-parameter model produces natural conversational prosody and intonation.

Use Cases

  • Dialogue and conversation generation
  • Audiobook production with multiple characters
  • Game character voices
  • Podcast and content creation

Frequently Asked Questions

Dia is a 1.6B parameter dialogue-oriented text-to-speech model from Nari Labs. It specializes in generating natural conversational speech with support for multiple speakers, nonverbal sounds, and voice cloning.

Dia is fully Apache 2.0 licensed, covering both the code and the model weights, so it can be used in commercial applications under the terms of that license.

Currently Dia supports English only. The model is optimized for natural English conversational speech.

Dia supports special tags in your text: (laughs) for laughter, (sighs) for sighing, (coughs) for coughing, and (gasps) for gasping. These add natural nonverbal sounds that make speech more realistic.

Use [S1] and [S2] tags to mark different speakers in your text. Dia generates distinct voices for each speaker with natural turn-taking and conversational dynamics.
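The speaker-tag convention can be illustrated with a small helper that assembles a tagged script from a list of turns. This is a minimal sketch and not part of Dia itself: `build_dialogue_script` is a hypothetical name, the two-speaker limit simply mirrors the [S1] and [S2] tags mentioned above, and nonverbal cues are passed through exactly as typed.

```python
from typing import Iterable, Tuple


def build_dialogue_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, text) turns into one [S1]/[S2]-tagged string.

    Hypothetical helper for illustration; it only formats text and does
    not call any text-to-speech library.
    """
    parts = []
    for speaker, text in turns:
        # Only the two speaker tags named on this page are accepted here.
        if speaker not in (1, 2):
            raise ValueError(f"unsupported speaker number: {speaker}")
        parts.append(f"[S{speaker}] {text.strip()}")
    return " ".join(parts)


script = build_dialogue_script([
    (1, "Did you hear the news?"),
    (2, "(gasps) No, what happened?"),
    (1, "We shipped the release a day early."),
])
print(script)
```

The resulting string is what you would paste into the text field or hand to whatever generation call you use; keeping turns as data makes it easy to reorder lines or swap speakers before generating audio.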

Compared with Bark: both models support nonverbal sounds, but Dia is designed specifically for dialogue, with multi-speaker support and more natural turn-taking. Bark supports more languages, while Dia offers better dialogue quality.

Dia requires approximately 10GB of VRAM for its 1.6B parameter model. A GPU with at least 12GB is recommended for comfortable operation.
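As a rough sanity check on that figure, the memory taken by the weights alone can be estimated from the parameter count. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the numeric precisions are illustrative choices, not a statement about how Dia is actually loaded, and the breakdown of the remaining memory (activations, audio codec, framework overhead) is not given on this page.

```python
# Parameter count from this page; bytes per parameter are illustrative
# assumptions for two common precisions.
PARAMS = 1.6e9
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Return the memory needed for the weights alone, in decimal GB."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.1f} GB")
```

Under these assumptions the weights account for 6.4 GB at 32-bit precision and 3.2 GB at 16-bit precision, so the quoted ~10GB leaves headroom for everything else that runs during generation.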

Technical Specs

  • Generation Speed: Medium
  • Output Quality: Excellent
  • Voice Cloning: Supported
  • Languages: 1
  • GPU VRAM: 10GB
  • Credits per 1,000 characters: 50
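
The credit rate in the list above can be turned into a quick cost estimate. This is a sketch only: `estimate_credits` is a hypothetical helper, and both rounding up to a whole credit and counting every character of the input (tags included) are assumptions, since the page does not say how partial blocks or tags are billed.

```python
CREDITS_PER_1000_CHARS = 50  # rate from the spec list above


def estimate_credits(text_length: int) -> int:
    """Estimate credits for a text of `text_length` characters.

    Assumption: usage is prorated per character and rounded up to the
    next whole credit.
    """
    if text_length < 0:
        raise ValueError("text_length must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-text_length * CREDITS_PER_1000_CHARS // 1000)


print(estimate_credits(2500))  # 2,500 characters at 50 credits per 1,000
```

With these assumptions a 2,500-character script comes to 125 credits; check your actual usage against the account dashboard rather than relying on this estimate.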

Try Dia Now

Generate your first audio free. No credit card required.