Dia

Ultra

Dialogue-oriented TTS with voice cloning and nonverbal sounds

Speed: Medium

Quality: Excellent

Voice Cloning: Yes

Languages: 1 (English)

About Dia

Dia by Nari Labs is a 1.6B-parameter, dialogue-focused text-to-speech model. It generates natural conversational speech, including nonverbal sounds such as laughter, sighs, and coughs. Dia also supports multi-speaker dialogue and voice cloning from 5-10 seconds of reference audio, which makes it well suited to realistic conversations and character voices.

Key Features

Dialogue Generation

Generate natural multi-speaker conversations with distinct voices and turn-taking.

Nonverbal Sounds

Add (laughs), (sighs), (coughs), and (gasps) for natural paralinguistic expression.

Voice Cloning

Clone any voice from 5-10 seconds of reference audio for personalized speech.
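
Before uploading a reference clip, it can help to check its length. The following is a minimal sketch that assumes the clip is an uncompressed WAV file and treats the 5-10 second range above as inclusive bounds; the function names and constants are hypothetical and are not part of Dia or any hosting service.

```python
import wave

# Bounds taken from the "5-10 seconds" guidance above (treated as inclusive; an assumption).
MIN_SECONDS = 5.0
MAX_SECONDS = 10.0

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()

def is_usable_reference(path: str) -> bool:
    """True when the clip length falls inside the recommended range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

A clip outside the range may still work; the check only flags it for review.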

Natural Conversation

The 1.6B-parameter model produces natural conversational prosody and intonation.

Use Cases

Dialogue and conversation generation
Audiobook production with multiple characters
Game character voices
Podcast and content creation

Frequently Asked Questions

What is Dia TTS?

Dia is a 1.6B-parameter, dialogue-oriented text-to-speech model from Nari Labs. It specializes in natural conversational speech, with support for multiple speakers, nonverbal sounds, and voice cloning.

Is Dia free to use commercially?

Yes. Both the code and the model weights are released under the Apache 2.0 license, so Dia can be used freely in commercial applications.

What languages does Dia support?

Dia currently supports English only. The model is optimized for natural English conversational speech.

What are nonverbal tags in Dia?

Dia recognizes special tags in your text: (laughs) for laughter, (sighs) for sighing, (coughs) for coughing, and (gasps) for gasping. They insert nonverbal sounds that make the speech more realistic.
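
Hand-written scripts sometimes mix bracket styles for these tags. The following is a minimal sketch of a normalizer, assuming the engine expects the parenthesized form such as (gasps). The `NONVERBAL_TAGS` set and the function name are hypothetical and cover only the four tags named above.

```python
import re

# Only the four tags named above; the engine may accept more (an assumption).
NONVERBAL_TAGS = {"laughs", "sighs", "coughs", "gasps"}

# Matches a run of letters/spaces wrapped in [] or (). Speaker tags such as
# [S1] contain a digit, so they never match and are left alone.
_TAG_PATTERN = re.compile(r"[\[(]([A-Za-z ]+)[\])]")

def normalize_nonverbal_tags(text: str) -> str:
    """Rewrite known nonverbal tags to the parenthesized form."""
    def _replace(match: re.Match) -> str:
        name = match.group(1).strip().lower()
        if name in NONVERBAL_TAGS:
            return f"({name})"
        return match.group(0)  # unrecognized text is returned unchanged
    return _TAG_PATTERN.sub(_replace, text)

print(normalize_nonverbal_tags("[S1] That was close. [laughs] (gasps)"))
```

Ordinary parenthetical remarks are left untouched because they are not in the tag set.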

How does multi-speaker dialogue work?

Use [S1] and [S2] tags to mark different speakers in your text. Dia generates a distinct voice for each speaker, with natural turn-taking and conversational dynamics.
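
A tagged script can be assembled from a list of turns. The following is a minimal sketch, assuming the turns are joined into a single string separated by spaces; the helper name `build_dialogue_script` is hypothetical and not part of Dia.

```python
from typing import Iterable, Tuple

def build_dialogue_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker_number, line) pairs into one string tagged with [S1] / [S2]."""
    parts = []
    for speaker, line in turns:
        # Only [S1] and [S2] are described above, so other speakers are rejected.
        if speaker not in (1, 2):
            raise ValueError(f"unsupported speaker number: {speaker}")
        parts.append(f"[S{speaker}] {line.strip()}")
    return " ".join(parts)

script = build_dialogue_script([
    (1, "Did you hear the new demo?"),
    (2, "I did. (gasps) It sounded like a real conversation."),
])
print(script)
```

Keeping turns as data rather than a hand-typed string makes it harder to forget a speaker tag when editing long conversations.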

How does Dia compare to Bark?

Both Dia and Bark support nonverbal sounds, but Dia is designed specifically for dialogue, with multi-speaker support and more natural turn-taking. Bark supports more languages; Dia offers better dialogue quality.

How much GPU memory does Dia need?

Dia requires approximately 10 GB of VRAM for its 1.6B-parameter model. A GPU with at least 12 GB is recommended for comfortable operation.
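
For a rough sense of scale, the memory taken by the weights alone can be estimated from the parameter count. This is a back-of-the-envelope sketch, not a measurement: the bytes-per-parameter values are standard for these data types, but which precision a given deployment uses is an assumption.

```python
PARAMETERS = 1.6e9  # parameter count stated above

# Storage size per parameter for common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gb(PARAMETERS, dtype):.1f} GB")
```

The weights come to about 6.4 GB in float32 and 3.2 GB in float16. The remainder of the roughly 10 GB figure would go to activations, caches, and framework overhead, which the source does not break down.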

Technical Specs

Generation Speed: Medium
Output Quality: Excellent
Voice Cloning: Supported
Languages: 1 (English)
GPU VRAM: approximately 10 GB
Credits per 1,000 characters: 50
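
The credit rate above can be turned into a quick cost estimate. The following is a minimal sketch that assumes pro-rata billing rounded up to a whole credit and counts every character, including speaker and nonverbal tags; the actual rounding and counting rules are not stated in the source, and the function name is hypothetical.

```python
import math

CREDITS_PER_1000_CHARS = 50  # rate from the specs above

def estimate_credits(text: str) -> int:
    """Pro-rata estimate, rounded up to a whole credit (rounding rule is an assumption)."""
    return math.ceil(len(text) * CREDITS_PER_1000_CHARS / 1000)

sample = "[S1] Hello there. [S2] Hi! (gasps) You startled me."
print(estimate_credits(sample))
```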

Try Dia Now

Generate your first audio free. No credit card required.

Other TTS Engines

Bark

Chatterbox

CosyVoice2