How to choose the right model

ElevenLabs offers a range of models optimised for different requirements. The right choice depends on your use case, latency requirements, and quality expectations. Refer to the models reference for full specifications.

By requirement

Quality

Use eleven_v3

The flagship model with the highest fidelity, richest emotional expression, and broadest language support.

Low-latency

Use Flash models (eleven_flash_v2_5 or eleven_flash_v2)

Optimised for real-time applications with ~75ms latency.

Multilingual

Use eleven_v3 or eleven_flash_v2_5

Both support a wide range of languages.

Balanced

Use eleven_flash_v2_5

High-quality output with low latency, making it the best all-round choice.
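
The recommendations above can be captured as a small lookup table in application code. The following is a minimal sketch, not an official helper: the dictionary keys and the `models_for` function are hypothetical names chosen for illustration, and only the model IDs come from this page.

```python
# Requirement -> recommended model IDs, transcribed from the list above.
# The keys ("quality", "low_latency", ...) are illustrative names, not API values.
MODELS_BY_REQUIREMENT = {
    "quality": ["eleven_v3"],
    "low_latency": ["eleven_flash_v2_5", "eleven_flash_v2"],
    "multilingual": ["eleven_v3", "eleven_flash_v2_5"],
    "balanced": ["eleven_flash_v2_5"],
}


def models_for(requirement: str) -> list[str]:
    """Return the recommended model IDs for a requirement, first choice first."""
    # Normalise input such as "Low-latency" or "low latency" to "low_latency".
    key = requirement.strip().lower().replace("-", "_").replace(" ", "_")
    if key not in MODELS_BY_REQUIREMENT:
        known = ", ".join(sorted(MODELS_BY_REQUIREMENT))
        raise ValueError(f"Unknown requirement {requirement!r}; expected one of: {known}")
    return MODELS_BY_REQUIREMENT[key]


print(models_for("Low-latency"))
```

Keeping the mapping in one place means a later change of recommendation only touches the table, not every call site.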

By use case

Content creation

Use eleven_v3

Ideal for professional content, audiobooks, and video narration.
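
Once a model is chosen, its ID is typically passed as the `model_id` field of a text-to-speech request. The sketch below only builds the request and does not send it. It assumes the public REST endpoint `POST /v1/text-to-speech/{voice_id}` and the `xi-api-key` header, neither of which is stated on this page; `build_tts_request`, `YOUR_VOICE_ID`, and `YOUR_API_KEY` are placeholders for illustration.

```python
import json
import urllib.request

# Assumption: base URL, path, and header name follow the public ElevenLabs
# REST API; check the API reference before relying on them.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, api_key, model_id="eleven_v3"):
    """Build (but do not send) a text-to-speech request for the chosen model."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace them with a real voice ID and API key.
request = build_tts_request("Chapter one.", "YOUR_VOICE_ID", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would return audio bytes if the assumptions above hold. Switching models is then a one-argument change.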

Conversational agents

Use eleven_flash_v2_5 or eleven_flash_v2

Optimised for real-time conversational applications. Use eleven_flash_v2_5 when you need languages other than English.
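
The language rule for conversational agents can be expressed as a short function. This is an illustrative sketch under two assumptions: languages are given as BCP 47 style codes such as `en` or `pt-BR`, and English agents use `eleven_flash_v2` (this page allows either Flash model for English). `pick_agent_model` is a hypothetical name.

```python
def pick_agent_model(language_code: str) -> str:
    """Pick a Flash model for a conversational agent based on its language."""
    # Reduce codes such as "en-US" or "en_GB" to the primary subtag "en".
    primary = language_code.strip().lower().replace("_", "-").split("-")[0]
    # Assumption: English agents use eleven_flash_v2. All other languages
    # use eleven_flash_v2_5, as recommended above.
    return "eleven_flash_v2" if primary == "en" else "eleven_flash_v2_5"


print(pick_agent_model("en-US"))
print(pick_agent_model("pt-BR"))
```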

Transcription

Use scribe_v2 for batch transcription or scribe_v2_realtime for real-time transcription.

State-of-the-art accuracy across 90+ languages with speaker diarisation and word-level timestamps.

Voice changer

Use eleven_multilingual_sts_v2

Specialised for Speech-to-Speech conversion.

Next steps

Models

View full model specifications, latency benchmarks, and feature comparisons.

Latency optimization

Reduce time-to-first-audio with model selection, voice choice, and geographic routing.