How to choose the right model
ElevenLabs offers a range of models optimised for different requirements. The right choice depends on your use case, latency requirements, and quality expectations. Refer to the models reference for full specifications.
By requirement
For the highest quality, use eleven_v3
The flagship model, with the highest fidelity, the richest emotional expression, and the broadest language support.
For the lowest latency, use a Flash model (eleven_flash_v2_5 or eleven_flash_v2)
Optimised for real-time applications, with latency of roughly 75 ms.
For multilingual output, use eleven_v3 or eleven_flash_v2_5
Both support a wide range of languages.
For a balance of quality and speed, use eleven_flash_v2_5
High-quality output with low latency, making it the best all-round choice.
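The recommendations by requirement can be read as a small lookup table. The following is a minimal sketch of that idea, not an official API: the requirement labels ("quality", "low_latency", "multilingual", "balanced") and the helper candidate_models are hypothetical names chosen for illustration, while the model identifiers are the ones listed in this section.

```python
# Hypothetical lookup table: the requirement labels are illustrative names,
# not ElevenLabs terminology. The model IDs come from this guide.
MODEL_BY_REQUIREMENT = {
    "quality": ["eleven_v3"],
    "low_latency": ["eleven_flash_v2_5", "eleven_flash_v2"],
    "multilingual": ["eleven_v3", "eleven_flash_v2_5"],
    "balanced": ["eleven_flash_v2_5"],
}


def candidate_models(*requirements):
    """Return the models recommended for every given requirement.

    The result keeps the order of the first requirement's list and may be
    empty when no single model is recommended for all requirements.
    """
    if not requirements:
        raise ValueError("at least one requirement is needed")
    for name in requirements:
        if name not in MODEL_BY_REQUIREMENT:
            raise ValueError(f"unknown requirement: {name!r}")

    candidates = list(MODEL_BY_REQUIREMENT[requirements[0]])
    for name in requirements[1:]:
        allowed = MODEL_BY_REQUIREMENT[name]
        candidates = [model for model in candidates if model in allowed]
    return candidates


if __name__ == "__main__":
    print(candidate_models("low_latency", "multilingual"))
    print(candidate_models("quality", "low_latency"))
```

Combining "low_latency" with "multilingual" leaves only eleven_flash_v2_5, which matches its description as the all-round choice. Combining "quality" with "low_latency" returns an empty list, which signals that one of the two requirements has to give way.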
By use case
For content creation, use eleven_v3
Ideal for professional content, audiobooks, and video narration.
For real-time conversation, use eleven_flash_v2_5 or eleven_flash_v2
Both are optimised for real-time conversational applications. Use eleven_flash_v2_5 when you need languages other than English.
For speech to text, use scribe_v2 for batch transcription or scribe_v2_realtime for real-time transcription
State-of-the-art accuracy across 90+ languages, with speaker diarisation and word-level timestamps.
For speech to speech, use eleven_multilingual_sts_v2
Specialised for Speech-to-Speech conversion.
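The use-case guidance can also be sketched as a single selection function. This is an illustrative sketch only: the function model_for_use_case, its use-case labels, and the english_only and realtime flags are hypothetical names, not part of any ElevenLabs SDK. For English-only conversation the guide allows either Flash model, so the choice of eleven_flash_v2 in that branch is an arbitrary assumption.

```python
def model_for_use_case(use_case, english_only=True, realtime=False):
    """Map a use case from this guide to a model ID.

    The use-case labels and both flags are hypothetical names for illustration.
    """
    if use_case == "content_creation":
        return "eleven_v3"
    if use_case == "conversational":
        # Assumption: for English either Flash model fits; eleven_flash_v2 is
        # picked arbitrarily. Other languages need the 2.5 model.
        return "eleven_flash_v2" if english_only else "eleven_flash_v2_5"
    if use_case == "transcription":
        # Batch and real-time transcription use different Scribe models.
        return "scribe_v2_realtime" if realtime else "scribe_v2"
    if use_case == "speech_to_speech":
        return "eleven_multilingual_sts_v2"
    raise ValueError(f"unknown use case: {use_case!r}")


if __name__ == "__main__":
    print(model_for_use_case("conversational", english_only=False))
    print(model_for_use_case("transcription", realtime=True))
```

The returned string is the model identifier you would then supply wherever your integration selects a model.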