ElevenLabs offers a range of models optimised for different needs. The right choice depends on your use case, latency budget, and quality expectations. Refer to the models reference for full specifications.
Use eleven_v3
The flagship model with the highest fidelity, richest emotional expression, and broadest language support.
Use Flash models (eleven_flash_v2_5 or eleven_flash_v2)
Optimised for real-time applications with ~75 ms latency.
Use eleven_v3 or eleven_flash_v2_5
Both support a wide range of languages.
Use eleven_flash_v2_5
High-quality output with low latency, making it the best all-round choice.
Use eleven_v3
Ideal for professional content, audiobooks, and video narration.
Use eleven_flash_v2_5 or eleven_flash_v2
Choose eleven_flash_v2_5 when you need to support languages other than English.
Optimised for real-time conversational applications.
Use scribe_v2 for batch transcription or scribe_v2_realtime for real-time transcription.
State-of-the-art accuracy across 90+ languages with speaker diarisation and word-level timestamps.
Use eleven_multilingual_sts_v2
Specialised for Speech-to-Speech conversion.
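The recommendations above can be condensed into a small lookup helper. The following is a minimal sketch, not an official ElevenLabs utility: the use-case labels (such as `low_latency`), the `pick_model` function, and its `english_only` flag are hypothetical names introduced for illustration. Only the model IDs come from the guidance above.

```python
from typing import Dict, Tuple

# Hypothetical use-case labels mapped to the model IDs recommended above.
# The first entry in each tuple is treated as the default pick.
MODEL_BY_USE_CASE: Dict[str, Tuple[str, ...]] = {
    "highest_quality": ("eleven_v3",),
    "low_latency": ("eleven_flash_v2_5", "eleven_flash_v2"),
    "multilingual": ("eleven_v3", "eleven_flash_v2_5"),
    "balanced": ("eleven_flash_v2_5",),
    "content_creation": ("eleven_v3",),
    "conversational": ("eleven_flash_v2_5", "eleven_flash_v2"),
    "batch_transcription": ("scribe_v2",),
    "realtime_transcription": ("scribe_v2_realtime",),
    "speech_to_speech": ("eleven_multilingual_sts_v2",),
}


def pick_model(use_case: str, english_only: bool = False) -> str:
    """Return one recommended model ID for a (hypothetical) use-case label."""
    candidates = MODEL_BY_USE_CASE.get(use_case)
    if candidates is None:
        known = ", ".join(sorted(MODEL_BY_USE_CASE))
        raise ValueError(f"Unknown use case {use_case!r}; expected one of: {known}")
    # Assumption: for English-only workloads either Flash model is acceptable,
    # so this sketch arbitrarily returns eleven_flash_v2 when it is a candidate.
    if english_only and "eleven_flash_v2" in candidates:
        return "eleven_flash_v2"
    return candidates[0]


if __name__ == "__main__":
    for case in ("highest_quality", "conversational", "speech_to_speech"):
        print(case, "->", pick_model(case))
```

The returned string is intended to be passed wherever your integration expects a model ID. Defaulting to the first candidate keeps non-English support for the Flash use cases, in line with the note about the 2.5 model.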