Models
Flagship Models
Eleven Multilingual v2: Our most lifelike, emotionally rich speech synthesis model
Eleven Flash v2.5: Our fast, affordable speech synthesis model
Models Overview
The ElevenLabs API offers a range of speech synthesis models optimized for different use cases, quality levels, and performance requirements.
Eleven Multilingual v2
Eleven Multilingual v2 is our most advanced, emotionally aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.
The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker’s unique characteristics and accent.
This model excels in scenarios requiring high-quality, emotionally nuanced speech:
- Audiobook Production: Perfect for long-form narration with complex emotional delivery
- Character Voiceovers: Ideal for gaming and animation due to its emotional range
- Professional Content: Well-suited for corporate videos and e-learning materials
- Multilingual Projects: Maintains consistent voice quality across language switches
It has higher latency and a higher cost per character than the Flash models, but it delivers superior quality for projects where lifelike speech matters.
Eleven Flash v2.5
Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (~75ms†) across 32 languages.
The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.
This model is particularly well-suited for:
- Conversational AI: Perfect for real-time voice agents and chatbots
- Interactive Applications: Ideal for games and applications requiring immediate response
- Large-Scale Processing: Efficient for bulk text-to-speech conversion
With its lower price point and ~75ms† latency, Flash v2.5 is the cost-effective choice for developers needing fast, reliable speech synthesis across multiple languages.
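Because the quoted figure excludes application and network latency (see the † note), the delay a user actually perceives will be higher. The sketch below shows one rough way to budget for that, assuming the three components simply add up. The function name and the example network and application values are hypothetical and for illustration only; they are not measurements.

```python
# Approximate model latency quoted for Eleven Flash v2.5 in this document.
# It excludes application and network latency.
FLASH_MODEL_LATENCY_MS = 75.0


def estimated_response_latency_ms(
    network_ms: float,
    application_ms: float,
    model_ms: float = FLASH_MODEL_LATENCY_MS,
) -> float:
    """Rough end-to-end estimate, assuming the components are additive."""
    if min(network_ms, application_ms, model_ms) < 0:
        raise ValueError("latency components must be non-negative")
    return model_ms + network_ms + application_ms


# Illustrative (made-up) overheads: 40 ms network, 20 ms application.
total_ms = estimated_response_latency_ms(network_ms=40, application_ms=20)
print(f"Estimated response latency: {total_ms:.0f} ms")
```

Measuring the network and application components in your own deployment gives a more reliable budget than any fixed assumption.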
Model Selection Guide
Requirements
- Quality: Use eleven_multilingual_v2
- Low-latency: Use Eleven Flash models
- Multilingual support: Use eleven_multilingual_v2 or eleven_flash_v2_5
Use Case
- Content Creation: Use eleven_multilingual_v2
- Conversational AI: Use eleven_flash_v2_5 or eleven_flash_v2
- Voice Changer (Speech to Speech): Use eleven_multilingual_sts_v2
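The use-case recommendations above can be encoded as a small lookup, sketched here in Python. The model IDs come from this guide; the helper names, the use-case keys, and the `text` and `model_id` field names in the JSON body are assumptions made for illustration and should be checked against the API reference before use.

```python
import json

# Recommended model IDs per use case, in the order listed in the guide.
MODELS_BY_USE_CASE = {
    "content_creation": ["eleven_multilingual_v2"],
    "conversational_ai": ["eleven_flash_v2_5", "eleven_flash_v2"],
    "voice_changer": ["eleven_multilingual_sts_v2"],
}


def pick_model(use_case: str) -> str:
    """Return the first recommended model ID for a use case (hypothetical helper)."""
    try:
        return MODELS_BY_USE_CASE[use_case][0]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


def build_tts_body(text: str, model_id: str) -> str:
    """Serialize a request body; the field names are assumed, not confirmed here."""
    return json.dumps({"text": text, "model_id": model_id})


model_id = pick_model("conversational_ai")
print(build_tts_body("Hello! How can I help you today?", model_id))
```

Keeping the mapping in one place makes it easy to change the model for a use case later without touching the calling code.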
For detailed language support information and troubleshooting guidance, refer to our help center.
† Excluding application and network latency