Models
Flagship models
- Multilingual v2: Our most lifelike, emotionally rich speech synthesis model
- Flash v2.5: Our fast, affordable speech synthesis model
Models overview
The ElevenLabs API offers a range of speech synthesis models optimized for different use cases, quality levels, and performance requirements.
Older models
These models are maintained for backward compatibility but are not recommended for new projects.
Multilingual v2
Eleven Multilingual v2 is our most advanced, emotionally aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.
The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker’s unique characteristics and accent.
This model excels in scenarios requiring high-quality, emotionally nuanced speech:
- Audiobook Production: Perfect for long-form narration with complex emotional delivery
- Character Voiceovers: Ideal for gaming and animation due to its emotional range
- Professional Content: Well-suited for corporate videos and e-learning materials
- Multilingual Projects: Maintains consistent voice quality across language switches
It has higher latency and a higher cost per character than the Flash models, but it delivers superior quality for projects where lifelike speech matters.
Flash v2.5
Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (~75ms†) across 32 languages.
The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.
This model is particularly well-suited for:
- Conversational AI: Perfect for real-time voice agents and chatbots
- Interactive Applications: Ideal for games and applications requiring immediate response
- Large-Scale Processing: Efficient for bulk text-to-speech conversion
With its lower price point and ~75ms latency, Flash v2.5 is the cost-effective choice for fast, reliable speech synthesis across multiple languages.
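In practice, the model is chosen per request by its model ID. The sketch below is a minimal illustration, assuming the public REST endpoint `POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}` with an `xi-api-key` header and a JSON body carrying `text` and `model_id`. The helper name `build_tts_request`, the voice ID and the API key are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for the given model ID."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Placeholder values: substitute your own voice ID and API key.
narration = build_tts_request("Chapter one.", voice_id="YOUR_VOICE_ID",
                              api_key="YOUR_API_KEY")  # quality-first default
agent_reply = build_tts_request("Hi, how can I help?", voice_id="YOUR_VOICE_ID",
                                api_key="YOUR_API_KEY",
                                model_id="eleven_flash_v2_5")  # latency-first
# Sending is left out; urllib.request.urlopen(agent_reply).read() would return the response body.
```

Only `model_id` changes between the two calls, so switching between quality and latency is a one-argument decision.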
Model selection guide
Requirements
- Use eleven_multilingual_v2 for high-fidelity audio output with rich emotional expression.
- Use Flash models for real-time applications; they are optimized for low latency (~75ms).
- Use either eleven_multilingual_v2 or eleven_flash_v2_5 for multilingual output: eleven_multilingual_v2 supports 29 languages and eleven_flash_v2_5 supports 32.
Use case
- Use eleven_multilingual_v2 for professional content, audiobooks and video narration.
- Use eleven_flash_v2_5, eleven_flash_v2 or eleven_multilingual_v2 for real-time conversational applications.
- Use eleven_multilingual_sts_v2 for speech-to-speech conversion.
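The selection guide can be encoded as a simple lookup table. The sketch below is illustrative only: the `Need` labels and the `pick_model` helper are hypothetical names, "Flash models" is assumed to mean eleven_flash_v2_5 and eleven_flash_v2, and the language row is left out because the right answer there depends on which language is requested.

```python
from enum import Enum

class Need(Enum):
    """Hypothetical labels for the rows of the selection guide."""
    HIGH_FIDELITY = "high_fidelity"
    LOW_LATENCY = "low_latency"
    PROFESSIONAL_CONTENT = "professional_content"
    CONVERSATIONAL_AI = "conversational_ai"
    SPEECH_TO_SPEECH = "speech_to_speech"

# Model IDs in the order the guide lists them for each need.
RECOMMENDED_MODELS = {
    Need.HIGH_FIDELITY: ["eleven_multilingual_v2"],
    Need.LOW_LATENCY: ["eleven_flash_v2_5", "eleven_flash_v2"],  # assumed "Flash models"
    Need.PROFESSIONAL_CONTENT: ["eleven_multilingual_v2"],
    Need.CONVERSATIONAL_AI: ["eleven_flash_v2_5", "eleven_flash_v2", "eleven_multilingual_v2"],
    Need.SPEECH_TO_SPEECH: ["eleven_multilingual_sts_v2"],
}

def pick_model(need: Need) -> str:
    """Return the first model ID the guide lists for this need."""
    return RECOMMENDED_MODELS[need][0]

print(pick_model(Need.CONVERSATIONAL_AI))  # eleven_flash_v2_5
```

Keeping the full list per need, rather than a single ID, leaves room to fall back to an alternative the guide also accepts.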
Supported languages
Our v2 models support 29 languages:
English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.
Flash v2.5 supports 32 languages: all languages supported by the v2 models, plus Hungarian, Norwegian and Vietnamese.
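Because the Flash v2.5 set is a strict superset of the v2 set, checking which model covers a language reduces to set membership. This is a minimal sketch using the language names listed above; it assumes "v2 models" refers to eleven_multilingual_v2, ignores regional variants, and uses a hypothetical helper name, `models_for`.

```python
# Language names as listed in this section; regional variants are not modelled.
V2_LANGUAGES = frozenset({
    "English", "Japanese", "Chinese", "German", "Hindi", "French", "Korean",
    "Portuguese", "Italian", "Spanish", "Indonesian", "Dutch", "Turkish",
    "Filipino", "Polish", "Swedish", "Bulgarian", "Romanian", "Arabic", "Czech",
    "Greek", "Finnish", "Croatian", "Malay", "Slovak", "Danish", "Tamil",
    "Ukrainian", "Russian",
})
# Flash v2.5 adds three languages on top of the v2 set.
FLASH_V2_5_LANGUAGES = V2_LANGUAGES | {"Hungarian", "Norwegian", "Vietnamese"}

# Assumption: "v2 models" here means eleven_multilingual_v2.
SUPPORTED_LANGUAGES = {
    "eleven_multilingual_v2": V2_LANGUAGES,
    "eleven_flash_v2_5": FLASH_V2_5_LANGUAGES,
}

def models_for(language: str) -> list[str]:
    """Return the sorted model IDs that support the given language name."""
    return sorted(m for m, langs in SUPPORTED_LANGUAGES.items() if language in langs)

print(models_for("Vietnamese"))  # ['eleven_flash_v2_5']
print(models_for("Tamil"))       # ['eleven_flash_v2_5', 'eleven_multilingual_v2']
```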
Character limits
The maximum number of characters supported in a single request varies by model.
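Text longer than a model's limit has to be split across several requests. The per-model limits are not given in this section, so the sketch below takes the limit as a parameter (`max_chars`); `split_for_request` is a hypothetical helper that prefers sentence boundaries and falls back to a crude hard split for any single sentence that is still too long.

```python
import re

def split_for_request(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that alone exceeds the limit (may cut mid-word).
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Append the sentence to the current chunk if it still fits.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

print(split_for_request("Hi. Bye.", 4))  # ['Hi.', 'Bye.']
```

Each chunk can then be sent as its own request with the same model ID and voice.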
Concurrency and priority
Your subscription plan determines how many requests can be processed simultaneously and the priority level of your requests in the queue.
To increase your concurrency limit and queue priority, upgrade your subscription plan.
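On the client side, you can keep the number of in-flight requests at or below your plan's concurrency limit with a semaphore. The sketch below is illustrative only: the limit of 2 is a hypothetical value (set it to your plan's actual limit), and `fake_tts_call` stands in for a real text-to-speech request.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")

async def run_with_limit(jobs: List[Callable[[], Awaitable[T]]], limit: int) -> List[T]:
    """Run every job, never letting more than `limit` run at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:  # waits here while `limit` jobs are in flight
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))

async def demo(limit: int = 2, n_jobs: int = 6) -> int:
    """Run fake requests and report the peak number in flight at once."""
    in_flight = 0
    peak = 0

    async def fake_tts_call() -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for a real network request
        in_flight -= 1
        return "audio"

    results = await run_with_limit([fake_tts_call] * n_jobs, limit)
    assert results == ["audio"] * n_jobs
    return peak

print(asyncio.run(demo()))  # 2
```

Queueing excess work locally, rather than firing every request at once, keeps the request count predictable when your plan's limit is small.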