Flagship models

Models overview

The ElevenLabs API offers a range of speech synthesis models optimized for different use cases, quality levels, and performance requirements.

| Model ID | Description | Languages |
|---|---|---|
| eleven_multilingual_v2 | Our most lifelike model with rich emotional expression | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru |
| eleven_flash_v2_5 | Ultra-fast model optimized for real-time use (~75ms†) | All eleven_multilingual_v2 languages plus: hu, no, vi |
| eleven_flash_v2 | Ultra-fast model optimized for real-time use (~75ms†) | en |
| eleven_multilingual_sts_v2 | State-of-the-art multilingual voice changer model (Speech to Speech) | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru |
| eleven_english_sts_v2 | English-only voice changer model (Speech to Speech) | en |

The following legacy models are maintained for backward compatibility but are not recommended for new projects.

| Model ID | Description | Languages |
|---|---|---|
| eleven_monolingual_v1 | First generation TTS model (outclassed by v2 models) | en |
| eleven_multilingual_v1 | First multilingual model (outclassed by v2 models) | en, fr, de, hi, it, pl, pt, es |
| eleven_turbo_v2_5 | High quality, low-latency model (~250-300ms) (outclassed by Flash models) | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| eleven_turbo_v2 | High quality, low-latency model (~250-300ms) (outclassed by Flash models) | en |
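
If a codebase still passes one of these IDs, a small guard can flag it before a request is made. The following is a minimal sketch, not part of any SDK: the helper name `warn_if_legacy` and the `LEGACY_MODELS` mapping are hypothetical, and the successor families are taken only from the "outclassed by" notes in the table above.

```python
import warnings

# Legacy model IDs mapped to the successor family named in the table above.
# This mapping is an illustration, not something the API exposes.
LEGACY_MODELS = {
    "eleven_monolingual_v1": "v2 models",
    "eleven_multilingual_v1": "v2 models",
    "eleven_turbo_v2_5": "Flash models",
    "eleven_turbo_v2": "Flash models",
}


def warn_if_legacy(model_id: str) -> bool:
    """Return True (and emit a warning) when model_id is a legacy model."""
    successor = LEGACY_MODELS.get(model_id)
    if successor is None:
        return False
    warnings.warn(
        f"{model_id} is a legacy model; consider the {successor} for new projects.",
        stacklevel=2,
    )
    return True
```

Calling `warn_if_legacy("eleven_turbo_v2")` emits a warning and returns `True`, while a current ID such as `eleven_flash_v2_5` returns `False` silently.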

Multilingual v2

Eleven Multilingual v2 is our most advanced, emotionally-aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.

The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker’s unique characteristics and accent.

This model excels in scenarios requiring high-quality, emotionally nuanced speech:

  • Audiobook Production: Perfect for long-form narration with complex emotional delivery
  • Character Voiceovers: Ideal for gaming and animation due to its emotional range
  • Professional Content: Well-suited for corporate videos and e-learning materials
  • Multilingual Projects: Maintains consistent voice quality across language switches

Although it has higher latency and a higher cost per character than the Flash models, it delivers superior quality for projects where lifelike speech is important.

Flash v2.5

Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (~75ms†) across 32 languages.

The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.

This model is particularly well-suited for:

  • Conversational AI: Perfect for real-time voice agents and chatbots
  • Interactive Applications: Ideal for games and applications requiring immediate response
  • Large-Scale Processing: Efficient for bulk text-to-speech conversion

With a lower price per character than Multilingual v2 and ~75ms† latency, Flash v2.5 is the cost-effective option for anyone needing fast, reliable speech synthesis across multiple languages.
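
To show where a model ID fits in practice, here is a rough sketch that assembles a text-to-speech request without sending it. The endpoint path, the `xi-api-key` header, and the `text` and `model_id` body fields are assumptions not stated on this page, so check them against the API reference before relying on them. The helper name `build_tts_request` is hypothetical.

```python
import json

# Assumed base URL for text-to-speech requests; verify against the API reference.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(voice_id: str, text: str, model_id: str, api_key: str) -> dict:
    """Assemble (but do not send) a request that selects a model by ID."""
    return {
        "method": "POST",
        "url": f"{API_BASE}/{voice_id}",
        # Header name is an assumption; it is not documented on this page.
        "headers": {"xi-api-key": api_key, "Content-Type": "application/json"},
        "body": json.dumps({"text": text, "model_id": model_id}),
    }


# Switching models is a one-field change in the request body.
request = build_tts_request("YOUR_VOICE_ID", "Hello!", "eleven_flash_v2_5", "YOUR_API_KEY")
```

The resulting dictionary can be handed to whichever HTTP client the application already uses.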

Model selection guide

Quality

Use eleven_multilingual_v2

Best for high-fidelity audio output with rich emotional expression

Low-latency

Use Flash models

Optimized for real-time applications (~75ms latency)

Multilingual

Use either eleven_multilingual_v2 or eleven_flash_v2_5

eleven_multilingual_v2 supports 29 languages; eleven_flash_v2_5 supports 32

Content creation

Use eleven_multilingual_v2

Ideal for professional content, audiobooks & video narration

Conversational AI

Use eleven_flash_v2_5, eleven_flash_v2 or eleven_multilingual_v2

Perfect for real-time conversational applications

Voice changer

Use eleven_multilingual_sts_v2

Specialized for Speech-to-Speech conversion
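
The guide above can be encoded as a simple lookup. This is an illustrative sketch only: the `RECOMMENDED_MODELS` table and `recommend_models` helper are hypothetical names, "Flash models" is expanded to the two Flash IDs from the models overview, and the list order mirrors the order in which the guide names the models.

```python
# Use case -> recommended model IDs, transcribed from the selection guide.
RECOMMENDED_MODELS = {
    "quality": ["eleven_multilingual_v2"],
    "low_latency": ["eleven_flash_v2_5", "eleven_flash_v2"],
    "multilingual": ["eleven_multilingual_v2", "eleven_flash_v2_5"],
    "content_creation": ["eleven_multilingual_v2"],
    "conversational_ai": [
        "eleven_flash_v2_5",
        "eleven_flash_v2",
        "eleven_multilingual_v2",
    ],
    "voice_changer": ["eleven_multilingual_sts_v2"],
}


def recommend_models(use_case: str) -> list[str]:
    """Return the model IDs the guide suggests for a use case."""
    # Normalize labels such as "Low-latency" or "Conversational AI".
    key = use_case.strip().lower().replace("-", "_").replace(" ", "_")
    try:
        return list(RECOMMENDED_MODELS[key])
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_MODELS))
        raise ValueError(f"Unknown use case {use_case!r}; expected one of: {known}") from None
```

For example, `recommend_models("Voice changer")` returns a one-element list containing `eleven_multilingual_sts_v2`.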

Supported languages

Our v2 models support 29 languages:

English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.

Flash v2.5 supports 32 languages - all languages from v2 models plus:

Hungarian, Norwegian & Vietnamese
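
When the target language is known up front, a lookup like the following can narrow the choice of model. It is a minimal sketch built from the language codes in the models overview table. The names `MODEL_LANGUAGES` and `models_supporting` are hypothetical, and only the current (non-legacy) models are included.

```python
# The 29 language codes listed for the v2 multilingual models.
V2_LANGUAGES = frozenset(
    "en ja zh de hi fr ko pt it es id nl tr fil pl sv bg ro ar cs "
    "el fi hr ms sk da ta uk ru".split()
)
# Flash v2.5 adds Hungarian, Norwegian, and Vietnamese.
FLASH_V2_5_LANGUAGES = V2_LANGUAGES | {"hu", "no", "vi"}

MODEL_LANGUAGES = {
    "eleven_multilingual_v2": V2_LANGUAGES,
    "eleven_flash_v2_5": FLASH_V2_5_LANGUAGES,
    "eleven_flash_v2": frozenset({"en"}),
    "eleven_multilingual_sts_v2": V2_LANGUAGES,
    "eleven_english_sts_v2": frozenset({"en"}),
}


def models_supporting(language_code: str) -> list[str]:
    """Return the model IDs (sorted by name) that list the given language code."""
    code = language_code.strip().lower()
    return sorted(m for m, langs in MODEL_LANGUAGES.items() if code in langs)
```

A code outside the shared set, such as `vi`, resolves to `eleven_flash_v2_5` alone, while an unsupported code yields an empty list.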

Character limits

The maximum number of characters supported in a single request varies by model.

| Model ID | Character limit | Approximate audio duration |
|---|---|---|
| eleven_flash_v2_5 | 40,000 | ~40 minutes |
| eleven_flash_v2 | 30,000 | ~30 minutes |
| eleven_multilingual_v2 | 10,000 | ~10 minutes |
| eleven_multilingual_v1 | 10,000 | ~10 minutes |
| eleven_english_sts_v2 | 10,000 | ~10 minutes |
| eleven_english_sts_v1 | 10,000 | ~10 minutes |

For longer content, consider splitting the input into multiple requests.
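
One way to split long input is to break at sentence boundaries and pack sentences into chunks that stay under the model's limit. The sketch below is an illustration under two assumptions: that the limit is counted the same way as Python's `len()` on the text, and that a simple punctuation-based sentence split is good enough for the content. The helper name `split_for_model` is hypothetical.

```python
import re

# Per-request character limits from the table above.
CHARACTER_LIMITS = {
    "eleven_flash_v2_5": 40_000,
    "eleven_flash_v2": 30_000,
    "eleven_multilingual_v2": 10_000,
    "eleven_multilingual_v1": 10_000,
    "eleven_english_sts_v2": 10_000,
    "eleven_english_sts_v1": 10_000,
}


def split_for_model(text: str, model_id: str) -> list[str]:
    """Split text into chunks no longer than the model's character limit."""
    limit = CHARACTER_LIMITS[model_id]
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself over the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The next sentence does not fit, so close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own request. Because each chunk is synthesized as a separate request, it is worth listening for audible seams where chunks join.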

Concurrency and priority

Your subscription plan determines how many requests can be processed simultaneously and the priority level of your requests in the queue.

| Plan | Concurrency limit | Priority level |
|---|---|---|
| Free | 2 | 3 |
| Starter | 3 | 4 |
| Creator | 5 | 5 |
| Pro | 10 | 5 |
| Scale | 15 | 5 |
| Business | 15 | 5 |
| Enterprise | Elevated | Highest |

To increase your concurrency limit & queue priority, upgrade your subscription plan.
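
On the client side, requests can be kept within a plan's concurrency limit by gating them behind a semaphore. The following is a sketch rather than a prescribed pattern: `run_with_plan_limit` and the `worker` callable are hypothetical, and `worker` stands in for whatever coroutine actually sends a request. Enterprise is omitted because its limit is not a fixed number in the table.

```python
import asyncio

# Concurrency limits per plan, from the table above.
CONCURRENCY_LIMITS = {
    "free": 2,
    "starter": 3,
    "creator": 5,
    "pro": 10,
    "scale": 15,
    "business": 15,
}


async def run_with_plan_limit(plan, jobs, worker):
    """Run worker(job) for every job without exceeding the plan's limit."""
    semaphore = asyncio.Semaphore(CONCURRENCY_LIMITS[plan.lower()])

    async def guarded(job):
        # At most `limit` workers hold the semaphore at any moment.
        async with semaphore:
            return await worker(job)

    # gather preserves the order of jobs in its results.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

A caller would run it with something like `asyncio.run(run_with_plan_limit("creator", chunks, synthesize))`, where `synthesize` is the application's own request coroutine.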

† Excluding application & network latency