As of December 2024, ElevenLabs offers two families of models: standard (high-quality) models and Flash models, which are optimized for low latency. Each family includes both English-only and multilingual models, each tailored to specific use cases with strengths in speed, accuracy, or language coverage.

  • Standard models (Multilingual v2, Multilingual v1, English v1) are optimized for quality and accuracy, which makes them ideal for content creation. They offer the best quality and stability, at the cost of higher latency.
  • Flash models (Flash v2.5, Flash v2) are designed for low-latency applications such as real-time conversational AI. They deliver great quality at faster processing speeds, with a slight trade-off in accuracy and stability.

For detailed specifications of the languages each model supports, see our help article.

For advice on how to deal with issues that might arise, please see our guide to troubleshooting.

Model Selection

Standard Models

Eleven Multilingual v2

Our most advanced speech synthesis model, Multilingual v2, offers high stability, diverse language support, and exceptional accuracy in 29 languages. While slower than the Flash models, it delivers more lifelike speech, making it ideal for content creation such as voiceovers, audiobooks, and post-production.

  • English (UK)
  • English (USA)
  • English (Australia)
  • English (Canada)
  • Japanese
  • Chinese
  • German
  • Hindi
  • French (France)
  • French (Canada)
  • Korean
  • Portuguese (Brazil)
  • Portuguese (Portugal)
  • Italian
  • Spanish (Spain)
  • Spanish (Mexico)
  • Indonesian
  • Dutch
  • Turkish
  • Filipino
  • Polish
  • Swedish
  • Bulgarian
  • Romanian
  • Arabic (Saudi Arabia)
  • Arabic (UAE)
  • Czech
  • Greek
  • Finnish
  • Croatian
  • Malay
  • Slovak
  • Danish
  • Tamil
  • Ukrainian
  • Russian

Important notes: The accuracy of this model depends heavily on the quality of the input samples. Lower-quality samples can introduce errors, which the AI might attempt to replicate. For the best results, use high-quality, consistent voice samples, especially when trying to preserve accents or tonal details across languages.

  • Best quality
  • Unparalleled accuracy
  • More stable
  • Higher latency

Eleven English v1

Our very first model, English v1, laid the groundwork for future advancements. While still functional, it is now outclassed by Multilingual v2 (for content creation) and Flash v2 (for low-latency applications). We recommend using our newer models for better quality and speed.

Eleven Multilingual v1

Multilingual v1 was our first attempt at generating speech in multiple languages, but it is now considered experimental and has been surpassed by Multilingual v2 and Flash v2.5. We recommend avoiding this model for production use due to its limitations and lower accuracy.

  • English (USA)
  • English (UK)
  • English (Australia)
  • English (Canada)
  • German
  • Polish
  • Spanish (Spain)
  • Spanish (Mexico)
  • Italian
  • French (France)
  • French (Canada)
  • Portuguese (Portugal)
  • Portuguese (Brazil)
  • Hindi

Flash Models

Eleven Flash v2.5

Flash v2.5 generates speech in 32 languages with low latency and is optimized for real-time conversational AI. It is much faster than Multilingual v2 and adds support for Vietnamese, Hungarian, and Norwegian. It suits developers who need rapid, natural speech across multiple languages, but it lacks the stylistic range of Multilingual v2.

Model latency is as low as 75 ms (excluding network latency), making it ideal for real-time interactions.
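Because the 75 ms figure excludes the network, the delay a user actually experiences will be higher. As a rough sketch, assuming the network round trip simply adds to the model latency (a simplification), a budget can be estimated as below. The function name and the round-trip values are hypothetical and purely illustrative; they are not measurements.

```python
# Best-case model latency quoted in this article, excluding network.
MODEL_LATENCY_MS = 75


def estimated_latency_ms(network_round_trip_ms: float,
                         model_latency_ms: float = MODEL_LATENCY_MS) -> float:
    """Rough estimate: model latency plus network round trip.

    Assumes the two contributions are simply additive, which is a
    simplification of what happens in a real deployment.
    """
    if network_round_trip_ms < 0 or model_latency_ms < 0:
        raise ValueError("latencies must be non-negative")
    return model_latency_ms + network_round_trip_ms


# Hypothetical round-trip times, for illustration only.
for rtt in (20, 80, 200):
    total = estimated_latency_ms(rtt)
    print(f"network {rtt} ms -> roughly {total:.0f} ms in total")
```

Measuring the round trip from your own deployment region gives a far better estimate than any fixed number.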

  • Great quality
  • High accuracy with Professional Voice Clones
  • Slightly less stable
  • Optimized for low latency

  • English (USA)
  • English (UK)
  • English (Australia)
  • English (Canada)
  • Japanese
  • Chinese
  • German
  • Hindi
  • French (France)
  • French (Canada)
  • Korean
  • Portuguese (Brazil)
  • Portuguese (Portugal)
  • Italian
  • Spanish (Spain)
  • Spanish (Mexico)
  • Indonesian
  • Dutch
  • Turkish
  • Filipino
  • Polish
  • Swedish
  • Bulgarian
  • Romanian
  • Arabic (Saudi Arabia)
  • Arabic (UAE)
  • Czech
  • Greek
  • Finnish
  • Croatian
  • Malay
  • Slovak
  • Danish
  • Tamil
  • Ukrainian
  • Russian
  • Hungarian
  • Norwegian
  • Vietnamese
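
The language lists in this article can be compared programmatically. The sketch below collapses regional variants such as "English (UK)" and "English (USA)" into a single language, which is our assumption about how the counts of 29 and 32 are reached. The constant and function names are hypothetical and are not part of any ElevenLabs API.

```python
# Languages listed for Multilingual v2, with regional variants collapsed.
MULTILINGUAL_V2 = {
    "English", "Japanese", "Chinese", "German", "Hindi", "French", "Korean",
    "Portuguese", "Italian", "Spanish", "Indonesian", "Dutch", "Turkish",
    "Filipino", "Polish", "Swedish", "Bulgarian", "Romanian", "Arabic",
    "Czech", "Greek", "Finnish", "Croatian", "Malay", "Slovak", "Danish",
    "Tamil", "Ukrainian", "Russian",
}

# Flash v2.5 lists the same languages plus three additions.
FLASH_V2_5 = MULTILINGUAL_V2 | {"Hungarian", "Norwegian", "Vietnamese"}

# Flash v2 is English-only.
FLASH_V2 = {"English"}

SUPPORTED = {
    "Multilingual v2": MULTILINGUAL_V2,
    "Flash v2.5": FLASH_V2_5,
    "Flash v2": FLASH_V2,
}


def models_supporting(language: str) -> list[str]:
    """Return the names of the models whose list includes the language."""
    return sorted(name for name, langs in SUPPORTED.items() if language in langs)


print(len(MULTILINGUAL_V2), len(FLASH_V2_5))   # 29 32
print(sorted(FLASH_V2_5 - MULTILINGUAL_V2))    # ['Hungarian', 'Norwegian', 'Vietnamese']
print(models_supporting("Vietnamese"))         # ['Flash v2.5']
```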

Eleven Flash v2

A low-latency, English-only model optimized for conversational applications. Flash v2 is similar in performance to Flash v2.5 but focused exclusively on English, making it ideal for English-only use cases where speed is critical.

  • Great quality
  • High accuracy with Professional Voice Clones
  • Slightly less stable
  • Optimized for low latency

  • English (USA)
  • English (UK)
  • English (Australia)
  • English (Canada)
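
The recommendations in this article can be condensed into a small decision helper. This is a minimal sketch of one way to read the guidance, not an official selection rule. The function name and its two parameters are hypothetical, and it returns the display names used in this article rather than API identifiers.

```python
def recommend_model(needs_low_latency: bool, english_only: bool) -> str:
    """Pick a model name following the guidance in this article.

    - Low latency, English only  -> Flash v2
    - Low latency, multilingual  -> Flash v2.5
    - Otherwise                  -> Multilingual v2 (quality and stability)

    English v1 and Multilingual v1 are never returned, since the article
    recommends the newer models in their place.
    """
    if needs_low_latency:
        return "Flash v2" if english_only else "Flash v2.5"
    return "Multilingual v2"


# Example: a real-time voice agent that speaks several languages.
print(recommend_model(needs_low_latency=True, english_only=False))   # Flash v2.5

# Example: an English audiobook where quality matters more than speed.
print(recommend_model(needs_low_latency=False, english_only=True))   # Multilingual v2
```

A real selection would also check that the target language appears in the chosen model's language list.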