Flagship Models

Models Overview

The ElevenLabs API offers a range of speech synthesis models optimized for different use cases, quality levels, and performance requirements.

Model ID                    Description                                                           Languages
eleven_multilingual_v2      Our most lifelike model with rich emotional expression                32
eleven_flash_v2_5           Ultra-fast model optimized for real-time use (~75ms†)                 32
eleven_flash_v2             Ultra-fast model optimized for real-time use (~75ms†)                 English
eleven_multilingual_sts_v2  State-of-the-art multilingual voice changer model (Speech to Speech)  29
eleven_english_sts_v2       English-only voice changer model (Speech to Speech)                   English
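
For client code that needs to reason about these models, the table can be mirrored as a small lookup structure. The following is only a sketch: the `ModelInfo` class, the `task` labels ("tts" and "sts"), and the convention of storing English-only models as `languages=1` are our own illustrative choices, not fields returned by the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    model_id: str
    task: str       # hypothetical label: "tts" (text to speech) or "sts" (speech to speech)
    languages: int  # language count from the table; 1 stands for English-only


# Values copied from the table above.
MODELS = [
    ModelInfo("eleven_multilingual_v2", "tts", 32),
    ModelInfo("eleven_flash_v2_5", "tts", 32),
    ModelInfo("eleven_flash_v2", "tts", 1),
    ModelInfo("eleven_multilingual_sts_v2", "sts", 29),
    ModelInfo("eleven_english_sts_v2", "sts", 1),
]


def multilingual_models(task: str) -> list[str]:
    """Return the IDs of models for `task` that support more than one language."""
    return [m.model_id for m in MODELS if m.task == task and m.languages > 1]
```

Calling `multilingual_models("tts")` with this data yields the two multilingual speech synthesis models, in table order.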

Eleven Multilingual v2

Eleven Multilingual v2 is our most advanced, emotionally aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.

The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker’s unique characteristics and accent.

This model excels in scenarios requiring high-quality, emotionally nuanced speech:

  • Audiobook Production: Perfect for long-form narration with complex emotional delivery
  • Character Voiceovers: Ideal for gaming and animation due to its emotional range
  • Professional Content: Well-suited for corporate videos and e-learning materials
  • Multilingual Projects: Maintains consistent voice quality across language switches

Although it has higher latency and a higher cost per character than the Flash models, it delivers superior quality for projects where lifelike speech is important.

Eleven Flash v2.5

Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (~75ms†) across 32 languages.

The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.

This model is particularly well-suited for:

  • Conversational AI: Perfect for real-time voice agents and chatbots
  • Interactive Applications: Ideal for games and applications requiring immediate response
  • Large-Scale Processing: Efficient for bulk text-to-speech conversion

With its lower price point and ~75ms† latency, Flash v2.5 is the cost-effective choice for developers who need fast, reliable speech synthesis across multiple languages.
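
In practice, the model is selected per request through its model ID. The sketch below builds (but does not send) a text-to-speech request using only the Python standard library. It assumes the endpoint shape `POST /v1/text-to-speech/{voice_id}`, the `xi-api-key` header, and a JSON body with `text` and `model_id` fields; confirm these against the API reference before relying on them. The helper name `build_tts_request` and the placeholder voice ID and key are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"  # assumed base URL


def build_tts_request(text, voice_id, api_key, model_id="eleven_flash_v2_5"):
    """Build a text-to-speech request object without sending it."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials: replace with a real voice ID and API key.
request = build_tts_request("Hello there!", "YOUR_VOICE_ID", "YOUR_API_KEY")
```

Passing `model_id="eleven_multilingual_v2"` instead trades the lower latency for the higher-quality model; the rest of the request stays the same. Sending it would be a matter of passing `request` to `urllib.request.urlopen`.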

Model Selection Guide

Requirements

Quality: Use eleven_multilingual_v2

Low latency: Use eleven_flash_v2_5 or eleven_flash_v2

Multilingual support: Use eleven_multilingual_v2 or eleven_flash_v2_5

Use Case

Content Creation: Use eleven_multilingual_v2

Conversational AI: Use eleven_flash_v2_5 or eleven_flash_v2

Voice Changer (Speech to Speech): Use eleven_multilingual_sts_v2
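
The use-case guidance above can be encoded as a small helper so that model choice lives in one place. This is a minimal sketch, not an official API: the function name `pick_model`, the use-case strings, and the `multilingual` flag are hypothetical. The guide lists both Flash models for Conversational AI; choosing between them by language need is our assumption.

```python
def pick_model(use_case: str, multilingual: bool = True) -> str:
    """Map a use case from the selection guide to a model ID."""
    if use_case == "content_creation":
        return "eleven_multilingual_v2"
    if use_case == "conversational_ai":
        # Assumption: prefer Flash v2.5 when more than English is needed,
        # otherwise fall back to the English-only Flash v2.
        return "eleven_flash_v2_5" if multilingual else "eleven_flash_v2"
    if use_case == "voice_changer":
        return "eleven_multilingual_sts_v2"
    raise ValueError(f"Unknown use case: {use_case!r}")
```

For example, `pick_model("conversational_ai", multilingual=False)` selects the English-only Flash model, while an unrecognized use case raises `ValueError` rather than silently defaulting.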

For detailed language support information and troubleshooting guidance, refer to our help center.

† Excluding application and network latency