Models

Models overview

The ElevenLabs API offers a range of audio models optimized for different use cases, quality levels, and performance requirements.

Flagship models

| Model ID | Description | Languages |
| --- | --- | --- |
| eleven_multilingual_v2 | Our most lifelike model with rich emotional expression | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru |
| eleven_flash_v2_5 | Ultra-fast model optimized for real-time use (~75ms†) | All eleven_multilingual_v2 languages plus: hu, no, vi |
| eleven_flash_v2 | Ultra-fast model optimized for real-time use (~75ms†) | en |
| eleven_turbo_v2_5 | High quality, low-latency model with a good balance of quality and speed (~250ms-300ms) | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru, hu, no, vi |
| eleven_turbo_v2 | High quality, low-latency model with a good balance of quality and speed (~250ms-300ms) | en |
| eleven_multilingual_sts_v2 | State-of-the-art multilingual voice changer model (Speech to Speech) | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru |
| eleven_english_sts_v2 | English-only voice changer model (Speech to Speech) | en |
| scribe_v1 | State-of-the-art speech recognition model | 99 languages |
| scribe_v1_experimental | State-of-the-art speech recognition model with experimental features: improved multilingual performance, reduced hallucinations during silence, fewer audio tags, and better handling of early transcript termination | 99 languages |
† Excluding application & network latency
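As a minimal sketch of how a model ID is used in practice, the helper below assembles a text-to-speech request against the public REST API using only the standard library. The endpoint path, xi-api-key header, and model_id body field follow the public API reference; the voice ID and API key are placeholders you must supply, and the output format here is whatever the API returns by default.

```python
# Hedged sketch: build and send a TTS request with a chosen model ID.
# Verify the endpoint shape against the current API reference before use.
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, model_id="eleven_multilingual_v2",
                      api_key="YOUR_API_KEY"):
    """Assemble a POST request for the text-to-speech endpoint."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice_id, **kwargs):
    """Send the request and return the raw audio bytes."""
    req = build_tts_request(text, voice_id, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Separating request construction from sending makes the payload easy to inspect or log before any network traffic occurs.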

Legacy models

These models are maintained for backward compatibility but are not recommended for new projects.

| Model ID | Description | Languages |
| --- | --- | --- |
| eleven_monolingual_v1 | First generation TTS model (outclassed by v2 models) | en |
| eleven_multilingual_v1 | First multilingual model (outclassed by v2 models) | en, fr, de, hi, it, pl, pt, es |

Multilingual v2

Eleven Multilingual v2 is our most advanced, emotionally-aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.

The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker’s unique characteristics and accent.

This model excels in scenarios requiring high-quality, emotionally nuanced speech:

  • Audiobook Production: Perfect for long-form narration with complex emotional delivery
  • Character Voiceovers: Ideal for gaming and animation due to its emotional range
  • Professional Content: Well-suited for corporate videos and e-learning materials
  • Multilingual Projects: Maintains consistent voice quality across language switches

While it has a higher latency & cost per character than Flash models, it delivers superior quality for projects where lifelike speech is important.

Our v2 models support 29 languages:

English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.

Flash v2.5

Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (~75ms†) across 32 languages.

The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.

This model is particularly well-suited for:

  • Conversational AI: Perfect for real-time voice agents and chatbots
  • Interactive Applications: Ideal for games and applications requiring immediate response
  • Large-Scale Processing: Efficient for bulk text-to-speech conversion

With its lower price point and 75ms latency, Flash v2.5 is the cost-effective option for anyone needing fast, reliable speech synthesis across multiple languages.

Flash v2.5 supports 32 languages - all languages from v2 models plus:

Hungarian, Norwegian & Vietnamese

† Excluding application & network latency

Considerations

When using Flash v2.5, numbers aren’t normalized in the way you might expect. For example, phone numbers might be read out in a way that isn’t clear to the user. Dates and currencies are affected in a similar manner.

This is expected as normalization is disabled for Flash v2.5 to maintain the low latency.

The Multilingual v2 model does a better job of normalizing numbers, so we recommend using it for phone numbers and other cases where number normalization is important.

For low-latency or Conversational AI applications, best practice is to have your LLM normalize the text before passing it to the TTS model.
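As an illustration of that pre-normalization step, the sketch below spells out a phone number digit by digit before the text is sent to Flash v2.5. The digit-to-word mapping is hand-rolled for this example; in production you would typically have your LLM do the normalization, or use a dedicated library.

```python
# Minimal sketch of pre-normalizing a phone number for a model that
# skips normalization (Flash v2.5). Illustrative only, English only.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_out_phone_number(raw: str) -> str:
    """Turn '555-0123' into 'five five five, zero one two three'."""
    groups = [g for g in raw.replace(" ", "").split("-") if g]
    spoken = [" ".join(DIGIT_WORDS[d] for d in g) for g in groups]
    # Commas introduce short pauses between digit groups when spoken.
    return ", ".join(spoken)
```

The normalized string is then passed as the `text` of the TTS request in place of the raw number.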

Turbo v2.5

Eleven Turbo v2.5 is our high-quality, low-latency model with a good balance of quality and speed.

This model is an ideal choice for all scenarios where you’d use Flash v2.5, but where you’re willing to trade off latency for higher quality voice generation.

Model selection guide

Quality

Use eleven_multilingual_v2

Best for high-fidelity audio output with rich emotional expression

Low-latency

Use Flash models

Optimized for real-time applications (~75ms latency)

Multilingual

Use either eleven_multilingual_v2 or eleven_flash_v2_5

eleven_multilingual_v2 supports 29 languages; eleven_flash_v2_5 supports 32

Balanced

Use eleven_turbo_v2_5

Good balance between quality and speed

Content creation

Use eleven_multilingual_v2

Ideal for professional content, audiobooks & video narration.

Conversational AI

Use eleven_flash_v2_5, eleven_flash_v2, eleven_multilingual_v2, eleven_turbo_v2_5 or eleven_turbo_v2

Perfect for real-time conversational applications

Voice changer

Use eleven_multilingual_sts_v2

Specialized for Speech-to-Speech conversion
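The guide above can be captured as a small lookup table. The use-case keys below are this sketch’s own labels, not official API values; only the model IDs come from the documentation.

```python
# A lookup mirroring the model selection guide. Keys are illustrative.
MODEL_FOR_USE_CASE = {
    "quality": "eleven_multilingual_v2",
    "low_latency": "eleven_flash_v2_5",
    "balanced": "eleven_turbo_v2_5",
    "content_creation": "eleven_multilingual_v2",
    "conversational_ai": "eleven_flash_v2_5",
    "voice_changer": "eleven_multilingual_sts_v2",
}


def pick_model(use_case: str) -> str:
    """Return the recommended model ID for a use case, or raise."""
    try:
        return MODEL_FOR_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None
```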

Character limits

The maximum number of characters supported in a single text-to-speech request varies by model.

| Model ID | Character limit | Approximate audio duration |
| --- | --- | --- |
| eleven_flash_v2_5 | 40,000 | ~40 minutes |
| eleven_flash_v2 | 30,000 | ~30 minutes |
| eleven_turbo_v2_5 | 40,000 | ~40 minutes |
| eleven_turbo_v2 | 30,000 | ~30 minutes |
| eleven_multilingual_v2 | 10,000 | ~10 minutes |
| eleven_multilingual_v1 | 10,000 | ~10 minutes |
| eleven_english_sts_v2 | 10,000 | ~10 minutes |
| eleven_english_sts_v1 | 10,000 | ~10 minutes |
For longer content, consider splitting the input into multiple requests.
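One way to split longer content is sketched below: chunk the text at sentence boundaries so that no request exceeds the model’s character limit (limits taken from the table above). The sentence splitting here is deliberately naive; a real pipeline would use a proper sentence segmenter.

```python
# Split long text into chunks that respect a model's character limit.
# Splits at sentence boundaries where possible; an oversized sentence
# with no break points is hard-cut at the limit.
CHARACTER_LIMITS = {
    "eleven_flash_v2_5": 40_000,
    "eleven_flash_v2": 30_000,
    "eleven_turbo_v2_5": 40_000,
    "eleven_turbo_v2": 30_000,
    "eleven_multilingual_v2": 10_000,
}


def chunk_text(text: str, model_id: str) -> list[str]:
    limit = CHARACTER_LIMITS[model_id]
    sentences = text.replace("\n", " ").split(". ")
    chunks, current = [], ""
    for s in sentences:
        piece = s if s.endswith(".") else s + "."
        if current and len(current) + len(piece) + 1 > limit:
            chunks.append(current.strip())
            current = ""
        while len(piece) > limit:  # single sentence exceeds the limit
            chunks.append(piece[:limit])
            piece = piece[limit:]
        current += piece + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each returned chunk can then be sent as its own text-to-speech request and the resulting audio segments concatenated.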

Scribe v1

Scribe v1 is our state-of-the-art speech recognition model designed for accurate transcription across 99 languages. It provides precise word-level timestamps and advanced features like speaker diarization and dynamic audio tagging.

This model excels in scenarios requiring accurate speech-to-text conversion:

  • Transcription Services: Perfect for converting audio/video content to text
  • Meeting Documentation: Ideal for capturing and documenting conversations
  • Content Analysis: Well-suited for audio content processing and analysis
  • Multilingual Recognition: Supports accurate transcription across 99 languages

Key features:

  • Accurate transcription with word-level timestamps
  • Speaker diarization for multi-speaker audio
  • Dynamic audio tagging for enhanced context
  • Support for 99 languages

Read more about Scribe v1 here.
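To illustrate working with diarized output, the helper below groups a Scribe-style word list into per-speaker turns. The field names ("text", "speaker_id", "start") mirror the word-level response described in the docs, but should be checked against the current API reference before relying on them.

```python
# Group word-level transcript entries into consecutive speaker turns.
# Field names are assumptions based on the documented response shape.
def words_to_turns(words: list[dict]) -> list[dict]:
    turns = []
    for w in words:
        if turns and turns[-1]["speaker_id"] == w.get("speaker_id"):
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["text"] += " " + w["text"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker_id": w.get("speaker_id"),
                "start": w["start"],
                "text": w["text"],
            })
    return turns
```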

Concurrency and priority

Your subscription plan determines how many requests can be processed simultaneously and the priority level of your requests in the queue. Speech to Text has an elevated concurrency limit. Once the concurrency limit is met, subsequent requests are processed in a queue alongside lower-priority requests. In practice this typically only adds ~50ms of latency.

| Plan | Concurrency Limit (Multilingual v2) | Concurrency Limit (Turbo & Flash) | STT Concurrency Limit | Priority level |
| --- | --- | --- | --- | --- |
| Free | 2 | 4 | 10 | 3 |
| Starter | 3 | 6 | 15 | 4 |
| Creator | 5 | 10 | 25 | 5 |
| Pro | 10 | 20 | 50 | 5 |
| Scale | 15 | 30 | 75 | 5 |
| Business | 15 | 30 | 75 | 5 |
| Enterprise | Elevated | Elevated | Elevated | Highest |

The response headers include current-concurrent-requests and maximum-concurrent-requests which you can use to monitor your concurrency.
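A sketch of using those two headers to throttle a client is shown below. The header names come from the docs; the 20% headroom threshold is an arbitrary choice for illustration.

```python
# Read the concurrency headers from a response's header mapping and
# decide whether to back off before issuing the next request.
def concurrency_headroom(headers: dict) -> float:
    """Fraction of the concurrency limit still free (1.0 = idle)."""
    current = int(headers.get("current-concurrent-requests", 0))
    maximum = int(headers.get("maximum-concurrent-requests", 1))
    return 1.0 - current / maximum


def should_throttle(headers: dict, min_headroom: float = 0.2) -> bool:
    """True when fewer than min_headroom of the slots remain free."""
    return concurrency_headroom(headers) < min_headroom
```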

How endpoint requests are made impacts concurrency limits:

  • With HTTP, each request counts individually toward your concurrency limit.
  • With a WebSocket, only the time during which our model is generating audio counts toward your concurrency limit. For most of the time, an open WebSocket doesn’t count toward your concurrency limit at all.

Understanding concurrency limits

The concurrency limit associated with your plan should not be interpreted as the maximum number of simultaneous conversations, phone calls, character voiceovers, etc. that can be handled at once. The actual number depends on several factors, including the specific AI voices used and the characteristics of the use case.

As a general rule of thumb, a concurrency limit of 5 can typically support approximately 100 simultaneous audio broadcasts.

This is because audio is generated much faster than it takes to play back, so each stream only occupies a request slot for a small fraction of its duration. The diagram below shows how 4 concurrent calls with different users can be facilitated while only reaching 2 concurrent requests.

[Diagram: concurrency limits]
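The rule of thumb can be reproduced with back-of-the-envelope arithmetic: if each stream only holds a request slot while audio is being generated, the supported stream count is the concurrency limit divided by that duty cycle. The 1.5 s-to-generate-30 s figure below is an assumption for illustration, not a measured benchmark.

```python
# Worked version of the rule of thumb: concurrency limit / duty cycle.
def supported_streams(concurrency_limit: int,
                      generation_seconds: float,
                      playback_seconds: float) -> int:
    """Rough number of simultaneous streams a concurrency limit supports."""
    # Fraction of wall-clock time each stream actually holds a slot.
    duty_cycle = generation_seconds / playback_seconds
    return int(concurrency_limit / duty_cycle)


# Assumed example: 1.5 s to generate 30 s of audio -> 5% duty cycle,
# so a limit of 5 supports roughly 100 streams.
```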

Where TTS is used to facilitate dialogue, a concurrency limit of 5 can support about 100 broadcasts for balanced conversations between AI agents and human participants.

For use cases in which the AI agent speaks less frequently than the human, such as customer support interactions, more than 100 simultaneous conversations could be supported.

Generally, more than 100 simultaneous character voiceovers can be supported for a concurrency limit of 5.

The number can vary depending on the character’s dialogue frequency, the length of pauses, and in-game actions between lines.

Concurrent dubbing streams generally follow the same rule of thumb.

If the broadcast involves periods of conversational pauses (e.g. because of a soundtrack, visual scenes, etc.), more simultaneous dubbing streams than suggested may be possible.

If you exceed your plan’s concurrency limits at any point and you are on the Enterprise plan, model requests may still succeed, albeit slower, on a best efforts basis depending on available capacity.

To increase your concurrency limit & queue priority, upgrade your subscription plan.

Enterprise customers can request a higher concurrency limit by contacting their account manager.