ElevenLabs Documentation
Explore our docs and guides to integrate ElevenLabs
Meet the models
Eleven v3
Our most emotionally rich, expressive speech synthesis model
Dramatic delivery and performance
70+ languages supported
5,000 character limit
Support for natural multi-speaker dialogue
Eleven Multilingual v2
Lifelike, consistent quality speech synthesis model
Natural-sounding output
29 languages supported
10,000 character limit
Most stable on long-form generations
Eleven Flash v2.5
Our fast, affordable speech synthesis model
Ultra-low latency (~75ms†)
32 languages supported
40,000 character limit
Faster model, 50% lower price per character
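The character limits listed for the three speech synthesis models above mean longer inputs have to be split before they are submitted. The following is a minimal sketch of one way to do that, not an official client: the `CHAR_LIMITS` table and the `split_for_model` helper are hypothetical names introduced here, keyed by the display names used on this page rather than by API model identifiers, and the sketch assumes each limit applies per request.

```python
# Per-request character limits, copied from the model cards above.
# The dictionary keys are display names, not API model identifiers.
CHAR_LIMITS = {
    "Eleven v3": 5_000,
    "Eleven Multilingual v2": 10_000,
    "Eleven Flash v2.5": 40_000,
}


def split_for_model(text: str, model: str) -> list[str]:
    """Split text into chunks no longer than the model's character limit.

    Breaks at the last space inside the limit so words stay intact, and
    falls back to a hard cut when no space is available.
    """
    limit = CHAR_LIMITS[model]
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        # Look for the last space at an index in [0, limit].
        cut = rest.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: cut mid-word
        chunks.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned chunk can then be sent as its own request. Splitting on sentence boundaries instead of spaces would likely give more natural prosody at the joins; the space-based split is used here only to keep the sketch short.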
Scribe v2
State-of-the-art speech recognition model
Accurate transcription in 90+ languages
Keyterm prompting, up to 100 terms
Entity detection, up to 56 entity types
Precise word-level timestamps
Speaker diarization, up to 32 speakers
Dynamic audio tagging
Smart language detection
Scribe v2 Realtime
Real-time speech recognition model
Accurate transcription in 90+ languages
Real-time transcription
Low latency (~150ms†)
Precise word-level timestamps
Browse by capability
Text to Speech
Convert text into lifelike speech
Speech to Text
Transcribe spoken audio into text
Music
Generate music from text
Text to Dialogue
Create natural-sounding dialogue from text
Image & Video
Generate images and videos from text
Voice changer
Modify and transform voices
Voice isolator
Isolate voices from background noise
Dubbing
Dub audio and video seamlessly
Sound effects
Create cinematic sound effects
Voices
Clone and design custom voices
Voice Remixing
Transform and enhance existing voices
Forced Alignment
Align text to audio
ElevenAgents
Deploy intelligent voice agents