> This is a page from the ElevenLabs documentation. For a complete page index, fetch https://elevenlabs.io/docs/llms.txt. For the full documentation in a single file, fetch https://elevenlabs.io/docs/llms-full.txt.

# ElevenLabs Documentation

## How ElevenLabs works

ElevenLabs provides AI voice infrastructure: text-to-speech, speech-to-text, voice cloning, conversational agents, and generative audio. All capabilities are accessible through a REST API with official Python and TypeScript SDKs, and through a web application for no-code use.

**Voices** are the speech personas used in audio generation. Each voice has a unique ID (for example, `JBFqnCBsd6RMkjVDRZzb`) that you pass in every API request. ElevenLabs maintains a [library of 10,000+ voices](https://elevenlabs.io/app/voice-library). You can also clone a voice from an audio recording or generate one from a text description.

**Models** control the quality, latency, and language coverage of generated audio. [`eleven_v3`](/docs/overview/models) produces the most expressive output across 70+ languages. [`eleven_flash_v2_5`](/docs/overview/models) targets real-time use at \~75ms latency. Each capability (speech-to-text, music, sound effects) has its own dedicated model.

**Credits** are the unit of API consumption. Text-to-speech costs one credit per character of input text. Other operations are charged per second of audio processed. Credits reset monthly, and unused credits roll over for up to two months. See [pricing](https://elevenlabs.io/pricing/api) for a full breakdown.

## Choose your path

ElevenCreative

Learn how to use the ElevenCreative platform with step-by-step guides

ElevenAgents

Learn how to build, launch, and scale agents with ElevenLabs

ElevenAPI

Learn how to integrate with the ElevenLabs API with examples and tutorials
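
As a rough illustration of the API path, the sketch below builds a text-to-speech request from a voice ID and a model ID, the two identifiers described in "How ElevenLabs works". The base URL, the `/text-to-speech/{voice_id}` path, the `xi-api-key` header, and the `text` / `model_id` body fields are assumptions not stated on this page, and `build_tts_request` is a hypothetical helper name. Check the API reference before relying on any of them.

```python
import json
import urllib.request

# Assumed base URL; not stated on this page.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text: str, voice_id: str, model_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request.

    The endpoint path, header name, and body fields are assumptions.
    """
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the response body (assumed to be audio bytes) to a file."""
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before spending credits. A call might look like `synthesize(build_tts_request("Hello", "JBFqnCBsd6RMkjVDRZzb", "eleven_flash_v2_5", api_key), "hello.mp3")`, where the output file extension is also an assumption about the default audio format.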

## Meet the models

Our most emotionally rich, expressive speech synthesis model

- Dramatic delivery and performance
- 70+ languages supported
- 5,000 character limit
- Support for natural multi-speaker dialogue

Lifelike, consistent quality speech synthesis model

- Natural-sounding output
- 29 languages supported
- 10,000 character limit
- Most stable on long-form generations

Our fast, affordable speech synthesis model

- Ultra-low latency (\~75ms†)
- 32 languages supported
- 40,000 character limit
- Faster model, 50% lower price per character for API generations

State-of-the-art speech recognition model

- Accurate transcription in 90+ languages
- Keyterm prompting, up to 1000 terms
- Entity detection, up to 56
- Precise word-level timestamps
- Speaker diarization, up to 32 speakers
- Dynamic audio tagging
- Smart language detection

Real-time speech recognition model

- Accurate transcription in 90+ languages
- Real-time transcription
- Low latency (\~150ms†)
- Precise word-level timestamps

† Excluding application & network latency

## Browse by capability

Text to Speech

Convert text into lifelike speech

Speech to Text

Transcribe spoken audio into text

Music

Generate music from text

Text to Dialogue

Create natural-sounding dialogue from text

Image & Video

Generate images and videos from text

Voice changer

Modify and transform voices

Voice isolator

Isolate voices from background noise

Dubbing

Dub audio and videos seamlessly

Sound effects

Create cinematic sound effects

Voices

Clone and design custom voices

Voice Remixing

Transform and enhance existing voices

Forced Alignment

Align text to audio

ElevenAgents

Deploy intelligent voice agents
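
The credit rule from "How ElevenLabs works" (one credit per character of text-to-speech input) and the per-model character limits from "Meet the models" can be combined into a small planning helper. This is a sketch under stated assumptions: the pairing of `eleven_v3` with the 5,000 character limit and `eleven_flash_v2_5` with the 40,000 character limit is inferred from matching descriptions on this page, the function names are hypothetical, and model-specific price differences are not modeled.

```python
# Per-request character limits. The mapping of limits to model IDs is an
# assumption inferred from the model descriptions on this page.
CHAR_LIMITS = {
    "eleven_v3": 5_000,
    "eleven_flash_v2_5": 40_000,
}


def estimate_tts_credits(text: str) -> int:
    """Estimate credits at one credit per input character (no per-model discounts)."""
    return len(text)


def split_for_model(text: str, model_id: str) -> list[str]:
    """Greedily pack whitespace-separated words into chunks within the model's limit.

    Runs of whitespace are collapsed to single spaces, so the total
    character count can differ slightly from the original text.
    """
    limit = CHAR_LIMITS[model_id]
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Splitting on word boundaries keeps each request under the limit without cutting words in half; splitting on sentence boundaries would likely sound more natural but needs more logic than fits here.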