ElevenLabs Documentation

Explore our docs and guides to integrate ElevenLabs into your applications

How ElevenLabs works

ElevenLabs provides AI voice infrastructure: text-to-speech, speech-to-text, voice cloning, conversational agents, and generative audio. All capabilities are accessible through a REST API with official Python and TypeScript SDKs, and through a web application for no-code use.

A voice is the speech persona used in audio generation. Each voice has a unique ID (for example, JBFqnCBsd6RMkjVDRZzb) that you pass in any API request that generates speech. ElevenLabs maintains a library of 5,000+ voices. You can also clone a voice from an audio recording or generate one from a text description.
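To make the request flow concrete, here is a minimal sketch of how a voice ID slots into a text-to-speech call. The endpoint path and `xi-api-key` header follow the public REST API; the placeholder API key and the request-building helper itself are illustrative, not part of the official SDK.

```python
# Sketch: assembling a text-to-speech request around a voice ID.
# The voice ID below is the example from the text above.

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(voice_id: str, text: str,
                      model_id: str = "eleven_flash_v2_5"):
    """Return the URL, headers, and JSON body for a TTS call."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    headers = {"xi-api-key": "YOUR_API_KEY"}  # placeholder credential
    body = {"text": text, "model_id": model_id}
    return url, headers, body

url, headers, body = build_tts_request("JBFqnCBsd6RMkjVDRZzb", "Hello world")
```

The same pattern applies to the official Python and TypeScript SDKs, which take the voice ID as a parameter rather than requiring you to build the URL yourself.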

Models control the quality, latency, and language coverage of generated audio. eleven_v3 produces the most expressive output across 70+ languages, while eleven_flash_v2_5 targets real-time use at roughly 75 ms of model latency (excluding application and network latency). Each capability, such as speech-to-text, music, and sound effects, has its own dedicated model.
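The trade-off above reduces to a simple decision: expressiveness versus latency. A sketch of that choice, using the two model IDs named in the text (the selection rule itself is an illustrative assumption, not official guidance):

```python
# Sketch: picking a text-to-speech model ID based on whether the
# use case is real-time. Model IDs are taken from the text above.

def pick_model(realtime: bool) -> str:
    """Prefer low latency for real-time use, expressiveness otherwise."""
    return "eleven_flash_v2_5" if realtime else "eleven_v3"

print(pick_model(realtime=True))   # conversational agent, live captions
print(pick_model(realtime=False))  # audiobook, narration
```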

Credits are the unit of API consumption. Text-to-speech costs one credit per character of input text. Other operations are charged per second of audio processed. Credits reset monthly and unused credits roll over for up to two months. See pricing for a full breakdown.
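The two charging rules above can be sketched as a small cost estimator. The one-credit-per-character rule is from the text; the per-second rate is a placeholder argument, since actual rates vary by operation (see pricing):

```python
# Sketch: estimating credit consumption under the stated rules.

def tts_credits(text: str) -> int:
    # Text-to-speech: one credit per character of input text.
    return len(text)

def audio_credits(seconds: float, credits_per_second: float) -> float:
    # Other operations: charged per second of audio processed.
    # The rate is a caller-supplied assumption, not an official figure.
    return seconds * credits_per_second

print(tts_credits("Hello world"))  # 11 characters -> 11 credits
```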

Choose your path

Meet the models


Browse by capability