# ElevenLabs

> ElevenLabs is an AI audio research and deployment company.
## Most popular

* Learn how to integrate ElevenLabs
* Deploy voice agents in minutes
* Learn how to use ElevenLabs
* Dive into our API reference

## Meet the models

**Our most lifelike, emotionally rich speech synthesis model**
* Most natural-sounding output
* 29 languages supported
* 10,000 character limit
* Rich emotional expression

**Our fast, affordable speech synthesis model**

* Ultra-low latency (~75ms†)
* 32 languages supported
* 40,000 character limit
* Faster model, 50% lower price per character

**State-of-the-art speech recognition model**

* Accurate transcription in 99 languages
* Precise word-level timestamps
* Speaker diarization
* Dynamic audio tagging
[Explore all](/docs/models)
## Capabilities
Text to Speech

Convert text into lifelike speech

Speech to Text

Transcribe spoken audio into text

Voice changer

Modify and transform voices

Voice isolator

Isolate voices from background noise

Dubbing

Dub audio and videos seamlessly

Sound effects

Create cinematic sound effects

Voices

Clone and design custom voices

Conversational AI

Deploy intelligent voice agents

## Product guides
Product guides

Explore our product guides for step-by-step guidance

Voice library
† Excluding application & network latency

# Developer quickstart

> Learn how to make your first ElevenLabs API request.

The ElevenLabs API provides a simple interface to state-of-the-art audio [models](/docs/models) and [features](/docs/api-reference/introduction). Follow this guide to learn how to create lifelike speech with our Text to Speech API. See the [developer guides](/docs/quickstart#explore-our-developer-guides) for more examples with our other products.

## Using the Text to Speech API

[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication). Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.

```js title=".env"
ELEVENLABS_API_KEY=
```

We'll also use the `dotenv` library to load our API key from an environment variable.

```python
pip install elevenlabs
pip install python-dotenv
```

```typescript
npm install elevenlabs
npm install dotenv
```

To play the audio through your speakers, you may be prompted to install [MPV](https://mpv.io/) and/or [ffmpeg](https://ffmpeg.org/).

Create a new file named `example.py` or `example.mts`, depending on your language of choice, and add the following code:

```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs import play

load_dotenv()

client = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

audio = client.text_to_speech.convert(
    text="The first move is what sets everything in motion.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

play(audio)
```

```typescript
import { ElevenLabsClient, play } from "elevenlabs";
import "dotenv/config";

const client = new ElevenLabsClient();

const audio = await client.textToSpeech.convert("JBFqnCBsd6RMkjVDRZzb", {
  text: "The first move is what sets everything in motion.",
  model_id: "eleven_multilingual_v2",
  output_format: "mp3_44100_128",
});

await play(audio);
```

```python
python example.py
```

```typescript
npx tsx example.mts
```

You should hear the audio play through your speakers.

## Explore our developer guides

Now that you've made your first ElevenLabs API request, you can explore the other products that ElevenLabs offers:

* Convert spoken audio into text
* Deploy conversational voice agents
* Clone a voice
* Generate sound effects from text
* Transform the voice of an audio file
* Isolate background noise from audio
* Generate voices from a single text prompt
* Dub audio/video from one language to another
* Generate time-aligned transcripts for audio

# Models

> Learn about the models that power the ElevenLabs API.

## Flagship models

**Our most lifelike, emotionally rich speech synthesis model**
* Most natural-sounding output
* 29 languages supported
* 10,000 character limit
* Rich emotional expression

**Our fast, affordable speech synthesis model**

* Ultra-low latency (~75ms†)
* 32 languages supported
* 40,000 character limit
* Faster model, 50% lower price per character

**State-of-the-art speech recognition model**

* Accurate transcription in 99 languages
* Precise word-level timestamps
* Speaker diarization
* Dynamic audio tagging
[Pricing](https://elevenlabs.io/pricing/api)
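
The cards above and the overview table below refer to models by their ID. If you'd rather discover the available IDs programmatically than hard-code them, the REST API exposes a model listing endpoint; the sketch below is a minimal raw `requests` call (not the SDK helper) that assumes `ELEVENLABS_API_KEY` is set and that the response is a JSON array of model objects — check the API reference for the exact shape.

```python
# Sketch: list the models available to your account and print their IDs.
# Uses GET /v1/models with the same API key as the quickstart.
import os
import requests

response = requests.get(
    "https://api.elevenlabs.io/v1/models",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
)
response.raise_for_status()

# The endpoint is expected to return a JSON array of model objects;
# fall back defensively in case the payload is wrapped in an object.
data = response.json()
models = data if isinstance(data, list) else data.get("models", [])
for model in models:
    print(model.get("model_id"), "-", model.get("name"))
```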
## Models overview

The ElevenLabs API offers a range of audio models optimized for different use cases, quality levels, and performance requirements.

| Model ID | Description | Languages |
| --- | --- | --- |
| `eleven_multilingual_v2` | Our most lifelike model with rich emotional expression | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_flash_v2_5` | Ultra-fast model optimized for real-time use (\~75ms†) | All `eleven_multilingual_v2` languages plus: `hu`, `no`, `vi` |
| `eleven_flash_v2` | Ultra-fast model optimized for real-time use (\~75ms†) | `en` |
| `eleven_multilingual_sts_v2` | State-of-the-art multilingual voice changer model (Speech to Speech) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_english_sts_v2` | English-only voice changer model (Speech to Speech) | `en` |
| `scribe_v1` | State-of-the-art speech recognition model | [99 languages](/docs/capabilities/speech-to-text#supported-languages) |
| `scribe_v1_experimental` | State-of-the-art speech recognition model with experimental features: improved multilingual performance, reduced hallucinations during silence, fewer audio tags, and better handling of early transcript termination | [99 languages](/docs/capabilities/speech-to-text#supported-languages) |

† Excluding application & network latency

These models are maintained for backward compatibility but are not recommended for new projects.

| Model ID | Description | Languages |
| --- | --- | --- |
| `eleven_monolingual_v1` | First generation TTS model (outclassed by v2 models) | `en` |
| `eleven_multilingual_v1` | First multilingual model (outclassed by v2 models) | `en`, `fr`, `de`, `hi`, `it`, `pl`, `pt`, `es` |
| `eleven_turbo_v2_5` | High quality, low-latency model (\~250ms-300ms) (outclassed by Flash models) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru`, `hu`, `no`, `vi` |
| `eleven_turbo_v2` | High quality, low-latency model (\~250ms-300ms) (outclassed by Flash models) | `en` |

## Multilingual v2

Eleven Multilingual v2 is our most advanced, emotionally-aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.

The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker's unique characteristics and accent.
This model excels in scenarios requiring high-quality, emotionally nuanced speech:

* **Audiobook Production**: Perfect for long-form narration with complex emotional delivery
* **Character Voiceovers**: Ideal for gaming and animation due to its emotional range
* **Professional Content**: Well-suited for corporate videos and e-learning materials
* **Multilingual Projects**: Maintains consistent voice quality across language switches

While it has a higher latency & cost per character than Flash models, it delivers superior quality for projects where lifelike speech is important.

Our v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*

## Flash v2.5

Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (\~75ms†) across 32 languages.

The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.

This model is particularly well-suited for:

* **Conversational AI**: Perfect for real-time voice agents and chatbots
* **Interactive Applications**: Ideal for games and applications requiring immediate response
* **Large-Scale Processing**: Efficient for bulk text-to-speech conversion

With its lower price point and 75ms latency, Flash v2.5 is the cost-effective option for anyone needing fast, reliable speech synthesis across multiple languages.

Flash v2.5 supports 32 languages - all languages from v2 models plus: *Hungarian, Norwegian & Vietnamese*

### Considerations

When using Flash v2.5, numbers aren't normalized in the way you might expect. For example, phone numbers might be read out in a way that isn't clear for the user. Dates and currencies are affected in a similar manner. This is expected, as normalization is disabled for Flash v2.5 to maintain its low latency.

The Multilingual v2 model does a better job of normalizing numbers, so we recommend using it for phone numbers and other cases where number normalization is important.

For low-latency or Conversational AI applications, best practice is to have your LLM [normalize the text](/docs/best-practices/prompting/normalization) before passing it to the TTS model.
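
To illustrate that recommendation, here is a minimal pre-processing sketch. It assumes the third-party `num2words` package (`pip install num2words`), which is not part of the ElevenLabs SDK, and simply spells out digit runs before the text is sent to `eleven_flash_v2_5`; in practice you would typically let your LLM handle dates, currencies, and phone-number formatting as described above.

```python
# Minimal normalization sketch (assumes `pip install num2words`).
# Spells out bare digit runs so Flash v2.5 reads them clearly; an LLM-based
# normalizer is the better choice for dates, currencies, and phone numbers.
import re
from num2words import num2words

def normalize_numbers(text: str) -> str:
    # "Your order of 3 items ships in 5 days." -> "... three items ... five days."
    return re.sub(r"\d+", lambda match: num2words(int(match.group())), text)

normalized = normalize_numbers("Your order of 3 items ships in 5 days.")
print(normalized)  # pass `normalized` as `text` when calling the TTS endpoint
```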
## Model selection guide

* Use `eleven_multilingual_v2`: best for high-fidelity audio output with rich emotional expression
* Use Flash models: optimized for real-time applications (\~75ms latency)
* Use either `eleven_multilingual_v2` or `eleven_flash_v2_5`: both support up to 32 languages
* Use `eleven_multilingual_v2`: ideal for professional content, audiobooks & video narration
* Use `eleven_flash_v2_5`, `eleven_flash_v2` or `eleven_multilingual_v2`: perfect for real-time conversational applications
* Use `eleven_multilingual_sts_v2`: specialized for Speech-to-Speech conversion

## Character limits

The maximum number of characters supported in a single text-to-speech request varies by model.

| Model ID | Character limit | Approximate audio duration |
| --- | --- | --- |
| `eleven_flash_v2_5` | 40,000 | \~40 minutes |
| `eleven_flash_v2` | 30,000 | \~30 minutes |
| `eleven_multilingual_v2` | 10,000 | \~10 minutes |
| `eleven_multilingual_v1` | 10,000 | \~10 minutes |
| `eleven_english_sts_v2` | 10,000 | \~10 minutes |
| `eleven_english_sts_v1` | 10,000 | \~10 minutes |

For longer content, consider splitting the input into multiple requests.

## Scribe v1

Scribe v1 is our state-of-the-art speech recognition model designed for accurate transcription across 99 languages. It provides precise word-level timestamps and advanced features like speaker diarization and dynamic audio tagging.

This model excels in scenarios requiring accurate speech-to-text conversion:

* **Transcription Services**: Perfect for converting audio/video content to text
* **Meeting Documentation**: Ideal for capturing and documenting conversations
* **Content Analysis**: Well-suited for audio content processing and analysis
* **Multilingual Recognition**: Supports accurate transcription across 99 languages

Key features:

* Accurate transcription with word-level timestamps
* Speaker diarization for multi-speaker audio
* Dynamic audio tagging for enhanced context
* Support for 99 languages

Read more about Scribe v1 [here](/docs/capabilities/speech-to-text).

## Concurrency and priority

Your subscription plan determines how many requests can be processed simultaneously and the priority level of your requests in the queue. Speech to Text has an elevated concurrency limit. Once the concurrency limit is met, subsequent requests are processed in a queue alongside lower-priority requests. In practice this typically only adds \~50ms of latency.

| Plan | Concurrency Limit | STT Concurrency Limit | Priority level |
| --- | --- | --- | --- |
| Free | 4 | 10 | 3 |
| Starter | 6 | 15 | 4 |
| Creator | 10 | 25 | 5 |
| Pro | 20 | 50 | 5 |
| Scale | 30 | 75 | 5 |
| Business | 30 | 75 | 5 |
| Enterprise | Elevated | Elevated | Highest |

To increase your concurrency limit & queue priority, [upgrade your subscription plan](https://elevenlabs.io/pricing/api). Enterprise customers can request a higher concurrency limit by contacting their account manager.

The response headers include `current-concurrent-requests` and `maximum-concurrent-requests`, which you can use to monitor your concurrency.
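
As a minimal sketch of that kind of monitoring, the raw REST call below reads both headers from a Text to Speech response. It uses `requests` rather than the SDK so the response headers are easy to inspect; the endpoint path, `xi-api-key` header, and voice ID are reused from the quickstart, and the header names are the ones documented above.

```python
# Sketch: read the concurrency headers from a raw Text to Speech response.
# Assumes ELEVENLABS_API_KEY is set; note that this request consumes a few
# characters of quota.
import os
import requests

response = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": "Hello!", "model_id": "eleven_flash_v2_5"},
)
response.raise_for_status()

# Header names as documented above; fall back gracefully if they are absent.
current = response.headers.get("current-concurrent-requests")
maximum = response.headers.get("maximum-concurrent-requests")
print(f"Concurrency: {current}/{maximum}")
```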
# April 28, 2025

### Conversational AI

- **Custom Dashboard Charts**: The Conversational AI Dashboard can now be extended with custom charts displaying the results of evaluation criteria over time. See the new [GET](/docs/conversational-ai/api-reference/workspace/get-dashboard-settings) and [PATCH](/docs/conversational-ai/api-reference/workspace/update-dashboard-settings) endpoints for managing dashboard settings.
- **Call History Filtering**: Added the ability to filter the call history by start date using the new `call_start_before_unix` parameter in the [List Conversations](/docs/conversational-ai/api-reference/conversations/get-conversations#request.query.call_start_before_unix) endpoint. [Try it here](https://elevenlabs.io/app/conversational-ai/history).
- **Server Tools**: Added the option to make PUT requests in [server tools](/docs/conversational-ai/customization/tools/server-tools).
- **Transfer to human**: Added call forwarding functionality to support forwarding to operators, see docs [here](/docs/conversational-ai/customization/tools/system-tools/transfer-to-human).
- **Language detection**: Fixed an issue where the [language detection system tool](/docs/conversational-ai/customization/tools/system-tools/language-detection) would trigger when a user replied "yes" in a non-English language.

### Usage Analytics

- **Custom Aggregation**: Added an optional `aggregation_interval` parameter to the [Get Usage Metrics](/docs/api-reference/usage/get-characters-usage-metrics) endpoint to control the interval over which to aggregate character usage (hour, day, week, month, or cumulative).
- **New Metric Breakdowns**: The Usage Analytics section now supports additional metric breakdowns including `minutes_used`, `request_count`, `ttfb_avg`, and `ttfb_p95`, selectable via the new `metric` parameter in the [Get Usage Metrics](/docs/api-reference/usage/get-characters-usage-metrics) endpoint. Furthermore, you can now get a breakdown and filter by `request_queue`.

### API

## New Endpoints

- Added 2 new endpoints for managing Conversational AI dashboard settings:
  - [Get Dashboard Settings](/docs/conversational-ai/api-reference/workspace/get-dashboard-settings) - Retrieves custom chart configurations for the ConvAI dashboard.
  - [Update Dashboard Settings](/docs/conversational-ai/api-reference/workspace/update-dashboard-settings) - Updates custom chart configurations for the ConvAI dashboard.

## Updated Endpoints

### Audio Generation (TTS, S2S, SFX, Voice Design)

- Updated endpoints to support new `output_format` option `pcm_48000`:
  - [Text to Speech](/docs/api-reference/text-to-speech/convert) (`POST /v1/text-to-speech/{voice_id}`)
  - [Text to Speech with Timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) (`POST /v1/text-to-speech/{voice_id}/with-timestamps`)
  - [Text to Speech Stream](/docs/api-reference/text-to-speech/convert-as-stream) (`POST /v1/text-to-speech/{voice_id}/stream`)
  - [Text to Speech Stream with Timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) (`POST /v1/text-to-speech/{voice_id}/stream/with-timestamps`)
  - [Speech to Speech](/docs/api-reference/speech-to-speech/convert) (`POST /v1/speech-to-speech/{voice_id}`)
  - [Speech to Speech Stream](/docs/api-reference/speech-to-speech/convert-as-stream) (`POST /v1/speech-to-speech/{voice_id}/stream`)
  - [Sound Generation](/docs/api-reference/text-to-sound-effects/convert) (`POST /v1/sound-generation`)
  - [Create Voice Previews](/docs/api-reference/text-to-voice/create-previews) (`POST /v1/text-to-voice/create-previews`)

### Usage Analytics

- Updated usage metrics endpoint:
  - [Get Usage Metrics](/docs/api-reference/usage/get-characters-usage-metrics) (`GET /v1/usage/character-stats`) - Added optional `aggregation_interval` and `metric` query parameters.

### Conversational AI

- Updated conversation listing endpoint:
  - [List Conversations](/docs/conversational-ai/api-reference/conversations/get-conversations#request.query.call_start_before_unix) (`GET /v1/convai/conversations`) - Added optional `call_start_before_unix` query parameter for filtering by start date.
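
A minimal sketch of the new filter, using a raw `requests` call against the endpoint path listed above; it assumes `ELEVENLABS_API_KEY` is set, and the response field names (`conversations`, `conversation_id`) are assumptions to verify against the API reference.

```python
# Sketch: list conversations that started before a cutoff, via the new
# `call_start_before_unix` query parameter on GET /v1/convai/conversations.
import os
import time
import requests

cutoff = int(time.time()) - 7 * 24 * 3600  # e.g. calls started more than a week ago

response = requests.get(
    "https://api.elevenlabs.io/v1/convai/conversations",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    params={"call_start_before_unix": cutoff},
)
response.raise_for_status()

# Field names below are assumptions about the response shape.
for conversation in response.json().get("conversations", []):
    print(conversation.get("conversation_id"))
```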
## Schema Changes ### Conversational AI - Added detailed LLM usage and pricing information to conversation [charging and history models](/docs/conversational-ai/api-reference/conversations/get-conversation#response.body.metadata.charging). - Added `tool_latency_secs` to [tool result schemas](/docs/api-reference/conversations/get-conversation#response.body.transcript.tool_results.tool_latency_secs) - Added `access_info` to [`GET /v1/convai/agents/{agent_id}`](/docs/api-reference/agents/get-agent#response.body.access_info) # April 21, 2025 ### Professional Voice Cloning (PVC) - **PVC API**: Introduced a comprehensive suite of API endpoints for managing Professional Voice Clones (PVC). You can now programmatically create voices, add/manage/delete audio samples, retrieve audio/waveforms, manage speaker separation, handle verification, and initiate training. For a full list of new endpoints check the API changes summary below or read the PVC API reference [here](/docs/api-reference/voices/pvc/create). ### Speech to Text - **Enhanced Export Options**: Added options to include or exclude timestamps and speaker IDs when exporting Speech to Text results in segmented JSON format via the API. ### Conversational AI - **New LLM Models**: Added support for new GPT-4.1 models: `gpt-4.1`, `gpt-4.1-mini`, and `gpt-4.1-nano` [here](/docs/api-reference/agents/create-agent#request.body.conversation_config.agent.prompt.llm) - **VAD Score**: Added a new client event which sends VAD scores to the client, see reference [here](/docs/conversational-ai/customization/events/client-events#vad_score) ### Workspace - **Member Management**: Added a new API endpoint to allow administrators to delete workspace members [here](/docs/api-reference/workspace/delete-member) ### API ## New Endpoints - Added 16 new endpoints: - [Delete Member](/docs/api-reference/workspace/delete-member) - Allows deleting workspace members. - [Create PVC Voice](/docs/api-reference/voices/PVC/create) - Creates a new PVC voice. - [Edit PVC Voice](/docs/api-reference/voices/PVC/update) - Edits PVC voice metadata. - [Add Samples To PVC Voice](/docs/api-reference/voices/PVC/samples/add) - Adds audio samples to a PVC voice. - [Update PVC Voice Sample](/docs/api-reference/voices/PVC/samples/update) - Updates a PVC voice sample (noise removal, speaker selection, trimming). - [Delete PVC Voice Sample](/docs/api-reference/voices/PVC/samples/delete) - Deletes a sample from a PVC voice. - [Retrieve Voice Sample Audio](/docs/api-reference/voices/PVC/samples/audio/get) - Retrieves audio for a PVC voice sample. - [Retrieve Voice Sample Visual Waveform](/docs/api-reference/voices/PVC/samples/waveform/get) - Retrieves the visual waveform for a PVC voice sample. - [Retrieve Speaker Separation Status](/docs/api-reference/voices/PVC/samples/speakers/get-status) - Gets the status of speaker separation for a sample. - [Start Speaker Separation](/docs/api-reference/voices/PVC/samples/speakers/start-separation) - Initiates speaker separation for a sample. - [Retrieve Separated Speaker Audio](/docs/api-reference/voices/PVC/samples/speakers/audio/get) - Retrieves audio for a specific separated speaker. - [Get PVC Voice Captcha](/docs/api-reference/voices/PVC/verification/get-captcha) - Gets the captcha for PVC voice verification. - [Verify PVC Voice Captcha](/docs/api-reference/voices/PVC/verification/verify-captcha) - Submits captcha verification for a PVC voice. - [Run PVC Training](/docs/api-reference/voices/PVC/train) - Starts the training process for a PVC voice. 
- [Request Manual Verification](/docs/api-reference/voices/PVC/verification/request-manual) - Requests manual verification for a PVC voice. ## Updated Endpoints ### Speech to Text - Updated endpoint with changes: - [Create Forced Alignment Task](/docs/api-reference/forced-alignment/create#request.body.enabled_spooled_file) - Added `enabled_spooled_file` parameter to allow streaming large files (`POST /v1/forced-alignment`). ## Schema Changes ### Conversational AI - `GET conversation details`: Added `has_audio`, `has_user_audio`, `has_response_audio` boolean fields [here](/docs/api-reference/conversations/get-conversation#response.body.has_audio) ### Dubbing - `GET dubbing resource `: Added `status` field to each render [here](/docs/api-reference/dubbing/get-dubbing-resource#response.body.renders.status) # April 14, 2025 ### Voices - **New PVC flow**: Added new flow for Professional Voice Clone creation, try it out [here](https://elevenlabs.io/app/voice-lab?action=create&creationType=professionalVoiceClone) ### Conversational AI - **Agent-agent transfer:** Added support for agent-to-agent transfers via a new system tool, enabling more complex conversational flows. See the [Agent Transfer tool documentation](/docs/conversational-ai/customization/tools/system-tools/agent-transfer) for details. - **Enhanced tool debugging:** Improved how tool execution details are displayed in the conversation history for easier debugging. - **Language detection fix:** Resolved an issue regarding the forced calling of the language detection tool. ### Dubbing - **Render endpoint:** Introduced a new endpoint to regenerate audio or video renders for specific languages within a dubbing project. This automatically handles missing transcriptions or translations. See the [Render Dub endpoint](/docs/api-reference/dubbing/render-dub). - **Increased size limit:** Raised the maximum allowed file size for dubbing projects to 1 GiB. ### API ## New Endpoints - [Added render dub endpoint](/docs/api-reference/dubbing/render-dub) - Regenerate dubs for a specific language. ## Updated Endpoints ### Pronunciation Dictionaries - Updated the response for the [`GET /v1/pronunciation-dictionaries/{pronunciation_dictionary_id}/`](/docs/api-reference/pronunciation-dictionary/get#response.body.permission_on_resource) endpoint and related components to include the `permission_on_resource` field. ### Speech to Text - Updated [Speech to Text endpoint](/docs/api-reference/speech-to-text/convert) (`POST /v1/speech-to-text`): - Added `cloud_storage_url` parameter to allow transcription directly from public S3 or GCS URLs (up to 2GB). - Made the `file` parameter optional; exactly one of `file` or `cloud_storage_url` must now be provided. ### Speech to Speech - Added optional `file_format` parameter (`pcm_s16le_16` or `other`) for lower latency with PCM input to [`POST /v1/speech-to-speech/{voice_id}`](/docs/api-reference/speech-to-speech/convert) ### Conversational AI - Updated components to support [agent-agent transfer](/docs/conversational-ai/customization/tools/system-tools/agent-transfer) tool ### Voices - Updated [`GET /v1/voices/{voice_id}`](/docs/api-reference/voices/get#response.body.samples.trim_start) `samples` field to include optional `trim_start` and `trim_end` parameters. ### AudioNative - Updated [`Get /v1/audio-native/{project_id}/settings`](/docs/api-reference/audio-native/get-settings#response.body.settings.status) to include `status` field (`processing` or `ready`). 
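
The `cloud_storage_url` parameter added to `POST /v1/speech-to-text` above can be exercised with a raw multipart request; the sketch below is a minimal illustration in which the bucket URL is a placeholder, and exactly one of `file` or `cloud_storage_url` may be supplied per request.

```python
# Sketch: transcribe a publicly accessible S3/GCS file without uploading it,
# using the `cloud_storage_url` form field added above.
import os
import requests

response = requests.post(
    "https://api.elevenlabs.io/v1/speech-to-text",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    # (None, value) tuples send plain form fields as multipart/form-data.
    files={
        "model_id": (None, "scribe_v1"),
        "cloud_storage_url": (None, "https://storage.googleapis.com/example-bucket/meeting.mp3"),
    },
)
response.raise_for_status()
print(response.json().get("text"))
```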
# April 7, 2025 ## Speech to text - **`scribe_v1_experimental`**: Launched a new experimental preview of the [Scribe v1 model](/docs/capabilities/speech-to-text) with improvements including improved performance on audio files with multiple languages, reduced hallucinations when audio is interleaved with silence, and improved audio tags. The new model is available via the API under the model name [`scribe_v1_experimental`](/docs/api-reference/speech-to-text/convert#request.body.model_id) ### Text to speech - **A-law format support**: Added [a-law format](/docs/api-reference/text-to-speech/convert#request.query.output_format) with 8kHz sample rate to enable integration with European telephony systems. - **Fixed quota issues**: Fixed a database bug that caused some requests to be mistakenly rejected as exceeding their quota. ### Conversational AI - **Document type filtering**: Added support for filtering knowledge base documents by their [type](/docs/api-reference/knowledge-base/get-knowledge-base-list#request.query.types) (file, URL, or text). - **Non-audio agents**: Added support for conversational agents that don't output audio but still send response transcripts and can use tools. Non-audio agents can be enabled by removing the audio [client event](/docs/conversational-ai/customization/events/client-events). - **Improved agent templates**: Updated all agent templates with enhanced configurations and prompts. See more about how to improve system prompts [here](/docs/conversational-ai/best-practices/prompting-guide). - **Fixed stuck exports**: Fixed an issue that caused exports to be stuck for extended periods. ### Studio - **Fixed volume normalization**: Fixed issue with streaming project snapshots when volume normalization is enabled. ### New API endpoints - **Forced alignment**: Added new [forced alignment](/docs/api-reference/forced-alignment) endpoint for aligning audio with text, perfect for subtitle generation. 
- **Batch calling**: Added batch calling [endpoint](/docs/api-reference/conversations/create-batch-call) for scheduling calls to multiple recipients ### API ## New Endpoints - Added [Forced alignment](/docs/api-reference/forced-alignment) endpoint for aligning audio with text - Added dedicated endpoints for knowledge base document types: - [Create text document](/docs/api-reference/knowledge-base/text) - [Create file document](/docs/api-reference/knowledge-base/file) - [Create URL document](/docs/api-reference/knowledge-base/url) ## Updated Endpoints ### Text to Speech - Added a-law format (8kHz) to all audio endpoints: - [Text to speech](/docs/api-reference/text-to-speech/convert) - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - [Speech to speech](/docs/api-reference/speech-to-speech) - [Stream speech to speech](/docs/api-reference/speech-to-speech/convert-as-stream) - [Create voice previews](/docs/api-reference/text-to-voice/create-previews) - [Sound generation](/docs/api-reference/sound-generation) ### Voices - [Get voices](/docs/api-reference/voices/get-all) - Added `collection_id` parameter for filtering voices by collection ### Knowledge Base - [Get knowledge base](/docs/api-reference/knowledge-base/get-knowledge-base-list) - Added `types` parameter for filtering documents by type - General endpoint for creating knowledge base documents marked as deprecated in favor of specialized endpoints ### User Subscription - [Get user subscription](/docs/api-reference/user/get-subscription) - Added `professional_voice_slots_used` property to track number of professional voices used in a workspace ### Conversational AI - Added `silence_end_call_timeout` parameter to set maximum wait time before terminating a call - Removed `/v1/convai/agents/{agent_id}/add-secret` endpoint (now handled by workspace secrets endpoints) # March 31, 2025 ### Text to speech - **Opus format support**: Added support for Opus format with 48kHz sample rate across multiple bitrates (32-192 kbps). - **Improved websocket error handling**: Updated TTS websocket API to return more accurate error codes (1011 for internal errors instead of 1008) for better error identification and SLA monitoring. ### Conversational AI - **Twilio outbound**: Added ability to natively run outbound calls. - **Post-call webhook override**: Added ability to override post-call webhook settings at the agent level, providing more flexible configurations. - **Large knowledge base document viewing**: Enhanced the knowledge base interface to allow viewing the entire content of large RAG documents. - **Added call SID dynamic variable**: Added `system__call_sid` as a system dynamic variable to allow referencing the call ID in prompts and tools. ### Studio - **Actor Mode**: Added Actor Mode in Studio, allowing you to use your own voice recordings to direct the way speech should sound in Studio projects. - **Improved keyboard shortcuts**: Updated keyboard shortcuts for viewing settings and editor shortcuts to avoid conflicts and simplified shortcuts for locking paragraphs. ### Dubbing - **Dubbing duplication**: Made dubbing duplication feature available to all users. - **Manual mode foreground generation**: Added ability to generate foreground audio when using manual mode with a file and CSV. 
### Voices - **Enhanced voice collections**: Improved voice collections with visual upgrades, language-based filtering, navigation breadcrumbs, collection images, and mouse dragging for carousel navigation. - **Locale filtering**: Added locale parameter to shared voices endpoint for more precise voice filtering. ### API ## Updated Endpoints ### Text to Speech - Updated Text to Speech endpoints: - [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Added `apply_language_text_normalization` parameter for improved text pronunciation in supported languages (currently Japanese) - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Added `apply_language_text_normalization` - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Added `apply_language_text_normalization` - [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Added `apply_language_text_normalization` ### Audio Format - Added Opus format support to multiple endpoints: - [Text to speech](/docs/api-reference/text-to-speech/convert) - Added support for Opus format with 48kHz sample rate at multiple bitrates (32, 64, 96, 128, 192 kbps) - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Added Opus format options - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Added Opus format options - [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Added Opus format options - [Speech to speech](/docs/api-reference/speech-to-speech) - Added Opus format options - [Stream speech to speech](/docs/api-reference/speech-to-speech/convert-as-stream) - Added Opus format options - [Create voice previews](/docs/api-reference/text-to-voice/create-previews) - Added Opus format options - [Sound generation](/docs/api-reference/sound-generation) - Added Opus format options ### Conversational AI - Updated Conversational AI endpoints: - [Delete agent](/docs/api-reference/agents/delete-agent) - Changed success response code from 200 to 204 - [Updated RAG embedding model options](docs/api-reference/knowledge-base/rag-index-status#request.body.model) - replaced `gte_Qwen2_15B_instruct` with `multilingual_e5_large_instruct` ### Voices - Updated Voice endpoints: - [Get shared voices](/docs/api-reference/voice-library/get-shared) - Added locale parameter for filtering voices by language region ### Dubbing - Updated Dubbing endpoint: - [Dub a video or audio file](/docs/api-reference/dubbing/dub-a-video-or-an-audio-file) - Renamed beta feature `use_replacement_voices_from_library` parameter to `disable_voice_cloning` for clarity # March 24, 2025 ### Voices - **List Voices V2**: Added a new [V2 voice search endpoint](/docs/api-reference/voices/search) with better search and additional filtering options ### Conversational AI - **Native outbound calling**: Added native outbound calling for Twilio-configured numbers, eliminating the need for complex setup configurations. Outbound calls are now visible in the Call History page. - **Automatic language detection**: Added new system tool for automatic language detection that enables agents to switch languages based on both explicit user requests ("Let's talk in Spanish") and implicit language in user audio. - **Pronunciation dictionary improvements**: Fixed phoneme tags in pronunciation dictionaries to work correctly with conversational AI. 
- **Large RAG document viewing**: Added ability to view the entire content of large RAG documents in the knowledge base. - **Customizable widget controls**: Updated UI to include an optional mute microphone button and made widget icons customizable via slots. ### Sound Effects - **Fractional duration support**: Fixed an issue where users couldn't enter fractional values (like 0.5 seconds) for sound effect generation duration. ### Speech to Text - **Repetition handling**: Improved detection and handling of repetitions in speech-to-text processing. ### Studio - **Reader publishing fixes**: Added support for mp3_44100_192 output format (high quality) so users below Publisher tier can export audio to Reader. ### Mobile - **Core app signup**: Added signup endpoints for the new Core mobile app. ### API ## New Endpoints - Added 5 new endpoints: - [List voices (v2)](/docs/api-reference/voices/search) - Enhanced voice search capabilities with additional filtering options - [Initiate outbound call](/docs/api-reference/phone-numbers/twilio-outbound-call) - New endpoint for making outbound calls via Twilio integration - [Add pronunciation dictionary from rules](/docs/api-reference/pronunciation-dictionary/add-rules) - Create pronunciation dictionaries directly from rules without file upload - [Get knowledge base document content](/docs/api-reference/knowledge-base/get-knowledge-base-document-content) - Retrieve full document content from the knowledge base - [Get knowledge base document chunk](/docs/api-reference/knowledge-base/get-knowledge-base-document-part-by-id) - Retrieve specific chunks from knowledge base documents ## Updated Endpoints ### Conversational AI - Updated Conversational AI endpoints: - [Create agent](/docs/api-reference/agents/create-agent) - Added `mic_muting_enabled` property for UI control and `workspace_overrides` property for workspace-specific configurations - [Update agent](/docs/api-reference/agents/update-agent) - Added `workspace_overrides` property for customizing agent behavior per workspace - [Get agent](/docs/api-reference/agents/get-agent) - Added `workspace_overrides` property to the response - [Get widget](/docs/api-reference/widget/get-agent-widget) - Added `mic_muting_enabled` property for controlling microphone muting in the widget UI - [Get conversation](/docs/api-reference/conversations/get-conversation) - Added rag information to view knowledge base content used during conversations - [Create phone number](/docs/api-reference/phone-numbers/create-phone-number) - Replaced generic structure with specific twilio phone number and sip trunk options - [Compute RAG index](/docs/api-reference/knowledge-base/rag-index-status) - Removed `force_reindex` query parameter for more controlled indexing - [List knowledge base documents](/docs/api-reference/knowledge-base/get-knowledge-base-list) - Changed response structure to support different document types - [Get knowledge base document](/docs/api-reference/knowledge-base/get-knowledge-base-document-by-id) - Modified to return different response models based on document type ### Text to Speech - Updated Text to Speech endpoints: - [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Made properties optional, including `stability` and `similarity` settings - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Made voice settings properties optional for more flexible streaming requests - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Made 
settings optional and modified `pronunciation_dictionary_locators` property - [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Made voice settings properties optional for more flexible requests ### Speech to Text - Updated Speech to Text endpoint: - [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Removed `biased_keywords` property from form data and improved internal repetition detection algorithm ### Voice Management - Updated Voice endpoints: - [Get voices](/docs/api-reference/voices/get-all) - Updated voice settings properties in the response - [Get default voice settings](/docs/api-reference/voices/get-default-settings) - Made `stability` and `similarity` properties optional - [Get voice settings](/docs/api-reference/voices/get-settings) - Made numeric properties optional for more flexible configuration - [Edit voice settings](/docs/api-reference/voices/edit-settings) - Made `stability` and `similarity` settings optional - [Create voice](/docs/api-reference/voices/add) - Modified array properties to accept null values - [Create voice from preview](/docs/api-reference/text-to-voice/create-voice-from-preview) - Updated voice settings model with optional properties ### Studio - Updated Studio endpoints: - [Get project](/docs/api-reference/studio/get-project) - Added `version_rules_num` to project metadata - [Get project snapshot](/docs/api-reference/studio/get-project-snapshot) - Removed `status` property - [Create pronunciation dictionaries](/docs/api-reference/studio/create-pronunciation-dictionaries) - Modified `pronunciation_dictionary_locators` property and string properties to accept null values ### Pronunciation Dictionary - Updated Pronunciation Dictionary endpoints: - [Get all pronunciation dictionaries](/docs/api-reference/pronunciation-dictionary/get-all) - Added `sort` and `sort_direction` query parameters, plus `latest_version_rules_num` and `integer` properties to response - [Get pronunciation dictionary](/docs/api-reference/pronunciation-dictionary/get) - Added `latest_version_rules_num` and `integer` properties to response - [Add from file](/docs/api-reference/pronunciation-dictionary/add-from-file) - Added `version_rules_num` property to response for tracking rules quantity - [Add rules](/docs/api-reference/pronunciation-dictionary/add-rules) - Added `version_rules_num` to response for rules tracking - [Remove rules](/docs/api-reference/pronunciation-dictionary/remove-rules) - Added `version_rules_num` to response for rules tracking # March 17, 2025 ### Conversational AI - **Default LLM update**: Changed the default agent LLM from Gemini 1.5 Flash to Gemini 2.0 Flash for improved performance. - **Fixed incorrect conversation abandons**: Improved detection of conversation continuations, preventing premature abandons when users repeat themselves. - **Twilio information in history**: Added Twilio call details to conversation history for better tracking. - **Knowledge base redesign**: Redesigned the knowledge base interface. - **System dynamic variables**: Added system dynamic variables to use time, conversation id, caller id and other system values as dynamic variables in prompts and tools. - **Twilio client initialisation**: Adds an agent level override for conversation initiation client data twilio webhook. - **RAG chunks in history**: Added retrieved chunks by RAG to the call transcripts in the [history view](https://elevenlabs.io/app/conversational-ai/history). 
### Speech to Text - **Reduced pricing**: Reduced the pricing of our Scribe model, see more [here](/docs/capabilities/speech-to-text#pricing). - **Improved VAD detection**: Enhanced Voice Activity Detection with better pause detection at segment boundaries and improved handling of silent segments. - **Enhanced diarization**: Improved speaker clustering with a better ECAPA model, symmetric connectivity matrix, and more selective speaker embedding generation. - **Fixed ASR bugs**: Resolved issues with VAD rounding, silence and clustering that affected transcription accuracy. ### Studio - **Disable publishing UI**: Added ability to disable the publishing interface for specific workspace members to support enterprise workflows. - **Snapshot API improvement**: Modified endpoints for project and chapter snapshots to return an empty list instead of throwing errors when snapshots can't be downloaded. - **Disabled auto-moderation**: Turned off automatic moderation based on Text to Speech generations in Studio. ### Workspaces - **Fixed API key editing**: Resolved an issue where editing workspace API keys would reset character limits to zero, causing the keys to stop working. - **Optimized free subscriptions**: Fixed an issue with refreshing free subscription character limits, ### API ## New Endpoints - Added 3 new endpoints: - [Get workspace resource](/docs/api-reference/workspace/get-resource) - [Share workspace resource](/docs/api-reference/workspace/share-workspace-resource) - [Unshare workspace resource](/docs/api-reference/workspace/unshare-workspace-resource) ## Updated Endpoints ### Dubbing - Updated Dubbing endpoints: - [Dub a video or audio file](/docs/api-reference/dubbing/dub-a-video-or-an-audio-file) - Added `use_replacement_voices_from_library` property and made `source_path`, `target_language`, `source_language` nullable - [Resource dubbing](/docs/api-reference/dubbing/dub-segments) - Made `language_codes` array nullable - [Add language to dubbing resource](/docs/api-reference/dubbing/add-language-to-resource) - Made `language_code` nullable - [Add speaker segment](/docs/api-reference/dubbing/create-segment-for-speaker) - Made `text` nullable - [Translate dubbing resource](/docs/api-reference/dubbing/translate-segments) - Made `target_languages` array nullable - [Update dubbing segment](/docs/api-reference/dubbing/update-segment-language) - Made `start_time` and `end_time` nullable ### Project Management - Updated Project endpoints: - [Add project](/docs/api-reference/studio/add-project) - Made `metadata`, `project_name`, `description` nullable - [Create podcast](/docs/api-reference/studio/create-podcast) - Made `title`, `description`, `author` nullable - [Get project](/docs/api-reference/studio/get-project) - Made `last_modified_at`, `created_at`, `project_name` nullable - [Add chapter](/docs/api-reference/studio/add-chapter) - Made `chapter_id`, `word_count`, `statistics` nullable - [Update chapter](/docs/api-reference/studio/update-chapter) - Made `content` and `blocks` properties nullable ### Conversational AI - Updated Conversational AI endpoints: - [Update agent](/docs/api-reference/agents/update-agent) - Made `conversation_config`, `platform_settings` nullable and added `workspace_overrides` property - [Create agent](/docs/api-reference/agents/create-agent) - Made `agent_name`, `prompt`, `widget_config` nullable and added `workspace_overrides` property - [Add to knowledge base](/docs/api-reference/knowledge-base/add-to-knowledge-base) - Made `document_name` nullable - [Get 
conversation](/docs/api-reference/conversations/get-conversation) - Added `twilio_call_data` model and made `transcript`, `metadata` nullable ### Text to Speech - Updated Text to Speech endpoints: - [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Made `voice_settings`, `text_input` nullable and deprecated `use_pvc_as_ivc` property - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Made `voice_settings`, `text_input` nullable and deprecated `use_pvc_as_ivc` property - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Made `character_alignment` and `word_alignment` nullable ### Voice Management - Updated Voice endpoints: - [Create voice previews](/docs/api-reference/text-to-voice/create-previews) - Added `loudness`, `quality`, `guidance_scale` properties - [Create voice from preview](/docs/api-reference/text-to-voice/create-voice-from-preview) - Added `speaker_separation` properties and made `voice_id`, `name`, `labels` nullable - [Get voice](/docs/api-reference/voices/get) - Added `speaker_boost`, `speaker_clarity`, `speaker_isolation` properties ### Speech to Text - Updated Speech to Text endpoint: - [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `biased_keywords` property ### Other Updates - [Download history](/docs/api-reference/history/download) - Added application/zip content type and 400 response - [Add pronunciation dictionary from file](/docs/api-reference/pronunciation-dictionary/add-from-file) - Made `dictionary_name` and `description` nullable # March 10, 2025 ### Conversational AI - **HIPAA compliance**: Conversational AI is now [HIPAA compliant](/docs/conversational-ai/customization/hipaa-compliance) on appropriate plans, when a BAA is signed, zero-retention mode is enabled and appropriate LLMs are used. For access please [contact sales](/contact-sales) - **Cascade LLM**: Added dynamic dispatch during the LLM step to other LLMs if your default LLM fails. This results in higher latency but prevents the turn failing. - **Better error messages**: Added better error messages for websocket failures. - **Audio toggling**: Added ability to select only user or agent audio in the conversation playback. ### Scribe - **HIPAA compliance**: Added a zero retention mode to Scribe to be HIPAA compliant. - **Diarization**: Increased time length of audio files that can be transcribed with diarization from 8 minutes to 2 hours. - **Cheaper pricing**: Updated Scribe's pricing to be cheaper, as low as $0.22 per hour for the Business tier. - **Memory usage**: Shipped improvements to Scribe's memory usage. - **Fixed timestamps**: Fixed an issue that was causing incorrect timestamps to be returned. - **Biased keywords**: Added biased keywords to improve Scribe's performance. ### Text to Speech - **Pronunciation dictionaries**: Fixed pronunciation dictionary rule application for replacements that contain symbols. ### Dubbing - **Studio support**: Added support for creating dubs with `dubbing_studio` enabled, allowing for more advanced dubbing workflows beyond one-off dubs. ### Voices - **Verification**: Fixed an issue where users on probation could not verify their voice clone. 
### API ## New Endpoints - Added 7 new endpoints: - [Add a shared voice to your collection](/docs/api-reference/voice-library/add-sharing-voice) - [Archive a project snapshot](/docs/api-reference/studio/archive-snapshot) - [Update a project](/docs/api-reference/studio/edit-project) - [Create an Audio Native enabled project](/docs/api-reference/audio-native/create) - [Get all voices](/docs/api-reference/voices/get-all) - [Download a pronunciation dictionary](/docs/api-reference/pronunciation-dictionary/download) - [Get Audio Native project settings](/docs/api-reference/audio-native/get-settings) ## Updated Endpoints ### Studio Projects - Updated Studio project endpoints to add `source_type` property and deprecate `quality_check_on` and `quality_check_on_when_bulk_convert` properties: - [Get projects](/docs/api-reference/studio/get-projects) - [Get project](/docs/api-reference/studio/get-project) - [Add project](/docs/api-reference/studio/add-project) - [Update content](/docs/api-reference/studio/update-content) - [Create podcast](/docs/api-reference/studio/create-podcast) ### Voice Management - Updated Voice endpoints with several property changes: - [Get voice](/docs/api-reference/voices/get) - Made several properties optional and added `preview_url` - [Create voice](/docs/api-reference/voices/add) - Made several properties optional and added `preview_url` - [Create voice from preview](/docs/api-reference/text-to-voice/create-voice-from-preview) - Made several properties optional and added `preview_url` - [Get similar voices](/docs/api-reference/voices/get-similar-library-voices) - Made `language`, `description`, `preview_url`, and `rate` properties optional ### Conversational AI - Updated Conversational AI agent endpoints: - [Update agent](/docs/api-reference/agents/update-agent) - Modified `conversation_config`, `agent`, `platform_settings`, and `widget` properties - [Create agent](/docs/api-reference/agents/create-agent) - Modified `conversation_config`, `agent`, `prompt`, platform_settings, widget properties and added `shareable_page_show_terms` - [Get agent](/docs/api-reference/agents/get-agent) - Modified `conversation_config`, `agent`, `platform_settings`, and `widget` properties - [Get widget](/docs/api-reference/widget/get-agent-widget) - Modified `widget_config` property and added `shareable_page_show_terms` ### Knowledge Base - Updated Knowledge Base endpoints to add metadata property: - [List knowledge base documents](/docs/api-reference/knowledge-base/get-knowledge-base-document-by-id#response.body.metadata) - [Get knowledge base document](/docs/api-reference/knowledge-base/get-knowledge-base-document-by-id#response.body.metadata) ### Other Updates - [Dub a video or audio file](/docs/api-reference/dubbing/dub-a-video-or-an-audio-file) - Added `dubbing_studio` property - [Convert text to sound effects](/docs/api-reference/text-to-sound-effects/convert) - Added `output_format` query parameter - [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `enable_logging` query parameter - [Get secrets](/docs/api-reference/workspace/get-secrets) - Modified `secrets` and `used_by` properties - [Get all pronunciation dictionaries](/docs/api-reference/pronunciation-dictionary/get-all) - Made `next_cursor` property optional ## Removed Endpoints - Temporarily removed Conversational AI tools endpoints: - Get tool - List tools - Update tool - Create tool - Delete tool # March 3, 2025 ### Dubbing - **Scribe for speech recognition**: Dubbing Studio now uses Scribe by 
default for speech recognition to improve accuracy. ### Speech to Text - **Fixes**: Shipped several fixes improving the stability of Speech to Text. ### Conversational AI - **Speed control**: Added speed control to an agent's settings in Conversational AI. - **Post call webhook**: Added the option of sending [post-call webhooks](/docs/conversational-ai/customization/personalization/post-call-webhooks) after conversations are completed. - **Improved error messages**: Added better error messages to the Conversational AI websocket. - **Claude 3.7 Sonnet**: Added Claude 3.7 Sonnet as a new LLM option in Conversational AI. ### API #### New Endpoints - Added new Dubbing resource management endpoints: - for adding [languages to dubs](/docs/api-reference/dubbing/add-language-to-resource) - for retrieving [dubbing resources](/docs/api-reference/dubbing/get-dubbing-resource) - for creating [segments](/docs/api-reference/dubbing/create-segment-for-speaker) - for modifying [segments](/docs/api-reference/dubbing/update-segment-language) - for removing [segments](/docs/api-reference/dubbing/delete-segment) - for dubbing [segments](/docs/api-reference/dubbing/dub-segments) - for transcribing [segments](/docs/api-reference/dubbing/transcribe-segments) - for translating [segments](/docs/api-reference/dubbing/translate-segments) - Added Knowledge Base RAG indexing [endpoint](/docs/api-reference/knowledge-base/rag-index-status) - Added Studio snapshot retrieval endpoints for [projects](docs/api-reference/studio/get-project-snapshot-by-id) and [chapters](docs/api-reference/studio/get-chapter-snapshot-by-id) #### Updated Endpoints - Added `prompt_injectable` property to knowledge base [endpoints](docs/api-reference/knowledge-base/get-knowledge-base-document-by-id#response.body.prompt_injectable) - Added `name` property to Knowledge Base document [creation](/docs/api-reference/knowledge-base/add-to-knowledge-base#request.body.name) and [retrieval](/docs/api-reference/knowledge-base/get-knowledge-base-document-by-id#response.body.name) endpoints: - Added `speed` property to [agent creation](/docs/api-reference/agents/create-agent#request.body.conversation_config.tts.speed) - Removed `secrets` property from agent endpoints (now handled by dedicated secrets endpoints) - Added [secret deletion endpoint](/docs/api-reference/workspace/delete-secret) for removing secrets - Removed `secrets` property from settings [endpoints](/docs/api-reference/workspace/get-settings) # February 25, 2025 ### Speech to Text - **ElevenLabs launched a new state of the art [Speech to Text API](/docs/capabilities/speech-to-text) available in 99 languages.** ### Text to Speech - **Speed control**: Added speed control to the Text to Speech API. ### Studio - **Auto-assigned projects**: Increased token limits for auto-assigned projects from 1 month to 3 months worth of tokens, addressing user feedback about working on longer projects. - **Language detection**: Added automatic language detection when generating audio for the first time, with suggestions to switch to Eleven Turbo v2.5 for languages not supported by Multilingual v2 (Hungarian, Norwegian, Vietnamese). - **Project export**: Enhanced project exporting in ElevenReader with better metadata tracking. ### Dubbing - **Clip overlap prevention**: Added automatic trimming of overlapping clips in dubbing jobs to ensure clean audio tracks for each speaker and language. 
### Voice Management - **Instant Voice Cloning**: Improved preview generation for Instant Voice Cloning v2, making previews available immediately. ### Conversational AI - **Agent ownership**: Added display of agent creators in the agent list, improving visibility and management of shared agents. ### Web app - **Dark mode**: Added dark mode to the web app. ### API - Launched **/v1/speech-to-text** [endpoint](/docs/api-reference/speech-to-text/convert) - Added `agents.level` property to [Conversational AI agents endpoint](/docs/api-reference/agents/get-agents#response.body.agents.access_level) - Added `platform_settings` to [Conversational AI agent endpoint](/docs/api-reference/agents/update-agent#request.body.platform_settings) - Added `expandable` variant to `widget_config`, with configuration options `show_avatar_when_collapsed` and `disable_banner` to [Conversational AI agent widget endpoint](/docs/api-reference/agents/get-agent#response.body.widget) - Added `webhooks` property and `used_by` to `secrets` to [secrets endpoint](/docs/api-reference/workspace/get-secrets#response.body.secrets.used_by) - Added `verified_languages` to [voices endpoint](/docs/api-reference/voices/get#response.body.verified_languages) - Added `speed` property to [voice settings endpoints](/docs/api-reference/voices/get#response.body.settings.speed) - Added `verified_languages`, `is_added_by_user` to `voices` and `min_notice_period_days` query parameter to [shared voices endpoint](/docs/api-reference/voice-library/get-shared#request.query) - Added `verified_languages`, `is_added_by_user` to `voices` in [similar voices endpoint](/docs/api-reference/voices/get-similar-library-voices) - Added `search`, `show_only_owned_documents`, `use_typesense` query parameters to [knowledge base endpoint](/docs/api-reference/knowledge-base/get-knowledge-base-list#request.query.search) - Added `used_by` to Conversation AI [secrets endpoint](/docs/api-reference/workspace/get-secrets) - Added `invalidate_affected_text` property to Studio [pronunciation dictionaries endpoint](/docs/api-reference/studio/create-pronunciation-dictionaries#request.body.invalidate_affected_text) # February 17, 2025 ### Conversational AI - **Tool calling fix**: Fixed an issue where tool calling was not working with agents using gpt-4o mini. This was due to a breaking change in the OpenAI API. - **Tool calling improvements**: Added support for tool calling with dynamic variables inside objects and arrays. - **Dynamic variables**: Fixed an issue where dynamic variables of a conversation were not being displayed correctly. ### Voice Isolator - **Fixed**: Fixed an issue that caused the voice isolator to not work correctly temporarily. ### Workspace - **Billing**: Improved billing visibility by differentiating rollover, cycle, gifted, and usage-based credits. - **Usage Analytics**: Improved usage analytics load times and readability. - **Fine grained fiat billing**: Added support for customizable pricing based on several factors. 
### API - Added `phone_numbers` property to [Agent responses](/docs/api-reference/agents/get-agent) - Added usage metrics to subscription_extras in [User endpoint](/docs/api-reference/user/get): - `unused_characters_rolled_over_from_previous_period` - `overused_characters_rolled_over_from_previous_period` - `usage` statistics - Added `enable_conversation_initiation_client_data_from_webhook` to [Agent creation](/docs/api-reference/agents/create-agent) - Updated [Agent](/docs/api-reference/agents) endpoints with consolidated settings for: - `platform_settings` - `overrides` - `safety` - Deprecated `with_settings` parameter in [Voice retrieval endpoint](/docs/api-reference/voices/get) # February 10, 2025 ## Conversational AI - **Updated Pricing**: Updated self-serve pricing for Conversational AI with [reduced cost and a more generous free tier](/docs/conversational-ai/overview#pricing-tiers). - **Knowledge Base UI**: Created a new page to easily manage your [knowledge base](/app/conversational-ai/knowledge-base). - **Live calls**: Added number of live calls in progress in the user [dashboard](/app/conversational-ai) and as a new endpoint. - **Retention**: Added ability to customize transcripts and audio recordings [retention settings](/docs/conversational-ai/customization/privacy/retention). - **Audio recording**: Added a new option to [disable audio recordings](/docs/conversational-ai/customization/privacy/audio-saving). - **8k PCM support**: Added support for 8k PCM audio for both input and output. ## Studio - **GenFM**: Updated the create podcast endpoint to accept [multiple input sources](/docs/api-reference/projects/create-podcast). - **GenFM**: Fixed an issue where GenFM was creating empty podcasts. ## Enterprise - **New workspace group endpoints**: Added new endpoints to manage [workspace groups](/docs/api-reference/workspace/search-user-groups). ### API **Studio (formerly Projects)** All `/v1/projects/*` endpoints have been deprecated in favor of the new `/v1/studio/projects/*` endpoints. 
The following endpoints are now deprecated: - All operations on `/v1/projects/` - All operations related to chapters, snapshots, and content under `/v1/projects/*` **Conversational AI** - `POST /v1/convai/add-tool` - Use `POST /v1/convai/tools` instead - `DELETE /v1/convai/agents/{agent_id}` - Response type is no longer an object - `GET /v1/convai/tools` - Response type changed from array to object with a `tools` property **Conversational AI Updates** - `GET /v1/convai/agents/{agent_id}` - Updated conversation configuration and agent properties - `PATCH /v1/convai/agents/{agent_id}` - Added `use_tool_ids` parameter for tool management - `POST /v1/convai/agents/create` - Added tool integration via `use_tool_ids` **Knowledge Base & Tools** - `GET /v1/convai/agents/{agent_id}/knowledge-base/{documentation_id}` - Added `name` and `access_level` properties - `GET /v1/convai/knowledge-base/{documentation_id}` - Added `name` and `access_level` properties - `GET /v1/convai/tools/{tool_id}` - Added `dependent_agents` property - `PATCH /v1/convai/tools/{tool_id}` - Added `dependent_agents` property **GenFM** - `POST /v1/projects/podcast/create` - Added support for multiple input sources **Studio (formerly Projects)** New endpoints replacing the deprecated `/v1/projects/*` endpoints - `GET /v1/studio/projects`: List all projects - `POST /v1/studio/projects`: Create a project - `GET /v1/studio/projects/{project_id}`: Get project details - `DELETE /v1/studio/projects/{project_id}`: Delete a project **Knowledge Base Management** - `GET /v1/convai/knowledge-base`: List all knowledge base documents - `DELETE /v1/convai/knowledge-base/{documentation_id}`: Delete a knowledge base - `GET /v1/convai/knowledge-base/{documentation_id}/dependent-agents`: List agents using this knowledge base **Workspace Groups** - New enterprise features for team management - `GET /v1/workspace/groups/search`: Search workspace groups - `POST /v1/workspace/groups/{group_id}/members`: Add members to a group - `POST /v1/workspace/groups/{group_id}/members/remove`: Remove members from a group **Tools** - `POST /v1/convai/tools`: Create new tools for agents ## Socials - **ElevenLabs Developers**: Follow our new developers account on X [@ElevenLabsDevs](https://x.com/intent/user?screen_name=elevenlabsdevs) # February 4, 2025 ### Conversational AI - **Agent monitoring**: Added a new dashboard for monitoring conversational AI agents' activity. Check out your's [here](/app/conversational-ai). - **Proactive conversations**: Enhanced capabilities with improved timeout retry logic. [Learn more](/docs/conversational-ai/customization/conversation-flow) - **Tool calls**: Fixed timeout issues occurring during tool calls - **Allowlist**: Fixed implementation of allowlist functionality. - **Content summarization**: Added Gemini as a fallback model to ensure service reliability - **Widget stability**: Fixed issue with dynamic variables causing the Conversational AI widget to fail ### Reader - **Trending content**: Added carousel showcasing popular articles and trending content - **New publications**: Introduced dedicated section for recent ElevenReader Publishing releases ### Studio (formerly Projects) - **Projects is now Studio** and is now generally available to everyone - **Chapter content editing**: Added support for editing chapter content through the public API, enabling programmatic updates to chapter text and metadata - **GenFM public API**: Added public API support for podcast creation through GenFM. 
Key features include: - Conversation mode with configurable host and guest voices - URL-based content sourcing - Customizable duration and highlights - Webhook callbacks for status updates - Project snapshot IDs for audio downloads ### SDKs - **Swift**: fixed an issue where resources were not being released after the end of a session - **Python**: added uv support - **Python**: fixed an issue where calls were not ending correctly ### API - Added POST `v1/workspace/invites/add-bulk` [endpoint](/docs/api-reference/workspace/invite-multiple-users) to enable inviting multiple users simultaneously - Added POST `v1/projects/podcast/create` [endpoint](/docs/api-reference/projects/create-podcast) for programmatic podcast generation through GenFM - Added 'v1/convai/knowledge-base/:documentation_id' [endpoints](/docs/api-reference/knowledge-base/) with CRUD operations for Conversational AI - Added PATCH `v1/projects/:project_id/chapters/:chapter_id` [endpoint](/docs/api-reference/studio/update-chapter) for updating project chapter content and metadata - Added `group_ids` parameter to [Workspace Invite endpoint](/docs/api-reference/workspace/invite-user) for group-based access control - Added structured `content` property to [Chapter response objects](/docs/api-reference/chapters/get-chapter) - Added `retention_days` and `delete_transcript_and_pii` data retention parameters to [Agent creation](/docs/api-reference/agents/create-agent) - Added structured response to [AudioNative content](/docs/api-reference/audio-native/create#response.body.project_id) - Added `convai_chars_per_minute` usage metric to [User endpoint](/docs/api-reference/user/get) - Added `media_metadata` field to [Dubbing response objects](/docs/api-reference/dubbing/get-dubbing-project-metadata) - Added GDPR-compliant `deletion_settings` to [Conversation responses](/docs/api-reference/conversations/get-conversation#response.body.metadata.deletion_settings) - Deprecated Knowledge Base legacy endpoints: - POST `/v1/convai/agents/{agent_id}/add-to-knowledge-base` - GET `/v1/convai/agents/{agent_id}/knowledge-base/{documentation_id}` - Updated Agent endpoints with consolidated [privacy control parameters](/docs/api-reference/agents/create-agent) # January 27, 2025 ### Docs - **Shipped our new docs**: we're keen to hear your thoughts, you can reach out by opening an issue on [GitHub](https://github.com/elevenlabs/elevenlabs-docs) or chatting with us on [Discord](https://discord.gg/elevenlabs) ### Conversational AI - **Dynamic variables**: Available in the dashboard and SDKs. [Learn more](/docs/conversational-ai/customization/personalization/dynamic-variables) - **Interruption handling**: Now possible to ignore user interruptions in Conversational AI. [Learn more](/docs/conversational-ai/customization/conversation-flow#interruptions) - **Twilio integration**: Shipped changes to increase audio quality when integrating with Twilio - **Latency optimization**: Published detailed blog post on latency optimizations. 
[Read more](/blog/how-do-you-optimize-latency-for-conversational-ai) - **PCM 8000**: Added support for PCM 8000 to Conversational AI agents - **Websocket improvements**: Fixed unexpected websocket closures ### Projects - **Auto-regenerate**: Auto-regeneration now available by default at no extra cost - **Content management**: Added `updateContent` method for dynamic content updates - **Audio conversion**: New auto-convert and auto-publish flags for seamless workflows ### API - Added `Update Project` endpoint for [project editing](/docs/api-reference/studio/edit-project#:~:text=List%20projects-,POST,Update%20project,-GET) - Added `Update Content` endpoint for [AudioNative content management](/docs/api-reference/audio-native/update-content) - Deprecated `quality_check_on` parameter in [project operations](/docs/api-reference/projects/add-project#request.body.quality_check_on). It is now enabled for all users at no extra cost - Added `apply_text_normalization` parameter to project creation with modes 'auto', 'on', 'apply_english' and 'off' for controlling text normalization during [project creation](/docs/api-reference/projects/add-project#request.body.apply_text_normalization) - Added alpha feature `auto_assign_voices` in [project creation](/docs/api-reference/projects/add-project#request.body.auto_assign_voices) to automatically assign voices to phrases - Added `auto_convert` flag to project creation to automatically convert [projects to audio](/docs/api-reference/audio-native/create#request.body.auto_convert) - Added support for creating Conversational AI agents with [dynamic variables](/docs/api-reference/agents/create-agent#request.body.conversation_config.agent.dynamic_variables) - Added `voice_slots_used` to `Subscription` model to track number of custom voices used in a workspace to the `User` [endpoint](/docs/api-reference/user/get-subscription#response.body.voice_slots_used) - Added `user_id` field to `User` [endpoint](/docs/api-reference/user/get#response.body.user_id) - Marked legacy AudioNative creation parameters (`image`, `small`, `sessionization`) as deprecated [parameters](/docs/api-reference/audio-native/create#request.body.image) - Agents platform now supports `call_limits` containing either `agent_concurrency_limit` or `daily_limit` or both parameters to control simultaneous and daily conversation limits for [agents](docs/api-reference/agents/create-agent#request.body.platform_settings.call_limits) - Added support for `language_presets` in `conversation_config` to customize language-specific [settings](/docs/api-reference/agents/create-agent#request.body.conversation_config.language_presets) ### SDKs - **Cross-Runtime Support**: Now compatible with **Bun 1.1.45+** and **Deno 2.1.7+** - **Regenerated SDKs**: We regenerated our SDKs to be up to date with the latest API spec. Check out the latest [Python SDK release](https://github.com/elevenlabs/elevenlabs-python/releases/tag/1.50.5) and [JS SDK release](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v1.50.4) - **Dynamic Variables**: Fixed an issue where dynamic variables were not being handled correctly, they are now correctly handled in all SDKs # January 16, 2025 ## Product ### Conversational AI - **Additional languages**: Add a language dropdown to your widget so customers can launch conversations in their preferred language. Learn more [here](/docs/conversational-ai/customization/language). - **End call tool**: Let the agent automatically end the call with our new “End Call” tool. 
Learn more [here](/docs/conversational-ai/customization/tools) - **Flash default**: Flash, our lowest latency model, is now the default for new agents. In your agent dashboard under “voice”, you can toggle between Turbo and Flash. Learn more about Flash [here](https://elevenlabs.io/blog/meet-flash). - **Privacy**: Set concurrent call and daily call limits, turn off audio recordings, add feedback collection, and define customer terms & conditions. - **Increased tool limits**: Increase the number of tools available to your agent from 5 to 15. Learn more [here](/docs/conversational-ai/customization/tools). # January 2, 2025 ## Product - **Workspace Groups and Permissions**: Introduced new workspace group management features to enhance access control within organizations. [Learn more](https://elevenlabs.io/blog/workspace-groups-and-permissions). # December 19, 2024 ## Model - **Introducing Flash**: Our fastest text-to-speech model yet, generating speech in just 75ms. Access it via the API with model IDs `eleven_flash_v2` and `eleven_flash_v2_5`. Perfect for low-latency conversational AI applications. [Try it now](https://elevenlabs.io/docs/api-reference/text-to-speech). ## Launches - **[TalkToSanta.io](https://www.talktosanta.io)**: Experience Conversational AI in action by talking to Santa this holiday season. For every conversation with santa we donate 2 dollars to [Bridging Voice](https://www.bridgingvoice.org) (up to $11,000). - **[AI Engineer Pack](https://aiengineerpack.com)**: Get $50+ in credits from leading AI developer tools, including ElevenLabs. # December 6, 2024 ## Product - **GenFM Now on Web**: Access GenFM directly from the website in addition to the ElevenReader App, [try it now](https://elevenlabs.io/app/projects). # December 3, 2024 ## API - **Credit Usage Limits**: Set specific credit limits for API keys to control costs and manage usage across different use cases by setting "Access" or "No Access" to features like Dubbing, Audio Native, and more. [Check it out](https://elevenlabs.io/app/settings/api-keys) - **Workspace API Keys**: Now support access permissions, such as "Read" or "Read and Write" for User, Workspace, and History resources. - **Improved Key Management**: - Redesigned interface moving from modals to dedicated pages - Added detailed descriptions and key information - Enhanced visibility of key details and settings # November 29, 2024 ## Product - **GenFM**: Launched in the ElevenReader app. [Learn more](https://elevenlabs.io/blog/genfm-on-elevenreader) - **Conversational AI**: Now generally available to all customers. [Try it now](https://elevenlabs.io/conversational-ai) - **TTS Redesign**: The website TTS redesign is now rolled out to all customers. - **Auto-regenerate**: Now available in Projects. [Learn more](https://elevenlabs.io/blog/auto-regenerate-is-live-in-projects) - **Reader Platform Improvements**: - Improved content sharing with enhanced landing pages and social media previews. - Added podcast rating system and improved voice synchronization. - **Projects revamp**: - Restore past generations, lock content, assign speakers to sentence fragments, and QC at 2x speed. [Learn more](https://elevenlabs.io/blog/narrate-any-project) - Auto-regeneration identifies mispronunciations and regenerates audio at no extra cost. [Learn more](https://elevenlabs.io/blog/auto-regenerate-is-live-in-projects) ## API - **Conversational AI**: [SDKs and APIs](https://elevenlabs.io/docs/conversational-ai/docs/introduction) now available. 
# October 27, 2024 ## API - **u-law Audio Formats**: Added u-law audio formats to the Convai API for integrations with Twilio. - **TTS Websocket Improvements**: Improved TTS websocket handling; flushes and generation now work more intuitively. - **TTS Websocket Auto Mode**: A streamlined mode for using websockets. This setting reduces latency by disabling chunk scheduling and buffers. Note: Using partial sentences will result in significantly reduced quality. - **Improvements to latency consistency**: Improved latency consistency for all models. ## Website - **TTS Redesign**: The website TTS redesign is now in alpha! # October 20, 2024 ## API - **Normalize Text with the API**: Added the option to normalize the input text in the TTS API. The new parameter is called `apply_text_normalization` and works on all non-turbo & non-flash models. ## Product - **Voice Design**: The Voice Design feature is now in beta! # October 13, 2024 ## Model - **Stability Improvements**: Significant audio stability improvements across all models, most noticeable on `turbo_v2` and `turbo_v2.5`, when using: - Websockets - Projects - Reader app - TTS with request stitching - ConvAI - **Latency Improvements**: Reduced time to first byte latency by approximately 20-30ms for all models. ## API - **Remove Background Noise Voice Samples**: Added the ability to remove background noise from voice samples using our audio isolation model to improve quality for IVCs and PVCs at no additional cost. - **Remove Background Noise STS Input**: Added the ability to remove background noise from STS audio input using our audio isolation model to improve quality at no additional cost. ## Feature - **Conversational AI Beta**: Conversational AI is now in beta. # Text to Speech > Learn how to turn text into lifelike spoken audio with ElevenLabs. ## Overview The ElevenLabs [Text to Speech (TTS)](/docs/api-reference/text-to-speech) API turns text into lifelike audio with nuanced intonation, pacing and emotional awareness. [Our models](/docs/models) adapt to textual cues across 32 languages and multiple voice styles and can be used to: * Narrate global media campaigns & ads * Produce audiobooks in multiple languages with complex emotional delivery * Stream real-time audio from text Listen to a sample: Explore our [voice library](https://elevenlabs.io/community) to find the perfect voice for your project. Learn how to integrate text to speech into your application. Step-by-step guide for using text to speech in ElevenLabs. ### Voice quality For real-time applications, Flash v2.5 provides ultra-low 75ms latency, while Multilingual v2 delivers the highest quality audio with more nuanced expression. Our most lifelike, emotionally rich speech synthesis model
Most natural-sounding output
29 languages supported
10,000 character limit
Rich emotional expression
Our fast, affordable speech synthesis model
Ultra-low latency (~75ms†)
32 languages supported
40,000 character limit
Faster model, 50% lower price per character
[Explore all](/docs/models)
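As a minimal sketch of that quality-versus-latency trade-off, the snippet below chooses between the two models at request time. It reuses the Python SDK's `text_to_speech.convert` call shown later in this documentation; the voice ID is the example voice used elsewhere in these docs, and the helper name `narrate` is just for illustration.

```python
# Sketch: picking Multilingual v2 for quality or Flash v2.5 for low latency.
import os

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))


def narrate(text: str, low_latency: bool = False):
    # Flash v2.5 trades some expressiveness for much lower model latency;
    # Multilingual v2 is the most natural-sounding option.
    model_id = "eleven_flash_v2_5" if low_latency else "eleven_multilingual_v2"
    return client.text_to_speech.convert(
        voice_id="JBFqnCBsd6RMkjVDRZzb",  # example voice, swap for your own
        text=text,
        model_id=model_id,
        output_format="mp3_44100_128",
    )
```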
### Voice options ElevenLabs offers thousands of voices across 32 languages through multiple creation methods: * [Voice library](/docs/capabilities/voices) with 3,000+ community-shared voices * [Professional voice cloning](/docs/capabilities/voices#cloned) for highest-fidelity replicas * [Instant voice cloning](/docs/capabilities/voices#cloned) for quick voice replication * [Voice design](/docs/capabilities/voices#voice-design) to generate custom voices from text descriptions Learn more about our [voice options](/docs/capabilities/voices). ### Supported formats The default response format is "mp3", but other formats like "PCM", & "μ-law" are available. * **MP3** * Sample rates: 22.05kHz - 44.1kHz * Bitrates: 32kbps - 192kbps * **PCM (S16LE)** * Sample rates: 16kHz - 44.1kHz * **μ-law** * 8kHz sample rate * Optimized for telephony applications Higher quality audio options are only available on paid tiers - see our [pricing page](https://elevenlabs.io/pricing/api) for details. ### Supported languages Our v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.* Flash v2.5 supports 32 languages - all languages from v2 models plus: *Hungarian, Norwegian & Vietnamese* Simply input text in any of our supported languages and select a matching voice from our [voice library](https://elevenlabs.io/community). For the most natural results, choose a voice with an accent that matches your target language and region. ### Prompting The models interpret emotional context directly from the text input. For example, adding descriptive text like "she said excitedly" or using exclamation marks will influence the speech emotion. Voice settings like Stability and Similarity help control the consistency, while the underlying emotion comes from textual cues. Read the [prompting guide](/docs/best-practices/prompting) for more details. Descriptive text will be spoken out by the model and must be manually trimmed or removed from the audio if desired. ## FAQ Yes, you can create [instant voice clones](/docs/capabilities/voices#cloned) of your own voice from short audio clips. For high-fidelity clones, check out our [professional voice cloning](/docs/capabilities/voices#cloned) feature. Yes. You retain ownership of any audio you generate. However, commercial usage rights are only available with paid plans. With a paid subscription, you may use generated audio for commercial purposes and monetize the outputs if you own the IP rights to the input content. A free regeneration allows you to regenerate the same text to speech content without additional cost, subject to these conditions: * You can regenerate each piece of content up to 2 times for free * The content must be exactly the same as the previous generation. Any changes to the text, voice settings, or other parameters will require a new, paid generation Free regenerations are useful in case there is a slight distortion in the audio output. According to ElevenLabs' internal benchmarks, regenerations will solve roughly half of issues with quality, with remaining issues usually due to poor training data. Use the low-latency Flash [models](/docs/models) (Flash v2 or v2.5) optimized for near real-time conversational or interactive scenarios. 
See our [latency optimization guide](/docs/best-practices/latency-optimization) for more details. The models are nondeterministic. For consistency, use the optional [seed parameter](/docs/api-reference/text-to-speech/convert#request.body.seed), though subtle differences may still occur. Split long text into segments and use streaming for real-time playback and efficient processing. To maintain natural prosody flow between chunks, include [previous/next text or previous/next request id parameters](/docs/api-reference/text-to-speech/convert#request.body.previous_text). # Speech to Text > Learn how to turn spoken audio into text with ElevenLabs. ## Overview The ElevenLabs [Speech to Text (STT)](/docs/api-reference/speech-to-text) API turns spoken audio into text with state of the art accuracy. Our Scribe v1 [model](/docs/models) adapts to textual cues across 99 languages and multiple voice styles and can be used to: * Transcribe podcasts, interviews, and other audio or video content * Generate transcripts for meetings and other audio or video recordings Learn how to integrate speech to text into your application. Step-by-step guide for using speech to text in ElevenLabs. Companies requiring HIPAA compliance must contact [ElevenLabs Sales](https://elevenlabs.io/contact-sales) to sign a Business Associate Agreement (BAA) agreement. Please ensure this step is completed before proceeding with any HIPAA-related integrations or deployments. ## State of the art accuracy The Scribe v1 model is capable of transcribing audio from up to 32 speakers with high accuracy. Optionally it can also transcribe audio events like laughter, applause, and other non-speech sounds. The transcribed output supports exact timestamps for each word and audio event, plus diarization to identify the speaker for each word. The Scribe v1 model is best used for when high-accuracy transcription is required rather than real-time transcription. A low-latency, real-time version will be released soon. ## Pricing | Tier | Price/month | Hours included | Price per included hour | Price per additional hour | | -------- | ----------- | ------------------- | ----------------------- | ------------------------- | | Free | \$0 | Unavailable | Unavailable | Unavailable | | Starter | \$5 | 12 hours 30 minutes | \$0.40 | Unavailable | | Creator | \$22 | 62 hours 51 minutes | \$0.35 | \$0.48 | | Pro | \$99 | 300 hours | \$0.33 | \$0.40 | | Scale | \$330 | 1,100 hours | \$0.30 | \$0.33 | | Business | \$1,320 | 6,000 hours | \$0.22 | \$0.22 | | Tier | Price/month | Hours included | Price per included hour | | -------- | ----------- | --------------- | ----------------------- | | Free | \$0 | 12 minutes | Unavailable | | Starter | \$5 | 1 hour | \$5 | | Creator | \$22 | 4 hours 53 min | \$4.5 | | Pro | \$99 | 24 hours 45 min | \$4 | | Scale | \$330 | 94 hours 17 min | \$3.5 | | Business | \$1,320 | 440 hours | \$3 | For reduced pricing at higher scale than 6,000 hours/month in addition to custom MSAs and DPAs, please [contact sales](https://elevenlabs.io/contact-sales). **Note: The free tier requires attribution and does not have commercial licensing.** Scribe has higher concurrency limits than other services from ElevenLabs. 
Please see other concurrency limits [here](/docs/models#concurrency-and-priority) | Plan | STT Concurrency Limit | | ---------- | --------------------- | | Free | 10 | | Starter | 15 | | Creator | 25 | | Pro | 50 | | Scale | 75 | | Business | 75 | | Enterprise | Elevated | ## Examples The following example shows the output of the Scribe v1 model for a sample audio file. ```javascript { "language_code": "en", "language_probability": 1, "text": "With a soft and whispery American accent, I'm the ideal choice for creating ASMR content, meditative guides, or adding an intimate feel to your narrative projects.", "words": [ { "text": "With", "start": 0.119, "end": 0.259, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 0.239, "end": 0.299, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "a", "start": 0.279, "end": 0.359, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 0.339, "end": 0.499, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "soft", "start": 0.479, "end": 1.039, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 1.019, "end": 1.2, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "and", "start": 1.18, "end": 1.359, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 1.339, "end": 1.44, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "whispery", "start": 1.419, "end": 1.979, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 1.959, "end": 2.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "American", "start": 2.159, "end": 2.719, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 2.699, "end": 2.779, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "accent,", "start": 2.759, "end": 3.389, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 4.119, "end": 4.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "I'm", "start": 4.159, "end": 4.459, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 4.44, "end": 4.52, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "the", "start": 4.5, "end": 4.599, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 4.579, "end": 4.699, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "ideal", "start": 4.679, "end": 5.099, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 5.079, "end": 5.219, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "choice", "start": 5.199, "end": 5.719, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 5.699, "end": 6.099, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "for", "start": 6.099, "end": 6.199, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 6.179, "end": 6.279, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "creating", "start": 6.259, "end": 6.799, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 6.779, "end": 6.979, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "ASMR", "start": 6.959, "end": 7.739, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 7.719, "end": 7.859, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "content,", "start": 7.839, "end": 8.45, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 9, "end": 9.06, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "meditative", "start": 9.04, "end": 9.64, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 9.619, "end": 
9.699, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "guides,", "start": 9.679, "end": 10.359, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 10.359, "end": 10.409, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "or", "start": 11.319, "end": 11.439, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 11.42, "end": 11.52, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "adding", "start": 11.5, "end": 11.879, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 11.859, "end": 12, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "an", "start": 11.979, "end": 12.079, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 12.059, "end": 12.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "intimate", "start": 12.179, "end": 12.579, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 12.559, "end": 12.699, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "feel", "start": 12.679, "end": 13.159, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.139, "end": 13.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "to", "start": 13.159, "end": 13.26, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.239, "end": 13.3, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "your", "start": 13.299, "end": 13.399, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.379, "end": 13.479, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "narrative", "start": 13.479, "end": 13.889, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.919, "end": 13.939, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "projects.", "start": 13.919, "end": 14.779, "type": "word", "speaker_id": "speaker_0" } ] } ``` The output is classified in three category types: * `word` - A word in the language of the audio * `spacing` - The space between words, not applicable for languages that don't use spaces like Japanese, Mandarin, Thai, Lao, Burmese and Cantonese * `audio_event` - Non-speech sounds like laughter or applause ## Models State-of-the-art speech recognition model
Accurate transcription in 99 languages
Precise word-level timestamps
Speaker diarization
Dynamic audio tagging
[Explore all](/docs/models)
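A request that produces output like the example above might look like the following sketch. The SDK call mirrors the `/v1/speech-to-text` convert endpoint referenced in these docs, but the specific argument names used here (`model_id="scribe_v1"`, `diarize`, `tag_audio_events`) are assumptions based on the features described on this page; confirm them against the Speech to Text API reference.

```python
# Sketch: transcribing a local file with diarization and audio-event tagging.
import os

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("meeting.mp3", "rb") as audio_file:
    transcript = client.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",    # assumed ID for the Scribe v1 model
        diarize=True,            # label each word with a speaker_id
        tag_audio_events=True,   # include non-speech events like laughter
    )

# Each entry is a word, spacing, or audio_event with start/end timestamps.
for word in transcript.words:
    print(word.speaker_id, word.text, word.start, word.end)
```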
## Supported languages The Scribe v1 model supports 99 languages, including: *Afrikaans (afr), Amharic (amh), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Burmese (mya), Cantonese (yue), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Fulah (ful), Galician (glg), Ganda (lug), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Igbo (ibo), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kabuverdianu (kea), Kannada (kan), Kazakh (kaz), Khmer (khm), Korean (kor), Kurdish (kur), Kyrgyz (kir), Lao (lao), Latvian (lav), Lingala (lin), Lithuanian (lit), Luo (luo), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Maltese (mlt), Mandarin Chinese (cmn), Māori (mri), Marathi (mar), Mongolian (mon), Nepali (nep), Northern Sotho (nso), Norwegian (nor), Occitan (oci), Odia (ori), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Shona (sna), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Tajik (tgk), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Umbundu (umb), Urdu (urd), Uzbek (uzb), Vietnamese (vie), Welsh (cym), Wolof (wol), Xhosa (xho) and Zulu (zul).* ### Breakdown of language support Word Error Rate (WER) is a key metric used to evaluate the accuracy of transcription systems. It measures how many errors are present in a transcript compared to a reference transcript. Below is a breakdown of the WER for each language that Scribe v1 supports. Bulgarian (bul), Catalan (cat), Czech (ces), Danish (dan), Dutch (nld), English (eng), Finnish (fin), French (fra), Galician (glg), German (deu), Greek (ell), Hindi (hin), Indonesian (ind), Italian (ita), Japanese (jpn), Kannada (kan), Malay (msa), Malayalam (mal), Macedonian (mkd), Norwegian (nor), Polish (pol), Portuguese (por), Romanian (ron), Russian (rus), Serbian (srp), Slovak (slk), Spanish (spa), Swedish (swe), Turkish (tur), Ukrainian (ukr) and Vietnamese (vie). Bengali (ben), Belarusian (bel), Bosnian (bos), Cantonese (yue), Estonian (est), Filipino (fil), Gujarati (guj), Hungarian (hun), Kazakh (kaz), Latvian (lav), Lithuanian (lit), Mandarin (cmn), Marathi (mar), Nepali (nep), Odia (ori), Persian (fas), Slovenian (slv), Tamil (tam) and Telugu (tel) Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Burmese (mya), Cebuano (ceb), Croatian (hrv), Georgian (kat), Hausa (hau), Hebrew (heb), Icelandic (isl), Javanese (jav), Kabuverdianu (kea), Korean (kor), Kyrgyz (kir), Lingala (lin), Maltese (mlt), Mongolian (mon), Māori (mri), Occitan (oci), Punjabi (pan), Sindhi (snd), Swahili (swa), Tajik (tgk), Thai (tha), Urdu (urd), Uzbek (uzb) and Welsh (cym). Amharic (amh), Chichewa (nya), Fulah (ful), Ganda (lug), Igbo (ibo), Irish (gle), Khmer (khm), Kurdish (kur), Lao (lao), Luxembourgish (ltz), Luo (luo), Northern Sotho (nso), Pashto (pus), Shona (sna), Somali (som), Umbundu (umb), Wolof (wol), Xhosa (xho) and Zulu (zul). ## FAQ Yes, the API supports uploading both audio and video files for transcription. Files up to 1 GB in size and up to 4.5 hours in duration are supported. 
The supported audio formats include: * audio/aac * audio/x-aac * audio/x-aiff * audio/ogg * audio/mpeg * audio/mp3 * audio/mpeg3 * audio/x-mpeg-3 * audio/opus * audio/wav * audio/x-wav * audio/webm * audio/flac * audio/x-flac * audio/mp4 * audio/aiff * audio/x-m4a Supported video formats include: * video/mp4 * video/x-msvideo * video/x-matroska * video/quicktime * video/x-ms-wmv * video/x-flv * video/webm * video/mpeg * video/3gpp ElevenLabs is constantly expanding the number of languages supported by our models. Please check back frequently for updates. # Voice changer > Learn how to transform audio between voices while preserving emotion and delivery. ## Overview The ElevenLabs [voice changer](/docs/api-reference/speech-to-speech/convert) API lets you transform any source audio (recorded or uploaded) into a different, fully cloned voice without losing the performance nuances of the original. It’s capable of capturing whispers, laughs, cries, accents, and subtle emotional cues to achieve a highly realistic, human feel and can be used to: * Change any voice while preserving emotional delivery and nuance * Create consistent character voices across multiple languages and recording sessions * Fix or replace specific words and phrases in existing recordings Explore our [voice library](https://elevenlabs.io/community) to find the perfect voice for your project. Learn how to integrate voice changer into your application. Step-by-step guide for using voice changer in ElevenLabs. ## Supported languages Our v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.* The `eleven_english_sts_v2` model only supports English. ## Best practices ### Audio quality * Record in a quiet environment to minimize background noise * Maintain appropriate microphone levels - avoid too quiet or peaked audio * Use `remove_background_noise=true` if environmental sounds are present ### Recording guidelines * Keep segments under 5 minutes for optimal processing * Feel free to include natural expressions (laughs, sighs, emotions) * The source audio's accent and language will be preserved in the output ### Parameters * **Style**: Set to 0% when input audio is already expressive * **Stability**: Use 100% for maximum voice consistency * **Language**: Choose source audio that matches your desired accent and language ## FAQ Yes, but you must split it into smaller chunks (each under 5 minutes). This helps ensure stability and consistent output. Absolutely. Provide your custom voice’s `voice_id` and specify the correct `model_id`. You’re charged at 1000 characters’ worth of usage per minute of processed audio. There’s no additional fee based on file size. Possibly. Use `remove_background_noise=true` or the Voice Isolator tool to minimize environmental sounds in the final output. Though `eleven_english_sts_v2` is available, our `eleven_multilingual_sts_v2` model often outperforms it, even for English material. “Style” adds interpretative flair; “stability” enforces consistency. For high-energy performances in the source audio, turn style down and stability up. # Voice isolator > Learn how to isolate speech from background noise, music, and ambient sounds from any audio.
## Overview ElevenLabs [voice isolator](/docs/api-reference/audio-isolation/audio-isolation) API transforms audio recordings with background noise into clean, studio-quality speech. This is particularly useful for audio recorded in noisy environments, or recordings containing unwanted ambient sounds, music, or other background interference. Listen to a sample: ## Usage The voice isolator model extracts speech from background noise in both audio and video files. Learn how to integrate voice isolator into your application. Step-by-step guide for using voice isolator in ElevenLabs. ### Supported file types * **Audio**: AAC, AIFF, OGG, MP3, OPUS, WAV, FLAC, M4A * **Video**: MP4, AVI, MKV, MOV, WMV, FLV, WEBM, MPEG, 3GPP ## FAQ * **Cost**: Voice isolator costs 1000 characters for every minute of audio. * **File size and length**: Supports files up to 500MB and 1 hour in length. * **Music vocals**: Not specifically optimized for isolating vocals from music, but may work depending on the content. # Dubbing > Learn how to translate audio and video while preserving the emotion, timing & tone of speakers. ## Overview ElevenLabs [dubbing](/docs/api-reference/dubbing/dub-a-video-or-an-audio-file) API translates audio and video across 32 languages while preserving the emotion, timing, tone and unique characteristics of each speaker. Our model separates each speaker’s dialogue from the soundtrack, allowing you to recreate the original delivery in another language. It can be used to: * Grow your addressable audience by 4x to reach international audiences * Adapt existing material for new markets while preserving emotional nuance * Offer content in multiple languages without re-recording voice talent We also offer a [fully managed dubbing service](https://elevenlabs.io/elevenstudios) for video and podcast creators. ## Usage ElevenLabs dubbing can be used in three ways: * **Dubbing Studio** in the user interface for fast, interactive control and editing * **Programmatic integration** via our [API](/docs/api-reference/dubbing/dub-a-video-or-an-audio-file) for large-scale or automated workflows * **Human-verified dubs via ElevenLabs Productions** - for more information, please reach out to [productions@elevenlabs.io](mailto:productions@elevenlabs.io) The UI supports files up to **500MB** and **45 minutes**. The API supports files up to **1GB** and **2.5 hours**. Learn how to integrate dubbing into your application. Edit transcripts and translate videos step by step in Dubbing Studio. ### Key features **Speaker separation** Automatically detect multiple speakers, even with overlapping speech. **Multi-language output** Generate localized tracks in 32 languages. **Preserve original voices** Retain the speaker’s identity and emotional tone. **Keep background audio** Avoid re-mixing music, effects, or ambient sounds. **Customizable transcripts** Manually edit translations and transcripts as needed. **Supported file types** Videos and audio can be dubbed from various sources, including YouTube, X, TikTok, Vimeo, direct URLs, or file uploads. **Video transcript and translation editing** Our AI video translator lets you manually edit transcripts and translations to ensure your content is properly synced and localized. Adjust the voice settings to tune delivery, and regenerate speech segments until the output sounds just right. A Creator plan or higher is required to dub audio files. For videos, a watermark option is available to reduce credit usage. 
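For the programmatic route mentioned in the Usage section above, submitting a dubbing job could look roughly like this sketch. The method and parameter names are assumptions inferred from the API reference slugs linked on this page (`dub-a-video-or-an-audio-file`, `get-dubbing-project-metadata`); check the Dubbing API reference for the exact signatures.

```python
# Sketch: submitting a dubbing job and polling its status.
import os

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("interview.mp4", "rb") as source_file:
    dub = client.dubbing.dub_a_video_or_an_audio_file(
        file=source_file,
        target_lang="es",   # any of the 32 supported dubbing language codes
        num_speakers=2,     # optional hint for speaker separation
        watermark=True,     # reduces credit usage for video dubs
    )

# Dubbing runs asynchronously; check the project metadata until it completes.
metadata = client.dubbing.get_dubbing_project_metadata(dub.dubbing_id)
print(metadata.status)
```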
### Cost To reduce credit usage, you can: * Dub only a selected portion of your file * Use watermarks on video output (not available for audio) * Fine-tune transcripts and regenerate individual segments instead of the entire clip Refer to our [pricing page](https://elevenlabs.io/pricing) for detailed credit costs. ## List of supported languages for dubbing | No | Language Name | Language Code | | -- | ------------- | ------------- | | 1 | English | en | | 2 | Hindi | hi | | 3 | Portuguese | pt | | 4 | Chinese | zh | | 5 | Spanish | es | | 6 | French | fr | | 7 | German | de | | 8 | Japanese | ja | | 9 | Arabic | ar | | 10 | Russian | ru | | 11 | Korean | ko | | 12 | Indonesian | id | | 13 | Italian | it | | 14 | Dutch | nl | | 15 | Turkish | tr | | 16 | Polish | pl | | 17 | Swedish | sv | | 18 | Filipino | fil | | 19 | Malay | ms | | 20 | Romanian | ro | | 21 | Ukrainian | uk | | 22 | Greek | el | | 23 | Czech | cs | | 24 | Danish | da | | 25 | Finnish | fi | | 26 | Bulgarian | bg | | 27 | Croatian | hr | | 28 | Slovak | sk | | 29 | Tamil | ta | ## FAQ Dubbing can be performed on all types of short and long form video and audio content. We recommend dubbing content with a maximum of 9 unique speakers at a time to ensure a high-quality dub. Yes. Our models analyze each speaker’s original delivery to recreate the same tone, pace, and style in your target language. We use advanced source separation to isolate individual voices from ambient sound. Multiple overlapping speakers can be split into separate tracks. Via the user interface, the maximum file size is 500MB up to 45 minutes. Through the API, you can process files up to 1GB and 2.5 hours. You can choose to dub only certain portions of your video/audio or tweak translations/voices in our interactive Dubbing Studio. # Sound effects > Learn how to create high-quality sound effects from text with ElevenLabs. ## Overview ElevenLabs [sound effects](/docs/api-reference/text-to-sound-effects/convert) API turns text descriptions into high-quality audio effects with precise control over timing, style and complexity. The model understands both natural language and audio terminology, enabling you to: * Generate cinematic sound design for films & trailers * Create custom sound effects for games & interactive media * Produce Foley and ambient sounds for video content Listen to an example: ## Usage Sound effects are generated using text descriptions & two optional parameters: * **Duration**: Set a specific length for the generated audio (in seconds) * Default: Automatically determined based on the prompt * Range: 0.1 to 22 seconds * Cost: 40 characters per second when duration is specified * **Prompt influence**: Control how strictly the model follows the prompt * High: More literal interpretation of the prompt * Low: More creative interpretation with added variations Learn how to integrate sound effects into your application. Step-by-step guide for using sound effects in ElevenLabs. 
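As a rough illustration of the two optional parameters described above, the sketch below requests a four-second effect with a fairly literal interpretation of the prompt. The SDK namespace follows the `text-to-sound-effects/convert` reference linked on this page, but the exact parameter names (`duration_seconds`, `prompt_influence`) are assumptions to verify against the API reference.

```python
# Sketch: generating a short sound effect and saving it to disk.
import os

from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = client.text_to_sound_effects.convert(
    text="Heavy wooden door creaking open, then slamming shut",
    duration_seconds=4.0,   # 0.1-22 seconds; omit to let the model decide
    prompt_influence=0.7,   # higher values follow the prompt more literally
)

with open("door.mp3", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```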
### Prompting guide #### Simple effects For basic sound effects, use clear, concise descriptions: * "Glass shattering on concrete" * "Heavy wooden door creaking open" * "Thunder rumbling in the distance" #### Complex sequences For multi-part sound effects, describe the sequence of events: * "Footsteps on gravel, then a metallic door opens" * "Wind whistling through trees, followed by leaves rustling" * "Sword being drawn, then clashing with another blade" #### Musical elements The API also supports generation of musical components: * "90s hip-hop drum loop, 90 BPM" * "Vintage brass stabs in F minor" * "Atmospheric synth pad with subtle modulation" #### Audio Terminology Common terms that can enhance your prompts: * **Impact**: Collision or contact sounds between objects, from subtle taps to dramatic crashes * **Whoosh**: Movement through air effects, ranging from fast and ghostly to slow-spinning or rhythmic * **Ambience**: Background environmental sounds that establish atmosphere and space * **One-shot**: Single, non-repeating sound * **Loop**: Repeating audio segment * **Stem**: Isolated audio component * **Braam**: Big, brassy cinematic hit that signals epic or dramatic moments, common in trailers * **Glitch**: Sounds of malfunction, jittering, or erratic movement, useful for transitions and sci-fi * **Drone**: Continuous, textured sound that creates atmosphere and suspense ## FAQ The maximum duration is 22 seconds per generation. For longer sequences, generate multiple effects and combine them. Yes, you can generate musical elements like drum loops, bass lines, and melodic samples. However, for full music production, consider combining multiple generated elements. Use detailed prompts, appropriate duration settings, and high prompt influence for more predictable results. For complex sounds, generate components separately and combine them. Generated audio is provided in MP3 format with professional-grade quality (44.1kHz, 128-192kbps). # Voices > Learn how to create, customize, and manage voices with ElevenLabs. ## Overview ElevenLabs provides models for voice creation & customization. The platform supports a wide range of voice options, including voices from our extensive [voice library](https://elevenlabs.io/app/voice-library), voice cloning, and artificially designed voices using text prompts. ### Voice categories * **Community**: Voices shared by the community from the ElevenLabs [voice library](/docs/product-guides/voices/voice-library). * **Cloned**: Custom voices created using instant or professional [voice cloning](/docs/product-guides/voices/voice-cloning). * **Voice design**: Artificially designed voices created with the [voice design](/docs/product-guides/voices/voice-design) tool. * **Default**: Pre-designed, high-quality voices optimized for general use. #### Community The [voice library](/docs/product-guides/voices/voice-library) contains over 5,000 voices shared by the ElevenLabs community. Use it to: * Discover unique voices shared by the ElevenLabs community. * Add voices to your personal collection. * Share your own voice clones for cash rewards when others use it. Share your voice with the community, set your terms, and earn cash rewards when others use it. We've paid out over **\$1M** already. Learn how to use voices from the voice library #### Cloned Clone your own voice from 30-second samples with Instant Voice Cloning, or create hyper-realistic voices using Professional Voice Cloning. * **Instant Voice Cloning**: Quickly replicate a voice from short audio samples. 
* **Professional Voice Cloning**: Generate professional-grade voice clones with extended training audio. Voice-captcha technology is used to verify that **all** voice clones are created from your own voice samples. A Creator plan or higher is required to create voice clones. Clone a voice instantly Create a perfect voice clone Learn how to create instant & professional voice clones #### Voice design With [Voice Design](/docs/product-guides/voices/voice-design), you can create entirely new voices by specifying attributes like age, gender, accent, and tone. Generated voices are ideal for: * Realistic voices with nuanced characteristics. * Creative character voices for games and storytelling. The voice design tool creates 3 voice previews, simply provide: * A **voice description** between 20 and 1000 characters. * Some **text** to preview the voice between 100 and 1000 characters. Integrate voice design into your application. Learn how to craft voices from a single prompt. #### Default Our curated set of default voices is optimized for core use cases. These voices are: * **Reliable**: Available long-term. * **Consistent**: Carefully crafted and quality-checked for performance. * **Model-ready**: Fine-tuned on new models upon release. Default voices are available to all users via the **my voices** tab in the [voice lab dashboard](https://elevenlabs.io/app/voice-lab). Default voices were previously referred to as `premade` voices. The latter term is still used when accessing default voices via the API. ### Managing voices All voices can be managed through **My Voices**, where you can: * Search, filter, and categorize voices * Add descriptions and custom tags * Organize voices for quick access Learn how to manage your voice collection in [My Voices documentation](/docs/product-guides/voices/voice-library). * **Search and Filter**: Find voices using keywords or tags. * **Preview Samples**: Listen to voice demos before adding them to **My Voices**. * **Add to Collection**: Save voices for easy access in your projects. > **Tip**: Try searching by specific accents or genres, such as "Australian narration" or "child-like character." ### Supported languages All ElevenLabs voices support multiple languages. Experiment by converting phrases like `Hello! こんにちは! Bonjour!` into speech to hear how your own voice sounds across different languages. ElevenLabs supports voice creation in 32 languages. Match your voice selection to your target region for the most natural results. * **Default Voices**: Optimized for multilingual use. * **Generated and Cloned Voices**: Accent fidelity depends on input samples or selected attributes. Our v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.* Flash v2.5 supports 32 languages - all languages from v2 models plus: *Hungarian, Norwegian & Vietnamese* [Learn more about our models](/docs/models) ## FAQ Yes, you can create custom voices with Voice Design or clone voices using Instant or Professional Voice Cloning. Both options are accessible in **My Voices**. Instant Voice Cloning uses short audio samples for near-instantaneous voice creation. 
Professional Voice Cloning requires longer samples but delivers hyper-realistic, high-quality results. Professional Voice Clones can be shared privately or publicly in the Voice Library. Generated voices and Instant Voice Clones cannot currently be shared. Use **My Voices** to search, filter, and organize your voice collection. You can also delete, tag, and categorize voices for easier management. Use clean and consistent audio samples. For Professional Voice Cloning, provide a variety of recordings in the desired speaking style. Yes, Professional Voice Clones can be shared in the Voice Library. Instant Voice Clones and Generated Voices cannot currently be shared. Generated Voices are ideal for unique characters in games, animations, and creative storytelling. Go to **Voices > Voice Library** in your dashboard or access it via API. # Forced Alignment > Learn how to turn spoken audio and text into a time-aligned transcript with ElevenLabs. ## Overview The ElevenLabs [Forced Alignment](/docs/api-reference/forced-alignment) API turns spoken audio and text into a time-aligned transcript. This is useful for cases where you have an audio recording and a transcript, but need exact timestamps for each word or phrase in the transcript. This can be used for: * Matching subtitles to a video recording * Generating timings for an audiobook recording of an ebook ## Usage The Forced Alignment API can be used by interfacing with the ElevenLabs API directly. Learn how to integrate Forced Alignment into your application. ## Supported languages Our v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.* ## FAQ Forced alignment is a technique used to align spoken audio with text. You provide an audio file and a transcript of the audio file and the API will return a time-aligned transcript. It's useful for cases where you have an audio recording and a transcript, but need exact timestamps for each word or phrase in the transcript. The input text should be a plain string with no special formatting such as JSON. Example of good input text:
```
"Hello, how are you?"
```
Example of bad input text:
```
{ "text": "Hello, how are you?" }
```
Forced Alignment costs the same as the [Speech to Text](/docs/capabilities/speech-to-text#pricing) API. Forced Alignment does not support diarization. If you provide diarized text, the API will likely return unwanted results. The maximum file size for Forced Alignment is 1GB. For audio files, the maximum duration is 4.5 hours. For the text input, the maximum length is 675k characters. # Streaming text to speech > Learn how to stream text into speech in Python or Node.js. In this tutorial, you'll learn how to convert [text to speech](https://elevenlabs.io/text-to-speech) with the ElevenLabs SDK. We’ll start by talking through how to generate speech and receive a file, and then how to generate speech and stream the response back. Finally, as a bonus, we’ll show you how to upload the generated audio to an AWS S3 bucket and share it through a signed URL. This signed URL will provide temporary access to the audio file, making it perfect for sharing with users by SMS or embedding into an application.
If you want to jump straight to an example, you can find full examples in the [Python](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python) and [Node.js](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node) example repositories. ## Requirements * An ElevenLabs account with an API key (here’s how to [find your API key](/docs/developer-guides/quickstart#authentication)). * Python or Node installed on your machine. * (Optionally) an AWS account with access to S3. ## Setup ### Installing our SDK Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK for the text to speech conversion. You can install it using pip:
```bash Python
pip install elevenlabs
```
```bash TypeScript
npm install elevenlabs
```
Additionally, install the necessary packages to manage your environment variables:
```bash Python
pip install python-dotenv
```
```bash TypeScript
npm install dotenv
npm install @types/dotenv --save-dev
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```bash .env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
## Convert text to speech (file) To convert text to speech and save it as a file, we’ll use the `convert` method of the ElevenLabs SDK and then save the output locally as a `.mp3` file.
```python text_to_speech_file.py (Python)
import os
import uuid

from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)


def text_to_speech_file(text: str) -> str:
    # Calling the text_to_speech conversion API with detailed parameters
    response = client.text_to_speech.convert(
        voice_id="pNInz6obpgDQGcFmaJgB",  # Adam pre-made voice
        output_format="mp3_22050_32",
        text=text,
        model_id="eleven_turbo_v2_5",  # use the turbo model for low latency
        # Optional voice settings that allow you to customize the output
        voice_settings=VoiceSettings(
            stability=0.0,
            similarity_boost=1.0,
            style=0.0,
            use_speaker_boost=True,
            speed=1.0,
        ),
    )

    # uncomment the line below to play the audio back
    # play(response)

    # Generating a unique file name for the output MP3 file
    save_file_path = f"{uuid.uuid4()}.mp3"

    # Writing the audio to a file
    with open(save_file_path, "wb") as f:
        for chunk in response:
            if chunk:
                f.write(chunk)

    print(f"{save_file_path}: A new audio file was saved successfully!")

    # Return the path of the saved audio file
    return save_file_path
```
```typescript text_to_speech_file.ts (Typescript)
import * as dotenv from 'dotenv';
import { ElevenLabsClient } from 'elevenlabs';
import { createWriteStream } from 'fs';
import { v4 as uuid } from 'uuid';

dotenv.config();

const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;

const client = new ElevenLabsClient({
  apiKey: ELEVENLABS_API_KEY,
});

export const createAudioFileFromText = async (text: string): Promise<string> => {
  return new Promise<string>(async (resolve, reject) => {
    try {
      const audio = await client.textToSpeech.convert('JBFqnCBsd6RMkjVDRZzb', {
        model_id: 'eleven_multilingual_v2',
        text,
        output_format: 'mp3_44100_128',
        // Optional voice settings that allow you to customize the output
        voice_settings: {
          stability: 0,
          similarity_boost: 0,
          use_speaker_boost: true,
          speed: 1.0,
        },
      });

      const fileName = `${uuid()}.mp3`;
      const fileStream = createWriteStream(fileName);

      audio.pipe(fileStream);
      fileStream.on('finish', () => resolve(fileName)); // Resolve with the fileName
      fileStream.on('error', reject);
    } catch (error) {
## Convert text to speech (streaming)

If you prefer to stream the audio directly without saving it to a file, you can use our streaming feature.

```python text_to_speech_stream.py (Python)
import os
from io import BytesIO
from typing import IO

from dotenv import load_dotenv
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs

load_dotenv()

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")

client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)


def text_to_speech_stream(text: str) -> IO[bytes]:
    # Perform the text-to-speech conversion
    response = client.text_to_speech.convert(
        voice_id="pNInz6obpgDQGcFmaJgB",  # Adam pre-made voice
        output_format="mp3_22050_32",
        text=text,
        model_id="eleven_multilingual_v2",
        # Optional voice settings that allow you to customize the output
        voice_settings=VoiceSettings(
            stability=0.0,
            similarity_boost=1.0,
            style=0.0,
            use_speaker_boost=True,
            speed=1.0,
        ),
    )

    # Create a BytesIO object to hold the audio data in memory
    audio_stream = BytesIO()

    # Write each chunk of audio data to the stream
    for chunk in response:
        if chunk:
            audio_stream.write(chunk)

    # Reset stream position to the beginning
    audio_stream.seek(0)

    # Return the stream for further use
    return audio_stream
```

```typescript text_to_speech_stream.ts (TypeScript)
import * as dotenv from 'dotenv';
import { ElevenLabsClient } from 'elevenlabs';

dotenv.config();

const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;

if (!ELEVENLABS_API_KEY) {
  throw new Error('Missing ELEVENLABS_API_KEY in environment variables');
}

const client = new ElevenLabsClient({
  apiKey: ELEVENLABS_API_KEY,
});

export const createAudioStreamFromText = async (text: string): Promise<Buffer> => {
  const audioStream = await client.textToSpeech.convertAsStream('JBFqnCBsd6RMkjVDRZzb', {
    model_id: 'eleven_multilingual_v2',
    text,
    output_format: 'mp3_44100_128',
    // Optional voice settings that allow you to customize the output
    voice_settings: {
      stability: 0,
      similarity_boost: 1.0,
      use_speaker_boost: true,
      speed: 1.0,
    },
  });

  const chunks: Buffer[] = [];
  for await (const chunk of audioStream) {
    chunks.push(chunk);
  }

  const content = Buffer.concat(chunks);
  return content;
};
```

You can then run this function with:

```python Python
text_to_speech_stream("This is James")
```

```typescript TypeScript
await createAudioStreamFromText('This is James');
```

## Bonus - Uploading to AWS S3 and getting a secure sharing link

Once your audio data is created as either a file or a stream, you might want to share it with your users. One way to do this is to upload it to an AWS S3 bucket and generate a secure sharing link.

To upload the data to S3, you'll need to add your AWS access key ID, secret access key, and AWS region name to your `.env` file. Follow these steps to find the credentials:

1. Log in to your AWS Management Console: Navigate to the AWS home page and sign in with your account.
2. Access the IAM (Identity and Access Management) dashboard: You can find IAM under "Security, Identity, & Compliance" on the services menu. The IAM dashboard manages access to your AWS services securely.
3. Create a new user (if necessary): On the IAM dashboard, select "Users" and then "Add user". Enter a user name.
4. Set the permissions: Attach policies directly to the user according to the access level you wish to grant. For S3 uploads, you can use the AmazonS3FullAccess policy. However, it's best practice to grant least privilege, that is, only the minimal permissions necessary to perform the task. You might want to create a custom policy that allows only the necessary actions on your S3 bucket (see the sketch after this list).
5. Review and create the user: Review your settings and create the user. Upon creation, you'll be presented with an access key ID and a secret access key. Be sure to download and securely save these credentials; the secret access key cannot be retrieved again after this step.
6. Note your AWS region name, e.g. us-east-1.
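As mentioned in step 4, a least-privilege setup for this guide only needs to allow uploading objects to the bucket and reading them back. Below is a minimal sketch of such a policy document; the bucket name is a placeholder, and you should adapt the policy to your own security requirements:

```python
import json

# Sketch of a least-privilege IAM policy document for this guide's use case.
# "your-s3-bucket-name" is a placeholder - replace it with your actual bucket name.
least_privilege_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": "arn:aws:s3:::your-s3-bucket-name/*",
        }
    ],
}

# Print the JSON you would paste into the IAM console's policy editor.
print(json.dumps(least_privilege_policy, indent=2))
```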
If you do not have an AWS S3 bucket, you will need to create a new one by following these steps:

1. Access the S3 dashboard: You can find S3 under "Storage" on the services menu.
2. Create a new bucket: On the S3 dashboard, click the "Create bucket" button.
3. Enter a bucket name and click the "Create bucket" button. You can leave the other bucket options at their defaults. The newly created bucket will appear in the list.

Install the AWS SDK for your language: `boto3` via `pip` for Python, or the AWS SDK for JavaScript v3 packages via `npm` for TypeScript.

```bash Python
pip install boto3
```

```bash TypeScript
npm install @aws-sdk/client-s3
npm install @aws-sdk/s3-request-presigner
```

Then add the following environment variables to your `.env` file:

```bash .env
AWS_ACCESS_KEY_ID=your_aws_access_key_id_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here
AWS_REGION_NAME=your_aws_region_name_here
AWS_S3_BUCKET_NAME=your_s3_bucket_name_here
```

Add the following functions to upload the audio stream to S3 and generate a signed URL.

```python s3_uploader.py (Python)
import os
import uuid

import boto3
from dotenv import load_dotenv

load_dotenv()

AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
AWS_REGION_NAME = os.getenv("AWS_REGION_NAME")
AWS_S3_BUCKET_NAME = os.getenv("AWS_S3_BUCKET_NAME")

session = boto3.Session(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION_NAME,
)
s3 = session.client("s3")


def generate_presigned_url(s3_file_name: str) -> str:
    signed_url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": AWS_S3_BUCKET_NAME, "Key": s3_file_name},
        ExpiresIn=3600,
    )  # URL expires in 1 hour
    return signed_url


def upload_audiostream_to_s3(audio_stream) -> str:
    s3_file_name = f"{uuid.uuid4()}.mp3"  # Generates a unique file name using UUID
    s3.upload_fileobj(audio_stream, AWS_S3_BUCKET_NAME, s3_file_name)

    return s3_file_name
```

```typescript s3_uploader.ts (TypeScript)
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import * as dotenv from 'dotenv';
import { v4 as uuid } from 'uuid';

dotenv.config();

const { AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME, AWS_S3_BUCKET_NAME } = process.env;

if (!AWS_ACCESS_KEY_ID || !AWS_SECRET_ACCESS_KEY || !AWS_REGION_NAME || !AWS_S3_BUCKET_NAME) {
  throw new Error('One or more environment variables are not set. Please check your .env file.');
}

const s3 = new S3Client({
  credentials: {
    accessKeyId: AWS_ACCESS_KEY_ID,
    secretAccessKey: AWS_SECRET_ACCESS_KEY,
  },
  region: AWS_REGION_NAME,
});

export const generatePresignedUrl = async (objectKey: string) => {
  const getObjectParams = {
    Bucket: AWS_S3_BUCKET_NAME,
    Key: objectKey,
  };
  const command = new GetObjectCommand(getObjectParams);
  // The URL expires in 1 hour, set via `expiresIn` below.
  const url = await getSignedUrl(s3, command, { expiresIn: 3600 });
  return url;
};

export const uploadAudioStreamToS3 = async (audioStream: Buffer) => {
  const remotePath = `${uuid()}.mp3`;
  await s3.send(
    new PutObjectCommand({
      Bucket: AWS_S3_BUCKET_NAME,
      Key: remotePath,
      Body: audioStream,
      ContentType: 'audio/mpeg',
    })
  );
  return remotePath;
};
```

You can then call the upload function with the audio stream generated from the text.

```python Python
s3_file_name = upload_audiostream_to_s3(audio_stream)
```

```typescript TypeScript
const s3path = await uploadAudioStreamToS3(stream);
```

After uploading the audio file to S3, generate a signed URL to share access to the file. This URL is time-limited, meaning it will expire after a certain period, making it secure for temporary sharing.

You can now generate a URL from a file with:

```python Python
signed_url = generate_presigned_url(s3_file_name)
print(f"Signed URL to access the file: {signed_url}")
```

```typescript TypeScript
const presignedUrl = await generatePresignedUrl(s3path);
console.log('Presigned URL:', presignedUrl);
```

If you want to share the file multiple times, store the S3 object key in your database and regenerate a signed URL whenever you need one, rather than saving the signed URL itself, since it will expire. A sketch of this pattern follows below.
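Here is a minimal sketch of that pattern in Python, reusing the `generate_presigned_url` helper from `s3_uploader.py`; the stored key value is a placeholder for whatever you persisted in your database:

```python
from s3_uploader import generate_presigned_url

# Placeholder for the S3 object key you previously stored (not the signed URL itself).
stored_object_key = "your-previously-uploaded-file.mp3"

# Mint a fresh, time-limited URL each time you need to share the file again.
fresh_signed_url = generate_presigned_url(stored_object_key)
print(f"New signed URL: {fresh_signed_url}")
```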
To put it all together, you can use the following script:

```python main.py (Python)
from dotenv import load_dotenv

load_dotenv()

from text_to_speech_stream import text_to_speech_stream
from s3_uploader import upload_audiostream_to_s3, generate_presigned_url


def main():
    text = "This is James"

    audio_stream = text_to_speech_stream(text)
    s3_file_name = upload_audiostream_to_s3(audio_stream)
    signed_url = generate_presigned_url(s3_file_name)

    print(f"Signed URL to access the file: {signed_url}")


if __name__ == "__main__":
    main()
```

```typescript index.ts (TypeScript)
import 'dotenv/config';

import { generatePresignedUrl, uploadAudioStreamToS3 } from './s3_uploader';
import { createAudioFileFromText } from './text_to_speech_file';
import { createAudioStreamFromText } from './text_to_speech_stream';

(async () => {
  // save the audio file to disk
  const fileName = await createAudioFileFromText(
    'Today, the sky is exceptionally clear, and the sun shines brightly.'
  );

  console.log('File name:', fileName);

  // OR stream the audio, upload to S3, and get a presigned URL
  const stream = await createAudioStreamFromText(
    'Today, the sky is exceptionally clear, and the sun shines brightly.'
  );

  const s3path = await uploadAudioStreamToS3(stream);

  const presignedUrl = await generatePresignedUrl(s3path);

  console.log('Presigned URL:', presignedUrl);
})();
```
## Conclusion

You now know how to convert text into speech and generate a signed URL to share the audio file. This functionality opens up numerous opportunities for creating and sharing content dynamically. Here are some examples of what you could build with it:

1. **Educational podcasts**: Create personalized educational content that students can access on demand. Teachers can convert their lessons into audio, upload them to S3, and share the links with students for a more engaging learning experience outside the traditional classroom setting.
2. **Accessibility features for websites**: Enhance website accessibility by offering text content in audio format. This makes information more accessible to individuals with visual impairments or those who prefer auditory learning.
3. **Automated customer support messages**: Produce automated, personalized audio messages for customer support, such as FAQs or order updates. This can provide a more engaging customer experience than traditional text emails.
4. **Audio books and narration**: Convert entire books or short stories into audio, offering a new way for audiences to enjoy literature. Authors and publishers can diversify their content offerings and reach audiences who prefer listening over reading.
5. **Language learning tools**: Develop language learning aids that provide learners with audio lessons and exercises, making it possible to practice pronunciation and listening skills in a targeted way.

For more details, see the full project files, which give a clear structure for setting up your application:

* Python: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python)
* TypeScript: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node)

If you have any questions, please create an issue in the [elevenlabs-docs GitHub repository](https://github.com/elevenlabs/elevenlabs-docs/issues).

# Stitching multiple requests

> Learn how to maintain voice prosody over multiple chunks/generations.

## What is Request Stitching?

When you convert a large text to audio by sending it in chunks without further context, there can be abrupt changes in prosody from one chunk to the next. It is much better to give the model context on what was already generated and what will be generated next, and this is exactly what Request Stitching does.

As you can hear in the samples below, the difference between not using Request Stitching and using it is subtle but noticeable:

#### Without Request Stitching: