# ElevenLabs

> ElevenLabs is an AI audio research and deployment company.
## Most popular

- Learn how to integrate ElevenLabs
- Deploy voice agents in minutes
- Learn how to use ElevenLabs
- Dive into our API reference

## Meet the models
**[Eleven v3 (alpha)](/docs/models#eleven-v3-alpha)**: Our most emotionally rich, expressive speech synthesis model

- Dramatic delivery and performance
- 70+ languages supported
- 3,000 character limit
- Support for natural multi-speaker dialogue
**Eleven Multilingual v2**: Lifelike, consistent quality speech synthesis model

- Natural-sounding output
- 29 languages supported
- 10,000 character limit
- Most stable on long-form generations
**Eleven Flash v2.5**: Our fast, affordable speech synthesis model

- Ultra-low latency (~75ms†)
- 32 languages supported
- 40,000 character limit
- Faster model, 50% lower price per character
**Eleven Turbo v2.5**: High-quality, low-latency model with a good balance of quality and speed

- High quality voice generation
- 32 languages supported
- 40,000 character limit
- Low latency (~250ms-300ms†), 50% lower price per character
**Scribe v1**: State-of-the-art speech recognition model

- Accurate transcription in 99 languages
- Precise word-level timestamps
- Speaker diarization
- Dynamic audio tagging
[Explore all](/docs/models)
## Capabilities
- **Text to Speech**: Convert text into lifelike speech
- **Speech to Text**: Transcribe spoken audio into text
- **Voice changer**: Modify and transform voices
- **Voice isolator**: Isolate voices from background noise
- **Dubbing**: Dub audio and videos seamlessly
- **Sound effects**: Create cinematic sound effects
- **Voices**: Clone and design custom voices
- **Agents Platform**: Deploy intelligent voice agents
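As a taste of how these capabilities map onto the SDK used in the quickstart below, here is a minimal sketch of the Sound effects capability. The `text_to_sound_effects.convert` method and its `loop` flag are assumptions to verify against the Sound effects guide.

```python
import os
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Generate a short cinematic effect from a text prompt.
audio = elevenlabs.text_to_sound_effects.convert(
    text="Distant thunder rolling over a quiet forest",
    loop=True,  # assumption: optional flag for seamlessly looping output
)
play(audio)
```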

## Product guides

Explore our product guides for step-by-step guidance.

- Voice library
† Excluding application & network latency

# Developer quickstart

> Learn how to make your first ElevenLabs API request.

The ElevenLabs API provides a simple interface to state-of-the-art audio [models](/docs/models) and [features](/docs/api-reference/introduction). Follow this guide to learn how to create lifelike speech with our Text to Speech API. See the [developer guides](/docs/quickstart#explore-our-developer-guides) for more examples with our other products.

## Using the Text to Speech API

[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication). Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.

```js title=".env"
ELEVENLABS_API_KEY=
```

We'll also use the `dotenv` library to load our API key from an environment variable.

```python
pip install elevenlabs
pip install python-dotenv
```

```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```

To play the audio through your speakers, you may be prompted to install [MPV](https://mpv.io/) and/or [ffmpeg](https://ffmpeg.org/).

Create a new file named `example.py` or `example.mts`, depending on your language of choice, and add the following code:

```python
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play
import os

load_dotenv()

elevenlabs = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

audio = elevenlabs.text_to_speech.convert(
    text="The first move is what sets everything in motion.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

play(audio)
```

```typescript
import { ElevenLabsClient, play } from '@elevenlabs/elevenlabs-js';
import 'dotenv/config';

const elevenlabs = new ElevenLabsClient();

const audio = await elevenlabs.textToSpeech.convert('JBFqnCBsd6RMkjVDRZzb', {
  text: 'The first move is what sets everything in motion.',
  modelId: 'eleven_multilingual_v2',
  outputFormat: 'mp3_44100_128',
});

await play(audio);
```

Run the file:

```python
python example.py
```

```typescript
npx tsx example.mts
```

You should hear the audio play through your speakers.

## Explore our developer guides

Now that you've made your first ElevenLabs API request, you can explore the other products that ElevenLabs offers.

- Convert spoken audio into text
- Deploy conversational voice agents
- Generate studio-quality music
- Clone a voice
- Remix a voice
- Generate sound effects from text
- Transform the voice of an audio file
- Isolate background noise from audio
- Generate voices from a single text prompt
- Dub audio/video from one language to another
- Generate time-aligned transcripts for audio

# Models

> Learn about the models that power the ElevenLabs API.

## Flagship models

### Text to Speech
**[Eleven v3 (alpha)](/docs/models#eleven-v3-alpha)**: Our most emotionally rich, expressive speech synthesis model

- Dramatic delivery and performance
- 70+ languages supported
- 3,000 character limit
- Support for natural multi-speaker dialogue
**Eleven Multilingual v2**: Lifelike, consistent quality speech synthesis model

- Natural-sounding output
- 29 languages supported
- 10,000 character limit
- Most stable on long-form generations
**Eleven Flash v2.5**: Our fast, affordable speech synthesis model

- Ultra-low latency (~75ms†)
- 32 languages supported
- 40,000 character limit
- Faster model, 50% lower price per character
**Eleven Turbo v2.5**: High-quality, low-latency model with a good balance of quality and speed

- High quality voice generation
- 32 languages supported
- 40,000 character limit
- Low latency (~250ms-300ms†), 50% lower price per character
### Speech to Text

**Scribe v1**: State-of-the-art speech recognition model

- Accurate transcription in 99 languages
- Precise word-level timestamps
- Speaker diarization
- Dynamic audio tagging
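A minimal sketch of transcribing a file with Scribe v1 using the Python SDK from the quickstart; the `speech_to_text.convert` signature and the `diarize` flag are assumptions to verify against the Speech to Text reference.

```python
import os
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Transcribe a local recording with word-level timestamps and speaker labels.
with open("meeting.mp3", "rb") as audio_file:
    transcript = elevenlabs.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",
        diarize=True,  # assumption: enables the speaker diarization feature listed above
    )

print(transcript.text)
```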
### Music

**Eleven Music**: Studio-grade music with natural language prompts in any style

- Complete control over genre, style, and structure
- Vocals or just instrumental
- Multilingual, including English, Spanish, German, Japanese and more
- Edit the sound and lyrics of individual sections or the whole song

[Pricing](https://elevenlabs.io/pricing/api)
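A hypothetical sketch of prompting Eleven Music through the Python SDK; the `music.compose` method name and the `music_length_ms` parameter are assumptions based on the Music quickstart, so check that page for the exact signature.

```python
import os
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Prompt-driven composition; method and parameter names are assumptions.
track = elevenlabs.music.compose(
    prompt="Upbeat synthwave with a driving bassline, instrumental only",
    music_length_ms=30_000,
)
play(track)
```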
## Models overview

The ElevenLabs API offers a range of audio models optimized for different use cases, quality levels, and performance requirements.

| Model ID | Description | Languages |
| --- | --- | --- |
| `eleven_v3` | Human-like and expressive speech generation | [70+ languages](/docs/models#supported-languages) |
| `eleven_ttv_v3` | Human-like and expressive voice design model (Text to Voice) | [70+ languages](/docs/models#supported-languages) |
| `eleven_multilingual_v2` | Our most lifelike model with rich emotional expression | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_flash_v2_5` | Ultra-fast model optimized for real-time use (\~75ms†) | All `eleven_multilingual_v2` languages plus: `hu`, `no`, `vi` |
| `eleven_flash_v2` | Ultra-fast model optimized for real-time use (\~75ms†) | `en` |
| `eleven_turbo_v2_5` | High quality, low-latency model with a good balance of quality and speed (\~250ms-300ms) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru`, `hu`, `no`, `vi` |
| `eleven_turbo_v2` | High quality, low-latency model with a good balance of quality and speed (\~250ms-300ms) | `en` |
| `eleven_multilingual_sts_v2` | State-of-the-art multilingual voice changer model (Speech to Speech) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_multilingual_ttv_v2` | State-of-the-art multilingual voice designer model (Text to Voice) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_english_sts_v2` | English-only voice changer model (Speech to Speech) | `en` |
| `scribe_v1` | State-of-the-art speech recognition model | [99 languages](/docs/capabilities/speech-to-text#supported-languages) |
| `scribe_v1_experimental` | State-of-the-art speech recognition model with experimental features: improved multilingual performance, reduced hallucinations during silence, fewer audio tags, and better handling of early transcript termination | [99 languages](/docs/capabilities/speech-to-text#supported-languages) |

† Excluding application & network latency

### Deprecated models

The `eleven_monolingual_v1` and `eleven_multilingual_v1` models are deprecated and will be removed in the future. Please migrate to newer models for continued service.
| Model ID | Description | Languages | Replacement model suggestion |
| --- | --- | --- | --- |
| `eleven_monolingual_v1` | First generation TTS model (outclassed by v2 models) | `en` | `eleven_multilingual_v2` |
| `eleven_multilingual_v1` | First multilingual model (outclassed by v2 models) | `en`, `fr`, `de`, `hi`, `it`, `pl`, `pt`, `es` | `eleven_multilingual_v2` |

## Eleven v3 (alpha)

This model is currently in alpha and is subject to change. Eleven v3 is not made for real-time applications like Agents Platform. When integrating Eleven v3 into your application, consider generating several generations and allowing the user to select the best one.

Eleven v3 is our latest and most advanced speech synthesis model. It is a state-of-the-art model that produces natural, life-like speech with high emotional range and contextual understanding across multiple languages. This model works well in the following scenarios:

* **Character Discussions**: Excellent for audio experiences with multiple characters that interact with each other.
* **Audiobook Production**: Perfect for long-form narration with complex emotional delivery.
* **Emotional Dialogue**: Generate natural, lifelike dialogue with high emotional range and contextual understanding.

With Eleven v3 comes a new Text to Dialogue API, which allows you to generate natural, lifelike dialogue with high emotional range and contextual understanding across multiple languages. Eleven v3 can also be used with the Text to Speech API to generate natural, lifelike speech with high emotional range and contextual understanding across multiple languages. Read more about the Text to Dialogue API [here](/docs/capabilities/text-to-dialogue).

### Supported languages

The Eleven v3 model supports 70+ languages, including:

*Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kirghiz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym).*

## Multilingual v2

Eleven Multilingual v2 is our most advanced, emotionally-aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages. The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker's unique characteristics and accent.
This model excels in scenarios requiring high-quality, emotionally nuanced speech:

* **Character Voiceovers**: Ideal for gaming and animation due to its emotional range.
* **Professional Content**: Well-suited for corporate videos and e-learning materials.
* **Multilingual Projects**: Maintains consistent voice quality across language switches.
* **Stable Quality**: Produces consistent, high-quality audio output.

While it has a higher latency & cost per character than Flash models, it delivers superior quality for projects where lifelike speech is important.

Our multilingual v2 models support 29 languages:

*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*

## Flash v2.5

Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and Agents Platform. It delivers high-quality speech with ultra-low latency (\~75ms†) across 32 languages. The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.

This model is particularly well-suited for:

* **Agents Platform**: Perfect for real-time voice agents and chatbots.
* **Interactive Applications**: Ideal for games and applications requiring immediate response.
* **Large-Scale Processing**: Efficient for bulk text-to-speech conversion.

With its lower price point and 75ms latency, Flash v2.5 is the cost-effective option for anyone needing fast, reliable speech synthesis across multiple languages.

Flash v2.5 supports 32 languages - all languages from v2 models plus:

*Hungarian, Norwegian & Vietnamese*

† Excluding application & network latency

### Considerations

When using Flash v2.5, numbers aren't normalized by default, so output may not read the way you expect. For example, phone numbers might be read out in a way that isn't clear for the user. Dates and currencies are affected in a similar manner.

By default, normalization is disabled for Flash v2.5 to maintain the low latency. However, Enterprise customers can now enable text normalization for v2.5 models by setting the `apply_text_normalization` parameter to "on" in your request.

The Multilingual v2 model does a better job of normalizing numbers, so we recommend using it for phone numbers and other cases where number normalization is important.

For low-latency or Agents Platform applications, best practice is to have your LLM [normalize the text](/docs/best-practices/prompting/normalization) before passing it to the TTS model, or use the `apply_text_normalization` parameter (Enterprise plans only for v2.5 models).

## Turbo v2.5

Eleven Turbo v2.5 is our high-quality, low-latency model with a good balance of quality and speed. This model is an ideal choice for all scenarios where you'd use Flash v2.5, but where you're willing to trade off latency for higher quality voice generation.
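As a minimal sketch of the Considerations above, the request below asks Flash v2.5 to normalize numbers via `apply_text_normalization` (Enterprise plans only for v2.5 models). The voice ID is the example one from the quickstart, and passing the parameter as a keyword argument to the SDK's `convert` method is an assumption to verify against the Text to Speech reference.

```python
import os
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Flash v2.5 skips normalization by default to keep latency low, so phone
# numbers, dates, and currencies may be read out literally unless enabled.
audio = elevenlabs.text_to_speech.convert(
    text="Call 555-0199 before 01/02/2025 to claim the $25 credit.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",
    apply_text_normalization="on",  # assumption: keyword mirrors the API parameter name
)
```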
## Model selection guide

- Use `eleven_multilingual_v2`: Best for high-fidelity audio output with rich emotional expression
- Use Flash models: Optimized for real-time applications (\~75ms latency)
- Use either `eleven_multilingual_v2` or `eleven_flash_v2_5`: Both support up to 32 languages
- Use `eleven_turbo_v2_5`: Good balance between quality and speed
- Use `eleven_multilingual_v2`: Ideal for professional content, audiobooks & video narration
- Use `eleven_flash_v2_5`, `eleven_flash_v2`, `eleven_multilingual_v2`, `eleven_turbo_v2_5` or `eleven_turbo_v2`: Perfect for real-time conversational applications
- Use `eleven_multilingual_sts_v2`: Specialized for Speech-to-Speech conversion

## Character limits

The maximum number of characters supported in a single text-to-speech request varies by model.

| Model ID | Character limit | Approximate audio duration |
| --- | --- | --- |
| `eleven_v3` | 3,000 | \~3 minutes |
| `eleven_flash_v2_5` | 40,000 | \~40 minutes |
| `eleven_flash_v2` | 30,000 | \~30 minutes |
| `eleven_turbo_v2_5` | 40,000 | \~40 minutes |
| `eleven_turbo_v2` | 30,000 | \~30 minutes |
| `eleven_multilingual_v2` | 10,000 | \~10 minutes |
| `eleven_multilingual_v1` | 10,000 | \~10 minutes |
| `eleven_english_sts_v2` | 10,000 | \~10 minutes |
| `eleven_english_sts_v1` | 10,000 | \~10 minutes |

For longer content, consider splitting the input into multiple requests.

## Scribe v1

Scribe v1 is our state-of-the-art speech recognition model designed for accurate transcription across 99 languages. It provides precise word-level timestamps and advanced features like speaker diarization and dynamic audio tagging.

This model excels in scenarios requiring accurate speech-to-text conversion:

* **Transcription Services**: Perfect for converting audio/video content to text
* **Meeting Documentation**: Ideal for capturing and documenting conversations
* **Content Analysis**: Well-suited for audio content processing and analysis
* **Multilingual Recognition**: Supports accurate transcription across 99 languages

Key features:

* Accurate transcription with word-level timestamps
* Speaker diarization for multi-speaker audio
* Dynamic audio tagging for enhanced context
* Support for 99 languages

Read more about Scribe v1 [here](/docs/capabilities/speech-to-text).

## Eleven Music

Eleven Music is our studio-grade music generation model. It allows you to generate music with natural language prompts in any style. This model is excellent for the following scenarios:

* **Game Soundtracks**: Create immersive soundtracks for games
* **Podcast Backgrounds**: Enhance podcasts with professional music
* **Marketing**: Add background music to ad reels

Key features:

* Complete control over genre, style, and structure
* Vocals or just instrumental
* Multilingual, including English, Spanish, German, Japanese and more
* Edit the sound and lyrics of individual sections or the whole song

Read more about Eleven Music [here](/docs/capabilities/music).

## Concurrency and priority

Your subscription plan determines how many requests can be processed simultaneously and the priority level of your requests in the queue. Speech to Text has an elevated concurrency limit. Once the concurrency limit is met, subsequent requests are processed in a queue alongside lower-priority requests. In practice this typically only adds \~50ms of latency.

| Plan | Concurrency Limit (Multilingual v2) | Concurrency Limit (Turbo & Flash) | STT Concurrency Limit | Music Concurrency Limit | Priority level |
| --- | --- | --- | --- | --- | --- |
| Free | 2 | 4 | 8 | N/A | 3 |
| Starter | 3 | 6 | 12 | 2 | 4 |
| Creator | 5 | 10 | 20 | 2 | 5 |
| Pro | 10 | 20 | 40 | 2 | 5 |
| Scale | 15 | 30 | 60 | 3 | 5 |
| Business | 15 | 30 | 60 | 3 | 5 |
| Enterprise | Elevated | Elevated | Elevated | Highest | Highest |

Startup grants recipients receive Scale level benefits.

The response headers include `current-concurrent-requests` and `maximum-concurrent-requests`, which you can use to monitor your concurrency.

### API requests per minute vs concurrent requests

It's important to understand that **API requests per minute** and **concurrent requests** are different metrics that depend on your usage patterns. API requests per minute can differ from concurrent requests since it depends on how long each request takes and how the requests are batched.

**Example 1: Spaced requests**

If you had 180 requests per minute that each took 1 second to complete and you sent them 0.33 seconds apart, the max concurrent requests would be 3 and the average would be 3, since there would always be 3 in flight.

**Example 2: Batched requests**

However, if you had a different usage pattern, such as 180 requests per minute that each took 3 seconds to complete but all fired at once, the max concurrent requests would be 180 and the average would be 9 (the first 3 seconds of the minute saw 180 requests at once, the final 57 seconds saw 0 requests).

Since our system cares about concurrency, requests per minute matter less than how long each request takes and the pattern in which requests are sent.

How endpoint requests are made impacts concurrency limits:

* With HTTP, each request counts individually toward your concurrency limit.
* With a WebSocket, only the time during which our model is generating audio counts toward your concurrency limit; for most of the time, an open websocket doesn't count toward your concurrency limit at all.

### Understanding concurrency limits

The concurrency limit associated with your plan should not be interpreted as the maximum number of simultaneous conversations, phone calls, character voiceovers, etc. that can be handled at once. The actual number depends on several factors, including the specific AI voices used and the characteristics of the use case.

As a general rule of thumb, a concurrency limit of 5 can typically support up to approximately 100 simultaneous audio broadcasts. This is because the time it takes to generate audio is short relative to the time the audio takes to play back, so each broadcast only briefly occupies a concurrent request. The diagram below is an example of how 4 concurrent calls with different users can be facilitated while only hitting 2 concurrent requests.

*Concurrency limits diagram*

Where TTS is used to facilitate dialogue, a concurrency limit of 5 can support about 100 broadcasts for balanced conversations between AI agents and human participants. For use cases in which the AI agent speaks less frequently than the human, such as customer support interactions, more than 100 simultaneous conversations could be supported.

Generally, more than 100 simultaneous character voiceovers can be supported for a concurrency limit of 5. The number can vary depending on the character’s dialogue frequency, the length of pauses, and in-game actions between lines.
Concurrent dubbing streams generally follow the provided heuristic. If the broadcast involves periods of conversational pauses (e.g. because of a soundtrack, visual scenes, etc), more simultaneous dubbing streams than the suggestion may be possible. If you exceed your plan's concurrency limits at any point and you are on the Enterprise plan, model requests may still succeed, albeit slower, on a best efforts basis depending on available capacity. To increase your concurrency limit & queue priority, [upgrade your subscription plan](https://elevenlabs.io/pricing/api). Enterprise customers can request a higher concurrency limit by contacting their account manager. ### Scale testing concurrency limits Scale testing can be useful to identify client side scaling issues and to verify concurrency limits are set correctly for your usecase. It is heavily recommended to test end-to-end workflows as close to real world usage as possible, simulating and measuring how many users can be supported is the recommended methodology for achieving this. It is important to: * Simulate users, not raw requests * Simulate typical user behavior such as waiting for audio playback, user speaking or transcription to finish before making requests * Ramp up the number of users slowly over a period of minutes * Introduce randomness to request timings and to the size of requests * Capture latency metrics and any returned error codes from the API For example, to test an agent system designed to support 100 simultaneous conversations you would create up to 100 individual "users" each simulating a conversation. Conversations typically consist of a repeating cycle of \~10 seconds of user talking, followed by the TTS API call for \~150 characters, followed by \~10 seconds of audio playback to the user. Therefore, each user should follow the pattern of making a websocket Text-to-Speech API call for 150 characters of text every 20 seconds, with a small amount of randomness introduced to the wait period and the number of characters requested. The test would consist of spawning one user per second until 100 exist and then testing for 10 minutes in total to test overall stability. This example uses [locust](https://locust.io/) as the testing framework with direct API calls to the ElevenLabs API. It follows the example listed above, testing a conversational agent system with each user sending 1 request every 20 seconds. ```python title="Python" {12} import json import random import time import gevent import locust from locust import User, task, events, constant_throughput import websocket # Averages up to 10 seconds of audio when played, depends on the voice speed DEFAULT_TEXT = ( "Hello, this is a test message. I am testing if a long input will cause issues for the model " "like this sentence. 
" ) TEXT_ARRAY = [ "Hello.", "Hello, this is a test message.", DEFAULT_TEXT, DEFAULT_TEXT * 2, DEFAULT_TEXT * 3 ] # Custom command line arguments @events.init_command_line_parser.add_listener def on_parser_init(parser): parser.add_argument("--api-key", default="YOUR_API_KEY", help="API key for authentication") parser.add_argument("--encoding", default="mp3_22050_32", help="Encoding") parser.add_argument("--text", default=DEFAULT_TEXT, help="Text to use") parser.add_argument("--use-text-array", default="false", help="Text to use") parser.add_argument("--voice-id", default="aria", help="Text to use") class WebSocketTTSUser(User): # Each user will send a request every 20 seconds, regardless of how long each request takes wait_time = constant_throughput(0.05) def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self.api_key = self.environment.parsed_options.api_key self.voice_id = self.environment.parsed_options.voice_id self.text = self.environment.parsed_options.text self.encoding = self.environment.parsed_options.encoding self.use_text_array = self.environment.parsed_options.use_text_array if self.use_text_array: self.text = random.choice(TEXT_ARRAY) self.all_recieved = False @task def tts_task(self): # Do jitter waiting of up to 1 second # Users appear to be spawned every second so this ensures requests are not aligned gevent.sleep(random.random()) max_wait_time = 10 # Connection details uri = f"{self.environment.host}/v1/text-to-speech/{self.voice_id}/stream-input?auto_mode=true&output_format={self.encoding}" headers = {"xi-api-key": self.api_key} ws = None self.all_recieved = False try: init_msg = {"text": " "} # Use proper header format for websocket - this is case sensitive! ws = websocket.create_connection(uri, header=headers) ws.send(json.dumps(init_msg)) # Start measuring after websocket initiated but before any messages are sent send_request_time = time.perf_counter() ws.send(json.dumps({"text": self.text})) # Send to flush and receive the audio ws.send(json.dumps({"text": ""})) def _receive(): t_first_response = None audio_size = 0 try: while True: # Wait up to 10 seconds for a response ws.settimeout(max_wait_time) response = ws.recv() response_data = json.loads(response) if "audio" in response_data and response_data["audio"]: audio_size = audio_size + len(response_data["audio"]) if t_first_response is None: t_first_response = time.perf_counter() first_byte_ms = ( t_first_response - send_request_time ) * 1000 if audio_size is None: # The first response should always have audio locust.events.request.fire( request_type="websocket", name="Bad Response (no audio)", response_time=first_byte_ms, response_length=audio_size, exception=Exception("Response has no audio"), ) break if "isFinal" in response_data and response_data["isFinal"]: # Fire this event once finished streaming, but report the important TTFB metric locust.events.request.fire( request_type="websocket", name="TTS Stream Success (First Byte)", response_time=first_byte_ms, response_length=audio_size, exception=None, ) break except websocket.WebSocketTimeoutException: locust.events.request.fire( request_type="websocket", name="TTS Stream Timeout", response_time=max_wait_time * 1000, response_length=audio_size, exception=Exception("Timeout waiting for response"), ) except Exception as e: # Typically JSON decode error if the server returns HTTP backoff error locust.events.request.fire( request_type="websocket", name="TTS Stream Failure", response_time=0, response_length=0, exception=e, ) finally: 
self.all_recieved = True gevent.spawn(_receive) # Sleep until recieved so new tasks aren't spawned while not self.all_recieved: gevent.sleep(1) except websocket.WebSocketTimeoutException: locust.events.request.fire( request_type="websocket", name="TTS Stream Timeout", response_time=max_wait_time * 1000, response_length=0, exception=Exception("Timeout waiting for response"), ) except Exception as e: locust.events.request.fire( request_type="websocket", name="TTS Stream Failure", response_time=0, response_length=0, exception=e, ) finally: # Try and close the websocket gracefully try: if ws: ws.close() except Exception: pass ``` # October 14, 2025 ### Agents Platform - **LLM overrides**: Added support for overriding an agent's LLM during a conversation, enabling you to specify a different language model on a per-conversation basis. This is useful for testing different models or accommodating specific requirements while maintaining HIPAA and data residency compliance. - **Post-call webhook failures**: Added the option to send post-call webhook events in the event of a phone call failure. This allows you to track and respond to failed call attempts through your webhook endpoint, providing better visibility into call issues. ### SDK Releases #### Python SDK - [v2.18.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.18.0) - Added support for streaming, Music API inpainting, and Agent Workflows #### JavaScript SDK - [v2.19.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.19.0) - Added support for Music API inpainting and Agent Workflows - [v2.18.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.18.0) - API schema updates #### Client Packages - [@elevenlabs/agents-cli@0.5.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/agents-cli@0.5.0) - Significantly reworked agents pull command with bugfixes and improvements - [@elevenlabs/react@0.8.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react@0.8.0) - Fixed import issues - [@elevenlabs/react-native@0.4.3](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react-native@0.4.3) - Fixed onConnect timing - [@elevenlabs/react-native@0.4.2](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react-native@0.4.2) - Reverted change to ICE transport policy - [@elevenlabs/react-native@0.4.1](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react-native@0.4.1) - Fixed import issues # October 7, 2025 ### Agents Platform - **Gemini 2.5 Flash Preview models**: Added support for `gemini-2.5-flash-preview-09-2025` and `gemini-2.5-flash-lite-preview-09-2025` LLM models, providing access to the latest September 2025 preview versions of Google's Gemini 2.5 Flash models. - **Claude Sonnet 4.5**: Added support for `claude-sonnet-4-5` and `claude-sonnet-4-5@20250929` models, enabling access to the latest Claude Sonnet 4.5 model released on September 29, 2025. - **Test invocations listing**: Added new `GET /v1/convai/test-invocations` endpoint to list all test invocations with pagination support. Includes `agent_id` filter, `page_size` parameter (default 30, max 100), and `cursor` for pagination. Response includes test run counts, pass/fail statistics, and titles. - **Agent archiving**: Added `archived` field (boolean, default false) to agent platform settings, allowing agents to be archived without deletion while keeping them out of active agent lists. 
- **MCP Server interruption control**: Added `disable_interruptions` field (boolean, default false) to MCP server configuration, preventing user interruptions during tool execution for more reliable tool completion. - **Streaming agent responses**: Added `agent_chat_response_part` WebSocket event type for receiving partial agent chat responses in real-time during streaming conversations. - **Workflow edge ordering**: Added `edge_order` field (array of strings) to all workflow node types, enabling explicit control over edge evaluation order for deterministic workflow execution. - **Test suite agent tracking**: Added `agent_id` field (string, nullable) to test invocation responses for associating test runs with specific agents. ### Voice Management - **Voice generation source tracking**: Added `VoiceGeneration` as a new source type in the History API for tracking audio generated from voice generation features. ### Telephony - **SIP trunk TLS validation**: Added `remote_domains` field (array of strings, nullable) to SIP trunk configuration for specifying domains used in TLS certificate validation. ## SDK Releases ### JavaScript SDK - [v2.18.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.18.0) - Updated with latest API schema changes from October 8, 2025 ### Python SDK - [v2.17.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.17.0) - Updated with latest API schema changes and URL generation fixes from October 6, 2025 ### Packages All packages updated with latest API schema changes: - [@elevenlabs/react-native@0.3.2](https://github.com/elevenlabs/elevenlabs-js/releases/tag/%40elevenlabs%2Freact-native%400.3.2) - Updated TypeScript types and API client with new fields for agent archiving, MCP server configuration, and test invocations - [@elevenlabs/react@0.7.1](https://github.com/elevenlabs/elevenlabs-js/releases/tag/%40elevenlabs%2Freact%400.7.1) - Updated React hooks and components with support for new agent settings and WebSocket events - [@elevenlabs/client@0.7.1](https://github.com/elevenlabs/elevenlabs-js/releases/tag/%40elevenlabs%2Fclient%400.7.1) - Core client library updated with new endpoint for test invocations listing and reorganized SDK method paths for secrets management - [@elevenlabs/agents-cli@0.4.2](https://github.com/elevenlabs/elevenlabs-js/releases/tag/%40elevenlabs%2Fagents-cli%400.4.2) - CLI tool updated with support for new agent archiving flag and test invocation commands ### MCP Server - [v0.9.0](https://github.com/elevenlabs/elevenlabs-mcp/releases/tag/v0.9.0) - Added option to return MCP server results as resource items for better integration with resource-based workflows ## API ## New Endpoints ### Agents Platform - `GET /v1/convai/test-invocations` - List all test invocations with pagination support - **Parameters:** - `agent_id` (required, string) - Filter by agent ID - `page_size` (optional, integer, default=30, max=100) - Number of results per page - `cursor` (optional, string) - Pagination cursor from previous response - **Response:** Returns paginated list with test run counts, pass/fail statistics, titles, and next cursor ## New Fields ### Agents Platform - **Agent Settings**: Added `archived` field (boolean, default false) to `AgentPlatformSettingsRequestModel` and `AgentPlatformSettingsResponseModel` for archiving agents - **MCP Servers**: Added `disable_interruptions` field (boolean, default false) to MCP server configuration schemas for preventing user interruptions during tool execution - **Workflows**: Added 
`edge_order` field (array of strings) to all workflow node types for explicit edge evaluation ordering - **Test Invocations**: Added `agent_id` field (string, nullable) to `GetTestSuiteInvocationResponseModel` for agent tracking ### Telephony - **SIP Trunks**: Added `remote_domains` field (array of strings, nullable) to `GetPhoneNumberInboundSIPTrunkConfigResponseModel` and `InboundSIPTrunkConfigRequestModel` for TLS certificate validation ### WebSocket Events - Added `agent_chat_response_part` to `ServerEventType` enum for streaming partial agent chat responses ### Voice Management - Added `VoiceGeneration` to speech history source types ## New LLM Models Added the following models to the LLM enum: - `claude-sonnet-4-5` - Claude Sonnet 4.5 latest - `claude-sonnet-4-5@20250929` - Claude Sonnet 4.5 dated release (September 29, 2025) - `gemini-2.5-flash-preview-09-2025` - Gemini 2.5 Flash preview (September 2025) - `gemini-2.5-flash-lite-preview-09-2025` - Gemini 2.5 Flash Lite preview (September 2025) ## Other Changes ### Pronunciation Dictionaries - Updated parameter description for `version_id` in `GET /v1/pronunciation-dictionaries/{dictionary_id}/{version_id}/download` from "The id of the version of the pronunciation dictionary" to "The id of the pronunciation dictionary version" - Removed documentation note about UI limitation for multiple pronunciation dictionaries (multiple dictionaries now fully supported in UI) ### Conversation History - Made `type` field optional in `ConversationHistoryTranscriptOtherToolsResultCommonModel` (previously required) # September 29, 2025 ### v1 TTS model deprecation The `eleven_monolingual_v1` and `eleven_multilingual_v1` models are deprecated and will be removed on December 15th, 2025. Please [migrate to newer models](https://elevenlabs.io/docs/models#deprecated-models) for continued service. ### Agents Platform - **Workflow Expressions**: Workflows now support complex expressions that allow for defining deterministic conditions using logical operators, dynamic variables and LLM evaluation. This enables more sophisticated agent logic and decision-making capabilities. - **MCP Server Interrupt Control**: Added option to disable interruptions during all tool calls for MCP Servers, providing better control over agent behavior during tool execution. - **Audio Alignment Data**: Agents now have a flag to enable alignment data in audio events, useful for audio-text synchronization use cases such as lip sync applications. - **Ignore Default Personality Setting**: The Agents Platform configuration page now includes a checkbox to toggle whether agents should ignore the default helpful personality, giving developers more control over agent behavior. ### Speech to Text - **Fixed Base64 Encoding Flag**: Resolved an issue where the `is_base64_encoded` flag in STT responses was incorrectly set to false for PDF and DOCX formats, even when content was actually base64 encoded. ### SDK Releases #### JavaScript SDK - **v2.16.0**: Updated with latest API schema changes from September 19, 2025. #### Packages - **@elevenlabs/types@0.0.1**: New public TypeScript types package providing shared type definitions across ElevenLabs integrations. - **@elevenlabs/react@0.7.0** and **@elevenlabs/client@0.7.0**: Added support for passing custom script paths to avoid blob: and data: URLs for improved security and flexibility. 
- **@elevenlabs/convai-widget-embed@0.3.0** and **@elevenlabs/convai-widget-core@0.3.0**: Added `use_rtc` attribute for widget functionality and added expand event support for better widget interaction handling. ### API ## Updated Endpoints ### Agents Platform - **POST /v1/convai/agents/create**: Added `ignore_default_personality` boolean parameter to control whether agents should ignore the default helpful personality behavior - **PATCH /v1/convai/agents/{agent_id}**: Added `ignore_default_personality` field support for agent updates - **GET /v1/convai/agents/{agent_id}**: Response now includes `ignore_default_personality` field in agent configuration - **POST /v1/convai/mcp-servers**: Added interrupt control configuration parameters for disabling interruptions during tool calls - **PATCH /v1/convai/mcp-servers/{mcp_server_id}**: Enhanced with interrupt control settings for MCP server configuration - **GET /v1/convai/mcp-servers/{mcp_server_id}**: Response includes new interrupt control configuration fields - **GET /v1/convai/conversations/{conversation_id}**: Response enhanced with alignment data fields for audio-text synchronization support - **POST /v1/convai/agent-testing/create**: Enhanced to support workflow expressions functionality in agent testing - **GET /v1/convai/agent-testing/{test_id}**: Response includes additional fields for workflow expression test results - **PUT /v1/convai/agent-testing/{test_id}**: Request and response schemas updated for workflow expression support - **POST /v1/convai/agents/{agent_id}/simulate-conversation**: Request schema updated to support workflow expressions in conversation simulation - **POST /v1/convai/agents/{agent_id}/simulate-conversation/stream**: Streaming conversation simulation with workflow expression support - **GET /v1/convai/settings**: Response includes new platform configuration options - **PATCH /v1/convai/settings**: Request schema updated with new platform settings - **POST /v1/convai/batch-calling/submit**: Request schema updates for batch calling functionality - **PATCH /v1/convai/mcp-servers/{mcp_server_id}/approval-policy**: Response schema updated for approval policy management - **POST /v1/convai/mcp-servers/{mcp_server_id}/tool-approvals**: Response schema enhanced for tool approval handling - **DELETE /v1/convai/mcp-servers/{mcp_server_id}/tool-approvals/{tool_name}**: Response schema updated for tool approval removal ### Speech to Text - **POST /v1/speech-to-text**: Fixed `is_base64_encoded` boolean flag to correctly return `true` when PDF and DOCX document content is base64 encoded ### Text to Speech - **POST /v1/text-to-speech/{voice_id}/with-timestamps**: Request and response schemas updated for enhanced timestamp functionality - **POST /v1/text-to-speech/{voice_id}/stream**: Request schema updated for improved streaming parameters - **POST /v1/text-to-speech/{voice_id}/stream/with-timestamps**: Request and response schemas updated for streaming with timestamps - **POST /v1/text-to-voice/create-previews**: Request schema enhanced with new preview generation options - **POST /v1/text-to-voice**: Response schema updated with additional voice creation data - **POST /v1/text-to-voice/{voice_id}/remix**: Request schema enhanced for voice remixing parameters ### Voice Management - **GET /v1/voices**: Response schema updated with new voice metadata fields - **GET /v1/voices/{voice_id}**: Response schema enhanced with additional voice properties - **GET /v1/voices/settings/default**: Response schema updated for default 
voice settings - **GET /v1/voices/{voice_id}/settings**: Response schema enhanced with new configuration options - **POST /v1/voices/{voice_id}/settings/edit**: Request schema updated for voice settings modification - **POST /v1/voices/pvc/{voice_id}/samples/{sample_id}**: Request schema enhanced for PVC sample management - **GET /v1/voices/pvc/{voice_id}/samples/{sample_id}/audio**: Response schema updated for audio sample retrieval - **GET /v1/voices/pvc/{voice_id}/samples/{sample_id}/speakers/{speaker_id}/audio**: Response schema enhanced for speaker-specific audio - **POST /v1/voice-generation/create-voice**: Response schema updated with new voice generation data ### Studio - **POST /v1/studio/podcasts**: Request schema enhanced with new podcast creation parameters ### User Management - **GET /v1/user**: Response schema updated with additional user profile data All changes are backward compatible and do not require immediate action from developers. # September 22, 2025 ### Productions launch Introducing Productions - our new managed service offering for ordering human-edited content that looks, sounds and feels natural. Made for creators and media businesses. Our network of linguists and audio professionals offer end-to-end production quality for: - Dubbing - Captions and subtitles - Transcription - Audiobooks You can order a project directly from the 'Productions' page in your ElevenLabs account, or by emailing productions@elevenlabs.io. Pricing starts at $2/minute, contact us for more details. ### Agents Platform - **MCP pre-tool speech**: Added support for configuring tools extracted from an MCP Server to require pre-tool execution speech. This enhancement allows agents to provide verbal context before executing specific tools, improving the conversational flow during tool usage. - **ElevenLabs hosted LLMs**: Added support for [ElevenLabs hosted LLMs](/docs/agents-platform/customization/llm#elevenlabs-experimental) which unlock lower latency by running on ElevenLabs infrastructure alongside Speech to Text and Text to Speech services. - **Enum values for tool parameters**: Added support for specifying a tool's parameters as [enum values](/docs/api-reference/tools/create#response.body.tool_config.WebhookToolConfig.api_schema.request_body_schema.properties.LiteralJsonSchemaProperty.enum) for greater control ### SDK Releases #### JavaScript SDK - **v2.16.0**: Updated the [elevenlabs-js](https://github.com/elevenlabs/elevenlabs-js) SDK with the latest API schema changes, including new MCP server endpoints and enhanced history filtering capabilities. #### Python SDK - **v2.16.0**: Updated the [elevenlabs-python](https://github.com/elevenlabs/elevenlabs-python) SDK with the latest API schema changes, including new MCP server endpoints and enhanced history filtering capabilities. - **v2.15.1**: Fixed conversation handling when no authentication is required and added asyncio event loop support for better async operations. #### Package Updates - **@elevenlabs/agents-cli@0.3.2**: Updated the Agents CLI package with improvements to agent development tools. The ConvAI CLI has been renamed to Agents CLI to align with the ElevenLabs Agents Platform branding. - **@elevenlabs/convai-cli@0.2.3**: Final release of the legacy ConvAI CLI package before migration to the new Agents CLI. - **@elevenlabs/react@0.6.3**: Updated the React components package with enhanced functionality. 
### API ## New Endpoints - `PATCH /v1/convai/mcp-servers/{mcp_server_id}` - [Update MCP Server Configuration](/docs/api-reference/mcp/update): Added new endpoint to update MCP server configurations, replacing the deprecated approval policy endpoint. ## Updated Endpoints ### History Management - `GET /v1/history` - [Get generated items](/docs/api-reference/history/list): Enhanced with additional filtering parameters: - Added `model_id` parameter for filtering by specific models - Added `date_before_unix` parameter for filtering items before a specific date - Added `date_after_unix` parameter for filtering items after a specific date - Added `sort_direction` parameter for controlling sort order ## Deprecated Endpoints - `PATCH /v1/convai/mcp-servers/{mcp_server_id}/approval-policy` - Deprecated in favor of the new general MCP server update endpoint # September 15, 2025 ### Text to Speech - **WebSocket output format**: Added support for specifying output format in the first message of a WebSocket connection, providing greater flexibility for real-time audio streaming workflows. ### Agents Platform - **First message interruption control**: Added `disable_first_message_interruptions` setting to prevent agents from being interrupted during important opening messages like legal disclaimers. ### MCP Server - **Version 0.8.1**: Added data residency support. ## SDK Releases ### JavaScript SDK - **v2.15.0** - Added new Text to Voice Remix endpoint ### Python SDK - **v2.15.1** - Fixed conversation authentication issue and added asyncio event loop support - **v2.15.0** - Added new Text to Voice Remix endpoint and fixed Pydantic issues ### Packages - **@elevenlabs/react@0.6.2** - Added correction and MCP tool call events - **@elevenlabs/client@0.6.2** - Added correction and MCP tool call events - **@elevenlabs/react-native@0.3.1** - Added correction and MCP tool call events ## API ## New Endpoints - `DELETE /v1/speech-to-text/transcripts/{transcription_id}` - [Delete Transcript By Id](/docs/api-reference/speech-to-text/delete) ## Updated Endpoints ### Backward Compatible Changes - [Get dubbing](/docs/api-reference/dubbing/get) - Added the optional `order_by` and `order_direction` parameters. - [List Agents](/docs/api-reference/agents/list) - Added the optional `sort_by` and `sort_direction` parameters. - [List knowledge base documents](/docs/api-reference/knowledge-base/list) - Added the optional `sort_by` and `sort_direction` parameters. # September 8, 2025 ### Text to Speech - **Language code support**: All Text to Speech models now support language codes for improved output. Normalization has been enabled for Eleven v3, Flash, and Turbo models to enhance audio quality and consistency. ### Agents Platform - **Multi-voice agent history**: Messages from multi-voice agents are now displayed in conversation history with clear separation by voice, making it easier to follow which voice spoke which part of a conversation. ### SDK Releases #### JavaScript SDK - **v2.15.0** - Adds support for new voice remix functionality #### Python SDK - **v2.15.0** - Adds support for new voice remix functionality. Also fixed an issue with Pydantic. #### React Components - **@elevenlabs/react@0.6.1** - Fix output bytes and device input/output switching - **@elevenlabs/client@0.6.1** - Fix output bytes and device input/output switching ### MCP Server - **v0.7.0** - Latest release of the [ElevenLabs MCP Server](https://github.com/elevenlabs/elevenlabs-mcp) with new features and improvements for Claude Desktop integration. 
Includes new `loop` parameter for SFX generation. ### API ## New Endpoints - [Remix a voice](/docs/api-reference/text-to-voice/remix) - Create voice variations from existing voices - [Get Transcript By Id](/docs/api-reference/speech-to-text/get) - Retrieve specific transcription results ## Updated Endpoints ### Backward Compatible Changes - [Get Project](/docs/api-reference/studio/get-project) - Added optional `share_id` query parameter for project sharing functionality - [Convert Speech to Text](/docs/api-reference/speech-to-text/convert) - Modified `enable_logging` parameter for improved logging control All API changes in this release are backward compatible and will not break existing integrations. # September 1, 2025 ### Agents Platform - **Gemini 2.5 Flash Lite HIPAA compliance**: Added Gemini 2.5 Flash Lite to the list of [HIPAA approved models](/docs/agents-platform/legal/hipaa) for compliant conversations when a BAA is signed and zero-retention mode is enabled. - **Conversation ID in signed URLs**: Added support for including conversation IDs in signed URL requests, providing better tracking and identification capabilities for conversation audio access. ## SDK Releases ### JavaScript SDK - **[v2.13.0](https://github.com/elevenlabs/elevenlabs-js)** - Released August 29, 2025. Adds support for new `loop` parameter in SFX. ### Python SDK - **[v2.13.0](https://github.com/elevenlabs/elevenlabs-python)** - Released August 29, 2025. Adds support for new `loop` parameter in SFX. ### ConvAI packages - **[@elevenlabs/react v0.6.0 and @elevenlabs/client v0.6.0](https://github.com/elevenlabs/packages)** - Released August 29, 2025. Fixed setVolume functionality, added client tool debugging, and added audio device controls. ### MCP Server - **[ElevenLabs MCP Server v0.6.0](https://github.com/elevenlabs/elevenlabs-mcp)** - Released August 26, 2025. Fixed diarization functionality in speech-to-text and added music generation endpoints. ## API ## Updated Endpoints ### Dubbing - **[Render project](/docs/api-reference/dubbing/resources/render-project)** - Added optional `should_normalize_volume` query parameter to control audio normalization during rendering ### Agents Platform - **[Get signed URL](/docs/api-reference/conversations/get-signed-url)** - Added optional `include_conversation_id` query parameter to include conversation ID in the response ### Sound Effects - **[Create sound effect](/docs/api-reference/text-to-sound-effects/convert)** - Added optional `loop` parameter to create sound effects that loop smoothly ## Removed Endpoints - **Delete workspace member** - Removed the `DELETE /v1/workspace/members` endpoint for deleting workspace members. This endpoint was never meant to be publicly available. # August 25, 2025 ### Agents Platform - **Agent testing framework**: Introduced a comprehensive testing framework for ElevenLabs agents, allowing developers to create, manage, and execute automated tests for their agents. This includes test creation, execution tracking, and result analysis capabilities. - **Test invocation management**: Added support for resubmitting failed test invocations and viewing detailed test results to help developers debug and improve their agents. - **Enhanced agent configuration**: Improved agent creation and management with additional workspace override capabilities and refined platform settings. 
### Text to Speech - **Pronunciation dictionary updates**: Added support for updating pronunciation dictionaries with PATCH operations, enabling more flexible dictionary management. - **Enhanced timestamp support**: Improved timestamp generation for text-to-speech conversions with better alignment data and streaming capabilities. ### SDK Releases - **TypeScript SDK v2.12.2**: Updated with the latest API schema changes, including full support for the new agent testing endpoints and enhanced Agents Platform capabilities. - **Python SDK v2.12.1**: Released with complete support for all new API features, including agent testing framework and improved workspace resource management. ### API ## New Endpoints Added 10 new endpoints this week: ### ElevenLabs agent Testing - `POST /v1/convai/agent-testing/create` - [Create Agent Response Test](/docs/api-reference/tests/create) - Create automated tests for your ElevenLabs agents - `GET /v1/convai/agent-testing/{test_id}` - [Get Agent Response Test By Id](/docs/api-reference/tests/get) - Retrieve specific test configurations and results - `PUT /v1/convai/agent-testing/{test_id}` - [Update Agent Response Test](/docs/api-reference/tests/update) - Modify existing test setups and parameters - `DELETE /v1/convai/agent-testing/{test_id}` - [Delete Agent Response Test](/docs/api-reference/tests/delete) - Remove test configurations from your workspace - `POST /v1/convai/agent-testing/summaries` - [Get Agent Response Test Summaries By Ids](/docs/api-reference/tests/summaries) - Retrieve aggregated test results for multiple tests - `GET /v1/convai/agent-testing` - [List Agent Response Tests](/docs/api-reference/tests/list) - Browse all available tests in your workspace - `POST /v1/convai/agents/{agent_id}/run-tests` - [Run Tests On The Agent](/docs/api-reference/tests/run-tests) - Execute test suites against specific agents - `GET /v1/convai/test-invocations/{test_invocation_id}` - [Get Test Invocation](/docs/api-reference/tests/test-invocations/get) - Retrieve detailed test execution results - `POST /v1/convai/test-invocations/{test_invocation_id}/resubmit` - [Resubmit Tests](/docs/api-reference/tests/test-invocations/resubmit) - Re-run failed test invocations ### Pronunciation Dictionaries - `PATCH /v1/pronunciation-dictionaries/{pronunciation_dictionary_id}` - [Update Pronunciation Dictionary](/docs/api-reference/pronunciation-dictionaries/update) - Update existing pronunciation dictionaries with new rules or modifications # August 20, 2025 ### Eleven v3 API Eleven v3 is now available via the API. To start using it, simply specify the model ID `eleven_v3` when making [Text to Speech requests](/docs/api-reference/text-to-speech/convert). Additionally the [Text to Dialogue](/docs/cookbooks/text-to-dialogue) API endpoint is now available to all. ### Music Generation API The Eleven Music API is now freely available to all paid users. Visit the [quickstart](/docs/cookbooks/music/quickstart) to lean how to integrate. The API section below highlights the new endpoints that have been released. ### Global TTS API preview ElevenLabs is launching inference servers in additional geographical regions to reduce latency for clients outside of the US. Initial request processing will be available in the Netherlands and in Singapore in addition to the US. To learn how to get started [head to the docs](/docs/best-practices/latency-optimization#global-tts-api-preview). 
### API ## New Endpoints - Added 4 new endpoints: - [Compose music](/docs/api-reference/music/compose) - Create music from text prompts - [Create composition plan](/docs/api-reference/music/create-composition-plan) - Optimize music generation parameters before processing - [Compose music with details](/docs/api-reference/music/compose-detailed) - Advanced music generation with detailed parameters - [Stream music](/docs/api-reference/music/stream) - Real-time streaming music generation ## Updated Endpoints ### Text to Speech - Updated Text to Speech endpoints with improved parameter handling: - [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Enhanced voice settings and text input parameter handling - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Improved streaming parameter management - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Better alignment parameter handling ### Voice Management - Updated Voice endpoints with enhanced parameter support: - [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Improved preview generation parameters - [Create voice from preview](/docs/api-reference/text-to-voice/create) - Enhanced voice creation options - [Get voice](/docs/api-reference/voices/get) - Updated voice parameter responses - [List voices](/docs/api-reference/voices/search) - Improved voice listing parameters ### Speech to Text - Updated Speech to Text endpoint: - [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Enhanced transcription parameter handling ### Usage and Analytics - Updated Usage endpoints: - [Get character stats](/docs/api-reference/usage/get) - Added aggregation bucket size parameter and improved breakdown type options ### Workspace Management - Updated Workspace endpoints: - [Get workspace resource](/docs/api-reference/workspace/get-resource) - Enhanced resource type parameter handling - [Share workspace resource](/docs/api-reference/workspace/share-workspace-resource) - Updated sharing parameter structure - [Unshare workspace resource](/docs/api-reference/workspace/unshare-workspace-resource) - Updated unsharing parameter structure # August 11, 2025 ### Music **Eleven Music**: Officially released new music generation model that creates studio-grade music with natural language prompts in any style. See the [capabilities page](/docs/capabilities/music) and [prompting guide](/docs/best-practices/prompting/eleven-music) for more information. ### SDKs v2.9.0 of the TypesScript SDK released - Includes better typing support for Speech to Text requests in webhook mode - Includes new enums for ChatGPT 5 v2.9.2 of the Python SDK released - Includes new enums for ChatGPT 5 ### Agents Platform **Agent response correction**: Updated WebSocket event schema and handling for improved agent response correction functionality. ### API ### User Account Changes - Updated user account endpoint: - [Get user subscription info](/docs/api-reference/user/get) - Deprecated `convai_chars_per_minute` and `convai_asr_chars_per_minute` fields in the response schema. These fields will now always return `None`. ### Parameter Removals - Updated conversation token endpoint: - [Get conversation token](/docs/api-reference/conversations/get-webrtc-token) - Removed `source` and `version` query parameters. These were internal parameters not meant for public use and their removal does not affect functionality. 
# August 4, 2025 ### Agents Platform - **Conversation token generation**: Added new route to generate Conversation Tokens for WebRTC connections. [Learn more](/docs/api-reference/conversations/get-webrtc-token) - **Expandable widget options**: Our embeddable [widget](/docs/agents-platform/customization/widget) can now be customized to start in the expanded state and disable collapsing altogether. - **Simplified operation IDs**: We simplified the OpenAPI operation IDs for Agents Platform endpoints to improve developer experience. ### Workspaces - **Simplified operation IDs**: We simplified the operation IDs for our workspace endpoints to improve API usability. ### SDK Releases - **Python SDK v2.8.2**: Released latest version with improvements and bug fixes. [View release](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.8.2) ### NPM Packages - **@elevenlabs/react-native@0.1.2**: Enhanced React Native support - **@elevenlabs/client@0.4.4**: Client library improvements - **@elevenlabs/react@0.4.5**: React component updates ### API ## New Endpoints ### Agents Platform - [Get conversation token](/docs/api-reference/conversations/get-webrtc-token) - Generate authentication token for WebRTC connections ## Updated Endpoints ### Voice Management - [List voices](/docs/api-reference/voices/search) - Added `voice_ids` query parameter for filtering specific voices ### Agents Platform Core - [List conversations](/docs/api-reference/conversations/list) - Added `summary_mode` parameter for conversation summaries ### Operation ID Improvements - **Agents Platform endpoints**: Simplified operation IDs for better developer experience while maintaining full backward compatibility - **Workspace endpoints**: Streamlined operation IDs across all workspace-related endpoints to improve API usability # July 28, 2025 ### Workspaces - **Service account API key management**: Added comprehensive API endpoints for managing service account API keys, including creation, retrieval, updating, and deletion capabilities. See [Service Accounts documentation](/docs/product-guides/administration/workspaces/service-accounts). ### Agents Platform - **Post-call webhook migration**: The post-call webhook format is being migrated so that webhook handlers can be auto-generated in the SDKs. This is not a breaking change, and no further action is required if your current handler accepts additional fields. Please see more information [here](/docs/conversational-ai/workflows/post-call-webhooks#migration-notice-enhanced-webhook-format). - **Agent transfer improvements**: Fixed system variable `system_agent_id` to properly update after agent-to-agent transfers, ensuring accurate conversation context tracking. Added new `system_current_agent_id` variable for tracking current active agent. Learn more about [dynamic variables](/docs/agents-platform/customization/personalization/dynamic-variables#system-dynamic-variables). - **Enhanced public agent page**: Added text input functionality and dynamic variable support to the public talk-to-agent page. You can now pass dynamic variables via URL parameters (e.g., `?var_username=value`) and use text input during voice conversations. See [dynamic variables guide](/docs/agents-platform/customization/personalization/dynamic-variables#public-talk-to-page-integration). - **Voicemail detection**: Added voicemail detection as a built-in tool for ElevenLabs agents to improve call handling.
Learn about [voicemail detection](/docs/agents-platform/customization/tools/system-tools/voicemail-detection). - **Conversation filtering**: Added `user_id` query parameter to [conversation list endpoint](/docs/agents-platform/api-reference/conversations/list#request.query.user_id.user_id) for filtering conversations by initiating user. ### Speech to Text - **Multi-channel transcription**: Added `use_multi_channel` parameter to transcription endpoint for processing audio files with multiple speakers on separate channels. Supports up to 5 channels with per-channel transcription results. See [multichannel guide](/docs/cookbooks/speech-to-text/multichannel-transcription). ### Studio - **Caption support**: Added caption functionality to Studio projects with new `captions_enabled` and `caption_style` properties for both podcasts and general projects. Learn more about [Studio](/docs/product-guides/products/studio). ## SDKs - **[JavaScript SDK v2.7.0](https://github.com/elevenlabs/elevenlabs-js)**: Released with latest API support and improvements - **[Python SDK v2.8.1](https://github.com/elevenlabs/elevenlabs-python)**: Released with latest API support and improvements - **[@elevenlabs/client v0.4.1](https://github.com/elevenlabs/packages/tree/main/packages/client)**: Updated client library with latest features, including WebRTC support - **[@elevenlabs/react v0.4.1](https://github.com/elevenlabs/packages/tree/main/packages/react)**: Enhanced React components with latest features, including WebRTC support - **[@elevenlabs/react-native v0.1.1](https://github.com/elevenlabs/packages/tree/main/packages/react-native)**: New React Native package for mobile integration with ElevenLabs Agents, based on WebRTC - **[@elevenlabs/convai-widget-embed v0.1.0](https://github.com/elevenlabs/packages/tree/main/packages/convai-widget-embed)**: New package for embedding Agents Platform widgets into web applications - **[Swift SDK v2.0.3](https://github.com/elevenlabs/elevenlabs-swift-sdk/releases/tag/v2.0.3)**: Released with WebRTC support for real-time Agents Platform integration on Apple platforms ## API Schema Updates ### New Endpoints - **Service Account Management**: Added 5 new endpoints for service account API key management: - `GET /v1/service-accounts/{service_account_user_id}/api-keys` - Retrieve service account API keys - `POST /v1/service-accounts/{service_account_user_id}/api-keys` - Create service account API key - `DELETE /v1/service-accounts/{service_account_user_id}/api-keys/{api_key_id}` - Delete service account API key - `PATCH /v1/service-accounts/{service_account_user_id}/api-keys/{api_key_id}` - Update service account API key - `GET /v1/service-accounts` - Get workspace service accounts ### Removed Endpoints - **Legacy Project Endpoints**: Removed 22 deprecated project management endpoints as part of Studio API consolidation: - All `/v1/projects/*` endpoints (replaced by `/v1/studio/projects/*`) - Legacy Text to Voice endpoints (`/v1/text-to-voice/create-voice-from-preview`, `/v1/text-to-voice/remixing-sessions/*`) - Legacy ConvAI knowledge base endpoints ### Updated Endpoints #### Speech to Text - **Multi-channel support**: Updated `/v1/speech-to-text` endpoint: - Added `use_multi_channel` parameter for processing multi-speaker audio files - Modified response structure to include optional `language_code`, `language_probability`, `text`, and `words` properties #### Agents Platform - **Enhanced agent configuration**: Updated agent creation and management endpoints: - Added voicemail 
detection to built-in tools - Improved RAG configuration with `max_retrieved_rag_chunks_count` parameter - Enhanced conversation token endpoint with `source` and `version` parameters - Added `user_id` filtering to conversations list endpoint #### Studio Projects - **Caption support**: Updated Studio project endpoints to include: - `captions_enabled` property for enabling/disabling captions - `caption_style` property for global caption styling configuration #### Text to Voice - **Improved voice generation**: Enhanced voice creation endpoints with: - `loudness` control (-1 to 1 range, 0 corresponds to -24 LUFS) - `quality` parameter for balancing output quality vs variety - `guidance_scale` parameter for controlling AI creativity vs prompt adherence # July 22, 2025 ### Agents Platform - **Agent workspace overrides**: Enhanced agent configuration with workspace-level overrides for better enterprise management and customization. - **Agent API improvements**: Updated agent creation and modification endpoints with enhanced configuration options, though these changes may break backward compatibility. ### Dubbing - **Dubbing endpoint access**: Added a new endpoint to list all available dubs. ### API ## New Endpoints - Added 1 new endpoint: - [List dubs you have access to](/docs/api-reference/dubbing/list) - `GET /v1/dubbing` ## Updated Endpoints ### Text to Speech - Updated Text to Speech endpoints with backward compatible changes: - [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Enhanced response schema - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Improved timestamp handling - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Enhanced streaming response ### Voice Management - Updated Voice endpoints with backward compatible improvements: - [Get voices](/docs/api-reference/voices/get-all) - Enhanced voice information schema - [Get voice](/docs/api-reference/voices/get) - Improved voice details response - [Get voice settings](/docs/api-reference/voices/get-settings) - Enhanced settings schema ### Voice Creation - Updated Voice Creation endpoints: - [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Enhanced preview creation - [Create voice from preview](/docs/api-reference/text-to-voice/create) - Improved voice generation - [Create voice](/docs/api-reference/text-to-voice/create) - Enhanced voice creation response ### Dubbing - Updated Dubbing endpoints with backward compatible changes: - [Dub a video or audio file](/docs/api-reference/dubbing/create) - Enhanced dubbing request schema - [Get dubbing project](/docs/api-reference/dubbing/get) - Improved project response ### Workspace Management - **Breaking Change**: Updated Workspace endpoints: - [Get workspace resource](/docs/api-reference/workspace/get-resource) - Modified `resource_type` query parameter handling and response schema - [Share workspace resource](/docs/api-reference/workspace/share-workspace-resource) - Enhanced sharing configuration - [Unshare workspace resource](/docs/api-reference/workspace/unshare-workspace-resource) - Improved unsharing workflow ### Speech to Text - Updated Speech to Text endpoint: - [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Enhanced transcription request and response schemas ### Agents Platform Updated Agents Platform endpoints with the following changes: - [Create agent](/docs/api-reference/agents/create) - Modified agent creation schema with workspace overrides - 
[Get agent](/docs/api-reference/agents/get) - Enhanced agent response with new configuration options - [Update agent](/docs/api-reference/agents/update) - Improved agent update capabilities - [Simulate conversation](/docs/api-reference/agents/simulate-conversation) - Enhanced conversation simulation - [Stream conversation simulation](/docs/api-reference/agents/simulate-conversation-stream) - Improved streaming simulation ### Other Updates - [Get conversation](/docs/api-reference/conversations/get-conversation) - Enhanced conversation details - [Get Agents Platform settings](/docs/api-reference/workspace/get) - Improved settings response - [Update Agents Platform settings](/docs/api-reference/workspace/update) - Enhanced settings modification # July 14, 2025 ### Agents Platform - **Azure OpenAI custom LLM support**: Added support for Azure-hosted OpenAI models in custom LLM configurations. When using an Azure endpoint, a new required field for [API version](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.custom_llm.api_version) is now available in the UI. - **Genesys output variables**: Added support for output variables when using [Genesys integrations](/docs/agents-platform/phone-numbers/c-caa-s-integrations/genesys), enabling better call analytics and data collection. - **Gemini 2.5 Preview Models Deprecation**: [Models](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm) `gemini-2.5-flash-preview-05-20` and `gemini-2.5-flash-preview-04-17` have been deprecated in Agents Platform as they are being deprecated on 15th July by Google. All agents using these models will automatically be transferred to `gemini-2.5-flash` the next time they are used. No action is required. - **WebRTC rollout**: Began progressive rollout of WebRTC capabilities for improved connection stability and performance. WebRTC mode can be selected in the React SDK and is used in 11.ai. - **Keypad touch tone**: Fixed an issue affecting playing keypad touch tones on Twilio. See [keypad touch tone documentation](/docs/agents-platform/customization/tools/system-tools/play-keypad-touch-tone). ### Voices - **Language collection navigation**: Added quick navigation from language preview collections to view all available voices in that language, making it easier to explore voice options by language. ### Text to Voice - **Preview streaming**: Added new streaming endpoint for Text to Voice previews, allowing real-time streaming of generated voice previews via `/v1/text-to-voice/{generated_voice_id}/stream`. - **Enhanced voice design**: Added [`stream_previews`](/docs/api-reference/text-to-voice/design#request.body.stream_previews) option to voice design endpoint, enabling streaming-only preview generation for improved performance. - **Improved parameter controls**: Enhanced [`loudness`](/docs/api-reference/text-to-voice/design#request.body.loudness), quality, and guidance scale parameters with better control options for more precise voice generation. ### Studio - **Podcast customization**: Added support for [intro](/docs/api-reference/studio/create-podcast#request.body.intro) and [outro](/docs/api-reference/studio/create-podcast#request.body.outro) text in podcast creation, along with custom instructions prompts for better style and tone control. 
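As an illustration of the new preview streaming endpoint above, here is a minimal sketch assuming the `requests` library and a `generated_voice_id` obtained from a prior voice design call; the GET method is an assumption here, so check the endpoint reference before relying on it:

```python
# Minimal sketch: stream a generated Text to Voice preview to a local file.
# Assumes a GET request against the documented path; verify the HTTP method
# and response format in the API reference.
import os
import requests

generated_voice_id = "GENERATED_VOICE_ID"  # placeholder from a prior design call
url = f"https://api.elevenlabs.io/v1/text-to-voice/{generated_voice_id}/stream"
with requests.get(
    url,
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    stream=True,
) as response:
    response.raise_for_status()
    with open("preview.mp3", "wb") as f:
        for chunk in response.iter_content(chunk_size=4096):
            f.write(chunk)
```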
### SDKs - **[JavaScript SDK v2.6.0](https://github.com/elevenlabs/elevenlabs-js)**: Released with latest API support and improvements - **[Python SDK v2.7.1](https://github.com/elevenlabs/elevenlabs-python)**: Released with bug fixes and enhancements - **[@elevenlabs/client v0.3.0](https://github.com/elevenlabs/packages/tree/main/packages/client)**: Updated client library with support for User IDs in Agents Platform. - **[@elevenlabs/react v0.3.0](https://github.com/elevenlabs/packages/tree/main/packages/react)**: Add WebRTC debug support. ### API ## New Endpoints - Added 1 new endpoint: - [Stream Text to Voice Preview](/docs/api-reference/text-to-voice/stream) - Stream generated voice previews in real-time ## Updated Endpoints ### Text to Voice - [Create voice previews](/docs/api-reference/text-to-voice/create) - Enhanced `loudness`, `quality`, and `guidance_scale` parameter descriptions - [Design voice](/docs/api-reference/text-to-voice/design) - Added `stream_previews` property for streaming-only preview generation ### Studio - [Create podcast](/docs/api-reference/studio/create-podcast) - Added `intro`, `outro`, and `instructions_prompt` properties ### Agents Platform - [Simulate conversation](/docs/api-reference/agents/simulate-conversation) - Enhanced simulation configuration with improved parameter descriptions - [Stream simulate conversation](/docs/api-reference/agents/simulate-conversation-stream) - Enhanced simulation configuration with improved parameter descriptions - [Get Agents Platform settings](/docs/api-reference/workspace/get) - Updated RAG retention period configuration - [Update Agents Platform settings](/docs/api-reference/workspace/update) - Updated RAG retention period configuration - [Retry batch calling](/docs/api-reference/batch-calling/retry) - Added batch retry functionality # July 7, 2025 ### Agents Platform - **HIPAA Compliance**: [Gemini 2.5 Flash is now available for HIPAA customers](/docs/agents-platform/legal/hipaa), providing enhanced AI capabilities while maintaining strict healthcare compliance standards. - **Post-call Audio**: Added support for returning call audio in [post-call webhooks](/docs/agents-platform/workflows/post-call-webhooks), enabling comprehensive conversation analysis and quality assurance workflows. - **Enhanced Widget**: Added additional [text customization options](/docs/agents-platform/customization/widget) including start chat button text, chatting status text, and input placeholders for text-only and new conversations. - **Agent Transfers**: Improved [agent transfer capabilities](/docs/agents-platform/customization/tools/system-tools/agent-transfer) with transfer delay configuration, custom transfer messages, and control over transferred agent first message behavior. - **SIP Trunk Enhancements**: Added support for separate inbound and outbound [SIP trunk configurations](/docs/agents-platform/phone-numbers/sip-trunking) with enhanced access control and transfer options. ### Dubbing - **API Schema Update**: Updated our API documentation to explicitly require the `target_language` parameter for [dubbing projects](/docs/capabilities/dubbing). This parameter has always been required - we're just making it clearer in our docs. No code changes needed. - **Duration Validation**: Added validation to ensure calculated duration makes sense, preventing zero-credit charges for invalid audio uploads. 
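To make the dubbing note above concrete, here is a minimal sketch of creating a dubbing project while passing `target_language`; it assumes the `requests` library and that the create endpoint accepts a multipart upload at `POST /v1/dubbing`, so treat the exact field names as assumptions and confirm them in the dubbing API reference:

```python
# Minimal sketch: create a dubbing project with the required target_language.
# The endpoint path and multipart field names are assumptions; check the
# dubbing API reference for the authoritative parameter list.
import os
import requests

with open("interview.mp4", "rb") as source_file:
    response = requests.post(
        "https://api.elevenlabs.io/v1/dubbing",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        data={"target_language": "es"},  # always required, as noted above
        files={"file": source_file},
    )
response.raise_for_status()
print(response.json())  # typically includes the new dubbing project ID
```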
### Speech to Text - **Deterministic Sampling**: Added `seed` parameter support for deterministic sampling, enabling reproducible [speech-to-text results](/docs/capabilities/speech-to-text). ### Forced Alignment - **Confidence Scoring**: Added confidence scoring with `loss` field for words and overall transcript accuracy assessment using [forced alignment](/docs/capabilities/forced-alignment). ### Usage Analytics - **Workspace Breakdown**: Added reporting workspace ID breakdown for character usage statistics, providing detailed usage insights across [workspaces](/docs/product-guides/administration/workspaces/overview). ### SDKs - **React Agents Platform SDK**: Released [v0.2.0](https://github.com/elevenlabs/packages/releases/tag/%40elevenlabs%2Freact%400.2.0) with support for Indian data residency and WebRTC mode for Agents Platform. - **Python SDK**: Released [v2.6.1](https://github.com/elevenlabs/elevenlabs-python/releases) with enhanced Agents Platform capabilities and bug fixes. - **JavaScript SDK**: Released [v2.5.0](https://github.com/elevenlabs/elevenlabs-js/releases) with improved Agents Platform SDK support and new features. ### API ## Deprecations - `POST /v1/convai/phone-numbers/create` has been deprecated in favor of [POST /v1/convai/phone-numbers](/docs/api-reference/phone-numbers/create). Please note that migrating to the new endpoint requires a few adjustments: - Replace `provider_config` field with `inbound_trunk` and `outbound_trunk` for SIP trunk configurations - Update response parsing to handle the new trunk configuration structure ### Schema Removals - Removed `SIPTrunkConfigResponseModel`, `SIPTrunkCredentials`, `TransferToNumberToolConfig` - Removed `incomplete_expired` and `canceled` subscription statuses ## New Features ### Enhanced SIP Trunk Support - [SIP trunk configuration](/docs/agents-platform/phone-numbers/sip-trunking) now uses separate inbound and outbound trunk configs instead of single configuration - Deprecated `provider_config` field in SIP trunk response from [the new endpoint](/docs/api-reference/phone-numbers/create) (replaced with `inbound_trunk` and `outbound_trunk`) - Inbound trunk access control with allowed addresses and phone numbers - SIP URI transfer destinations alongside phone number transfers - Transfer to number improvements (conference or SIP refer) ### Agent Transfers - [Transfer delay configuration](/docs/agents-platform/customization/tools/system-tools/agent-transfer) with `delay_ms` - Custom transfer messages - Control over transferred agent first message behavior ### Conversation Enhancements - ElevenLabs Assistant integration tracking - User ID tracking for conversation participants and initiators - Audio data in [post-call webhooks](/docs/agents-platform/workflows/post-call-webhooks) (configurable) - [MCP (Model Context Protocol)](/docs/agents-platform/customization/mcp) tool call details in conversation history ### Widget Improvements - Additional [text customization options](/docs/agents-platform/customization/widget): - Start chat button text - Chatting status text - Input placeholders for text-only and new conversations ### API Improvements #### Speech to Text - Added deterministic sampling with `seed` parameter in [Convert speech to text](/docs/api-reference/speech-to-text/convert) #### Forced Alignment - Added confidence scoring with `loss` field for words and overall transcript in [Forced alignment](/docs/api-reference/forced-alignment/create) #### Usage Analytics - Added reporting workspace ID breakdown for character 
stats in [Get characters usage metrics](/docs/api-reference/usage/get) #### Tool Configuration - [Client tool](/docs/agents-platform/customization/tools/client-tools) response timeout increased from 30 to 120 seconds #### Workspace Resources - Added agent response tests resource type ## Deprecations - Phone number `provider_config` field (use `inbound_trunk`/`outbound_trunk` instead) - `phone_number` field in transfer configurations (use `transfer_destination` instead) # June 30, 2025 ### Text to Voice - **Voice Design**: Launched new [Text to Voice Design](/docs/api-reference/text-to-voice/design#request.body.model_id) with Eleven v3 for creating custom voices from text descriptions. ### Speech to Text - **Enhanced Diarization**: Added `diarization_threshold` parameter to the [Speech to Text](/docs/api-reference/speech-to-text/convert#request.body.diarization_threshold.diarization_threshold) endpoint. Fine-tune the balance between speaker accuracy and total speaker count by adjusting the threshold between 0.1 and 0.4. ### Professional Voice Cloning - **Background Noise Removal**: Added `remove_background_noise` to clean up voice samples using audio isolation models for [better quality training data](/docs/api-reference/voices/pvc/samples/create#request.body.remove_background_noise.remove_background_noise). ### Studio - **Video Support Detection**: Added `has_video` property to chapter responses to indicate whether [chapters contain video content](/docs/api-reference/studio/get-chapters#response.body.chapters.has_video). ### Workspaces - **Service Account Groups**: Service accounts can now be added to workspace groups for better permission management and access control. - **Workspace Authentication**: Added support for workspace authentication connections, enabling secure webhook tool integrations with external services. ### SDKs - **Python SDK**: Released [v2.6.0](https://github.com/elevenlabs/elevenlabs-python/releases) with latest API support and bug fixes. - **JavaScript SDK**: Released [v2.5.0](https://github.com/elevenlabs/elevenlabs-js/releases) with latest API support and bug fixes. - **React Agents Platform SDK**: Added WebRTC support in [0.2.0](https://github.com/elevenlabs/packages/releases/tag/%40elevenlabs%2Freact%400.2.0) ### API ## New Endpoints - Added 2 new endpoints: - [Design a Voice](/docs/api-reference/text-to-voice/design) - Create voice previews from text descriptions - [Create Voice From Preview](/docs/api-reference/text-to-voice/create) - Convert voice previews to permanent voices ## Updated Endpoints ### Speech to Text - [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `diarization_threshold` parameter for fine-tuning speaker separation ### Voice Management - [Get voice sample audio](/docs/api-reference/voices/pvc/samples/create#request.body.remove_background_noise.remove_background_noise) - Added `remove_background_noise` query parameter and moved from request body to query parameters # June 23, 2025 ### Tools migration - **Agents Platform tools migration**: The way tools in Agents Platform are handled is being migrated, please see the guide here to understand [what's changing and how to migrate](/docs/agents-platform/customization/tools/agent-tools-deprecation) ### Text to Speech - **Audio tags automatic removal**: Audio tags are now automatically removed when switching from V3 to V2 models, ensuring optimal compatibility and performance. 
### Agents Platform - **Tools management UI**: Added a new comprehensive [tools management interface](/app/agents/tools) for creating, configuring, and managing tools across all agents in your workspace. - **Streamlined agent creation**: Introduced a new [agent creation flow](/app/agents/new) with improved user experience and better configuration options. - **Agent duplication**: Added the ability to [duplicate existing agents](/docs/api-reference/agents/duplicate), allowing you to quickly create variations of successful agent configurations. ### SIP Trunking - **Inbound media encryption**: Added support for configurable [inbound media encryption settings](/docs/agents-platform/phone-numbers/sip-trunking#configure-transport-and-encryption) for SIP trunk phone numbers, enhancing security options. ### Voices - **Famous voice category**: Added a new "famous" voice category to the voice library, expanding the available voice options for users. ### Dubbing - **CSV frame rate control**: Added `csv_fps` parameter to control frame rate when parsing CSV files for dubbing projects, providing more precise timing control. ## SDKs - **ElevenLabs JavaScript SDK v2.4.0**: Released with new Agents Platform SDK support for Node.js. [View release notes](https://github.com/elevenlabs/elevenlabs-js/releases) - **ElevenLabs Python SDK v2.5.0**: Updated with enhanced Agents Platform capabilities. [View release notes](https://github.com/elevenlabs/elevenlabs-python/releases) ### API ## New Endpoints ### Agents Platform - [Duplicate agent](/docs/api-reference/agents/duplicate) - Create a new agent by duplicating an existing one - [Create tool](/docs/api-reference/tools/create) - Add a new tool to the available tools in the workspace - [List tools](/docs/api-reference/tools/list) - Retrieve all tools available in the workspace - [Get tool](/docs/api-reference/tools/get) - Retrieve a specific tool configuration - [Update tool](/docs/api-reference/tools/update) - Update an existing tool configuration - [Delete tool](/docs/api-reference/tools/delete) - Remove a tool from the workspace - [Get tool dependent agents](/docs/api-reference/tools/get-dependent-agents) - List all agents that depend on a specific tool ## Updated Endpoints ### Agents Platform - **Agent configuration**: - Added `built_in_tools` configuration for system tools management - Deprecated inline `tools` configuration in favor of `tool_ids` for better tool management - **Tool system**: - Refactored tool configuration structure to use centralized tool management ### Dubbing - **CSV processing**: - [Create dubbing project](/docs/api-reference/dubbing/create) - Added `csv_fps` parameter for custom frame rate control ### SIP Trunking - **Phone number creation**: - [Create SIP trunk phone number](/docs/api-reference/phone-numbers) - Added `inbound_media_encryption` parameter for security configuration ### Voice Library - **Voice categories**: - Updated voice response models to include "famous" as a new voice category option - Enhanced voice search and filtering capabilities # June 17, 2025 ### Agents Platform - **Dynamic variables in simulated conversations**: Added support for [dynamic variable population in simulated conversations](/docs/api-reference/agents/simulate-conversation#request.body.simulation_specification.simulated_user_config.dynamic_variables), enabling more flexible and context-aware conversation testing scenarios. 
- **MCP server integration**: Introduced comprehensive support for [Model Context Protocol (MCP) servers](/docs/agents-platform/customization/mcp), allowing agents to connect to external tools and services through standardized protocols with configurable approval policies. - **Burst pricing for extra concurrency**: Added [bursting capability](/docs/agents-platform/guides/burst-pricing) for workspace call limits, automatically allowing up to 3x the configured concurrency limit during peak usage for overflow capacity. ### Studio - **JSON content initialization**: Added support for initializing Studio projects with structured JSON content through the `from_content_json` parameter, enabling programmatic project creation with predefined chapters, blocks, and voice configurations. ### Workspaces - **Webhook management**: Introduced workspace-level webhook management capabilities, allowing administrators to view, configure, and monitor webhook integrations across the entire workspace with detailed usage tracking and failure diagnostics. ### API ## New Endpoints ### Agents Platform - MCP Servers - [Create MCP server](/docs/api-reference/mcp/create) - Create a new MCP server configuration in the workspace - [List MCP servers](/docs/api-reference/mcp/list) - Retrieve all MCP server configurations available in the workspace - [Get MCP server](/docs/api-reference/mcp/get) - Retrieve a specific MCP server configuration from the workspace - [Update MCP server approval policy](/docs/api-reference/mcp/approval-policies/update) - Update the approval policy configuration for an MCP server - [Create MCP server tool approval](/docs/api-reference/mcp/approval-policies/create) - Add approval for a specific MCP tool when using per-tool approval mode - [Delete MCP server tool approval](/docs/api-reference/mcp/approval-policies/delete) - Remove approval for a specific MCP tool when using per-tool approval mode ### Workspace - [Get workspace webhooks](/docs/api-reference/webhooks/list) - Retrieve all webhook configurations for the workspace with optional usage information ## Updated Endpoints ### Agents Platform - **Agent simulation**: - [Simulate conversation](/docs/api-reference/agents/simulate-conversation) - Added `dynamic_variables` parameter for populating conversation context with runtime values - [Simulate conversation stream](/docs/api-reference/agents/simulate-conversation-stream) - Added `dynamic_variables` parameter for streaming conversation simulations - **Agent configuration**: - [Agent platform settings](/docs/api-reference/agents/update#request.body.platform_settings.call_limits) - Added `bursting_enabled` parameter to control burst pricing for call limits - **WebSocket events**: - Enhanced `ClientEvent` enum to include `mcp_connection_status` for real-time MCP server monitoring - **Conversation charging**: - Added `is_burst` indicator to conversation metadata for tracking burst pricing usage ### Studio - [Create Studio project](/docs/api-reference/studio/add-project#request.body.from_content_json.from_content_json) - Added `from_content_json` parameter for JSON-based project setup ### User Management - **User profile**: - [Get user](/docs/api-reference/user/get) - Deprecated `can_use_delayed_payment_methods` field in user response model ### Subscription Management - **Subscription status**: - Removed `canceled` and `unpaid` from available subscription status types, streamlining subscription state management # June 8, 2025 ### Text to Speech - **Eleven v3 (alpha)**: Released Eleven v3 (alpha), our 
most expressive Text to Speech model, as a research preview. ### Agents Platform - **Custom voice settings in multi-voice**: Added support for configuring individual [voice settings per supported voice](/docs/agents-platform/customization/voice/multi-voice-support) in multi-voice agents, allowing fine-tuned control over stability, speed, similarity boost, and streaming latency for each voice. - **Silent transfer to human in Twilio**: Added backend configuration support for silent (cold) [transfer to human](/docs/agents-platform/customization/tools/system-tools/transfer-to-human) in the Twilio native integration, enabling seamless handoff without announcing the transfer to callers. - **Batch calling retry and cancel**: Added support for retrying outbound calls to phone numbers that did not respond during a [batch call](/docs/agents-platform/phone-numbers/batch-calls), along with the ability to cancel ongoing batch operations for better campaign management. - **LLM pinning**: Added support for [versioned LLM models with explicit checkpoint identifiers](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm) - **Custom LLM headers**: Added support for passing [custom headers to custom LLMs](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.custom_llm.request_headers) - **Fixed issue in non-Latin languages**: Fixed an issue causing some conversations in non-Latin alphabet languages to fail. ### SDKs - **Python SDK v2.3.0**: Released [Python SDK v2.3.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.3.0) - **JavaScript SDK v2.2.0**: Released [JavaScript SDK v2.2.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.2.0) ### API ## New Endpoints ### Agents Platform - **Batch Calling**: - [Cancel batch call](/docs/api-reference/batch-calling/cancel) - Cancel a running batch call and set all recipients to cancelled status - [Retry batch call](/docs/api-reference/batch-calling/retry) - Retry a batch call by setting completed recipients back to pending status - **Knowledge Base RAG**: - [Get document RAG indexes](/docs/api-reference/knowledge-base/get-document-rag-indexes) - Get information about all RAG indexes of a knowledge base document - [Delete document RAG index](/docs/api-reference/knowledge-base/delete-document-rag-index) - Delete a specific RAG index for a knowledge base document - [RAG index overview](/docs/api-reference/knowledge-base/rag-index-overview) - Get total size and information of RAG indexes used by knowledge base documents ## Updated Endpoints ### Agents Platform - **Supported Voices**: - [Agent configuration](/docs/api-reference/agents/update#request.body.tts.supported_voices) - Added `optimize_streaming_latency`, `stability`, `speed`, and `similarity_boost` parameters for per-voice TTS customization - **Transfer to Human**: - [Agent configuration](/docs/api-reference/agents/update#request.body.system_tools.transfer_to_number) - Added `enable_client_message` parameter to control whether a message is played to the client during transfer - **Knowledge Base**: - Knowledge base documents now use `supported_usages` instead of `prompt_injectable` for better usage mode control - RAG index creation now returns enhanced response model with usage information - **Custom LLM**: - [Agent configuration](/docs/api-reference/agents/update#request.body.llm.custom_llm) - Added `request_headers` parameter for custom header configuration - **Widget Configuration**: - [Agent platform 
settings](/docs/api-reference/agents/update#request.body.platform_settings.widget_config) - Added comprehensive `styles` configuration for widget appearance customization - **LLM**: - Added support for [versioned LLM models](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm) with explicit version identifiers # June 1, 2025 ### Agents Platform - **Multi-voice support for agents**: Enable ElevenLabs agents to [dynamically switch between different voices](/docs/conversational-ai/customization/voice/multi-voice-support) during conversations for multi-character storytelling, language tutoring, and role-playing scenarios. - **Claude Sonnet 4 support**: Added [Claude Sonnet 4 as a new LLM option](/docs/agents-platform/customization/llm#anthropic) for conversational agents, providing enhanced reasoning capabilities and improved performance. - **Genesys Cloud integration**: Introduced AudioHook Protocol integration for seamless connection with [Genesys Cloud contact center platform](/docs/agents-platform/phone-numbers/c-caa-s-integrations/genesys). - **Force delete knowledge base documents**: Added [`force` parameter](/docs/api-reference/knowledge-base/delete#request.query.force.force) to knowledge base document deletion, allowing removal of documents even when used by agents. - **Multimodal widget**: Added text input and text-only mode defaults for better user experience with [improved widget configuration](/docs/agents-platform/customization/widget). ### API ## Updated Endpoints ### Speech to Text - [Create transcript](/docs/api-reference/speech-to-text/convert) - Added `webhook` parameter for asynchronous processing with webhook delivery ### Agents Platform - **Knowledge Base**: - [Delete knowledge base document](/docs/api-reference/knowledge-base/delete) - Added `force` query parameter to delete documents regardless of agent dependencies - **Widget**: - [Widget configuration](/docs/api-reference/widget/get#response.body.widget_config.supports_text_only) - Added text input and text-only mode support for multi-modality # May 26, 2025 ### Forced Alignment - **Forced alignment improvements**: Fixed a rare failure case in forced alignment processing to improve reliability. ### Voices - **Live moderated voices filter**: Added `include_live_moderated` query parameter to the shared voices endpoint, allowing you to include or exclude voices that are live moderated. ### Agents Platform - **Secret dynamic variables**: Added support for specifying dynamic variables as secrets with the `secret__` prefix. Secret dynamic variables can only be used in webhook tool headers and are never sent to an LLM, enhancing security for sensitive data. [Learn more](/docs/agents-platform/customization/personalization/dynamic-variables#secret-dynamic-variables). - **Skip turn system tool**: Introduced a new system tool called **skip_turn**. When enabled, the agent will skip its turn if the user explicitly indicates they need a moment to think or perform an action (e.g., "just a sec", "give me a minute"). This prevents turn timeout from being triggered during intentional user pauses. See the [skip turn tool docs](/docs/agents-platform/customization/tools/system-tools/skip-turn) for more information. - **Text input support**: Added text input support in websocket connections via "user_message" event with text field. Also added "user_activity" event support to indicate typing or other UI activity, improving agent turn-taking when there's interleaved text and audio input. 
- **RAG chunk limit**: Added ability to configure the [maximum number of chunks](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.rag.max_retrieved_rag_chunks_count) collected during RAG retrieval, giving users more control over context window usage and costs. - **Enhanced widget configuration**: Expanded widget customization options to include [text input and text only mode](/docs/api-reference/widget/get#response.body.widget_config.text_only). - **LLM usage calculator**: Introduced tools to calculate expected LLM token usage and costs for agents, helping with cost estimation and planning. ### Audio Native - **Accessibility improvements**: Enhanced accessibility for the AudioNative player with multiple improvements: - Added aria-labels for all buttons - Enabled keyboard navigation for all interactive elements - Made progress bar handle focusable and keyboard-accessible - Improved focus indicator visibility for better screen reader compatibility ### API ## New Endpoints - Added 3 new endpoints: - [Get Agent Knowledge Base Size](/docs/agents-platform/api-reference/knowledge-base/size) - Returns the number of pages in the agent's knowledge base. - [Calculate Agent LLM Usage](/docs/agents-platform/api-reference/llm-usage/calculate) - Calculates expected number of LLM tokens needed for the specified agent. - [Calculate LLM Usage](/docs/agents-platform/api-reference/llm-usage/calculate) - Returns a list of LLM models and the expected cost for using them based on the provided values. ## Updated Endpoints ### Voices - [Get Shared Voices](/docs/api-reference/voices#get-shared-voices) - Added `include_live_moderated` query parameter to `GET /v1/shared-voices` to filter voices by live moderation status. ### Agents Platform - **Agent Configuration**: - Enhanced system tools with new `skip_turn` tool configuration - Improved RAG configuration with `max_retrieved_rag_chunks_count` parameter - **Widget Configuration**: - Added support for text-only mode - **Batch Calling**: - Batch call responses now include `phone_provider` field with default value "twilio" ### Text to Speech - **Voice Settings**: - Added `quality` parameter to voice settings for controlling audio generation quality - Model response schema updated to include `can_use_quality` field # May 19, 2025 ### SDKs - **SDKs V2**: Released new v2 SDKs for both [Python](https://github.com/elevenlabs/elevenlabs-python) and [JavaScript](https://github.com/elevenlabs/elevenlabs-js) ### Speech to Text - **Speech to text logprobs**: The Speech to Text response now includes a `logprob` field for word prediction confidence. ### Billing - **Improved API error messages**: Enhanced API error messages for subscriptions with failed payments. This provides clearer information if a failed payment has caused a user to reach their quota threshold sooner than expected. ### Agents Platform - **Batch calls**: Released new batch calling functionality, which allows you to [automate groups of outbound calls](/docs/agents-platform/phone-numbers/batch-calls). - **Increased evaluation criteria limit**: The maximum number of evaluation criteria for agent performance evaluation has been increased from 5 to 10. - **Human-readable IDs**: Introduced human-readable IDs for key Agents Platform entities (e.g., agents, conversations). This improves usability and makes resources easier to identify and manage through the API and UI. 
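As a small illustration of the new live moderation filter described above, here is a minimal sketch, assuming the `requests` library, that lists shared voices with live moderated voices included:

```python
# Minimal sketch: list shared voices, including live moderated ones.
import os
import requests

response = requests.get(
    "https://api.elevenlabs.io/v1/shared-voices",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    params={"include_live_moderated": "true"},
)
response.raise_for_status()
for voice in response.json().get("voices", []):
    print(voice.get("name"), voice.get("category"))
```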
- **Unanswered call tracking**: 'Not Answered' outbound calls are now reliably detected and visible in the conversation history. - **LLM cost visibility in dashboard**: The Agents Platform dashboard now displays the total and per-minute average LLM costs. - **Zero retention mode (ZRM) for agents**: Allowed enabling Zero Retention Mode (ZRM) per agent. - **Dynamic variables in headers**: Added option of setting dynamic variable as a [header value for tools](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.tools.webhook.api_schema.request_headers.Conv-AI-Dynamic-Variable) - **Customisable tool timeouts**: Shipped setting different [timeout durations per tool](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.tools.client.response_timeout_secs). ### Workspaces - **Simplified secret updates**: Workspace secrets can now be updated more granularly using a `PATCH` request via the API, simplifying the management of individual secret values. For technical details, please see the API changes section below. ### API ## New Endpoints - Added 6 new endpoints: - [Get Signed Url](/docs/agents-platform/api-reference/conversations/get-signed-url) - Get a signed URL to start a conversation with an agent that requires authorization. - [Simulate Conversation](/docs/agents-platform/api-reference/agents/simulate-conversation) - Run a conversation between an agent and a simulated user. - [Simulate Conversation (Stream)](/docs/agents-platform/api-reference/agents/simulate-conversation-stream) - Run and stream a conversation simulation between an agent and a simulated user. - [Update Convai Workspace Secret](/docs/agents-platform/api-reference/workspace/secrets/update) - Update an existing secret for the Convai workspace. - [Submit Batch Call Request](/docs/agents-platform/api-reference/batch-calling/create) - Submit a batch call request to schedule calls for multiple recipients. - [Get All Batch Calls for Workspace](/docs/agents-platform/api-reference/batch-calling/list) - Retrieve all batch calls for the current workspace. ## Updated Endpoints ### Agents Platform - **Agents & Conversations**: - Endpoint `GET /v1/convai/conversation/get_signed_url` (snake_case path) has been deprecated. Use the new `GET /v1/convai/conversation/get-signed-url` (kebab-case path) instead. - **Phone Numbers**: - [Get Phone Number Details](/docs/agents-platform/api-reference/phone-numbers/get) - Response schema for `GET /v1/convai/phone-numbers/{phone_number_id}` updated to distinct `Twilio` and `SIPTrunk` provider details. - [Update Phone Number](/docs/agents-platform/api-reference/phone-numbers/update) - Response schema for `PATCH /v1/convai/phone-numbers/{phone_number_id}` updated similarly for `Twilio` and `SIPTrunk`. - [List Phone Numbers](/docs/agents-platform/api-reference/phone-numbers/list) - Response schema for `GET /v1/convai/phone-numbers/` list items updated for `Twilio` and `SIPTrunk` providers. ### Text To Speech - [Text to Speech Endpoints](/docs/api-reference/text-to-speech) - Default `model_id` changed from `eleven_monolingual_v1` to `eleven_multilingual_v2` for the following endpoints: - `POST /v1/text-to-speech/{voice_id}/stream` - `POST /v1/text-to-speech/{voice_id}/stream-with-timestamps` - `POST /v1/text-to-speech/{voice_id}` - `POST /v1/text-to-speech/{voice_id}/with-timestamps` ### Voices - [Get Shared Voices](/docs/api-reference/voices#get-shared-voices) - Added `include_custom_rates` query parameter to `GET /v1/shared-voices`. 
- **Schema Updates**: - `LibraryVoiceResponseModel` and `VoiceSharingResponseModel` now include an optional `fiat_rate` field (USD per 1000 credits). # May 12, 2025 ### Billing - **Downgraded Plan Pricing Fix**: Fixed an issue where customers with downgraded subscriptions were shown their current price instead of the correct future price. ### Agents Platform - **Edit Knowledge Base Document Names**: You can now edit the names of knowledge base documents. See: [Knowledge Base](/docs/agents-platform/customization/knowledge-base) - **Conversation Simulation**: Released a [new endpoint](/docs/agents-platform/api-reference/agents/simulate-conversation) that allows you to test an agent over text ### Studio - **Export Paragraphs as Zip**: Added support for exporting separated paragraphs in a zip file. See: [Studio](/docs/product-guides/products/studio) ### SDKs - **Released new SDKs**: - [ElevenLabs Python v1.58.1](https://github.com/elevenlabs/elevenlabs-python) - [ElevenLabs JS v1.58.0](https://github.com/elevenlabs/elevenlabs-js) ### API #### New Endpoints - [Update metadata for a speaker](/docs/api-reference/dubbing) `PATCH /v1/dubbing/resource/{dubbing_id}/speaker/{speaker_id}` Amend the metadata associated with a speaker, such as their voice. Both voice cloning and using voices from the ElevenLabs library are supported. - [Search similar voices for a speaker](/docs/api-reference/dubbing) `GET /v1/dubbing/resource/{dubbing_id}/speaker/{speaker_id}/similar-voices` Fetch the top 10 similar voices to a speaker, including IDs, names, descriptions, and sample audio. - [Simulate a conversation](/docs/api-reference/agents/simulate-conversation) `POST /v1/convai/agents/{agent_id}/simulate_conversation` Run a conversation between the agent and a simulated user. - [Simulate a conversation (stream)](/docs/api-reference/agents/simulate-conversation-stream) `POST /v1/convai/agents/{agent_id}/simulate_conversation/stream` Stream a simulated conversation between the agent and a simulated user. - [Handle outbound call via SIP trunk](/docs/api-reference/sip-trunk/outbound-call) `POST /v1/convai/sip-trunk/outbound-call` Initiate an outbound call using SIP trunking. #### Updated Endpoints - [List conversations](/docs/api-reference/conversations/get-conversations) `GET /v1/convai/conversations` Added `call_start_after_unix` query parameter to filter conversations by start date. - [Update knowledge base document](/docs/api-reference/knowledge-base/update-knowledge-base-document) `PATCH /v1/convai/knowledge-base/{documentation_id}` Now supports updating the name of a document. - [Text to Speech endpoints](/docs/api-reference/text-to-speech) The default model for all TTS endpoints is now `eleven_multilingual_v2` (was `eleven_monolingual_v1`). #### Removed Endpoints - None. # May 5, 2025 ### Dubbing - **Disable Voice Cloning**: Added an option in the [Dubbing Studio UI](https://elevenlabs.io/app/dubbing) to disable voice cloning when uploading audio, aligning with the existing `disable_voice_cloning` API parameter. ### Billing - **Quota Exceeded Error**: Improved error messaging for exceeding character limits. Users attempting to generate audio beyond their quota within a short billing window will now receive a clearer `401 unauthorized: This request exceeds your quota limit of...` error message indicating the limit has been exceeded. 
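For illustration, here is a minimal sketch (assuming the `requests` library and an arbitrary Text to Speech call with a placeholder voice ID) of surfacing the clearer quota error described above:

```python
# Minimal sketch: detect the clearer quota-exceeded error message.
# The voice ID and request body are placeholders.
import os
import requests

response = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/YOUR_VOICE_ID",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": "A very long block of text", "model_id": "eleven_multilingual_v2"},
)
if response.status_code == 401:
    # e.g. "This request exceeds your quota limit of ..."
    print("Quota exceeded:", response.text)
else:
    response.raise_for_status()
```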
## SDKs - **Released new SDKs**: Added [ElevenLabs Python v1.58.0](https://github.com/elevenlabs/elevenlabs-python) and [ElevenLabs JS v1.58.0](https://github.com/elevenlabs/elevenlabs-js) to fix a breaking change that had been mistakenly shipped. # April 28, 2025 ### Agents Platform - **Custom Dashboard Charts**: The Agents Platform dashboard can now be extended with custom charts displaying the results of evaluation criteria over time. See the new [GET](/docs/api-reference/workspace/dashboard/get) and [PATCH](/docs/api-reference/workspace/dashboard/update) endpoints for managing dashboard settings. - **Call History Filtering**: Added the ability to filter the call history by start date using the new `call_start_before_unix` parameter in the [List Conversations](/docs/agents-platform/api-reference/conversations/list#request.query.call_start_before_unix) endpoint. [Try it here](https://elevenlabs.io/app/agents/history). - **Server Tools**: Added the option of making PUT requests in [server tools](/docs/agents-platform/customization/tools/server-tools) - **Transfer to human**: Added call forwarding functionality to support forwarding to operators; see docs [here](/docs/agents-platform/customization/tools/system-tools/transfer-to-human) - **Language detection**: Fixed an issue where the [language detection system tool](/docs/agents-platform/customization/tools/system-tools/language-detection) would trigger on a user replying "yes" in a non-English language. ### Usage Analytics - **Custom Aggregation**: Added an optional `aggregation_interval` parameter to the [Get Usage Metrics](/docs/api-reference/usage/get) endpoint to control the interval over which to aggregate character usage (hour, day, week, month, or cumulative). - **New Metric Breakdowns**: The Usage Analytics section now supports additional metric breakdowns including `minutes_used`, `request_count`, `ttfb_avg`, and `ttfb_p95`, selectable via the new `metric` parameter in the [Get Usage Metrics](/docs/api-reference/usage/get) endpoint. Furthermore, you can now get a breakdown and filter by `request_queue`. ### API ## New Endpoints - Added 2 new endpoints for managing Agents Platform dashboard settings: - [Get Dashboard Settings](/docs/api-reference/workspace/dashboard/get) - Retrieves custom chart configurations for the ConvAI dashboard. - [Update Dashboard Settings](/docs/api-reference/workspace/dashboard/update) - Updates custom chart configurations for the ConvAI dashboard. 
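As an illustration of the aggregation options described under Usage Analytics above, here is a minimal sketch assuming the `requests` library; the `start_unix`/`end_unix` range parameters and their millisecond unit are assumptions here, so confirm them in the endpoint reference:

```python
# Minimal sketch: character usage for the last 7 days, bucketed per day and
# reported as request_count. The range parameters and their unit are assumed;
# check the Get Usage Metrics reference.
import os
import time
import requests

now_ms = int(time.time() * 1000)
response = requests.get(
    "https://api.elevenlabs.io/v1/usage/character-stats",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    params={
        "start_unix": now_ms - 7 * 24 * 3600 * 1000,
        "end_unix": now_ms,
        "aggregation_interval": "day",
        "metric": "request_count",
    },
)
response.raise_for_status()
print(response.json())
```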
## Updated Endpoints ### Audio Generation (TTS, S2S, SFX, Voice Design) - Updated endpoints to support new `output_format` option `pcm_48000`: - [Text to Speech](/docs/api-reference/text-to-speech/convert) (`POST /v1/text-to-speech/{voice_id}`) - [Text to Speech with Timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) (`POST /v1/text-to-speech/{voice_id}/with-timestamps`) - [Text to Speech Stream](/docs/api-reference/text-to-speech/convert-as-stream) (`POST /v1/text-to-speech/{voice_id}/stream`) - [Text to Speech Stream with Timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) (`POST /v1/text-to-speech/{voice_id}/stream/with-timestamps`) - [Speech to Speech](/docs/api-reference/speech-to-speech/convert) (`POST /v1/speech-to-speech/{voice_id}`) - [Speech to Speech Stream](/docs/api-reference/speech-to-speech/stream) (`POST /v1/speech-to-speech/{voice_id}/stream`) - [Sound Generation](/docs/api-reference/text-to-sound-effects/convert) (`POST /v1/sound-generation`) - [Create Voice Previews](/docs/api-reference/legacy/voices/create-previews) (`POST /v1/text-to-voice/create-previews`) ### Usage Analytics - Updated usage metrics endpoint: - [Get Usage Metrics](/docs/api-reference/usage/get) (`GET /v1/usage/character-stats`) - Added optional `aggregation_interval` and `metric` query parameters. ### Agents Platform - Updated conversation listing endpoint: - [List Conversations](/docs/agents-platform/api-reference/conversations/list#request.query.call_start_before_unix) (`GET /v1/convai/conversations`) - Added optional `call_start_before_unix` query parameter for filtering by start date. ## Schema Changes ### Agents Platform - Added detailed LLM usage and pricing information to conversation [charging and history models](/docs/agents-platform/api-reference/conversations/get#response.body.metadata.charging). - Added `tool_latency_secs` to [tool result schemas](/docs/api-reference/conversations/get-conversation#response.body.transcript.tool_results.tool_latency_secs) - Added `access_info` to [`GET /v1/convai/agents/{agent_id}`](/docs/api-reference/agents/get#response.body.access_info) # April 21, 2025 ### Professional Voice Cloning (PVC) - **PVC API**: Introduced a comprehensive suite of API endpoints for managing Professional Voice Clones (PVC). You can now programmatically create voices, add/manage/delete audio samples, retrieve audio/waveforms, manage speaker separation, handle verification, and initiate training. For a full list of new endpoints check the API changes summary below or read the PVC API reference [here](/docs/api-reference/voices/pvc/create). ### Speech to Text - **Enhanced Export Options**: Added options to include or exclude timestamps and speaker IDs when exporting Speech to Text results in segmented JSON format via the API. ### Agents Platform - **New LLM Models**: Added support for new GPT-4.1 models: `gpt-4.1`, `gpt-4.1-mini`, and `gpt-4.1-nano` [here](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm) - **VAD Score**: Added a new client event which sends VAD scores to the client, see reference [here](/docs/agents-platform/customization/events/client-events#vad_score) ### API ## New Endpoints - Added 16 new endpoints: - [Create PVC Voice](/docs/api-reference/voices/pvc/create) - Creates a new PVC voice. - [Edit PVC Voice](/docs/api-reference/voices/pvc/update) - Edits PVC voice metadata. - [Add Samples To PVC Voice](/docs/api-reference/voices/pvc/samples/create) - Adds audio samples to a PVC voice. 
- [Update PVC Voice Sample](/docs/api-reference/voices/pvc/samples/update) - Updates a PVC voice sample (noise removal, speaker selection, trimming). - [Delete PVC Voice Sample](/docs/api-reference/voices/pvc/samples/delete) - Deletes a sample from a PVC voice. - [Retrieve Voice Sample Audio](/docs/api-reference/voices/pvc/samples/get-audio) - Retrieves audio for a PVC voice sample. - [Retrieve Voice Sample Visual Waveform](/docs/api-reference/voices/pvc/samples/get-waveform) - Retrieves the visual waveform for a PVC voice sample. - [Retrieve Speaker Separation Status](/docs/api-reference/voices/pvc/samples/get-speaker-separation-status) - Gets the status of speaker separation for a sample. - [Start Speaker Separation](/docs/api-reference/voices/pvc/samples/separate-speakers) - Initiates speaker separation for a sample. - [Retrieve Separated Speaker Audio](/docs/api-reference/voices/pvc/samples/get-separated-speaker-audio) - Retrieves audio for a specific separated speaker. - [Get PVC Voice Captcha](/docs/api-reference/voices/pvc/verification/captcha) - Gets the captcha for PVC voice verification. - [Verify PVC Voice Captcha](/docs/api-reference/voices/pvc/verification/captcha/verify) - Submits captcha verification for a PVC voice. - [Run PVC Training](/docs/api-reference/voices/pvc/train) - Starts the training process for a PVC voice. - [Request Manual Verification](/docs/api-reference/voices/pvc/verification/request) - Requests manual verification for a PVC voice. ## Updated Endpoints ### Speech to Text - Updated endpoint with changes: - [Create Forced Alignment Task](/docs/api-reference/forced-alignment/create#request.body.enabled_spooled_file) - Added `enabled_spooled_file` parameter to allow streaming large files (`POST /v1/forced-alignment`). ## Schema Changes ### Agents Platform - `GET conversation details`: Added `has_audio`, `has_user_audio`, `has_response_audio` boolean fields [here](/docs/api-reference/conversations/get-conversation#response.body.has_audio) ### Dubbing - `GET dubbing resource `: Added `status` field to each render [here](/docs/api-reference/dubbing/get-dubbing-resource#response.body.renders.status) # April 14, 2025 ### Voices - **New PVC flow**: Added new flow for Professional Voice Clone creation, try it out [here](https://elevenlabs.io/app/voice-lab?action=create&creationType=professionalVoiceClone) ### Agents Platform - **Agent-agent transfer:** Added support for agent-to-agent transfers via a new system tool, enabling more complex conversational flows. See the [Agent Transfer tool documentation](/docs/agents-platform/customization/tools/system-tools/agent-transfer) for details. - **Enhanced tool debugging:** Improved how tool execution details are displayed in the conversation history for easier debugging. - **Language detection fix:** Resolved an issue regarding the forced calling of the language detection tool. ### Dubbing - **Render endpoint:** Introduced a new endpoint to regenerate audio or video renders for specific languages within a dubbing project. This automatically handles missing transcriptions or translations. See the [Render Dub endpoint](/docs/api-reference/dubbing/render-dub). - **Increased size limit:** Raised the maximum allowed file size for dubbing projects to 1 GiB. ### API ## New Endpoints - [Added render dub endpoint](/docs/api-reference/dubbing/render-dub) - Regenerate dubs for a specific language. 
## Updated Endpoints ### Pronunciation Dictionaries - Updated the response for the [`GET /v1/pronunciation-dictionaries/{pronunciation_dictionary_id}/`](/docs/api-reference/pronunciation-dictionary/get#response.body.permission_on_resource) endpoint and related components to include the `permission_on_resource` field. ### Speech to Text - Updated [Speech to Text endpoint](/docs/api-reference/speech-to-text/convert) (`POST /v1/speech-to-text`): - Added `cloud_storage_url` parameter to allow transcription directly from public S3 or GCS URLs (up to 2GB). - Made the `file` parameter optional; exactly one of `file` or `cloud_storage_url` must now be provided. ### Speech to Speech - Added optional `file_format` parameter (`pcm_s16le_16` or `other`) for lower latency with PCM input to [`POST /v1/speech-to-speech/{voice_id}`](/docs/api-reference/speech-to-speech/convert) ### Agents Platform - Updated components to support [agent-agent transfer](/docs/agents-platform/customization/tools/system-tools/agent-transfer) tool ### Voices - Updated [`GET /v1/voices/{voice_id}`](/docs/api-reference/voices/get#response.body.samples.trim_start) `samples` field to include optional `trim_start` and `trim_end` parameters. ### AudioNative - Updated [`Get /v1/audio-native/{project_id}/settings`](/docs/api-reference/audio-native/get-settings#response.body.settings.status) to include `status` field (`processing` or `ready`). # April 7, 2025 ## Speech to text - **`scribe_v1_experimental`**: Launched a new experimental preview of the [Scribe v1 model](/docs/capabilities/speech-to-text) with improvements including improved performance on audio files with multiple languages, reduced hallucinations when audio is interleaved with silence, and improved audio tags. The new model is available via the API under the model name [`scribe_v1_experimental`](/docs/api-reference/speech-to-text/convert#request.body.model_id) ### Text to speech - **A-law format support**: Added [a-law format](/docs/api-reference/text-to-speech/convert#request.query.output_format) with 8kHz sample rate to enable integration with European telephony systems. - **Fixed quota issues**: Fixed a database bug that caused some requests to be mistakenly rejected as exceeding their quota. ### Agents Platform - **Document type filtering**: Added support for filtering knowledge base documents by their [type](/docs/api-reference/knowledge-base/get-knowledge-base-list#request.query.types) (file, URL, or text). - **Non-audio agents**: Added support for conversational agents that don't output audio but still send response transcripts and can use tools. Non-audio agents can be enabled by removing the audio [client event](/docs/agents-platform/customization/events/client-events). - **Improved agent templates**: Updated all agent templates with enhanced configurations and prompts. See more about how to improve system prompts [here](/docs/agents-platform/best-practices/prompting-guide). - **Fixed stuck exports**: Fixed an issue that caused exports to be stuck for extended periods. ### Studio - **Fixed volume normalization**: Fixed issue with streaming project snapshots when volume normalization is enabled. ### New API endpoints - **Forced alignment**: Added new [forced alignment](/docs/api-reference/forced-alignment) endpoint for aligning audio with text, perfect for subtitle generation. 
- **Batch calling**: Added batch calling [endpoint](/docs/agents-platform/api-reference/batch-calling/create) for scheduling calls to multiple recipients ### API ## New Endpoints - Added [Forced alignment](/docs/api-reference/forced-alignment) endpoint for aligning audio with text - Added dedicated endpoints for knowledge base document types: - [Create text document](/docs/api-reference/knowledge-base/create-from-text) - [Create file document](/docs/api-reference/knowledge-base/create-from-file) - [Create URL document](/docs/api-reference/knowledge-base/create-from-url) ## Updated Endpoints ### Text to Speech - Added a-law format (8kHz) to all audio endpoints: - [Text to speech](/docs/api-reference/text-to-speech/convert) - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - [Speech to speech](/docs/api-reference/speech-to-speech) - [Stream speech to speech](/docs/api-reference/speech-to-speech/stream) - [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - [Sound generation](/docs/api-reference/sound-generation) ### Voices - [Get voices](/docs/api-reference/voices/search) - Added `collection_id` parameter for filtering voices by collection ### Knowledge Base - [Get knowledge base](/docs/api-reference/knowledge-base/get-knowledge-base-list) - Added `types` parameter for filtering documents by type - General endpoint for creating knowledge base documents marked as deprecated in favor of specialized endpoints ### User Subscription - [Get user subscription](/docs/api-reference/user/subscription/get) - Added `professional_voice_slots_used` property to track number of professional voices used in a workspace ### Agents Platform - Added `silence_end_call_timeout` parameter to set maximum wait time before terminating a call - Removed `/v1/convai/agents/{agent_id}/add-secret` endpoint (now handled by workspace secrets endpoints) # March 31, 2025 ### Text to speech - **Opus format support**: Added support for Opus format with 48kHz sample rate across multiple bitrates (32-192 kbps). - **Improved websocket error handling**: Updated TTS websocket API to return more accurate error codes (1011 for internal errors instead of 1008) for better error identification and SLA monitoring. ### Agents Platform - **Twilio outbound**: Added ability to natively run outbound calls. - **Post-call webhook override**: Added ability to override post-call webhook settings at the agent level, providing more flexible configurations. - **Large knowledge base document viewing**: Enhanced the knowledge base interface to allow viewing the entire content of large RAG documents. - **Added call SID dynamic variable**: Added `system__call_sid` as a system dynamic variable to allow referencing the call ID in prompts and tools. ### Studio - **Actor Mode**: Added Actor Mode in Studio, allowing you to use your own voice recordings to direct the way speech should sound in Studio projects. - **Improved keyboard shortcuts**: Updated keyboard shortcuts for viewing settings and editor shortcuts to avoid conflicts and simplified shortcuts for locking paragraphs. ### Dubbing - **Dubbing duplication**: Made dubbing duplication feature available to all users. - **Manual mode foreground generation**: Added ability to generate foreground audio when using manual mode with a file and CSV. 
### Voices - **Enhanced voice collections**: Improved voice collections with visual upgrades, language-based filtering, navigation breadcrumbs, collection images, and mouse dragging for carousel navigation. - **Locale filtering**: Added locale parameter to shared voices endpoint for more precise voice filtering. ### API ## Updated Endpoints ### Text to Speech - Updated Text to Speech endpoints: - [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Added `apply_language_text_normalization` parameter for improved text pronunciation in supported languages (currently Japanese) - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Added `apply_language_text_normalization` - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Added `apply_language_text_normalization` - [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Added `apply_language_text_normalization` ### Audio Format - Added Opus format support to multiple endpoints: - [Text to speech](/docs/api-reference/text-to-speech/convert) - Added support for Opus format with 48kHz sample rate at multiple bitrates (32, 64, 96, 128, 192 kbps) - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Added Opus format options - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Added Opus format options - [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Added Opus format options - [Speech to speech](/docs/api-reference/speech-to-speech) - Added Opus format options - [Stream speech to speech](/docs/api-reference/speech-to-speech/stream) - Added Opus format options - [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Added Opus format options - [Sound generation](/docs/api-reference/sound-generation) - Added Opus format options ### Agents Platform - Updated Agents Platform endpoints: - [Delete agent](/docs/api-reference/agents/delete) - Changed success response code from 200 to 204 - [Updated RAG embedding model options](docs/api-reference/knowledge-base/rag-index-status#request.body.model) - replaced `gte_Qwen2_15B_instruct` with `multilingual_e5_large_instruct` ### Voices - Updated Voice endpoints: - [Get shared voices](/docs/api-reference/voice-library/get-shared) - Added locale parameter for filtering voices by language region ### Dubbing - Updated Dubbing endpoint: - [Dub a video or audio file](/docs/api-reference/dubbing/create) - Renamed beta feature `use_replacement_voices_from_library` parameter to `disable_voice_cloning` for clarity # March 24, 2025 ### Voices - **List Voices V2**: Added a new [V2 voice search endpoint](/docs/api-reference/voices/search) with better search and additional filtering options ### Agents Platform - **Native outbound calling**: Added native outbound calling for Twilio-configured numbers, eliminating the need for complex setup configurations. Outbound calls are now visible in the Call History page. - **Automatic language detection**: Added new system tool for automatic language detection that enables agents to switch languages based on both explicit user requests ("Let's talk in Spanish") and implicit language in user audio. - **Pronunciation dictionary improvements**: Fixed phoneme tags in pronunciation dictionaries to work correctly with Agents Platform. - **Large RAG document viewing**: Added ability to view the entire content of large RAG documents in the knowledge base. 
- **Customizable widget controls**: Updated UI to include an optional mute microphone button and made widget icons customizable via slots. ### Sound Effects - **Fractional duration support**: Fixed an issue where users couldn't enter fractional values (like 0.5 seconds) for sound effect generation duration. ### Speech to Text - **Repetition handling**: Improved detection and handling of repetitions in speech-to-text processing. ### Studio - **Reader publishing fixes**: Added support for mp3_44100_192 output format (high quality) so users below Publisher tier can export audio to Reader. ### Mobile - **Core app signup**: Added signup endpoints for the new Core mobile app. ### API ## New Endpoints - Added 5 new endpoints: - [List voices (v2)](/docs/api-reference/voices/search) - Enhanced voice search capabilities with additional filtering options - [Initiate outbound call](/docs/api-reference/conversations/outbound-call) - New endpoint for making outbound calls via Twilio integration - [Add pronunciation dictionary from rules](/docs/api-reference/pronunciation-dictionary/add-rules) - Create pronunciation dictionaries directly from rules without file upload - [Get knowledge base document content](/docs/api-reference/knowledge-base/get-knowledge-base-document-content) - Retrieve full document content from the knowledge base - [Get knowledge base document chunk](/docs/api-reference/knowledge-base/get-knowledge-base-document-part-by-id) - Retrieve specific chunks from knowledge base documents ## Updated Endpoints ### Agents Platform - Updated Agents Platform endpoints: - [Create agent](/docs/api-reference/agents/create) - Added `mic_muting_enabled` property for UI control and `workspace_overrides` property for workspace-specific configurations - [Update agent](/docs/api-reference/agents/update) - Added `workspace_overrides` property for customizing agent behavior per workspace - [Get agent](/docs/api-reference/agents/get) - Added `workspace_overrides` property to the response - [Get widget](/docs/api-reference/widget/get-agent-widget) - Added `mic_muting_enabled` property for controlling microphone muting in the widget UI - [Get conversation](/docs/api-reference/conversations/get-conversation) - Added rag information to view knowledge base content used during conversations - [Create phone number](/docs/api-reference/phone-numbers) - Replaced generic structure with specific twilio phone number and sip trunk options - [Compute RAG index](/docs/agents-platform/api-reference/knowledge-base/compute-rag-index) - Removed `force_reindex` query parameter for more controlled indexing - [List knowledge base documents](/docs/api-reference/knowledge-base/get-knowledge-base-list) - Changed response structure to support different document types - [Get knowledge base document](/docs/api-reference/knowledge-base/get) - Modified to return different response models based on document type ### Text to Speech - Updated Text to Speech endpoints: - [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Made properties optional, including `stability` and `similarity` settings - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Made voice settings properties optional for more flexible streaming requests - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Made settings optional and modified `pronunciation_dictionary_locators` property - [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Made voice settings 
properties optional for more flexible requests ### Speech to Text - Updated Speech to Text endpoint: - [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Removed `biased_keywords` property from form data and improved internal repetition detection algorithm ### Voice Management - Updated Voice endpoints: - [Get voices](/docs/api-reference/voices/search) - Updated voice settings properties in the response - [Get default voice settings](/docs/api-reference/voices/settings/get-default) - Made `stability` and `similarity` properties optional - [Get voice settings](/docs/api-reference/voices/settings/get) - Made numeric properties optional for more flexible configuration - [Edit voice settings](/docs/api-reference/voices/settings/update) - Made `stability` and `similarity` settings optional - [Create voice](/docs/api-reference/voices/ivc/create) - Modified array properties to accept null values - [Create voice from preview](/docs/api-reference/text-to-voice/create) - Updated voice settings model with optional properties ### Studio - Updated Studio endpoints: - [Get project](/docs/api-reference/studio/get-project) - Added `version_rules_num` to project metadata - [Get project snapshot](/docs/api-reference/studio/get-project-snapshot) - Removed `status` property - [Create pronunciation dictionaries](/docs/api-reference/studio/create-pronunciation-dictionaries) - Modified `pronunciation_dictionary_locators` property and string properties to accept null values ### Pronunciation Dictionary - Updated Pronunciation Dictionary endpoints: - [Get all pronunciation dictionaries](/docs/api-reference/pronunciation-dictionary/get-all) - Added `sort` and `sort_direction` query parameters, plus `latest_version_rules_num` and `integer` properties to response - [Get pronunciation dictionary](/docs/api-reference/pronunciation-dictionary/get) - Added `latest_version_rules_num` and `integer` properties to response - [Add from file](/docs/api-reference/pronunciation-dictionary/add-from-file) - Added `version_rules_num` property to response for tracking rules quantity - [Add rules](/docs/api-reference/pronunciation-dictionary/add-rules) - Added `version_rules_num` to response for rules tracking - [Remove rules](/docs/api-reference/pronunciation-dictionary/remove-rules) - Added `version_rules_num` to response for rules tracking # March 17, 2025 ### Agents Platform - **Default LLM update**: Changed the default agent LLM from Gemini 1.5 Flash to Gemini 2.0 Flash for improved performance. - **Fixed incorrect conversation abandons**: Improved detection of conversation continuations, preventing premature abandons when users repeat themselves. - **Twilio information in history**: Added Twilio call details to conversation history for better tracking. - **Knowledge base redesign**: Redesigned the knowledge base interface. - **System dynamic variables**: Added system dynamic variables to use time, conversation id, caller id and other system values as dynamic variables in prompts and tools. - **Twilio client initialisation**: Adds an agent level override for conversation initiation client data twilio webhook. - **RAG chunks in history**: Added retrieved chunks by RAG to the call transcripts in the [history view](https://elevenlabs.io/app/agents/history). ### Speech to Text - **Reduced pricing**: Reduced the pricing of our Scribe model, see more [here](/docs/capabilities/speech-to-text#pricing). 
- **Improved VAD detection**: Enhanced Voice Activity Detection with better pause detection at segment boundaries and improved handling of silent segments.
- **Enhanced diarization**: Improved speaker clustering with a better ECAPA model, symmetric connectivity matrix, and more selective speaker embedding generation.
- **Fixed ASR bugs**: Resolved issues with VAD rounding, silence, and clustering that affected transcription accuracy.

### Studio

- **Disable publishing UI**: Added ability to disable the publishing interface for specific workspace members to support enterprise workflows.
- **Snapshot API improvement**: Modified endpoints for project and chapter snapshots to return an empty list instead of throwing errors when snapshots can't be downloaded.
- **Disabled auto-moderation**: Turned off automatic moderation based on Text to Speech generations in Studio.

### Workspaces

- **Fixed API key editing**: Resolved an issue where editing workspace API keys would reset character limits to zero, causing the keys to stop working.
- **Optimized free subscriptions**: Fixed an issue with refreshing free subscription character limits.

### API

## New Endpoints

- Added 3 new endpoints:
  - [Get workspace resource](/docs/api-reference/workspace/get-resource)
  - [Share workspace resource](/docs/api-reference/workspace/share-workspace-resource)
  - [Unshare workspace resource](/docs/api-reference/workspace/unshare-workspace-resource)

## Updated Endpoints

### Dubbing

- Updated Dubbing endpoints:
  - [Dub a video or audio file](/docs/api-reference/dubbing/create) - Added `use_replacement_voices_from_library` property and made `source_path`, `target_language`, `source_language` nullable
  - [Resource dubbing](/docs/api-reference/dubbing/dub-segments) - Made `language_codes` array nullable
  - [Add language to dubbing resource](/docs/api-reference/dubbing/add-language-to-resource) - Made `language_code` nullable
  - [Translate dubbing resource](/docs/api-reference/dubbing/translate-segments) - Made `target_languages` array nullable
  - [Update dubbing segment](/docs/api-reference/dubbing/update-segment-language) - Made `start_time` and `end_time` nullable

### Project Management

- Updated Project endpoints:
  - [Add project](/docs/api-reference/studio/add-project) - Made `metadata`, `project_name`, `description` nullable
  - [Create podcast](/docs/api-reference/studio/create-podcast) - Made `title`, `description`, `author` nullable
  - [Get project](/docs/api-reference/studio/get-project) - Made `last_modified_at`, `created_at`, `project_name` nullable
  - [Add chapter](/docs/api-reference/studio/add-chapter) - Made `chapter_id`, `word_count`, `statistics` nullable
  - [Update chapter](/docs/api-reference/studio/update-chapter) - Made `content` and `blocks` properties nullable

### Agents Platform

- Updated Agents Platform endpoints:
  - [Update agent](/docs/api-reference/agents/update) - Made `conversation_config`, `platform_settings` nullable and added `workspace_overrides` property
  - [Create agent](/docs/api-reference/agents/create) - Made `agent_name`, `prompt`, `widget_config` nullable and added `workspace_overrides` property
  - [Add to knowledge base](/docs/api-reference/knowledge-base/create-from-url) - Made `document_name` nullable
  - [Get conversation](/docs/api-reference/conversations/get-conversation) - Added `twilio_call_data` model and made `transcript`, `metadata` nullable

### Text to Speech

- Updated Text to Speech endpoints:
  - [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Made `voice_settings`,
`text_input` nullable and deprecated `use_pvc_as_ivc` property - [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Made `voice_settings`, `text_input` nullable and deprecated `use_pvc_as_ivc` property - [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Made `character_alignment` and `word_alignment` nullable ### Voice Management - Updated Voice endpoints: - [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Added `loudness`, `quality`, `guidance_scale` properties - [Create voice from preview](/docs/api-reference/text-to-voice/create) - Added `speaker_separation` properties and made `voice_id`, `name`, `labels` nullable - [Get voice](/docs/api-reference/voices/get) - Added `speaker_boost`, `speaker_clarity`, `speaker_isolation` properties ### Speech to Text - Updated Speech to Text endpoint: - [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `biased_keywords` property ### Other Updates - [Download history](/docs/api-reference/history/download) - Added application/zip content type and 400 response - [Add pronunciation dictionary from file](/docs/api-reference/pronunciation-dictionary/add-from-file) - Made `dictionary_name` and `description` nullable # March 10, 2025 ### Agents Platform - **HIPAA compliance**: Agents Platform is now [HIPAA compliant](/docs/agents-platform/legal/hipaa) on appropriate plans, when a BAA is signed, zero-retention mode is enabled and appropriate LLMs are used. For access please [contact sales](/contact-sales) - **Cascade LLM**: Added dynamic dispatch during the LLM step to other LLMs if your default LLM fails. This results in higher latency but prevents the turn failing. - **Better error messages**: Added better error messages for websocket failures. - **Audio toggling**: Added ability to select only user or agent audio in the conversation playback. ### Scribe - **HIPAA compliance**: Added a zero retention mode to Scribe to be HIPAA compliant. - **Diarization**: Increased time length of audio files that can be transcribed with diarization from 8 minutes to 2 hours. - **Cheaper pricing**: Updated Scribe's pricing to be cheaper, as low as $0.22 per hour for the Business tier. - **Memory usage**: Shipped improvements to Scribe's memory usage. - **Fixed timestamps**: Fixed an issue that was causing incorrect timestamps to be returned. ### Text to Speech - **Pronunciation dictionaries**: Fixed pronunciation dictionary rule application for replacements that contain symbols. ### Dubbing - **Studio support**: Added support for creating dubs with `dubbing_studio` enabled, allowing for more advanced dubbing workflows beyond one-off dubs. ### Voices - **Verification**: Fixed an issue where users on probation could not verify their voice clone. 
### API ## New Endpoints - Added 7 new endpoints: - [Add a shared voice to your collection](/docs/api-reference/voice-library/share) - [Archive a project snapshot](/docs/api-reference/studio/archive-snapshot) - [Update a project](/docs/api-reference/studio/edit-project) - [Create an Audio Native enabled project](/docs/api-reference/audio-native/create) - [Get all voices](/docs/api-reference/voices/search) - [Download a pronunciation dictionary](/docs/api-reference/pronunciation-dictionary/download) - [Get Audio Native project settings](/docs/api-reference/audio-native/get-settings) ## Updated Endpoints ### Studio Projects - Updated Studio project endpoints to add `source_type` property and deprecate `quality_check_on` and `quality_check_on_when_bulk_convert` properties: - [Get projects](/docs/api-reference/studio/get-projects) - [Get project](/docs/api-reference/studio/get-project) - [Add project](/docs/api-reference/studio/add-project) - [Update content](/docs/api-reference/studio/update-content) - [Create podcast](/docs/api-reference/studio/create-podcast) ### Voice Management - Updated Voice endpoints with several property changes: - [Get voice](/docs/api-reference/voices/get) - Made several properties optional and added `preview_url` - [Create voice](/docs/api-reference/voices/ivc/create) - Made several properties optional and added `preview_url` - [Create voice from preview](/docs/api-reference/text-to-voice/create) - Made several properties optional and added `preview_url` - [Get similar voices](/docs/api-reference/voices/get-similar-library-voices) - Made `language`, `description`, `preview_url`, and `rate` properties optional ### Agents Platform - Updated ElevenLabs agent endpoints: - [Update agent](/docs/api-reference/agents/update) - Modified `conversation_config`, `agent`, `platform_settings`, and `widget` properties - [Create agent](/docs/api-reference/agents/create) - Modified `conversation_config`, `agent`, `prompt`, platform_settings, widget properties and added `shareable_page_show_terms` - [Get agent](/docs/api-reference/agents/get) - Modified `conversation_config`, `agent`, `platform_settings`, and `widget` properties - [Get widget](/docs/api-reference/widget/get-agent-widget) - Modified `widget_config` property and added `shareable_page_show_terms` ### Knowledge Base - Updated Knowledge Base endpoints to add metadata property: - [List knowledge base documents](/docs/api-reference/knowledge-base/list#response.body.metadata) - [Get knowledge base document](/docs/api-reference/knowledge-base/get-document#response.body.metadata) ### Other Updates - [Dub a video or audio file](/docs/api-reference/dubbing/create) - Added `dubbing_studio` property - [Convert text to sound effects](/docs/api-reference/text-to-sound-effects/convert) - Added `output_format` query parameter - [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `enable_logging` query parameter - [Get secrets](/docs/api-reference/workspace/secrets/list) - Modified `secrets` and `used_by` properties - [Get all pronunciation dictionaries](/docs/api-reference/pronunciation-dictionary/get-all) - Made `next_cursor` property optional ## Removed Endpoints - Temporarily removed Agents Platform tools endpoints: - Get tool - List tools - Update tool - Create tool - Delete tool # March 3, 2025 ### Dubbing - **Scribe for speech recognition**: Dubbing Studio now uses Scribe by default for speech recognition to improve accuracy. 
### Speech to Text - **Fixes**: Shipped several fixes improving the stability of Speech to Text. ### Agents Platform - **Speed control**: Added speed control to an agent's settings in Agents Platform. - **Post call webhook**: Added the option of sending [post-call webhooks](/docs/agents-platform/workflows/post-call-webhooks) after conversations are completed. - **Improved error messages**: Added better error messages to the Agents Platform websocket. - **Claude 3.7 Sonnet**: Added Claude 3.7 Sonnet as a new LLM option in Agents Platform. ### API #### New Endpoints - Added new Dubbing resource management endpoints: - for adding [languages to dubs](/docs/api-reference/dubbing/resources/add-language) - for retrieving [dubbing resources](/docs/api-reference/dubbing/resources/get-resource) - for creating [segments](/docs/api-reference/dubbing/resources/create-segment) - for modifying [segments](/docs/api-reference/dubbing/resources/update-segment) - for removing [segments](/docs/api-reference/dubbing/resources/delete-segment) - for dubbing [segments](/docs/api-reference/dubbing/resources/dub-segment) - for transcribing [segments](/docs/api-reference/dubbing/resources/transcribe-segment) - for translating [segments](/docs/api-reference/dubbing/resources/translate-segment) - Added Knowledge Base RAG indexing [endpoint](/docs/agents-platform/api-reference/knowledge-base/compute-rag-index) - Added Studio snapshot retrieval endpoints for [projects](/docs/api-reference/studio/get-project-snapshot) and [chapters](/docs/api-reference/studio/get-chapter-snapshot) #### Updated Endpoints - Added `prompt_injectable` property to knowledge base [endpoints](docs/api-reference/knowledge-base/get#response.body.prompt_injectable) - Added `name` property to Knowledge Base document [creation](/docs/api-reference/knowledge-base/create-from-url#request.body.name) and [retrieval](/docs/api-reference/knowledge-base/get-document#response.body.name) endpoints: - Added `speed` property to [agent creation](/docs/api-reference/agents/create#request.body.conversation_config.tts.speed) - Removed `secrets` property from agent endpoints (now handled by dedicated secrets endpoints) - Added [secret deletion endpoint](/docs/api-reference/workspace/secrets/delete) for removing secrets - Removed `secrets` property from settings [endpoints](/docs/api-reference/workspace/get) # February 25, 2025 ### Speech to Text - **ElevenLabs launched a new state of the art [Speech to Text API](/docs/capabilities/speech-to-text) available in 99 languages.** ### Text to Speech - **Speed control**: Added speed control to the Text to Speech API. ### Studio - **Auto-assigned projects**: Increased token limits for auto-assigned projects from 1 month to 3 months worth of tokens, addressing user feedback about working on longer projects. - **Language detection**: Added automatic language detection when generating audio for the first time, with suggestions to switch to Eleven Turbo v2.5 for languages not supported by Multilingual v2 (Hungarian, Norwegian, Vietnamese). - **Project export**: Enhanced project exporting in ElevenReader with better metadata tracking. ### Dubbing - **Clip overlap prevention**: Added automatic trimming of overlapping clips in dubbing jobs to ensure clean audio tracks for each speaker and language. ### Voice Management - **Instant Voice Cloning**: Improved preview generation for Instant Voice Cloning v2, making previews available immediately. 
### Agents Platform - **Agent ownership**: Added display of agent creators in the agent list, improving visibility and management of shared agents. ### Web app - **Dark mode**: Added dark mode to the web app. ### API - Launched **/v1/speech-to-text** [endpoint](/docs/api-reference/speech-to-text/convert) - Added `agents.level` property to [ElevenLabs agents endpoint](/docs/api-reference/agents/get#response.body.agents.access_level) - Added `platform_settings` to [ElevenLabs agent endpoint](/docs/api-reference/agents/update#request.body.platform_settings) - Added `expandable` variant to `widget_config`, with configuration options `show_avatar_when_collapsed` and `disable_banner` to [ElevenLabs agent widget endpoint](/docs/api-reference/agents/get#response.body.widget) - Added `webhooks` property and `used_by` to `secrets` to [secrets endpoint](/docs/api-reference/workspace/secrets/list#response.body.secrets.used_by) - Added `verified_languages` to [voices endpoint](/docs/api-reference/voices/get#response.body.verified_languages) - Added `speed` property to [voice settings endpoints](/docs/api-reference/voices/get#response.body.settings.speed) - Added `verified_languages`, `is_added_by_user` to `voices` and `min_notice_period_days` query parameter to [shared voices endpoint](/docs/api-reference/voice-library/get-shared#request.query) - Added `verified_languages`, `is_added_by_user` to `voices` in [similar voices endpoint](/docs/api-reference/voices/get-similar-library-voices) - Added `search`, `show_only_owned_documents`, `use_typesense` query parameters to [knowledge base endpoint](/docs/api-reference/knowledge-base/get-knowledge-base-list#request.query.search) - Added `used_by` to Conversation AI [secrets endpoint](/docs/api-reference/workspace/secrets/list) - Added `invalidate_affected_text` property to Studio [pronunciation dictionaries endpoint](/docs/api-reference/studio/create-pronunciation-dictionaries#request.body.invalidate_affected_text) # February 17, 2025 ### Agents Platform - **Tool calling fix**: Fixed an issue where tool calling was not working with agents using gpt-4o mini. This was due to a breaking change in the OpenAI API. - **Tool calling improvements**: Added support for tool calling with dynamic variables inside objects and arrays. - **Dynamic variables**: Fixed an issue where dynamic variables of a conversation were not being displayed correctly. ### Voice Isolator - **Fixed**: Fixed an issue that caused the voice isolator to not work correctly temporarily. ### Workspace - **Billing**: Improved billing visibility by differentiating rollover, cycle, gifted, and usage-based credits. - **Usage Analytics**: Improved usage analytics load times and readability. - **Fine grained fiat billing**: Added support for customizable pricing based on several factors. 
### API - Added `phone_numbers` property to [Agent responses](/docs/api-reference/agents/get) - Added usage metrics to subscription_extras in [User endpoint](/docs/api-reference/user/get): - `unused_characters_rolled_over_from_previous_period` - `overused_characters_rolled_over_from_previous_period` - `usage` statistics - Added `enable_conversation_initiation_client_data_from_webhook` to [Agent creation](/docs/api-reference/agents/create) - Updated [Agent](/docs/api-reference/agents) endpoints with consolidated settings for: - `platform_settings` - `overrides` - `safety` - Deprecated `with_settings` parameter in [Voice retrieval endpoint](/docs/api-reference/voices/get) # February 10, 2025 ## Agents Platform - **Updated Pricing**: Updated self-serve pricing for Agents Platform with [reduced cost and a more generous free tier](/docs/agents-platform/overview#pricing-tiers). - **Knowledge Base UI**: Created a new page to easily manage your [knowledge base](/app/agents/knowledge-base). - **Live calls**: Added number of live calls in progress in the user [dashboard](/app/agents) and as a new endpoint. - **Retention**: Added ability to customize transcripts and audio recordings [retention settings](/docs/agents-platform/customization/privacy/retention). - **Audio recording**: Added a new option to [disable audio recordings](/docs/agents-platform/customization/privacy/audio-saving). - **8k PCM support**: Added support for 8k PCM audio for both input and output. ## Studio - **GenFM**: Updated the create podcast endpoint to accept [multiple input sources](/docs/api-reference/studio/create-podcast). - **GenFM**: Fixed an issue where GenFM was creating empty podcasts. ## Enterprise - **New workspace group endpoints**: Added new endpoints to manage [workspace groups](/docs/api-reference/workspace/search-user-groups). ### API **Studio (formerly Projects)** All `/v1/projects/*` endpoints have been deprecated in favor of the new `/v1/studio/projects/*` endpoints. 
The following endpoints are now deprecated:

- All operations on `/v1/projects/`
- All operations related to chapters, snapshots, and content under `/v1/projects/*`

**Agents Platform**

- `POST /v1/convai/add-tool` - Use `POST /v1/convai/tools` instead
- `DELETE /v1/convai/agents/{agent_id}` - Response type is no longer an object
- `GET /v1/convai/tools` - Response type changed from array to object with a `tools` property

**Agents Platform Updates**

- `GET /v1/convai/agents/{agent_id}` - Updated conversation configuration and agent properties
- `PATCH /v1/convai/agents/{agent_id}` - Added `use_tool_ids` parameter for tool management
- `POST /v1/convai/agents/create` - Added tool integration via `use_tool_ids`

**Knowledge Base & Tools**

- `GET /v1/convai/agents/{agent_id}/knowledge-base/{documentation_id}` - Added `name` and `access_level` properties
- `GET /v1/convai/knowledge-base/{documentation_id}` - Added `name` and `access_level` properties
- `GET /v1/convai/tools/{tool_id}` - Added `dependent_agents` property
- `PATCH /v1/convai/tools/{tool_id}` - Added `dependent_agents` property

**GenFM**

- `POST /v1/projects/podcast/create` - Added support for multiple input sources

**Studio (formerly Projects)**

New endpoints replacing the deprecated `/v1/projects/*` endpoints:

- `GET /v1/studio/projects`: List all projects
- `POST /v1/studio/projects`: Create a project
- `GET /v1/studio/projects/{project_id}`: Get project details
- `DELETE /v1/studio/projects/{project_id}`: Delete a project

**Knowledge Base Management**

- `GET /v1/convai/knowledge-base`: List all knowledge base documents
- `DELETE /v1/convai/knowledge-base/{documentation_id}`: Delete a knowledge base
- `GET /v1/convai/knowledge-base/{documentation_id}/dependent-agents`: List agents using this knowledge base

**Workspace Groups** - New enterprise features for team management

- `GET /v1/workspace/groups/search`: Search workspace groups
- `POST /v1/workspace/groups/{group_id}/members`: Add members to a group
- `POST /v1/workspace/groups/{group_id}/members/remove`: Remove members from a group

**Tools**

- `POST /v1/convai/tools`: Create new tools for agents

## Socials

- **ElevenLabs Developers**: Follow our new developers account on X [@ElevenLabsDevs](https://x.com/intent/user?screen_name=elevenlabsdevs)

# February 4, 2025

### Agents Platform

- **Agent monitoring**: Added a new dashboard for monitoring ElevenLabs agents' activity. Check out yours [here](/app/agents).
- **Proactive conversations**: Enhanced capabilities with improved timeout retry logic. [Learn more](/docs/agents-platform/customization/conversation-flow)
- **Tool calls**: Fixed timeout issues occurring during tool calls
- **Allowlist**: Fixed implementation of allowlist functionality.
- **Content summarization**: Added Gemini as a fallback model to ensure service reliability
- **Widget stability**: Fixed issue with dynamic variables causing the Agents Platform widget to fail

### Reader

- **Trending content**: Added carousel showcasing popular articles and trending content
- **New publications**: Introduced dedicated section for recent ElevenReader Publishing releases

### Studio (formerly Projects)

- **Projects is now Studio** and is generally available to everyone
- **Chapter content editing**: Added support for editing chapter content through the public API, enabling programmatic updates to chapter text and metadata
- **GenFM public API**: Added public API support for podcast creation through GenFM.
Key features include: - Conversation mode with configurable host and guest voices - URL-based content sourcing - Customizable duration and highlights - Webhook callbacks for status updates - Project snapshot IDs for audio downloads ### SDKs - **Swift**: fixed an issue where resources were not being released after the end of a session - **Python**: added uv support - **Python**: fixed an issue where calls were not ending correctly ### API - Added POST `v1/workspace/invites/add-bulk` [endpoint](/docs/api-reference/workspace/invite-multiple-users) to enable inviting multiple users simultaneously - Added POST `v1/projects/podcast/create` [endpoint](/docs/api-reference/studio/create-podcast) for programmatic podcast generation through GenFM - Added 'v1/convai/knowledge-base/:documentation_id' [endpoints](/docs/api-reference/knowledge-base/) with CRUD operations for Agents Platform - Added PATCH `v1/projects/:project_id/chapters/:chapter_id` [endpoint](/docs/api-reference/studio/update-chapter) for updating project chapter content and metadata - Added `group_ids` parameter to [Workspace Invite endpoint](/docs/api-reference/workspace/invite-user) for group-based access control - Added structured `content` property to [Chapter response objects](/docs/api-reference/studio/get-chapter) - Added `retention_days` and `delete_transcript_and_pii` data retention parameters to [Agent creation](/docs/api-reference/agents/create) - Added structured response to [AudioNative content](/docs/api-reference/audio-native/create#response.body.project_id) - Added `convai_chars_per_minute` usage metric to [User endpoint](/docs/api-reference/user/get) - Added `media_metadata` field to [Dubbing response objects](/docs/api-reference/dubbing/get) - Added GDPR-compliant `deletion_settings` to [Conversation responses](/docs/api-reference/conversations/get-conversation#response.body.metadata.deletion_settings) - Deprecated Knowledge Base legacy endpoints: - POST `/v1/convai/agents/{agent_id}/add-to-knowledge-base` - GET `/v1/convai/agents/{agent_id}/knowledge-base/{documentation_id}` - Updated Agent endpoints with consolidated [privacy control parameters](/docs/api-reference/agents/create) # January 27, 2025 ### Docs - **Shipped our new docs**: we're keen to hear your thoughts, you can reach out by opening an issue on [GitHub](https://github.com/elevenlabs/elevenlabs-docs) or chatting with us on [Discord](https://discord.gg/elevenlabs) ### Agents Platform - **Dynamic variables**: Available in the dashboard and SDKs. [Learn more](/docs/agents-platform/customization/personalization/dynamic-variables) - **Interruption handling**: Now possible to ignore user interruptions in Agents Platform. [Learn more](/docs/agents-platform/customization/conversation-flow#interruptions) - **Twilio integration**: Shipped changes to increase audio quality when integrating with Twilio - **Latency optimization**: Published detailed blog post on latency optimizations. 
[Read more](/blog/how-do-you-optimize-latency-for-conversational-ai) - **PCM 8000**: Added support for PCM 8000 to ElevenLabs agents - **Websocket improvements**: Fixed unexpected websocket closures ### Projects - **Auto-regenerate**: Auto-regeneration now available by default at no extra cost - **Content management**: Added `updateContent` method for dynamic content updates - **Audio conversion**: New auto-convert and auto-publish flags for seamless workflows ### API - Added `Update Project` endpoint for [project editing](/docs/api-reference/studio/edit-project#:~:text=List%20projects-,POST,Update%20project,-GET) - Added `Update Content` endpoint for [AudioNative content management](/docs/api-reference/audio-native/update-content) - Deprecated `quality_check_on` parameter in [project operations](/docs/api-reference/studio/add-project#request.body.quality_check_on). It is now enabled for all users at no extra cost - Added `apply_text_normalization` parameter to project creation with modes 'auto', 'on', 'apply_english' and 'off' for controlling text normalization during [project creation](/docs/api-reference/studio/add-project#request.body.apply_text_normalization) - Added alpha feature `auto_assign_voices` in [project creation](/docs/api-reference/studio/add-project#request.body.auto_assign_voices) to automatically assign voices to phrases - Added `auto_convert` flag to project creation to automatically convert [projects to audio](/docs/api-reference/audio-native/create#request.body.auto_convert) - Added support for creating ElevenLabs agents with [dynamic variables](/docs/api-reference/agents/create#request.body.conversation_config.agent.dynamic_variables) - Added `voice_slots_used` to `Subscription` model to track number of custom voices used in a workspace to the `User` [endpoint](/docs/api-reference/user/subscription/get#response.body.voice_slots_used) - Added `user_id` field to `User` [endpoint](/docs/api-reference/user/get#response.body.user_id) - Marked legacy AudioNative creation parameters (`image`, `small`, `sessionization`) as deprecated [parameters](/docs/api-reference/audio-native/create#request.body.image) - Agents platform now supports `call_limits` containing either `agent_concurrency_limit` or `daily_limit` or both parameters to control simultaneous and daily conversation limits for [agents](/docs/api-reference/agents/create#request.body.platform_settings.call_limits) - Added support for `language_presets` in `conversation_config` to customize language-specific [settings](/docs/api-reference/agents/create#request.body.conversation_config.language_presets) ### SDKs - **Cross-Runtime Support**: Now compatible with **Bun 1.1.45+** and **Deno 2.1.7+** - **Regenerated SDKs**: We regenerated our SDKs to be up to date with the latest API spec. Check out the latest [Python SDK release](https://github.com/elevenlabs/elevenlabs-python/releases/tag/1.50.5) and [JS SDK release](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v1.50.4) - **Dynamic Variables**: Fixed an issue where dynamic variables were not being handled correctly, they are now correctly handled in all SDKs # January 16, 2025 ## Product ### Agents Platform - **Additional languages**: Add a language dropdown to your widget so customers can launch conversations in their preferred language. Learn more [here](/docs/agents-platform/customization/language). - **End call tool**: Let the agent automatically end the call with our new “End Call” tool. 
Learn more [here](/docs/agents-platform/customization/tools).

- **Flash default**: Flash, our lowest latency model, is now the default for new agents. In your agent dashboard under "voice", you can toggle between Turbo and Flash. Learn more about Flash [here](https://elevenlabs.io/blog/meet-flash).
- **Privacy**: Set concurrent call and daily call limits, turn off audio recordings, add feedback collection, and define customer terms & conditions.
- **Increased tool limits**: Increase the number of tools available to your agent from 5 to 15. Learn more [here](/docs/agents-platform/customization/tools).

# January 2, 2025

## Product

- **Workspace Groups and Permissions**: Introduced new workspace group management features to enhance access control within organizations. [Learn more](https://elevenlabs.io/blog/workspace-groups-and-permissions).

# December 19, 2024

## Model

- **Introducing Flash**: Our fastest text-to-speech model yet, generating speech in just 75ms. Access it via the API with model IDs `eleven_flash_v2` and `eleven_flash_v2_5`. Perfect for low-latency Agents Platform applications. [Try it now](https://elevenlabs.io/docs/api-reference/text-to-speech).

## Launches

- **[TalkToSanta.io](https://www.talktosanta.io)**: Experience Agents Platform in action by talking to Santa this holiday season. For every conversation with Santa we donate $2 to [Bridging Voice](https://www.bridgingvoice.org) (up to $11,000).
- **[AI Engineer Pack](https://aiengineerpack.com)**: Get $50+ in credits from leading AI developer tools, including ElevenLabs.

# December 6, 2024

## Product

- **GenFM Now on Web**: Access GenFM directly from the website in addition to the ElevenReader App, [try it now](https://elevenlabs.io/app/projects).

# December 3, 2024

## API

- **Credit Usage Limits**: Set specific credit limits for API keys to control costs and manage usage across different use cases by setting "Access" or "No Access" to features like Dubbing, Audio Native, and more. [Check it out](https://elevenlabs.io/app/settings/api-keys)
- **Workspace API Keys**: Now support access permissions, such as "Read" or "Read and Write" for User, Workspace, and History resources.
- **Improved Key Management**:
  - Redesigned interface moving from modals to dedicated pages
  - Added detailed descriptions and key information
  - Enhanced visibility of key details and settings

# November 29, 2024

## Product

- **GenFM**: Launched in the ElevenReader app. [Learn more](https://elevenlabs.io/blog/genfm-on-elevenreader)
- **Agents Platform**: Now generally available to all customers. [Try it now](https://elevenlabs.io/conversational-ai)
- **TTS Redesign**: The website TTS redesign is now rolled out to all customers.
- **Auto-regenerate**: Now available in Projects. [Learn more](https://elevenlabs.io/blog/auto-regenerate-is-live-in-projects)
- **Reader Platform Improvements**:
  - Improved content sharing with enhanced landing pages and social media previews.
  - Added podcast rating system and improved voice synchronization.
- **Projects revamp**:
  - Restore past generations, lock content, assign speakers to sentence fragments, and QC at 2x speed. [Learn more](https://elevenlabs.io/blog/narrate-any-project)
  - Auto-regeneration identifies mispronunciations and regenerates audio at no extra cost. [Learn more](https://elevenlabs.io/blog/auto-regenerate-is-live-in-projects)

## API

- **Agents Platform**: [SDKs and APIs](https://elevenlabs.io/docs/agents-platform/quickstart) now available.
# October 27, 2024

## API

- **u-law Audio Formats**: Added u-law audio formats to the Convai API for integrations with Twilio.
- **TTS Websocket Improvements**: Flushes and generation now work more intuitively over the TTS websocket.
- **TTS Websocket Auto Mode**: A streamlined mode for using websockets. This setting reduces latency by disabling chunk scheduling and buffers. Note: Using partial sentences will result in significantly reduced quality.
- **Improvements to latency consistency**: Improved latency consistency for all models.

## Website

- **TTS Redesign**: The website TTS redesign is now in alpha!

# October 20, 2024

## API

- **Normalize Text with the API**: Added the option to normalize the input text in the TTS API. The new parameter is called `apply_text_normalization` and works on all models. For v2.5 models, this feature is available with Enterprise plans only.

## Product

- **Voice Design**: The Voice Design feature is now in beta!

# October 13, 2024

## Model

- **Stability Improvements**: Significant audio stability improvements across all models, most noticeable on `turbo_v2` and `turbo_v2.5`, when using:
  - Websockets
  - Projects
  - Reader app
  - TTS with request stitching
  - ConvAI
- **Latency Improvements**: Reduced time to first byte latency by approximately 20-30ms for all models.

## API

- **Remove Background Noise Voice Samples**: Added the ability to remove background noise from voice samples using our audio isolation model to improve quality for IVCs and PVCs at no additional cost.
- **Remove Background Noise STS Input**: Added the ability to remove background noise from STS audio input using our audio isolation model to improve quality at no additional cost.

## Feature

- **Agents Platform Beta**: Agents Platform is now in beta.

# Text to Speech

> Learn how to turn text into lifelike spoken audio with ElevenLabs.

## Overview

The ElevenLabs [Text to Speech (TTS)](/docs/api-reference/text-to-speech) API turns text into lifelike audio with nuanced intonation, pacing and emotional awareness. [Our models](/docs/models) adapt to textual cues across 32 languages and multiple voice styles and can be used to:

* Narrate global media campaigns & ads
* Produce audiobooks in multiple languages with complex emotional delivery
* Stream real-time audio from text

Listen to a sample:

Explore our [voice library](https://elevenlabs.io/community) to find the perfect voice for your project.

Learn how to integrate text to speech into your application. Step-by-step guide for using text to speech in ElevenLabs.

### Voice quality

For real-time applications, Flash v2.5 provides ultra-low 75ms latency, while Multilingual v2 delivers the highest quality audio with more nuanced expression.
* **[Eleven v3 (alpha)](/docs/models#eleven-v3-alpha)**: Our most emotionally rich, expressive speech synthesis model
  * Dramatic delivery and performance
  * 70+ languages supported
  * 3,000 character limit
  * Support for natural multi-speaker dialogue
* **Eleven Multilingual v2**: Lifelike, consistent quality speech synthesis model
  * Natural-sounding output
  * 29 languages supported
  * 10,000 character limit
  * Most stable on long-form generations
* **Eleven Flash v2.5**: Our fast, affordable speech synthesis model
  * Ultra-low latency (~75ms†)
  * 32 languages supported
  * 40,000 character limit
  * Faster model, 50% lower price per character
* **Eleven Turbo v2.5**: High quality, low-latency model with a good balance of quality and speed
  * High quality voice generation
  * 32 languages supported
  * 40,000 character limit
  * Low latency (~250ms-300ms†), 50% lower price per character

[Explore all](/docs/models)
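To make the trade-off concrete, here is a minimal sketch (an illustration, not an official snippet) of picking a model ID for a request. The helper name `pick_tts_model` and the `realtime` flag are our own; the model IDs `eleven_flash_v2_5` and `eleven_multilingual_v2` are assumed to be the current public IDs for Flash v2.5 and Multilingual v2, so confirm them against the models page before relying on this.

```python
# Illustrative sketch only: map the quality/latency trade-off above to a model ID.
# Assumes the public model IDs "eleven_multilingual_v2" and "eleven_flash_v2_5";
# check /docs/models for the current list.
def pick_tts_model(realtime: bool) -> str:
    """Return a Text to Speech model ID based on the latency/quality trade-off."""
    if realtime:
        # Flash v2.5: ~75 ms model latency, 32 languages, lower price per character.
        return "eleven_flash_v2_5"
    # Multilingual v2: highest audio quality and most nuanced expression, 29 languages.
    return "eleven_multilingual_v2"


# Example: a voice agent wants low latency, long-form narration wants quality.
print(pick_tts_model(realtime=True))   # eleven_flash_v2_5
print(pick_tts_model(realtime=False))  # eleven_multilingual_v2
```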
### Voice options

ElevenLabs offers thousands of voices across 32 languages through multiple creation methods:

* [Voice library](/docs/capabilities/voices) with 3,000+ community-shared voices
* [Professional voice cloning](/docs/capabilities/voices#cloned) for highest-fidelity replicas
* [Instant voice cloning](/docs/capabilities/voices#cloned) for quick voice replication
* [Voice design](/docs/capabilities/voices#voice-design) to generate custom voices from text descriptions

Learn more about our [voice options](/docs/capabilities/voices).

### Supported formats

The default response format is "mp3", but other formats such as PCM, μ-law, A-law, and Opus are also available.

* **MP3**
  * Sample rates: 22.05kHz - 44.1kHz
  * Bitrates: 32kbps - 192kbps
  * 22.05kHz @ 32kbps
  * 44.1kHz @ 32kbps, 64kbps, 96kbps, 128kbps, 192kbps
* **PCM (S16LE)**
  * Sample rates: 8kHz, 16kHz, 22.05kHz, 24kHz, 44.1kHz, 48kHz
  * 16-bit depth
* **μ-law**
  * 8kHz sample rate
  * Optimized for telephony applications
* **A-law**
  * 8kHz sample rate
  * Optimized for telephony applications
* **Opus**
  * Sample rate: 48kHz
  * Bitrates: 32kbps - 192kbps

Higher quality audio options are only available on paid tiers - see our [pricing page](https://elevenlabs.io/pricing/api) for details.

### Supported languages

Our multilingual v2 models support 29 languages:

*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*

Flash v2.5 supports 32 languages - all languages from v2 models plus: *Hungarian, Norwegian & Vietnamese*

Simply input text in any of our supported languages and select a matching voice from our [voice library](https://elevenlabs.io/community). For the most natural results, choose a voice with an accent that matches your target language and region.

### Prompting

The models interpret emotional context directly from the text input. For example, adding descriptive text like "she said excitedly" or using exclamation marks will influence the speech emotion. Voice settings like Stability and Similarity help control the consistency, while the underlying emotion comes from textual cues. Read the [prompting guide](/docs/best-practices/prompting) for more details.

Descriptive text will be spoken out by the model and must be manually trimmed or removed from the audio if desired.

## FAQ

Yes, you can create [instant voice clones](/docs/capabilities/voices#cloned) of your own voice from short audio clips. For high-fidelity clones, check out our [professional voice cloning](/docs/capabilities/voices#cloned) feature.

Yes. You retain ownership of any audio you generate. However, commercial usage rights are only available with paid plans. With a paid subscription, you may use generated audio for commercial purposes and monetize the outputs if you own the IP rights to the input content.

A free regeneration allows you to regenerate the same text to speech content without additional cost, subject to these conditions:

* You can regenerate each piece of content up to 2 times for free
* The content must be exactly the same as the previous generation. Any changes to the text, voice settings, or other parameters will require a new, paid generation

Free regenerations are useful in case there is a slight distortion in the audio output.
According to ElevenLabs' internal benchmarks, regenerations will solve roughly half of issues with quality, with remaining issues usually due to poor training data. Use the low-latency Flash [models](/docs/models) (Flash v2 or v2.5) optimized for near real-time conversational or interactive scenarios. See our [latency optimization guide](/docs/best-practices/latency-optimization) for more details. The models are nondeterministic. For consistency, use the optional [seed parameter](/docs/api-reference/text-to-speech/convert#request.body.seed), though subtle differences may still occur. Split long text into segments and use streaming for real-time playback and efficient processing. To maintain natural prosody flow between chunks, include [previous/next text or previous/next request id parameters](/docs/api-reference/text-to-speech/convert#request.body.previous_text). # Speech to Text > Learn how to turn spoken audio into text with ElevenLabs. ## Overview The ElevenLabs [Speech to Text (STT)](/docs/api-reference/speech-to-text) API turns spoken audio into text with state of the art accuracy. Our Scribe v1 [model](/docs/models) adapts to textual cues across 99 languages and multiple voice styles and can be used to: * Transcribe podcasts, interviews, and other audio or video content * Generate transcripts for meetings and other audio or video recordings Learn how to integrate speech to text into your application. Step-by-step guide for using speech to text in ElevenLabs. Companies requiring HIPAA compliance must contact [ElevenLabs Sales](https://elevenlabs.io/contact-sales) to sign a Business Associate Agreement (BAA) agreement. Please ensure this step is completed before proceeding with any HIPAA-related integrations or deployments. ## State of the art accuracy The Scribe v1 model is capable of transcribing audio from up to 32 speakers with high accuracy. Optionally it can also transcribe audio events like laughter, applause, and other non-speech sounds. The transcribed output supports exact timestamps for each word and audio event, plus diarization to identify the speaker for each word. The Scribe v1 model is best used for when high-accuracy transcription is required rather than real-time transcription. A low-latency, real-time version will be released soon. ## Pricing | Tier | Price/month | Hours included | Price per included hour | Price per additional hour | | -------- | ----------- | ------------------- | ----------------------- | ------------------------- | | Free | \$0 | Unavailable | Unavailable | Unavailable | | Starter | \$5 | 12 hours 30 minutes | \$0.40 | Unavailable | | Creator | \$22 | 62 hours 51 minutes | \$0.35 | \$0.48 | | Pro | \$99 | 300 hours | \$0.33 | \$0.40 | | Scale | \$330 | 1,100 hours | \$0.30 | \$0.33 | | Business | \$1,320 | 6,000 hours | \$0.22 | \$0.22 | | Tier | Price/month | Hours included | Price per included hour | | -------- | ----------- | --------------- | ----------------------- | | Free | \$0 | 12 minutes | Unavailable | | Starter | \$5 | 1 hour | \$5 | | Creator | \$22 | 4 hours 53 min | \$4.5 | | Pro | \$99 | 24 hours 45 min | \$4 | | Scale | \$330 | 94 hours 17 min | \$3.5 | | Business | \$1,320 | 440 hours | \$3 | For reduced pricing at higher scale than 6,000 hours/month in addition to custom MSAs and DPAs, please [contact sales](https://elevenlabs.io/contact-sales). **Note: The free tier requires attribution and does not have commercial licensing.** Scribe has higher concurrency limits than other services from ElevenLabs. 
Please see other concurrency limits [here](/docs/models#concurrency-and-priority) | Plan | STT Concurrency Limit | | ---------- | --------------------- | | Free | 8 | | Starter | 12 | | Creator | 20 | | Pro | 40 | | Scale | 60 | | Business | 60 | | Enterprise | Elevated | ## Examples The following example shows the output of the Scribe v1 model for a sample audio file. ```javascript { "language_code": "en", "language_probability": 1, "text": "With a soft and whispery American accent, I'm the ideal choice for creating ASMR content, meditative guides, or adding an intimate feel to your narrative projects.", "words": [ { "text": "With", "start": 0.119, "end": 0.259, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 0.239, "end": 0.299, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "a", "start": 0.279, "end": 0.359, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 0.339, "end": 0.499, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "soft", "start": 0.479, "end": 1.039, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 1.019, "end": 1.2, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "and", "start": 1.18, "end": 1.359, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 1.339, "end": 1.44, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "whispery", "start": 1.419, "end": 1.979, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 1.959, "end": 2.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "American", "start": 2.159, "end": 2.719, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 2.699, "end": 2.779, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "accent,", "start": 2.759, "end": 3.389, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 4.119, "end": 4.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "I'm", "start": 4.159, "end": 4.459, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 4.44, "end": 4.52, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "the", "start": 4.5, "end": 4.599, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 4.579, "end": 4.699, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "ideal", "start": 4.679, "end": 5.099, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 5.079, "end": 5.219, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "choice", "start": 5.199, "end": 5.719, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 5.699, "end": 6.099, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "for", "start": 6.099, "end": 6.199, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 6.179, "end": 6.279, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "creating", "start": 6.259, "end": 6.799, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 6.779, "end": 6.979, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "ASMR", "start": 6.959, "end": 7.739, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 7.719, "end": 7.859, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "content,", "start": 7.839, "end": 8.45, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 9, "end": 9.06, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "meditative", "start": 9.04, "end": 9.64, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 9.619, "end": 
9.699, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "guides,", "start": 9.679, "end": 10.359, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 10.359, "end": 10.409, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "or", "start": 11.319, "end": 11.439, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 11.42, "end": 11.52, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "adding", "start": 11.5, "end": 11.879, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 11.859, "end": 12, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "an", "start": 11.979, "end": 12.079, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 12.059, "end": 12.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "intimate", "start": 12.179, "end": 12.579, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 12.559, "end": 12.699, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "feel", "start": 12.679, "end": 13.159, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.139, "end": 13.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "to", "start": 13.159, "end": 13.26, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.239, "end": 13.3, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "your", "start": 13.299, "end": 13.399, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.379, "end": 13.479, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "narrative", "start": 13.479, "end": 13.889, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.919, "end": 13.939, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "projects.", "start": 13.919, "end": 14.779, "type": "word", "speaker_id": "speaker_0" } ] } ``` The output is classified in three category types: * `word` - A word in the language of the audio * `spacing` - The space between words, not applicable for languages that don't use spaces like Japanese, Mandarin, Thai, Lao, Burmese and Cantonese * `audio_event` - Non-speech sounds like laughter or applause ## Models State-of-the-art speech recognition model
Accurate transcription in 99 languages
Precise word-level timestamps
Speaker diarization
Dynamic audio tagging
[Explore all](/docs/models)
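For reference, here is a minimal sketch of requesting a transcript like the example above with the Python SDK. The word fields mirror the output shown earlier; the `diarize` and `tag_audio_events` parameter names are assumptions to confirm against the Speech to Text API reference.

```python
# Minimal sketch: transcribe a local file with Scribe v1 and print each
# word with its speaker and timestamps. Parameter names (`diarize`,
# `tag_audio_events`) are assumptions - verify them in the API reference.
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("interview.mp3", "rb") as audio_file:  # hypothetical input file
    transcript = elevenlabs.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",
        diarize=True,           # label each word with a speaker_id
        tag_audio_events=True,  # include events like laughter or applause
    )

print(transcript.text)
for word in transcript.words:
    if word.type == "word":
        print(f"{word.speaker_id} [{word.start:.2f}s-{word.end:.2f}s] {word.text}")
```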
## Concurrency and priority Concurrency is the concept of how many requests can be processed at the same time. For Speech to Text, files that are over 8 minutes long are transcribed in parallel internally in order to speed up processing. The audio is chunked into four segments to be transcribed concurrently. You can calculate the concurrency limit with the following calculation: $$ Concurrency = \min(4, \text{round\_up}(\frac{\text{audio\_duration\_secs}}{480})) $$ For example, a 15 minute audio file will be transcribed with a concurrency of 2, while a 120 minute audio file will be transcribed with a concurrency of 4. ## Supported languages The Scribe v1 model supports 99 languages, including: *Afrikaans (afr), Amharic (amh), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Burmese (mya), Cantonese (yue), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Fulah (ful), Galician (glg), Ganda (lug), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Igbo (ibo), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kabuverdianu (kea), Kannada (kan), Kazakh (kaz), Khmer (khm), Korean (kor), Kurdish (kur), Kyrgyz (kir), Lao (lao), Latvian (lav), Lingala (lin), Lithuanian (lit), Luo (luo), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Maltese (mlt), Mandarin Chinese (zho), Māori (mri), Marathi (mar), Mongolian (mon), Nepali (nep), Northern Sotho (nso), Norwegian (nor), Occitan (oci), Odia (ori), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Shona (sna), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Tajik (tgk), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Umbundu (umb), Urdu (urd), Uzbek (uzb), Vietnamese (vie), Welsh (cym), Wolof (wol), Xhosa (xho) and Zulu (zul).* ### Breakdown of language support Word Error Rate (WER) is a key metric used to evaluate the accuracy of transcription systems. It measures how many errors are present in a transcript compared to a reference transcript. Below is a breakdown of the WER for each language that Scribe v1 supports. Bulgarian (bul), Catalan (cat), Czech (ces), Danish (dan), Dutch (nld), English (eng), Finnish (fin), French (fra), Galician (glg), German (deu), Greek (ell), Hindi (hin), Indonesian (ind), Italian (ita), Japanese (jpn), Kannada (kan), Malay (msa), Malayalam (mal), Macedonian (mkd), Norwegian (nor), Polish (pol), Portuguese (por), Romanian (ron), Russian (rus), Serbian (srp), Slovak (slk), Spanish (spa), Swedish (swe), Turkish (tur), Ukrainian (ukr) and Vietnamese (vie). 
Bengali (ben), Belarusian (bel), Bosnian (bos), Cantonese (yue), Estonian (est), Filipino (fil), Gujarati (guj), Hungarian (hun), Kazakh (kaz), Latvian (lav), Lithuanian (lit), Mandarin (cmn), Marathi (mar), Nepali (nep), Odia (ori), Persian (fas), Slovenian (slv), Tamil (tam) and Telugu (tel) Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Burmese (mya), Cebuano (ceb), Croatian (hrv), Georgian (kat), Hausa (hau), Hebrew (heb), Icelandic (isl), Javanese (jav), Kabuverdianu (kea), Korean (kor), Kyrgyz (kir), Lingala (lin), Maltese (mlt), Mongolian (mon), Māori (mri), Occitan (oci), Punjabi (pan), Sindhi (snd), Swahili (swa), Tajik (tgk), Thai (tha), Urdu (urd), Uzbek (uzb) and Welsh (cym). Amharic (amh), Chichewa (nya), Fulah (ful), Ganda (lug), Igbo (ibo), Irish (gle), Khmer (khm), Kurdish (kur), Lao (lao), Luxembourgish (ltz), Luo (luo), Northern Sotho (nso), Pashto (pus), Shona (sna), Somali (som), Umbundu (umb), Wolof (wol), Xhosa (xho) and Zulu (zul). ## FAQ Yes, the API supports uploading both audio and video files for transcription. Files up to 3 GB in size and up to 10 hours in duration are supported. The supported audio formats include: * audio/aac * audio/x-aac * audio/x-aiff * audio/ogg * audio/mpeg * audio/mp3 * audio/mpeg3 * audio/x-mpeg-3 * audio/opus * audio/wav * audio/x-wav * audio/webm * audio/flac * audio/x-flac * audio/mp4 * audio/aiff * audio/x-m4a Supported video formats include: * video/mp4 * video/x-msvideo * video/x-matroska * video/quicktime * video/x-ms-wmv * video/x-flv * video/webm * video/mpeg * video/3gpp ElevenLabs is constantly expanding the number of languages supported by our models. Please check back frequently for updates. Yes, asynchronous transcription results can be sent to webhooks configured in webhook settings in the UI. Learn more in the [webhooks cookbook](/docs/cookbooks/speech-to-text/webhooks). Yes, the multichannel STT feature allows you to transcribe audio where each channel is processed independently and assigned a speaker ID based on its channel number. This feature supports up to 5 channels. Learn more in the [multichannel transcription cookbook](/docs/cookbooks/speech-to-text/multichannel-transcription). # Text to Dialogue > Learn how to create immersive, natural-sounding dialogue with ElevenLabs. ## Overview The ElevenLabs [Text to Dialogue](/docs/api-reference/text-to-dialogue) API creates natural-sounding, expressive dialogue from text using the Eleven v3 model. Popular use cases include: * Generating pitch-perfect conversations for video games * Creating immersive dialogue for podcasts and other audio content * Bringing audiobooks to life with expressive narration Text to Dialogue is not intended for use in real-time applications like conversational agents. Several generations might be required to achieve the desired results. When integrating Text to Dialogue into your application, consider generating multiple outputs and allowing the user to select the best one. Listen to a sample: Learn how to integrate text to dialogue into your application. Learn how to use the Eleven v3 model to generate expressive dialogue. 
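As a rough illustration, the sketch below generates a short two-speaker exchange with the Python SDK. The `text_to_dialogue.convert` method name, the shape of `inputs`, and the voice IDs are assumptions based on the Text to Dialogue API reference; audio tags such as `[cheerfully]` are covered in the prompting section below.

```python
# Minimal sketch: a two-speaker exchange rendered with Eleven v3.
# The `text_to_dialogue.convert` call and the `inputs` shape are
# assumptions based on the API reference; VOICE_ID_1 / VOICE_ID_2 are
# placeholders for voices from your library.
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_dialogue.convert(
    inputs=[
        {"text": "[cheerfully] Morning! Ready for the demo?", "voice_id": "VOICE_ID_1"},
        {"text": "[sighs] As ready as I'll ever be.", "voice_id": "VOICE_ID_2"},
    ],
)
play(audio)
```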
## Voice options ElevenLabs offers thousands of voices across 70+ languages through multiple creation methods: * [Voice library](/docs/capabilities/voices) with 3,000+ community-shared voices * [Professional voice cloning](/docs/capabilities/voices#cloned) for highest-fidelity replicas * [Instant voice cloning](/docs/capabilities/voices#cloned) for quick voice replication * [Voice design](/docs/capabilities/voices#voice-design) to generate custom voices from text descriptions Learn more about our [voice options](/docs/capabilities/voices). ## Prompting The models interpret emotional context directly from the text input. For example, adding descriptive text like "she said excitedly" or using exclamation marks will influence the speech emotion. Voice settings like Stability and Similarity help control the consistency, while the underlying emotion comes from textual cues. Read the [prompting guide](/docs/best-practices/prompting) for more details. ### Emotional deliveries with audio tags This feature is still under active development; actual results may vary. The Eleven v3 model allows the use of non-speech audio events to influence the delivery of the dialogue. This is done by inserting the audio events into the text input wrapped in square brackets. Audio tags come in a few different forms: ### Emotions and delivery For example, \[sad], \[laughing] and \[whispering]. ### Audio events For example, \[leaves rustling], \[gentle footsteps] and \[applause]. ### Overall direction For example, \[football], \[wrestling match] and \[auctioneer]. Some examples include: ``` "[giggling] That's really funny!" "[groaning] That was awful." "Well, [sigh] I'm not sure what to say." ``` You can also use punctuation to indicate the flow of dialogue, such as interruptions: ``` "[cautiously] Hello, is this seat-" "[jumping in] Free? [cheerfully] Yes it is." ``` Ellipses can be used to indicate trailing sentences: ``` "[indecisive] Hi, can I get uhhh..." "[quizzically] The usual?" "[elated] Yes! [laughs] I'm so glad you knew!" ``` ## Supported formats The default response format is "mp3", but other formats like "PCM" and "μ-law" are available. * **MP3** * Sample rates: 22.05kHz - 44.1kHz * Bitrates: 32kbps - 192kbps * 22.05kHz @ 32kbps * 44.1kHz @ 32kbps, 64kbps, 96kbps, 128kbps, 192kbps * **PCM (S16LE)** * Sample rates: 8kHz, 16kHz, 22.05kHz, 24kHz, 44.1kHz, 48kHz * 16-bit depth * **μ-law** * 8kHz sample rate * Optimized for telephony applications * **A-law** * 8kHz sample rate * Optimized for telephony applications * **Opus** * Sample rate: 48kHz * Bitrates: 32kbps - 192kbps Higher quality audio options are only available on paid tiers - see our [pricing page](https://elevenlabs.io/pricing/api) for details. 
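To make the format options above concrete, here is a minimal sketch that requests a telephony-friendly output (shown with Text to Speech, which exposes the same `output_format` options). Format identifiers follow a `codec_samplerate(_bitrate)` pattern, e.g. `mp3_22050_32`, `pcm_16000` or `ulaw_8000`; confirm the exact values in the API reference.

```python
# Minimal sketch: request 8kHz μ-law output for a telephony system.
# The format string "ulaw_8000" follows the documented naming pattern;
# verify the exact identifier in the API reference.
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_speech.convert(
    text="Thanks for calling. Please hold while we connect you.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    output_format="ulaw_8000",
)

# The SDK returns the audio as an iterator of byte chunks.
with open("hold_message.ulaw", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```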
## Supported languages The Eleven v3 model supports 70+ languages, including: *Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kirghiz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym).* ## FAQ Text to Dialogue is only available on the Eleven v3 model. Yes. You retain ownership of any audio you generate. However, commercial usage rights are only available with paid plans. With a paid subscription, you may use generated audio for commercial purposes and monetize the outputs if you own the IP rights to the input content. A free regeneration allows you to regenerate the same text to speech content without additional cost, subject to these conditions: * Only available within the ElevenLabs dashboard. * You can regenerate each piece of content up to 2 times for free. * The content must be exactly the same as the previous generation. Any changes to the text, voice settings, or other parameters will require a new, paid generation. Free regenerations are useful in case there is a slight distortion in the audio output. According to ElevenLabs' internal benchmarks, regenerations will solve roughly half of issues with quality, with remaining issues usually due to poor training data. There is no limit to the number of speakers in a dialogue. The models are nondeterministic. For consistency, use the optional [seed parameter](/docs/api-reference/text-to-speech/convert#request.body.seed), though subtle differences may still occur. Split long text into segments and use streaming for real-time playback and efficient processing. # Voice changer > Learn how to transform audio between voices while preserving emotion and delivery. ## Overview ElevenLabs [voice changer](/docs/api-reference/speech-to-speech/convert) API lets you transform any source audio (recorded or uploaded) into a different, fully cloned voice without losing the performance nuances of the original. It’s capable of capturing whispers, laughs, cries, accents, and subtle emotional cues to achieve a highly realistic, human feel and can be used to: * Change any voice while preserving emotional delivery and nuance * Create consistent character voices across multiple languages and recording sessions * Fix or replace specific words and phrases in existing recordings Explore our [voice library](https://elevenlabs.io/community) to find the perfect voice for your project. Learn how to integrate voice changer into your application. Step-by-step guide for using voice changer in ElevenLabs. 
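Before the details below, here is a minimal sketch of a voice changer request with the Python SDK: it converts a recorded performance into a target voice while keeping the original delivery. The `speech_to_speech.convert` method and the `remove_background_noise` flag follow the voice changer API reference naming and should be confirmed there.

```python
# Minimal sketch: transform a recorded performance into another voice
# while preserving its delivery. Method and parameter names follow the
# voice changer API reference and should be confirmed there.
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("source_performance.mp3", "rb") as source_audio:  # hypothetical recording
    converted = elevenlabs.speech_to_speech.convert(
        voice_id="JBFqnCBsd6RMkjVDRZzb",        # target voice
        audio=source_audio,
        model_id="eleven_multilingual_sts_v2",
        remove_background_noise=True,           # see best practices below
    )

play(converted)
```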
## Supported languages Our multilingual v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.* The `eleven_english_sts_v2` model only supports English. ## Best practices ### Audio quality * Record in a quiet environment to minimize background noise * Maintain appropriate microphone levels - avoid too quiet or peaked audio * Use `remove_background_noise=true` if environmental sounds are present ### Recording guidelines * Keep segments under 5 minutes for optimal processing * Feel free to include natural expressions (laughs, sighs, emotions) * The source audio's accent and language will be preserved in the output ### Parameters * **Style**: Set to 0% when input audio is already expressive * **Stability**: Use 100% for maximum voice consistency * **Language**: Choose source audio that matches your desired accent and language ## FAQ Yes, but you must split it into smaller chunks (each under 5 minutes). This helps ensure stability and consistent output. Absolutely. Provide your custom voice’s `voice_id` and specify the correct `model_id`. You’re charged at 1000 characters’ worth of usage per minute of processed audio. There’s no additional fee based on file size. Possibly. Use `remove_background_noise=true` or the Voice Isolator tool to minimize environmental sounds in the final output. Though `eleven_english_sts_v2` is available, our `eleven_multilingual_sts_v2` model often outperforms it, even for English material. “Style” adds interpretative flair; “stability” enforces consistency. For high-energy performances in the source audio, turn style down and stability up. # Voice isolator > Learn how to isolate speech from background noise, music, and ambient sounds from any audio. ## Overview ElevenLabs [voice isolator](/docs/api-reference/audio-isolation/audio-isolation) API transforms audio recordings with background noise into clean, studio-quality speech. This is particularly useful for audio recorded in noisy environments, or recordings containing unwanted ambient sounds, music, or other background interference. Listen to a sample: ## Usage The voice isolator model extracts speech from background noise in both audio and video files. Learn how to integrate voice isolator into your application. Step-by-step guide for using voice isolator in ElevenLabs. ### Supported file types * **Audio**: AAC, AIFF, OGG, MP3, OPUS, WAV, FLAC, M4A * **Video**: MP4, AVI, MKV, MOV, WMV, FLV, WEBM, MPEG, 3GPP ## FAQ * **Cost**: Voice isolator costs 1000 characters for every minute of audio. * **File size and length**: Supports files up to 500MB and 1 hour in length. * **Music vocals**: Not specifically optimized for isolating vocals from music, but may work depending on the content. # Dubbing > Learn how to translate audio and video while preserving the emotion, timing & tone of speakers. ## Overview ElevenLabs [dubbing](/docs/api-reference/dubbing/create) API translates audio and video across 32 languages while preserving the emotion, timing, tone and unique characteristics of each speaker. Our model separates each speaker’s dialogue from the soundtrack, allowing you to recreate the original delivery in another language. 
It can be used to: * Grow your addressable audience by 4x to reach international audiences * Adapt existing material for new markets while preserving emotional nuance * Offer content in multiple languages without re-recording voice talent We also offer a [fully managed dubbing service](https://elevenlabs.io/elevenstudios) for video and podcast creators. ## Usage ElevenLabs dubbing can be used in three ways: * **Dubbing Studio** in the user interface for fast, interactive control and editing * **Programmatic integration** via our [API](/docs/api-reference/dubbing/create) for large-scale or automated workflows * **Human-verified dubs via ElevenLabs Productions** - for more information, please reach out to [productions@elevenlabs.io](mailto:productions@elevenlabs.io) The UI supports files up to **500MB** and **45 minutes**. The API supports files up to **1GB** and **2.5 hours**. Learn how to integrate dubbing into your application. Edit transcripts and translate videos step by step in Dubbing Studio. ### Key features **Speaker separation** Automatically detect multiple speakers, even with overlapping speech. **Multi-language output** Generate localized tracks in 32 languages. **Preserve original voices** Retain the speaker’s identity and emotional tone. **Keep background audio** Avoid re-mixing music, effects, or ambient sounds. **Customizable transcripts** Manually edit translations and transcripts as needed. **Supported file types** Videos and audio can be dubbed from various sources, including YouTube, X, TikTok, Vimeo, direct URLs, or file uploads. **Video transcript and translation editing** Our AI video translator lets you manually edit transcripts and translations to ensure your content is properly synced and localized. Adjust the voice settings to tune delivery, and regenerate speech segments until the output sounds just right. A Creator plan or higher is required to dub audio files. For videos, a watermark option is available to reduce credit usage. ### Cost To reduce credit usage, you can: * Dub only a selected portion of your file * Use watermarks on video output (not available for audio) * Fine-tune transcripts and regenerate individual segments instead of the entire clip Refer to our [pricing page](https://elevenlabs.io/pricing) for detailed credit costs. ## List of supported languages for dubbing | No | Language Name | Language Code | | -- | ------------- | ------------- | | 1 | English | en | | 2 | Hindi | hi | | 3 | Portuguese | pt | | 4 | Chinese | zh | | 5 | Spanish | es | | 6 | French | fr | | 7 | German | de | | 8 | Japanese | ja | | 9 | Arabic | ar | | 10 | Russian | ru | | 11 | Korean | ko | | 12 | Indonesian | id | | 13 | Italian | it | | 14 | Dutch | nl | | 15 | Turkish | tr | | 16 | Polish | pl | | 17 | Swedish | sv | | 18 | Filipino | fil | | 19 | Malay | ms | | 20 | Romanian | ro | | 21 | Ukrainian | uk | | 22 | Greek | el | | 23 | Czech | cs | | 24 | Danish | da | | 25 | Finnish | fi | | 26 | Bulgarian | bg | | 27 | Croatian | hr | | 28 | Slovak | sk | | 29 | Tamil | ta | ## FAQ Dubbing can be performed on all types of short and long form video and audio content. We recommend dubbing content with a maximum of 9 unique speakers at a time to ensure a high-quality dub. Yes. Our models analyze each speaker’s original delivery to recreate the same tone, pace, and style in your target language. We use advanced source separation to isolate individual voices from ambient sound. Multiple overlapping speakers can be split into separate tracks. 
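For the programmatic route mentioned above, dubbing is asynchronous: you create a job, poll its status, and download the dubbed track once it is ready. The sketch below assumes `dubbing.create` and `dubbing.get` methods and a `target_lang` parameter mirroring the REST endpoints; the exact names may differ by SDK version, so check the dubbing API reference.

```python
# Minimal sketch of an asynchronous dubbing workflow. The method names
# (`dubbing.create`, `dubbing.get`) and parameters mirror the REST
# endpoints and may differ by SDK version - check the API reference.
import os
import time

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("episode.mp4", "rb") as video_file:  # hypothetical source video
    job = elevenlabs.dubbing.create(
        file=video_file,
        target_lang="es",   # dub into Spanish (see the language table above)
        watermark=True,     # reduces credit usage for video dubs
    )

# Poll until the dub is ready; the dubbed audio/video can then be
# downloaded via the dubbing endpoints for the target language.
while elevenlabs.dubbing.get(job.dubbing_id).status == "dubbing":
    time.sleep(10)

print(f"Dub {job.dubbing_id} finished")
```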
Via the user interface, the maximum file size is 500MB up to 45 minutes. Through the API, you can process files up to 1GB and 2.5 hours. You can choose to dub only certain portions of your video/audio or tweak translations/voices in our interactive Dubbing Studio. # Sound effects > Learn how to create high-quality sound effects from text with ElevenLabs. ## Overview ElevenLabs [sound effects](/docs/api-reference/text-to-sound-effects/convert) API turns text descriptions into high-quality audio effects with precise control over timing, style and complexity. The model understands both natural language and audio terminology, enabling you to: * Generate cinematic sound design for films & trailers * Create custom sound effects for games & interactive media * Produce Foley and ambient sounds for video content Listen to an example: ## Usage Sound effects are generated using text descriptions & two optional parameters: * **Duration**: Set a specific length for the generated audio (in seconds) * Default: Automatically determined based on the prompt * Range: 0.1 to 30 seconds * Cost: 40 credits per second when duration is specified * **Looping**: Enable seamless looping for sound effects longer than 30 seconds * Creates sound effects that can be played on repeat without perceptible start/end points * Perfect for atmospheric sounds, ambient textures, and background elements * Example: Generate 30s of 'soft rain' then loop it endlessly for atmosphere in audiobooks, films, games * **Prompt influence**: Control how strictly the model follows the prompt * High: More literal interpretation of the prompt * Low: More creative interpretation with added variations Learn how to integrate sound effects into your application. Step-by-step guide for using sound effects in ElevenLabs. ### Prompting guide #### Simple effects For basic sound effects, use clear, concise descriptions: * "Glass shattering on concrete" * "Heavy wooden door creaking open" * "Thunder rumbling in the distance" #### Complex sequences For multi-part sound effects, describe the sequence of events: * "Footsteps on gravel, then a metallic door opens" * "Wind whistling through trees, followed by leaves rustling" * "Sword being drawn, then clashing with another blade" #### Musical elements The API also supports generation of musical components: * "90s hip-hop drum loop, 90 BPM" * "Vintage brass stabs in F minor" * "Atmospheric synth pad with subtle modulation" #### Audio Terminology Common terms that can enhance your prompts: * **Impact**: Collision or contact sounds between objects, from subtle taps to dramatic crashes * **Whoosh**: Movement through air effects, ranging from fast and ghostly to slow-spinning or rhythmic * **Ambience**: Background environmental sounds that establish atmosphere and space * **One-shot**: Single, non-repeating sound * **Loop**: Repeating audio segment * **Stem**: Isolated audio component * **Braam**: Big, brassy cinematic hit that signals epic or dramatic moments, common in trailers * **Glitch**: Sounds of malfunction, jittering, or erratic movement, useful for transitions and sci-fi * **Drone**: Continuous, textured sound that creates atmosphere and suspense ## FAQ The maximum duration is 30 seconds per generation. For longer sequences, you can either generate multiple effects and combine them, or use the looping feature to create seamless repeating sound effects. Yes, you can generate musical elements like drum loops, bass lines, and melodic samples. 
However, for full music production, consider combining multiple generated elements. Use detailed prompts, appropriate duration settings, and high prompt influence for more predictable results. For complex sounds, generate components separately and combine them. Generated audio is provided in MP3 format with professional-grade quality. For WAV downloads of non-looping sound effects, audio is delivered at 48kHz sample rate - the industry standard for film, TV, video, and game audio, ensuring no resampling is needed for professional workflows. Looping sound effects are designed to play seamlessly on repeat without noticeable start or end points. This is perfect for creating continuous atmospheric sounds, ambient textures, or background elements that need to play indefinitely. For example, you can generate 30 seconds of rain sounds and loop them endlessly for background atmosphere in audiobooks, films, or games. # Voices > Learn how to create, customize, and manage voices with ElevenLabs. ## Overview ElevenLabs provides models for voice creation & customization. The platform supports a wide range of voice options, including voices from our extensive [voice library](https://elevenlabs.io/app/voice-library), voice cloning, and artificially designed voices using text prompts. ### Voice types * **Community**: Voices shared by the community from the ElevenLabs [voice library](/docs/product-guides/voices/voice-library). * **Cloned**: Custom voices created using instant or professional [voice cloning](/docs/product-guides/voices/voice-cloning). * **Voice design**: Artificially designed voices created with the [voice design](/docs/product-guides/voices/voice-design) tool. * **Default**: Pre-designed, high-quality voices optimized for general use. Voices that you personally own, either created with Instant Voice Cloning, Professional Voice Cloning, or Voice Design, can be used for [Voice Remixing](/docs/capabilities/voice-remixing). #### Community The [voice library](/docs/product-guides/voices/voice-library) contains over 5,000 voices shared by the ElevenLabs community. Use it to: * Discover unique voices shared by the ElevenLabs community. * Add voices to your personal collection. * Share your own voice clones for cash rewards when others use it. Share your voice with the community, set your terms, and earn cash rewards when others use it. We've paid out over **\$1M** already. Learn how to use voices from the voice library #### Cloned Clone your own voice from 30-second samples with Instant Voice Cloning, or create hyper-realistic voices using Professional Voice Cloning. * **Instant Voice Cloning**: Quickly replicate a voice from short audio samples. * **Professional Voice Cloning**: Generate professional-grade voice clones with extended training audio. Voice-captcha technology is used to verify that **all** voice clones are created from your own voice samples. A Creator plan or higher is required to create voice clones. Clone a voice instantly Create a perfect voice clone Learn how to create instant & professional voice clones #### Voice design With [Voice Design](/docs/product-guides/voices/voice-design), you can create entirely new voices by specifying attributes like age, gender, accent, and tone. Generated voices are ideal for: * Realistic voices with nuanced characteristics. * Creative character voices for games and storytelling. The voice design tool creates 3 voice previews, simply provide: * A **voice description** between 20 and 1000 characters. 
* A **text** to preview the voice between 100 and 1000 characters. ##### Voice design with Eleven v3 (alpha) Using the new [Eleven v3 model](/docs/models#eleven-v3-alpha), voices that are capable of a wide range of emotion can be designed via a prompt. Using v3 gets you the following benefits: * More natural and versatile voice generation. * Better control over voice characteristics. * Audio tags supported in Preview generations. * Backward compatibility with v2 models. Integrate voice design into your application. Learn how to craft voices from a single prompt. #### Default Our curated set of default voices is optimized for core use cases. These voices are: * **Reliable**: Available long-term. * **Consistent**: Carefully crafted and quality-checked for performance. * **Model-ready**: Fine-tuned on new models upon release. Default voices are available to all users via the **my voices** tab in the [voice lab dashboard](https://elevenlabs.io/app/voice-lab). Default voices were previously referred to as `premade` voices. The latter term is still used when accessing default voices via the API. ### Managing voices All voices can be managed through **My Voices**, where you can: * Search, filter, and categorize voices * Add descriptions and custom tags * Organize voices for quick access Learn how to manage your voice collection in [My Voices documentation](/docs/product-guides/voices/voice-library). * **Search and Filter**: Find voices using keywords or tags. * **Preview Samples**: Listen to voice demos before adding them to **My Voices**. * **Add to Collection**: Save voices for easy access in your projects. > **Tip**: Try searching by specific accents or genres, such as "Australian narration" or "child-like character." ### Supported languages All ElevenLabs voices support multiple languages. Experiment by converting phrases like `Hello! こんにちは! Bonjour!` into speech to hear how your own voice sounds across different languages. ElevenLabs supports voice creation in 32 languages. Match your voice selection to your target region for the most natural results. * **Default Voices**: Optimized for multilingual use. * **Generated and Cloned Voices**: Accent fidelity depends on input samples or selected attributes. Our multilingual v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.* Flash v2.5 supports 32 languages - all languages from v2 models plus: *Hungarian, Norwegian & Vietnamese* [Learn more about our models](/docs/models) ## FAQ Yes, you can create custom voices with Voice Design or clone voices using Instant or Professional Voice Cloning. Both options are accessible in **My Voices**. Instant Voice Cloning uses short audio samples for near-instantaneous voice creation. Professional Voice Cloning requires longer samples but delivers hyper-realistic, high-quality results. Professional Voice Clones can be shared privately or publicly in the Voice Library. Generated voices and Instant Voice Clones cannot currently be shared. Use **My Voices** to search, filter, and organize your voice collection. You can also delete, tag, and categorize voices for easier management. Use clean and consistent audio samples. 
For Professional Voice Cloning, provide a variety of recordings in the desired speaking style. Yes, Professional Voice Clones can be shared in the Voice Library. Instant Voice Clones and Generated Voices cannot currently be shared. Generated Voices are ideal for unique characters in games, animations, and creative storytelling. Go to **Voices > Voice Library** in your dashboard or access it via API. # Voice remixing > Learn how to transform and enhance existing voices by modifying their attributes including gender, accent, style, pacing, audio quality, and more. Voice remixing is currently in alpha. ## Overview ElevenLabs voice remixing is available on the core platform and via API. This feature transforms existing voices by allowing you to modify their core attributes while maintaining the unique characteristics that make them recognizable. This is particularly useful for adapting voices to different contexts, creating variations for different characters, or improving and/or changing the audio quality of existing voice profiles. As an example, here is an original voice: And here is a remixed version, switching to a San Francisco accent: ## Usage The voice remixing model allows you to iteratively transform voices you own by adjusting multiple attributes through natural language prompts and customizable settings. Integrate voice remixing into your application. {/* Learn how to craft voices from a single prompt. */} ### Key Features * **Attribute Modification**: Change gender, accent, speaking style, pacing, and audio quality of any voice you own * **Iterative Editing**: Continue refining voices based on previously remixed versions * **Script Flexibility**: Use default scripts or input custom scripts with v3 model audio tags like `[laughs]` or `[whispers]` * **Prompt Strength Control**: Adjust remix intensity from low to high for precise control over transformations ### Remixing parameters #### Prompt Strength Voice remixing offers varying degrees of prompt strength to control how much your voice transforms: * **Low**: Subtle changes that maintain most of the original voice characteristics * **Medium**: Balanced transformation that modifies key attributes while preserving voice identity * **High**: Strong adherence to remix prompt, may significantly change the tonality of the original voice * **Max**: A full transformation of the voice, but at the cost of changing the voice entirely #### Script Options * **Default Scripts**: Pre-configured scripts optimized for voice remixing * **Custom Scripts**: Input your own text with support for v3 model audio tags such as: * `[laughs]` - Add laughter * `[whispers]` - Convert to whispered speech * `[sighs]` - Add sighing * Additional emotion and style tags supported which can help craft the voice ### Tips and Tricks #### Getting Started Start with a high prompt strength early in your experimentation to understand the full range of transformation possibilities. You’ll need to have a voice to start with, if you haven’t already created a voice, experiment with default voices available in your library to understand how different base voices respond to remixing. You can create custom voices using [Voice Design](/docs/product-guides/voices/voice-design) as starting points for unique remixes. #### Advanced Techniques * **Iterative refinement**: Sometimes multiple iterations are needed to achieve the desired voice quality. 
Each remix can serve as the base for the next transformation. * **Combine attributes gradually**: When making multiple changes (e.g., accent and pacing), consider applying them in separate iterations for more control * **Test with varied content**: Different scripts may highlight different aspects of your remixed voice ### Supported Voice Formats #### Input * Any cloned voice that you personally own (Instant Voice Clone or Professional Voice Clone) * Voices created through our Voice Design product #### Output * Full-quality voice model in v3 (backwards compatible with all other models) * Iteratively editable voice that can be further remixed ## FAQ Voice remixing costs are calculated based on the length of the test script used during the remixing process. No, voice remixing is only available for voices in your personal library that you have ownership of or appropriate permissions for. There is no limit to iterative remixing. You can continue refining a voice through multiple generations of remixes. No, remixing creates a new voice variant. Your original voice remains unchanged and available in your library. Voice Design creates new voices from scratch using text prompts, while Voice Remixing modifies existing voices you already own. # Forced Alignment > Learn how to turn spoken audio and text into a time-aligned transcript with ElevenLabs. ## Overview The ElevenLabs [Forced Alignment](/docs/api-reference/forced-alignment) API turns spoken audio and text into a time-aligned transcript. This is useful for cases where you have an audio recording and a transcript, but need exact timestamps for each word or phrase in the transcript. This can be used for: * Matching subtitles to a video recording * Generating timings for an audiobook recording of an ebook ## Usage The Forced Alignment API can be used by interfacing with the ElevenLabs API directly. Learn how to integrate Forced Alignment into your application. ## Supported languages Our multilingual v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.* ## FAQ Forced alignment is a technique used to align spoken audio with text. You provide an audio file and a transcript of the audio file, and the API will return a time-aligned transcript. It's useful for cases where you have an audio recording and a transcript, but need exact timestamps for each word or phrase in the transcript. The input text should be a plain string with no special formatting (i.e. not JSON). Example of good input text: ``` "Hello, how are you?" ``` Example of bad input text: ``` { "text": "Hello, how are you?" } ``` Forced Alignment costs the same as the [Speech to Text](/docs/capabilities/speech-to-text#pricing) API. Forced Alignment does not support diarization. If you provide diarized text, the API will likely return unwanted results. The maximum file size for Forced Alignment is 3GB. For audio files, the maximum duration is 10 hours. For the text input, the maximum length is 675k characters. # Eleven Music > Learn how to create studio-grade music with natural language prompts in any style with ElevenLabs. ## Overview Eleven Music is a Text to Music model that generates studio-grade music with natural language prompts in any style. 
It's designed to understand intent and generate complete, context-aware audio based on your goals. The model understands both natural language and musical terminology, providing you with state-of-the-art features: * Complete control over genre, style, and structure * Vocals or just instrumental * Multilingual, including English, Spanish, German, Japanese and more * Edit the sound and lyrics of individual sections or the whole song Listen to a sample: Created in collaboration with labels, publishers, and artists, Eleven Music is cleared for nearly all commercial uses, from film and television to podcasts and social media videos, and from advertisements to gaming. For more information on supported usage across our different plans, [see our music terms](http://elevenlabs.io/music-terms). ## Usage Eleven Music is available today on the ElevenLabs website, with public API access and integration into our Agents Platform coming soon. Check out our prompt engineering guide to help you master the full range of the model’s capabilities. Learn how to use Eleven Music with natural language prompts. Step-by-step guide for using Eleven Music on the ElevenLabs Creative Platform. Step-by-step guide for using Eleven Music with the API. ## FAQ Generated music has a minimum duration of 10 seconds and a maximum duration of 5 minutes. Yes, refer to the [developer quickstart](/docs/cookbooks/music) for more information. Yes, Eleven Music is cleared for nearly all commercial uses, from film and television to podcasts and social media videos, and from advertisements to gaming. For more information on supported usage across our different plans, [see our music terms](http://elevenlabs.io/music-terms). Generated audio is provided in MP3 format with professional-grade quality (44.1kHz, 128-192kbps). Other audio formats will be supported soon. # Streaming text to speech > Learn how to stream text into speech in Python or Node.js. In this tutorial, you'll learn how to convert [text to speech](https://elevenlabs.io/text-to-speech) with the ElevenLabs SDK. We’ll start by showing how to generate speech and save it to a file, and then how to generate speech and stream the response back. Finally, as a bonus, we’ll show you how to upload the generated audio to an AWS S3 bucket and share it through a signed URL. This signed URL will provide temporary access to the audio file, making it perfect for sharing with users by SMS or embedding into an application. If you want to jump straight to an example, you can find one in the [Python](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python) and [Node.js](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node) example repositories. ## Requirements * An ElevenLabs account with an API key (here’s how to [find your API key](/docs/developer-guides/quickstart#authentication)). * Python or Node installed on your machine * (Optionally) an AWS account with access to S3. ## Setup ### Installing our SDK Before you begin, make sure you have installed the necessary SDKs and libraries. 
You will need the ElevenLabs SDK for the text to speech conversion. You can install it using pip (Python) or npm (TypeScript): ```bash Python pip install elevenlabs ``` ```bash TypeScript npm install @elevenlabs/elevenlabs-js ``` Additionally, install necessary packages to manage your environment variables: ```bash Python pip install python-dotenv ``` ```bash TypeScript npm install dotenv npm install @types/dotenv --save-dev ``` Next, create a `.env` file in your project directory and fill it with your credentials like so: ```bash .env ELEVENLABS_API_KEY=your_elevenlabs_api_key_here ``` ## Convert text to speech (file) To convert text to speech and save it as a file, we’ll use the `convert` method of the ElevenLabs SDK and then save it locally as a `.mp3` file. ```python Python import os import uuid from dotenv import load_dotenv from elevenlabs import VoiceSettings from elevenlabs.client import ElevenLabs load_dotenv() ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY") elevenlabs = ElevenLabs( api_key=ELEVENLABS_API_KEY, ) def text_to_speech_file(text: str) -> str: # Calling the text_to_speech conversion API with detailed parameters response = elevenlabs.text_to_speech.convert( voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice output_format="mp3_22050_32", text=text, model_id="eleven_turbo_v2_5", # use the turbo model for low latency # Optional voice settings that allow you to customize the output voice_settings=VoiceSettings( stability=0.0, similarity_boost=1.0, style=0.0, use_speaker_boost=True, speed=1.0, ), ) # uncomment the line below to play the audio back # play(response) # Generating a unique file name for the output MP3 file save_file_path = f"{uuid.uuid4()}.mp3" # Writing the audio to a file with open(save_file_path, "wb") as f: for chunk in response: if chunk: f.write(chunk) print(f"{save_file_path}: A new audio file was saved successfully!") # Return the path of the saved audio file return save_file_path ``` ```typescript TypeScript import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js'; import * as dotenv from 'dotenv'; import { createWriteStream } from 'fs'; import { v4 as uuid } from 'uuid'; dotenv.config(); const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY; const elevenlabs = new ElevenLabsClient({ apiKey: ELEVENLABS_API_KEY, }); export const createAudioFileFromText = async (text: string): Promise<string> => { return new Promise(async (resolve, reject) => { try { const audio = await elevenlabs.textToSpeech.convert('JBFqnCBsd6RMkjVDRZzb', { modelId: 'eleven_multilingual_v2', text, outputFormat: 'mp3_44100_128', // Optional voice settings that allow you to customize the output voiceSettings: { stability: 0, similarityBoost: 0, useSpeakerBoost: true, speed: 1.0, }, }); const fileName = `${uuid()}.mp3`; const fileStream = createWriteStream(fileName); audio.pipe(fileStream); fileStream.on('finish', () => resolve(fileName)); // Resolve with the fileName fileStream.on('error', reject); } catch (error) { reject(error); } }); }; ``` You can then run this function with: ```python Python text_to_speech_file("Hello World") ``` ```typescript TypeScript await createAudioFileFromText('Hello World'); ``` ## Convert text to speech (streaming) If you prefer to stream the audio directly without saving it to a file, you can use our streaming feature. 
```python Python import os from typing import IO from io import BytesIO from dotenv import load_dotenv from elevenlabs import VoiceSettings from elevenlabs.client import ElevenLabs load_dotenv() ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY") elevenlabs = ElevenLabs( api_key=ELEVENLABS_API_KEY, ) def text_to_speech_stream(text: str) -> IO[bytes]: # Perform the text-to-speech conversion response = elevenlabs.text_to_speech.stream( voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice output_format="mp3_22050_32", text=text, model_id="eleven_multilingual_v2", # Optional voice settings that allow you to customize the output voice_settings=VoiceSettings( stability=0.0, similarity_boost=1.0, style=0.0, use_speaker_boost=True, speed=1.0, ), ) # Create a BytesIO object to hold the audio data in memory audio_stream = BytesIO() # Write each chunk of audio data to the stream for chunk in response: if chunk: audio_stream.write(chunk) # Reset stream position to the beginning audio_stream.seek(0) # Return the stream for further use return audio_stream ``` ```typescript TypeScript import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js'; import * as dotenv from 'dotenv'; dotenv.config(); const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY; if (!ELEVENLABS_API_KEY) { throw new Error('Missing ELEVENLABS_API_KEY in environment variables'); } const elevenlabs = new ElevenLabsClient({ apiKey: ELEVENLABS_API_KEY, }); export const createAudioStreamFromText = async (text: string): Promise => { const audioStream = await elevenlabs.textToSpeech.stream('JBFqnCBsd6RMkjVDRZzb', { modelId: 'eleven_multilingual_v2', text, outputFormat: 'mp3_44100_128', // Optional voice settings that allow you to customize the output voiceSettings: { stability: 0, similarityBoost: 1.0, useSpeakerBoost: true, speed: 1.0, }, }); const chunks: Buffer[] = []; for await (const chunk of audioStream) { chunks.push(chunk); } const content = Buffer.concat(chunks); return content; }; ``` You can then run this function with: ```python Python text_to_speech_stream("This is James") ``` ```typescript TypeScript await createAudioStreamFromText('This is James'); ``` ## Bonus - Uploading to AWS S3 and getting a secure sharing link Once your audio data is created as either a file or a stream you might want to share this with your users. One way to do this is to upload it to an AWS S3 bucket and generate a secure sharing link. To upload the data to S3 you’ll need to add your AWS access key ID, secret access key and AWS region name to your `.env` file. Follow these steps to find the credentials: 1. Log in to your AWS Management Console: Navigate to the AWS home page and sign in with your account. 2. Access the IAM (Identity and Access Management) Dashboard: You can find IAM under "Security, Identity, & Compliance" on the services menu. The IAM dashboard manages access to your AWS services securely. 3. Create a New User (if necessary): On the IAM dashboard, select "Users" and then "Add user". Enter a user name. 4. Set the permissions: attach policies directly to the user according to the access level you wish to grant. For S3 uploads, you can use the AmazonS3FullAccess policy. However, it's best practice to grant least privilege, or the minimal permissions necessary to perform a task. You might want to create a custom policy that specifically allows only the necessary actions on your S3 bucket. 5. Review and create the user: Review your settings and create the user. 
Upon creation, you'll be presented with an access key ID and a secret access key. Be sure to download and securely save these credentials; the secret access key cannot be retrieved again after this step. 6. Get AWS region name: ex. us-east-1 If you do not have an AWS S3 bucket, you will need to create a new one by following these steps: 1. Access the S3 dashboard: You can find S3 under "Storage" on the services menu. 2. Create a new bucket: On the S3 dashboard, click the "Create bucket" button. 3. Enter a bucket name and click on the "Create bucket" button. You can leave the other bucket options as default. The newly added bucket will appear in the list. Install `boto3` for interacting with AWS services using `pip` and `npm`. ```bash Python pip install boto3 ``` ```bash TypeScript npm install @aws-sdk/client-s3 npm install @aws-sdk/s3-request-presigner ``` Then add the environment variables to `.env` file like so: ``` AWS_ACCESS_KEY_ID=your_aws_access_key_id_here AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here AWS_REGION_NAME=your_aws_region_name_here AWS_S3_BUCKET_NAME=your_s3_bucket_name_here ``` Add the following functions to upload the audio stream to S3 and generate a signed URL. ```python s3_uploader.py (Python) import os import boto3 import uuid AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID") AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") AWS_REGION_NAME = os.getenv("AWS_REGION_NAME") AWS_S3_BUCKET_NAME = os.getenv("AWS_S3_BUCKET_NAME") session = boto3.Session( aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY, region_name=AWS_REGION_NAME, ) s3 = session.client("s3") def generate_presigned_url(s3_file_name: str) -> str: signed_url = s3.generate_presigned_url( "get_object", Params={"Bucket": AWS_S3_BUCKET_NAME, "Key": s3_file_name}, ExpiresIn=3600, ) # URL expires in 1 hour return signed_url def upload_audiostream_to_s3(audio_stream) -> str: s3_file_name = f"{uuid.uuid4()}.mp3" # Generates a unique file name using UUID s3.upload_fileobj(audio_stream, AWS_S3_BUCKET_NAME, s3_file_name) return s3_file_name ``` ```typescript s3_uploader.ts (TypeScript) import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3'; import { getSignedUrl } from '@aws-sdk/s3-request-presigner'; import * as dotenv from 'dotenv'; import { v4 as uuid } from 'uuid'; dotenv.config(); const { AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME, AWS_S3_BUCKET_NAME } = process.env; if (!AWS_ACCESS_KEY_ID || !AWS_SECRET_ACCESS_KEY || !AWS_REGION_NAME || !AWS_S3_BUCKET_NAME) { throw new Error('One or more environment variables are not set. Please check your .env file.'); } const s3 = new S3Client({ credentials: { accessKeyId: AWS_ACCESS_KEY_ID, secretAccessKey: AWS_SECRET_ACCESS_KEY, }, region: AWS_REGION_NAME, }); export const generatePresignedUrl = async (objectKey: string) => { const getObjectParams = { Bucket: AWS_S3_BUCKET_NAME, Key: objectKey, Expires: 3600, }; const command = new GetObjectCommand(getObjectParams); const url = await getSignedUrl(s3, command, { expiresIn: 3600 }); return url; }; export const uploadAudioStreamToS3 = async (audioStream: Buffer) => { const remotePath = `${uuid()}.mp3`; await s3.send( new PutObjectCommand({ Bucket: AWS_S3_BUCKET_NAME, Key: remotePath, Body: audioStream, ContentType: 'audio/mpeg', }) ); return remotePath; }; ``` You can then call uploading function with the audio stream from the text. 
```python Python s3_file_name = upload_audiostream_to_s3(audio_stream) ``` ```typescript TypeScript const s3path = await uploadAudioStreamToS3(stream); ``` After uploading the audio file to S3, generate a signed URL to share access to the file. This URL will be time-limited, meaning it will expire after a certain period, making it secure for temporary sharing. You can now generate a URL from a file with: ```python Python signed_url = generate_presigned_url(s3_file_name) print(f"Signed URL to access the file: {signed_url}") ``` ```typescript TypeScript const presignedUrl = await generatePresignedUrl(s3path); console.log('Presigned URL:', presignedUrl); ``` If you want to use the file multiple times, you should store the s3 file path in your database and then regenerate the signed URL each time you need rather than saving the signed URL directly as it will expire. To put it all together, you can use the following script: ```python main.py (Python) import os from dotenv import load_dotenv load_dotenv() from text_to_speech_stream import text_to_speech_stream from s3_uploader import upload_audiostream_to_s3, generate_presigned_url def main(): text = "This is James" audio_stream = text_to_speech_stream(text) s3_file_name = upload_audiostream_to_s3(audio_stream) signed_url = generate_presigned_url(s3_file_name) print(f"Signed URL to access the file: {signed_url}") if __name__ == "__main__": main() ``` ```typescript index.ts (Typescript) import 'dotenv/config'; import { generatePresignedUrl, uploadAudioStreamToS3 } from './s3_uploader'; import { createAudioFileFromText } from './text_to_speech_file'; import { createAudioStreamFromText } from './text_to_speech_stream'; (async () => { // save the audio file to disk const fileName = await createAudioFileFromText( 'Today, the sky is exceptionally clear, and the sun shines brightly.' ); console.log('File name:', fileName); // OR stream the audio, upload to S3, and get a presigned URL const stream = await createAudioStreamFromText( 'Today, the sky is exceptionally clear, and the sun shines brightly.' ); const s3path = await uploadAudioStreamToS3(stream); const presignedUrl = await generatePresignedUrl(s3path); console.log('Presigned URL:', presignedUrl); })(); ``` ## Conclusion You now know how to convert text into speech and generate a signed URL to share the audio file. This functionality opens up numerous opportunities for creating and sharing content dynamically. Here are some examples of what you could build with this. 1. **Educational Podcasts**: Create personalized educational content that can be accessed by students on demand. Teachers can convert their lessons into audio format, upload them to S3, and share the links with students for a more engaging learning experience outside the traditional classroom setting. 2. **Accessibility Features for Websites**: Enhance website accessibility by offering text content in audio format. This can make information on websites more accessible to individuals with visual impairments or those who prefer auditory learning. 3. **Automated Customer Support Messages**: Produce automated and personalized audio messages for customer support, such as FAQs or order updates. This can provide a more engaging customer experience compared to traditional text emails. 4. **Audio Books and Narration**: Convert entire books or short stories into audio format, offering a new way for audiences to enjoy literature. Authors and publishers can diversify their content offerings and reach audiences who prefer listening over reading. 
5. **Language Learning Tools**: Develop language learning aids that provide learners with audio lessons and exercises. This makes it possible to practice pronunciation and listening skills in a targeted way. For more details, visit the following to see the full project files which give a clear structure for setting up your application: For Python: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python) For TypeScript: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node) If you have any questions please create an issue on the [elevenlabs-doc Github](https://github.com/elevenlabs/elevenlabs-docs/issues). # Stitching multiple requests > Learn how to maintain voice prosody over multiple chunks/generations. When converting a large body of text into audio, you may encounter abrupt changes in prosody from one chunk to another. This can be particularly noticeable when converting text that spans multiple paragraphs or sections. In order to maintain voice prosody over multiple chunks, you can use the Request Stitching feature. This feature allows you to provide context on what has already been generated and what will be generated in the future, helping to maintain a consistent voice and prosody throughout the entire text. Request stitching is not available for the `eleven_v3` model. Here's an example without Request Stitching: