# ElevenLabs
> ElevenLabs is an AI audio research and deployment company.
## Most popular
Learn how to integrate ElevenLabs
Deploy voice agents in minutes
Learn how to use ElevenLabs
Dive into our API reference
## Meet the models
Eleven v3
Our most emotionally rich, expressive speech synthesis model
Dramatic delivery and performance
70+ languages supported
3,000 character limit
Support for natural multi-speaker dialogue
Lifelike, consistent quality speech synthesis model
Natural-sounding output
29 languages supported
10,000 character limit
Most stable on long-form generations
Our fast, affordable speech synthesis model
Ultra-low latency (~75ms†)
32 languages supported
40,000 character limit
Faster model, 50% lower price per character
High quality, low-latency model with a good balance of quality and speed
High quality voice generation
32 languages supported
40,000 character limit
Low latency (~250ms-300ms†), 50% lower price per character
State-of-the-art speech recognition model
Accurate transcription in 99 languages
Precise word-level timestamps
Speaker diarization
Dynamic audio tagging
Real-time speech recognition model
Accurate transcription in 99 languages
Real-time transcription
Low latency (~150ms†)
Precise word-level timestamps
[Explore all](/docs/models)
## Capabilities
Text to Speech
Convert text into lifelike speech
Speech to Text
Transcribe spoken audio into text
Voice changer
Modify and transform voices
Voice isolator
Isolate voices from background noise
Dubbing
Dub audio and videos seamlessly
Sound effects
Create cinematic sound effects
Voices
Clone and design custom voices
Agents Platform
Deploy intelligent voice agents
## Product guides
Product guides
Explore our product guides for step-by-step guidance
† Excluding application & network latency
# Developer quickstart
> Learn how to make your first ElevenLabs API request.
The ElevenLabs API provides a simple interface to state-of-the-art audio [models](/docs/models) and [features](/docs/api-reference/introduction). Follow this guide to learn how to create lifelike speech with our Text to Speech API. See the [developer guides](/docs/quickstart#explore-our-developer-guides) for more examples with our other products.
## Using the Text to Speech API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app's configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```bash title="Python"
pip install elevenlabs
pip install python-dotenv
```
```bash title="TypeScript"
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
To play the audio through your speakers, you may be prompted to install [MPV](https://mpv.io/)
and/or [ffmpeg](https://ffmpeg.org/).
Create a new file named `example.py` or `example.mts`, depending on your language of choice, and add the following code:
```python
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play
import os

load_dotenv()

elevenlabs = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

audio = elevenlabs.text_to_speech.convert(
    text="The first move is what sets everything in motion.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

play(audio)
```
```typescript
import { ElevenLabsClient, play } from '@elevenlabs/elevenlabs-js';
import { Readable } from 'stream';
import 'dotenv/config';

const elevenlabs = new ElevenLabsClient();

const audio = await elevenlabs.textToSpeech.convert('JBFqnCBsd6RMkjVDRZzb', {
  text: 'The first move is what sets everything in motion.',
  modelId: 'eleven_multilingual_v2',
  outputFormat: 'mp3_44100_128',
});

const reader = audio.getReader();
const stream = new Readable({
  async read() {
    const { done, value } = await reader.read();
    if (done) {
      this.push(null);
    } else {
      this.push(value);
    }
  },
});

await play(stream);
```
```bash title="Python"
python example.py
```
```bash title="TypeScript"
npx tsx example.mts
```
You should hear the audio play through your speakers.
## Explore our developer guides
Now that you've made your first ElevenLabs API request, you can explore the other products that ElevenLabs offers.
Convert spoken audio into text
Deploy conversational voice agents
Generate studio-quality music
Clone a voice
Remix a voice
Generate sound effects from text
Transform the voice of an audio file
Isolate background noise from audio
Generate voices from a single text prompt
Dub audio/video from one language to another
Generate time-aligned transcripts for audio
# Models
> Learn about the models that power the ElevenLabs API.
## Flagship models
### Text to Speech
Eleven v3
Our most emotionally rich, expressive speech synthesis model
Dramatic delivery and performance
70+ languages supported
3,000 character limit
Support for natural multi-speaker dialogue
Lifelike, consistent quality speech synthesis model
Natural-sounding output
29 languages supported
10,000 character limit
Most stable on long-form generations
Our fast, affordable speech synthesis model
Ultra-low latency (~75ms†)
32 languages supported
40,000 character limit
Faster model, 50% lower price per character
High quality, low-latency model with a good balance of quality and speed
High quality voice generation
32 languages supported
40,000 character limit
Low latency (~250ms-300ms†), 50% lower price per character
### Speech to Text
State-of-the-art speech recognition model
Accurate transcription in 99 languages
Precise word-level timestamps
Speaker diarization
Dynamic audio tagging
Real-time speech recognition model
Accurate transcription in 99 languages
Real-time transcription
Low latency (~150ms†)
Precise word-level timestamps
### Music
Studio-grade music with natural language prompts in any style
Complete control over genre, style, and structure
Vocals or just instrumental
Multilingual, including English, Spanish, German, Japanese and more
Edit the sound and lyrics of individual sections or the whole song
[Pricing](https://elevenlabs.io/pricing/api)
## Models overview
The ElevenLabs API offers a range of audio models optimized for different use cases, quality levels, and performance requirements.
| Model ID | Description | Languages |
| ---------------------------- | ---------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `eleven_v3` | Human-like and expressive speech generation | [70+ languages](/docs/models#supported-languages) |
| `eleven_ttv_v3` | Human-like and expressive voice design model (Text to Voice) | [70+ languages](/docs/models#supported-languages) |
| `eleven_multilingual_v2` | Our most lifelike model with rich emotional expression | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_flash_v2_5` | Ultra-fast model optimized for real-time use (\~75ms†) | All `eleven_multilingual_v2` languages plus: `hu`, `no`, `vi` |
| `eleven_flash_v2` | Ultra-fast model optimized for real-time use (\~75ms†) | `en` |
| `eleven_turbo_v2_5` | High quality, low-latency model with a good balance of quality and speed (\~250ms-300ms) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru`, `hu`, `no`, `vi` |
| `eleven_turbo_v2` | High quality, low-latency model with a good balance of quality and speed (\~250ms-300ms) | `en` |
| `eleven_multilingual_sts_v2` | State-of-the-art multilingual voice changer model (Speech to Speech) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_multilingual_ttv_v2` | State-of-the-art multilingual voice designer model (Text to Voice) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_english_sts_v2` | English-only voice changer model (Speech to Speech) | `en` |
| `scribe_v2_realtime` | Real-time speech recognition model | [99 languages](/docs/capabilities/speech-to-text#supported-languages) |
| `scribe_v1` | State-of-the-art speech recognition. Outclassed by v2 models | [99 languages](/docs/capabilities/speech-to-text#supported-languages) |
† Excluding application & network latency
### Deprecated models
The `eleven_monolingual_v1` and `eleven_multilingual_v1` models are deprecated and will be removed in the future. Please migrate to newer models for continued service.
| Model ID | Description | Languages | Replacement model suggestion |
| ------------------------ | ---------------------------------------------------- | ---------------------------------------------- | ---------------------------- |
| `eleven_monolingual_v1` | First generation TTS model (outclassed by v2 models) | `en` | `eleven_multilingual_v2` |
| `eleven_multilingual_v1` | First multilingual model (outclassed by v2 models) | `en`, `fr`, `de`, `hi`, `it`, `pl`, `pt`, `es` | `eleven_multilingual_v2` |
## Eleven v3 (alpha)
This model is currently in alpha and is subject to change. Eleven v3 is not made for real-time
applications like Agents Platform. When integrating Eleven v3 into your application, consider
generating multiple takes and letting the user select the best one.
Eleven v3 is our latest and most advanced speech synthesis model. It is a state-of-the-art model that produces natural, life-like speech with high emotional range and contextual understanding across multiple languages.
This model works well in the following scenarios:
* **Character Discussions**: Excellent for audio experiences with multiple characters that interact with each other.
* **Audiobook Production**: Perfect for long-form narration with complex emotional delivery.
* **Emotional Dialogue**: Generate natural, lifelike dialogue with high emotional range and contextual understanding.
With Eleven v3 comes a new Text to Dialogue API, which allows you to generate natural, lifelike dialogue with high emotional range and contextual understanding across multiple languages. Eleven v3 can also be used with the Text to Speech API to generate natural, lifelike speech with high emotional range and contextual understanding across multiple languages.
Read more about the Text to Dialogue API [here](/docs/capabilities/text-to-dialogue).
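Below is a minimal sketch of a Text to Dialogue request with the Python SDK. It assumes the SDK exposes the endpoint as `text_to_dialogue.convert` and accepts a list of `text`/`voice_id` inputs; the voice IDs and inline audio tags are placeholders.

```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Two speakers alternating; inline audio tags steer the emotional delivery.
# VOICE_ID_1 and VOICE_ID_2 are placeholders for voices from your library.
audio = elevenlabs.text_to_dialogue.convert(
    inputs=[
        {"text": "[cheerfully] Did you finish the launch checklist?", "voice_id": "VOICE_ID_1"},
        {"text": "[nervously] Almost... there's one item left.", "voice_id": "VOICE_ID_2"},
    ],
)
play(audio)
```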
### Supported languages
The Eleven v3 model supports 70+ languages, including:
*Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kirghiz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym).*
## Multilingual v2
Eleven Multilingual v2 is our most advanced, emotionally-aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.
The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker's unique characteristics and accent.
This model excels in scenarios requiring high-quality, emotionally nuanced speech:
* **Character Voiceovers**: Ideal for gaming and animation due to its emotional range.
* **Professional Content**: Well-suited for corporate videos and e-learning materials.
* **Multilingual Projects**: Maintains consistent voice quality across language switches.
* **Stable Quality**: Produces consistent, high-quality audio output.
While it has a higher latency & cost per character than Flash models, it delivers superior quality for projects where lifelike speech is important.
Our multilingual v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
## Flash v2.5
Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and Agents Platform. It delivers high-quality speech with ultra-low latency (\~75ms†) across 32 languages.
The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.
This model is particularly well-suited for:
* **Agents Platform**: Perfect for real-time voice agents and chatbots.
* **Interactive Applications**: Ideal for games and applications requiring immediate response.
* **Large-Scale Processing**: Efficient for bulk text-to-speech conversion.
With its lower price point and 75ms latency, Flash v2.5 is the cost-effective option for anyone needing fast, reliable speech synthesis across multiple languages.
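As a rough illustration of a latency-sensitive request, the sketch below streams Flash v2.5 audio and plays chunks as they arrive. It assumes the Python SDK's `text_to_speech.stream` method and the `stream` playback helper; the voice ID is the one used in the quickstart.

```python
import os

from dotenv import load_dotenv
from elevenlabs import stream
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Stream chunks as they are generated instead of waiting for the full file.
audio_stream = elevenlabs.text_to_speech.stream(
    text="Thanks for calling. How can I help you today?",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",
    output_format="mp3_44100_128",
)
stream(audio_stream)
```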
Flash v2.5 supports 32 languages - all languages from v2 models plus:
*Hungarian, Norwegian & Vietnamese*
† Excluding application & network latency
### Considerations
When using Flash v2.5, numbers aren't normalized by default in the way you might expect. For example, phone numbers might be read out in a way that isn't clear to the user. Dates and currencies are affected in a similar manner.
By default, normalization is disabled for Flash v2.5 to maintain the low latency. However, Enterprise customers can now enable text normalization for v2.5 models by setting the `apply_text_normalization` parameter to "on" in your request.
The Multilingual v2 model does a better job of normalizing numbers, so we recommend using it for phone numbers and other cases where number normalization is important.
For low-latency or Agents Platform applications, best practice is to have your LLM [normalize the text](/docs/best-practices/prompting/normalization) before passing it to the TTS model, or use the `apply_text_normalization` parameter (Enterprise plans only for v2.5 models).
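For example, a request that opts into normalization might look like the hedged sketch below. The accepted `apply_text_normalization` values shown in the comment are assumptions based on the description above, and enabling it on v2.5 models requires an Enterprise plan.

```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_speech.convert(
    text="Call 555-0123 before 12/31 to save $20.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",
    # Assumed values: "auto" | "on" | "off"; "on" for v2.5 models is Enterprise-only.
    apply_text_normalization="on",
)
```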
## Turbo v2.5
Eleven Turbo v2.5 is our high-quality, low-latency model with a good balance of quality and speed.
This model is an ideal choice for all scenarios where you'd use Flash v2.5, but where you're willing to trade off latency for higher quality voice generation.
## Model selection guide
Use `eleven_multilingual_v2`
Best for high-fidelity audio output with rich emotional expression
Use Flash models
Optimized for real-time applications (\~75ms latency)
Use either `eleven_multilingual_v2` or `eleven_flash_v2_5`
Both support up to 32 languages
Use `eleven_turbo_v2_5`
Good balance between quality and speed
Use `eleven_multilingual_v2`
Ideal for professional content, audiobooks & video narration.
Use `eleven_flash_v2_5`, `eleven_flash_v2`, `eleven_multilingual_v2`, `eleven_turbo_v2_5` or `eleven_turbo_v2`
Perfect for real-time conversational applications
Use `eleven_multilingual_sts_v2`
Specialized for Speech-to-Speech conversion
## Character limits
The maximum number of characters supported in a single text-to-speech request varies by model.
| Model ID | Character limit | Approximate audio duration |
| ------------------------ | --------------- | -------------------------- |
| `eleven_v3` | 5,000 | \~3 minutes |
| `eleven_flash_v2_5` | 40,000 | \~40 minutes |
| `eleven_flash_v2` | 30,000 | \~30 minutes |
| `eleven_turbo_v2_5` | 40,000 | \~40 minutes |
| `eleven_turbo_v2` | 30,000 | \~30 minutes |
| `eleven_multilingual_v2` | 10,000 | \~10 minutes |
| `eleven_multilingual_v1` | 10,000 | \~10 minutes |
| `eleven_english_sts_v2` | 10,000 | \~10 minutes |
| `eleven_english_sts_v1` | 10,000 | \~10 minutes |
For longer content, consider splitting the input into multiple requests.
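One way to do that is to chunk the text on sentence boundaries so each request stays under the model's limit. The sketch below is illustrative only; the limits mirror the table above and the sentence splitting is deliberately naive.

```python
CHARACTER_LIMITS = {
    "eleven_multilingual_v2": 10_000,
    "eleven_flash_v2_5": 40_000,
    "eleven_turbo_v2_5": 40_000,
}

def split_for_model(text: str, model_id: str) -> list[str]:
    """Greedily pack whole sentences into chunks under the model's character limit."""
    limit = CHARACTER_LIMITS[model_id]
    chunks, current = [], ""
    for sentence in text.replace("\n", " ").split(". "):
        piece = sentence.strip()
        if not piece:
            continue
        if not piece.endswith("."):
            piece += "."
        if current and len(current) + len(piece) + 1 > limit:
            chunks.append(current.strip())
            current = ""
        current += piece + " "
    if current.strip():
        chunks.append(current.strip())
    return chunks
```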
## Scribe v1
Scribe v1 is our state-of-the-art speech recognition model designed for accurate transcription across 99 languages. It provides precise word-level timestamps and advanced features like speaker diarization and dynamic audio tagging.
This model excels in scenarios requiring accurate speech-to-text conversion:
* **Transcription Services**: Perfect for converting audio/video content to text
* **Meeting Documentation**: Ideal for capturing and documenting conversations
* **Content Analysis**: Well-suited for audio content processing and analysis
* **Multilingual Recognition**: Supports accurate transcription across 99 languages
Key features:
* Accurate transcription with word-level timestamps
* Speaker diarization for multi-speaker audio
* Dynamic audio tagging for enhanced context
* Support for 99 languages
Read more about Scribe v1 [here](/docs/capabilities/speech-to-text).
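A minimal transcription request with the Python SDK might look like the sketch below. It assumes the SDK's `speech_to_text.convert` method and these parameter names (`diarize`, `tag_audio_events`); `meeting.mp3` is a placeholder file.

```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("meeting.mp3", "rb") as audio_file:
    transcript = elevenlabs.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",
        diarize=True,           # label speakers in multi-speaker audio
        tag_audio_events=True,  # dynamic audio tagging (laughter, applause, ...)
    )

print(transcript.text)
```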
## Scribe v2 Realtime
Scribe v2 Realtime, our fastest and most accurate live speech recognition model, delivers state-of-the-art accuracy in over 92 languages with ultra-low latency of 150ms.
This model excels in conversational use cases:
* **Live meeting transcription**: Perfect for realtime transcription
* **AI Agents**: Ideal for live conversations
* **Multilingual Recognition**: Supports accurate transcription across 99 languages with automatic language recognition
Key features:
* Ultra-low latency: Get partial transcriptions in \~150 milliseconds
* Streaming support: Send audio in chunks while receiving transcripts in real-time
* Multiple audio formats: Support for PCM (8kHz to 48kHz) and μ-law encoding
* Voice Activity Detection (VAD): Automatic speech segmentation based on silence detection
* Manual commit control: Full control over when to finalize transcript segments
Read more about Scribe v2 Realtime [here](/docs/cookbooks/speech-to-text/streaming).
## Eleven Music
Eleven Music is our studio-grade music generation model. It allows you to generate music with natural language prompts in any style.
This model is excellent for the following scenarios:
* **Game Soundtracks**: Create immersive soundtracks for games
* **Podcast Backgrounds**: Enhance podcasts with professional music
* **Marketing**: Add background music to ad reels
Key features:
* Complete control over genre, style, and structure
* Vocals or just instrumental
* Multilingual, including English, Spanish, German, Japanese and more
* Edit the sound and lyrics of individual sections or the whole song
Read more about Eleven Music [here](/docs/capabilities/music).
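A hedged sketch of a music generation request is shown below. The method name (`music.compose`) and parameters (`prompt`, `music_length_ms`) are assumptions based on the Music capability docs; check the API reference for the exact signature.

```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Assumed method and parameter names; see the Music API reference for specifics.
track = elevenlabs.music.compose(
    prompt="Upbeat synthwave for a retro racing game menu, instrumental only",
    music_length_ms=30_000,
)
play(track)
```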
## Concurrency and priority
Your subscription plan determines how many requests can be processed simultaneously and the priority level of your requests in the queue.
Speech to Text has an elevated concurrency limit.
Once the concurrency limit is met, subsequent requests are processed in a queue alongside lower-priority requests.
In practice this typically only adds \~50ms of latency.
| Plan | Concurrency Limit (Multilingual v2) | Concurrency Limit (Turbo & Flash) | STT Concurrency Limit | Realtime STT Concurrency limit | Music Concurrency limit | Priority level |
| ---------- | ----------------------------------------- | --------------------------------------- | --------------------- | ------------------------------ | ----------------------- | -------------- |
| Free | 2 | 4 | 8 | 4 | N/A | 3 |
| Starter | 3 | 6 | 12 | 6 | 4 | 4 |
| Creator | 5 | 10 | 20 | 10 | 5 | 5 |
| Pro | 10 | 20 | 40 | 20 | 5 | 5 |
| Scale | 15 | 30 | 60 | 30 | 5 | 5 |
| Business | 15 | 30 | 60 | 30 | 5 | 5 |
| Enterprise | Elevated | Elevated | Elevated | Elevated | Highest | 5 |
Startup grant recipients receive Scale-level benefits.
The response headers include `current-concurrent-requests` and `maximum-concurrent-requests` which you can use to monitor your concurrency.
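For example, a raw HTTP request lets you inspect those headers directly. The sketch below uses the `requests` library against the Text to Speech endpoint from the quickstart; the header names are the ones documented above.

```python
import os

import requests

response = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": "Hello!", "model_id": "eleven_flash_v2_5"},
)

print("current:", response.headers.get("current-concurrent-requests"))
print("maximum:", response.headers.get("maximum-concurrent-requests"))
```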
### API requests per minute vs concurrent requests
It's important to understand that **API requests per minute** and **concurrent requests** are different metrics that depend on your usage patterns.
API requests per minute can be different from concurrent requests since it depends on the length of time for each request and how the requests are batched.
**Example 1: Spaced requests**
If you had 180 requests per minute that each took 1 second to complete and you sent them each 0.33 seconds apart, the max concurrent requests would be 3 and the average would be 3 since there would always be 3 in flight.
**Example 2: Batched requests**
However, if you had a different usage pattern such as 180 requests per minute that each took 3 seconds to complete but all fired at once, the max concurrent requests would be 180 and the average would be 9 (first 3 seconds of the minute saw 180 requests at once, final 57 seconds saw 0 requests).
Since our system cares about concurrency, requests per minute matter less than how long each request takes and the pattern of when requests are sent.
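Put differently, average concurrency is roughly requests per minute × seconds per request ÷ 60, while peak concurrency depends on how the requests are batched. The snippet below reproduces the two examples:

```python
def average_concurrency(requests_per_minute: int, seconds_per_request: float) -> float:
    return requests_per_minute * seconds_per_request / 60

# Example 1: 180 one-second requests spaced evenly -> average 3, peak also 3
print(average_concurrency(180, 1))  # 3.0

# Example 2: 180 three-second requests fired at once -> average 9, peak 180
print(average_concurrency(180, 3))  # 9.0
```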
How endpoint requests are made impacts concurrency limits:
* With HTTP, each request counts individually toward your concurrency limit.
* With a WebSocket, only the time where our model is generating audio counts towards your concurrency limit. This means that for most of the time, an open WebSocket doesn't count towards your concurrency limit at all.
### Understanding concurrency limits
The concurrency limit associated with your plan should not be interpreted as the maximum number of simultaneous conversations, phone calls, character voiceovers, etc. that can be handled at once.
The actual number depends on several factors, including the specific AI voices used and the characteristics of the use case.
As a general rule of thumb, a concurrency limit of 5 can typically support up to approximately 100 simultaneous audio broadcasts.
This is because audio is generated much faster than it plays back, so each broadcast occupies a concurrency slot for only a fraction of the time.
The diagram below is an example of how 4 concurrent calls with different users can be facilitated while only hitting 2 concurrent requests.
Where TTS is used to facilitate dialogue, a concurrency limit of 5 can support about 100 broadcasts for balanced conversations between AI agents and human participants.
For use cases in which the AI agent speaks less frequently than the human, such as customer support interactions, more than 100 simultaneous conversations could be supported.
Generally, more than 100 simultaneous character voiceovers can be supported for a concurrency limit of 5.
The number can vary depending on the character’s dialogue frequency, the length of pauses, and in-game actions between lines.
Concurrent dubbing streams generally follow the provided heuristic.
If the broadcast involves periods of conversational pauses (e.g. because of a soundtrack, visual scenes, etc), more simultaneous dubbing streams than the suggestion may be possible.
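As a back-of-the-envelope illustration of the heuristic above: if each request only occupies a concurrency slot while audio is being generated, the number of supported broadcasts is roughly the concurrency limit divided by the fraction of time each broadcast spends generating. The timing numbers below are assumptions chosen to match the "5 slots ≈ 100 broadcasts" rule of thumb, not measured figures.

```python
concurrency_limit = 5
generation_seconds_per_request = 0.5  # assumed time a TTS request holds a slot
playback_seconds_per_request = 10.0   # assumed audio duration per request

busy_fraction = generation_seconds_per_request / playback_seconds_per_request
supported_broadcasts = concurrency_limit / busy_fraction
print(supported_broadcasts)  # 100.0
```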
If you exceed your plan's concurrency limits at any point and you are on the Enterprise plan, model requests may still succeed, albeit slower, on a best efforts basis depending on available capacity.
To increase your concurrency limit & queue priority, [upgrade your subscription
plan](https://elevenlabs.io/pricing/api).
Enterprise customers can request a higher concurrency limit by contacting their account manager.
### Scale testing concurrency limits
Scale testing can be useful to identify client-side scaling issues and to verify that concurrency limits are set correctly for your use case.
It is heavily recommended to test end-to-end workflows as close to real-world usage as possible; simulating and measuring how many users can be supported is the recommended methodology for achieving this. It is important to:
* Simulate users, not raw requests
* Simulate typical user behavior such as waiting for audio playback, user speaking or transcription to finish before making requests
* Ramp up the number of users slowly over a period of minutes
* Introduce randomness to request timings and to the size of requests
* Capture latency metrics and any returned error codes from the API
For example, to test an agent system designed to support 100 simultaneous conversations you would create up to 100 individual "users" each simulating a conversation. Conversations typically consist of a repeating cycle of \~10 seconds of user talking, followed by the TTS API call for \~150 characters, followed by \~10 seconds of audio playback to the user. Therefore, each user should follow the pattern of making a websocket Text-to-Speech API call for 150 characters of text every 20 seconds, with a small amount of randomness introduced to the wait period and the number of characters requested. The test would consist of spawning one user per second until 100 exist and then running for 10 minutes in total to assess overall stability.
This example uses [locust](https://locust.io/) as the testing framework with direct API calls to the ElevenLabs API.
It follows the example listed above, testing a conversational agent system with each user sending 1 request every 20 seconds.
```python title="Python" {12}
import json
import random
import time

import gevent
import locust
from locust import User, task, events, constant_throughput
import websocket

# Averages up to 10 seconds of audio when played, depends on the voice speed
DEFAULT_TEXT = (
    "Hello, this is a test message. I am testing if a long input will cause issues for the model "
    "like this sentence. "
)
TEXT_ARRAY = [
    "Hello.",
    "Hello, this is a test message.",
    DEFAULT_TEXT,
    DEFAULT_TEXT * 2,
    DEFAULT_TEXT * 3,
]


# Custom command line arguments
@events.init_command_line_parser.add_listener
def on_parser_init(parser):
    parser.add_argument("--api-key", default="YOUR_API_KEY", help="API key for authentication")
    parser.add_argument("--encoding", default="mp3_22050_32", help="Encoding")
    parser.add_argument("--text", default=DEFAULT_TEXT, help="Text to use")
    parser.add_argument("--use-text-array", default="false", help="Pick a random text from TEXT_ARRAY")
    parser.add_argument("--voice-id", default="aria", help="Voice ID to use")


class WebSocketTTSUser(User):
    # Each user will send a request every 20 seconds, regardless of how long each request takes
    wait_time = constant_throughput(0.05)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.api_key = self.environment.parsed_options.api_key
        self.voice_id = self.environment.parsed_options.voice_id
        self.text = self.environment.parsed_options.text
        self.encoding = self.environment.parsed_options.encoding
        self.use_text_array = self.environment.parsed_options.use_text_array.lower() == "true"
        if self.use_text_array:
            self.text = random.choice(TEXT_ARRAY)
        self.all_received = False

    @task
    def tts_task(self):
        # Do jitter waiting of up to 1 second
        # Users appear to be spawned every second so this ensures requests are not aligned
        gevent.sleep(random.random())
        max_wait_time = 10

        # Connection details
        uri = f"{self.environment.host}/v1/text-to-speech/{self.voice_id}/stream-input?auto_mode=true&output_format={self.encoding}"
        headers = {"xi-api-key": self.api_key}
        ws = None
        self.all_received = False
        try:
            init_msg = {"text": " "}
            # Use proper header format for websocket - this is case sensitive!
            ws = websocket.create_connection(uri, header=headers)
            ws.send(json.dumps(init_msg))
            # Start measuring after websocket initiated but before any messages are sent
            send_request_time = time.perf_counter()
            ws.send(json.dumps({"text": self.text}))
            # Send to flush and receive the audio
            ws.send(json.dumps({"text": ""}))

            def _receive():
                t_first_response = None
                audio_size = 0
                first_byte_ms = None
                try:
                    while True:
                        # Wait up to 10 seconds for a response
                        ws.settimeout(max_wait_time)
                        response = ws.recv()
                        response_data = json.loads(response)
                        if "audio" in response_data and response_data["audio"]:
                            audio_size = audio_size + len(response_data["audio"])
                            if t_first_response is None:
                                t_first_response = time.perf_counter()
                                first_byte_ms = (
                                    t_first_response - send_request_time
                                ) * 1000
                        elif t_first_response is None:
                            # The first response should always have audio
                            locust.events.request.fire(
                                request_type="websocket",
                                name="Bad Response (no audio)",
                                response_time=(time.perf_counter() - send_request_time) * 1000,
                                response_length=audio_size,
                                exception=Exception("Response has no audio"),
                            )
                            break
                        if "isFinal" in response_data and response_data["isFinal"]:
                            # Fire this event once finished streaming, but report the important TTFB metric
                            locust.events.request.fire(
                                request_type="websocket",
                                name="TTS Stream Success (First Byte)",
                                response_time=first_byte_ms,
                                response_length=audio_size,
                                exception=None,
                            )
                            break
                except websocket.WebSocketTimeoutException:
                    locust.events.request.fire(
                        request_type="websocket",
                        name="TTS Stream Timeout",
                        response_time=max_wait_time * 1000,
                        response_length=audio_size,
                        exception=Exception("Timeout waiting for response"),
                    )
                except Exception as e:
                    # Typically JSON decode error if the server returns HTTP backoff error
                    locust.events.request.fire(
                        request_type="websocket",
                        name="TTS Stream Failure",
                        response_time=0,
                        response_length=0,
                        exception=e,
                    )
                finally:
                    self.all_received = True

            gevent.spawn(_receive)
            # Sleep until received so new tasks aren't spawned
            while not self.all_received:
                gevent.sleep(1)
        except websocket.WebSocketTimeoutException:
            locust.events.request.fire(
                request_type="websocket",
                name="TTS Stream Timeout",
                response_time=max_wait_time * 1000,
                response_length=0,
                exception=Exception("Timeout waiting for response"),
            )
        except Exception as e:
            locust.events.request.fire(
                request_type="websocket",
                name="TTS Stream Failure",
                response_time=0,
                response_length=0,
                exception=e,
            )
        finally:
            # Try and close the websocket gracefully
            try:
                if ws:
                    ws.close()
            except Exception:
                pass
```
# November 12, 2025
### Scribe v2 Realtime
Scribe v2 Realtime, our fastest and most accurate live speech recognition model, has officially launched. It delivers state-of-the-art accuracy in over 92 languages with ultra-low latency of 150ms. [Read more here](/docs/models#scribe-v2-realtime).
### Agents Platform
- **Widget end-call feedback**: Widgets now support customizable end-of-call feedback collection with optional rating and comment fields, enabling you to gather user feedback directly within your agent conversations.
- **Soft timeout configuration**: Added comprehensive soft timeout settings to control how agents handle pauses in conversation, with configurable prompts and behavior across turn and conversation levels.
- **Advanced conversation filtering**: The conversations endpoint now supports filtering by call duration range, evaluation parameters, data collection parameters, and specific tool names.
- **Enhanced conversation feedback**: Updated conversation feedback system with structured feedback types including ratings and comments for better user experience tracking.
### Studios
- **Extended project metadata**: Projects now include comprehensive asset tracking with video thumbnails, external audio references, and enhanced snapshot information including audio duration.
### Billing
- **Enhanced invoice details**: Invoice responses now include detailed discount information with a new `discounts` array, replacing the deprecated `discount_percent_off` and `discount_amount_off` fields. The `subtotal_cents` and `tax_cents` fields provide clearer invoice breakdowns.
### Music API
- **Stem separation latency improvements**: The stem separation endpoint has been updated to better handle audio files and provide more predictable latency for processing requests.
### SDK Releases
#### JavaScript SDK
- [v2.23.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.23.0) - Updated Speech to Text (Scribe) endpoint to `scribe_realtime` and exported additional TypeScript types for improved developer experience.
- [v2.22.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.22.0) - Updated with latest API schema changes including conversation filtering, soft timeout configuration, and widget feedback features.
#### Python SDK
- [v2.22.1](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.22.1) - Updated Speech to Text (Scribe) endpoint to `scribe_realtime` and fixed circular import issues for improved stability.
- [v2.22.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.22.0) - Updated with latest API schema changes including conversation filtering, soft timeout configuration, and widget feedback features.
#### Agents Packages
- [@elevenlabs/client@0.10.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/client@0.10.0) - Updated realtime endpoint configuration and improved documentation.
- [@elevenlabs/react@0.10.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react@0.10.0) - Updated realtime endpoint configuration and improved documentation.
- [@elevenlabs/types@0.2.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/types@0.2.0) - Updated type definitions to match latest API schema changes.
- [@elevenlabs/react@0.9.2](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react@0.9.2) - Minor bug fixes and stability improvements.
- [@elevenlabs/convai-widget-embed@0.5.1](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/convai-widget-embed@0.5.1) - Updated widget embed functionality with end-call feedback support.
- [@elevenlabs/convai-widget-core@0.5.1](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/convai-widget-core@0.5.1) - Added end-call feedback configuration to widget core.
- [@elevenlabs/convai-widget-embed@0.5.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/convai-widget-embed@0.5.0) - Added support for customizable end-of-call feedback collection.
- [@elevenlabs/convai-widget-core@0.5.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/convai-widget-core@0.5.0) - Introduced `WidgetEndFeedbackConfig` and `WidgetEndFeedbackType` for feedback handling.
- [@elevenlabs/client@0.9.2](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/client@0.9.2) - Updated client configuration with latest API changes.
#### Android SDK
- [v0.5.0](https://github.com/elevenlabs/elevenlabs-android/releases/tag/v0.5.0) - Added `setVolume` and `getVolume` functions for programmatic audio control, with example implementation including a volume seek bar in the sample app.
### API
## Updated Endpoints
### Agents Platform
#### Conversation Management
- [Get conversations](/docs/api-reference/conversations/list)
- Added `call_duration_min_secs` query parameter (integer) to filter conversations by minimum call duration
- Added `call_duration_max_secs` query parameter (integer) to filter conversations by maximum call duration
- Added `evaluation_params` query parameter (array of strings) for filtering by evaluation criteria
- Added `data_collection_params` query parameter (array of strings) for filtering by data collection parameters
- Added `tool_names` query parameter (array of strings) to filter conversations that used specific tools
- [Provide conversation feedback](/docs/api-reference/conversations/create)
- Now uses `ConversationFeedbackRequestModel` with structured feedback types
- Added `type` field to specify feedback category
- Added `rating` field for numeric feedback
- Added `comment` field for text feedback
#### Agent Configuration
- [Get agent widget](/docs/api-reference/widget/get)
- Added `end_feedback` field with `WidgetEndFeedbackConfig` for configuring end-of-call feedback collection
- Supports optional rating and comment fields in feedback forms
- [Get agent](/docs/api-reference/agents/get) & [Update agent](/docs/api-reference/agents/update)
- Response schema updated to include new platform configuration options
- Request schema enhanced with additional settings fields
#### Turn and Conversation Configuration
- [Create agent](/docs/api-reference/agents/create), [Simulate conversation](/docs/api-reference/agents/simulate-conversation), [Simulate conversation stream](/docs/api-reference/agents/simulate-conversation-stream)
- Added `soft_timeout_config` field for controlling pause behavior during conversations
- Added `turn` configuration overrides via `TurnConfigOverride` and `TurnConfigOverrideConfig`
- Added `initial_wait_time` to `TurnConfig` for controlling initial response timing
- Configuration available at conversation and workflow override levels
#### Testing
- [Run agent tests](/docs/api-reference/tests/run-tests)
- Request schema enhanced for workflow expression support
- Response schema updated with additional test result fields
- [Get test invocation](/docs/api-reference/tests/test-invocations/get)
- Response includes enhanced test result data
- Updated schema for workflow expression results
### Projects and Studio
- [Get project](/docs/api-reference/studio/get-project)
- Added `assets` field (required) containing videos and external audio references
- Introduced `ProjectVideoResponseModel` for video asset metadata
- Introduced `ProjectExternalAudioResponseModel` for external audio tracking
- Introduced `ProjectVideoThumbnailSheetResponseModel` for video thumbnails
- [Get project snapshot](/docs/api-reference/studio/get-project-snapshot)
- Response restructured with new `ProjectSnapshotExtendedResponseModel`
- Added `audio_duration_secs` field (required) to snapshot data
- Removed `character_alignments` from `ProjectSnapshotResponseModel`
- Introduced `ProjectSnapshotsResponseModel` for snapshot collections
### Billing
- [Get subscription](/docs/api-reference/user/subscription/get)
- Added `discounts` array (required) with `DiscountResponseModel` entries
- Deprecated `discount_percent_off` field (still available but marked for removal)
- Deprecated `discount_amount_off` field (still available but marked for removal)
- Added `subtotal_cents` field (integer, nullable) for pre-tax invoice totals
- Added `tax_cents` field (integer, nullable) for tax amounts
- Updated invoice examples to include new discount and total fields
### Speech to Text
- [Transcribe audio](/docs/api-reference/speech-to-text/convert)
- Response schema updated with enhanced transcription data
### Text to Speech
- [Text to speech with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) & [Stream text to speech with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps)
- Response schema updated with enhanced timestamp data
- [Text to dialogue with timestamps](/docs/api-reference/text-to-dialogue/convert-with-timestamps) & [Stream text to dialogue with timestamps](/docs/api-reference/text-to-dialogue/stream-with-timestamps)
- Response schema updated with enhanced timestamp and voice segment data
### Voice Management
- [Get voice](/docs/api-reference/voices/get), [Get voice settings](/docs/api-reference/voices/settings/get), [Get default voice settings](/docs/api-reference/voices/settings/get-default)
- Response schemas updated with new voice metadata and configuration fields
- [Edit voice settings](/docs/api-reference/voices/settings/update)
- Request schema updated for voice settings modification
- [Generate voice](/docs/api-reference/text-to-voice/create)
- Response schema updated with new voice generation data
### Professional Voice Cloning (PVC)
- [Get PVC sample audio](/docs/api-reference/voices/pvc/samples/get-audio) & [Get PVC separated speaker audio](/docs/api-reference/voices/pvc/samples/get-separated-speaker-audio)
- Response schemas updated for audio sample retrieval
### Language Presets
- Language preset models now include `soft_timeout_translation` field for localized soft timeout messages
### API Integration Triggers
- Registered new API collection: `convai_api_integration_trigger_connections`
## Deprecations
- Invoice fields `discount_percent_off` and `discount_amount_off` are deprecated; use `discounts` array instead
# November 5, 2025
### UI
- **Improved notifications**: Updated notification system to display notifications relevant to the active platform you're viewing, providing a more focused and contextual experience.
### Agents Platform
- **Dynamic variable transfer destinations**: Agent transfers now support dynamic variables for phone numbers and SIP URIs, enabling runtime-determined transfer destinations based on conversation context.
- **MCP tool configuration overrides**: Added the ability to create, update, retrieve, and delete custom configuration overrides for specific MCP Server tools, allowing fine-grained control over tool behavior and parameters.
### Text to Dialogue
- **Timestamps and voice segments**: Text to Dialogue now supports timestamped outputs with character-level alignment and voice segment tracking, making it easier to synchronize dialogue with animations or subtitles.
### Music API
- **Stem separation**: Added new stem separation endpoint to isolate different audio components (vocals, drums, bass, instruments) from existing music tracks.
- **Increased prompt length**: Music generation now supports prompts up to 4,100 characters, with individual lyric lines supporting up to 200 characters.
### Security
- **Single-use tokens**: Introduced time-limited single-use token generation for secure operations, providing enhanced security for sensitive API operations.
### SDK Releases
#### JavaScript SDK
- [v2.22.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.22.0) - Updated with latest API schema changes including workspace model approvals and MCP tool configuration endpoints.
- [v2.21.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.21.0) - Added support for intercepting raw WebSocket messages via a general `message` handler, allowing developers to access all messages beyond the standard callbacks, including undocumented message types like `agent_tool_response`.
#### Python SDK
- [v2.22.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.22.0) - Updated with latest API schema changes including workspace model approvals and MCP tool configuration endpoints.
- [v2.21.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.21.0) - Replaced print statements with proper logging to support better debugging and production use cases.
#### Agents Packages
- [@elevenlabs/agents-cli@0.6.1](https://github.com/elevenlabs/packages/releases/tag/%40elevenlabs/agents-cli%400.6.1) - Package deprecated in favor of the unified [ElevenLabs CLI](https://www.npmjs.com/package/@elevenlabs/cli).
- [@elevenlabs/react@0.9.1](https://github.com/elevenlabs/packages/releases/tag/%40elevenlabs/react%400.9.1) - Fixed issue where `end_call` tool wasn't properly ending conversations, and improved React Native reconnection logic after manual disconnection.
- [@elevenlabs/react-native@0.5.2](https://github.com/elevenlabs/packages/releases/tag/%40elevenlabs/react-native%400.5.2) - Fixed issue where `end_call` tool wasn't properly ending conversations, and improved reconnection logic after manual disconnection.
- [@elevenlabs/client@0.9.1](https://github.com/elevenlabs/packages/releases/tag/%40elevenlabs/client%400.9.1) - Fixed issue where `end_call` tool wasn't properly ending conversations, and improved React Native reconnection logic.
#### Android SDK
- [v0.4.0](https://github.com/elevenlabs/elevenlabs-android/releases/tag/v0.4.0) - Improved type safety by using `ConversationMode` enums instead of strings, migrated from LiveData to StateFlow with backward compatibility support, and fixed bug where agents wouldn't end calls when requested.
#### iOS SDK
- [v2.0.16](https://github.com/elevenlabs/elevenlabs-swift-sdk/releases/tag/v2.0.16) - Fixed issue where the `end_call` tool wasn't properly ending conversations, ensuring agents can correctly terminate calls.
#### CLI
- [@elevenlabs/cli@0.2.0](https://github.com/elevenlabs/cli/releases/tag/%40elevenlabs/cli%400.2.0) - Removed `--env` flag support for virtual environment isolation. This feature will be reintroduced once proper environment isolation is supported in the product.
### API
#### TLS
Our TLS endpoints no longer allow some older and insecure cipher modes. Most clients should not be affected as they already negotiate a modern cipher mode.
## New Endpoints
### Agents Platform
#### MCP Tool Configuration
- [POST /v1/convai/mcp-servers/{mcp_server_id}/tool-configs](/docs/api-reference/mcp/tool-configuration/create) - Create configuration overrides for a specific MCP tool.
- [GET /v1/convai/mcp-servers/{mcp_server_id}/tool-configs/{tool_name}](/docs/api-reference/mcp/tool-configuration/get) - Retrieve configuration overrides for a specific MCP tool.
- [PATCH /v1/convai/mcp-servers/{mcp_server_id}/tool-configs/{tool_name}](/docs/api-reference/mcp/tool-configuration/update) - Update configuration overrides for a specific MCP tool.
- [DELETE /v1/convai/mcp-servers/{mcp_server_id}/tool-configs/{tool_name}](/docs/api-reference/mcp/tool-configuration/delete) - Remove configuration overrides for a specific MCP tool.
### Text to Dialogue
- [POST /v1/text-to-dialogue/with-timestamps](/docs/api-reference/text-to-dialogue/convert-with-timestamps) - Generate dialogue with character-level alignment and voice segment information.
- [POST /v1/text-to-dialogue/stream/with-timestamps](/docs/api-reference/text-to-dialogue/stream-with-timestamps) - Stream dialogue generation with character-level alignment and voice segment information.
### Tokens
- [POST /v1/single-use-token/{token_type}](/docs/api-reference/single-use/create) - Generate time-limited single-use tokens for secure operations.
### Music API
- [POST /v1/music/stem-separation](/docs/api-reference/music/separate-stems) - Separate audio into individual stems (vocals, drums, bass, instruments). Accepts multipart file upload and returns a ZIP archive.
## Updated Endpoints
### Text to Dialogue
- [Text to dialogue with timestamps](/docs/api-reference/text-to-dialogue/convert-with-timestamps)
- Added `voice_segments` array with required `dialogue_input_index` field to track which dialogue input each segment corresponds to
- Enhanced response schema with character-level alignment data
- [Stream text to dialogue with timestamps](/docs/api-reference/text-to-dialogue/stream-with-timestamps)
- Added streaming support for `voice_segments` with character-level alignment
- Includes `dialogue_input_index` in voice segment chunks
### Music
- Music prompt endpoints updated:
- Maximum prompt length increased from previous limit to 4,100 characters
- Lyrics lines now support up to 200 characters per line
### Workspace
- [Get workspace resource](/docs/api-reference/workspace/resources/get)
- Added `dashboard` and `dashboard_configuration` to `WorkspaceResourceType` enum
- Response schema changes (breaking compatibility - see migration notes below)
### Billing
- [Get subscription](/docs/api-reference/user/subscription/get)
- Added `next_billing_period` field (required) to `PendingSubscriptionSwitchResponseModel`
- Added `subtotal_cents` and `tax_cents` (nullable) to `InvoiceResponseModel` for better invoice breakdown
### Agents Platform
#### Batch Calling
- [Submit batch calling](/docs/api-reference/batch-calling/create)
- Made `agent_phone_number_id` nullable and optional
- Made `phone_number` nullable and optional in recipient models
- Response schema changes (breaking compatibility - see migration notes below)
- [Get batch calling status](/docs/api-reference/batch-calling/get)
- Made `phone_number_id` and `phone_provider` nullable
- [Cancel batch calling](/docs/api-reference/batch-calling/cancel) & [Retry batch calling](/docs/api-reference/batch-calling/retry)
- Made phone-related fields nullable
- Response schema changes (breaking compatibility - see migration notes below)
#### Transfer Destinations
- Transfer-related endpoints now support dynamic variable transfer destinations:
- Added `PhoneNumberDynamicVariableTransferDestination` type for phone number transfers using dynamic variables
- Added `SIPUriDynamicVariableTransferDestination` type for SIP URI transfers using dynamic variables
- Updated `PhoneNumberTransfer` discriminators to include dynamic variable types
#### MCP Servers
- [Create MCP server](/docs/api-reference/mcp/create), [Get MCP server](/docs/api-reference/mcp/get), [Update MCP server](/docs/api-reference/mcp/update)
- Added `tool_config_overrides` field for per-tool configuration customization
- Added `request_headers` support in `MCPServerConfigUpdateRequestModel`
- Added `tool_call_sound` and `tool_call_sound_behavior` configuration options
#### Tools
- Tool schema enhancements:
- `LiteralJsonSchemaProperty` now includes `is_system_provided` flag
- Agent tool headers now support `ConvAIDynamicVariable` for dynamic header values
#### Twilio Integration
- Phone number endpoints updated:
- Added optional `region_config` field for Twilio phone numbers
- Added `TwilioRegionId` enum for region selection
- Added `TwilioEdgeLocation` enum for edge location configuration
#### Agents
- [Get agent](/docs/api-reference/agents/get) & [Update agent](/docs/api-reference/agents/update)
- Response schema changes (breaking compatibility - see migration notes below)
# October 27, 2025
### Workspaces
- **External voice sharing controls**: External voice sharing and Voice Library publishing can now be disabled via workspace groups. Internal (in-organization) voice sharing is not impacted.
### Agents Platform
- **Multi-environment agent management**: The Agents CLI now supports the `--env` flag, allowing you to manage agents deployed to multiple accounts simultaneously.
- **Search functionality**: Added `search` parameter to the [Get conversations](/docs/api-reference/conversations/get-conversations) endpoint for filtering conversations.
### SDK Releases
#### JavaScript SDK
- [v2.20.1](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.20.1) - Fixed Music API type definitions and added Realtime Scribe language code support
- [v2.20.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.20.0) - Added helper methods for Realtime Scribe with improved ASR capabilities. This is a beta feature and not widely available yet
#### Python SDK
- [v2.20.1](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.20.1) - Fixed file parameter handling in client methods
- [v2.20.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.20.0) - API schema updates and improvements
- [v2.19.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.19.0) - Added helper method for Scribe Realtime with improved ASR capabilities. This is a beta feature and not widely available yet
#### Agents CLI
- [@elevenlabs/agents-cli@0.6.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/agents-cli@0.6.0) - Added support for `--env` flag to manage agents across multiple accounts, plus bug fixes and reliability improvements
#### React Native
- [@elevenlabs/react-native@0.5.1](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react-native@0.5.1) - Removed build script and postinstall steps for improved installation experience
#### React
- [@elevenlabs/react@0.8.1](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react@0.8.1) - Removed build script and postinstall steps for improved installation experience
#### Client
- [@elevenlabs/client@0.8.1](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/client@0.8.1) - Removed build script and postinstall steps for improved installation experience
### API
## Updated Endpoints
### Agents Platform
- [Get conversations](/docs/api-reference/conversations/list)
- Added optional `search` query parameter (string) for full-text or fuzzy search over transcript messages
- [Create agent](/docs/api-reference/agents/create), [Get agent](/docs/api-reference/agents/get), [Update agent](/docs/api-reference/agents/update)
- Added optional `tool_call_sound` field to play predefined sounds during tool execution
- Added optional `tool_call_sound_behavior` field (default: `auto`) to control when tool call sounds play
- [Simulate conversation](/docs/api-reference/agents/simulate-conversation), [Simulate conversation stream](/docs/api-reference/agents/simulate-conversation-stream)
- Added optional `tool_call_sound` field for tool execution audio feedback
- Added optional `tool_call_sound_behavior` field for sound playback control
### MCP Servers
- [Update MCP server](/docs/api-reference/mcp/update)
- Added optional `execution_mode` field to control when and how tools execute (`immediate`, `post_tool_speech`, or `async`)
- [Create MCP server](/docs/api-reference/mcp/create), [Get MCP server](/docs/api-reference/mcp/get)
- Added `execution_mode` field (default: `immediate`) for tool execution timing control
### Server Tools
- Server tool configuration schemas now support:
- `execution_mode` field to control tool execution timing (immediate, post-speech, or async)
- `tool_call_sound` and `tool_call_sound_behavior` fields for audio feedback
### Supported LLM Models
- Added support for new Claude Haiku models:
- `claude-haiku-4-5`
- `claude-haiku-4-5@20251001`
### User Management
- [Get user info](/docs/api-reference/user/get)
- Removed `subscription_extras` field from response
# October 20, 2025
### Agents Platform
- **Genesys configuration overrides**: Genesys users can now override agent configuration directly from their Genesys flows using session variables. Use `system__override_system_prompt`, `system__override_first_message`, and `system__override_language` to customize the agent's behavior, first message, and language for specific call flows without modifying the base agent configuration.
- **Batch call ID copying**: Added copy-to-clipboard functionality for batch call IDs throughout the Agents Platform interface. Click on batch call IDs in the batch call details page or conversation history to quickly copy them for API interactions, support requests, or debugging.
### SDK Releases
#### Python SDK
- [v2.18.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.18.0) - Added streaming support for Agents Platform conversations via the `Conversation` class, Music API inpainting with new output format options, and Agent Workflows support including workflow models, node types, and conditional logic operators (AND, OR, equals, greater than, less than, etc.)
#### JavaScript SDK
- [v2.19.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.19.0) - Added Agent Workflows support including `AgentWorkflowRequestModel`, `AgentWorkflowResponseModel`, and AST node types for building workflow conditions (boolean nodes, dynamic variable nodes, comparison operators), Music API inpainting with new output format options, and the `archived` parameter for agent list requests
#### Agents CLI
- [@elevenlabs/agents-cli@0.6.0](https://www.npmjs.com/package/@elevenlabs/agents-cli/v/0.6.0) - Added `--env` flag to manage agents deployed to multiple accounts simultaneously
- [@elevenlabs/agents-cli@0.5.0](https://www.npmjs.com/package/@elevenlabs/agents-cli/v/0.5.0) - Reworked agents pull functionality
#### React Native
- [@elevenlabs/react-native@0.4.3](https://www.npmjs.com/package/@elevenlabs/react-native/v/0.4.3) - Fixed `onConnect` timing issue
- [@elevenlabs/react-native@0.4.2](https://www.npmjs.com/package/@elevenlabs/react-native/v/0.4.2) - Reverted change to ICE transport policy
### API
## Updated Endpoints
### Studio
- [Create podcast](/docs/api-reference/studio/create-podcast) - Added `safety-identifier` header parameter
# October 14, 2025
### Agents Platform
- **LLM overrides**: Added support for overriding an agent's LLM during a conversation, enabling you to specify a different language model on a per-conversation basis. This is useful for testing different models or accommodating specific requirements while maintaining HIPAA and data residency compliance.
- **Post-call webhook failures**: Added the option to send post-call webhook events in the event of a phone call failure. This allows you to track and respond to failed call attempts through your webhook endpoint, providing better visibility into call issues.
### SDK Releases
#### Python SDK
- [v2.18.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.18.0) - Added support for streaming, Music API inpainting, and Agent Workflows
#### JavaScript SDK
- [v2.19.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.19.0) - Added support for Music API inpainting and Agent Workflows
- [v2.18.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.18.0) - API schema updates
#### Client Packages
- [@elevenlabs/agents-cli@0.5.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/agents-cli@0.5.0) - Significantly reworked agents pull command with bugfixes and improvements
- [@elevenlabs/react@0.8.0](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react@0.8.0) - Fixed import issues
- [@elevenlabs/react-native@0.4.3](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react-native@0.4.3) - Fixed `onConnect` timing
- [@elevenlabs/react-native@0.4.2](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react-native@0.4.2) - Reverted change to ICE transport policy
- [@elevenlabs/react-native@0.4.1](https://github.com/elevenlabs/packages/releases/tag/@elevenlabs/react-native@0.4.1) - Fixed import issues
# October 7, 2025
### Agents Platform
- **Gemini 2.5 Flash Preview models**: Added support for `gemini-2.5-flash-preview-09-2025` and `gemini-2.5-flash-lite-preview-09-2025` LLM models, providing access to the latest September 2025 preview versions of Google's Gemini 2.5 Flash models.
- **Claude Sonnet 4.5**: Added support for `claude-sonnet-4-5` and `claude-sonnet-4-5@20250929` models, enabling access to the latest Claude Sonnet 4.5 model released on September 29, 2025.
- **Test invocations listing**: Added new `GET /v1/convai/test-invocations` endpoint to list all test invocations with pagination support. Includes `agent_id` filter, `page_size` parameter (default 30, max 100), and `cursor` for pagination. Response includes test run counts, pass/fail statistics, and titles.
- **Agent archiving**: Added `archived` field (boolean, default false) to agent platform settings, allowing agents to be archived without deletion while keeping them out of active agent lists.
- **MCP Server interruption control**: Added `disable_interruptions` field (boolean, default false) to MCP server configuration, preventing user interruptions during tool execution for more reliable tool completion.
- **Streaming agent responses**: Added `agent_chat_response_part` WebSocket event type for receiving partial agent chat responses in real-time during streaming conversations.
- **Workflow edge ordering**: Added `edge_order` field (array of strings) to all workflow node types, enabling explicit control over edge evaluation order for deterministic workflow execution.
- **Test suite agent tracking**: Added `agent_id` field (string, nullable) to test invocation responses for associating test runs with specific agents.
### Voice Management
- **Voice generation source tracking**: Added `VoiceGeneration` as a new source type in the History API for tracking audio generated from voice generation features.
### Telephony
- **SIP trunk TLS validation**: Added `remote_domains` field (array of strings, nullable) to SIP trunk configuration for specifying domains used in TLS certificate validation.
## SDK Releases
### JavaScript SDK
- [v2.18.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.18.0) - Updated with latest API schema changes from October 8, 2025
### Python SDK
- [v2.17.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.17.0) - Updated with latest API schema changes and URL generation fixes from October 6, 2025
### Packages
All packages updated with latest API schema changes:
- [@elevenlabs/react-native@0.3.2](https://github.com/elevenlabs/elevenlabs-js/releases/tag/%40elevenlabs%2Freact-native%400.3.2) - Updated TypeScript types and API client with new fields for agent archiving, MCP server configuration, and test invocations
- [@elevenlabs/react@0.7.1](https://github.com/elevenlabs/elevenlabs-js/releases/tag/%40elevenlabs%2Freact%400.7.1) - Updated React hooks and components with support for new agent settings and WebSocket events
- [@elevenlabs/client@0.7.1](https://github.com/elevenlabs/elevenlabs-js/releases/tag/%40elevenlabs%2Fclient%400.7.1) - Core client library updated with new endpoint for test invocations listing and reorganized SDK method paths for secrets management
- [@elevenlabs/agents-cli@0.4.2](https://github.com/elevenlabs/elevenlabs-js/releases/tag/%40elevenlabs%2Fagents-cli%400.4.2) - CLI tool updated with support for new agent archiving flag and test invocation commands
### MCP Server
- [v0.9.0](https://github.com/elevenlabs/elevenlabs-mcp/releases/tag/v0.9.0) - Added option to return MCP server results as resource items for better integration with resource-based workflows
## API
## New Endpoints
### Agents Platform
- `GET /v1/convai/test-invocations` - List all test invocations with pagination support
- **Parameters:**
- `agent_id` (required, string) - Filter by agent ID
- `page_size` (optional, integer, default=30, max=100) - Number of results per page
- `cursor` (optional, string) - Pagination cursor from previous response
- **Response:** Returns paginated list with test run counts, pass/fail statistics, titles, and next cursor
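A minimal Python sketch of paging through the new endpoint with `requests`; the `agent_id` value is a placeholder and the response field names (`test_invocations`, `next_cursor`) are assumptions based on the description above.

```python
import os
import requests

headers = {"xi-api-key": os.environ["ELEVENLABS_API_KEY"]}
params = {"agent_id": "your-agent-id", "page_size": 100}  # agent_id is a placeholder

while True:
    resp = requests.get(
        "https://api.elevenlabs.io/v1/convai/test-invocations",
        headers=headers,
        params=params,
    )
    resp.raise_for_status()
    page = resp.json()
    # Field names below are assumptions; adjust to the actual response schema.
    for invocation in page.get("test_invocations", []):
        print(invocation.get("id"), invocation.get("title"))
    cursor = page.get("next_cursor")
    if not cursor:
        break
    params["cursor"] = cursor
```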
## New Fields
### Agents Platform
- **Agent Settings**: Added `archived` field (boolean, default false) to `AgentPlatformSettingsRequestModel` and `AgentPlatformSettingsResponseModel` for archiving agents
- **MCP Servers**: Added `disable_interruptions` field (boolean, default false) to MCP server configuration schemas for preventing user interruptions during tool execution
- **Workflows**: Added `edge_order` field (array of strings) to all workflow node types for explicit edge evaluation ordering
- **Test Invocations**: Added `agent_id` field (string, nullable) to `GetTestSuiteInvocationResponseModel` for agent tracking
### Telephony
- **SIP Trunks**: Added `remote_domains` field (array of strings, nullable) to `GetPhoneNumberInboundSIPTrunkConfigResponseModel` and `InboundSIPTrunkConfigRequestModel` for TLS certificate validation
### WebSocket Events
- Added `agent_chat_response_part` to `ServerEventType` enum for streaming partial agent chat responses
### Voice Management
- Added `VoiceGeneration` to speech history source types
## New LLM Models
Added the following models to the LLM enum:
- `claude-sonnet-4-5` - Claude Sonnet 4.5 latest
- `claude-sonnet-4-5@20250929` - Claude Sonnet 4.5 dated release (September 29, 2025)
- `gemini-2.5-flash-preview-09-2025` - Gemini 2.5 Flash preview (September 2025)
- `gemini-2.5-flash-lite-preview-09-2025` - Gemini 2.5 Flash Lite preview (September 2025)
## Other Changes
### Pronunciation Dictionaries
- Updated parameter description for `version_id` in `GET /v1/pronunciation-dictionaries/{dictionary_id}/{version_id}/download` from "The id of the version of the pronunciation dictionary" to "The id of the pronunciation dictionary version"
- Removed documentation note about UI limitation for multiple pronunciation dictionaries (multiple dictionaries now fully supported in UI)
### Conversation History
- Made `type` field optional in `ConversationHistoryTranscriptOtherToolsResultCommonModel` (previously required)
# September 29, 2025
### v1 TTS model deprecation
The `eleven_monolingual_v1` and `eleven_multilingual_v1` models are deprecated and will be removed on December 15th, 2025. Please [migrate to newer models](https://elevenlabs.io/docs/models#deprecated-models) for continued service.
### Agents Platform
- **Workflow Expressions**: Workflows now support complex expressions that allow for defining deterministic conditions using logical operators, dynamic variables and LLM evaluation. This enables more sophisticated agent logic and decision-making capabilities.
- **MCP Server Interrupt Control**: Added option to disable interruptions during all tool calls for MCP Servers, providing better control over agent behavior during tool execution.
- **Audio Alignment Data**: Agents now have a flag to enable alignment data in audio events, useful for audio-text synchronization use cases such as lip sync applications.
- **Ignore Default Personality Setting**: The Agents Platform configuration page now includes a checkbox to toggle whether agents should ignore the default helpful personality, giving developers more control over agent behavior.
### Speech to Text
- **Fixed Base64 Encoding Flag**: Resolved an issue where the `is_base64_encoded` flag in STT responses was incorrectly set to false for PDF and DOCX formats, even when content was actually base64 encoded.
### SDK Releases
#### JavaScript SDK
- **v2.16.0**: Updated with latest API schema changes from September 19, 2025.
#### Packages
- **@elevenlabs/types@0.0.1**: New public TypeScript types package providing shared type definitions across ElevenLabs integrations.
- **@elevenlabs/react@0.7.0** and **@elevenlabs/client@0.7.0**: Added support for passing custom script paths to avoid blob: and data: URLs for improved security and flexibility.
- **@elevenlabs/convai-widget-embed@0.3.0** and **@elevenlabs/convai-widget-core@0.3.0**: Added `use_rtc` attribute for widget functionality and added expand event support for better widget interaction handling.
### API
## Updated Endpoints
### Agents Platform
- **POST /v1/convai/agents/create**: Added `ignore_default_personality` boolean parameter to control whether agents should ignore the default helpful personality behavior
- **PATCH /v1/convai/agents/{agent_id}**: Added `ignore_default_personality` field support for agent updates
- **GET /v1/convai/agents/{agent_id}**: Response now includes `ignore_default_personality` field in agent configuration
- **POST /v1/convai/mcp-servers**: Added interrupt control configuration parameters for disabling interruptions during tool calls
- **PATCH /v1/convai/mcp-servers/{mcp_server_id}**: Enhanced with interrupt control settings for MCP server configuration
- **GET /v1/convai/mcp-servers/{mcp_server_id}**: Response includes new interrupt control configuration fields
- **GET /v1/convai/conversations/{conversation_id}**: Response enhanced with alignment data fields for audio-text synchronization support
- **POST /v1/convai/agent-testing/create**: Enhanced to support workflow expressions functionality in agent testing
- **GET /v1/convai/agent-testing/{test_id}**: Response includes additional fields for workflow expression test results
- **PUT /v1/convai/agent-testing/{test_id}**: Request and response schemas updated for workflow expression support
- **POST /v1/convai/agents/{agent_id}/simulate-conversation**: Request schema updated to support workflow expressions in conversation simulation
- **POST /v1/convai/agents/{agent_id}/simulate-conversation/stream**: Streaming conversation simulation with workflow expression support
- **GET /v1/convai/settings**: Response includes new platform configuration options
- **PATCH /v1/convai/settings**: Request schema updated with new platform settings
- **POST /v1/convai/batch-calling/submit**: Request schema updates for batch calling functionality
- **PATCH /v1/convai/mcp-servers/{mcp_server_id}/approval-policy**: Response schema updated for approval policy management
- **POST /v1/convai/mcp-servers/{mcp_server_id}/tool-approvals**: Response schema enhanced for tool approval handling
- **DELETE /v1/convai/mcp-servers/{mcp_server_id}/tool-approvals/{tool_name}**: Response schema updated for tool approval removal
### Speech to Text
- **POST /v1/speech-to-text**: Fixed `is_base64_encoded` boolean flag to correctly return `true` when PDF and DOCX document content is base64 encoded
### Text to Speech
- **POST /v1/text-to-speech/{voice_id}/with-timestamps**: Request and response schemas updated for enhanced timestamp functionality
- **POST /v1/text-to-speech/{voice_id}/stream**: Request schema updated for improved streaming parameters
- **POST /v1/text-to-speech/{voice_id}/stream/with-timestamps**: Request and response schemas updated for streaming with timestamps
- **POST /v1/text-to-voice/create-previews**: Request schema enhanced with new preview generation options
- **POST /v1/text-to-voice**: Response schema updated with additional voice creation data
- **POST /v1/text-to-voice/{voice_id}/remix**: Request schema enhanced for voice remixing parameters
### Voice Management
- **GET /v1/voices**: Response schema updated with new voice metadata fields
- **GET /v1/voices/{voice_id}**: Response schema enhanced with additional voice properties
- **GET /v1/voices/settings/default**: Response schema updated for default voice settings
- **GET /v1/voices/{voice_id}/settings**: Response schema enhanced with new configuration options
- **POST /v1/voices/{voice_id}/settings/edit**: Request schema updated for voice settings modification
- **POST /v1/voices/pvc/{voice_id}/samples/{sample_id}**: Request schema enhanced for PVC sample management
- **GET /v1/voices/pvc/{voice_id}/samples/{sample_id}/audio**: Response schema updated for audio sample retrieval
- **GET /v1/voices/pvc/{voice_id}/samples/{sample_id}/speakers/{speaker_id}/audio**: Response schema enhanced for speaker-specific audio
- **POST /v1/voice-generation/create-voice**: Response schema updated with new voice generation data
### Studio
- **POST /v1/studio/podcasts**: Request schema enhanced with new podcast creation parameters
### User Management
- **GET /v1/user**: Response schema updated with additional user profile data
All changes are backward compatible and do not require immediate action from developers.
# September 22, 2025
### Productions launch
Introducing Productions, our new managed service for ordering human-edited content that looks, sounds, and feels natural. Built for creators and media businesses.
Our network of linguists and audio professionals offers end-to-end production quality for:
- Dubbing
- Captions and subtitles
- Transcription
- Audiobooks
You can order a project directly from the 'Productions' page in your ElevenLabs account, or by emailing productions@elevenlabs.io. Pricing starts at $2/minute; contact us for more details.
### Agents Platform
- **MCP pre-tool speech**: Added support for configuring tools extracted from an MCP Server to require pre-tool execution speech. This enhancement allows agents to provide verbal context before executing specific tools, improving the conversational flow during tool usage.
- **ElevenLabs hosted LLMs**: Added support for [ElevenLabs hosted LLMs](/docs/agents-platform/customization/llm#elevenlabs-experimental) which unlock lower latency by running on ElevenLabs infrastructure alongside Speech to Text and Text to Speech services.
- **Enum values for tool parameters**: Added support for specifying a tool's parameters as [enum values](/docs/api-reference/tools/create#response.body.tool_config.WebhookToolConfig.api_schema.request_body_schema.properties.LiteralJsonSchemaProperty.enum) for greater control.
### SDK Releases
#### JavaScript SDK
- **v2.16.0**: Updated the [elevenlabs-js](https://github.com/elevenlabs/elevenlabs-js) SDK with the latest API schema changes, including new MCP server endpoints and enhanced history filtering capabilities.
#### Python SDK
- **v2.16.0**: Updated the [elevenlabs-python](https://github.com/elevenlabs/elevenlabs-python) SDK with the latest API schema changes, including new MCP server endpoints and enhanced history filtering capabilities.
- **v2.15.1**: Fixed conversation handling when no authentication is required and added asyncio event loop support for better async operations.
#### Package Updates
- **@elevenlabs/agents-cli@0.3.2**: Updated the Agents CLI package with improvements to agent development tools. The ConvAI CLI has been renamed to Agents CLI to align with the ElevenLabs Agents Platform branding.
- **@elevenlabs/convai-cli@0.2.3**: Final release of the legacy ConvAI CLI package before migration to the new Agents CLI.
- **@elevenlabs/react@0.6.3**: Updated the React components package with enhanced functionality.
### API
## New Endpoints
- `PATCH /v1/convai/mcp-servers/{mcp_server_id}` - [Update MCP Server Configuration](/docs/api-reference/mcp/update): Added new endpoint to update MCP server configurations, replacing the deprecated approval policy endpoint.
## Updated Endpoints
### History Management
- `GET /v1/history` - [Get generated items](/docs/api-reference/history/list): Enhanced with additional filtering parameters:
- Added `model_id` parameter for filtering by specific models
- Added `date_before_unix` parameter for filtering items before a specific date
- Added `date_after_unix` parameter for filtering items after a specific date
- Added `sort_direction` parameter for controlling sort order
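As a rough sketch, the new filters can be combined in a single request; the parameter names come from the list above, while the `sort_direction` value and the model ID shown are illustrative placeholders.

```python
import os
import time
import requests

headers = {"xi-api-key": os.environ["ELEVENLABS_API_KEY"]}

# Items generated in the last 7 days with a specific model, newest first.
# "desc" and the model ID are illustrative values; check the endpoint reference.
params = {
    "model_id": "eleven_multilingual_v2",
    "date_after_unix": int(time.time()) - 7 * 24 * 60 * 60,
    "sort_direction": "desc",
}
resp = requests.get("https://api.elevenlabs.io/v1/history", headers=headers, params=params)
resp.raise_for_status()
for item in resp.json().get("history", []):
    print(item.get("history_item_id"), item.get("date_unix"))
```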
## Deprecated Endpoints
- `PATCH /v1/convai/mcp-servers/{mcp_server_id}/approval-policy` - Deprecated in favor of the new general MCP server update endpoint
# September 15, 2025
### Text to Speech
- **WebSocket output format**: Added support for specifying output format in the first message of a WebSocket connection, providing greater flexibility for real-time audio streaming workflows.
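A minimal sketch of what this could look like with the `websockets` package, assuming the format is passed as an `output_format` key in the initial message (the exact key name may differ; check the WebSocket API reference). The voice ID is a placeholder.

```python
# pip install websockets
import asyncio
import json
import os

import websockets

VOICE_ID = "your-voice-id"  # placeholder
URI = f"wss://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream-input?model_id=eleven_flash_v2_5"

async def main():
    async with websockets.connect(URI) as ws:
        # Initial message: per this release, the output format can now be set here.
        # The "output_format" key is an assumption about the field name.
        await ws.send(json.dumps({
            "text": " ",
            "output_format": "pcm_16000",
            "xi_api_key": os.environ["ELEVENLABS_API_KEY"],
        }))
        await ws.send(json.dumps({"text": "Hello from the realtime pipeline. "}))
        await ws.send(json.dumps({"text": ""}))  # empty text closes the stream
        async for message in ws:
            chunk = json.loads(message)
            if chunk.get("audio"):
                print(f"received {len(chunk['audio'])} base64 chars of audio")

asyncio.run(main())
```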
### Agents Platform
- **First message interruption control**: Added `disable_first_message_interruptions` setting to prevent agents from being interrupted during important opening messages like legal disclaimers.
### MCP Server
- **Version 0.8.1**: Added data residency support.
## SDK Releases
### JavaScript SDK
- **v2.15.0** - Added new Text to Voice Remix endpoint
### Python SDK
- **v2.15.1** - Fixed conversation authentication issue and added asyncio event loop support
- **v2.15.0** - Added new Text to Voice Remix endpoint and fixed Pydantic issues
### Packages
- **@elevenlabs/react@0.6.2** - Added correction and MCP tool call events
- **@elevenlabs/client@0.6.2** - Added correction and MCP tool call events
- **@elevenlabs/react-native@0.3.1** - Added correction and MCP tool call events
## API
## New Endpoints
- `DELETE /v1/speech-to-text/transcripts/{transcription_id}` - [Delete Transcript By Id](/docs/api-reference/speech-to-text/delete)
## Updated Endpoints
### Backward Compatible Changes
- [Get dubbing](/docs/api-reference/dubbing/get) - Added the optional `order_by` and `order_direction` parameters.
- [List Agents](/docs/api-reference/agents/list) - Added the optional `sort_by` and `sort_direction` parameters.
- [List knowledge base documents](/docs/api-reference/knowledge-base/list) - Added the optional `sort_by` and `sort_direction` parameters.
# September 8, 2025
### Text to Speech
- **Language code support**: All Text to Speech models now support language codes for improved output. Normalization has been enabled for Eleven v3, Flash, and Turbo models to enhance audio quality and consistency.
### Agents Platform
- **Multi-voice agent history**: Messages from multi-voice agents are now displayed in conversation history with clear separation by voice, making it easier to follow which voice spoke which part of a conversation.
### SDK Releases
#### JavaScript SDK
- **v2.15.0** - Adds support for new voice remix functionality
#### Python SDK
- **v2.15.0** - Adds support for new voice remix functionality and fixes an issue with Pydantic.
#### React Components
- **@elevenlabs/react@0.6.1** - Fix output bytes and device input/output switching
- **@elevenlabs/client@0.6.1** - Fix output bytes and device input/output switching
### MCP Server
- **v0.7.0** - Latest release of the [ElevenLabs MCP Server](https://github.com/elevenlabs/elevenlabs-mcp) with new features and improvements for Claude Desktop integration. Includes new `loop` parameter for SFX generation.
### API
## New Endpoints
- [Remix a voice](/docs/api-reference/text-to-voice/remix) - Create voice variations from existing voices
- [Get Transcript By Id](/docs/api-reference/speech-to-text/get) - Retrieve specific transcription results
## Updated Endpoints
### Backward Compatible Changes
- [Get Project](/docs/api-reference/studio/get-project) - Added optional `share_id` query parameter for project sharing functionality
- [Convert Speech to Text](/docs/api-reference/speech-to-text/convert) - Modified `enable_logging` parameter for improved logging control
All API changes in this release are backward compatible and will not break existing integrations.
# September 1, 2025
### Agents Platform
- **Gemini 2.5 Flash Lite HIPAA compliance**: Added Gemini 2.5 Flash Lite to the list of [HIPAA approved models](/docs/agents-platform/legal/hipaa) for compliant conversations when a BAA is signed and zero-retention mode is enabled.
- **Conversation ID in signed URLs**: Added support for including conversation IDs in signed URL requests, providing better tracking and identification capabilities for conversation audio access.
## SDK Releases
### JavaScript SDK
- **[v2.13.0](https://github.com/elevenlabs/elevenlabs-js)** - Released August 29, 2025. Adds support for new `loop` parameter in SFX.
### Python SDK
- **[v2.13.0](https://github.com/elevenlabs/elevenlabs-python)** - Released August 29, 2025. Adds support for new `loop` parameter in SFX.
### ConvAI packages
- **[@elevenlabs/react v0.6.0 and @elevenlabs/client v0.6.0](https://github.com/elevenlabs/packages)** - Released August 29, 2025. Fixed setVolume functionality, added client tool debugging, and added audio device controls.
### MCP Server
- **[ElevenLabs MCP Server v0.6.0](https://github.com/elevenlabs/elevenlabs-mcp)** - Released August 26, 2025. Fixed diarization functionality in speech-to-text and added music generation endpoints.
## API
## Updated Endpoints
### Dubbing
- **[Render project](/docs/api-reference/dubbing/resources/render-project)** - Added optional `should_normalize_volume` query parameter to control audio normalization during rendering
### Agents Platform
- **[Get signed URL](/docs/api-reference/conversations/get-signed-url)** - Added optional `include_conversation_id` query parameter to include conversation ID in the response
### Sound Effects
- **[Create sound effect](/docs/api-reference/text-to-sound-effects/convert)** - Added optional `loop` parameter to create sound effects that loop smoothly
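For example, with the Python SDK, assuming the flag is exposed as `loop` on `text_to_sound_effects.convert`:

```python
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Generate a short ambience bed intended to loop seamlessly.
audio = elevenlabs.text_to_sound_effects.convert(
    text="Soft rain on a tin roof",
    duration_seconds=5,
    loop=True,  # new parameter from this release
)
play(audio)
```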
## Removed Endpoints
- **Delete workspace member** - Removed the `DELETE /v1/workspace/members` endpoint for deleting workspace members. This endpoint was never meant to be publicly available.
# August 25, 2025
### Agents Platform
- **Agent testing framework**: Introduced a comprehensive testing framework for ElevenLabs agents, allowing developers to create, manage, and execute automated tests for their agents. This includes test creation, execution tracking, and result analysis capabilities.
- **Test invocation management**: Added support for resubmitting failed test invocations and viewing detailed test results to help developers debug and improve their agents.
- **Enhanced agent configuration**: Improved agent creation and management with additional workspace override capabilities and refined platform settings.
### Text to Speech
- **Pronunciation dictionary updates**: Added support for updating pronunciation dictionaries with PATCH operations, enabling more flexible dictionary management.
- **Enhanced timestamp support**: Improved timestamp generation for text-to-speech conversions with better alignment data and streaming capabilities.
### SDK Releases
- **TypeScript SDK v2.12.2**: Updated with the latest API schema changes, including full support for the new agent testing endpoints and enhanced Agents Platform capabilities.
- **Python SDK v2.12.1**: Released with complete support for all new API features, including agent testing framework and improved workspace resource management.
### API
## New Endpoints
Added 10 new endpoints this week:
### ElevenLabs Agent Testing
- `POST /v1/convai/agent-testing/create` - [Create Agent Response Test](/docs/api-reference/tests/create) - Create automated tests for your ElevenLabs agents
- `GET /v1/convai/agent-testing/{test_id}` - [Get Agent Response Test By Id](/docs/api-reference/tests/get) - Retrieve specific test configurations and results
- `PUT /v1/convai/agent-testing/{test_id}` - [Update Agent Response Test](/docs/api-reference/tests/update) - Modify existing test setups and parameters
- `DELETE /v1/convai/agent-testing/{test_id}` - [Delete Agent Response Test](/docs/api-reference/tests/delete) - Remove test configurations from your workspace
- `POST /v1/convai/agent-testing/summaries` - [Get Agent Response Test Summaries By Ids](/docs/api-reference/tests/summaries) - Retrieve aggregated test results for multiple tests
- `GET /v1/convai/agent-testing` - [List Agent Response Tests](/docs/api-reference/tests/list) - Browse all available tests in your workspace
- `POST /v1/convai/agents/{agent_id}/run-tests` - [Run Tests On The Agent](/docs/api-reference/tests/run-tests) - Execute test suites against specific agents
- `GET /v1/convai/test-invocations/{test_invocation_id}` - [Get Test Invocation](/docs/api-reference/tests/test-invocations/get) - Retrieve detailed test execution results
- `POST /v1/convai/test-invocations/{test_invocation_id}/resubmit` - [Resubmit Tests](/docs/api-reference/tests/test-invocations/resubmit) - Re-run failed test invocations
### Pronunciation Dictionaries
- `PATCH /v1/pronunciation-dictionaries/{pronunciation_dictionary_id}` - [Update Pronunciation Dictionary](/docs/api-reference/pronunciation-dictionaries/update) - Update existing pronunciation dictionaries with new rules or modifications
# August 20, 2025
### Eleven v3 API
Eleven v3 is now available via the API.
To start using it, simply specify the model ID `eleven_v3` when making [Text to Speech requests](/docs/api-reference/text-to-speech/convert).
Additionally the [Text to Dialogue](/docs/cookbooks/text-to-dialogue) API endpoint is now available to all.
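For example, with the Python SDK (the voice ID below is a placeholder; any voice from your library works):

```python
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Audio tags like [excited] or [whispers] are part of what makes v3 expressive.
audio = elevenlabs.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # placeholder voice ID
    model_id="eleven_v3",
    text="[excited] Eleven v3 is live! [whispers] And it can even whisper.",
)
play(audio)
```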
### Music Generation API
The Eleven Music API is now freely available to all paid users.
Visit the [quickstart](/docs/cookbooks/music/quickstart) to learn how to integrate. The API section below highlights the new endpoints that have been released.
### Global TTS API preview
ElevenLabs is launching inference servers in additional geographical regions to reduce latency for clients outside of the US. Initial request processing will be available in the Netherlands and in Singapore in addition to the US.
To learn how to get started [head to the docs](/docs/best-practices/latency-optimization#global-tts-api-preview).
### API
## New Endpoints
- Added 4 new endpoints:
- [Compose music](/docs/api-reference/music/compose) - Create music from text prompts
- [Create composition plan](/docs/api-reference/music/create-composition-plan) - Optimize music generation parameters before processing
- [Compose music with details](/docs/api-reference/music/compose-detailed) - Advanced music generation with detailed parameters
- [Stream music](/docs/api-reference/music/stream) - Real-time streaming music generation
## Updated Endpoints
### Text to Speech
- Updated Text to Speech endpoints with improved parameter handling:
- [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Enhanced voice settings and text input parameter handling
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Improved streaming parameter management
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Better alignment parameter handling
### Voice Management
- Updated Voice endpoints with enhanced parameter support:
- [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Improved preview generation parameters
- [Create voice from preview](/docs/api-reference/text-to-voice/create) - Enhanced voice creation options
- [Get voice](/docs/api-reference/voices/get) - Updated voice parameter responses
- [List voices](/docs/api-reference/voices/search) - Improved voice listing parameters
### Speech to Text
- Updated Speech to Text endpoint:
- [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Enhanced transcription parameter handling
### Usage and Analytics
- Updated Usage endpoints:
- [Get character stats](/docs/api-reference/usage/get) - Added aggregation bucket size parameter and improved breakdown type options
### Workspace Management
- Updated Workspace endpoints:
- [Get workspace resource](/docs/api-reference/workspace/get-resource) - Enhanced resource type parameter handling
- [Share workspace resource](/docs/api-reference/workspace/share-workspace-resource) - Updated sharing parameter structure
- [Unshare workspace resource](/docs/api-reference/workspace/unshare-workspace-resource) - Updated unsharing parameter structure
# August 11, 2025
### Music
**Eleven Music**: Officially released a new music generation model that creates studio-grade music from natural language prompts in any style. See the [capabilities page](/docs/capabilities/music) and [prompting guide](/docs/best-practices/prompting/eleven-music) for more information.
### SDKs
v2.9.0 of the TypeScript SDK released:
- Includes better typing support for Speech to Text requests in webhook mode
- Includes new enums for GPT-5
v2.9.2 of the Python SDK released:
- Includes new enums for GPT-5
### Agents Platform
**Agent response correction**: Updated WebSocket event schema and handling for improved agent response correction functionality.
### API
### User Account Changes
- Updated user account endpoint:
- [Get user subscription info](/docs/api-reference/user/get) - Deprecated `convai_chars_per_minute` and `convai_asr_chars_per_minute` fields in the response schema. These fields will now always return `None`.
### Parameter Removals
- Updated conversation token endpoint:
- [Get conversation token](/docs/api-reference/conversations/get-webrtc-token) - Removed `source` and `version` query parameters. These were internal parameters not meant for public use and their removal does not affect functionality.
# August 4, 2025
### Agents Platform
- **Conversation token generation**: Added new route to generate Conversation Tokens for WebRTC connections. [Learn more](/docs/api-reference/conversations/get-webrtc-token)
- **Expandable widget options**: Our embeddable [widget](/docs/agents-platform/customization/widget) can now be customized to start in the expanded state and disable collapsing altogether.
- **Simplified operation IDs**: We simplified the OpenAPI operation IDs for Agents Platform endpoints to improve developer experience.
### Workspaces
- **Simplified operation IDs**: We simplified the operation IDs for our workspace endpoints to improve API usability.
### SDK Releases
- **Python SDK v2.8.2**: Released latest version with improvements and bug fixes. [View release](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.8.2)
### NPM Packages
- **@elevenlabs/react-native@0.1.2**: Enhanced React Native support
- **@elevenlabs/client@0.4.4**: Client library improvements
- **@elevenlabs/react@0.4.5**: React component updates
### API
## New Endpoints
### Agents Platform
- [Get conversation token](/docs/api-reference/conversations/get-webrtc-token) - Generate authentication token for WebRTC connections
## Updated Endpoints
### Voice Management
- [List voices](/docs/api-reference/voices/search) - Added `voice_ids` query parameter for filtering specific voices
### Agents Platform Core
- [List conversations](/docs/api-reference/conversations/list) - Added `summary_mode` parameter for conversation summaries
### Operation ID Improvements
- **Agents Platform endpoints**: Simplified operation IDs for better developer experience while maintaining full backward compatibility
- **Workspace endpoints**: Streamlined operation IDs across all workspace-related endpoints to improve API usability
# July 28, 2025
### Workspaces
- **Service account API key management**: Added comprehensive API endpoints for managing service account API keys, including creation, retrieval, updating, and deletion capabilities. See [Service Accounts documentation](/docs/product-guides/administration/workspaces/service-accounts).
### Agents Platform
- **Post-call webhook migration**: The post call webhook format is being migrated so that webhook handlers can be auto generated in the SDKs. This is not a breaking change, and no further action is required if your current handler accepts additional fields. Please see more information [here](/docs/agents-platform/workflows/post-call-webhooks#migration-notice-enhanced-webhook-format).
- **Agent transfer improvements**: Fixed system variable `system_agent_id` to properly update after agent-to-agent transfers, ensuring accurate conversation context tracking. Added new `system_current_agent_id` variable for tracking current active agent. Learn more about [dynamic variables](/docs/agents-platform/customization/personalization/dynamic-variables#system-dynamic-variables).
- **Enhanced public agent page**: Added text input functionality and dynamic variable support to the public talk-to-agent page. You can now pass dynamic variables via URL parameters (e.g., `?var_username=value`) and use text input during voice conversations. See [dynamic variables guide](/docs/agents-platform/customization/personalization/dynamic-variables#public-talk-to-page-integration).
- **Voicemail detection**: Added voicemail detection as a built-in tool for ElevenLabs agents to improve call handling. Learn about [voicemail detection](/docs/agents-platform/customization/tools/system-tools/voicemail-detection).
- **Conversation filtering**: Added `user_id` query parameter to [conversation list endpoint](/docs/agents-platform/api-reference/conversations/list#request.query.user_id.user_id) for filtering conversations by initiating user.
### Speech to Text
- **Multi-channel transcription**: Added `use_multi_channel` parameter to transcription endpoint for processing audio files with multiple speakers on separate channels. Supports up to 5 channels with per-channel transcription results. See [multichannel guide](/docs/cookbooks/speech-to-text/multichannel-transcription).
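A minimal sketch against the REST endpoint; the model ID is an assumption, so substitute whichever Speech to Text model you already use, and the filename is a placeholder.

```python
import os
import requests

headers = {"xi-api-key": os.environ["ELEVENLABS_API_KEY"]}

# Stereo call recording with one speaker per channel.
with open("support_call.wav", "rb") as audio_file:
    resp = requests.post(
        "https://api.elevenlabs.io/v1/speech-to-text",
        headers=headers,
        files={"file": audio_file},
        data={"model_id": "scribe_v1", "use_multi_channel": "true"},
    )
resp.raise_for_status()
print(resp.json())
```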
### Studio
- **Caption support**: Added caption functionality to Studio projects with new `captions_enabled` and `caption_style` properties for both podcasts and general projects. Learn more about [Studio](/docs/product-guides/products/studio).
## SDKs
- **[JavaScript SDK v2.7.0](https://github.com/elevenlabs/elevenlabs-js)**: Released with latest API support and improvements
- **[Python SDK v2.8.1](https://github.com/elevenlabs/elevenlabs-python)**: Released with latest API support and improvements
- **[@elevenlabs/client v0.4.1](https://github.com/elevenlabs/packages/tree/main/packages/client)**: Updated client library with latest features, including WebRTC support
- **[@elevenlabs/react v0.4.1](https://github.com/elevenlabs/packages/tree/main/packages/react)**: Enhanced React components with latest features, including WebRTC support
- **[@elevenlabs/react-native v0.1.1](https://github.com/elevenlabs/packages/tree/main/packages/react-native)**: New React Native package for mobile integration with ElevenLabs Agents, based on WebRTC
- **[@elevenlabs/convai-widget-embed v0.1.0](https://github.com/elevenlabs/packages/tree/main/packages/convai-widget-embed)**: New package for embedding Agents Platform widgets into web applications
- **[Swift SDK v2.0.3](https://github.com/elevenlabs/elevenlabs-swift-sdk/releases/tag/v2.0.3)**: Released with WebRTC support for real-time Agents Platform integration on Apple platforms
## API Schema Updates
### New Endpoints
- **Service Account Management**: Added 5 new endpoints for service account API key management:
- `GET /v1/service-accounts/{service_account_user_id}/api-keys` - Retrieve service account API keys
- `POST /v1/service-accounts/{service_account_user_id}/api-keys` - Create service account API key
- `DELETE /v1/service-accounts/{service_account_user_id}/api-keys/{api_key_id}` - Delete service account API key
- `PATCH /v1/service-accounts/{service_account_user_id}/api-keys/{api_key_id}` - Update service account API key
- `GET /v1/service-accounts` - Get workspace service accounts
### Removed Endpoints
- **Legacy Project Endpoints**: Removed 22 deprecated project management endpoints as part of Studio API consolidation:
- All `/v1/projects/*` endpoints (replaced by `/v1/studio/projects/*`)
- Legacy Text to Voice endpoints (`/v1/text-to-voice/create-voice-from-preview`, `/v1/text-to-voice/remixing-sessions/*`)
- Legacy ConvAI knowledge base endpoints
### Updated Endpoints
#### Speech to Text
- **Multi-channel support**: Updated `/v1/speech-to-text` endpoint:
- Added `use_multi_channel` parameter for processing multi-speaker audio files
- Modified response structure to include optional `language_code`, `language_probability`, `text`, and `words` properties
#### Agents Platform
- **Enhanced agent configuration**: Updated agent creation and management endpoints:
- Added voicemail detection to built-in tools
- Improved RAG configuration with `max_retrieved_rag_chunks_count` parameter
- Enhanced conversation token endpoint with `source` and `version` parameters
- Added `user_id` filtering to conversations list endpoint
#### Studio Projects
- **Caption support**: Updated Studio project endpoints to include:
- `captions_enabled` property for enabling/disabling captions
- `caption_style` property for global caption styling configuration
#### Text to Voice
- **Improved voice generation**: Enhanced voice creation endpoints with:
- `loudness` control (-1 to 1 range, 0 corresponds to -24 LUFS)
- `quality` parameter for balancing output quality vs variety
- `guidance_scale` parameter for controlling AI creativity vs prompt adherence
# July 22, 2025
### Agents Platform
- **Agent workspace overrides**: Enhanced agent configuration with workspace-level overrides for better enterprise management and customization.
- **Agent API improvements**: Updated agent creation and modification endpoints with enhanced configuration options, though these changes may break backward compatibility.
### Dubbing
- **Dubbing endpoint access**: Added new endpoint to list all available dubs.
### API
## New Endpoints
- Added 1 new endpoint:
- [List dubs you have access to](/docs/api-reference/dubbing/list) - `GET /v1/dubbing`
## Updated Endpoints
### Text to Speech
- Updated Text to Speech endpoints with backward compatible changes:
- [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Enhanced response schema
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Improved timestamp handling
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Enhanced streaming response
### Voice Management
- Updated Voice endpoints with backward compatible improvements:
- [Get voices](/docs/api-reference/voices/get-all) - Enhanced voice information schema
- [Get voice](/docs/api-reference/voices/get) - Improved voice details response
- [Get voice settings](/docs/api-reference/voices/get-settings) - Enhanced settings schema
### Voice Creation
- Updated Voice Creation endpoints:
- [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Enhanced preview creation
- [Create voice from preview](/docs/api-reference/text-to-voice/create) - Improved voice generation
- [Create voice](/docs/api-reference/text-to-voice/create) - Enhanced voice creation response
### Dubbing
- Updated Dubbing endpoints with backward compatible changes:
- [Dub a video or audio file](/docs/api-reference/dubbing/create) - Enhanced dubbing request schema
- [Get dubbing project](/docs/api-reference/dubbing/get) - Improved project response
### Workspace Management
- **Breaking Change**: Updated Workspace endpoints:
- [Get workspace resource](/docs/api-reference/workspace/get-resource) - Modified `resource_type` query parameter handling and response schema
- [Share workspace resource](/docs/api-reference/workspace/share-workspace-resource) - Enhanced sharing configuration
- [Unshare workspace resource](/docs/api-reference/workspace/unshare-workspace-resource) - Improved unsharing workflow
### Speech to Text
- Updated Speech to Text endpoint:
- [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Enhanced transcription request and response schemas
### Agents Platform
Updated Agents Platform endpoints with enhanced changes:
- [Create agent](/docs/api-reference/agents/create) - Modified agent creation schema with workspace overrides
- [Get agent](/docs/api-reference/agents/get) - Enhanced agent response with new configuration options
- [Update agent](/docs/api-reference/agents/update) - Improved agent update capabilities
- [Simulate conversation](/docs/api-reference/agents/simulate-conversation) - Enhanced conversation simulation
- [Stream conversation simulation](/docs/api-reference/agents/simulate-conversation-stream) - Improved streaming simulation
### Other Updates
- [Get conversation](/docs/api-reference/conversations/get-conversation) - Enhanced conversation details
- [Get Agents Platform settings](/docs/api-reference/workspace/get) - Improved settings response
- [Update Agents Platform settings](/docs/api-reference/workspace/update) - Enhanced settings modification
# July 14, 2025
### Agents Platform
- **Azure OpenAI custom LLM support**: Added support for Azure-hosted OpenAI models in custom LLM configurations. When using an Azure endpoint, a new required field for [API version](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.custom_llm.api_version) is now available in the UI.
- **Genesys output variables**: Added support for output variables when using [Genesys integrations](/docs/agents-platform/phone-numbers/c-caa-s-integrations/genesys), enabling better call analytics and data collection.
- **Gemini 2.5 Preview Models Deprecation**: [Models](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm) `gemini-2.5-flash-preview-05-20` and `gemini-2.5-flash-preview-04-17` have been deprecated in the Agents Platform, as Google is retiring them on July 15. All agents using these models will automatically be transferred to `gemini-2.5-flash` the next time they are used. No action is required.
- **WebRTC rollout**: Began progressive rollout of WebRTC capabilities for improved connection stability and performance. WebRTC mode can be selected in the React SDK and is used in 11.ai.
- **Keypad touch tone**: Fixed an issue affecting playing keypad touch tones on Twilio. See [keypad touch tone documentation](/docs/agents-platform/customization/tools/system-tools/play-keypad-touch-tone).
### Voices
- **Language collection navigation**: Added quick navigation from language preview collections to view all available voices in that language, making it easier to explore voice options by language.
### Text to Voice
- **Preview streaming**: Added new streaming endpoint for Text to Voice previews, allowing real-time streaming of generated voice previews via `/v1/text-to-voice/{generated_voice_id}/stream`.
- **Enhanced voice design**: Added [`stream_previews`](/docs/api-reference/text-to-voice/design#request.body.stream_previews) option to voice design endpoint, enabling streaming-only preview generation for improved performance.
- **Improved parameter controls**: Enhanced [`loudness`](/docs/api-reference/text-to-voice/design#request.body.loudness), quality, and guidance scale parameters with better control options for more precise voice generation.
### Studio
- **Podcast customization**: Added support for [intro](/docs/api-reference/studio/create-podcast#request.body.intro) and [outro](/docs/api-reference/studio/create-podcast#request.body.outro) text in podcast creation, along with custom instructions prompts for better style and tone control.
### SDKs
- **[JavaScript SDK v2.6.0](https://github.com/elevenlabs/elevenlabs-js)**: Released with latest API support and improvements
- **[Python SDK v2.7.1](https://github.com/elevenlabs/elevenlabs-python)**: Released with bug fixes and enhancements
- **[@elevenlabs/client v0.3.0](https://github.com/elevenlabs/packages/tree/main/packages/client)**: Updated client library with support for User IDs in Agents Platform.
- **[@elevenlabs/react v0.3.0](https://github.com/elevenlabs/packages/tree/main/packages/react)**: Add WebRTC debug support.
### API
## New Endpoints
- Added 1 new endpoint:
- [Stream Text to Voice Preview](/docs/api-reference/text-to-voice/stream) - Stream generated voice previews in real-time
## Updated Endpoints
### Text to Voice
- [Create voice previews](/docs/api-reference/text-to-voice/create) - Enhanced `loudness`, `quality`, and `guidance_scale` parameter descriptions
- [Design voice](/docs/api-reference/text-to-voice/design) - Added `stream_previews` property for streaming-only preview generation
### Studio
- [Create podcast](/docs/api-reference/studio/create-podcast) - Added `intro`, `outro`, and `instructions_prompt` properties
### Agents Platform
- [Simulate conversation](/docs/api-reference/agents/simulate-conversation) - Enhanced simulation configuration with improved parameter descriptions
- [Stream simulate conversation](/docs/api-reference/agents/simulate-conversation-stream) - Enhanced simulation configuration with improved parameter descriptions
- [Get Agents Platform settings](/docs/api-reference/workspace/get) - Updated RAG retention period configuration
- [Update Agents Platform settings](/docs/api-reference/workspace/update) - Updated RAG retention period configuration
- [Retry batch calling](/docs/api-reference/batch-calling/retry) - Added batch retry functionality
# July 7, 2025
### Agents Platform
- **HIPAA Compliance**: [Gemini 2.5 Flash is now available for HIPAA customers](/docs/agents-platform/legal/hipaa), providing enhanced AI capabilities while maintaining strict healthcare compliance standards.
- **Post-call Audio**: Added support for returning call audio in [post-call webhooks](/docs/agents-platform/workflows/post-call-webhooks), enabling comprehensive conversation analysis and quality assurance workflows.
- **Enhanced Widget**: Added additional [text customization options](/docs/agents-platform/customization/widget) including start chat button text, chatting status text, and input placeholders for text-only and new conversations.
- **Agent Transfers**: Improved [agent transfer capabilities](/docs/agents-platform/customization/tools/system-tools/agent-transfer) with transfer delay configuration, custom transfer messages, and control over transferred agent first message behavior.
- **SIP Trunk Enhancements**: Added support for separate inbound and outbound [SIP trunk configurations](/docs/agents-platform/phone-numbers/sip-trunking) with enhanced access control and transfer options.
### Dubbing
- **API Schema Update**: Updated our API documentation to explicitly require the `target_language` parameter for [dubbing projects](/docs/capabilities/dubbing). This parameter has always been required - we're just making it clearer in our docs. No code changes needed.
- **Duration Validation**: Added validation to ensure calculated duration makes sense, preventing zero-credit charges for invalid audio uploads.
### Speech to Text
- **Deterministic Sampling**: Added `seed` parameter support for deterministic sampling, enabling reproducible [speech-to-text results](/docs/capabilities/speech-to-text).
### Forced Alignment
- **Confidence Scoring**: Added confidence scoring with `loss` field for words and overall transcript accuracy assessment using [forced alignment](/docs/capabilities/forced-alignment).
### Usage Analytics
- **Workspace Breakdown**: Added reporting workspace ID breakdown for character usage statistics, providing detailed usage insights across [workspaces](/docs/product-guides/administration/workspaces/overview).
### SDKs
- **React Agents Platform SDK**: Released [v0.2.0](https://github.com/elevenlabs/packages/releases/tag/%40elevenlabs%2Freact%400.2.0) with support for Indian data residency and WebRTC mode for Agents Platform.
- **Python SDK**: Released [v2.6.1](https://github.com/elevenlabs/elevenlabs-python/releases) with enhanced Agents Platform capabilities and bug fixes.
- **JavaScript SDK**: Released [v2.5.0](https://github.com/elevenlabs/elevenlabs-js/releases) with improved Agents Platform SDK support and new features.
### API
## Deprecations
- `POST /v1/convai/phone-numbers/create` has been deprecated in favor of [POST /v1/convai/phone-numbers](/docs/api-reference/phone-numbers/create). Please note that migrating to the new endpoint requires a few adjustments:
- Replace `provider_config` field with `inbound_trunk` and `outbound_trunk` for SIP trunk configurations
- Update response parsing to handle the new trunk configuration structure
### Schema Removals
- Removed `SIPTrunkConfigResponseModel`, `SIPTrunkCredentials`, `TransferToNumberToolConfig`
- Removed `incomplete_expired` and `canceled` subscription statuses
## New Features
### Enhanced SIP Trunk Support
- [SIP trunk configuration](/docs/agents-platform/phone-numbers/sip-trunking) now uses separate inbound and outbound trunk configs instead of single configuration
- Deprecated `provider_config` field in SIP trunk response from [the new endpoint](/docs/api-reference/phone-numbers/create) (replaced with `inbound_trunk` and `outbound_trunk`)
- Inbound trunk access control with allowed addresses and phone numbers
- SIP URI transfer destinations alongside phone number transfers
- Transfer to number improvements (conference or SIP REFER)
### Agent Transfers
- [Transfer delay configuration](/docs/agents-platform/customization/tools/system-tools/agent-transfer) with `delay_ms`
- Custom transfer messages
- Control over transferred agent first message behavior
### Conversation Enhancements
- ElevenLabs Assistant integration tracking
- User ID tracking for conversation participants and initiators
- Audio data in [post-call webhooks](/docs/agents-platform/workflows/post-call-webhooks) (configurable)
- [MCP (Model Context Protocol)](/docs/agents-platform/customization/mcp) tool call details in conversation history
### Widget Improvements
- Additional [text customization options](/docs/agents-platform/customization/widget):
- Start chat button text
- Chatting status text
- Input placeholders for text-only and new conversations
### API Improvements
#### Speech to Text
- Added deterministic sampling with `seed` parameter in [Convert speech to text](/docs/api-reference/speech-to-text/convert)
#### Forced Alignment
- Added confidence scoring with `loss` field for words and overall transcript in [Forced alignment](/docs/api-reference/forced-alignment/create)
#### Usage Analytics
- Added reporting workspace ID breakdown for character stats in [Get characters usage metrics](/docs/api-reference/usage/get)
#### Tool Configuration
- [Client tool](/docs/agents-platform/customization/tools/client-tools) response timeout increased from 30 to 120 seconds
#### Workspace Resources
- Added agent response tests resource type
## Deprecations
- Phone number `provider_config` field (use `inbound_trunk`/`outbound_trunk` instead)
- `phone_number` field in transfer configurations (use `transfer_destination` instead)
# June 30, 2025
### Text to Voice
- **Voice Design**: Launched new [Text to Voice Design](/docs/api-reference/text-to-voice/design#request.body.model_id) with Eleven v3 for creating custom voices from text descriptions.
### Speech to Text
- **Enhanced Diarization**: Added `diarization_threshold` parameter to the [Speech to Text](/docs/api-reference/speech-to-text/convert#request.body.diarization_threshold.diarization_threshold) endpoint. Fine-tune the balance between speaker accuracy and total speaker count by adjusting the threshold between 0.1 and 0.4.
### Professional Voice Cloning
- **Background Noise Removal**: Added `remove_background_noise` to clean up voice samples using audio isolation models for [better quality training data](/docs/api-reference/voices/pvc/samples/create#request.body.remove_background_noise.remove_background_noise).
### Studio
- **Video Support Detection**: Added `has_video` property to chapter responses to indicate whether [chapters contain video content](/docs/api-reference/studio/get-chapters#response.body.chapters.has_video).
### Workspaces
- **Service Account Groups**: Service accounts can now be added to workspace groups for better permission management and access control.
- **Workspace Authentication**: Added support for workspace authentication connections, enabling secure webhook tool integrations with external services.
### SDKs
- **Python SDK**: Released [v2.6.0](https://github.com/elevenlabs/elevenlabs-python/releases) with latest API support and bug fixes.
- **JavaScript SDK**: Released [v2.5.0](https://github.com/elevenlabs/elevenlabs-js/releases) with latest API support and bug fixes.
- **React Agents Platform SDK**: Added WebRTC support in [0.2.0](https://github.com/elevenlabs/packages/releases/tag/%40elevenlabs%2Freact%400.2.0)
### API
## New Endpoints
- Added 2 new endpoints:
- [Design a Voice](/docs/api-reference/text-to-voice/design) - Create voice previews from text descriptions
- [Create Voice From Preview](/docs/api-reference/text-to-voice/create) - Convert voice previews to permanent voices
## Updated Endpoints
### Speech to Text
- [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `diarization_threshold` parameter for fine-tuning speaker separation
### Voice Management
- [Get voice sample audio](/docs/api-reference/voices/pvc/samples/create#request.body.remove_background_noise.remove_background_noise) - Added `remove_background_noise` query parameter and moved from request body to query parameters
# June 23, 2025
### Tools migration
- **Agents Platform tools migration**: The way tools are handled in the Agents Platform is being migrated; please see the guide on [what's changing and how to migrate](/docs/agents-platform/customization/tools/agent-tools-deprecation)
### Text to Speech
- **Audio tags automatic removal**: Audio tags are now automatically removed when switching from V3 to V2 models, ensuring optimal compatibility and performance.
### Agents Platform
- **Tools management UI**: Added a new comprehensive [tools management interface](https://elevenlabs.io/app/agents/tools) for creating, configuring, and managing tools across all agents in your workspace.
- **Streamlined agent creation**: Introduced a new [agent creation flow](https://elevenlabs.io/app/agents/new) with improved user experience and better configuration options.
- **Agent duplication**: Added the ability to [duplicate existing agents](/docs/api-reference/agents/duplicate), allowing you to quickly create variations of successful agent configurations.
### SIP Trunking
- **Inbound media encryption**: Added support for configurable [inbound media encryption settings](/docs/agents-platform/phone-numbers/sip-trunking#configure-transport-and-encryption) for SIP trunk phone numbers, enhancing security options.
### Voices
- **Famous voice category**: Added a new "famous" voice category to the voice library, expanding the available voice options for users.
### Dubbing
- **CSV frame rate control**: Added `csv_fps` parameter to control frame rate when parsing CSV files for dubbing projects, providing more precise timing control.
## SDKs
- **ElevenLabs JavaScript SDK v2.4.0**: Released with new Agents Platform SDK support for Node.js. [View release notes](https://github.com/elevenlabs/elevenlabs-js/releases)
- **ElevenLabs Python SDK v2.5.0**: Updated with enhanced Agents Platform capabilities. [View release notes](https://github.com/elevenlabs/elevenlabs-python/releases)
### API
## New Endpoints
### Agents Platform
- [Duplicate agent](/docs/api-reference/agents/duplicate) - Create a new agent by duplicating an existing one
- [Create tool](/docs/api-reference/tools/create) - Add a new tool to the available tools in the workspace
- [List tools](/docs/api-reference/tools/list) - Retrieve all tools available in the workspace
- [Get tool](/docs/api-reference/tools/get) - Retrieve a specific tool configuration
- [Update tool](/docs/api-reference/tools/update) - Update an existing tool configuration
- [Delete tool](/docs/api-reference/tools/delete) - Remove a tool from the workspace
- [Get tool dependent agents](/docs/api-reference/tools/get-dependent-agents) - List all agents that depend on a specific tool
## Updated Endpoints
### Agents Platform
- **Agent configuration**:
- Added `built_in_tools` configuration for system tools management
- Deprecated inline `tools` configuration in favor of `tool_ids` for better tool management
- **Tool system**:
- Refactored tool configuration structure to use centralized tool management
### Dubbing
- **CSV processing**:
- [Create dubbing project](/docs/api-reference/dubbing/create) - Added `csv_fps` parameter for custom frame rate control
### SIP Trunking
- **Phone number creation**:
- [Create SIP trunk phone number](/docs/api-reference/phone-numbers) - Added `inbound_media_encryption` parameter for security configuration
### Voice Library
- **Voice categories**:
- Updated voice response models to include "famous" as a new voice category option
- Enhanced voice search and filtering capabilities
# June 17, 2025
### Agents Platform
- **Dynamic variables in simulated conversations**: Added support for [dynamic variable population in simulated conversations](/docs/api-reference/agents/simulate-conversation#request.body.simulation_specification.simulated_user_config.dynamic_variables), enabling more flexible and context-aware conversation testing scenarios (see the sketch after this list).
- **MCP server integration**: Introduced comprehensive support for [Model Context Protocol (MCP) servers](/docs/agents-platform/customization/mcp), allowing agents to connect to external tools and services through standardized protocols with configurable approval policies.
- **Burst pricing for extra concurrency**: Added [bursting capability](/docs/agents-platform/guides/burst-pricing) for workspace call limits, automatically allowing up to 3x the configured concurrency limit during peak usage for overflow capacity.
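For the dynamic variables item above, a minimal sketch; the nesting follows the documented request-body path, while everything else (including whether additional simulation fields are required) is an assumption:

```python
# Sketch: seed a simulated conversation with runtime dynamic variables.
# The nesting mirrors simulation_specification.simulated_user_config.dynamic_variables;
# other required fields of the simulation body are not shown here.
import os
import requests

agent_id = "your_agent_id"
body = {
    "simulation_specification": {
        "simulated_user_config": {
            "dynamic_variables": {
                "customer_name": "Alice",
                "plan_tier": "pro",
            }
        }
    }
}
response = requests.post(
    f"https://api.elevenlabs.io/v1/convai/agents/{agent_id}/simulate_conversation",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json=body,
)
response.raise_for_status()
print(response.json())
```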
### Studio
- **JSON content initialization**: Added support for initializing Studio projects with structured JSON content through the `from_content_json` parameter, enabling programmatic project creation with predefined chapters, blocks, and voice configurations.
### Workspaces
- **Webhook management**: Introduced workspace-level webhook management capabilities, allowing administrators to view, configure, and monitor webhook integrations across the entire workspace with detailed usage tracking and failure diagnostics.
### API
## New Endpoints
### Agents Platform - MCP Servers
- [Create MCP server](/docs/api-reference/mcp/create) - Create a new MCP server configuration in the workspace
- [List MCP servers](/docs/api-reference/mcp/list) - Retrieve all MCP server configurations available in the workspace
- [Get MCP server](/docs/api-reference/mcp/get) - Retrieve a specific MCP server configuration from the workspace
- [Update MCP server approval policy](/docs/api-reference/mcp/approval-policies/update) - Update the approval policy configuration for an MCP server
- [Create MCP server tool approval](/docs/api-reference/mcp/approval-policies/create) - Add approval for a specific MCP tool when using per-tool approval mode
- [Delete MCP server tool approval](/docs/api-reference/mcp/approval-policies/delete) - Remove approval for a specific MCP tool when using per-tool approval mode
### Workspace
- [Get workspace webhooks](/docs/api-reference/webhooks/list) - Retrieve all webhook configurations for the workspace with optional usage information
## Updated Endpoints
### Agents Platform
- **Agent simulation**:
- [Simulate conversation](/docs/api-reference/agents/simulate-conversation) - Added `dynamic_variables` parameter for populating conversation context with runtime values
- [Simulate conversation stream](/docs/api-reference/agents/simulate-conversation-stream) - Added `dynamic_variables` parameter for streaming conversation simulations
- **Agent configuration**:
- [Agent platform settings](/docs/api-reference/agents/update#request.body.platform_settings.call_limits) - Added `bursting_enabled` parameter to control burst pricing for call limits
- **WebSocket events**:
- Enhanced `ClientEvent` enum to include `mcp_connection_status` for real-time MCP server monitoring
- **Conversation charging**:
- Added `is_burst` indicator to conversation metadata for tracking burst pricing usage
### Studio
- [Create Studio project](/docs/api-reference/studio/add-project#request.body.from_content_json.from_content_json) - Added `from_content_json` parameter for JSON-based project setup
### User Management
- **User profile**:
- [Get user](/docs/api-reference/user/get) - Deprecated `can_use_delayed_payment_methods` field in user response model
### Subscription Management
- **Subscription status**:
- Removed `canceled` and `unpaid` from available subscription status types, streamlining subscription state management
# June 8, 2025
### Text to Speech
- **Eleven v3 (alpha)**: Released Eleven v3 (alpha), our most expressive Text to Speech model, as a research preview.
### Agents Platform
- **Custom voice settings in multi-voice**: Added support for configuring individual [voice settings per supported voice](/docs/agents-platform/customization/voice/multi-voice-support) in multi-voice agents, allowing fine-tuned control over stability, speed, similarity boost, and streaming latency for each voice.
- **Silent transfer to human in Twilio**: Added backend configuration support for silent (cold) [transfer to human](/docs/agents-platform/customization/tools/system-tools/transfer-to-human) in the Twilio native integration, enabling seamless handoff without announcing the transfer to callers.
- **Batch calling retry and cancel**: Added support for retrying outbound calls to phone numbers that did not respond during a [batch call](/docs/agents-platform/phone-numbers/batch-calls), along with the ability to cancel ongoing batch operations for better campaign management.
- **LLM pinning**: Added support for [versioned LLM models with explicit checkpoint identifiers](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm).
- **Custom LLM headers**: Added support for passing [custom headers to custom LLMs](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.custom_llm.request_headers) (see the sketch after this list).
- **Fixed issue in non-Latin languages**: Fixed an issue causing some conversations in non-Latin alphabet languages to fail.
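A hedged sketch of an agent definition using the two LLM changes above; the request-body paths follow the documented anchors, while the checkpoint identifier, custom LLM URL, and header name are illustrative assumptions:

```python
# Sketch: agent prompt configuration with custom LLM headers.
# The checkpoint string, URL, and header name below are illustrative only.
import os
import requests

body = {
    "conversation_config": {
        "agent": {
            "prompt": {
                # Alternative: pin a versioned built-in model checkpoint, e.g.
                # "llm": "claude-3-7-sonnet-20250219",
                "custom_llm": {
                    "url": "https://llm.example.com/v1",
                    # Custom headers forwarded with every request to the custom LLM.
                    "request_headers": {"X-Team": "support-agents"},
                },
            }
        }
    }
}
response = requests.post(
    "https://api.elevenlabs.io/v1/convai/agents/create",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json=body,
)
response.raise_for_status()
print(response.json())
```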
### SDKs
- **Python SDK v2.3.0**: Released [Python SDK v2.3.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.3.0)
- **JavaScript SDK v2.2.0**: Released [JavaScript SDK v2.2.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.2.0)
### API
## New Endpoints
### Agents Platform
- **Batch Calling**:
- [Cancel batch call](/docs/api-reference/batch-calling/cancel) - Cancel a running batch call and set all recipients to cancelled status
- [Retry batch call](/docs/api-reference/batch-calling/retry) - Retry a batch call by setting completed recipients back to pending status
- **Knowledge Base RAG**:
- [Get document RAG indexes](/docs/api-reference/knowledge-base/get-document-rag-indexes) - Get information about all RAG indexes of a knowledge base document
- [Delete document RAG index](/docs/api-reference/knowledge-base/delete-document-rag-index) - Delete a specific RAG index for a knowledge base document
- [RAG index overview](/docs/api-reference/knowledge-base/rag-index-overview) - Get total size and information of RAG indexes used by knowledge base documents
## Updated Endpoints
### Agents Platform
- **Supported Voices**:
- [Agent configuration](/docs/api-reference/agents/update#request.body.tts.supported_voices) - Added `optimize_streaming_latency`, `stability`, `speed`, and `similarity_boost` parameters for per-voice TTS customization
- **Transfer to Human**:
- [Agent configuration](/docs/api-reference/agents/update#request.body.system_tools.transfer_to_number) - Added `enable_client_message` parameter to control whether a message is played to the client during transfer
- **Knowledge Base**:
- Knowledge base documents now use `supported_usages` instead of `prompt_injectable` for better usage mode control
- RAG index creation now returns enhanced response model with usage information
- **Custom LLM**:
- [Agent configuration](/docs/api-reference/agents/update#request.body.llm.custom_llm) - Added `request_headers` parameter for custom header configuration
- **Widget Configuration**:
- [Agent platform settings](/docs/api-reference/agents/update#request.body.platform_settings.widget_config) - Added comprehensive `styles` configuration for widget appearance customization
- **LLM**:
- Added support for [versioned LLM models](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm) with explicit version identifiers
# June 1, 2025
### Agents Platform
- **Multi-voice support for agents**: Enable ElevenLabs agents to [dynamically switch between different voices](/docs/agents-platform/customization/voice/multi-voice-support) during conversations for multi-character storytelling, language tutoring, and role-playing scenarios.
- **Claude Sonnet 4 support**: Added [Claude Sonnet 4 as a new LLM option](/docs/agents-platform/customization/llm#anthropic) for conversational agents, providing enhanced reasoning capabilities and improved performance.
- **Genesys Cloud integration**: Introduced AudioHook Protocol integration for seamless connection with [Genesys Cloud contact center platform](/docs/agents-platform/phone-numbers/c-caa-s-integrations/genesys).
- **Force delete knowledge base documents**: Added [`force` parameter](/docs/api-reference/knowledge-base/delete#request.query.force.force) to knowledge base document deletion, allowing removal of documents even when used by agents (see the sketch after this list).
- **Multimodal widget**: Added text input and text-only mode defaults for better user experience with [improved widget configuration](/docs/agents-platform/customization/widget).
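A minimal sketch of the force delete, assuming the same `/v1/convai/knowledge-base/{documentation_id}` path used elsewhere in this changelog:

```python
# Sketch: force-delete a knowledge base document even if agents still reference it.
import os
import requests

documentation_id = "your_document_id"
response = requests.delete(
    f"https://api.elevenlabs.io/v1/convai/knowledge-base/{documentation_id}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    params={"force": "true"},  # new query parameter
)
response.raise_for_status()
```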
### API
## Updated Endpoints
### Speech to Text
- [Create transcript](/docs/api-reference/speech-to-text/convert) - Added `webhook` parameter for asynchronous processing with webhook delivery
### Agents Platform
- **Knowledge Base**:
- [Delete knowledge base document](/docs/api-reference/knowledge-base/delete) - Added `force` query parameter to delete documents regardless of agent dependencies
- **Widget**:
- [Widget configuration](/docs/api-reference/widget/get#response.body.widget_config.supports_text_only) - Added text input and text-only mode support for multi-modality
# May 26, 2025
### Forced Alignment
- **Forced alignment improvements**: Fixed a rare failure case in forced alignment processing to improve reliability.
### Voices
- **Live moderated voices filter**: Added `include_live_moderated` query parameter to the shared voices endpoint, allowing you to include or exclude voices that are live moderated.
### Agents Platform
- **Secret dynamic variables**: Added support for specifying dynamic variables as secrets with the `secret__` prefix. Secret dynamic variables can only be used in webhook tool headers and are never sent to an LLM, enhancing security for sensitive data. [Learn more](/docs/agents-platform/customization/personalization/dynamic-variables#secret-dynamic-variables).
- **Skip turn system tool**: Introduced a new system tool called **skip_turn**. When enabled, the agent will skip its turn if the user explicitly indicates they need a moment to think or perform an action (e.g., "just a sec", "give me a minute"). This prevents turn timeout from being triggered during intentional user pauses. See the [skip turn tool docs](/docs/agents-platform/customization/tools/system-tools/skip-turn) for more information.
- **Text input support**: Added text input support in websocket connections via a "user_message" event with a text field, plus a "user_activity" event to indicate typing or other UI activity, improving agent turn-taking when text and audio input are interleaved (see the sketch after this list).
- **RAG chunk limit**: Added ability to configure the [maximum number of chunks](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.rag.max_retrieved_rag_chunks_count) collected during RAG retrieval, giving users more control over context window usage and costs.
- **Enhanced widget configuration**: Expanded widget customization options to include [text input and text only mode](/docs/api-reference/widget/get#response.body.widget_config.text_only).
- **LLM usage calculator**: Introduced tools to calculate expected LLM token usage and costs for agents, helping with cost estimation and planning.
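A hedged sketch of the new text events; the event names come from the entry above, while the websocket URL, message envelope, and authentication flow are assumptions to verify against the websocket reference (private agents require a signed URL instead).

```python
# Sketch only: send text input and typing activity over an Agents Platform websocket.
# The "type" envelope and connection URL are assumptions; only the event names
# user_message and user_activity come from this changelog entry.
import asyncio
import json

import websockets

async def send_text(agent_id: str) -> None:
    url = f"wss://api.elevenlabs.io/v1/convai/conversation?agent_id={agent_id}"
    async with websockets.connect(url) as ws:
        # Signal UI activity (e.g. the user is typing) to improve turn-taking.
        await ws.send(json.dumps({"type": "user_activity"}))
        # Send the user's message as text instead of audio.
        await ws.send(json.dumps({"type": "user_message", "text": "What are your opening hours?"}))
        print(await ws.recv())

asyncio.run(send_text("your_agent_id"))
```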
### Audio Native
- **Accessibility improvements**: Enhanced accessibility for the AudioNative player with multiple improvements:
- Added aria-labels for all buttons
- Enabled keyboard navigation for all interactive elements
- Made progress bar handle focusable and keyboard-accessible
- Improved focus indicator visibility for better screen reader compatibility
### API
## New Endpoints
- Added 3 new endpoints:
- [Get Agent Knowledge Base Size](/docs/agents-platform/api-reference/knowledge-base/size) - Returns the number of pages in the agent's knowledge base.
- [Calculate Agent LLM Usage](/docs/agents-platform/api-reference/llm-usage/calculate) - Calculates expected number of LLM tokens needed for the specified agent.
- [Calculate LLM Usage](/docs/agents-platform/api-reference/llm-usage/calculate) - Returns a list of LLM models and the expected cost for using them based on the provided values.
## Updated Endpoints
### Voices
- [Get Shared Voices](/docs/api-reference/voices#get-shared-voices) - Added `include_live_moderated` query parameter to `GET /v1/shared-voices` to filter voices by live moderation status.
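A minimal sketch of the new filter; the `page_size` parameter and the `voices` response key are assumptions to verify against the shared voices reference:

```python
# Sketch: list shared voices while excluding live-moderated ones.
import os
import requests

response = requests.get(
    "https://api.elevenlabs.io/v1/shared-voices",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    params={"include_live_moderated": "false", "page_size": 10},
)
response.raise_for_status()
for voice in response.json()["voices"]:
    print(voice["name"])
```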
### Agents Platform
- **Agent Configuration**:
- Enhanced system tools with new `skip_turn` tool configuration
- Improved RAG configuration with `max_retrieved_rag_chunks_count` parameter
- **Widget Configuration**:
- Added support for text-only mode
- **Batch Calling**:
- Batch call responses now include `phone_provider` field with default value "twilio"
### Text to Speech
- **Voice Settings**:
- Added `quality` parameter to voice settings for controlling audio generation quality
- Model response schema updated to include `can_use_quality` field
# May 19, 2025
### SDKs
- **SDKs V2**: Released new v2 SDKs for both [Python](https://github.com/elevenlabs/elevenlabs-python) and [JavaScript](https://github.com/elevenlabs/elevenlabs-js)
### Speech to Text
- **Speech to text logprobs**: The Speech to Text response now includes a `logprob` field for word prediction confidence.
### Billing
- **Improved API error messages**: Enhanced API error messages for subscriptions with failed payments. This provides clearer information if a failed payment has caused a user to reach their quota threshold sooner than expected.
### Agents Platform
- **Batch calls**: Released new batch calling functionality, which allows you to [automate groups of outbound calls](/docs/agents-platform/phone-numbers/batch-calls).
- **Increased evaluation criteria limit**: The maximum number of evaluation criteria for agent performance evaluation has been increased from 5 to 10.
- **Human-readable IDs**: Introduced human-readable IDs for key Agents Platform entities (e.g., agents, conversations). This improves usability and makes resources easier to identify and manage through the API and UI.
- **Unanswered call tracking**: 'Not Answered' outbound calls are now reliably detected and visible in the conversation history.
- **LLM cost visibility in dashboard**: The Agents Platform dashboard now displays the total and per-minute average LLM costs.
- **Zero retention mode (ZRM) for agents**: Allowed enabling Zero Retention Mode (ZRM) per agent.
- **Dynamic variables in headers**: Added the option of setting a dynamic variable as a [header value for tools](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.tools.webhook.api_schema.request_headers.Conv-AI-Dynamic-Variable).
- **Customisable tool timeouts**: Shipped support for setting different [timeout durations per tool](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.tools.client.response_timeout_secs).
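A hedged sketch of a webhook tool definition using both changes; only `response_timeout_secs` and the idea of a dynamic-variable header value come from this changelog, while the surrounding tool schema, field names, and the `{{...}}` placeholder syntax are illustrative assumptions to check against the linked references.

```python
# Sketch only: illustrative webhook tool configuration.
# The per-tool timeout and the dynamic-variable header are the new capabilities;
# every other field name here is an assumption.
webhook_tool = {
    "type": "webhook",
    "name": "lookup_order",
    "description": "Look up an order by its ID.",
    "response_timeout_secs": 10,  # per-tool timeout instead of a single global value
    "api_schema": {
        "url": "https://api.example.com/orders",
        "request_headers": {
            # Header value populated from a dynamic variable at call time.
            "Authorization": "Bearer {{order_api_token}}",
        },
    },
}
```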
### Workspaces
- **Simplified secret updates**: Workspace secrets can now be updated more granularly using a `PATCH` request via the API, simplifying the management of individual secret values. For technical details, please see the API changes section below.
### API
## New Endpoints
- Added 6 new endpoints:
- [Get Signed Url](/docs/agents-platform/api-reference/conversations/get-signed-url) - Get a signed URL to start a conversation with an agent that requires authorization.
- [Simulate Conversation](/docs/agents-platform/api-reference/agents/simulate-conversation) - Run a conversation between an agent and a simulated user.
- [Simulate Conversation (Stream)](/docs/agents-platform/api-reference/agents/simulate-conversation-stream) - Run and stream a conversation simulation between an agent and a simulated user.
- [Update Convai Workspace Secret](/docs/agents-platform/api-reference/workspace/secrets/update) - Update an existing secret for the Convai workspace.
- [Submit Batch Call Request](/docs/agents-platform/api-reference/batch-calling/create) - Submit a batch call request to schedule calls for multiple recipients.
- [Get All Batch Calls for Workspace](/docs/agents-platform/api-reference/batch-calling/list) - Retrieve all batch calls for the current workspace.
## Updated Endpoints
### Agents Platform
- **Agents & Conversations**:
- Endpoint `GET /v1/convai/conversation/get_signed_url` (snake_case path) has been deprecated. Use the new `GET /v1/convai/conversation/get-signed-url` (kebab-case path) instead.
- **Phone Numbers**:
- [Get Phone Number Details](/docs/agents-platform/api-reference/phone-numbers/get) - Response schema for `GET /v1/convai/phone-numbers/{phone_number_id}` updated to distinguish `Twilio` and `SIPTrunk` provider details.
- [Update Phone Number](/docs/agents-platform/api-reference/phone-numbers/update) - Response schema for `PATCH /v1/convai/phone-numbers/{phone_number_id}` updated similarly for `Twilio` and `SIPTrunk`.
- [List Phone Numbers](/docs/agents-platform/api-reference/phone-numbers/list) - Response schema for `GET /v1/convai/phone-numbers/` list items updated for `Twilio` and `SIPTrunk` providers.
### Text To Speech
- [Text to Speech Endpoints](/docs/api-reference/text-to-speech) - Default `model_id` changed from `eleven_monolingual_v1` to `eleven_multilingual_v2` for the following endpoints:
- `POST /v1/text-to-speech/{voice_id}/stream`
- `POST /v1/text-to-speech/{voice_id}/stream-with-timestamps`
- `POST /v1/text-to-speech/{voice_id}`
- `POST /v1/text-to-speech/{voice_id}/with-timestamps`
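If your integration relies on the previous default, pin `model_id` explicitly; a minimal sketch against the non-streaming endpoint:

```python
# Sketch: pin model_id explicitly so the new eleven_multilingual_v2 default
# (or any future default change) does not alter existing behaviour.
import os
import requests

voice_id = "your_voice_id"
response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "Hello from the multilingual model.",
        "model_id": "eleven_multilingual_v2",  # or eleven_monolingual_v1 to keep the old default
    },
)
response.raise_for_status()
with open("output.mp3", "wb") as f:
    f.write(response.content)
```

Pinning the model keeps output stable across SDK upgrades and default changes.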
### Voices
- [Get Shared Voices](/docs/api-reference/voices#get-shared-voices) - Added `include_custom_rates` query parameter to `GET /v1/shared-voices`.
- **Schema Updates**:
- `LibraryVoiceResponseModel` and `VoiceSharingResponseModel` now include an optional `fiat_rate` field (USD per 1000 credits).
# May 12, 2025
### Billing
- **Downgraded Plan Pricing Fix**: Fixed an issue where customers with downgraded subscriptions were shown their current price instead of the correct future price.
### Agents Platform
- **Edit Knowledge Base Document Names**: You can now edit the names of knowledge base documents.
See: [Knowledge Base](/docs/agents-platform/customization/knowledge-base)
- **Conversation Simulation**: Released a [new endpoint](/docs/agents-platform/api-reference/agents/simulate-conversation) that allows you to test an agent over text
### Studio
- **Export Paragraphs as Zip**: Added support for exporting separated paragraphs in a zip file.
See: [Studio](/docs/product-guides/products/studio)
### SDKs
- **Released new SDKs**:
- [ElevenLabs Python v1.58.1](https://github.com/elevenlabs/elevenlabs-python)
- [ElevenLabs JS v1.58.0](https://github.com/elevenlabs/elevenlabs-js)
### API
## New Endpoints
- [Update metadata for a speaker](/docs/api-reference/dubbing)
`PATCH /v1/dubbing/resource/{dubbing_id}/speaker/{speaker_id}`
Amend the metadata associated with a speaker, such as their voice. Both voice cloning and using voices from the ElevenLabs library are supported.
- [Search similar voices for a speaker](/docs/api-reference/dubbing)
`GET /v1/dubbing/resource/{dubbing_id}/speaker/{speaker_id}/similar-voices`
Fetch the top 10 similar voices to a speaker, including IDs, names, descriptions, and sample audio.
- [Simulate a conversation](/docs/api-reference/agents/simulate-conversation)
`POST /v1/convai/agents/{agent_id}/simulate_conversation`
Run a conversation between the agent and a simulated user.
- [Simulate a conversation (stream)](/docs/api-reference/agents/simulate-conversation-stream)
`POST /v1/convai/agents/{agent_id}/simulate_conversation/stream`
Stream a simulated conversation between the agent and a simulated user.
- [Handle outbound call via SIP trunk](/docs/api-reference/sip-trunk/outbound-call)
`POST /v1/convai/sip-trunk/outbound-call`
Initiate an outbound call using SIP trunking.
## Updated Endpoints
- [List conversations](/docs/api-reference/conversations/get-conversations)
`GET /v1/convai/conversations`
Added `call_start_after_unix` query parameter to filter conversations by start date.
- [Update knowledge base document](/docs/api-reference/knowledge-base/update-knowledge-base-document)
`PATCH /v1/convai/knowledge-base/{documentation_id}`
Now supports updating the name of a document.
- [Text to Speech endpoints](/docs/api-reference/text-to-speech)
The default model for all TTS endpoints is now `eleven_multilingual_v2` (was `eleven_monolingual_v1`).
## Removed Endpoints
- None.
# May 5, 2025
### Dubbing
- **Disable Voice Cloning**: Added an option in the [Dubbing Studio UI](https://elevenlabs.io/app/dubbing) to disable voice cloning when uploading audio, aligning with the existing `disable_voice_cloning` API parameter.
### Billing
- **Quota Exceeded Error**: Improved error messaging for exceeding character limits. Users attempting to generate audio beyond their quota within a short billing window will now receive a clearer `401 unauthorized: This request exceeds your quota limit of...` error message indicating the limit has been exceeded.
### SDKs
- **Released new SDKs**: Added [ElevenLabs Python v1.58.0](https://github.com/elevenlabs/elevenlabs-python) and [ElevenLabs JS v1.58.0](https://github.com/elevenlabs/elevenlabs-js) to fix a breaking change that had been mistakenly shipped
# April 28, 2025
### Agents Platform
- **Custom Dashboard Charts**: The Agents Platform dashboard can now be extended with custom charts displaying the results of evaluation criteria over time. See the new [GET](/docs/api-reference/workspace/dashboard/get) and [PATCH](/docs/api-reference/workspace/dashboard/update) endpoints for managing dashboard settings.
- **Call History Filtering**: Added the ability to filter the call history by start date using the new `call_start_before_unix` parameter in the [List Conversations](/docs/agents-platform/api-reference/conversations/list#request.query.call_start_before_unix) endpoint. [Try it here](https://elevenlabs.io/app/agents/history).
- **Server Tools**: Added the option of making PUT requests in [server tools](/docs/agents-platform/customization/tools/server-tools).
- **Transfer to human**: Added call forwarding functionality to support forwarding to operators; see the docs [here](/docs/agents-platform/customization/tools/system-tools/transfer-to-human).
- **Language detection**: Fixed an issue where the [language detection system tool](/docs/agents-platform/customization/tools/system-tools/language-detection) would trigger when a user replied "yes" in a non-English language.
### Usage Analytics
- **Custom Aggregation**: Added an optional `aggregation_interval` parameter to the [Get Usage Metrics](/docs/api-reference/usage/get) endpoint to control the interval over which to aggregate character usage (hour, day, week, month, or cumulative).
- **New Metric Breakdowns**: The Usage Analytics section now supports additional metric breakdowns including `minutes_used`, `request_count`, `ttfb_avg`, and `ttfb_p95`, selectable via the new `metric` parameter in the [Get Usage Metrics](/docs/api-reference/usage/get) endpoint. Furthermore, you can now get a breakdown and filter by `request_queue`.
### API
## New Endpoints
- Added 2 new endpoints for managing Agents Platform dashboard settings:
- [Get Dashboard Settings](/docs/api-reference/workspace/dashboard/get) - Retrieves custom chart configurations for the ConvAI dashboard.
- [Update Dashboard Settings](/docs/api-reference/workspace/dashboard/update) - Updates custom chart configurations for the ConvAI dashboard.
## Updated Endpoints
### Audio Generation (TTS, S2S, SFX, Voice Design)
- Updated endpoints to support new `output_format` option `pcm_48000`:
- [Text to Speech](/docs/api-reference/text-to-speech/convert) (`POST /v1/text-to-speech/{voice_id}`)
- [Text to Speech with Timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) (`POST /v1/text-to-speech/{voice_id}/with-timestamps`)
- [Text to Speech Stream](/docs/api-reference/text-to-speech/convert-as-stream) (`POST /v1/text-to-speech/{voice_id}/stream`)
- [Text to Speech Stream with Timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) (`POST /v1/text-to-speech/{voice_id}/stream/with-timestamps`)
- [Speech to Speech](/docs/api-reference/speech-to-speech/convert) (`POST /v1/speech-to-speech/{voice_id}`)
- [Speech to Speech Stream](/docs/api-reference/speech-to-speech/stream) (`POST /v1/speech-to-speech/{voice_id}/stream`)
- [Sound Generation](/docs/api-reference/text-to-sound-effects/convert) (`POST /v1/sound-generation`)
- [Create Voice Previews](/docs/api-reference/legacy/voices/create-previews) (`POST /v1/text-to-voice/create-previews`)
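A minimal sketch requesting the new format from the streaming endpoint; `output_format` is passed as a query parameter, as with the existing formats:

```python
# Sketch: request the new pcm_48000 output format from the streaming TTS endpoint.
# pcm_48000 is expected to be raw 16-bit little-endian samples at 48 kHz.
import os
import requests

voice_id = "your_voice_id"
response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    params={"output_format": "pcm_48000"},
    json={"text": "Testing 48 kHz PCM output."},
    stream=True,
)
response.raise_for_status()
with open("output_48k.pcm", "wb") as f:
    for chunk in response.iter_content(chunk_size=4096):
        f.write(chunk)
```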
### Usage Analytics
- Updated usage metrics endpoint:
- [Get Usage Metrics](/docs/api-reference/usage/get) (`GET /v1/usage/character-stats`) - Added optional `aggregation_interval` and `metric` query parameters.
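A minimal sketch combining both new query parameters; the timestamp parameters and their units are assumptions based on the usage endpoint reference:

```python
# Sketch: weekly usage broken down by the new ttfb_p95 metric.
import os
import requests

response = requests.get(
    "https://api.elevenlabs.io/v1/usage/character-stats",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    params={
        # Reporting window; assumed to be Unix milliseconds per the usage reference.
        "start_unix": 1743465600000,
        "end_unix": 1745884800000,
        "aggregation_interval": "week",
        "metric": "ttfb_p95",
    },
)
response.raise_for_status()
print(response.json())
```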
### Agents Platform
- Updated conversation listing endpoint:
- [List Conversations](/docs/agents-platform/api-reference/conversations/list#request.query.call_start_before_unix) (`GET /v1/convai/conversations`) - Added optional `call_start_before_unix` query parameter for filtering by start date.
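A minimal sketch of the new filter; the timestamp value is illustrative and the response shape is an assumption based on the conversations reference:

```python
# Sketch: list conversations that started before a given Unix timestamp.
import os
import requests

response = requests.get(
    "https://api.elevenlabs.io/v1/convai/conversations",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    params={"call_start_before_unix": 1745884800},
)
response.raise_for_status()
for conversation in response.json()["conversations"]:
    print(conversation["conversation_id"])
```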
## Schema Changes
### Agents Platform
- Added detailed LLM usage and pricing information to conversation [charging and history models](/docs/agents-platform/api-reference/conversations/get#response.body.metadata.charging).
- Added `tool_latency_secs` to [tool result schemas](/docs/api-reference/conversations/get-conversation#response.body.transcript.tool_results.tool_latency_secs)
- Added `access_info` to [`GET /v1/convai/agents/{agent_id}`](/docs/api-reference/agents/get#response.body.access_info)
# April 21, 2025
### Professional Voice Cloning (PVC)
- **PVC API**: Introduced a comprehensive suite of API endpoints for managing Professional Voice Clones (PVC). You can now programmatically create voices, add/manage/delete audio samples, retrieve audio/waveforms, manage speaker separation, handle verification, and initiate training. For a full list of new endpoints check the API changes summary below or read the PVC API reference [here](/docs/api-reference/voices/pvc/create).
### Speech to Text
- **Enhanced Export Options**: Added options to include or exclude timestamps and speaker IDs when exporting Speech to Text results in segmented JSON format via the API.
### Agents Platform
- **New LLM Models**: Added support for the new GPT-4.1 models: `gpt-4.1`, `gpt-4.1-mini`, and `gpt-4.1-nano`. See the configuration reference [here](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm).
- **VAD Score**: Added a new client event which sends VAD scores to the client; see the reference [here](/docs/agents-platform/customization/events/client-events#vad_score).
### API
## New Endpoints
- Added 16 new endpoints:
- [Create PVC Voice](/docs/api-reference/voices/pvc/create) - Creates a new PVC voice.
- [Edit PVC Voice](/docs/api-reference/voices/pvc/update) - Edits PVC voice metadata.
- [Add Samples To PVC Voice](/docs/api-reference/voices/pvc/samples/create) - Adds audio samples to a PVC voice.
- [Update PVC Voice Sample](/docs/api-reference/voices/pvc/samples/update) - Updates a PVC voice sample (noise removal, speaker selection, trimming).
- [Delete PVC Voice Sample](/docs/api-reference/voices/pvc/samples/delete) - Deletes a sample from a PVC voice.
- [Retrieve Voice Sample Audio](/docs/api-reference/voices/pvc/samples/get-audio) - Retrieves audio for a PVC voice sample.
- [Retrieve Voice Sample Visual Waveform](/docs/api-reference/voices/pvc/samples/get-waveform) - Retrieves the visual waveform for a PVC voice sample.
- [Retrieve Speaker Separation Status](/docs/api-reference/voices/pvc/samples/get-speaker-separation-status) - Gets the status of speaker separation for a sample.
- [Start Speaker Separation](/docs/api-reference/voices/pvc/samples/separate-speakers) - Initiates speaker separation for a sample.
- [Retrieve Separated Speaker Audio](/docs/api-reference/voices/pvc/samples/get-separated-speaker-audio) - Retrieves audio for a specific separated speaker.
- [Get PVC Voice Captcha](/docs/api-reference/voices/pvc/verification/captcha) - Gets the captcha for PVC voice verification.
- [Verify PVC Voice Captcha](/docs/api-reference/voices/pvc/verification/captcha/verify) - Submits captcha verification for a PVC voice.
- [Run PVC Training](/docs/api-reference/voices/pvc/train) - Starts the training process for a PVC voice.
- [Request Manual Verification](/docs/api-reference/voices/pvc/verification/request) - Requests manual verification for a PVC voice.
## Updated Endpoints
### Speech to Text
- Updated endpoint with changes:
- [Create Forced Alignment Task](/docs/api-reference/forced-alignment/create#request.body.enabled_spooled_file) - Added `enabled_spooled_file` parameter to allow streaming large files (`POST /v1/forced-alignment`).
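A minimal sketch of the streamed-upload option; apart from `enabled_spooled_file`, the multipart field names are assumptions based on the forced alignment reference:

```python
# Sketch: forced alignment of a long recording with streamed upload enabled.
import os
import requests

response = requests.post(
    "https://api.elevenlabs.io/v1/forced-alignment",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    data={
        "text": open("transcript.txt").read(),
        # Stream the upload so large files do not have to be held in memory.
        "enabled_spooled_file": "true",
    },
    files={"file": open("audiobook_chapter.mp3", "rb")},
)
response.raise_for_status()
print(response.json())
```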
## Schema Changes
### Agents Platform
- `GET conversation details`: Added `has_audio`, `has_user_audio`, `has_response_audio` boolean fields [here](/docs/api-reference/conversations/get-conversation#response.body.has_audio)
### Dubbing
- `GET dubbing resource`: Added `status` field to each render [here](/docs/api-reference/dubbing/get-dubbing-resource#response.body.renders.status)
# April 14, 2025
### Voices
- **New PVC flow**: Added new flow for Professional Voice Clone creation, try it out [here](https://elevenlabs.io/app/voice-lab?action=create&creationType=professionalVoiceClone)
### Agents Platform
- **Agent-agent transfer:** Added support for agent-to-agent transfers via a new system tool, enabling more complex conversational flows. See the [Agent Transfer tool documentation](/docs/agents-platform/customization/tools/system-tools/agent-transfer) for details.
- **Enhanced tool debugging:** Improved how tool execution details are displayed in the conversation history for easier debugging.
- **Language detection fix:** Resolved an issue regarding the forced calling of the language detection tool.
### Dubbing
- **Render endpoint:** Introduced a new endpoint to regenerate audio or video renders for specific languages within a dubbing project. This automatically handles missing transcriptions or translations. See the [Render Dub endpoint](/docs/api-reference/dubbing/render-dub).
- **Increased size limit:** Raised the maximum allowed file size for dubbing projects to 1 GiB.
### API
## New Endpoints
- [Added render dub endpoint](/docs/api-reference/dubbing/render-dub) - Regenerate dubs for a specific language.
## Updated Endpoints
### Pronunciation Dictionaries
- Updated the response for the [`GET /v1/pronunciation-dictionaries/{pronunciation_dictionary_id}/`](/docs/api-reference/pronunciation-dictionary/get#response.body.permission_on_resource) endpoint and related components to include the `permission_on_resource` field.
### Speech to Text
- Updated [Speech to Text endpoint](/docs/api-reference/speech-to-text/convert) (`POST /v1/speech-to-text`):
- Added `cloud_storage_url` parameter to allow transcription directly from public S3 or GCS URLs (up to 2GB).
- Made the `file` parameter optional; exactly one of `file` or `cloud_storage_url` must now be provided.
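A minimal sketch of transcription from cloud storage; the bucket URL is illustrative:

```python
# Sketch: transcribe directly from a public cloud storage URL instead of uploading
# a file (exactly one of file or cloud_storage_url must be provided).
import os
import requests

response = requests.post(
    "https://api.elevenlabs.io/v1/speech-to-text",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    data={
        "model_id": "scribe_v1",
        "cloud_storage_url": "https://storage.googleapis.com/my-bucket/interview.mp3",
    },
)
response.raise_for_status()
print(response.json()["text"])
```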
### Speech to Speech
- Added optional `file_format` parameter (`pcm_s16le_16` or `other`) for lower latency with PCM input to [`POST /v1/speech-to-speech/{voice_id}`](/docs/api-reference/speech-to-speech/convert)
### Agents Platform
- Updated components to support [agent-agent transfer](/docs/agents-platform/customization/tools/system-tools/agent-transfer) tool
### Voices
- Updated [`GET /v1/voices/{voice_id}`](/docs/api-reference/voices/get#response.body.samples.trim_start) `samples` field to include optional `trim_start` and `trim_end` parameters.
### AudioNative
- Updated [`Get /v1/audio-native/{project_id}/settings`](/docs/api-reference/audio-native/get-settings#response.body.settings.status) to include `status` field (`processing` or `ready`).
# April 7, 2025
### Speech to Text
- **`scribe_v1_experimental`**: Launched a new experimental preview of the [Scribe v1 model](/docs/capabilities/speech-to-text) with improved performance on audio files containing multiple languages, reduced hallucinations when audio is interleaved with silence, and improved audio tags. The new model is available via the API under the model name [`scribe_v1_experimental`](/docs/api-reference/speech-to-text/convert#request.body.model_id)
### Text to speech
- **A-law format support**: Added [a-law format](/docs/api-reference/text-to-speech/convert#request.query.output_format) with 8kHz sample rate to enable integration with European telephony systems.
- **Fixed quota issues**: Fixed a database bug that caused some requests to be mistakenly rejected as exceeding their quota.
### Agents Platform
- **Document type filtering**: Added support for filtering knowledge base documents by their [type](/docs/api-reference/knowledge-base/get-knowledge-base-list#request.query.types) (file, URL, or text).
- **Non-audio agents**: Added support for conversational agents that don't output audio but still send response transcripts and can use tools. Non-audio agents can be enabled by removing the audio [client event](/docs/agents-platform/customization/events/client-events).
- **Improved agent templates**: Updated all agent templates with enhanced configurations and prompts. See more about how to improve system prompts [here](/docs/agents-platform/best-practices/prompting-guide).
- **Fixed stuck exports**: Fixed an issue that caused exports to be stuck for extended periods.
### Studio
- **Fixed volume normalization**: Fixed issue with streaming project snapshots when volume normalization is enabled.
### New API endpoints
- **Forced alignment**: Added new [forced alignment](/docs/api-reference/forced-alignment) endpoint for aligning audio with text, perfect for subtitle generation.
- **Batch calling**: Added batch calling [endpoint](/docs/agents-platform/api-reference/batch-calling/create) for scheduling calls to multiple recipients
### API
## New Endpoints
- Added [Forced alignment](/docs/api-reference/forced-alignment) endpoint for aligning audio with text
- Added dedicated endpoints for knowledge base document types:
- [Create text document](/docs/api-reference/knowledge-base/create-from-text)
- [Create file document](/docs/api-reference/knowledge-base/create-from-file)
- [Create URL document](/docs/api-reference/knowledge-base/create-from-url)
## Updated Endpoints
### Text to Speech
- Added a-law format (8kHz) to all audio endpoints:
- [Text to speech](/docs/api-reference/text-to-speech/convert)
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream)
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps)
- [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps)
- [Speech to speech](/docs/api-reference/speech-to-speech)
- [Stream speech to speech](/docs/api-reference/speech-to-speech/stream)
- [Create voice previews](/docs/api-reference/legacy/voices/create-previews)
- [Sound generation](/docs/api-reference/sound-generation)
### Voices
- [Get voices](/docs/api-reference/voices/search) - Added `collection_id` parameter for filtering voices by collection
### Knowledge Base
- [Get knowledge base](/docs/api-reference/knowledge-base/get-knowledge-base-list) - Added `types` parameter for filtering documents by type
- General endpoint for creating knowledge base documents marked as deprecated in favor of specialized endpoints
### User Subscription
- [Get user subscription](/docs/api-reference/user/subscription/get) - Added `professional_voice_slots_used` property to track number of professional voices used in a workspace
### Agents Platform
- Added `silence_end_call_timeout` parameter to set maximum wait time before terminating a call
- Removed `/v1/convai/agents/{agent_id}/add-secret` endpoint (now handled by workspace secrets endpoints)
# March 31, 2025
### Text to speech
- **Opus format support**: Added support for Opus format with 48kHz sample rate across multiple bitrates (32-192 kbps).
- **Improved websocket error handling**: Updated TTS websocket API to return more accurate error codes (1011 for internal errors instead of 1008) for better error identification and SLA monitoring.
### Agents Platform
- **Twilio outbound**: Added ability to natively run outbound calls.
- **Post-call webhook override**: Added ability to override post-call webhook settings at the agent level, providing more flexible configurations.
- **Large knowledge base document viewing**: Enhanced the knowledge base interface to allow viewing the entire content of large RAG documents.
- **Added call SID dynamic variable**: Added `system__call_sid` as a system dynamic variable to allow referencing the call ID in prompts and tools.
### Studio
- **Actor Mode**: Added Actor Mode in Studio, allowing you to use your own voice recordings to direct the way speech should sound in Studio projects.
- **Improved keyboard shortcuts**: Updated keyboard shortcuts for viewing settings and editor shortcuts to avoid conflicts and simplified shortcuts for locking paragraphs.
### Dubbing
- **Dubbing duplication**: Made dubbing duplication feature available to all users.
- **Manual mode foreground generation**: Added ability to generate foreground audio when using manual mode with a file and CSV.
### Voices
- **Enhanced voice collections**: Improved voice collections with visual upgrades, language-based filtering, navigation breadcrumbs, collection images, and mouse dragging for carousel navigation.
- **Locale filtering**: Added locale parameter to shared voices endpoint for more precise voice filtering.
### API
## Updated Endpoints
### Text to Speech
- Updated Text to Speech endpoints:
- [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Added `apply_language_text_normalization` parameter for improved text pronunciation in supported languages (currently Japanese)
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Added `apply_language_text_normalization`
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Added `apply_language_text_normalization`
- [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Added `apply_language_text_normalization`
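A minimal sketch of enabling the new flag; it is assumed to be a request-body field per the linked reference, and additional fields may be required depending on the model:

```python
# Sketch: enable language-specific text normalization (currently Japanese).
import os
import requests

voice_id = "your_voice_id"
response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "明日の天気は晴れです。",
        "model_id": "eleven_multilingual_v2",
        "apply_language_text_normalization": True,
    },
)
response.raise_for_status()
with open("ja_output.mp3", "wb") as f:
    f.write(response.content)
```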
### Audio Format
- Added Opus format support to multiple endpoints:
- [Text to speech](/docs/api-reference/text-to-speech/convert) - Added support for Opus format with 48kHz sample rate at multiple bitrates (32, 64, 96, 128, 192 kbps)
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Added Opus format options
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Added Opus format options
- [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Added Opus format options
- [Speech to speech](/docs/api-reference/speech-to-speech) - Added Opus format options
- [Stream speech to speech](/docs/api-reference/speech-to-speech/stream) - Added Opus format options
- [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Added Opus format options
- [Sound generation](/docs/api-reference/sound-generation) - Added Opus format options
### Agents Platform
- Updated Agents Platform endpoints:
- [Delete agent](/docs/api-reference/agents/delete) - Changed success response code from 200 to 204
- [Updated RAG embedding model options](/docs/api-reference/knowledge-base/compute-rag-index#request.body.model) - replaced `gte_Qwen2_15B_instruct` with `multilingual_e5_large_instruct`
### Voices
- Updated Voice endpoints:
- [Get shared voices](/docs/api-reference/voice-library/get-shared) - Added locale parameter for filtering voices by language region
### Dubbing
- Updated Dubbing endpoint:
- [Dub a video or audio file](/docs/api-reference/dubbing/create) - Renamed beta feature `use_replacement_voices_from_library` parameter to `disable_voice_cloning` for clarity
# March 24, 2025
### Voices
- **List Voices V2**: Added a new [V2 voice search endpoint](/docs/api-reference/voices/search) with better search and additional filtering options
### Agents Platform
- **Native outbound calling**: Added native outbound calling for Twilio-configured numbers, eliminating the need for complex setup configurations. Outbound calls are now visible in the Call History page.
- **Automatic language detection**: Added new system tool for automatic language detection that enables agents to switch languages based on both explicit user requests ("Let's talk in Spanish") and implicit language in user audio.
- **Pronunciation dictionary improvements**: Fixed phoneme tags in pronunciation dictionaries to work correctly with Agents Platform.
- **Large RAG document viewing**: Added ability to view the entire content of large RAG documents in the knowledge base.
- **Customizable widget controls**: Updated UI to include an optional mute microphone button and made widget icons customizable via slots.
### Sound Effects
- **Fractional duration support**: Fixed an issue where users couldn't enter fractional values (like 0.5 seconds) for sound effect generation duration.
### Speech to Text
- **Repetition handling**: Improved detection and handling of repetitions in speech-to-text processing.
### Studio
- **Reader publishing fixes**: Added support for mp3_44100_192 output format (high quality) so users below Publisher tier can export audio to Reader.
### Mobile
- **Core app signup**: Added signup endpoints for the new Core mobile app.
### API
## New Endpoints
- Added 5 new endpoints:
- [List voices (v2)](/docs/api-reference/voices/search) - Enhanced voice search capabilities with additional filtering options
- [Initiate outbound call](/docs/api-reference/conversations/outbound-call) - New endpoint for making outbound calls via Twilio integration
- [Add pronunciation dictionary from rules](/docs/api-reference/pronunciation-dictionary/add-rules) - Create pronunciation dictionaries directly from rules without file upload
- [Get knowledge base document content](/docs/api-reference/knowledge-base/get-knowledge-base-document-content) - Retrieve full document content from the knowledge base
- [Get knowledge base document chunk](/docs/api-reference/knowledge-base/get-knowledge-base-document-part-by-id) - Retrieve specific chunks from knowledge base documents
## Updated Endpoints
### Agents Platform
- Updated Agents Platform endpoints:
- [Create agent](/docs/api-reference/agents/create) - Added `mic_muting_enabled` property for UI control and `workspace_overrides` property for workspace-specific configurations
- [Update agent](/docs/api-reference/agents/update) - Added `workspace_overrides` property for customizing agent behavior per workspace
- [Get agent](/docs/api-reference/agents/get) - Added `workspace_overrides` property to the response
- [Get widget](/docs/api-reference/widget/get-agent-widget) - Added `mic_muting_enabled` property for controlling microphone muting in the widget UI
- [Get conversation](/docs/api-reference/conversations/get-conversation) - Added rag information to view knowledge base content used during conversations
- [Create phone number](/docs/api-reference/phone-numbers) - Replaced the generic structure with specific Twilio phone number and SIP trunk options
- [Compute RAG index](/docs/agents-platform/api-reference/knowledge-base/compute-rag-index) - Removed `force_reindex` query parameter for more controlled indexing
- [List knowledge base documents](/docs/api-reference/knowledge-base/get-knowledge-base-list) - Changed response structure to support different document types
- [Get knowledge base document](/docs/api-reference/knowledge-base/get) - Modified to return different response models based on document type
### Text to Speech
- Updated Text to Speech endpoints:
- [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Made properties optional, including `stability` and `similarity` settings
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Made voice settings properties optional for more flexible streaming requests
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Made settings optional and modified `pronunciation_dictionary_locators` property
- [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Made voice settings properties optional for more flexible requests
### Speech to Text
- Updated Speech to Text endpoint:
- [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Removed `biased_keywords` property from form data and improved internal repetition detection algorithm
### Voice Management
- Updated Voice endpoints:
- [Get voices](/docs/api-reference/voices/search) - Updated voice settings properties in the response
- [Get default voice settings](/docs/api-reference/voices/settings/get-default) - Made `stability` and `similarity` properties optional
- [Get voice settings](/docs/api-reference/voices/settings/get) - Made numeric properties optional for more flexible configuration
- [Edit voice settings](/docs/api-reference/voices/settings/update) - Made `stability` and `similarity` settings optional
- [Create voice](/docs/api-reference/voices/ivc/create) - Modified array properties to accept null values
- [Create voice from preview](/docs/api-reference/text-to-voice/create) - Updated voice settings model with optional properties
### Studio
- Updated Studio endpoints:
- [Get project](/docs/api-reference/studio/get-project) - Added `version_rules_num` to project metadata
- [Get project snapshot](/docs/api-reference/studio/get-project-snapshot) - Removed `status` property
- [Create pronunciation dictionaries](/docs/api-reference/studio/create-pronunciation-dictionaries) - Modified `pronunciation_dictionary_locators` property and string properties to accept null values
### Pronunciation Dictionary
- Updated Pronunciation Dictionary endpoints:
- [Get all pronunciation dictionaries](/docs/api-reference/pronunciation-dictionary/get-all) - Added `sort` and `sort_direction` query parameters, plus `latest_version_rules_num` and `integer` properties to response
- [Get pronunciation dictionary](/docs/api-reference/pronunciation-dictionary/get) - Added `latest_version_rules_num` and `integer` properties to response
- [Add from file](/docs/api-reference/pronunciation-dictionary/add-from-file) - Added `version_rules_num` property to response for tracking rules quantity
- [Add rules](/docs/api-reference/pronunciation-dictionary/add-rules) - Added `version_rules_num` to response for rules tracking
- [Remove rules](/docs/api-reference/pronunciation-dictionary/remove-rules) - Added `version_rules_num` to response for rules tracking
# March 17, 2025
### Agents Platform
- **Default LLM update**: Changed the default agent LLM from Gemini 1.5 Flash to Gemini 2.0 Flash for improved performance.
- **Fixed incorrect conversation abandons**: Improved detection of conversation continuations, preventing premature abandons when users repeat themselves.
- **Twilio information in history**: Added Twilio call details to conversation history for better tracking.
- **Knowledge base redesign**: Redesigned the knowledge base interface.
- **System dynamic variables**: Added system dynamic variables to use the time, conversation ID, caller ID, and other system values as dynamic variables in prompts and tools.
- **Twilio client initialisation**: Added an agent-level override for the conversation initiation client data Twilio webhook.
- **RAG chunks in history**: Added retrieved chunks by RAG to the call transcripts in the [history view](https://elevenlabs.io/app/agents/history).
### Speech to Text
- **Reduced pricing**: Reduced the pricing of our Scribe model, see more [here](/docs/capabilities/speech-to-text#pricing).
- **Improved VAD detection**: Enhanced Voice Activity Detection with better pause detection at segment boundaries and improved handling of silent segments.
- **Enhanced diarization**: Improved speaker clustering with a better ECAPA model, symmetric connectivity matrix, and more selective speaker embedding generation.
- **Fixed ASR bugs**: Resolved issues with VAD rounding, silence and clustering that affected transcription accuracy.
### Studio
- **Disable publishing UI**: Added ability to disable the publishing interface for specific workspace members to support enterprise workflows.
- **Snapshot API improvement**: Modified endpoints for project and chapter snapshots to return an empty list instead of throwing errors when snapshots can't be downloaded.
- **Disabled auto-moderation**: Turned off automatic moderation based on Text to Speech generations in Studio.
### Workspaces
- **Fixed API key editing**: Resolved an issue where editing workspace API keys would reset character limits to zero, causing the keys to stop working.
- **Optimized free subscriptions**: Fixed an issue with refreshing free subscription character limits.
### API
## New Endpoints
- Added 3 new endpoints:
- [Get workspace resource](/docs/api-reference/workspace/get-resource)
- [Share workspace resource](/docs/api-reference/workspace/share-workspace-resource)
- [Unshare workspace resource](/docs/api-reference/workspace/unshare-workspace-resource)
## Updated Endpoints
### Dubbing
- Updated Dubbing endpoints:
- [Dub a video or audio file](/docs/api-reference/dubbing/create) - Added `use_replacement_voices_from_library` property and made `source_path`, `target_language`, `source_language` nullable
- [Resource dubbing](/docs/api-reference/dubbing/dub-segments) - Made `language_codes` array nullable
- [Add language to dubbing resource](/docs/api-reference/dubbing/add-language-to-resource) - Made `language_code` nullable
- [Translate dubbing resource](/docs/api-reference/dubbing/translate-segments) - Made `target_languages` array nullable
- [Update dubbing segment](/docs/api-reference/dubbing/update-segment-language) - Made `start_time` and `end_time` nullable
### Project Management
- Updated Project endpoints:
- [Add project](/docs/api-reference/studio/add-project) - Made `metadata`, `project_name`, `description` nullable
- [Create podcast](/docs/api-reference/studio/create-podcast) - Made `title`, `description`, `author` nullable
- [Get project](/docs/api-reference/studio/get-project) - Made `last_modified_at`, `created_at`, `project_name` nullable
- [Add chapter](/docs/api-reference/studio/add-chapter) - Made `chapter_id`, `word_count`, `statistics` nullable
- [Update chapter](/docs/api-reference/studio/update-chapter) - Made `content` and `blocks` properties nullable
### Agents Platform
- Updated Agents Platform endpoints:
- [Update agent](/docs/api-reference/agents/update) - Made `conversation_config`, `platform_settings` nullable and added `workspace_overrides` property
- [Create agent](/docs/api-reference/agents/create) - Made `agent_name`, `prompt`, `widget_config` nullable and added `workspace_overrides` property
- [Add to knowledge base](/docs/api-reference/knowledge-base/create-from-url) - Made `document_name` nullable
- [Get conversation](/docs/api-reference/conversations/get-conversation) - Added `twilio_call_data` model and made `transcript`, `metadata` nullable
### Text to Speech
- Updated Text to Speech endpoints:
- [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Made `voice_settings`, `text_input` nullable and deprecated `use_pvc_as_ivc` property
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Made `voice_settings`, `text_input` nullable and deprecated `use_pvc_as_ivc` property
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Made `character_alignment` and `word_alignment` nullable
### Voice Management
- Updated Voice endpoints:
- [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Added `loudness`, `quality`, `guidance_scale` properties
- [Create voice from preview](/docs/api-reference/text-to-voice/create) - Added `speaker_separation` properties and made `voice_id`, `name`, `labels` nullable
- [Get voice](/docs/api-reference/voices/get) - Added `speaker_boost`, `speaker_clarity`, `speaker_isolation` properties
### Speech to Text
- Updated Speech to Text endpoint:
- [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `biased_keywords` property
### Other Updates
- [Download history](/docs/api-reference/history/download) - Added application/zip content type and 400 response
- [Add pronunciation dictionary from file](/docs/api-reference/pronunciation-dictionary/add-from-file) - Made `dictionary_name` and `description` nullable
# March 10, 2025
### Agents Platform
- **HIPAA compliance**: Agents Platform is now [HIPAA compliant](/docs/agents-platform/legal/hipaa) on appropriate plans, when a BAA is signed, zero-retention mode is enabled and appropriate LLMs are used. For access please [contact sales](https://elevenlabs.io/contact-sales)
- **Cascade LLM**: Added dynamic dispatch during the LLM step to other LLMs if your default LLM fails. This results in higher latency but prevents the turn from failing.
- **Better error messages**: Added better error messages for websocket failures.
- **Audio toggling**: Added ability to select only user or agent audio in the conversation playback.
### Scribe
- **HIPAA compliance**: Added a zero retention mode to Scribe to be HIPAA compliant.
- **Diarization**: Increased the maximum length of audio files that can be transcribed with diarization from 8 minutes to 2 hours.
- **Cheaper pricing**: Updated Scribe's pricing to be cheaper, as low as $0.22 per hour for the Business tier.
- **Memory usage**: Shipped improvements to Scribe's memory usage.
- **Fixed timestamps**: Fixed an issue that was causing incorrect timestamps to be returned.
### Text to Speech
- **Pronunciation dictionaries**: Fixed pronunciation dictionary rule application for replacements that contain symbols.
### Dubbing
- **Studio support**: Added support for creating dubs with `dubbing_studio` enabled, allowing for more advanced dubbing workflows beyond one-off dubs.
### Voices
- **Verification**: Fixed an issue where users on probation could not verify their voice clone.
### API
## New Endpoints
- Added 7 new endpoints:
- [Add a shared voice to your collection](/docs/api-reference/voice-library/share)
- [Archive a project snapshot](/docs/api-reference/studio/archive-snapshot)
- [Update a project](/docs/api-reference/studio/edit-project)
- [Create an Audio Native enabled project](/docs/api-reference/audio-native/create)
- [Get all voices](/docs/api-reference/voices/search)
- [Download a pronunciation dictionary](/docs/api-reference/pronunciation-dictionary/download)
- [Get Audio Native project settings](/docs/api-reference/audio-native/get-settings)
## Updated Endpoints
### Studio Projects
- Updated Studio project endpoints to add `source_type` property and deprecate `quality_check_on` and `quality_check_on_when_bulk_convert` properties:
- [Get projects](/docs/api-reference/studio/get-projects)
- [Get project](/docs/api-reference/studio/get-project)
- [Add project](/docs/api-reference/studio/add-project)
- [Update content](/docs/api-reference/studio/update-content)
- [Create podcast](/docs/api-reference/studio/create-podcast)
### Voice Management
- Updated Voice endpoints with several property changes:
- [Get voice](/docs/api-reference/voices/get) - Made several properties optional and added `preview_url`
- [Create voice](/docs/api-reference/voices/ivc/create) - Made several properties optional and added `preview_url`
- [Create voice from preview](/docs/api-reference/text-to-voice/create) - Made several properties optional and added `preview_url`
- [Get similar voices](/docs/api-reference/voices/get-similar-library-voices) - Made `language`, `description`, `preview_url`, and `rate` properties optional
### Agents Platform
- Updated ElevenLabs agent endpoints:
- [Update agent](/docs/api-reference/agents/update) - Modified `conversation_config`, `agent`, `platform_settings`, and `widget` properties
- [Create agent](/docs/api-reference/agents/create) - Modified `conversation_config`, `agent`, `prompt`, `platform_settings`, and `widget` properties and added `shareable_page_show_terms`
- [Get agent](/docs/api-reference/agents/get) - Modified `conversation_config`, `agent`, `platform_settings`, and `widget` properties
- [Get widget](/docs/api-reference/widget/get-agent-widget) - Modified `widget_config` property and added `shareable_page_show_terms`
### Knowledge Base
- Updated Knowledge Base endpoints to add metadata property:
- [List knowledge base documents](/docs/api-reference/knowledge-base/list#response.body.metadata)
- [Get knowledge base document](/docs/api-reference/knowledge-base/get-document#response.body.metadata)
### Other Updates
- [Dub a video or audio file](/docs/api-reference/dubbing/create) - Added `dubbing_studio` property
- [Convert text to sound effects](/docs/api-reference/text-to-sound-effects/convert) - Added `output_format` query parameter
- [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `enable_logging` query parameter
- [Get secrets](/docs/api-reference/workspace/secrets/list) - Modified `secrets` and `used_by` properties
- [Get all pronunciation dictionaries](/docs/api-reference/pronunciation-dictionary/get-all) - Made `next_cursor` property optional
## Removed Endpoints
- Temporarily removed Agents Platform tools endpoints:
- Get tool
- List tools
- Update tool
- Create tool
- Delete tool
# March 3, 2025
### Dubbing
- **Scribe for speech recognition**: Dubbing Studio now uses Scribe by default for speech recognition to improve accuracy.
### Speech to Text
- **Fixes**: Shipped several fixes improving the stability of Speech to Text.
### Agents Platform
- **Speed control**: Added speed control to an agent's settings in Agents Platform.
- **Post call webhook**: Added the option of sending [post-call webhooks](/docs/agents-platform/workflows/post-call-webhooks) after conversations are completed.
- **Improved error messages**: Added better error messages to the Agents Platform websocket.
- **Claude 3.7 Sonnet**: Added Claude 3.7 Sonnet as a new LLM option in Agents Platform.
### API
#### New Endpoints
- Added new Dubbing resource management endpoints:
- for adding [languages to dubs](/docs/api-reference/dubbing/resources/add-language)
- for retrieving [dubbing resources](/docs/api-reference/dubbing/resources/get-resource)
- for creating [segments](/docs/api-reference/dubbing/resources/create-segment)
- for modifying [segments](/docs/api-reference/dubbing/resources/update-segment)
- for removing [segments](/docs/api-reference/dubbing/resources/delete-segment)
- for dubbing [segments](/docs/api-reference/dubbing/resources/dub-segment)
- for transcribing [segments](/docs/api-reference/dubbing/resources/transcribe-segment)
- for translating [segments](/docs/api-reference/dubbing/resources/translate-segment)
- Added Knowledge Base RAG indexing [endpoint](/docs/agents-platform/api-reference/knowledge-base/compute-rag-index)
- Added Studio snapshot retrieval endpoints for [projects](/docs/api-reference/studio/get-project-snapshot) and [chapters](/docs/api-reference/studio/get-chapter-snapshot)
#### Updated Endpoints
- Added `prompt_injectable` property to knowledge base [endpoints](/docs/api-reference/knowledge-base/get-document#response.body.prompt_injectable)
- Added `name` property to Knowledge Base document [creation](/docs/api-reference/knowledge-base/create-from-url#request.body.name) and [retrieval](/docs/api-reference/knowledge-base/get-document#response.body.name) endpoints:
- Added `speed` property to [agent creation](/docs/api-reference/agents/create#request.body.conversation_config.tts.speed)
- Removed `secrets` property from agent endpoints (now handled by dedicated secrets endpoints)
- Added [secret deletion endpoint](/docs/api-reference/workspace/secrets/delete) for removing secrets
- Removed `secrets` property from settings [endpoints](/docs/api-reference/workspace/get)
# February 25, 2025
### Speech to Text
- **ElevenLabs launched a new state of the art [Speech to Text API](/docs/capabilities/speech-to-text) available in 99 languages.**
### Text to Speech
- **Speed control**: Added speed control to the Text to Speech API.
### Studio
- **Auto-assigned projects**: Increased token limits for auto-assigned projects from 1 month to 3 months worth of tokens, addressing user feedback about working on longer projects.
- **Language detection**: Added automatic language detection when generating audio for the first time, with suggestions to switch to Eleven Turbo v2.5 for languages not supported by Multilingual v2 (Hungarian, Norwegian, Vietnamese).
- **Project export**: Enhanced project exporting in ElevenReader with better metadata tracking.
### Dubbing
- **Clip overlap prevention**: Added automatic trimming of overlapping clips in dubbing jobs to ensure clean audio tracks for each speaker and language.
### Voice Management
- **Instant Voice Cloning**: Improved preview generation for Instant Voice Cloning v2, making previews available immediately.
### Agents Platform
- **Agent ownership**: Added display of agent creators in the agent list, improving visibility and management of shared agents.
### Web app
- **Dark mode**: Added dark mode to the web app.
### API
- Launched **/v1/speech-to-text** [endpoint](/docs/api-reference/speech-to-text/convert)
- Added `agents.level` property to [ElevenLabs agents endpoint](/docs/api-reference/agents/get#response.body.agents.access_level)
- Added `platform_settings` to [ElevenLabs agent endpoint](/docs/api-reference/agents/update#request.body.platform_settings)
- Added `expandable` variant to `widget_config`, with configuration options `show_avatar_when_collapsed` and `disable_banner` to [ElevenLabs agent widget endpoint](/docs/api-reference/agents/get#response.body.widget)
- Added `webhooks` property and `used_by` to `secrets` to [secrets endpoint](/docs/api-reference/workspace/secrets/list#response.body.secrets.used_by)
- Added `verified_languages` to [voices endpoint](/docs/api-reference/voices/get#response.body.verified_languages)
- Added `speed` property to [voice settings endpoints](/docs/api-reference/voices/get#response.body.settings.speed)
- Added `verified_languages`, `is_added_by_user` to `voices` and `min_notice_period_days` query parameter to [shared voices endpoint](/docs/api-reference/voice-library/get-shared#request.query)
- Added `verified_languages`, `is_added_by_user` to `voices` in [similar voices endpoint](/docs/api-reference/voices/get-similar-library-voices)
- Added `search`, `show_only_owned_documents`, `use_typesense` query parameters to [knowledge base endpoint](/docs/api-reference/knowledge-base/get-knowledge-base-list#request.query.search)
- Added `used_by` to Conversation AI [secrets endpoint](/docs/api-reference/workspace/secrets/list)
- Added `invalidate_affected_text` property to Studio [pronunciation dictionaries endpoint](/docs/api-reference/studio/create-pronunciation-dictionaries#request.body.invalidate_affected_text)
# February 17, 2025
### Agents Platform
- **Tool calling fix**: Fixed an issue where tool calling was not working with agents using gpt-4o mini. This was due to a breaking change in the OpenAI API.
- **Tool calling improvements**: Added support for tool calling with dynamic variables inside objects and arrays.
- **Dynamic variables**: Fixed an issue where dynamic variables of a conversation were not being displayed correctly.
### Voice Isolator
- **Fixed**: Fixed an issue that caused the voice isolator to not work correctly temporarily.
### Workspace
- **Billing**: Improved billing visibility by differentiating rollover, cycle, gifted, and usage-based credits.
- **Usage Analytics**: Improved usage analytics load times and readability.
- **Fine grained fiat billing**: Added support for customizable pricing based on several factors.
### API
- Added `phone_numbers` property to [Agent responses](/docs/api-reference/agents/get)
- Added usage metrics to `subscription_extras` in [User endpoint](/docs/api-reference/user/get):
- `unused_characters_rolled_over_from_previous_period`
- `overused_characters_rolled_over_from_previous_period`
- `usage` statistics
- Added `enable_conversation_initiation_client_data_from_webhook` to [Agent creation](/docs/api-reference/agents/create)
- Updated [Agent](/docs/api-reference/agents) endpoints with consolidated settings for:
- `platform_settings`
- `overrides`
- `safety`
- Deprecated `with_settings` parameter in [Voice retrieval endpoint](/docs/api-reference/voices/get)
# February 10, 2025
## Agents Platform
- **Updated Pricing**: Updated self-serve pricing for Agents Platform with [reduced cost and a more generous free tier](/docs/agents-platform/overview#pricing-tiers).
- **Knowledge Base UI**: Created a new page to easily manage your [knowledge base](https://elevenlabs.io/app/agents/knowledge-base).
- **Live calls**: Added number of live calls in progress in the user [dashboard](https://elevenlabs.io/app/agents) and as a new endpoint.
- **Retention**: Added ability to customize transcripts and audio recordings [retention settings](/docs/agents-platform/customization/privacy/retention).
- **Audio recording**: Added a new option to [disable audio recordings](/docs/agents-platform/customization/privacy/audio-saving).
- **8k PCM support**: Added support for 8k PCM audio for both input and output.
## Studio
- **GenFM**: Updated the create podcast endpoint to accept [multiple input sources](/docs/api-reference/studio/create-podcast).
- **GenFM**: Fixed an issue where GenFM was creating empty podcasts.
## Enterprise
- **New workspace group endpoints**: Added new endpoints to manage [workspace groups](/docs/api-reference/workspace/search-user-groups).
### API
**Studio (formerly Projects)**
All `/v1/projects/*` endpoints have been deprecated in favor of the new `/v1/studio/projects/*` endpoints. The following endpoints are now deprecated:
- All operations on `/v1/projects/`
- All operations related to chapters, snapshots, and content under `/v1/projects/*`
**Agents Platform**
- `POST /v1/convai/add-tool` - Use `POST /v1/convai/tools` instead
- `DELETE /v1/convai/agents/{agent_id}` - Response type is no longer an object
- `GET /v1/convai/tools` - Response type changed from array to object with a `tools` property
**Agents Platform Updates**
- `GET /v1/convai/agents/{agent_id}` - Updated conversation configuration and agent properties
- `PATCH /v1/convai/agents/{agent_id}` - Added `use_tool_ids` parameter for tool management
- `POST /v1/convai/agents/create` - Added tool integration via `use_tool_ids`
**Knowledge Base & Tools**
- `GET /v1/convai/agents/{agent_id}/knowledge-base/{documentation_id}` - Added `name` and `access_level` properties
- `GET /v1/convai/knowledge-base/{documentation_id}` - Added `name` and `access_level` properties
- `GET /v1/convai/tools/{tool_id}` - Added `dependent_agents` property
- `PATCH /v1/convai/tools/{tool_id}` - Added `dependent_agents` property
**GenFM**
- `POST /v1/projects/podcast/create` - Added support for multiple input sources
**Studio (formerly Projects)**
New endpoints replacing the deprecated `/v1/projects/*` endpoints
- `GET /v1/studio/projects`: List all projects
- `POST /v1/studio/projects`: Create a project
- `GET /v1/studio/projects/{project_id}`: Get project details
- `DELETE /v1/studio/projects/{project_id}`: Delete a project
**Knowledge Base Management**
- `GET /v1/convai/knowledge-base`: List all knowledge base documents
- `DELETE /v1/convai/knowledge-base/{documentation_id}`: Delete a knowledge base
- `GET /v1/convai/knowledge-base/{documentation_id}/dependent-agents`: List agents using this knowledge base
**Workspace Groups** - New enterprise features for team management
- `GET /v1/workspace/groups/search`: Search workspace groups
- `POST /v1/workspace/groups/{group_id}/members`: Add members to a group
- `POST /v1/workspace/groups/{group_id}/members/remove`: Remove members from a group
**Tools**
- `POST /v1/convai/tools`: Create new tools for agents
## Socials
- **ElevenLabs Developers**: Follow our new developers account on X [@ElevenLabsDevs](https://x.com/intent/user?screen_name=elevenlabsdevs)
# February 4, 2025
### Agents Platform
- **Agent monitoring**: Added a new dashboard for monitoring ElevenLabs agents' activity. Check out yours [here](https://elevenlabs.io/app/agents).
- **Proactive conversations**: Enhanced capabilities with improved timeout retry logic. [Learn more](/docs/agents-platform/customization/conversation-flow)
- **Tool calls**: Fixed timeout issues occurring during tool calls
- **Allowlist**: Fixed implementation of allowlist functionality.
- **Content summarization**: Added Gemini as a fallback model to ensure service reliability
- **Widget stability**: Fixed issue with dynamic variables causing the Agents Platform widget to fail
### Reader
- **Trending content**: Added carousel showcasing popular articles and trending content
- **New publications**: Introduced dedicated section for recent ElevenReader Publishing releases
### Studio (formerly Projects)
- **Projects is now Studio** and is generally available to everyone
- **Chapter content editing**: Added support for editing chapter content through the public API, enabling programmatic updates to chapter text and metadata
- **GenFM public API**: Added public API support for podcast creation through GenFM. Key features include:
- Conversation mode with configurable host and guest voices
- URL-based content sourcing
- Customizable duration and highlights
- Webhook callbacks for status updates
- Project snapshot IDs for audio downloads
### SDKs
- **Swift**: fixed an issue where resources were not being released after the end of a session
- **Python**: added uv support
- **Python**: fixed an issue where calls were not ending correctly
### API
- Added POST `v1/workspace/invites/add-bulk` [endpoint](/docs/api-reference/workspace/invite-multiple-users) to enable inviting multiple users simultaneously
- Added POST `v1/projects/podcast/create` [endpoint](/docs/api-reference/studio/create-podcast) for programmatic podcast generation through GenFM
- Added `v1/convai/knowledge-base/:documentation_id` [endpoints](/docs/api-reference/knowledge-base/) with CRUD operations for Agents Platform
- Added PATCH `v1/projects/:project_id/chapters/:chapter_id` [endpoint](/docs/api-reference/studio/update-chapter) for updating project chapter content and metadata
- Added `group_ids` parameter to [Workspace Invite endpoint](/docs/api-reference/workspace/invite-user) for group-based access control
- Added structured `content` property to [Chapter response objects](/docs/api-reference/studio/get-chapter)
- Added `retention_days` and `delete_transcript_and_pii` data retention parameters to [Agent creation](/docs/api-reference/agents/create)
- Added structured response to [AudioNative content](/docs/api-reference/audio-native/create#response.body.project_id)
- Added `convai_chars_per_minute` usage metric to [User endpoint](/docs/api-reference/user/get)
- Added `media_metadata` field to [Dubbing response objects](/docs/api-reference/dubbing/get)
- Added GDPR-compliant `deletion_settings` to [Conversation responses](/docs/api-reference/conversations/get-conversation#response.body.metadata.deletion_settings)
- Deprecated Knowledge Base legacy endpoints:
- POST `/v1/convai/agents/{agent_id}/add-to-knowledge-base`
- GET `/v1/convai/agents/{agent_id}/knowledge-base/{documentation_id}`
- Updated Agent endpoints with consolidated [privacy control parameters](/docs/api-reference/agents/create)
# January 27, 2025
### Docs
- **Shipped our new docs**: we're keen to hear your thoughts, you can reach out by opening an issue on [GitHub](https://github.com/elevenlabs/elevenlabs-docs) or chatting with us on [Discord](https://discord.gg/elevenlabs)
### Agents Platform
- **Dynamic variables**: Available in the dashboard and SDKs. [Learn more](/docs/agents-platform/customization/personalization/dynamic-variables)
- **Interruption handling**: Now possible to ignore user interruptions in Agents Platform. [Learn more](/docs/agents-platform/customization/conversation-flow#interruptions)
- **Twilio integration**: Shipped changes to increase audio quality when integrating with Twilio
- **Latency optimization**: Published detailed blog post on latency optimizations. [Read more](https://elevenlabs.io/blog/how-do-you-optimize-latency-for-conversational-ai)
- **PCM 8000**: Added support for PCM 8000 to ElevenLabs agents
- **Websocket improvements**: Fixed unexpected websocket closures
### Projects
- **Auto-regenerate**: Auto-regeneration now available by default at no extra cost
- **Content management**: Added `updateContent` method for dynamic content updates
- **Audio conversion**: New auto-convert and auto-publish flags for seamless workflows
### API
- Added `Update Project` endpoint for [project editing](/docs/api-reference/studio/edit-project#:~:text=List%20projects-,POST,Update%20project,-GET)
- Added `Update Content` endpoint for [AudioNative content management](/docs/api-reference/audio-native/update-content)
- Deprecated `quality_check_on` parameter in [project operations](/docs/api-reference/studio/add-project#request.body.quality_check_on). It is now enabled for all users at no extra cost
- Added `apply_text_normalization` parameter to project creation with modes 'auto', 'on', 'apply_english' and 'off' for controlling text normalization during [project creation](/docs/api-reference/studio/add-project#request.body.apply_text_normalization)
- Added alpha feature `auto_assign_voices` in [project creation](/docs/api-reference/studio/add-project#request.body.auto_assign_voices) to automatically assign voices to phrases
- Added `auto_convert` flag to project creation to automatically convert [projects to audio](/docs/api-reference/audio-native/create#request.body.auto_convert)
- Added support for creating ElevenLabs agents with [dynamic variables](/docs/api-reference/agents/create#request.body.conversation_config.agent.dynamic_variables)
- Added `voice_slots_used` to `Subscription` model to track number of custom voices used in a workspace to the `User` [endpoint](/docs/api-reference/user/subscription/get#response.body.voice_slots_used)
- Added `user_id` field to `User` [endpoint](/docs/api-reference/user/get#response.body.user_id)
- Marked legacy AudioNative creation parameters (`image`, `small`, `sessionization`) as deprecated [parameters](/docs/api-reference/audio-native/create#request.body.image)
- Agents Platform now supports `call_limits` containing `agent_concurrency_limit`, `daily_limit`, or both, to control simultaneous and daily conversation limits for [agents](/docs/api-reference/agents/create#request.body.platform_settings.call_limits)
- Added support for `language_presets` in `conversation_config` to customize language-specific [settings](/docs/api-reference/agents/create#request.body.conversation_config.language_presets)
### SDKs
- **Cross-Runtime Support**: Now compatible with **Bun 1.1.45+** and **Deno 2.1.7+**
- **Regenerated SDKs**: We regenerated our SDKs to be up to date with the latest API spec. Check out the latest [Python SDK release](https://github.com/elevenlabs/elevenlabs-python/releases/tag/1.50.5) and [JS SDK release](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v1.50.4)
- **Dynamic Variables**: Fixed an issue where dynamic variables were not being handled correctly, they are now correctly handled in all SDKs
# January 16, 2025
## Product
### Agents Platform
- **Additional languages**: Add a language dropdown to your widget so customers can launch conversations in their preferred language. Learn more [here](/docs/agents-platform/customization/language).
- **End call tool**: Let the agent automatically end the call with our new “End Call” tool. Learn more [here](/docs/agents-platform/customization/tools)
- **Flash default**: Flash, our lowest latency model, is now the default for new agents. In your agent dashboard under “voice”, you can toggle between Turbo and Flash. Learn more about Flash [here](https://elevenlabs.io/blog/meet-flash).
- **Privacy**: Set concurrent call and daily call limits, turn off audio recordings, add feedback collection, and define customer terms & conditions.
- **Increased tool limits**: Increase the number of tools available to your agent from 5 to 15. Learn more [here](/docs/agents-platform/customization/tools).
# January 2, 2025
## Product
- **Workspace Groups and Permissions**: Introduced new workspace group management features to enhance access control within organizations. [Learn more](https://elevenlabs.io/blog/workspace-groups-and-permissions).
# December 19, 2024
## Model
- **Introducing Flash**: Our fastest text-to-speech model yet, generating speech in just 75ms. Access it via the API with model IDs `eleven_flash_v2` and `eleven_flash_v2_5`. Perfect for low-latency Agents Platform applications. [Try it now](https://elevenlabs.io/docs/api-reference/text-to-speech).
## Launches
- **[TalkToSanta.io](https://www.talktosanta.io)**: Experience Agents Platform in action by talking to Santa this holiday season. For every conversation with Santa we donate $2 to [Bridging Voice](https://www.bridgingvoice.org) (up to $11,000).
- **[AI Engineer Pack](https://aiengineerpack.com)**: Get $50+ in credits from leading AI developer tools, including ElevenLabs.
# December 6, 2024
## Product
- **GenFM Now on Web**: Access GenFM directly from the website in addition to the ElevenReader App, [try it now](https://elevenlabs.io/app/projects).
# December 3, 2024
## API
- **Credit Usage Limits**: Set specific credit limits for API keys to control costs and manage usage across different use cases by setting "Access" or "No Access" to features like Dubbing, Audio Native, and more. [Check it out](https://elevenlabs.io/app/settings/api-keys)
- **Workspace API Keys**: Now support access permissions, such as "Read" or "Read and Write" for User, Workspace, and History resources.
- **Improved Key Management**:
- Redesigned interface moving from modals to dedicated pages
- Added detailed descriptions and key information
- Enhanced visibility of key details and settings
# November 29, 2024
## Product
- **GenFM**: Launched in the ElevenReader app. [Learn more](https://elevenlabs.io/blog/genfm-on-elevenreader)
- **Agents Platform**: Now generally available to all customers. [Try it now](https://elevenlabs.io/conversational-ai)
- **TTS Redesign**: The website TTS redesign is now rolled out to all customers.
- **Auto-regenerate**: Now available in Projects. [Learn more](https://elevenlabs.io/blog/auto-regenerate-is-live-in-projects)
- **Reader Platform Improvements**:
- Improved content sharing with enhanced landing pages and social media previews.
- Added podcast rating system and improved voice synchronization.
- **Projects revamp**:
- Restore past generations, lock content, assign speakers to sentence fragments, and QC at 2x speed. [Learn more](https://elevenlabs.io/blog/narrate-any-project)
- Auto-regeneration identifies mispronunciations and regenerates audio at no extra cost. [Learn more](https://elevenlabs.io/blog/auto-regenerate-is-live-in-projects)
## API
- **Agents Platform**: [SDKs and APIs](https://elevenlabs.io/docs/agents-platform/quickstart) now available.
# October 27, 2024
## API
- **u-law Audio Formats**: Added u-law audio formats to the Convai API for integrations with Twilio.
- **TTS Websocket Improvements**: TTS websocket improvements, flushes and generation work more intuitively now.
- **TTS Websocket Auto Mode**: A streamlined mode for using websockets. This setting reduces latency by disabling chunk scheduling and buffers. Note: Using partial sentences will result in significantly reduced quality.
- **Improvements to latency consistency**: Improvements to latency consistency for all models.
## Website
- **TTS Redesign**: The website TTS redesign is now in alpha!
# October 20, 2024
## API
- **Normalize Text with the API**: Added the option to normalize the input text in the TTS API. The new parameter is called `apply_text_normalization` and works on all models. For v2.5 models, this feature is available with Enterprise plans only.
## Product
- **Voice Design**: The Voice Design feature is now in beta!
# October 13, 2024
## Model
- **Stability Improvements**: Significant audio stability improvements across all models, most noticeable on `turbo_v2` and `turbo_v2.5`, when using:
- Websockets
- Projects
- Reader app
- TTS with request stitching
- ConvAI
- **Latency Improvements**: Reduced time to first byte latency by approximately 20-30ms for all models.
## API
- **Remove Background Noise Voice Samples**: Added the ability to remove background noise from voice samples using our audio isolation model to improve quality for IVCs and PVCs at no additional cost.
- **Remove Background Noise STS Input**: Added the ability to remove background noise from STS audio input using our audio isolation model to improve quality at no additional cost.
## Feature
- **Agents Platform Beta**: Agents Platform is now in beta.
# Text to Speech
> Learn how to turn text into lifelike spoken audio with ElevenLabs.
## Overview
The ElevenLabs [Text to Speech (TTS)](/docs/api-reference/text-to-speech) API turns text into lifelike audio with nuanced intonation, pacing and emotional awareness. [Our models](/docs/models) adapt to textual cues across 32 languages and multiple voice styles and can be used to:
* Narrate global media campaigns & ads
* Produce audiobooks in multiple languages with complex emotional delivery
* Stream real-time audio from text
Listen to a sample:
Explore our [voice library](https://elevenlabs.io/community) to find the perfect voice for your project.
Learn how to integrate text to speech into your application.
Step-by-step guide for using text to speech in ElevenLabs.
### Voice quality
For real-time applications, Flash v2.5 provides ultra-low 75ms latency, while Multilingual v2 delivers the highest quality audio with more nuanced expression.
Eleven v3
Our most emotionally rich, expressive speech synthesis model
Dramatic delivery and performance
70+ languages supported
3,000 character limit
Support for natural multi-speaker dialogue
Lifelike, consistent quality speech synthesis model
Natural-sounding output
29 languages supported
10,000 character limit
Most stable on long-form generations
Our fast, affordable speech synthesis model
Ultra-low latency (~75ms†)
32 languages supported
40,000 character limit
Faster model, 50% lower price per character
High quality, low-latency model with a good balance of quality and speed
High quality voice generation
32 languages supported
40,000 character limit
Low latency (~250ms-300ms†), 50% lower price per character
[Explore all](/docs/models)
### Voice options
ElevenLabs offers thousands of voices across 32 languages through multiple creation methods:
* [Voice library](/docs/capabilities/voices) with 3,000+ community-shared voices
* [Professional voice cloning](/docs/capabilities/voices#cloned) for highest-fidelity replicas
* [Instant voice cloning](/docs/capabilities/voices#cloned) for quick voice replication
* [Voice design](/docs/capabilities/voices#voice-design) to generate custom voices from text descriptions
Learn more about our [voice options](/docs/capabilities/voices).
### Supported formats
The default response format is MP3, but other formats such as PCM, μ-law, A-law and Opus are available.
* **MP3**
* Sample rates: 22.05kHz - 44.1kHz
* Bitrates: 32kbps - 192kbps
* 22.05kHz @ 32kbps
* 44.1kHz @ 32kbps, 64kbps, 96kbps, 128kbps, 192kbps
* **PCM (S16LE)**
* Sample rates: 8kHz, 16kHz, 22.05kHz, 24kHz, 44.1kHz, 48kHz
* 16-bit depth
* **μ-law**
* 8kHz sample rate
* Optimized for telephony applications
* **A-law**
* 8kHz sample rate
* Optimized for telephony applications
* **Opus**
* Sample rate: 48kHz
* Bitrates: 32kbps - 192kbps
Higher quality audio options are only available on paid tiers - see our [pricing
page](https://elevenlabs.io/pricing/api) for details.
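As a rough illustration, the `output_format` parameter selects one of these formats at request time. The sketch below assumes the `ELEVENLABS_API_KEY` environment variable is set and uses an illustrative voice ID:

```python
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Request 8kHz μ-law audio, suitable for telephony integrations.
audio = elevenlabs.text_to_speech.convert(
    text="Thanks for calling. How can I help you today?",
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # illustrative voice ID from the voice library
    model_id="eleven_multilingual_v2",
    output_format="ulaw_8000",
)
```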
### Supported languages
Our multilingual v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
Flash v2.5 supports 32 languages - all languages from v2 models plus:
*Hungarian, Norwegian & Vietnamese*
Simply input text in any of our supported languages and select a matching voice from our [voice library](https://elevenlabs.io/community). For the most natural results, choose a voice with an accent that matches your target language and region.
### Prompting
The models interpret emotional context directly from the text input. For example, adding
descriptive text like "she said excitedly" or using exclamation marks will influence the speech
emotion. Voice settings like Stability and Similarity help control the consistency, while the
underlying emotion comes from textual cues.
Read the [prompting guide](/docs/best-practices/prompting) for more details.
Descriptive text will be spoken out by the model and must be manually trimmed or removed from the
audio if desired.
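As a rough sketch of how textual cues and voice settings interact (the voice ID is illustrative and the `ELEVENLABS_API_KEY` environment variable is assumed to be set):

```python
import os

from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# The exclamation mark and the descriptive phrase steer the emotion.
# Note that "she said excitedly" will itself be spoken and may need trimming.
audio = elevenlabs.text_to_speech.convert(
    text='"We finally did it!" she said excitedly.',
    voice_id="JBFqnCBsd6RMkjVDRZzb",  # illustrative voice ID
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(stability=0.5, similarity_boost=0.75),
)
```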
## FAQ
Yes, you can create [instant voice clones](/docs/capabilities/voices#cloned) of your own voice
from short audio clips. For high-fidelity clones, check out our [professional voice
cloning](/docs/capabilities/voices#cloned) feature.
Yes. You retain ownership of any audio you generate. However, commercial usage rights are only
available with paid plans. With a paid subscription, you may use generated audio for commercial
purposes and monetize the outputs if you own the IP rights to the input content.
A free regeneration allows you to regenerate the same text to speech content without additional cost, subject to these conditions:
* You can regenerate each piece of content up to 2 times for free
* The content must be exactly the same as the previous generation. Any changes to the text, voice settings, or other parameters will require a new, paid generation
Free regenerations are useful in case there is a slight distortion in the audio output. According to ElevenLabs' internal benchmarks, regenerations will solve roughly half of issues with quality, with remaining issues usually due to poor training data.
Use the low-latency Flash [models](/docs/models) (Flash v2 or v2.5) optimized for near real-time
conversational or interactive scenarios. See our [latency optimization
guide](/docs/best-practices/latency-optimization) for more details.
The models are nondeterministic. For consistency, use the optional [seed
parameter](/docs/api-reference/text-to-speech/convert#request.body.seed), though subtle
differences may still occur.
Split long text into segments and use streaming for real-time playback and efficient processing.
To maintain natural prosody flow between chunks, include [previous/next text or previous/next
request id parameters](/docs/api-reference/text-to-speech/convert#request.body.previous_text).
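A minimal sketch of this chunking pattern, assuming the Python SDK forwards the `previous_text`, `next_text` and `seed` request parameters (the voice ID is illustrative):

```python
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

segments = [
    "Chapter one. The storm rolled in just after midnight.",
    "By morning, the harbour was unrecognisable.",
]

audio_parts = []
for i, segment in enumerate(segments):
    context = {}
    if i > 0:
        context["previous_text"] = segments[i - 1]
    if i < len(segments) - 1:
        context["next_text"] = segments[i + 1]
    audio = elevenlabs.text_to_speech.convert(
        text=segment,
        voice_id="JBFqnCBsd6RMkjVDRZzb",  # illustrative voice ID
        model_id="eleven_multilingual_v2",
        seed=4242,  # optional: improves (but does not guarantee) consistency
        **context,  # surrounding text helps prosody flow across chunks
    )
    audio_parts.append(b"".join(audio))
```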
# Speech to Text
> Learn how to turn spoken audio into text with ElevenLabs.
## Overview
The ElevenLabs [Speech to Text (STT)](/docs/api-reference/speech-to-text) API turns spoken audio into text with state of the art accuracy. Our Scribe v1 [model](/docs/models) adapts to textual cues across 99 languages and multiple voice styles and can be used to:
* Transcribe podcasts, interviews, and other audio or video content
* Generate transcripts for meetings and other audio or video recordings
Learn how to integrate speech to text into your application.
Learn how to transcribe audio with ElevenLabs in realtime with WebSockets.
Step-by-step guide for using speech to text in ElevenLabs.
Companies requiring HIPAA compliance must contact [ElevenLabs
Sales](https://elevenlabs.io/contact-sales) to sign a Business Associate Agreement (BAA)
agreement. Please ensure this step is completed before proceeding with any HIPAA-related
integrations or deployments.
## State of the art accuracy
The Scribe v1 model is capable of transcribing audio from up to 32 speakers with high accuracy. Optionally it can also transcribe audio events like laughter, applause, and other non-speech sounds.
The transcribed output supports exact timestamps for each word and audio event, plus diarization to identify the speaker for each word.
The Scribe v1 model is best used when high-accuracy transcription is required rather than real-time transcription. For low-latency, real-time use cases, see the Scribe v2 Realtime model.
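A minimal transcription sketch with the Python SDK, assuming a local `interview.mp3` file and an `ELEVENLABS_API_KEY` environment variable:

```python
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("interview.mp3", "rb") as audio_file:
    transcript = elevenlabs.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",
        diarize=True,           # attribute each word to a speaker
        tag_audio_events=True,  # label laughter, applause and other non-speech sounds
    )

print(transcript.text)
```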
## Pricing
| Tier | Price/month | Hours included | Price per included hour | Price per additional hour |
| -------- | ----------- | --------------- | ----------------------- | ------------------------- |
| Free | \$0 | 2 hours 30 min | Unavailable | Unavailable |
| Starter | \$5 | 12 hours 30 min | \$0.4 | Unavailable |
| Creator | \$22 | 62 hours 51 min | \$0.35 | \$0.48 |
| Pro | \$99 | 300 hours | \$0.33 | \$0.4 |
| Scale | \$330 | 1,100 hours | \$0.3 | \$0.33 |
| Business | \$1,320 | 6,000 hours | \$0.22 | \$0.22 |
| Tier | Price/month | Hours included | Price per included hour | Price per additional hour |
| -------- | ----------- | -------------- | ----------------------- | ------------------------- |
| Free | \$0 | Unavailable | Unavailable | Unavailable |
| Starter | \$5 | 10 hours | \$0.48 | Unavailable |
| Creator | \$22 | 48 hours | \$0.46 | \$0.63 |
| Pro | \$99 | 225 hours | \$0.44 | \$0.53 |
| Scale | \$330 | 786 hours | \$0.42 | \$0.46 |
| Business | \$1,320 | 3,385 hours | \$0.39 | \$0.39 |
| Tier | Price/month | Hours included | Price per included hour |
| -------- | ----------- | --------------- | ----------------------- |
| Free | \$0 | 12 minutes | Unavailable |
| Starter | \$5 | 1 hour | \$5 |
| Creator | \$22 | 4 hours 53 min | \$4.5 |
| Pro | \$99 | 24 hours 45 min | \$4 |
| Scale | \$330 | 94 hours 17 min | \$3.5 |
| Business | \$1,320 | 440 hours | \$3 |
For reduced pricing at volumes above 6,000 hours/month, as well as custom MSAs and DPAs,
please [contact sales](https://elevenlabs.io/contact-sales).
**Note: The free tier requires attribution and does not have commercial licensing.**
Scribe has higher concurrency limits than other ElevenLabs services. See the concurrency limits for other services [here](/docs/models#concurrency-and-priority).
| Plan | STT Concurrency Limit |
| ---------- | --------------------- |
| Free | 8 |
| Starter | 12 |
| Creator | 20 |
| Pro | 40 |
| Scale | 60 |
| Business | 60 |
| Enterprise | Elevated |
## Examples
The following example shows the output of the Scribe v1 model for a sample audio file.
```javascript
{
"language_code": "en",
"language_probability": 1,
"text": "With a soft and whispery American accent, I'm the ideal choice for creating ASMR content, meditative guides, or adding an intimate feel to your narrative projects.",
"words": [
{
"text": "With",
"start": 0.119,
"end": 0.259,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 0.239,
"end": 0.299,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "a",
"start": 0.279,
"end": 0.359,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 0.339,
"end": 0.499,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "soft",
"start": 0.479,
"end": 1.039,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 1.019,
"end": 1.2,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "and",
"start": 1.18,
"end": 1.359,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 1.339,
"end": 1.44,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "whispery",
"start": 1.419,
"end": 1.979,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 1.959,
"end": 2.179,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "American",
"start": 2.159,
"end": 2.719,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 2.699,
"end": 2.779,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "accent,",
"start": 2.759,
"end": 3.389,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 4.119,
"end": 4.179,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "I'm",
"start": 4.159,
"end": 4.459,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 4.44,
"end": 4.52,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "the",
"start": 4.5,
"end": 4.599,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 4.579,
"end": 4.699,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "ideal",
"start": 4.679,
"end": 5.099,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 5.079,
"end": 5.219,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "choice",
"start": 5.199,
"end": 5.719,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 5.699,
"end": 6.099,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "for",
"start": 6.099,
"end": 6.199,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 6.179,
"end": 6.279,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "creating",
"start": 6.259,
"end": 6.799,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 6.779,
"end": 6.979,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "ASMR",
"start": 6.959,
"end": 7.739,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 7.719,
"end": 7.859,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "content,",
"start": 7.839,
"end": 8.45,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 9,
"end": 9.06,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "meditative",
"start": 9.04,
"end": 9.64,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 9.619,
"end": 9.699,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "guides,",
"start": 9.679,
"end": 10.359,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 10.359,
"end": 10.409,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "or",
"start": 11.319,
"end": 11.439,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 11.42,
"end": 11.52,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "adding",
"start": 11.5,
"end": 11.879,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 11.859,
"end": 12,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "an",
"start": 11.979,
"end": 12.079,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 12.059,
"end": 12.179,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "intimate",
"start": 12.179,
"end": 12.579,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 12.559,
"end": 12.699,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "feel",
"start": 12.679,
"end": 13.159,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 13.139,
"end": 13.179,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "to",
"start": 13.159,
"end": 13.26,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 13.239,
"end": 13.3,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "your",
"start": 13.299,
"end": 13.399,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 13.379,
"end": 13.479,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "narrative",
"start": 13.479,
"end": 13.889,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 13.919,
"end": 13.939,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "projects.",
"start": 13.919,
"end": 14.779,
"type": "word",
"speaker_id": "speaker_0"
}
]
}
```
The output is classified in three category types:
* `word` - A word in the language of the audio
* `spacing` - The space between words, not applicable for languages that don't use spaces like Japanese, Mandarin, Thai, Lao, Burmese and Cantonese
* `audio_event` - Non-speech sounds like laughter or applause
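As a small post-processing sketch, the response above can be grouped by speaker while skipping `spacing` and `audio_event` entries:

```python
def words_per_speaker(transcript: dict) -> dict[str, list[str]]:
    """Group transcribed words by speaker, ignoring spacing and audio events."""
    grouped: dict[str, list[str]] = {}
    for item in transcript["words"]:
        if item["type"] != "word":
            continue
        grouped.setdefault(item["speaker_id"], []).append(item["text"])
    return grouped

# For the example response above, this yields {"speaker_0": ["With", "a", "soft", ...]}.
```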
## Models
State-of-the-art speech recognition model
Accurate transcription in 99 languages
Precise word-level timestamps
Speaker diarization
Dynamic audio tagging
Real-time speech recognition model
Accurate transcription in 99 languages
Real-time transcription
Low latency (~150ms†)
Precise word-level timestamps
[Explore all](/docs/models)
## Concurrency and priority
Concurrency is the concept of how many requests can be processed at the same time.
For Speech to Text, files that are over 8 minutes long are transcribed in parallel internally in order to speed up processing. The audio is chunked into up to four segments that are transcribed concurrently.
You can calculate the concurrency limit with the following calculation:
$$
Concurrency = \min(4, \text{round\_up}(\frac{\text{audio\_duration\_secs}}{480}))
$$
For example, a 15 minute audio file will be transcribed with a concurrency of 2, while a 120 minute audio file will be transcribed with a concurrency of 4.
The above calculation is only applicable to Scribe v1. For Scribe v2 Realtime, see the
[concurrency limit chart](/docs/models#concurrency-and-priority).
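The calculation above can be expressed directly in code; this sketch simply mirrors the formula for Scribe v1:

```python
import math

def scribe_v1_concurrency(audio_duration_secs: float) -> int:
    """Effective internal concurrency used for a single Scribe v1 request."""
    return min(4, math.ceil(audio_duration_secs / 480))

print(scribe_v1_concurrency(15 * 60))   # 2
print(scribe_v1_concurrency(120 * 60))  # 4
```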
## Supported languages
The Scribe v1 model supports 99 languages, including:
*Afrikaans (afr), Amharic (amh), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Burmese (mya), Cantonese (yue), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Fulah (ful), Galician (glg), Ganda (lug), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Igbo (ibo), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kabuverdianu (kea), Kannada (kan), Kazakh (kaz), Khmer (khm), Korean (kor), Kurdish (kur), Kyrgyz (kir), Lao (lao), Latvian (lav), Lingala (lin), Lithuanian (lit), Luo (luo), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Maltese (mlt), Mandarin Chinese (zho), Māori (mri), Marathi (mar), Mongolian (mon), Nepali (nep), Northern Sotho (nso), Norwegian (nor), Occitan (oci), Odia (ori), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Shona (sna), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Tajik (tgk), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Umbundu (umb), Urdu (urd), Uzbek (uzb), Vietnamese (vie), Welsh (cym), Wolof (wol), Xhosa (xho) and Zulu (zul).*
### Breakdown of language support
Word Error Rate (WER) is a key metric used to evaluate the accuracy of transcription systems. It measures how many errors are present in a transcript compared to a reference transcript. Below is a breakdown of the WER for each language that Scribe v1 supports.
Bulgarian (bul), Catalan (cat), Czech (ces), Danish (dan), Dutch (nld), English (eng), Finnish
(fin), French (fra), Galician (glg), German (deu), Greek (ell), Hindi (hin), Indonesian (ind),
Italian (ita), Japanese (jpn), Kannada (kan), Malay (msa), Malayalam (mal), Macedonian (mkd),
Norwegian (nor), Polish (pol), Portuguese (por), Romanian (ron), Russian (rus), Serbian (srp),
Slovak (slk), Spanish (spa), Swedish (swe), Turkish (tur), Ukrainian (ukr) and Vietnamese (vie).
Bengali (ben), Belarusian (bel), Bosnian (bos), Cantonese (yue), Estonian (est), Filipino (fil),
Gujarati (guj), Hungarian (hun), Kazakh (kaz), Latvian (lav), Lithuanian (lit), Mandarin (cmn),
Marathi (mar), Nepali (nep), Odia (ori), Persian (fas), Slovenian (slv), Tamil (tam) and Telugu
(tel)
Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani
(aze), Burmese (mya), Cebuano (ceb), Croatian (hrv), Georgian (kat), Hausa (hau), Hebrew (heb),
Icelandic (isl), Javanese (jav), Kabuverdianu (kea), Korean (kor), Kyrgyz (kir), Lingala (lin),
Maltese (mlt), Mongolian (mon), Māori (mri), Occitan (oci), Punjabi (pan), Sindhi (snd), Swahili
(swa), Tajik (tgk), Thai (tha), Urdu (urd), Uzbek (uzb) and Welsh (cym).
Amharic (amh), Chichewa (nya), Fulah (ful), Ganda (lug), Igbo (ibo), Irish (gle), Khmer (khm),
Kurdish (kur), Lao (lao), Luxembourgish (ltz), Luo (luo), Northern Sotho (nso), Pashto (pus),
Shona (sna), Somali (som), Umbundu (umb), Wolof (wol), Xhosa (xho) and Zulu (zul).
## FAQ
Yes, the API supports uploading both audio and video files for transcription.
Files up to 3 GB in size and up to 10 hours in duration are supported.
The supported audio formats include:
* audio/aac
* audio/x-aac
* audio/x-aiff
* audio/ogg
* audio/mpeg
* audio/mp3
* audio/mpeg3
* audio/x-mpeg-3
* audio/opus
* audio/wav
* audio/x-wav
* audio/webm
* audio/flac
* audio/x-flac
* audio/mp4
* audio/aiff
* audio/x-m4a
Supported video formats include:
* video/mp4
* video/x-msvideo
* video/x-matroska
* video/quicktime
* video/x-ms-wmv
* video/x-flv
* video/webm
* video/mpeg
* video/3gpp
ElevenLabs is constantly expanding the number of languages supported by our models. Please check back frequently for updates.
Yes, asynchronous transcription results can be sent to webhooks configured in webhook settings in the UI. Learn more in the [webhooks cookbook](/docs/cookbooks/speech-to-text/webhooks).
Yes, the multichannel STT feature allows you to transcribe audio where each channel is processed independently and assigned a speaker ID based on its channel number. This feature supports up to 5 channels. Learn more in the [multichannel transcription cookbook](/docs/cookbooks/speech-to-text/multichannel-transcription).
# Text to Dialogue
> Learn how to create immersive, natural-sounding dialogue with ElevenLabs.
## Overview
The ElevenLabs [Text to Dialogue](/docs/api-reference/text-to-dialogue) API creates natural-sounding, expressive dialogue from text using the Eleven v3 model. Popular use cases include:
* Generating pitch perfect conversations for video games
* Creating immersive dialogue for podcasts and other audio content
* Bringing audiobooks to life with expressive narration
Text to Dialogue is not intended for use in real-time applications like conversational agents. Several generations might be required to achieve the desired results. When integrating Text to Dialogue into your application, consider producing several generations and letting the user select the best one.
Listen to a sample:
Learn how to integrate text to dialogue into your application.
Learn how to use the Eleven v3 model to generate expressive dialogue.
## Voice options
ElevenLabs offers thousands of voices across 70+ languages through multiple creation methods:
* [Voice library](/docs/capabilities/voices) with 3,000+ community-shared voices
* [Professional voice cloning](/docs/capabilities/voices#cloned) for highest-fidelity replicas
* [Instant voice cloning](/docs/capabilities/voices#cloned) for quick voice replication
* [Voice design](/docs/capabilities/voices#voice-design) to generate custom voices from text descriptions
Learn more about our [voice options](/docs/capabilities/voices).
## Prompting
The models interpret emotional context directly from the text input. For example, adding
descriptive text like "she said excitedly" or using exclamation marks will influence the speech
emotion. Voice settings like Stability and Similarity help control the consistency, while the
underlying emotion comes from textual cues.
Read the [prompting guide](/docs/best-practices/prompting) for more details.
### Emotional deliveries with audio tags
This feature is still under active development; actual results may vary.
The Eleven v3 model allows the use of non-speech audio events to influence the delivery of the dialogue. This is done by inserting audio tags into the text input, wrapped in square brackets.
Audio tags come in a few different forms:
### Emotions and delivery
For example, \[sad], \[laughing] and \[whispering].
### Audio events
For example, \[leaves rustling], \[gentle footsteps] and \[applause].
### Overall direction
For example, \[football], \[wrestling match] and \[auctioneer].
Some examples include:
```
"[giggling] That's really funny!"
"[groaning] That was awful."
"Well, [sigh] I'm not sure what to say."
```
You can also use punctuation to indicate the flow of dialogue, like interruptions:
```
"[cautiously] Hello, is this seat-"
"[jumping in] Free? [cheerfully] Yes it is."
```
Ellipses can be used to indicate trailing sentences:
```
"[indecisive] Hi, can I get uhhh..."
"[quizzically] The usual?"
"[elated] Yes! [laughs] I'm so glad you knew!"
```
## Supported formats
The default response format is MP3, but other formats such as PCM, μ-law, A-law and Opus are available.
* **MP3**
* Sample rates: 22.05kHz - 44.1kHz
* Bitrates: 32kbps - 192kbps
* 22.05kHz @ 32kbps
* 44.1kHz @ 32kbps, 64kbps, 96kbps, 128kbps, 192kbps
* **PCM (S16LE)**
* Sample rates: 8kHz, 16kHz, 22.05kHz, 24kHz, 44.1kHz, 48kHz
* 16-bit depth
* **μ-law**
* 8kHz sample rate
* Optimized for telephony applications
* **A-law**
* 8kHz sample rate
* Optimized for telephony applications
* **Opus**
* Sample rate: 48kHz
* Bitrates: 32kbps - 192kbps
Higher quality audio options are only available on paid tiers - see our [pricing
page](https://elevenlabs.io/pricing/api) for details.
## Supported languages
The Eleven v3 model supports 70+ languages, including:
*Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kirghiz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym).*
## FAQ
Text to Dialogue is only available on the Eleven v3 model.
Yes. You retain ownership of any audio you generate. However, commercial usage rights are only
available with paid plans. With a paid subscription, you may use generated audio for commercial
purposes and monetize the outputs if you own the IP rights to the input content.
A free regeneration allows you to regenerate the same text to speech content without additional cost, subject to these conditions:
* Only available within the ElevenLabs dashboard.
* You can regenerate each piece of content up to 2 times for free.
* The content must be exactly the same as the previous generation. Any changes to the text, voice settings, or other parameters will require a new, paid generation.
Free regenerations are useful in case there is a slight distortion in the audio output. According to ElevenLabs' internal benchmarks, regenerations will solve roughly half of issues with quality, with remaining issues usually due to poor training data.
There is no limit to the number of speakers in a dialogue.
The models are nondeterministic. For consistency, use the optional [seed
parameter](/docs/api-reference/text-to-speech/convert#request.body.seed), though subtle
differences may still occur.
Split long text into segments and use streaming for real-time playback and efficient processing.
# Image & Video
> Generate and edit stunning images and videos from text prompts and visual references.
## Overview
Image & Video enables you to create high-quality visual content from simple text descriptions or reference images. Generate static images or dynamic videos in any style, then refine them iteratively with additional prompts, upscale for high-resolution output, and even add lip-sync with audio.
This feature is currently in beta.
Complete guide to using Image & Video in ElevenLabs.
## Key capabilities
* **Image generation**: Create high-quality images from text prompts or reference images with models optimized for speed or quality
* **Video generation**: Generate dynamic videos with cinematic motion, physics realism, and integrated audio. Video generation is only available on paid plans
* **Iterative refinement**: Refine generations with additional prompts and create variations
* **Enhancement tools**: Upscale resolution by up to 4x and apply realistic lip-sync with audio
* **Multiple models**: Access specialized models for different use cases, from rapid iteration to production-ready content
* **Reference support**: Guide generation with start frames, end frames, and style references. Supports a wide range of image file formats including JPG, PNG, WEBP, and more
* **Export flexibility**: Download as standalone files or import directly into Studio projects
## Workflow
The creation process moves you from inspiration to finished asset in four stages:
**Explore:** Discover community creations to find inspiration and study effective prompts.
**Generate:** Use the prompt box to describe what you want to create, select a model, and fine-tune settings.
**Iterate and enhance:** Review generations, create variations, and apply enhancements like upscaling and lip-syncing.
**Export:** Download finished assets or send them directly to Studio.
## Supported download formats
**Video:**
* **MP4**: Codecs H.264, H.265. Quality up to 4K (with upscaling)
**Image:**
* **PNG**: High-resolution, lossless output
## Models
Image & Video provides access to specialized models optimized for different use cases. Each model offers unique capabilities, from rapid iteration to production-ready quality.
Post-processing models require an existing generated output, though you can also upload your own image or video file.
The most advanced, high-fidelity video model for cinematic results at your disposal.
**Generation inputs:**
* Text-to-Video
* Start Frame
**Features:**
* Highest-fidelity, professional-grade output with synced audio
* Precise multi-shot control
* Excels at complex motion and prompt adherence
* Fixed durations: 4s, 8s, and 12s
* Batch creation with up to 4 generations at a time
**Output options:**
* Resolutions: 720p, 1080p
* Aspect ratios: 16:9, 9:16
**Ideal for:**
* Cinematic, professional-grade video content
**Cost:** Starts at 12,000 credits for a generation
End frame is not currently supported. Cannot provide image references. Sound is enabled by default.
The standard, high-speed version of OpenAI's advanced video model, tuned for everyday content creation.
**Generation inputs:**
* Text-to-Video
* Start Frame
**Features:**
* Realistic, physics-aware videos with synced audio
* Fine scene control
* Fixed durations: 4s, 8s, and 12s
* Batch creation with up to 4 generations at a time
* Strong narrative and character consistency
**Output options:**
* Resolutions: 720p, 1080p
* Aspect ratios: 16:9, 9:16
**Ideal for:**
* Everyday content creation with realistic physics
**Cost:** Starts at 4,000 credits for default settings
End frame is not currently supported. Cannot provide image references. Sound is enabled by default.
A professional-grade model for high-quality, cinematic video generation.
**Generation inputs:**
* Text-to-Video
* Start Frame
* End Frame
* Image References
**Features:**
* Excellent quality and creative control with negative prompts
* Fully integrated and synchronized audio
* Realistic dialogue, lip-sync, and sound effects
* Fixed durations: 4s, 6s, and 8s
* Batch creation with up to 4 generations at a time
* Dedicated sound control
**Output options:**
* Resolutions: 720p, 1080p
* Aspect ratios: 16:9, 9:16
**Ideal for:**
* High-quality, cinematic video generation with full creative control
**Cost:** Starts at 8,000 credits for default settings
Enabling and disabling sound will change the generation credits.
A balanced and versatile model for high-quality, full-HD video generation.
**Generation inputs:**
* Text-to-Video
* Start Frame
**Features:**
* Excels at simulating complex motion and realistic physics
* Accurately models fluid dynamics and expressions
* Fixed durations: 5s and 10s
* Batch creation with up to 4 generations at a time
**Output options:**
* Resolutions: 1080p
* Aspect ratios: 16:9, 1:1, 9:16
**Ideal for:**
* Realistic physics simulations and complex motion
**Cost:** Starts at 3,500 credits for default settings
End frames and image references are not currently supported, and sound control is not available.
A high-speed model optimized for rapid previews and generations, delivering sharper visuals with lower latency.
**Generation inputs:**
* Text-to-Video
* Start Frame
* End Frame
**Features:**
* Advanced creative control with negative prompts and dedicated sound control
* Fixed durations: 4s, 6s, and 8s
* Batch creation with up to 4 generations at a time
* Accurately models real-world physics for realistic motion and interactions
**Output options:**
* Resolutions: 720p, 1080p
* Aspect ratios: 16:9, 9:16
**Ideal for:**
* Quick iteration and A/B testing visuals
* Fast-paced social media content creation
**Cost:** Starts at 4,000 credits for default settings
Production-ready model delivering exceptional quality, strong physics realism, and coherent narrative audio.
**Generation inputs:**
* Text-to-Video
* Start Frame
**Features:**
* Advanced integrated "narrative audio" generation that matches video tone and story
* Granular creative control with negative prompts and dedicated sound control
* Fixed durations: 4s, 6s, and 8s
* Batch creation with up to 4 generations at a time
**Output options:**
* Resolutions: 720p, 1080p
* Aspect ratios: 16:9, 9:16
**Ideal for:**
* Final renders and professional marketing content
* Short-form storytelling
**Cost:** Starts at 8,000 credits for default settings
A high-speed, cost-efficient model for generating audio-backed video from text or a starting image.
**Generation inputs:**
* Text-to-Video
* Start Frame
**Features:**
* Granular creative control with negative prompts and dedicated sound control
* Fixed durations: 4s, 6s, and 8s
* Batch creation with up to 4 generations at a time
**Output options:**
* Resolutions: 720p, 1080p
* Aspect ratios: 16:9, 9:16
**Ideal for:**
* Rapid iteration and previews
* Cost-effective content creation
**Cost:** Starts at 4,000 credits for default settings
A specialized model for creating dynamic, multi-shot sequences with large movement and action.
**Generation inputs:**
* Text-to-Video
* Start Frame
* End Frame
**Features:**
* Highly stable physics and seamless transitions between shots
* Fixed durations: 3s, 4s, 5s, 6s, 7s, 8s, 9s, 10s, 11s, and 12s
* Batch creation with up to 4 generations at a time
* Maximum creative flexibility with numerous aspect ratio options
**Output options:**
* Resolutions: 480p, 720p, 1080p
* Aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16
**Ideal for:**
* Storytelling and action scenes requiring stable physics
**Cost:** Starts at 4,800 credits for default settings
Aspect ratio and resolution do not affect generation credits, but duration does.
A versatile model that delivers cinematic motion and high prompt fidelity from text or a starting image.
**Generation inputs:**
* Text-to-Video
* Start Frame (Image-to-Video)
**Features:**
* Granular creative control with negative prompts and dedicated sound control
* Fixed durations: 5s and 10s
* Batch creation with up to 4 generations at a time
**Output options:**
* Resolutions: 480p, 720p, 1080p
* Aspect ratios: 16:9, 1:1, 9:16
**Ideal for:**
* Cinematic content with strong prompt adherence
**Cost:** Starts at 2,500 credits for default settings
Generation cost varies based on selected settings.
A high-speed model for quick, high-quality image generation and editing directly from text prompts.
**Features:**
* Supports multiple image references to guide generation
* Generates up to 4 images at a time
**Output options:**
* Aspect ratios: 21:9, 16:9, 5:4, 4:3, 3:2, 1:1, 2:3, 3:4, 4:5, 9:16
**Ideal for:**
* Rapid image creation and iteration
**Cost:** Starts at 2,000 credits for default settings; varies based on number of generations
A specialized image model for generating multi-shot sequences or scenes with large movement and action.
**Features:**
* Excels at creating images with stable physics and coherent transitions
* Supports multiple image references to guide generation
* Generates up to 4 images at a time
**Output options:**
* Aspect ratios: auto, 16:9, 4:3, 1:1, 3:4, 9:16
**Ideal for:**
* Action scenes and dynamic compositions
**Cost:** Starts at 1,200 credits for default settings; varies based on number of generations
A professional model for advanced image generation and editing, offering strong scene coherence and style control.
**Features:**
* Image-based style control requiring a reference image to guide visual aesthetic
* Generates up to 4 images at a time
**Output options:**
* Aspect ratios: 21:9, 16:9, 4:3, 3:2, 1:1, 2:3, 3:4, 4:5, 9:16, 9:21
**Ideal for:**
* Professional content with precise style requirements
**Cost:** Starts at 1,600 credits; varies based on settings and number of generations
An image model with strong prompt fidelity and motion awareness, ideal for capturing dynamic action in a still frame.
**Features:**
* Granular control with negative prompts
* Supports multiple image references to guide generation
* Generates up to 4 images at a time
**Output options:**
* Aspect ratios: 16:9, 4:3, 1:1, 3:4, 9:16
**Ideal for:**
* Dynamic still images with motion awareness
**Cost:** Starts at 2,000 credits; varies based on settings
A versatile model for precise, high-quality image creation and detailed editing guided by natural language prompts.
**Features:**
* Supports multiple image references to guide generation
* Generates up to 4 images at a time
**Output options:**
* Aspect ratios: 3:2, 1:1, 2:3
* Quality options: low, medium, high
**Ideal for:**
* Creating and editing images with precise, text-based control
**Cost:** Starts at 2,400 credits for default settings; varies based on settings and number of generations
A dedicated utility model for generating exceptionally realistic, humanlike lip-sync.
**Inputs:**
* Static source image
* Speech audio file
**Features:**
* Animates the mouth on the source image to match provided audio
* Creates high-fidelity "talking" video from still images
* Lip-sync specific tool, not a full video generation model
**Ideal for:**
* Creating talking avatars
* Adding dialogue to still images
* Professional dubbing workflows
**Cost:** Depends on generation input
For best results, the image should contain a detectable figure.
A fast, affordable, and precise utility model for applying realistic lip-sync to videos.
**Inputs:**
* Source video
* New speech audio file
**Features:**
* Re-animates mouth movements in source video to match new audio
* Video-to-video lip-sync tool, not a full video generator
**Ideal for:**
* High-volume, cost-effective dubbing
* Translating content
* Correcting audio in video clips with realistic results
**Cost:** Depends on generation input
For best results, the video should contain a detectable figure.
A dedicated utility model for image and video upscaling, designed to enhance resolution and detail up to 4x.
**Features:**
* Enhancement tool that processes existing media
* Increases media size while preserving natural textures and minimizing artifacts
* Highly granular upscale factors: 1x, 1.25x, 1.5x, 1.75x, 2x, 3x, 4x
* Video-specific: Flexible frame rate control (keep source or convert to 24, 25, 30, 48, 50, or 60 fps)
**Ideal for:**
* Improving quality of generated media
* Restoring legacy footage or photos
* Preparing assets for high-resolution displays
**Cost:** Depends on generation input
# Voice changer
> Learn how to transform audio between voices while preserving emotion and delivery.
## Overview
ElevenLabs [voice changer](/docs/api-reference/speech-to-speech/convert) API lets you transform any source audio (recorded or uploaded) into a different, fully cloned voice without losing the performance nuances of the original. It’s capable of capturing whispers, laughs, cries, accents, and subtle emotional cues to achieve a highly realistic, human feel and can be used to:
* Change any voice while preserving emotional delivery and nuance
* Create consistent character voices across multiple languages and recording sessions
* Fix or replace specific words and phrases in existing recordings
Explore our [voice library](https://elevenlabs.io/community) to find the perfect voice for your project.
Learn how to integrate voice changer into your application.
Step-by-step guide for using voice changer in ElevenLabs.
## Supported languages
Our multilingual v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
The `eleven_english_sts_v2` model only supports English.
## Best practices
### Audio quality
* Record in a quiet environment to minimize background noise
* Maintain appropriate microphone levels - avoid too quiet or peaked audio
* Use `remove_background_noise=true` if environmental sounds are present
### Recording guidelines
* Keep segments under 5 minutes for optimal processing
* Feel free to include natural expressions (laughs, sighs, emotions)
* The source audio's accent and language will be preserved in the output
### Parameters
* **Style**: Set to 0% when input audio is already expressive
* **Stability**: Use 100% for maximum voice consistency
* **Language**: Choose source audio that matches your desired accent and language
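To illustrate these settings in practice, here is a minimal Python sketch using the SDK's speech-to-speech `convert` method. The voice ID and file name are placeholders, and the `remove_background_noise` keyword argument is an assumption based on the best practices above, so check the API reference for the exact signature.

```python
# Minimal sketch: convert a recorded performance into a different voice.
# The voice ID and file name are placeholders; remove_background_noise is assumed
# to be exposed as a keyword argument on speech_to_speech.convert.
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

load_dotenv()

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("source_performance.mp3", "rb") as source_audio:
    converted = elevenlabs.speech_to_speech.convert(
        voice_id="YOUR_VOICE_ID",               # the target (cloned) voice
        audio=source_audio,
        model_id="eleven_multilingual_sts_v2",  # multilingual voice changer model
        output_format="mp3_44100_128",
        remove_background_noise=True,           # assumed flag; see Audio quality above
    )

play(converted)
```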
## FAQ
Yes, but you must split it into smaller chunks (each under 5 minutes). This helps ensure stability
and consistent output.
Absolutely. Provide your custom voice’s `voice_id` and specify the correct `model_id`.
You’re charged at 1000 characters’ worth of usage per minute of processed audio. There’s no
additional fee based on file size.
Possibly. Use `remove_background_noise=true` or the Voice Isolator tool to minimize
environmental sounds in the final output.
Though `eleven_english_sts_v2` is available, our `eleven_multilingual_sts_v2` model often
outperforms it, even for English material.
“Style” adds interpretative flair; “stability” enforces consistency. For high-energy performances
in the source audio, turn style down and stability up.
# Voice isolator
> Learn how to isolate speech from background noise, music, and ambient sounds from any audio.
## Overview
ElevenLabs [voice isolator](/docs/api-reference/audio-isolation/audio-isolation) API transforms audio recordings with background noise into clean, studio-quality speech. This is particularly useful for audio recorded in noisy environments, or recordings containing unwanted ambient sounds, music, or other background interference.
Listen to a sample:
## Usage
The voice isolator model extracts speech from background noise in both audio and video files.
Learn how to integrate voice isolator into your application.
Step-by-step guide for using voice isolator in ElevenLabs.
### Supported file types
* **Audio**: AAC, AIFF, OGG, MP3, OPUS, WAV, FLAC, M4A
* **Video**: MP4, AVI, MKV, MOV, WMV, FLV, WEBM, MPEG, 3GPP
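A minimal Python sketch of the API call is shown below; the `audio_isolation.convert` method name and the chunked byte response are assumptions based on the audio-isolation endpoint, so verify them against the API reference for your SDK version.

```python
# Minimal sketch: isolate speech from a noisy recording and save the result.
# The audio_isolation.convert method name is an assumption; check your SDK version.
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("noisy_interview.mp3", "rb") as noisy_audio:
    isolated = elevenlabs.audio_isolation.convert(audio=noisy_audio)

# The response is assumed to be streamed in chunks.
with open("clean_interview.mp3", "wb") as out:
    for chunk in isolated:
        if chunk:
            out.write(chunk)
```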
## FAQ
* **Cost**: Voice isolator costs 1000 characters for every minute of audio.
* **File size and length**: Supports files up to 500MB and 1 hour in length.
* **Music vocals**: Not specifically optimized for isolating vocals from music, but may work depending on the content.
# Dubbing
> Learn how to translate audio and video while preserving the emotion, timing & tone of speakers.
## Overview
ElevenLabs [dubbing](/docs/api-reference/dubbing/create) API translates audio and video across 32 languages while preserving the emotion, timing, tone and unique characteristics of each speaker. Our model separates each speaker’s dialogue from the soundtrack, allowing you to recreate the original delivery in another language. It can be used to:
* Grow your addressable audience by 4x to reach international audiences
* Adapt existing material for new markets while preserving emotional nuance
* Offer content in multiple languages without re-recording voice talent
We also offer a [fully managed dubbing service](https://elevenlabs.io/elevenstudios) for video and podcast creators.
## Usage
ElevenLabs dubbing can be used in three ways:
* **Dubbing Studio** in the user interface for fast, interactive control and editing
* **Programmatic integration** via our [API](/docs/api-reference/dubbing/create) for large-scale or automated workflows
* **Human-verified dubs via ElevenLabs Productions** - for more information, please reach out to [productions@elevenlabs.io](mailto:productions@elevenlabs.io)
The UI supports files up to **500MB** and **45 minutes**. The API supports files up to **1GB** and **2.5 hours**.
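For programmatic integration, the flow is roughly: create a dubbing project, poll until it finishes, then download the dubbed track. The sketch below assumes `dubbing.create` and `dubbing.get` methods and a `dubbing_id`/`status` response shape mirroring the Create Dub endpoint; check the API reference for the exact names.

```python
# Minimal sketch: dub a video into Spanish via the API and poll until it completes.
# Method names (dubbing.create / dubbing.get) and response fields (dubbing_id, status)
# are assumptions based on the dubbing endpoints; confirm them in the API reference.
import os
import time
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("interview.mp4", "rb") as source_file:
    dub = elevenlabs.dubbing.create(
        file=source_file,
        target_lang="es",   # any of the 32 supported language codes
        watermark=True,     # optional: reduces credit usage for video output
    )

# Dubbing runs asynchronously, so poll the project status.
while True:
    project = elevenlabs.dubbing.get(dub.dubbing_id)
    if project.status != "dubbing":
        break
    time.sleep(10)

print(f"Dub {dub.dubbing_id} finished with status: {project.status}")
```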
Learn how to integrate dubbing into your application.
Edit transcripts and translate videos step by step in Dubbing Studio.
### Key features
**Speaker separation**
Automatically detect multiple speakers, even with overlapping speech.
**Multi-language output**
Generate localized tracks in 32 languages.
**Preserve original voices**
Retain the speaker’s identity and emotional tone.
**Keep background audio**
Avoid re-mixing music, effects, or ambient sounds.
**Customizable transcripts**
Manually edit translations and transcripts as needed.
**Supported file types**
Videos and audio can be dubbed from various sources, including YouTube, X, TikTok, Vimeo, direct URLs, or file uploads.
**Video transcript and translation editing**
Our AI video translator lets you manually edit transcripts and translations to ensure your content is properly synced and localized. Adjust the voice settings to tune delivery, and regenerate speech segments until the output sounds just right.
A Creator plan or higher is required to dub audio files. For videos, a watermark option is
available to reduce credit usage.
### Cost
To reduce credit usage, you can:
* Dub only a selected portion of your file
* Use watermarks on video output (not available for audio)
* Fine-tune transcripts and regenerate individual segments instead of the entire clip
Refer to our [pricing page](https://elevenlabs.io/pricing) for detailed credit costs.
## List of supported languages for dubbing
| No | Language Name | Language Code |
| -- | ------------- | ------------- |
| 1 | English | en |
| 2 | Hindi | hi |
| 3 | Portuguese | pt |
| 4 | Chinese | zh |
| 5 | Spanish | es |
| 6 | French | fr |
| 7 | German | de |
| 8 | Japanese | ja |
| 9 | Arabic | ar |
| 10 | Russian | ru |
| 11 | Korean | ko |
| 12 | Indonesian | id |
| 13 | Italian | it |
| 14 | Dutch | nl |
| 15 | Turkish | tr |
| 16 | Polish | pl |
| 17 | Swedish | sv |
| 18 | Filipino | fil |
| 19 | Malay | ms |
| 20 | Romanian | ro |
| 21 | Ukrainian | uk |
| 22 | Greek | el |
| 23 | Czech | cs |
| 24 | Danish | da |
| 25 | Finnish | fi |
| 26 | Bulgarian | bg |
| 27 | Croatian | hr |
| 28 | Slovak | sk |
| 29 | Tamil | ta |
## FAQ
Dubbing can be performed on all types of short and long form video and audio content. We
recommend dubbing content with a maximum of 9 unique speakers at a time to ensure a high-quality
dub.
Yes. Our models analyze each speaker’s original delivery to recreate the same tone, pace, and
style in your target language.
We use advanced source separation to isolate individual voices from ambient sound. Multiple
overlapping speakers can be split into separate tracks.
Via the user interface, the maximum file size is 500MB and up to 45 minutes. Through the API, you can
process files up to 1GB and 2.5 hours.
You can choose to dub only certain portions of your video/audio or tweak translations/voices in
our interactive Dubbing Studio.
# Sound effects
> Learn how to create high-quality sound effects from text with ElevenLabs.
## Overview
ElevenLabs [sound effects](/docs/api-reference/text-to-sound-effects/convert) API turns text descriptions into high-quality audio effects with precise control over timing, style and complexity. The model understands both natural language and audio terminology, enabling you to:
* Generate cinematic sound design for films & trailers
* Create custom sound effects for games & interactive media
* Produce Foley and ambient sounds for video content
Listen to an example:
## Usage
Sound effects are generated using text descriptions & several optional parameters (a minimal API sketch follows this list):
* **Duration**: Set a specific length for the generated audio (in seconds)
* Default: Automatically determined based on the prompt
* Range: 0.1 to 30 seconds
* Cost: 40 credits per second when duration is specified
* **Looping**: Enable seamless looping for sound effects longer than 30 seconds
* Creates sound effects that can be played on repeat without perceptible start/end points
* Perfect for atmospheric sounds, ambient textures, and background elements
* Example: Generate 30s of 'soft rain' then loop it endlessly for atmosphere in audiobooks, films, games
* **Prompt influence**: Control how strictly the model follows the prompt
* High: More literal interpretation of the prompt
* Low: More creative interpretation with added variations
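Here is a minimal Python sketch showing a sound effect request. The `duration_seconds` and `prompt_influence` arguments map to the options above; the looping flag is omitted because its exact parameter name may vary, so consult the API reference before relying on it.

```python
# Minimal sketch: generate a short sound effect using the optional parameters above.
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play

load_dotenv()

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_sound_effects.convert(
    text="Heavy wooden door creaking open",
    duration_seconds=4.0,    # 0.1-30 seconds; omit to let the model decide
    prompt_influence=0.7,    # higher values follow the prompt more literally
)

play(audio)
```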
Learn how to integrate sound effects into your application.
Step-by-step guide for using sound effects in ElevenLabs.
### Prompting guide
#### Simple effects
For basic sound effects, use clear, concise descriptions:
* "Glass shattering on concrete"
* "Heavy wooden door creaking open"
* "Thunder rumbling in the distance"
#### Complex sequences
For multi-part sound effects, describe the sequence of events:
* "Footsteps on gravel, then a metallic door opens"
* "Wind whistling through trees, followed by leaves rustling"
* "Sword being drawn, then clashing with another blade"
#### Musical elements
The API also supports generation of musical components:
* "90s hip-hop drum loop, 90 BPM"
* "Vintage brass stabs in F minor"
* "Atmospheric synth pad with subtle modulation"
#### Audio Terminology
Common terms that can enhance your prompts:
* **Impact**: Collision or contact sounds between objects, from subtle taps to dramatic crashes
* **Whoosh**: Movement through air effects, ranging from fast and ghostly to slow-spinning or rhythmic
* **Ambience**: Background environmental sounds that establish atmosphere and space
* **One-shot**: Single, non-repeating sound
* **Loop**: Repeating audio segment
* **Stem**: Isolated audio component
* **Braam**: Big, brassy cinematic hit that signals epic or dramatic moments, common in trailers
* **Glitch**: Sounds of malfunction, jittering, or erratic movement, useful for transitions and sci-fi
* **Drone**: Continuous, textured sound that creates atmosphere and suspense
## FAQ
The maximum duration is 30 seconds per generation. For longer sequences, you can either generate
multiple effects and combine them, or use the looping feature to create seamless repeating sound
effects.
Yes, you can generate musical elements like drum loops, bass lines, and melodic samples.
However, for full music production, consider combining multiple generated elements.
Use detailed prompts, appropriate duration settings, and high prompt influence for more
predictable results. For complex sounds, generate components separately and combine them.
Generated audio is provided in MP3 format with professional-grade quality. For WAV downloads of
non-looping sound effects, audio is delivered at 48kHz sample rate - the industry standard for
film, TV, video, and game audio, ensuring no resampling is needed for professional workflows.
Looping sound effects are designed to play seamlessly on repeat without noticeable start or end
points. This is perfect for creating continuous atmospheric sounds, ambient textures, or
background elements that need to play indefinitely. For example, you can generate 30 seconds of
rain sounds and loop them endlessly for background atmosphere in audiobooks, films, or games.
# Voices
> Learn how to create, customize, and manage voices with ElevenLabs.
## Overview
ElevenLabs provides models for voice creation & customization. The platform supports a wide range of voice options, including voices from our extensive [voice library](https://elevenlabs.io/app/voice-library), voice cloning, and artificially designed voices using text prompts.
### Voice types
* **Community**: Voices shared by the community from the ElevenLabs [voice library](/docs/product-guides/voices/voice-library).
* **Cloned**: Custom voices created using instant or professional [voice cloning](/docs/product-guides/voices/voice-cloning).
* **Voice design**: Artificially designed voices created with the [voice design](/docs/product-guides/voices/voice-design) tool.
* **Default**: Pre-designed, high-quality voices optimized for general use.
Voices that you personally own, either created with Instant Voice Cloning, Professional Voice
Cloning, or Voice Design, can be used for [Voice Remixing](/docs/capabilities/voice-remixing).
#### Community
The [voice library](/docs/product-guides/voices/voice-library) contains over 5,000 voices shared by the ElevenLabs community. Use it to:
* Discover unique voices shared by the ElevenLabs community.
* Add voices to your personal collection.
* Share your own voice clones for cash rewards when others use them.
Share your voice with the community, set your terms, and earn cash rewards when others use it.
We've paid out over **$1M** already.
Learn how to use voices from the voice library
#### Cloned
Clone your own voice from 30-second samples with Instant Voice Cloning, or create hyper-realistic voices using Professional Voice Cloning.
* **Instant Voice Cloning**: Quickly replicate a voice from short audio samples.
* **Professional Voice Cloning**: Generate professional-grade voice clones with extended training audio.
Voice-captcha technology is used to verify that **all** voice clones are created from your own voice samples.
A Creator plan or higher is required to create voice clones.
Clone a voice instantly
Create a perfect voice clone
Learn how to create instant & professional voice clones
#### Voice design
With [Voice Design](/docs/product-guides/voices/voice-design), you can create entirely new voices by specifying attributes like age, gender, accent, and tone. Generated voices are ideal for:
* Realistic voices with nuanced characteristics.
* Creative character voices for games and storytelling.
The voice design tool creates 3 voice previews. Simply provide the following (a minimal API sketch follows this list):
* A **voice description** between 20 and 1000 characters.
* A **text** to preview the voice between 100 and 1000 characters.
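As a minimal sketch, preview generation can be driven from the API roughly as follows; the `text_to_voice.create_previews` method and the response fields are assumptions based on the voice design endpoint, so verify them in the API reference.

```python
# Minimal sketch: request three voice previews from a description.
# The text_to_voice.create_previews method and the previews/generated_voice_id
# response fields are assumptions; check the API reference for exact names.
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

response = elevenlabs.text_to_voice.create_previews(
    # Voice description: between 20 and 1000 characters.
    voice_description="A calm, middle-aged British narrator with a warm tone and deliberate pacing.",
    # Preview text: between 100 and 1000 characters.
    text=(
        "Every great story begins with a single step into the unknown, "
        "and tonight, together, we take that step into a world of possibility."
    ),
)

for preview in response.previews:
    print(preview.generated_voice_id)
```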
##### Voice design with Eleven v3 (alpha)
Using the new [Eleven v3 model](/docs/models#eleven-v3-alpha), voices that are capable of a wide range of emotion can be designed via a prompt.
Using v3 gets you the following benefits:
* More natural and versatile voice generation.
* Better control over voice characteristics.
* Audio tags supported in Preview generations.
* Backward compatibility with v2 models.
Integrate voice design into your application.
Learn how to craft voices from a single prompt.
#### Default
Our curated set of default voices is optimized for core use cases. These voices are:
* **Reliable**: Available long-term.
* **Consistent**: Carefully crafted and quality-checked for performance.
* **Model-ready**: Fine-tuned on new models upon release.
Default voices are available to all users via the **my voices** tab in the [voice lab
dashboard](https://elevenlabs.io/app/voice-lab). Default voices were previously referred to as
`premade` voices. The latter term is still used when accessing default voices via the API.
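As a quick illustration, here is a minimal Python sketch that lists the default voices over the API by filtering on the legacy `premade` category; the response field names are assumptions based on the voices endpoint, so check the API reference.

```python
# Minimal sketch: list default voices, which the API labels with the legacy
# `premade` category. Response field names are assumed from the voices endpoint.
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

all_voices = elevenlabs.voices.get_all()
for voice in all_voices.voices:
    if voice.category == "premade":
        print(voice.name, voice.voice_id)
```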
### Managing voices
All voices can be managed through **My Voices**, where you can:
* Search, filter, and categorize voices
* Add descriptions and custom tags
* Organize voices for quick access
Learn how to manage your voice collection in [My Voices documentation](/docs/product-guides/voices/voice-library).
* **Search and Filter**: Find voices using keywords or tags.
* **Preview Samples**: Listen to voice demos before adding them to **My Voices**.
* **Add to Collection**: Save voices for easy access in your projects.
> **Tip**: Try searching by specific accents or genres, such as "Australian narration" or "child-like character."
### Supported languages
All ElevenLabs voices support multiple languages. Experiment by converting phrases like `Hello! こんにちは! Bonjour!` into speech to hear how your own voice sounds across different languages.
ElevenLabs supports voice creation in 32 languages. Match your voice selection to your target region for the most natural results.
* **Default Voices**: Optimized for multilingual use.
* **Generated and Cloned Voices**: Accent fidelity depends on input samples or selected attributes.
Our multilingual v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
Flash v2.5 supports 32 languages - all languages from v2 models plus:
*Hungarian, Norwegian & Vietnamese*
[Learn more about our models](/docs/models)
## FAQ
Yes, you can create custom voices with Voice Design or clone voices using Instant or
Professional Voice Cloning. Both options are accessible in **My Voices**.
Instant Voice Cloning uses short audio samples for near-instantaneous voice creation.
Professional Voice Cloning requires longer samples but delivers hyper-realistic, high-quality
results.
Professional Voice Clones can be shared privately or publicly in the Voice Library. Generated
voices and Instant Voice Clones cannot currently be shared.
Use **My Voices** to search, filter, and organize your voice collection. You can also delete,
tag, and categorize voices for easier management.
Use clean and consistent audio samples. For Professional Voice Cloning, provide a variety of
recordings in the desired speaking style.
Yes, Professional Voice Clones can be shared in the Voice Library. Instant Voice Clones and
Generated Voices cannot currently be shared.
Generated Voices are ideal for unique characters in games, animations, and creative
storytelling.
Go to **Voices > Voice Library** in your dashboard or access it via API.
# Voice remixing
> Learn how to transform and enhance existing voices by modifying their attributes including gender, accent, style, pacing, audio quality, and more.
Voice remixing is currently in alpha.
## Overview
ElevenLabs voice remixing is available on the core platform and via API. This feature transforms existing voices by allowing you to modify their core attributes while maintaining the unique characteristics that make them recognizable. This is particularly useful for adapting voices to different contexts, creating variations for different characters, or improving and/or changing the audio quality of existing voice profiles.
As an example, here is an original voice:
And here is a remixed version, switching to a San Francisco accent:
## Usage
The voice remixing model allows you to iteratively transform voices you own by adjusting multiple attributes through natural language prompts and customizable settings.
Integrate voice remixing into your application.
### Key Features
* **Attribute Modification**: Change gender, accent, speaking style, pacing, and audio quality of any voice you own
* **Iterative Editing**: Continue refining voices based on previously remixed versions
* **Script Flexibility**: Use default scripts or input custom scripts with v3 model audio tags like `[laughs]` or `[whispers]`
* **Prompt Strength Control**: Adjust remix intensity from low to high for precise control over transformations
### Remixing parameters
#### Prompt Strength
Voice remixing offers varying degrees of prompt strength to control how much your voice transforms:
* **Low**: Subtle changes that maintain most of the original voice characteristics
* **Medium**: Balanced transformation that modifies key attributes while preserving voice identity
* **High**: Strong adherence to remix prompt, may significantly change the tonality of the original voice
* **Max**: A full transformation of the voice, but at the cost of changing the voice entirely
#### Script Options
* **Default Scripts**: Pre-configured scripts optimized for voice remixing
* **Custom Scripts**: Input your own text with support for v3 model audio tags such as:
* `[laughs]` - Add laughter
* `[whispers]` - Convert to whispered speech
* `[sighs]` - Add sighing
* Additional emotion and style tags supported which can help craft the voice
### Tips and Tricks
#### Getting Started
Start with a high prompt strength early in your experimentation to understand the full range of transformation possibilities. You’ll need a voice to start with; if you haven’t already created one, experiment with the default voices available in your library to understand how different base voices respond to remixing.
You can create custom voices using [Voice Design](/docs/product-guides/voices/voice-design) as starting points for unique remixes.
#### Advanced Techniques
* **Iterative refinement**: Sometimes multiple iterations are needed to achieve the desired voice quality. Each remix can serve as the base for the next transformation
* **Combine attributes gradually**: When making multiple changes (e.g., accent and pacing), consider applying them in separate iterations for more control
* **Test with varied content**: Different scripts may highlight different aspects of your remixed voice
### Supported Voice Formats
#### Input
* Any cloned voice that you personally own (Instant Voice Clone or Professional Voice Clone)
* Voices created through our Voice Design product
#### Output
* Full-quality voice model in v3 (with backward compatibility with all other models)
* Iteratively editable voice that can be further remixed
## FAQ
Voice remixing costs are calculated based on the length of the test script used during the
remixing process.
No, voice remixing is only available for voices in your personal library that you have ownership
or appropriate permissions for.
There is no limit to iterative remixing. You can continue refining a voice through multiple
generations of remixes.
No, remixing creates a new voice variant. Your original voice remains unchanged and available in
your library.
Voice Design creates new voices from scratch using text prompts, while Voice Remixing modifies
existing voices you already own.
# Forced Alignment
> Learn how to turn spoken audio and text into a time-aligned transcript with ElevenLabs.
## Overview
The ElevenLabs [Forced Alignment](/docs/api-reference/forced-alignment) API turns spoken audio and text into a time-aligned transcript. This is useful for cases where you have an audio recording and a transcript, but need exact timestamps for each word or phrase in the transcript. This can be used for:
* Matching subtitles to a video recording
* Generating timings for an audiobook recording of an ebook
## Usage
The Forced Alignment API can be used by interfacing with the ElevenLabs API directly.
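A minimal Python sketch of the call is shown below; the `forced_alignment.create` method name and the word-level response fields are assumptions based on the Forced Alignment endpoint, so confirm them against the API reference.

```python
# Minimal sketch: align a transcript to a recording and print word-level timestamps.
# The forced_alignment.create method name and the words/start/end response fields
# are assumptions; check the API reference for exact names.
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("audiobook_chapter.mp3", "rb") as audio_file:
    alignment = elevenlabs.forced_alignment.create(
        file=audio_file,
        text="Hello, how are you?",  # plain text, not JSON
    )

for word in alignment.words:
    print(word.text, word.start, word.end)
```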
Learn how to integrate Forced Alignment into your application.
## Supported languages
Our multilingual v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
## FAQ
Forced alignment is a technique used to align spoken audio with text. You provide an audio file and a transcript of the audio file and the API will return a time-aligned transcript.
It's useful for cases where you have an audio recording and a transcript, but need exact timestamps for each word or phrase in the transcript.
The input text should be a plain string without any special formatting, such as JSON.
Example of good input text:
```
"Hello, how are you?"
```
Example of bad input text:
```
{
"text": "Hello, how are you?"
}
```
Forced Alignment costs the same as the [Speech to Text](/docs/capabilities/speech-to-text#pricing) API.
Forced Alignment does not support diarization. If you provide diarized text, the API will likely return unwanted results.
The maximum file size for Forced Alignment is 3GB.
For audio files, the maximum duration is 10 hours.
For the text input, the maximum length is 675k characters.
# Eleven Music
> Learn how to create studio-grade music with natural language prompts in any style with ElevenLabs.
## Overview
Eleven Music is a Text to Music model that generates studio-grade music with natural language prompts in any style. It's designed to understand intent and generate complete, context-aware audio based on your goals. The model understands both natural language and musical terminology, providing you with state-of-the-art features:
* Complete control over genre, style, and structure
* Vocals or just instrumental
* Multilingual, including English, Spanish, German, Japanese and more
* Edit the sound and lyrics of individual sections or the whole song
Listen to a sample:
Created in collaboration with labels, publishers, and artists, Eleven Music is cleared for nearly all commercial uses, from film and television to podcasts and social media videos, and from advertisements to gaming. For more information on supported usage across our different plans, [see our music terms](http://elevenlabs.io/music-terms).
## Usage
Eleven Music is available today on the ElevenLabs website, with public API access and integration into our Agents Platform coming soon.
Check out our prompt engineering guide to help you master the full range of the model’s capabilities.
Learn how to use Eleven Music with natural language prompts.
Step-by-step guide for using Eleven Music on the ElevenLabs Creative Platform.
Step-by-step guide for using Eleven Music with the API.
## FAQ
Generated music has a minimum duration of 10 seconds and a maximum duration of 5 minutes.
Yes, refer to the [developer quickstart](/docs/cookbooks/music) for more information.
Yes, Eleven Music is cleared for nearly all commercial uses, from film and television to
podcasts and social media videos, and from advertisements to gaming. For more information on
supported usage across our different plans, [see our music
terms](http://elevenlabs.io/music-terms).
Generated audio is provided in MP3 format with professional-grade quality (44.1kHz,
128-192kbps). Other audio formats will be supported soon.
# Streaming text to speech
> Learn how to stream text into speech in Python or Node.js.
In this tutorial, you'll learn how to convert [text to speech](https://elevenlabs.io/text-to-speech) with the ElevenLabs SDK. We’ll start by talking through how to generate speech and receive a file and then how to generate speech and stream the response back. Finally, as a bonus we’ll show you how to upload the generated audio to an AWS S3 bucket, and share it through a signed URL. This signed URL will provide temporary access to the audio file, making it perfect for sharing with users by SMS or embedding into an application.
If you want to jump straight to an example you can find them in the [Python](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python) and [Node.js](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node) example repositories.
## Requirements
* An ElevenLabs account with an API key (here’s how to [find your API key](/docs/developer-guides/quickstart#authentication)).
* Python or Node installed on your machine
* (Optionally) an AWS account with access to S3.
## Setup
### Installing our SDK
Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK for the text to speech conversion. You can install it using pip:
```bash Python
pip install elevenlabs
```
```bash TypeScript
npm install @elevenlabs/elevenlabs-js
```
Additionally, install necessary packages to manage your environmental variables:
```bash Python
pip install python-dotenv
```
```bash TypeScript
npm install dotenv
npm install @types/dotenv --save-dev
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```bash .env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
## Convert text to speech (file)
To convert text to speech and save it as a file, we’ll use the `convert` method of the ElevenLabs SDK and then save it locally as an `.mp3` file.
```python Python
import os
import uuid
from dotenv import load_dotenv
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs
load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
def text_to_speech_file(text: str) -> str:
# Calling the text_to_speech conversion API with detailed parameters
response = elevenlabs.text_to_speech.convert(
voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice
output_format="mp3_22050_32",
text=text,
model_id="eleven_turbo_v2_5", # use the turbo model for low latency
# Optional voice settings that allow you to customize the output
voice_settings=VoiceSettings(
stability=0.0,
similarity_boost=1.0,
style=0.0,
use_speaker_boost=True,
speed=1.0,
),
)
# uncomment the line below to play the audio back
# play(response)
# Generating a unique file name for the output MP3 file
save_file_path = f"{uuid.uuid4()}.mp3"
# Writing the audio to a file
with open(save_file_path, "wb") as f:
for chunk in response:
if chunk:
f.write(chunk)
print(f"{save_file_path}: A new audio file was saved successfully!")
# Return the path of the saved audio file
return save_file_path
```
```typescript TypeScript
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import * as dotenv from 'dotenv';
import { createWriteStream } from 'fs';
import { v4 as uuid } from 'uuid';
dotenv.config();
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
const elevenlabs = new ElevenLabsClient({
apiKey: ELEVENLABS_API_KEY,
});
export const createAudioFileFromText = async (text: string): Promise<string> => {
return new Promise(async (resolve, reject) => {
try {
const audio = await elevenlabs.textToSpeech.convert('JBFqnCBsd6RMkjVDRZzb', {
modelId: 'eleven_multilingual_v2',
text,
outputFormat: 'mp3_44100_128',
// Optional voice settings that allow you to customize the output
voiceSettings: {
stability: 0,
similarityBoost: 0,
useSpeakerBoost: true,
speed: 1.0,
},
});
const fileName = `${uuid()}.mp3`;
const fileStream = createWriteStream(fileName);
audio.pipe(fileStream);
fileStream.on('finish', () => resolve(fileName)); // Resolve with the fileName
fileStream.on('error', reject);
} catch (error) {
reject(error);
}
});
};
```
You can then run this function with:
```python Python
text_to_speech_file("Hello World")
```
```typescript TypeScript
await createAudioFileFromText('Hello World');
```
## Convert text to speech (streaming)
If you prefer to stream the audio directly without saving it to a file, you can use our streaming feature.
```python Python
import os
from typing import IO
from io import BytesIO
from dotenv import load_dotenv
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs
load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
def text_to_speech_stream(text: str) -> IO[bytes]:
# Perform the text-to-speech conversion
response = elevenlabs.text_to_speech.stream(
voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice
output_format="mp3_22050_32",
text=text,
model_id="eleven_multilingual_v2",
# Optional voice settings that allow you to customize the output
voice_settings=VoiceSettings(
stability=0.0,
similarity_boost=1.0,
style=0.0,
use_speaker_boost=True,
speed=1.0,
),
)
# Create a BytesIO object to hold the audio data in memory
audio_stream = BytesIO()
# Write each chunk of audio data to the stream
for chunk in response:
if chunk:
audio_stream.write(chunk)
# Reset stream position to the beginning
audio_stream.seek(0)
# Return the stream for further use
return audio_stream
```
```typescript TypeScript
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import * as dotenv from 'dotenv';
dotenv.config();
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
if (!ELEVENLABS_API_KEY) {
throw new Error('Missing ELEVENLABS_API_KEY in environment variables');
}
const elevenlabs = new ElevenLabsClient({
apiKey: ELEVENLABS_API_KEY,
});
export const createAudioStreamFromText = async (text: string): Promise<Buffer> => {
const audioStream = await elevenlabs.textToSpeech.stream('JBFqnCBsd6RMkjVDRZzb', {
modelId: 'eleven_multilingual_v2',
text,
outputFormat: 'mp3_44100_128',
// Optional voice settings that allow you to customize the output
voiceSettings: {
stability: 0,
similarityBoost: 1.0,
useSpeakerBoost: true,
speed: 1.0,
},
});
const chunks: Buffer[] = [];
for await (const chunk of audioStream) {
chunks.push(chunk);
}
const content = Buffer.concat(chunks);
return content;
};
```
You can then run this function with:
```python Python
text_to_speech_stream("This is James")
```
```typescript TypeScript
await createAudioStreamFromText('This is James');
```
## Bonus - Uploading to AWS S3 and getting a secure sharing link
Once your audio data is created as either a file or a stream you might want to share this with your users. One way to do this is to upload it to an AWS S3 bucket and generate a secure sharing link.
To upload the data to S3 you’ll need to add your AWS access key ID, secret access key and AWS region name to your `.env` file. Follow these steps to find the credentials:
1. Log in to your AWS Management Console: Navigate to the AWS home page and sign in with your account.
2. Access the IAM (Identity and Access Management) Dashboard: You can find IAM under "Security, Identity, & Compliance" on the services menu. The IAM dashboard manages access to your AWS services securely.
3. Create a New User (if necessary): On the IAM dashboard, select "Users" and then "Add user". Enter a user name.
4. Set the permissions: attach policies directly to the user according to the access level you wish to grant. For S3 uploads, you can use the AmazonS3FullAccess policy. However, it's best practice to grant least privilege, or the minimal permissions necessary to perform a task. You might want to create a custom policy that specifically allows only the necessary actions on your S3 bucket.
5. Review and create the user: Review your settings and create the user. Upon creation, you'll be presented with an access key ID and a secret access key. Be sure to download and securely save these credentials; the secret access key cannot be retrieved again after this step.
6. Get AWS region name: ex. us-east-1
If you do not have an AWS S3 bucket, you will need to create a new one by following these steps:
1. Access the S3 dashboard: You can find S3 under "Storage" on the services menu.
2. Create a new bucket: On the S3 dashboard, click the "Create bucket" button.
3. Enter a bucket name and click on the "Create bucket" button. You can leave the other bucket options as default. The newly added bucket will appear in the list.
Install `boto3` for interacting with AWS services from Python using `pip`, or the AWS SDK v3 packages for TypeScript using `npm`.
```bash Python
pip install boto3
```
```bash TypeScript
npm install @aws-sdk/client-s3
npm install @aws-sdk/s3-request-presigner
```
Then add the environment variables to `.env` file like so:
```
AWS_ACCESS_KEY_ID=your_aws_access_key_id_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here
AWS_REGION_NAME=your_aws_region_name_here
AWS_S3_BUCKET_NAME=your_s3_bucket_name_here
```
Add the following functions to upload the audio stream to S3 and generate a signed URL.
```python s3_uploader.py (Python)
import os
import boto3
import uuid
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
AWS_REGION_NAME = os.getenv("AWS_REGION_NAME")
AWS_S3_BUCKET_NAME = os.getenv("AWS_S3_BUCKET_NAME")
session = boto3.Session(
aws_access_key_id=AWS_ACCESS_KEY_ID,
aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
region_name=AWS_REGION_NAME,
)
s3 = session.client("s3")
def generate_presigned_url(s3_file_name: str) -> str:
signed_url = s3.generate_presigned_url(
"get_object",
Params={"Bucket": AWS_S3_BUCKET_NAME, "Key": s3_file_name},
ExpiresIn=3600,
) # URL expires in 1 hour
return signed_url
def upload_audiostream_to_s3(audio_stream) -> str:
s3_file_name = f"{uuid.uuid4()}.mp3" # Generates a unique file name using UUID
s3.upload_fileobj(audio_stream, AWS_S3_BUCKET_NAME, s3_file_name)
return s3_file_name
```
```typescript s3_uploader.ts (TypeScript)
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import * as dotenv from 'dotenv';
import { v4 as uuid } from 'uuid';
dotenv.config();
const { AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME, AWS_S3_BUCKET_NAME } =
process.env;
if (!AWS_ACCESS_KEY_ID || !AWS_SECRET_ACCESS_KEY || !AWS_REGION_NAME || !AWS_S3_BUCKET_NAME) {
throw new Error('One or more environment variables are not set. Please check your .env file.');
}
const s3 = new S3Client({
credentials: {
accessKeyId: AWS_ACCESS_KEY_ID,
secretAccessKey: AWS_SECRET_ACCESS_KEY,
},
region: AWS_REGION_NAME,
});
export const generatePresignedUrl = async (objectKey: string) => {
const getObjectParams = {
Bucket: AWS_S3_BUCKET_NAME,
Key: objectKey,
};
const command = new GetObjectCommand(getObjectParams);
const url = await getSignedUrl(s3, command, { expiresIn: 3600 });
return url;
};
export const uploadAudioStreamToS3 = async (audioStream: Buffer) => {
const remotePath = `${uuid()}.mp3`;
await s3.send(
new PutObjectCommand({
Bucket: AWS_S3_BUCKET_NAME,
Key: remotePath,
Body: audioStream,
ContentType: 'audio/mpeg',
})
);
return remotePath;
};
```
You can then call the upload function with the audio stream generated from the text.
```python Python
s3_file_name = upload_audiostream_to_s3(audio_stream)
```
```typescript TypeScript
const s3path = await uploadAudioStreamToS3(stream);
```
After uploading the audio file to S3, generate a signed URL to share access to the file. This URL will be time-limited, meaning it will expire after a certain period, making it secure for temporary sharing.
You can now generate a URL from a file with:
```python Python
signed_url = generate_presigned_url(s3_file_name)
print(f"Signed URL to access the file: {signed_url}")
```
```typescript TypeScript
const presignedUrl = await generatePresignedUrl(s3path);
console.log('Presigned URL:', presignedUrl);
```
If you want to use the file multiple times, store the S3 file path in your database and regenerate the signed URL each time you need it, rather than saving the signed URL directly, as it will expire.
To put it all together, you can use the following script:
```python main.py (Python)
import os
from dotenv import load_dotenv
load_dotenv()
from text_to_speech_stream import text_to_speech_stream
from s3_uploader import upload_audiostream_to_s3, generate_presigned_url
def main():
text = "This is James"
audio_stream = text_to_speech_stream(text)
s3_file_name = upload_audiostream_to_s3(audio_stream)
signed_url = generate_presigned_url(s3_file_name)
print(f"Signed URL to access the file: {signed_url}")
if __name__ == "__main__":
main()
```
```typescript index.ts (Typescript)
import 'dotenv/config';
import { generatePresignedUrl, uploadAudioStreamToS3 } from './s3_uploader';
import { createAudioFileFromText } from './text_to_speech_file';
import { createAudioStreamFromText } from './text_to_speech_stream';
(async () => {
// save the audio file to disk
const fileName = await createAudioFileFromText(
'Today, the sky is exceptionally clear, and the sun shines brightly.'
);
console.log('File name:', fileName);
// OR stream the audio, upload to S3, and get a presigned URL
const stream = await createAudioStreamFromText(
'Today, the sky is exceptionally clear, and the sun shines brightly.'
);
const s3path = await uploadAudioStreamToS3(stream);
const presignedUrl = await generatePresignedUrl(s3path);
console.log('Presigned URL:', presignedUrl);
})();
```
## Conclusion
You now know how to convert text into speech and generate a signed URL to share the audio file. This functionality opens up numerous opportunities for creating and sharing content dynamically.
Here are some examples of what you could build with this.
1. **Educational Podcasts**: Create personalized educational content that can be accessed by students on demand. Teachers can convert their lessons into audio format, upload them to S3, and share the links with students for a more engaging learning experience outside the traditional classroom setting.
2. **Accessibility Features for Websites**: Enhance website accessibility by offering text content in audio format. This can make information on websites more accessible to individuals with visual impairments or those who prefer auditory learning.
3. **Automated Customer Support Messages**: Produce automated and personalized audio messages for customer support, such as FAQs or order updates. This can provide a more engaging customer experience compared to traditional text emails.
4. **Audio Books and Narration**: Convert entire books or short stories into audio format, offering a new way for audiences to enjoy literature. Authors and publishers can diversify their content offerings and reach audiences who prefer listening over reading.
5. **Language Learning Tools**: Develop language learning aids that provide learners with audio lessons and exercises. This makes it possible to practice pronunciation and listening skills in a targeted way.
For more details, visit the following to see the full project files which give a clear structure for setting up your application:
For Python: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python)
For TypeScript: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node)
If you have any questions, please create an issue on the [elevenlabs-docs GitHub](https://github.com/elevenlabs/elevenlabs-docs/issues).
# Stitching multiple requests
> Learn how to maintain voice prosody over multiple chunks/generations.
When converting a large body of text into audio, you may encounter abrupt changes in prosody from one chunk to another. This can be particularly noticeable when converting text that spans multiple paragraphs or sections. In order to maintain voice prosody over multiple chunks, you can use the Request Stitching feature.
This feature allows you to provide context on what has already been generated and what will be generated in the future, helping to maintain a consistent voice and prosody throughout the entire text.
Request stitching is not available for the
`eleven_v3`
model.
Here's an example without Request Stitching:
And the same example with Request Stitching:
## How to use Request Stitching
Request Stitching is easiest when using the ElevenLabs SDKs.
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python
import os
from io import BytesIO
from elevenlabs.client import ElevenLabs
from elevenlabs.play import play
from dotenv import load_dotenv
load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
paragraphs = [
"The advent of technology has transformed countless sectors, with education ",
"standing out as one of the most significantly impacted fields.",
"In recent years, educational technology, or EdTech, has revolutionized the way ",
"teachers deliver instruction and students absorb information.",
"From interactive whiteboards to individual tablets loaded with educational software, ",
"technology has opened up new avenues for learning that were previously unimaginable.",
"One of the primary benefits of technology in education is the accessibility it provides.",
]
request_ids = []
audio_buffers = []
for paragraph in paragraphs:
# Usually we get back a stream from the convert function, but with_raw_response is
# used to get the headers from the response
with elevenlabs.text_to_speech.with_raw_response.convert(
text=paragraph,
voice_id="T7QGPtToiqH4S8VlIkMJ",
model_id="eleven_multilingual_v2",
previous_request_ids=request_ids
) as response:
request_ids.append(response._response.headers.get("request-id"))
# response._response.headers also contains useful information like 'character-cost',
# which shows the cost of the generation in characters.
audio_data = b''.join(chunk for chunk in response.data)
audio_buffers.append(BytesIO(audio_data))
combined_stream = BytesIO(b''.join(buffer.getvalue() for buffer in audio_buffers))
play(combined_stream)
```
```typescript
import "dotenv/config";
import { ElevenLabsClient, play } from "@elevenlabs/elevenlabs-js";
import { Readable } from "node:stream";
const elevenlabs = new ElevenLabsClient({
apiKey: process.env.ELEVENLABS_API_KEY,
});
const paragraphs = [
"The advent of technology has transformed countless sectors, with education ",
"standing out as one of the most significantly impacted fields.",
"In recent years, educational technology, or EdTech, has revolutionized the way ",
"teachers deliver instruction and students absorb information.",
"From interactive whiteboards to individual tablets loaded with educational software, ",
"technology has opened up new avenues for learning that were previously unimaginable.",
"One of the primary benefits of technology in education is the accessibility it provides.",
];
const requestIds: string[] = [];
const audioBuffers: Buffer[] = [];
for (const paragraph of paragraphs) {
// Usually we get back a stream from the convert function, but withRawResponse() is
// used to get the headers from the response
const response = await elevenlabs.textToSpeech
.convert("T7QGPtToiqH4S8VlIkMJ", {
text: paragraph,
modelId: "eleven_multilingual_v2",
previousRequestIds: requestIds,
})
.withRawResponse();
// response.rawResponse.headers also contains useful information like 'character-cost',
// which shows the cost of the generation in characters.
requestIds.push(response.rawResponse.headers.get("request-id") ?? "");
// Convert stream to buffer
const chunks: Buffer[] = [];
for await (const chunk of response.data) {
chunks.push(Buffer.from(chunk));
}
audioBuffers.push(Buffer.concat(chunks));
}
// Create a single readable stream from all buffers
const combinedStream = Readable.from(Buffer.concat(audioBuffers));
play(combinedStream);
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should hear the combined stitched audio play.
## FAQ
In order to use the request IDs of a previous request for conditioning, it needs to have processed completely. In the case of streaming, this means the audio has to be read completely from the response body.
The difference depends on the model, voice and voice settings used.
The request IDs should be no older than two hours.
Yes, unless you are an enterprise user with increased privacy requirements.
# Using pronunciation dictionaries
> Learn how to manage pronunciation dictionaries programmatically.
In this tutorial, you'll learn how to use a pronunciation dictionary with the ElevenLabs Python SDK. Pronunciation dictionaries are useful for controlling the specific pronunciation of words. We support both [IPA](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) and [CMU](https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary) alphabets. It is useful for correcting rare or specific pronunciations, such as names or companies. For example, the word `nginx` could be pronounced incorrectly. Instead, we can add our version of pronunciation. Based on IPA, `nginx` is pronounced as `/ˈɛndʒɪnˈɛks/`. Finding IPA or CMU of words manually can be difficult. Instead, LLMs like ChatGPT can help you to make the search easier.
We'll start by adding rules to a pronunciation dictionary from a file and comparing text-to-speech results generated with and without the dictionary. After that, we'll cover how to add and remove specific rules in existing dictionaries.
If you want to jump straight to the finished repo, you can find it [here](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/pronunciation-dictionaries/python).
Phoneme tags only work with `eleven_flash_v2`, `eleven_turbo_v2` & `eleven_monolingual_v1` models.
If you use phoneme tags with other models, they will silently skip the word.
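As a quick illustration of what a phoneme tag looks like in practice, here is a minimal sketch. It assumes the `elevenlabs` client initialized in the Setup section below; the sentence and the `text_with_phoneme` variable name are placeholders, not part of this tutorial:
```python
# Hypothetical example: an inline IPA phoneme tag for "nginx".
# Only the models listed above honour phoneme tags; other models skip the tagged word.
text_with_phoneme = (
    'We deploy <phoneme alphabet="ipa" ph="ˈɛndʒɪnˈɛks">nginx</phoneme> '
    "behind a load balancer."
)

audio = elevenlabs.text_to_speech.convert(
    text=text_with_phoneme,
    voice_id="21m00Tcm4TlvDq8ikWAM",  # same example voice used later in this guide
    model_id="eleven_turbo_v2",  # one of the models that supports phoneme tags
)
```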
## Requirements
* An ElevenLabs account with an API key (here’s how to [find your API key](/docs/api-reference/text-to-speech#authentication)).
* Python installed on your machine
* FFMPEG to play audio
## Setup
### Installing our SDK
Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK for updating the pronunciation dictionary and for text-to-speech conversion. You can install it using pip:
```bash
pip install elevenlabs
```
Additionally, install `python-dotenv` to manage your environment variables:
```bash
pip install python-dotenv
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
## Initialize the Client SDK
We'll start by initializing the client SDK.
```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load the API key from the .env file created above
load_dotenv()

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")

elevenlabs = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)
```
## Create a Pronunciation Dictionary From a File
To create a pronunciation dictionary from a file, we'll create a `.pls` file containing our rules.
These rules use the IPA alphabet and give `tomato` and `Tomato` a custom pronunciation. PLS files are case-sensitive, which is why we include the word both with and without a capital "T". Save the file as `dictionary.pls`.
```xml filename="dictionary.pls"
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/2005/01/pronunciation-lexicon.xsd"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
```
In the following snippet, we start by adding rules from a file and getting the uploaded dictionary back. Then we generate and play two text-to-speech samples to compare the output with and without the custom pronunciation dictionary.
```python
from elevenlabs import PronunciationDictionaryVersionLocator
from elevenlabs.play import play
with open("dictionary.pls", "rb") as f:
# this dictionary changes how tomato is pronounced
pronunciation_dictionary = elevenlabs.pronunciation_dictionaries.create_from_file(
file=f.read(), name="example"
)
audio_1 = elevenlabs.text_to_speech.convert(
text="Without the dictionary: tomato",
voice_id="21m00Tcm4TlvDq8ikWAM",
model_id="eleven_turbo_v2",
)
audio_2 = elevenlabs.text_to_speech.convert(
text="With the dictionary: tomato",
voice_id="21m00Tcm4TlvDq8ikWAM",
model_id="eleven_turbo_v2",
pronunciation_dictionary_locators=[
PronunciationDictionaryVersionLocator(
pronunciation_dictionary_id=pronunciation_dictionary.id,
version_id=pronunciation_dictionary.version_id,
)
],
)
# play the audio
play(audio_1)
play(audio_2)
```
## Remove Rules From a Pronunciation Dictionary
To remove rules from a pronunciation dictionary, call the `remove` method in the pronunciation dictionaries module. In the following snippet, we remove rules by their rule strings and get the updated dictionary back. Next, we generate and play another text-to-speech sample to hear the difference. Note that we take the pronunciation dictionary version ID from the `remove` method's response, as every change to a pronunciation dictionary generates a new version.
```python
pronunciation_dictionary_rules_removed = (
elevenlabs.pronunciation_dictionaries.rules.remove(
pronunciation_dictionary_id=pronunciation_dictionary.id,
rule_strings=["tomato", "Tomato"],
)
)
audio_3 = elevenlabs.text_to_speech.convert(
    text="With the rule removed: tomato",
    voice_id="21m00Tcm4TlvDq8ikWAM",
    model_id="eleven_turbo_v2",
pronunciation_dictionary_locators=[
PronunciationDictionaryVersionLocator(
pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
version_id=pronunciation_dictionary_rules_removed.version_id,
)
],
)
play(audio_3)
```
## Add Rules to Pronunciation Dictionary
We can add rules directly to the pronunciation dictionary by calling the `add` method. Afterwards, we generate and play another text-to-speech sample to hear the difference.
```python
from elevenlabs import PronunciationDictionaryRule_Phoneme
pronunciation_dictionary_rules_added = elevenlabs.pronunciation_dictionaries.rules.add(
pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
rules=[
PronunciationDictionaryRule_Phoneme(
type="phoneme",
alphabet="ipa",
string_to_replace="tomato",
phoneme="/tə'meɪtoʊ/",
),
PronunciationDictionaryRule_Phoneme(
type="phoneme",
alphabet="ipa",
string_to_replace="Tomato",
phoneme="/tə'meɪtoʊ/",
),
],
)
audio_4 = elevenlabs.text_to_speech.convert(
    text="With the rule added again: tomato",
    voice_id="21m00Tcm4TlvDq8ikWAM",
    model_id="eleven_turbo_v2",
pronunciation_dictionary_locators=[
PronunciationDictionaryVersionLocator(
pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
version_id=pronunciation_dictionary_rules_added.version_id,
)
],
)
play(audio_4)
```
## Conclusion
You now know how to create, update, and use a pronunciation dictionary when generating text-to-speech audio. This gives you fine-grained control over how specific words are pronounced, making the output more flexible for your use case.
For more details, visit our [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/pronunciation-dictionaries/python) to see the full project files which give a clear structure for setting up your application:
* `env.example`: Template for your environment variables.
* `main.py`: The complete code for the snippets above.
* `dictionary.pls`: Example custom dictionary in XML (PLS) format.
* `requirements.txt`: List of the Python packages used in this example.
If you have any questions, please create an issue on the [elevenlabs-docs GitHub](https://github.com/elevenlabs/elevenlabs-docs/issues).
# Streaming and Caching with Supabase
> Generate and stream speech through Supabase Edge Functions. Store speech in Supabase Storage and cache responses via built-in CDN.
## Introduction
In this tutorial you will learn how to build an edge API to generate, stream, store, and cache speech using Supabase Edge Functions, Supabase Storage, and ElevenLabs.
Find the [example project on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/supabase/stream-and-cache-storage).
## Requirements
* An ElevenLabs account with an [API key](https://elevenlabs.io/app/settings/api-keys).
* A [Supabase](https://supabase.com) account (you can sign up for a free account via [database.new](https://database.new)).
* The [Supabase CLI](https://supabase.com/docs/guides/local-development) installed on your machine.
* The [Deno runtime](https://docs.deno.com/runtime/getting_started/installation/) installed on your machine and optionally [set up in your favourite IDE](https://docs.deno.com/runtime/getting_started/setup_your_environment).
## Setup
### Create a Supabase project locally
After installing the [Supabase CLI](https://supabase.com/docs/guides/local-development), run the following command to create a new Supabase project locally:
```bash
supabase init
```
### Configure the storage bucket
You can configure the Supabase CLI to automatically generate a storage bucket by adding this configuration in the `config.toml` file:
```toml ./supabase/config.toml
[storage.buckets.audio]
public = false
file_size_limit = "50MiB"
allowed_mime_types = ["audio/mp3"]
objects_path = "./audio"
```
Upon running `supabase start` this will create a new storage bucket in your local Supabase
project. Should you want to push this to your hosted Supabase project, you can run `supabase seed
buckets --linked`.
### Configure background tasks for Supabase Edge Functions
To use background tasks in Supabase Edge Functions when developing locally, you need to add the following configuration in the `config.toml` file:
```toml ./supabase/config.toml
[edge_runtime]
policy = "per_worker"
```
When running with the `per_worker` policy, the function won't auto-reload on edits. You will need to
manually restart it by running `supabase functions serve`.
### Create a Supabase Edge Function for Speech generation
Create a new Edge Function by running the following command:
```bash
supabase functions new text-to-speech
```
If you're using VS Code or Cursor, select `y` when the CLI prompts "Generate VS Code settings for Deno? \[y/N]"!
### Set up the environment variables
Within the `supabase/functions` directory, create a new `.env` file and add the following variables:
```env supabase/functions/.env
# Find / create an API key at https://elevenlabs.io/app/settings/api-keys
ELEVENLABS_API_KEY=your_api_key
```
### Dependencies
The project uses a couple of dependencies:
* The [@supabase/supabase-js](https://supabase.com/docs/reference/javascript) library to interact with the Supabase database.
* The ElevenLabs [JavaScript SDK](/docs/quickstart) to interact with the text-to-speech API.
* The open-source [object-hash](https://www.npmjs.com/package/object-hash) to generate a hash from the request parameters.
Since Supabase Edge Functions use the [Deno runtime](https://deno.land/), you don't need to install the dependencies; instead, you can [import](https://docs.deno.com/examples/npm/) them via the `npm:` prefix.
## Code the Supabase Edge Function
In your newly created `supabase/functions/text-to-speech/index.ts` file, add the following code:
```ts supabase/functions/text-to-speech/index.ts
// Setup type definitions for built-in Supabase Runtime APIs
import 'jsr:@supabase/functions-js/edge-runtime.d.ts';
import { createClient } from 'jsr:@supabase/supabase-js@2';
import { ElevenLabsClient } from 'npm:elevenlabs';
import * as hash from 'npm:object-hash';
const supabase = createClient(
Deno.env.get('SUPABASE_URL')!,
Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
);
const elevenlabs = new ElevenLabsClient({
apiKey: Deno.env.get('ELEVENLABS_API_KEY'),
});
// Upload audio to Supabase Storage in a background task
async function uploadAudioToStorage(stream: ReadableStream, requestHash: string) {
const { data, error } = await supabase.storage
.from('audio')
.upload(`${requestHash}.mp3`, stream, {
contentType: 'audio/mp3',
});
console.log('Storage upload result', { data, error });
}
Deno.serve(async (req) => {
// To secure your function for production, you can for example validate the request origin,
// or append a user access token and validate it with Supabase Auth.
console.log('Request origin', req.headers.get('host'));
const url = new URL(req.url);
const params = new URLSearchParams(url.search);
const text = params.get('text');
const voiceId = params.get('voiceId') ?? 'JBFqnCBsd6RMkjVDRZzb';
const requestHash = hash.MD5({ text, voiceId });
console.log('Request hash', requestHash);
// Check storage for existing audio file
const { data } = await supabase.storage.from('audio').createSignedUrl(`${requestHash}.mp3`, 60);
if (data) {
console.log('Audio file found in storage', data);
const storageRes = await fetch(data.signedUrl);
if (storageRes.ok) return storageRes;
}
if (!text) {
return new Response(JSON.stringify({ error: 'Text parameter is required' }), {
status: 400,
headers: { 'Content-Type': 'application/json' },
});
}
try {
console.log('ElevenLabs API call');
const response = await elevenlabs.textToSpeech.stream(voiceId, {
output_format: 'mp3_44100_128',
model_id: 'eleven_multilingual_v2',
text,
});
const stream = new ReadableStream({
async start(controller) {
for await (const chunk of response) {
controller.enqueue(chunk);
}
controller.close();
},
});
// Branch stream to Supabase Storage
const [browserStream, storageStream] = stream.tee();
// Upload to Supabase Storage in the background
EdgeRuntime.waitUntil(uploadAudioToStorage(storageStream, requestHash));
// Return the streaming response immediately
return new Response(browserStream, {
headers: {
'Content-Type': 'audio/mpeg',
},
});
} catch (error) {
console.log('error', { error });
return new Response(JSON.stringify({ error: error.message }), {
status: 500,
headers: { 'Content-Type': 'application/json' },
});
}
});
```
### Code deep dive
There are a couple of things worth noting about this code. Let's step through it.
To handle the incoming request, use the `Deno.serve` handler. In the demo we don't validate the request origin, but you can for example validate the request origin, or append a user access token and validate it with [Supabase Auth](https://supabase.com/docs/guides/functions/auth).
From the incoming request, the function extracts the `text` and `voiceId` parameters. The `voiceId` parameter is optional and falls back to a default ElevenLabs voice ID (`JBFqnCBsd6RMkjVDRZzb`).
Using the `object-hash` library, the function generates a hash from the request parameters. This hash is used to check for existing audio files in Supabase Storage.
```ts {1,5-8}
Deno.serve(async (req) => {
// To secure your function for production, you can for example validate the request origin,
// or append a user access token and validate it with Supabase Auth.
console.log("Request origin", req.headers.get("host"));
const url = new URL(req.url);
const params = new URLSearchParams(url.search);
const text = params.get("text");
const voiceId = params.get("voiceId") ?? "JBFqnCBsd6RMkjVDRZzb";
const requestHash = hash.MD5({ text, voiceId });
console.log("Request hash", requestHash);
// ...
})
```
Supabase Storage comes with a [smart CDN built-in](https://supabase.com/docs/guides/storage/cdn/smart-cdn) allowing you to easily cache and serve your files.
Here, the function checks for an existing audio file in Supabase Storage. If the file exists, the function returns the file from Supabase Storage.
```ts {4,9}
const { data } = await supabase
.storage
.from("audio")
.createSignedUrl(`${requestHash}.mp3`, 60);
if (data) {
console.log("Audio file found in storage", data);
const storageRes = await fetch(data.signedUrl);
if (storageRes.ok) return storageRes;
}
```
Using the streaming capabilities of the ElevenLabs API, the function generates a stream. The benefit here is that even for larger text, you can start streaming the audio back to your user immediately, and then upload the stream to Supabase Storage in the background.
This allows for the best possible user experience, making even large text blocks feel magically quick. The magic happens on line 17, where the `stream.tee()` method splits the `ReadableStream` into two branches: one for the browser and one for Supabase Storage.
```ts {1,17,20,22-27}
try {
const response = await elevenlabs.textToSpeech.stream(voiceId, {
output_format: "mp3_44100_128",
model_id: "eleven_multilingual_v2",
text,
});
const stream = new ReadableStream({
async start(controller) {
for await (const chunk of response) {
controller.enqueue(chunk);
}
controller.close();
},
});
// Branch stream to Supabase Storage
const [browserStream, storageStream] = stream.tee();
// Upload to Supabase Storage in the background
EdgeRuntime.waitUntil(uploadAudioToStorage(storageStream, requestHash));
// Return the streaming response immediately
return new Response(browserStream, {
headers: {
"Content-Type": "audio/mpeg",
},
});
} catch (error) {
console.log("error", { error });
return new Response(JSON.stringify({ error: error.message }), {
status: 500,
headers: { "Content-Type": "application/json" },
});
}
```
The `EdgeRuntime.waitUntil` method on line 20 in the previous step is used to upload the audio stream to Supabase Storage in the background using the `uploadAudioToStorage` function. This allows the function to return the streaming response immediately to the browser, while the audio is being uploaded to Supabase Storage.
Once the storage object has been created, the next time a user makes a request with the same parameters, the function will return the audio file from the Supabase Storage CDN.
```ts {2,8-10}
// Upload audio to Supabase Storage in a background task
async function uploadAudioToStorage(
stream: ReadableStream,
requestHash: string,
) {
const { data, error } = await supabase.storage
.from("audio")
.upload(`${requestHash}.mp3`, stream, {
contentType: "audio/mp3",
});
console.log("Storage upload result", { data, error });
}
```
## Run locally
To run the function locally, run the following commands:
```bash
supabase start
```
Once the local Supabase stack is up and running, run the following command to start the function and observe the logs:
```bash
supabase functions serve
```
### Try it out
Navigate to `http://127.0.0.1:54321/functions/v1/text-to-speech?text=hello%20world` to hear the function in action.
Afterwards, navigate to `http://127.0.0.1:54323/project/default/storage/buckets/audio` to see the audio file in your local Supabase Storage bucket.
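If you'd rather exercise the endpoint from a script than a browser, here is a minimal sketch using Python's `requests` library (any HTTP client works); it mirrors the browser example above, and the output filename is arbitrary:
```python
# Quick local test: call the Edge Function and save the streamed MP3 to disk.
import requests

resp = requests.get(
    "http://127.0.0.1:54321/functions/v1/text-to-speech",
    params={"text": "hello world"},  # same text as the browser example above
)
resp.raise_for_status()

with open("hello.mp3", "wb") as f:
    f.write(resp.content)

print(f"Saved hello.mp3 ({len(resp.content)} bytes)")
```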
## Deploy to Supabase
If you haven't already, create a new Supabase account at [database.new](https://database.new) and link the local project to your Supabase account:
```bash
supabase link
```
Once done, run the following command to deploy the function:
```bash
supabase functions deploy
```
### Set the function secrets
Now that you have all your secrets set locally, you can run the following command to set the secrets in your Supabase project:
```bash
supabase secrets set --env-file supabase/functions/.env
```
## Test the function
The function is designed in a way that it can be used directly as a source for an `