# ElevenLabs
> ElevenLabs is an AI audio research and deployment company.
## Most popular
- Learn how to integrate ElevenLabs
- Deploy voice agents in minutes
- Learn how to use ElevenLabs
- Dive into our API reference
## Meet the models
**[Eleven v3](/docs/models#eleven-v3-alpha)**: Our most emotionally rich, expressive speech synthesis model
- Dramatic delivery and performance
- 70+ languages supported
- 10,000 character limit
- Support for natural multi-speaker dialogue

**Multilingual v2**: Lifelike, consistent quality speech synthesis model
- Natural-sounding output
- 29 languages supported
- 10,000 character limit
- Most stable on long-form generations

**Flash v2.5**: Our fast, affordable speech synthesis model
- Ultra-low latency (~75ms†)
- 32 languages supported
- 40,000 character limit
- Faster model, 50% lower price per character

**Turbo v2.5**: High quality, low-latency model with a good balance of quality and speed
- High quality voice generation
- 32 languages supported
- 40,000 character limit
- Low latency (~250ms-300ms†), 50% lower price per character

**Scribe v1**: State-of-the-art speech recognition model
- Accurate transcription in 99 languages
- Precise word-level timestamps
- Speaker diarization
- Dynamic audio tagging
[Explore all](/docs/models)
## Capabilities
- **Text to Speech**: Convert text into lifelike speech
- **Speech to Text**: Transcribe spoken audio into text
- **Voice changer**: Modify and transform voices
- **Voice isolator**: Isolate voices from background noise
- **Dubbing**: Dub audio and videos seamlessly
- **Sound effects**: Create cinematic sound effects
- **Voices**: Clone and design custom voices
- **Conversational AI**: Deploy intelligent voice agents
## Product guides
Explore our product guides for step-by-step guidance
† Excluding application & network latency
# Developer quickstart
> Learn how to make your first ElevenLabs API request.
The ElevenLabs API provides a simple interface to state-of-the-art audio [models](/docs/models) and [features](/docs/api-reference/introduction). Follow this guide to learn how to create lifelike speech with our Text to Speech API. See the [developer guides](/docs/quickstart#explore-our-developer-guides) for more examples with our other products.
## Using the Text to Speech API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via a `.env` file, or directly in your app's configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```bash title="Python"
pip install elevenlabs
pip install python-dotenv
```
```bash title="TypeScript"
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
To play the audio through your speakers, you may be prompted to install [MPV](https://mpv.io/)
and/or [ffmpeg](https://ffmpeg.org/).
Create a new file named `example.py` or `example.mts`, depending on your language of choice, and add the following code:
```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs import play

# Load ELEVENLABS_API_KEY from the .env file
load_dotenv()

elevenlabs = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

audio = elevenlabs.text_to_speech.convert(
    text="The first move is what sets everything in motion.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

play(audio)
```
```typescript
import { ElevenLabsClient, play } from '@elevenlabs/elevenlabs-js';
import 'dotenv/config';

// The client reads ELEVENLABS_API_KEY from the environment by default
const elevenlabs = new ElevenLabsClient();

const audio = await elevenlabs.textToSpeech.convert('JBFqnCBsd6RMkjVDRZzb', {
  text: 'The first move is what sets everything in motion.',
  modelId: 'eleven_multilingual_v2',
  outputFormat: 'mp3_44100_128',
});

await play(audio);
```
```bash title="Python"
python example.py
```
```bash title="TypeScript"
npx tsx example.mts
```
You should hear the audio play through your speakers.
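If you'd rather write the audio to disk than play it through speakers (for example on a headless server), the Python SDK exports a `save` helper alongside `play`; treat the helper name as an assumption to verify against the SDK reference. A minimal sketch:

```python
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs import save  # assumed helper, exported alongside `play`

load_dotenv()

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Same request as example.py, but written to an MP3 file instead of played back
audio = elevenlabs.text_to_speech.convert(
    text="The first move is what sets everything in motion.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

save(audio, "output.mp3")
```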
## Explore our developer guides
Now that you've made your first ElevenLabs API request, you can explore the other products that ElevenLabs offers.
- Convert spoken audio into text
- Deploy conversational voice agents
- Clone a voice
- Generate sound effects from text
- Transform the voice of an audio file
- Isolate background noise from audio
- Generate voices from a single text prompt
- Dub audio/video from one language to another
- Generate time-aligned transcripts for audio
# Models
> Learn about the models that power the ElevenLabs API.
## Flagship models
**[Eleven v3](/docs/models#eleven-v3-alpha)**: Our most emotionally rich, expressive speech synthesis model
- Dramatic delivery and performance
- 70+ languages supported
- 10,000 character limit
- Support for natural multi-speaker dialogue

**Multilingual v2**: Lifelike, consistent quality speech synthesis model
- Natural-sounding output
- 29 languages supported
- 10,000 character limit
- Most stable on long-form generations

**Flash v2.5**: Our fast, affordable speech synthesis model
- Ultra-low latency (~75ms†)
- 32 languages supported
- 40,000 character limit
- Faster model, 50% lower price per character

**Turbo v2.5**: High quality, low-latency model with a good balance of quality and speed
- High quality voice generation
- 32 languages supported
- 40,000 character limit
- Low latency (~250ms-300ms†), 50% lower price per character

**Scribe v1**: State-of-the-art speech recognition model
- Accurate transcription in 99 languages
- Precise word-level timestamps
- Speaker diarization
- Dynamic audio tagging
[Pricing](https://elevenlabs.io/pricing/api)
## Models overview
The ElevenLabs API offers a range of audio models optimized for different use cases, quality levels, and performance requirements.
| Model ID | Description | Languages |
| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `eleven_v3` | Human-like and expressive speech generation | [70+ languages](/docs/models#supported-languages) |
| `eleven_ttv_v3` | Human-like and expressive voice design model (Text to Voice) | [70+ languages](/docs/models#supported-languages) |
| `eleven_multilingual_v2` | Our most lifelike model with rich emotional expression | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_flash_v2_5` | Ultra-fast model optimized for real-time use (\~75ms†) | All `eleven_multilingual_v2` languages plus: `hu`, `no`, `vi` |
| `eleven_flash_v2` | Ultra-fast model optimized for real-time use (\~75ms†) | `en` |
| `eleven_turbo_v2_5` | High quality, low-latency model with a good balance of quality and speed (\~250ms-300ms) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru`, `hu`, `no`, `vi` |
| `eleven_turbo_v2` | High quality, low-latency model with a good balance of quality and speed (\~250ms-300ms) | `en` |
| `eleven_multilingual_sts_v2` | State-of-the-art multilingual voice changer model (Speech to Speech) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_multilingual_ttv_v2` | State-of-the-art multilingual voice designer model (Text to Voice) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_english_sts_v2` | English-only voice changer model (Speech to Speech) | `en` |
| `scribe_v1` | State-of-the-art speech recognition model | [99 languages](/docs/capabilities/speech-to-text#supported-languages) |
| `scribe_v1_experimental` | State-of-the-art speech recognition model with experimental features: improved multilingual performance, reduced hallucinations during silence, fewer audio tags, and better handling of early transcript termination | [99 languages](/docs/capabilities/speech-to-text#supported-languages) |
† Excluding application & network latency
These models are maintained for backward compatibility but are not recommended for new projects.
| Model ID | Description | Languages |
| ------------------------ | ---------------------------------------------------- | ---------------------------------------------- |
| `eleven_monolingual_v1` | First generation TTS model (outclassed by v2 models) | `en` |
| `eleven_multilingual_v1` | First multilingual model (outclassed by v2 models) | `en`, `fr`, `de`, `hi`, `it`, `pl`, `pt`, `es` |
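To check at runtime which model IDs your API key can use, you can query the models endpoint directly. Below is a minimal sketch using the `requests` library against `GET /v1/models` with the `xi-api-key` header; the printed response fields are assumptions to verify against the API reference.

```python
import os

import requests

# List the models available to your API key
response = requests.get(
    "https://api.elevenlabs.io/v1/models",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
)
response.raise_for_status()

for model in response.json():
    # `model_id` is the value passed as model_id in speech requests;
    # field names here are assumed, check the API reference.
    print(model.get("model_id"), "-", model.get("name"))
```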
## Eleven v3 (alpha)
This model is currently in alpha and is subject to change. Eleven v3 is not made for real-time
applications like Conversational AI. When integrating Eleven v3 into your application, consider
generating multiple outputs and allowing the user to select the best one.
Eleven v3 is our latest and most advanced speech synthesis model. It is a state-of-the-art model that produces natural, life-like speech with high emotional range and contextual understanding across multiple languages.
This model works well in the following scenarios:
* **Character Discussions**: Excellent for audio experiences with multiple characters that interact with each other.
* **Audiobook Production**: Perfect for long-form narration with complex emotional delivery.
* **Emotional Dialogue**: Generate natural, lifelike dialogue with high emotional range and contextual understanding.
With Eleven v3 comes a new Text to Dialogue API, which generates natural, lifelike dialogue with high emotional range and contextual understanding across multiple languages. Eleven v3 can also be used with the Text to Speech API to generate expressive single-voice speech.
Eleven v3 API access is currently not publicly available, but will be soon. To request access,
please [contact our sales team](https://elevenlabs.io/contact-sales).
Read more about the Text to Dialogue API [here](/docs/capabilities/text-to-dialogue).
### Model selection
The model can be used with the Text to Speech API by selecting the `eleven_v3` model ID. The Text to Dialogue API defaults to using the v3 model. Alternatively, you can select a preview version, formatted as `eleven_v3_preview_YYYY_MM_DD`. When a preview version has been evaluated and is ready for production, it will be promoted to the `eleven_v3` model ID. Use the evergreen `eleven_v3` model ID for the most stable experience and the preview version for the latest features.
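As a concrete sketch of the above (using the Python SDK from the quickstart), the only difference between the evergreen model and a pinned preview build is the `model_id` string; the preview ID below is a placeholder, not a real version.

```python
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Evergreen alias: always points at the latest production-ready v3 build
stable_audio = elevenlabs.text_to_speech.convert(
    text="The first move is what sets everything in motion.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_v3",
    output_format="mp3_44100_128",
)

# Pinned preview build: opts into the latest features before they are
# promoted to `eleven_v3`. Replace the placeholder date with a listed version.
preview_audio = elevenlabs.text_to_speech.convert(
    text="The first move is what sets everything in motion.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_v3_preview_YYYY_MM_DD",  # placeholder preview ID
    output_format="mp3_44100_128",
)
```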
### Supported languages
The Eleven v3 model supports 70+ languages, including:
*Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kirghiz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym).*
## Multilingual v2
Eleven Multilingual v2 is our most advanced, emotionally-aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.
The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker's unique characteristics and accent.
This model excels in scenarios requiring high-quality, emotionally nuanced speech:
* **Character Voiceovers**: Ideal for gaming and animation due to its emotional range.
* **Professional Content**: Well-suited for corporate videos and e-learning materials.
* **Multilingual Projects**: Maintains consistent voice quality across language switches.
* **Stable Quality**: Produces consistent, high-quality audio output.
While it has a higher latency & cost per character than Flash models, it delivers superior quality for projects where lifelike speech is important.
Our v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
## Flash v2.5
Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (\~75ms†) across 32 languages.
The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.
This model is particularly well-suited for:
* **Conversational AI**: Perfect for real-time voice agents and chatbots.
* **Interactive Applications**: Ideal for games and applications requiring immediate response.
* **Large-Scale Processing**: Efficient for bulk text-to-speech conversion.
With its lower price point and 75ms latency, Flash v2.5 is the cost-effective option for anyone needing fast, reliable speech synthesis across multiple languages.
Flash v2.5 supports 32 languages - all languages from v2 models plus:
*Hungarian, Norwegian & Vietnamese*
† Excluding application & network latency
### Considerations
When using Flash v2.5, numbers aren't normalized in the way you might expect. For example, phone numbers might be read out in a way that isn't clear to the user. Dates and currencies are affected in a similar manner.
This is expected as normalization is disabled for Flash v2.5 to maintain the low latency.
The Multilingual v2 model does a better job of normalizing numbers, so we recommend using it for phone numbers and other cases where number normalization is important.
For low-latency or Conversational AI applications, best practice is to have your LLM [normalize the text](/docs/best-practices/prompting/normalization) before passing it to the TTS model.
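As an illustration of that practice, the sketch below normalizes a sentence before sending it to Flash v2.5. The `normalize_for_tts` helper is hypothetical and deliberately naive; in production you would have your LLM produce already-normalized text, as described in the guide linked above.

```python
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))


def normalize_for_tts(text: str) -> str:
    """Hypothetical stand-in for LLM-side normalization: spell out
    numbers, dates and currencies before sending text to Flash v2.5."""
    replacements = {
        "$42.50": "forty-two dollars and fifty cents",
        "2025-06-23": "June twenty-third, twenty twenty-five",
    }
    for raw, spoken in replacements.items():
        text = text.replace(raw, spoken)
    return text


audio = elevenlabs.text_to_speech.convert(
    text=normalize_for_tts("Your total is $42.50, due on 2025-06-23."),
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",  # text normalization is disabled for Flash models
    output_format="mp3_44100_128",
)
```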
## Turbo v2.5
Eleven Turbo v2.5 is our high-quality, low-latency model with a good balance of quality and speed.
This model is an ideal choice for all scenarios where you'd use Flash v2.5, but where you're willing to trade off latency for higher quality voice generation.
## Model selection guide
- Use `eleven_multilingual_v2` for high-fidelity audio output with rich emotional expression.
- Use Flash models for real-time applications (~75ms latency).
- Use `eleven_multilingual_v2` (29 languages) or `eleven_flash_v2_5` (32 languages) for multilingual projects.
- Use `eleven_turbo_v2_5` for a good balance between quality and speed.
- Use `eleven_multilingual_v2` for long-form content: ideal for professional content, audiobooks & video narration.
- Use `eleven_flash_v2_5`, `eleven_flash_v2`, `eleven_multilingual_v2`, `eleven_turbo_v2_5` or `eleven_turbo_v2` for real-time conversational applications.
- Use `eleven_multilingual_sts_v2` for Speech to Speech (voice changer) conversion.
## Character limits
The maximum number of characters supported in a single text-to-speech request varies by model.
| Model ID | Character limit | Approximate audio duration |
| ------------------------ | --------------- | -------------------------- |
| `eleven_flash_v2_5` | 40,000 | \~40 minutes |
| `eleven_flash_v2` | 30,000 | \~30 minutes |
| `eleven_turbo_v2_5` | 40,000 | \~40 minutes |
| `eleven_turbo_v2` | 30,000 | \~30 minutes |
| `eleven_multilingual_v2` | 10,000 | \~10 minutes |
| `eleven_multilingual_v1` | 10,000 | \~10 minutes |
| `eleven_english_sts_v2` | 10,000 | \~10 minutes |
| `eleven_english_sts_v1` | 10,000 | \~10 minutes |
For longer content, consider splitting the input into multiple requests.
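One simple approach is to split the text on sentence boundaries and issue one request per chunk, as in the sketch below (assuming the 10,000 character limit of `eleven_multilingual_v2` and the client setup from the quickstart).

```python
import re


def split_text(text: str, limit: int = 10_000) -> list[str]:
    """Greedily pack whole sentences into chunks that stay below the model's
    character limit. Assumes no single sentence exceeds the limit."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


# One request per chunk keeps every call under the model's character limit:
# audio_parts = [
#     elevenlabs.text_to_speech.convert(
#         text=chunk,
#         voice_id="JBFqnCBsd6RMkjVDRZzb",
#         model_id="eleven_multilingual_v2",
#     )
#     for chunk in split_text(long_text)
# ]
```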
## Scribe v1
Scribe v1 is our state-of-the-art speech recognition model designed for accurate transcription across 99 languages. It provides precise word-level timestamps and advanced features like speaker diarization and dynamic audio tagging.
This model excels in scenarios requiring accurate speech-to-text conversion:
* **Transcription Services**: Perfect for converting audio/video content to text
* **Meeting Documentation**: Ideal for capturing and documenting conversations
* **Content Analysis**: Well-suited for audio content processing and analysis
* **Multilingual Recognition**: Supports accurate transcription across 99 languages
Key features:
* Accurate transcription with word-level timestamps
* Speaker diarization for multi-speaker audio
* Dynamic audio tagging for enhanced context
* Support for 99 languages
Read more about Scribe v1 [here](/docs/capabilities/speech-to-text).
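As a rough illustration with the Python SDK, a transcription request could look like the sketch below; the `diarize` and `tag_audio_events` parameter names are assumptions based on the feature list above, so confirm them against the Speech to Text API reference.

```python
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("meeting.mp3", "rb") as audio_file:
    transcript = elevenlabs.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",
        diarize=True,           # assumed flag: label speakers in multi-speaker audio
        tag_audio_events=True,  # assumed flag: annotate non-speech events
    )

# The response includes the full text plus word-level timing details
print(transcript.text)
```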
## Concurrency and priority
Your subscription plan determines how many requests can be processed simultaneously and the priority level of your requests in the queue.
Speech to Text has an elevated concurrency limit.
Once the concurrency limit is met, subsequent requests are processed in a queue alongside lower-priority requests.
In practice this typically only adds \~50ms of latency.
| Plan | Concurrency Limit (Multilingual v2) | Concurrency Limit (Turbo & Flash) | STT Concurrency Limit | Priority level |
| ---------- | ----------------------------------------- | --------------------------------------- | --------------------- | -------------- |
| Free | 2 | 4 | 10 | 3 |
| Starter | 3 | 6 | 15 | 4 |
| Creator | 5 | 10 | 25 | 5 |
| Pro | 10 | 20 | 50 | 5 |
| Scale | 15 | 30 | 75 | 5 |
| Business | 15 | 30 | 75 | 5 |
| Enterprise | Elevated | Elevated | Elevated | Highest |
The response headers include `current-concurrent-requests` and `maximum-concurrent-requests` which you can use to monitor your concurrency.
How endpoint requests are made impacts concurrency limits:
* With HTTP, each request counts individually toward your concurrency limit.
* With a WebSocket, only the time during which our model is generating audio counts towards your concurrency limit, which means that for most of the time an open WebSocket doesn't count towards your concurrency limit at all.
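For example, a raw HTTP call exposes those headers directly; the sketch below assumes the `requests` library and the documented `POST /v1/text-to-speech/{voice_id}` endpoint with an `xi-api-key` header.

```python
import os

import requests

response = requests.post(
    "https://api.elevenlabs.io/v1/text-to-speech/JBFqnCBsd6RMkjVDRZzb",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "Checking concurrency headroom.",
        "model_id": "eleven_flash_v2_5",
    },
)
response.raise_for_status()

# Compare the two documented headers to throttle client-side before
# hitting your plan's limit.
current = int(response.headers["current-concurrent-requests"])
maximum = int(response.headers["maximum-concurrent-requests"])
print(f"Using {current} of {maximum} concurrent requests")
```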
### Understanding concurrency limits
The concurrency limit associated with your plan should not be interpreted as the maximum number of simultaneous conversations, phone calls, character voiceovers, etc. that can be handled at once.
The actual number depends on several factors, including the specific AI voices used and the characteristics of the use case.
As a general rule of thumb, a concurrency limit of 5 can typically support up to approximately 100 simultaneous audio broadcasts.
This is because generating the audio takes far less time than playing it back, so each broadcast only occupies a concurrent request for a small fraction of its duration.
As an example, 4 concurrent calls with different users can be facilitated while only hitting 2 concurrent requests.
Where TTS is used to facilitate dialogue, a concurrency limit of 5 can support about 100 broadcasts for balanced conversations between AI agents and human participants.
For use cases in which the AI agent speaks less frequently than the human, such as customer support interactions, more than 100 simultaneous conversations could be supported.
Generally, more than 100 simultaneous character voiceovers can be supported for a concurrency limit of 5.
The number can vary depending on the character’s dialogue frequency, the length of pauses, and in-game actions between lines.
Concurrent dubbing streams generally follow the same rule of thumb.
If the broadcast involves periods of conversational pauses (e.g. because of a soundtrack, visual scenes, etc), more simultaneous dubbing streams than the suggestion may be possible.
If you exceed your plan's concurrency limits at any point and you are on the Enterprise plan, model requests may still succeed, albeit slower, on a best efforts basis depending on available capacity.
To increase your concurrency limit & queue priority, [upgrade your subscription
plan](https://elevenlabs.io/pricing/api).
Enterprise customers can request a higher concurrency limit by contacting their account manager.
# June 23, 2025
### Tools migration
- **Conversational AI tools migration**: The way tools in Conversational AI are handled is being migrated; see the guide on [what's changing and how to migrate](/docs/conversational-ai/customization/tools/agent-tools-deprecation)
### Text to Speech
- **Audio tags automatic removal**: Audio tags are now automatically removed when switching from V3 to V2 models, ensuring optimal compatibility and performance.
### Conversational AI
- **Tools management UI**: Added a new comprehensive [tools management interface](/app/conversational-ai/tools) for creating, configuring, and managing tools across all agents in your workspace.
- **Streamlined agent creation**: Introduced a new [agent creation flow](/app/conversational-ai/new) with improved user experience and better configuration options.
- **Agent duplication**: Added the ability to [duplicate existing agents](/docs/api-reference/agents/duplicate), allowing you to quickly create variations of successful agent configurations.
### SIP Trunking
- **Inbound media encryption**: Added support for configurable [inbound media encryption settings](/docs/conversational-ai/phone-numbers/sip-trunking#configure-transport-and-encryption) for SIP trunk phone numbers, enhancing security options.
### Voices
- **Famous voice category**: Added a new "famous" voice category to the voice library, expanding the available voice options for users.
### Dubbing
- **CSV frame rate control**: Added `csv_fps` parameter to control frame rate when parsing CSV files for dubbing projects, providing more precise timing control.
### SDKs
- **ElevenLabs JavaScript SDK v2.4.0**: Released with new Conversational AI SDK support for Node.js. [View release notes](https://github.com/elevenlabs/elevenlabs-js/releases)
- **ElevenLabs Python SDK v2.5.0**: Updated with enhanced Conversational AI capabilities. [View release notes](https://github.com/elevenlabs/elevenlabs-python/releases)
### API
## New Endpoints
### Conversational AI
- [Duplicate agent](/docs/api-reference/agents/duplicate) - Create a new agent by duplicating an existing one
- [Create tool](/docs/api-reference/tools/create) - Add a new tool to the available tools in the workspace
- [List tools](/docs/api-reference/tools/list) - Retrieve all tools available in the workspace
- [Get tool](/docs/api-reference/tools/get) - Retrieve a specific tool configuration
- [Update tool](/docs/api-reference/tools/update) - Update an existing tool configuration
- [Delete tool](/docs/api-reference/tools/delete) - Remove a tool from the workspace
- [Get tool dependent agents](/docs/api-reference/tools/get-dependent-agents) - List all agents that depend on a specific tool
## Updated Endpoints
### Conversational AI
- **Agent configuration**:
- Added `built_in_tools` configuration for system tools management
- Deprecated inline `tools` configuration in favor of `tool_ids` for better tool management
- **Tool system**:
- Refactored tool configuration structure to use centralized tool management
### Dubbing
- **CSV processing**:
- [Create dubbing project](/docs/api-reference/dubbing/create) - Added `csv_fps` parameter for custom frame rate control
### SIP Trunking
- **Phone number creation**:
- [Create SIP trunk phone number](/docs/api-reference/phone-numbers/create) - Added `inbound_media_encryption` parameter for security configuration
### Voice Library
- **Voice categories**:
- Updated voice response models to include "famous" as a new voice category option
- Enhanced voice search and filtering capabilities
# June 17, 2025
### Conversational AI
- **Dynamic variables in simulated conversations**: Added support for [dynamic variable population in simulated conversations](/docs/api-reference/agents/simulate-conversation#request.body.simulation_specification.simulated_user_config.dynamic_variables), enabling more flexible and context-aware conversation testing scenarios.
- **MCP server integration**: Introduced comprehensive support for [Model Context Protocol (MCP) servers](/docs/conversational-ai/customization/mcp), allowing agents to connect to external tools and services through standardized protocols with configurable approval policies.
- **Burst pricing for extra concurrency**: Added [bursting capability](/docs/conversational-ai/guides/burst-pricing) for workspace call limits, automatically allowing up to 3x the configured concurrency limit during peak usage for overflow capacity.
### Studio
- **JSON content initialization**: Added support for initializing Studio projects with structured JSON content through the `from_content_json` parameter, enabling programmatic project creation with predefined chapters, blocks, and voice configurations.
### Workspaces
- **Webhook management**: Introduced workspace-level webhook management capabilities, allowing administrators to view, configure, and monitor webhook integrations across the entire workspace with detailed usage tracking and failure diagnostics.
### API
## New Endpoints
### Conversational AI - MCP Servers
- [Create MCP server](/docs/api-reference/mcp/create) - Create a new MCP server configuration in the workspace
- [List MCP servers](/docs/api-reference/mcp/list) - Retrieve all MCP server configurations available in the workspace
- [Get MCP server](/docs/api-reference/mcp/get) - Retrieve a specific MCP server configuration from the workspace
- [Update MCP server approval policy](/docs/api-reference/mcp/approval-policies/update) - Update the approval policy configuration for an MCP server
- [Create MCP server tool approval](/docs/api-reference/mcp/approval-policies/create) - Add approval for a specific MCP tool when using per-tool approval mode
- [Delete MCP server tool approval](/docs/api-reference/mcp/approval-policies/delete) - Remove approval for a specific MCP tool when using per-tool approval mode
### Workspace
- [Get workspace webhooks](/docs/api-reference/webhooks/list) - Retrieve all webhook configurations for the workspace with optional usage information
## Updated Endpoints
### Conversational AI
- **Agent simulation**:
- [Simulate conversation](/docs/api-reference/agents/simulate-conversation) - Added `dynamic_variables` parameter for populating conversation context with runtime values
- [Simulate conversation stream](/docs/api-reference/agents/simulate-conversation-stream) - Added `dynamic_variables` parameter for streaming conversation simulations
- **Agent configuration**:
- [Agent platform settings](/docs/api-reference/agents/update#request.body.platform_settings.call_limits) - Added `bursting_enabled` parameter to control burst pricing for call limits
- **WebSocket events**:
- Enhanced `ClientEvent` enum to include `mcp_connection_status` for real-time MCP server monitoring
- **Conversation charging**:
- Added `is_burst` indicator to conversation metadata for tracking burst pricing usage
### Studio
- [Create Studio project](/docs/api-reference/studio/add-project#request.body.from_content_json.from_content_json) - Added `from_content_json` parameter for JSON-based project setup
### User Management
- **User profile**:
- [Get user](/docs/api-reference/user/get) - Deprecated `can_use_delayed_payment_methods` field in user response model
### Subscription Management
- **Subscription status**:
- Removed `canceled` and `unpaid` from available subscription status types, streamlining subscription state management
# June 8, 2025
### Text to Speech
- **Eleven v3 (alpha)**: Released Eleven v3 (alpha), our most expressive Text to Speech model, as a research preview.
### Conversational AI
- **Custom voice settings in multi-voice**: Added support for configuring individual [voice settings per supported voice](/docs/conversational-ai/customization/voice/multi-voice-support) in multi-voice agents, allowing fine-tuned control over stability, speed, similarity boost, and streaming latency for each voice.
- **Silent transfer to human in Twilio**: Added backend configuration support for silent (cold) [transfer to human](/docs/conversational-ai/customization/tools/system-tools/transfer-to-human) in the Twilio native integration, enabling seamless handoff without announcing the transfer to callers.
- **Batch calling retry and cancel**: Added support for retrying outbound calls to phone numbers that did not respond during a [batch call](/docs/conversational-ai/phone-numbers/batch-calls), along with the ability to cancel ongoing batch operations for better campaign management.
- **LLM pinning**: Added support for [versioned LLM models with explicit checkpoint identifiers](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm)
- **Custom LLM headers**: Added support for passing [custom headers to custom LLMs](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.custom_llm.request_headers)
- **Fixed issue in non-Latin languages**: Fixed an issue causing some conversations in non-Latin-alphabet languages to fail.
### SDKs
- **Python SDK v2.3.0**: Released [Python SDK v2.3.0](https://github.com/elevenlabs/elevenlabs-python/releases/tag/v2.3.0)
- **JavaScript SDK v2.2.0**: Released [JavaScript SDK v2.2.0](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v2.2.0)
### API
## New Endpoints
### Conversational AI
- **Batch Calling**:
- [Cancel batch call](/docs/api-reference/batch-calling/cancel) - Cancel a running batch call and set all recipients to cancelled status
- [Retry batch call](/docs/api-reference/batch-calling/retry) - Retry a batch call by setting completed recipients back to pending status
- **Knowledge Base RAG**:
- [Get document RAG indexes](/docs/api-reference/knowledge-base/get-document-rag-indexes) - Get information about all RAG indexes of a knowledge base document
- [Delete document RAG index](/docs/api-reference/knowledge-base/delete-document-rag-index) - Delete a specific RAG index for a knowledge base document
- [RAG index overview](/docs/api-reference/knowledge-base/rag-index-overview) - Get total size and information of RAG indexes used by knowledge base documents
## Updated Endpoints
### Conversational AI
- **Supported Voices**:
- [Agent configuration](/docs/api-reference/agents/update#request.body.tts.supported_voices) - Added `optimize_streaming_latency`, `stability`, `speed`, and `similarity_boost` parameters for per-voice TTS customization
- **Transfer to Human**:
- [Agent configuration](/docs/api-reference/agents/update#request.body.system_tools.transfer_to_number) - Added `enable_client_message` parameter to control whether a message is played to the client during transfer
- **Knowledge Base**:
- Knowledge base documents now use `supported_usages` instead of `prompt_injectable` for better usage mode control
- RAG index creation now returns enhanced response model with usage information
- **Custom LLM**:
- [Agent configuration](/docs/api-reference/agents/update#request.body.llm.custom_llm) - Added `request_headers` parameter for custom header configuration
- **Widget Configuration**:
- [Agent platform settings](/docs/api-reference/agents/update#request.body.platform_settings.widget_config) - Added comprehensive `styles` configuration for widget appearance customization
- **LLM**:
- Added support for [versioned LLM models](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm) with explicit version identifiers
# June 1, 2025
### Conversational AI
- **Multi-voice support for agents**: Enable conversational AI agents to [dynamically switch between different voices](/docs/conversational-ai/customization/voice/multi-voice-support) during conversations for multi-character storytelling, language tutoring, and role-playing scenarios.
- **Claude Sonnet 4 support**: Added [Claude Sonnet 4 as a new LLM option](/docs/conversational-ai/customization/llm#anthropic) for conversational agents, providing enhanced reasoning capabilities and improved performance.
- **Genesys Cloud integration**: Introduced AudioHook Protocol integration for seamless connection with [Genesys Cloud contact center platform](/docs/conversational-ai/phone-numbers/c-caa-s-integrations/genesys).
- **Force delete knowledge base documents**: Added [`force` parameter](/docs/api-reference/knowledge-base/delete#request.query.force.force) to knowledge base document deletion, allowing removal of documents even when used by agents.
- **Multimodal widget**: Added text input and text-only mode defaults for better user experience with [improved widget configuration](/docs/conversational-ai/customization/widget).
### API
## Updated Endpoints
### Speech to Text
- [Create transcript](/docs/api-reference/speech-to-text/convert) - Added `webhook` parameter for asynchronous processing with webhook delivery
### Conversational AI
- **Knowledge Base**:
- [Delete knowledge base document](/docs/api-reference/knowledge-base/delete) - Added `force` query parameter to delete documents regardless of agent dependencies
- **Widget**:
- [Widget configuration](/docs/api-reference/widget/get#response.body.widget_config.supports_text_only) - Added text input and text-only mode support for multi-modality
# May 26, 2025
### Forced Alignment
- **Forced alignment improvements**: Fixed a rare failure case in forced alignment processing to improve reliability.
### Voices
- **Live moderated voices filter**: Added `include_live_moderated` query parameter to the shared voices endpoint, allowing you to include or exclude voices that are live moderated.
### Conversational AI
- **Secret dynamic variables**: Added support for specifying dynamic variables as secrets with the `secret__` prefix. Secret dynamic variables can only be used in webhook tool headers and are never sent to an LLM, enhancing security for sensitive data. [Learn more](/docs/conversational-ai/customization/personalization/dynamic-variables#secret-dynamic-variables).
- **Skip turn system tool**: Introduced a new system tool called **skip_turn**. When enabled, the agent will skip its turn if the user explicitly indicates they need a moment to think or perform an action (e.g., "just a sec", "give me a minute"). This prevents turn timeout from being triggered during intentional user pauses. See the [skip turn tool docs](/docs/conversational-ai/customization/tools/system-tools/skip-turn) for more information.
- **Text input support**: Added text input support in WebSocket connections via a `user_message` event with a text field. Also added `user_activity` event support to indicate typing or other UI activity, improving agent turn-taking when there's interleaved text and audio input.
- **RAG chunk limit**: Added ability to configure the [maximum number of chunks](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.rag.max_retrieved_rag_chunks_count) collected during RAG retrieval, giving users
more control over context window usage and costs.
- **Enhanced widget configuration**: Expanded widget customization options to include [text input and text only mode](/docs/api-reference/widget/get#response.body.widget_config.text_only).
- **LLM usage calculator**: Introduced tools to calculate expected LLM token usage and costs for agents, helping with cost estimation and planning.
### Audio Native
- **Accessibility improvements**: Enhanced accessibility for the AudioNative player with multiple improvements:
- Added aria-labels for all buttons
- Enabled keyboard navigation for all interactive elements
- Made progress bar handle focusable and keyboard-accessible
- Improved focus indicator visibility for better screen reader compatibility
### API
## New Endpoints
- Added 3 new endpoints:
- [Get Agent Knowledge Base Size](/docs/conversational-ai/api-reference/knowledge-base/size) - Returns the number of pages in the agent's knowledge base.
- [Calculate Agent LLM Usage](/docs/conversational-ai/api-reference/llm-usage/calculate) - Calculates expected number of LLM tokens needed for the specified agent.
- [Calculate LLM Usage](/docs/conversational-ai/api-reference/llm-usage/calculate) - Returns a list of LLM models and the expected cost for using them based on the provided values.
## Updated Endpoints
### Voices
- [Get Shared Voices](/docs/api-reference/voices#get-shared-voices) - Added `include_live_moderated` query parameter to `GET /v1/shared-voices` to filter voices by live moderation status.
### Conversational AI
- **Agent Configuration**:
- Enhanced system tools with new `skip_turn` tool configuration
- Improved RAG configuration with `max_retrieved_rag_chunks_count` parameter
- **Widget Configuration**:
- Added support for text-only mode
- **Batch Calling**:
- Batch call responses now include `phone_provider` field with default value "twilio"
### Text to Speech
- **Voice Settings**:
- Added `quality` parameter to voice settings for controlling audio generation quality
- Model response schema updated to include `can_use_quality` field
# May 19, 2025
### SDKs
- **SDKs V2**: Released new v2 SDKs for both [Python](https://github.com/elevenlabs/elevenlabs-python) and [JavaScript](https://github.com/elevenlabs/elevenlabs-js)
### Speech to Text
- **Speech to text logprobs**: The Speech to Text response now includes a `logprob` field for word prediction confidence.
### Billing
- **Improved API error messages**: Enhanced API error messages for subscriptions with failed payments. This provides clearer information if a failed payment has caused a user to reach their quota threshold sooner than expected.
### Conversational AI
- **Batch calls**: Released new batch calling functionality, which allows you to [automate groups of outbound calls](/docs/conversational-ai/phone-numbers/batch-calls).
- **Increased evaluation criteria limit**: The maximum number of evaluation criteria for agent performance evaluation has been increased from 5 to 10.
- **Human-readable IDs**: Introduced human-readable IDs for key Conversational AI entities (e.g., agents, conversations). This improves usability and makes resources easier to identify and manage through the API and UI.
- **Unanswered call tracking**: 'Not Answered' outbound calls are now reliably detected and visible in the conversation history.
- **LLM cost visibility in dashboard**: The Conversational AI dashboard now displays the total and per-minute average LLM costs.
- **Zero retention mode (ZRM) for agents**: Allowed enabling Zero Retention Mode (ZRM) [per agent](/docs/conversational-ai/customization/privacy/zero-retention-mode).
- **Dynamic variables in headers**: Added the option of setting a dynamic variable as a [header value for tools](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.tools.webhook.api_schema.request_headers.Conv-AI-Dynamic-Variable)
- **Customisable tool timeouts**: Added support for setting different [timeout durations per tool](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.tools.client.response_timeout_secs).
### Workspaces
- **Simplified secret updates**: Workspace secrets can now be updated more granularly using a `PATCH` request via the API, simplifying the management of individual secret values. For technical details, please see the API changes section below.
### API
## New Endpoints
- Added 6 new endpoints:
- [Get Signed Url](/docs/conversational-ai/api-reference/conversations/get-signed-url) - Get a signed URL to start a conversation with an agent that requires authorization.
- [Simulate Conversation](/docs/conversational-ai/api-reference/agents/simulate-conversation) - Run a conversation between an agent and a simulated user.
- [Simulate Conversation (Stream)](/docs/conversational-ai/api-reference/agents/simulate-conversation-stream) - Run and stream a conversation simulation between an agent and a simulated user.
- [Update Convai Workspace Secret](/docs/conversational-ai/api-reference/workspace/secrets/update-secret) - Update an existing secret for the Convai workspace.
- [Submit Batch Call Request](/docs/conversational-ai/api-reference/batch-calling/create) - Submit a batch call request to schedule calls for multiple recipients.
- [Get All Batch Calls for Workspace](/docs/conversational-ai/api-reference/batch-calling/list) - Retrieve all batch calls for the current workspace.
## Updated Endpoints
### Conversational AI
- **Agents & Conversations**:
- Endpoint `GET /v1/convai/conversation/get_signed_url` (snake_case path) has been deprecated. Use the new `GET /v1/convai/conversation/get-signed-url` (kebab-case path) instead.
- **Phone Numbers**:
- [Get Phone Number Details](/docs/conversational-ai/api-reference/phone-numbers/get) - Response schema for `GET /v1/convai/phone-numbers/{phone_number_id}` updated to distinguish between `Twilio` and `SIPTrunk` provider details.
- [Update Phone Number](/docs/conversational-ai/api-reference/phone-numbers/update) - Response schema for `PATCH /v1/convai/phone-numbers/{phone_number_id}` updated similarly for `Twilio` and `SIPTrunk`.
- [List Phone Numbers](/docs/conversational-ai/api-reference/phone-numbers/list) - Response schema for `GET /v1/convai/phone-numbers/` list items updated for `Twilio` and `SIPTrunk` providers.
### Text To Speech
- [Text to Speech Endpoints](/docs/api-reference/text-to-speech) - Default `model_id` changed from `eleven_monolingual_v1` to `eleven_multilingual_v2` for the following endpoints:
- `POST /v1/text-to-speech/{voice_id}/stream`
- `POST /v1/text-to-speech/{voice_id}/stream-with-timestamps`
- `POST /v1/text-to-speech/{voice_id}`
- `POST /v1/text-to-speech/{voice_id}/with-timestamps`
### Voices
- [Get Shared Voices](/docs/api-reference/voices#get-shared-voices) - Added `include_custom_rates` query parameter to `GET /v1/shared-voices`.
- **Schema Updates**:
- `LibraryVoiceResponseModel` and `VoiceSharingResponseModel` now include an optional `fiat_rate` field (USD per 1000 credits).
# May 12, 2025
### Billing
- **Downgraded Plan Pricing Fix**: Fixed an issue where customers with downgraded subscriptions were shown their current price instead of the correct future price.
### Conversational AI
- **Edit Knowledge Base Document Names**: You can now edit the names of knowledge base documents.
See: [Knowledge Base](/docs/conversational-ai/customization/knowledge-base)
- **Conversation Simulation**: Released a [new endpoint](/docs/conversational-ai/api-reference/agents/simulate-conversation) that allows you to test an agent over text
### Studio
- **Export Paragraphs as Zip**: Added support for exporting separated paragraphs in a zip file.
See: [Studio](/docs/product-guides/products/studio)
### SDKs
- **Released new SDKs**:
- [ElevenLabs Python v1.58.1](https://github.com/elevenlabs/elevenlabs-python)
- [ElevenLabs JS v1.58.0](https://github.com/elevenlabs/elevenlabs-js)
### API
#### New Endpoints
- [Update metadata for a speaker](/docs/api-reference/dubbing)
`PATCH /v1/dubbing/resource/{dubbing_id}/speaker/{speaker_id}`
Amend the metadata associated with a speaker, such as their voice. Both voice cloning and using voices from the ElevenLabs library are supported.
- [Search similar voices for a speaker](/docs/api-reference/dubbing)
`GET /v1/dubbing/resource/{dubbing_id}/speaker/{speaker_id}/similar-voices`
Fetch the top 10 similar voices to a speaker, including IDs, names, descriptions, and sample audio.
- [Simulate a conversation](/docs/api-reference/agents/simulate-conversation)
`POST /v1/convai/agents/{agent_id}/simulate_conversation`
Run a conversation between the agent and a simulated user.
- [Simulate a conversation (stream)](/docs/api-reference/agents/simulate-conversation-stream)
`POST /v1/convai/agents/{agent_id}/simulate_conversation/stream`
Stream a simulated conversation between the agent and a simulated user.
- [Handle outbound call via SIP trunk](/docs/api-reference/sip-trunk/outbound-call)
`POST /v1/convai/sip-trunk/outbound-call`
Initiate an outbound call using SIP trunking.
#### Updated Endpoints
- [List conversations](/docs/api-reference/conversations/get-conversations)
`GET /v1/convai/conversations`
Added `call_start_after_unix` query parameter to filter conversations by start date.
- [Update knowledge base document](/docs/api-reference/knowledge-base/update-knowledge-base-document)
`PATCH /v1/convai/knowledge-base/{documentation_id}`
Now supports updating the name of a document.
- [Text to Speech endpoints](/docs/api-reference/text-to-speech)
The default model for all TTS endpoints is now `eleven_multilingual_v2` (was `eleven_monolingual_v1`).
#### Removed Endpoints
- None.
# May 5, 2025
### Dubbing
- **Disable Voice Cloning**: Added an option in the [Dubbing Studio UI](https://elevenlabs.io/app/dubbing) to disable voice cloning when uploading audio, aligning with the existing `disable_voice_cloning` API parameter.
### Billing
- **Quota Exceeded Error**: Improved error messaging for exceeding character limits. Users attempting to generate audio beyond their quota within a short billing window will now receive a clearer `401 unauthorized: This request exceeds your quota limit of...` error message indicating the limit has been exceeded.
### SDKs
- **Released new SDKs**: Added [ElevenLabs Python v1.58.0](https://github.com/elevenlabs/elevenlabs-python) and [ElevenLabs JS v1.58.0](https://github.com/elevenlabs/elevenlabs-js) to fix a breaking change that had been mistakenly shipped
# April 28, 2025
### Conversational AI
- **Custom Dashboard Charts**: The Conversational AI Dashboard can now be extended with custom charts displaying the results of evaluation criteria over time. See the new [GET](/docs/api-reference/workspace/dashboard/get) and [PATCH](/docs/api-reference/workspace/dashboard/update) endpoints for managing dashboard settings.
- **Call History Filtering**: Added the ability to filter the call history by start date using the new `call_start_before_unix` parameter in the [List Conversations](/docs/conversational-ai/api-reference/conversations/get-conversations#request.query.call_start_before_unix) endpoint. [Try it here](https://elevenlabs.io/app/conversational-ai/history).
- **Server Tools**: Added the option of making PUT requests in [server tools](/docs/conversational-ai/customization/tools/server-tools)
- **Transfer to human**: Added call forwarding functionality to support forwarding to operators; see the docs [here](/docs/conversational-ai/customization/tools/system-tools/transfer-to-human)
- **Language detection**: Fixed an issue where the [language detection system tool](/docs/conversational-ai/customization/tools/system-tools/language-detection) would trigger on a user replying "yes" in a non-English language.
### Usage Analytics
- **Custom Aggregation**: Added an optional `aggregation_interval` parameter to the [Get Usage Metrics](/docs/api-reference/usage/get) endpoint to control the interval over which to aggregate character usage (hour, day, week, month, or cumulative).
- **New Metric Breakdowns**: The Usage Analytics section now supports additional metric breakdowns including `minutes_used`, `request_count`, `ttfb_avg`, and `ttfb_p95`, selectable via the new `metric` parameter in the [Get Usage Metrics](/docs/api-reference/usage/get) endpoint. Furthermore, you can now get a breakdown and filter by `request_queue`.
### API
## New Endpoints
- Added 2 new endpoints for managing Conversational AI dashboard settings:
- [Get Dashboard Settings](/docs/api-reference/workspace/dashboard/get) - Retrieves custom chart configurations for the ConvAI dashboard.
- [Update Dashboard Settings](/docs/api-reference/workspace/dashboard/update) - Updates custom chart configurations for the ConvAI dashboard.
## Updated Endpoints
### Audio Generation (TTS, S2S, SFX, Voice Design)
- Updated endpoints to support new `output_format` option `pcm_48000`:
- [Text to Speech](/docs/api-reference/text-to-speech/convert) (`POST /v1/text-to-speech/{voice_id}`)
- [Text to Speech with Timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) (`POST /v1/text-to-speech/{voice_id}/with-timestamps`)
- [Text to Speech Stream](/docs/api-reference/text-to-speech/convert-as-stream) (`POST /v1/text-to-speech/{voice_id}/stream`)
- [Text to Speech Stream with Timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) (`POST /v1/text-to-speech/{voice_id}/stream/with-timestamps`)
- [Speech to Speech](/docs/api-reference/speech-to-speech/convert) (`POST /v1/speech-to-speech/{voice_id}`)
- [Speech to Speech Stream](/docs/api-reference/speech-to-speech/stream) (`POST /v1/speech-to-speech/{voice_id}/stream`)
- [Sound Generation](/docs/api-reference/text-to-sound-effects/convert) (`POST /v1/sound-generation`)
- [Create Voice Previews](/docs/api-reference/legacy/voices/create-previews) (`POST /v1/text-to-voice/create-previews`)
### Usage Analytics
- Updated usage metrics endpoint:
- [Get Usage Metrics](/docs/api-reference/usage/get) (`GET /v1/usage/character-stats`) - Added optional `aggregation_interval` and `metric` query parameters.
### Conversational AI
- Updated conversation listing endpoint:
- [List Conversations](/docs/conversational-ai/api-reference/conversations/get-conversations#request.query.call_start_before_unix) (`GET /v1/convai/conversations`) - Added optional `call_start_before_unix` query parameter for filtering by start date.
## Schema Changes
### Conversational AI
- Added detailed LLM usage and pricing information to conversation [charging and history models](/docs/conversational-ai/api-reference/conversations/get-conversation#response.body.metadata.charging).
- Added `tool_latency_secs` to [tool result schemas](/docs/api-reference/conversations/get-conversation#response.body.transcript.tool_results.tool_latency_secs)
- Added `access_info` to [`GET /v1/convai/agents/{agent_id}`](/docs/api-reference/agents/get#response.body.access_info)
# April 21, 2025
### Professional Voice Cloning (PVC)
- **PVC API**: Introduced a comprehensive suite of API endpoints for managing Professional Voice Clones (PVC). You can now programmatically create voices, add/manage/delete audio samples, retrieve audio/waveforms, manage speaker separation, handle verification, and initiate training. For a full list of new endpoints check the API changes summary below or read the PVC API reference [here](/docs/api-reference/voices/pvc/create).
### Speech to Text
- **Enhanced Export Options**: Added options to include or exclude timestamps and speaker IDs when exporting Speech to Text results in segmented JSON format via the API.
### Conversational AI
- **New LLM Models**: Added support for new GPT-4.1 models: `gpt-4.1`, `gpt-4.1-mini`, and `gpt-4.1-nano` [here](/docs/api-reference/agents/create#request.body.conversation_config.agent.prompt.llm)
- **VAD Score**: Added a new client event which sends VAD scores to the client, see reference [here](/docs/conversational-ai/customization/events/client-events#vad_score)
### Workspace
- **Member Management**: Added a new API endpoint to allow administrators to delete workspace members [here](/docs/api-reference/workspace/delete-member)
### API
## New Endpoints
- Added 16 new endpoints:
- [Delete Member](/docs/api-reference/workspace/delete-member) - Allows deleting workspace members.
- [Create PVC Voice](/docs/api-reference/voices/pvc/create) - Creates a new PVC voice.
- [Edit PVC Voice](/docs/api-reference/voices/pvc/update) - Edits PVC voice metadata.
- [Add Samples To PVC Voice](/docs/api-reference/voices/pvc/samples/create) - Adds audio samples to a PVC voice.
- [Update PVC Voice Sample](/docs/api-reference/voices/pvc/samples/update) - Updates a PVC voice sample (noise removal, speaker selection, trimming).
- [Delete PVC Voice Sample](/docs/api-reference/voices/pvc/samples/delete) - Deletes a sample from a PVC voice.
- [Retrieve Voice Sample Audio](/docs/api-reference/voices/pvc/samples/get-audio) - Retrieves audio for a PVC voice sample.
- [Retrieve Voice Sample Visual Waveform](/docs/api-reference/voices/pvc/samples/get-waveform) - Retrieves the visual waveform for a PVC voice sample.
- [Retrieve Speaker Separation Status](/docs/api-reference/voices/pvc/samples/get-speaker-separation-status) - Gets the status of speaker separation for a sample.
- [Start Speaker Separation](/docs/api-reference/voices/pvc/samples/separate-speakers) - Initiates speaker separation for a sample.
- [Retrieve Separated Speaker Audio](/docs/api-reference/voices/pvc/samples/get-separated-speaker-audio) - Retrieves audio for a specific separated speaker.
- [Get PVC Voice Captcha](/docs/api-reference/voices/pvc/verification/captcha) - Gets the captcha for PVC voice verification.
- [Verify PVC Voice Captcha](/docs/api-reference/voices/pvc/verification/captcha/verify) - Submits captcha verification for a PVC voice.
- [Run PVC Training](/docs/api-reference/voices/pvc/train) - Starts the training process for a PVC voice.
- [Request Manual Verification](/docs/api-reference/voices/pvc/verification/request) - Requests manual verification for a PVC voice.
## Updated Endpoints
### Speech to Text
- Updated endpoint with changes:
- [Create Forced Alignment Task](/docs/api-reference/forced-alignment/create#request.body.enabled_spooled_file) - Added `enabled_spooled_file` parameter to allow streaming large files (`POST /v1/forced-alignment`).
## Schema Changes
### Conversational AI
- `GET conversation details`: Added `has_audio`, `has_user_audio`, `has_response_audio` boolean fields [here](/docs/api-reference/conversations/get-conversation#response.body.has_audio)
### Dubbing
- `GET dubbing resource`: Added `status` field to each render [here](/docs/api-reference/dubbing/get-dubbing-resource#response.body.renders.status)
# April 14, 2025
### Voices
- **New PVC flow**: Added new flow for Professional Voice Clone creation, try it out [here](https://elevenlabs.io/app/voice-lab?action=create&creationType=professionalVoiceClone)
### Conversational AI
- **Agent-agent transfer:** Added support for agent-to-agent transfers via a new system tool, enabling more complex conversational flows. See the [Agent Transfer tool documentation](/docs/conversational-ai/customization/tools/system-tools/agent-transfer) for details.
- **Enhanced tool debugging:** Improved how tool execution details are displayed in the conversation history for easier debugging.
- **Language detection fix:** Resolved an issue regarding the forced calling of the language detection tool.
### Dubbing
- **Render endpoint:** Introduced a new endpoint to regenerate audio or video renders for specific languages within a dubbing project. This automatically handles missing transcriptions or translations. See the [Render Dub endpoint](/docs/api-reference/dubbing/render-dub).
- **Increased size limit:** Raised the maximum allowed file size for dubbing projects to 1 GiB.
### API
## New Endpoints
- [Added render dub endpoint](/docs/api-reference/dubbing/render-dub) - Regenerate dubs for a specific language.
## Updated Endpoints
### Pronunciation Dictionaries
- Updated the response for the [`GET /v1/pronunciation-dictionaries/{pronunciation_dictionary_id}/`](/docs/api-reference/pronunciation-dictionary/get#response.body.permission_on_resource) endpoint and related components to include the `permission_on_resource` field.
### Speech to Text
- Updated [Speech to Text endpoint](/docs/api-reference/speech-to-text/convert) (`POST /v1/speech-to-text`):
- Added `cloud_storage_url` parameter to allow transcription directly from public S3 or GCS URLs (up to 2GB).
- Made the `file` parameter optional; exactly one of `file` or `cloud_storage_url` must now be provided.
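As a rough sketch of the new parameter (field names and multipart encoding are assumptions to verify against the endpoint reference; the bucket URL is hypothetical):

```python
import os

import requests

# Exactly one of `file` or `cloud_storage_url` may be provided; here only the
# cloud storage URL is sent, as a multipart form field.
response = requests.post(
    "https://api.elevenlabs.io/v1/speech-to-text",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    files={
        "model_id": (None, "scribe_v1"),
        "cloud_storage_url": (None, "https://storage.googleapis.com/my-bucket/interview.mp3"),
    },
)
response.raise_for_status()
print(response.json()["text"])
```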
### Speech to Speech
- Added optional `file_format` parameter (`pcm_s16le_16` or `other`) for lower latency with PCM input to [`POST /v1/speech-to-speech/{voice_id}`](/docs/api-reference/speech-to-speech/convert)
### Conversational AI
- Updated components to support [agent-agent transfer](/docs/conversational-ai/customization/tools/system-tools/agent-transfer) tool
### Voices
- Updated [`GET /v1/voices/{voice_id}`](/docs/api-reference/voices/get#response.body.samples.trim_start) `samples` field to include optional `trim_start` and `trim_end` parameters.
### AudioNative
- Updated [`GET /v1/audio-native/{project_id}/settings`](/docs/api-reference/audio-native/get-settings#response.body.settings.status) to include `status` field (`processing` or `ready`).
# April 7, 2025
### Speech to text
- **`scribe_v1_experimental`**: Launched a new experimental preview of the [Scribe v1 model](/docs/capabilities/speech-to-text) with improvements including better performance on audio files with multiple languages, reduced hallucinations when audio is interleaved with silence, and improved audio tags. The new model is available via the API under the model name [`scribe_v1_experimental`](/docs/api-reference/speech-to-text/convert#request.body.model_id).
### Text to speech
- **A-law format support**: Added [a-law format](/docs/api-reference/text-to-speech/convert#request.query.output_format) with 8kHz sample rate to enable integration with European telephony systems.
- **Fixed quota issues**: Fixed a database bug that caused some requests to be mistakenly rejected as exceeding their quota.
### Conversational AI
- **Document type filtering**: Added support for filtering knowledge base documents by their [type](/docs/api-reference/knowledge-base/get-knowledge-base-list#request.query.types) (file, URL, or text).
- **Non-audio agents**: Added support for conversational agents that don't output audio but still send response transcripts and can use tools. Non-audio agents can be enabled by removing the audio [client event](/docs/conversational-ai/customization/events/client-events).
- **Improved agent templates**: Updated all agent templates with enhanced configurations and prompts. See more about how to improve system prompts [here](/docs/conversational-ai/best-practices/prompting-guide).
- **Fixed stuck exports**: Fixed an issue that caused exports to be stuck for extended periods.
### Studio
- **Fixed volume normalization**: Fixed issue with streaming project snapshots when volume normalization is enabled.
### New API endpoints
- **Forced alignment**: Added new [forced alignment](/docs/api-reference/forced-alignment) endpoint for aligning audio with text, perfect for subtitle generation.
- **Batch calling**: Added batch calling [endpoint](/docs/conversational-ai/api-reference/batch-calling/create) for scheduling calls to multiple recipients
### API
## New Endpoints
- Added [Forced alignment](/docs/api-reference/forced-alignment) endpoint for aligning audio with text
- Added dedicated endpoints for knowledge base document types:
- [Create text document](/docs/api-reference/knowledge-base/create-from-text)
- [Create file document](/docs/api-reference/knowledge-base/create-from-file)
- [Create URL document](/docs/api-reference/knowledge-base/create-from-url)
## Updated Endpoints
### Text to Speech
- Added a-law format (8kHz) to all audio endpoints:
- [Text to speech](/docs/api-reference/text-to-speech/convert)
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream)
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps)
- [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps)
- [Speech to speech](/docs/api-reference/speech-to-speech)
- [Stream speech to speech](/docs/api-reference/speech-to-speech/stream)
- [Create voice previews](/docs/api-reference/legacy/voices/create-previews)
- [Sound generation](/docs/api-reference/sound-generation)
### Voices
- [Get voices](/docs/api-reference/voices/search) - Added `collection_id` parameter for filtering voices by collection
### Knowledge Base
- [Get knowledge base](/docs/api-reference/knowledge-base/get-knowledge-base-list) - Added `types` parameter for filtering documents by type
- The general endpoint for creating knowledge base documents is now deprecated in favor of the specialized endpoints listed above
### User Subscription
- [Get user subscription](/docs/api-reference/user/subscription/get) - Added `professional_voice_slots_used` property to track number of professional voices used in a workspace
### Conversational AI
- Added `silence_end_call_timeout` parameter to set maximum wait time before terminating a call
- Removed `/v1/convai/agents/{agent_id}/add-secret` endpoint (now handled by workspace secrets endpoints)
# March 31, 2025
### Text to speech
- **Opus format support**: Added support for Opus format with 48kHz sample rate across multiple bitrates (32-192 kbps).
- **Improved websocket error handling**: Updated TTS websocket API to return more accurate error codes (1011 for internal errors instead of 1008) for better error identification and SLA monitoring.
### Conversational AI
- **Twilio outbound**: Added ability to natively run outbound calls.
- **Post-call webhook override**: Added ability to override post-call webhook settings at the agent level, providing more flexible configurations.
- **Large knowledge base document viewing**: Enhanced the knowledge base interface to allow viewing the entire content of large RAG documents.
- **Added call SID dynamic variable**: Added `system__call_sid` as a system dynamic variable to allow referencing the call ID in prompts and tools.
### Studio
- **Actor Mode**: Added Actor Mode in Studio, allowing you to use your own voice recordings to direct the way speech should sound in Studio projects.
- **Improved keyboard shortcuts**: Updated keyboard shortcuts for viewing settings and editor shortcuts to avoid conflicts and simplified shortcuts for locking paragraphs.
### Dubbing
- **Dubbing duplication**: Made dubbing duplication feature available to all users.
- **Manual mode foreground generation**: Added ability to generate foreground audio when using manual mode with a file and CSV.
### Voices
- **Enhanced voice collections**: Improved voice collections with visual upgrades, language-based filtering, navigation breadcrumbs, collection images, and mouse dragging for carousel navigation.
- **Locale filtering**: Added locale parameter to shared voices endpoint for more precise voice filtering.
### API
## Updated Endpoints
### Text to Speech
- Updated Text to Speech endpoints:
- [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Added `apply_language_text_normalization` parameter for improved text pronunciation in supported languages (currently Japanese)
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Added `apply_language_text_normalization`
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Added `apply_language_text_normalization`
- [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Added `apply_language_text_normalization`
### Audio Format
- Added Opus format support to multiple endpoints:
- [Text to speech](/docs/api-reference/text-to-speech/convert) - Added support for Opus format with 48kHz sample rate at multiple bitrates (32, 64, 96, 128, 192 kbps)
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Added Opus format options
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Added Opus format options
- [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Added Opus format options
- [Speech to speech](/docs/api-reference/speech-to-speech) - Added Opus format options
- [Stream speech to speech](/docs/api-reference/speech-to-speech/stream) - Added Opus format options
- [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Added Opus format options
- [Sound generation](/docs/api-reference/sound-generation) - Added Opus format options
### Conversational AI
- Updated Conversational AI endpoints:
- [Delete agent](/docs/api-reference/agents/delete) - Changed success response code from 200 to 204
- [Updated RAG embedding model options](/docs/api-reference/knowledge-base/rag-index-status#request.body.model) - replaced `gte_Qwen2_15B_instruct` with `multilingual_e5_large_instruct`
### Voices
- Updated Voice endpoints:
- [Get shared voices](/docs/api-reference/voice-library/get-shared) - Added locale parameter for filtering voices by language region
### Dubbing
- Updated Dubbing endpoint:
- [Dub a video or audio file](/docs/api-reference/dubbing/create) - Renamed beta feature `use_replacement_voices_from_library` parameter to `disable_voice_cloning` for clarity
# March 24, 2025
### Voices
- **List Voices V2**: Added a new [V2 voice search endpoint](/docs/api-reference/voices/search) with better search and additional filtering options
### Conversational AI
- **Native outbound calling**: Added native outbound calling for Twilio-configured numbers, eliminating the need for complex setup configurations. Outbound calls are now visible in the Call History page.
- **Automatic language detection**: Added new system tool for automatic language detection that enables agents to switch languages based on both explicit user requests ("Let's talk in Spanish") and implicit language in user audio.
- **Pronunciation dictionary improvements**: Fixed phoneme tags in pronunciation dictionaries to work correctly with conversational AI.
- **Large RAG document viewing**: Added ability to view the entire content of large RAG documents in the knowledge base.
- **Customizable widget controls**: Updated UI to include an optional mute microphone button and made widget icons customizable via slots.
### Sound Effects
- **Fractional duration support**: Fixed an issue where users couldn't enter fractional values (like 0.5 seconds) for sound effect generation duration.
### Speech to Text
- **Repetition handling**: Improved detection and handling of repetitions in speech-to-text processing.
### Studio
- **Reader publishing fixes**: Added support for mp3_44100_192 output format (high quality) so users below Publisher tier can export audio to Reader.
### Mobile
- **Core app signup**: Added signup endpoints for the new Core mobile app.
### API
## New Endpoints
- Added 5 new endpoints:
- [List voices (v2)](/docs/api-reference/voices/search) - Enhanced voice search capabilities with additional filtering options
- [Initiate outbound call](/docs/api-reference/conversations/outbound-call) - New endpoint for making outbound calls via Twilio integration
- [Add pronunciation dictionary from rules](/docs/api-reference/pronunciation-dictionary/add-rules) - Create pronunciation dictionaries directly from rules without file upload
- [Get knowledge base document content](/docs/api-reference/knowledge-base/get-knowledge-base-document-content) - Retrieve full document content from the knowledge base
- [Get knowledge base document chunk](/docs/api-reference/knowledge-base/get-knowledge-base-document-part-by-id) - Retrieve specific chunks from knowledge base documents
## Updated Endpoints
### Conversational AI
- Updated Conversational AI endpoints:
- [Create agent](/docs/api-reference/agents/create) - Added `mic_muting_enabled` property for UI control and `workspace_overrides` property for workspace-specific configurations
- [Update agent](/docs/api-reference/agents/update) - Added `workspace_overrides` property for customizing agent behavior per workspace
- [Get agent](/docs/api-reference/agents/get) - Added `workspace_overrides` property to the response
- [Get widget](/docs/api-reference/widget/get-agent-widget) - Added `mic_muting_enabled` property for controlling microphone muting in the widget UI
- [Get conversation](/docs/api-reference/conversations/get-conversation) - Added rag information to view knowledge base content used during conversations
- [Create phone number](/docs/api-reference/phone-numbers/create) - Replaced generic structure with specific Twilio phone number and SIP trunk options
- [Compute RAG index](/docs/conversational-ai/api-reference/knowledge-base/compute-rag-index) - Removed `force_reindex` query parameter for more controlled indexing
- [List knowledge base documents](/docs/api-reference/knowledge-base/get-knowledge-base-list) - Changed response structure to support different document types
- [Get knowledge base document](/docs/api-reference/knowledge-base/get) - Modified to return different response models based on document type
### Text to Speech
- Updated Text to Speech endpoints:
- [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Made properties optional, including `stability` and `similarity` settings
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Made voice settings properties optional for more flexible streaming requests
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Made settings optional and modified `pronunciation_dictionary_locators` property
- [Stream with timestamps](/docs/api-reference/text-to-speech/stream-with-timestamps) - Made voice settings properties optional for more flexible requests
### Speech to Text
- Updated Speech to Text endpoint:
- [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Removed `biased_keywords` property from form data and improved internal repetition detection algorithm
### Voice Management
- Updated Voice endpoints:
- [Get voices](/docs/api-reference/voices/search) - Updated voice settings properties in the response
- [Get default voice settings](/docs/api-reference/voices/settings/get-default) - Made `stability` and `similarity` properties optional
- [Get voice settings](/docs/api-reference/voices/settings/get) - Made numeric properties optional for more flexible configuration
- [Edit voice settings](/docs/api-reference/voices/settings/update) - Made `stability` and `similarity` settings optional
- [Create voice](/docs/api-reference/voices/ivc/create) - Modified array properties to accept null values
- [Create voice from preview](/docs/api-reference/legacy/voices/create-voice-from-preview) - Updated voice settings model with optional properties
### Studio
- Updated Studio endpoints:
- [Get project](/docs/api-reference/studio/get-project) - Added `version_rules_num` to project metadata
- [Get project snapshot](/docs/api-reference/studio/get-project-snapshot) - Removed `status` property
- [Create pronunciation dictionaries](/docs/api-reference/studio/create-pronunciation-dictionaries) - Modified `pronunciation_dictionary_locators` property and string properties to accept null values
### Pronunciation Dictionary
- Updated Pronunciation Dictionary endpoints:
- [Get all pronunciation dictionaries](/docs/api-reference/pronunciation-dictionary/get-all) - Added `sort` and `sort_direction` query parameters, plus `latest_version_rules_num` and `integer` properties to response
- [Get pronunciation dictionary](/docs/api-reference/pronunciation-dictionary/get) - Added `latest_version_rules_num` and `integer` properties to response
- [Add from file](/docs/api-reference/pronunciation-dictionary/add-from-file) - Added `version_rules_num` property to response for tracking rules quantity
- [Add rules](/docs/api-reference/pronunciation-dictionary/add-rules) - Added `version_rules_num` to response for rules tracking
- [Remove rules](/docs/api-reference/pronunciation-dictionary/remove-rules) - Added `version_rules_num` to response for rules tracking
# March 17, 2025
### Conversational AI
- **Default LLM update**: Changed the default agent LLM from Gemini 1.5 Flash to Gemini 2.0 Flash for improved performance.
- **Fixed incorrect conversation abandons**: Improved detection of conversation continuations, preventing premature abandons when users repeat themselves.
- **Twilio information in history**: Added Twilio call details to conversation history for better tracking.
- **Knowledge base redesign**: Redesigned the knowledge base interface.
- **System dynamic variables**: Added system dynamic variables to use time, conversation id, caller id and other system values as dynamic variables in prompts and tools.
- **Twilio client initialisation**: Added an agent-level override for the conversation initiation client data Twilio webhook.
- **RAG chunks in history**: Added retrieved chunks by RAG to the call transcripts in the [history view](https://elevenlabs.io/app/conversational-ai/history).
### Speech to Text
- **Reduced pricing**: Reduced the pricing of our Scribe model, see more [here](/docs/capabilities/speech-to-text#pricing).
- **Improved VAD detection**: Enhanced Voice Activity Detection with better pause detection at segment boundaries and improved handling of silent segments.
- **Enhanced diarization**: Improved speaker clustering with a better ECAPA model, symmetric connectivity matrix, and more selective speaker embedding generation.
- **Fixed ASR bugs**: Resolved issues with VAD rounding, silence and clustering that affected transcription accuracy.
### Studio
- **Disable publishing UI**: Added ability to disable the publishing interface for specific workspace members to support enterprise workflows.
- **Snapshot API improvement**: Modified endpoints for project and chapter snapshots to return an empty list instead of throwing errors when snapshots can't be downloaded.
- **Disabled auto-moderation**: Turned off automatic moderation based on Text to Speech generations in Studio.
### Workspaces
- **Fixed API key editing**: Resolved an issue where editing workspace API keys would reset character limits to zero, causing the keys to stop working.
- **Optimized free subscriptions**: Fixed an issue with refreshing free subscription character limits.
### API
## New Endpoints
- Added 3 new endpoints:
- [Get workspace resource](/docs/api-reference/workspace/get-resource)
- [Share workspace resource](/docs/api-reference/workspace/share-workspace-resource)
- [Unshare workspace resource](/docs/api-reference/workspace/unshare-workspace-resource)
## Updated Endpoints
### Dubbing
- Updated Dubbing endpoints:
- [Dub a video or audio file](/docs/api-reference/dubbing/create) - Added `use_replacement_voices_from_library` property and made `source_path`, `target_language`, `source_language` nullable
- [Resource dubbing](/docs/api-reference/dubbing/dub-segments) - Made `language_codes` array nullable
- [Add language to dubbing resource](/docs/api-reference/dubbing/add-language-to-resource) - Made `language_code` nullable
- [Translate dubbing resource](/docs/api-reference/dubbing/translate-segments) - Made `target_languages` array nullable
- [Update dubbing segment](/docs/api-reference/dubbing/update-segment-language) - Made `start_time` and `end_time` nullable
### Project Management
- Updated Project endpoints:
- [Add project](/docs/api-reference/studio/add-project) - Made `metadata`, `project_name`, `description` nullable
- [Create podcast](/docs/api-reference/studio/create-podcast) - Made `title`, `description`, `author` nullable
- [Get project](/docs/api-reference/studio/get-project) - Made `last_modified_at`, `created_at`, `project_name` nullable
- [Add chapter](/docs/api-reference/studio/add-chapter) - Made `chapter_id`, `word_count`, `statistics` nullable
- [Update chapter](/docs/api-reference/studio/update-chapter) - Made `content` and `blocks` properties nullable
### Conversational AI
- Updated Conversational AI endpoints:
- [Update agent](/docs/api-reference/agents/update) - Made `conversation_config`, `platform_settings` nullable and added `workspace_overrides` property
- [Create agent](/docs/api-reference/agents/create) - Made `agent_name`, `prompt`, `widget_config` nullable and added `workspace_overrides` property
- [Add to knowledge base](/docs/api-reference/knowledge-base/create-from-url) - Made `document_name` nullable
- [Get conversation](/docs/api-reference/conversations/get-conversation) - Added `twilio_call_data` model and made `transcript`, `metadata` nullable
### Text to Speech
- Updated Text to Speech endpoints:
- [Convert text to speech](/docs/api-reference/text-to-speech/convert) - Made `voice_settings`, `text_input` nullable and deprecated `use_pvc_as_ivc` property
- [Stream text to speech](/docs/api-reference/text-to-speech/convert-as-stream) - Made `voice_settings`, `text_input` nullable and deprecated `use_pvc_as_ivc` property
- [Convert with timestamps](/docs/api-reference/text-to-speech/convert-with-timestamps) - Made `character_alignment` and `word_alignment` nullable
### Voice Management
- Updated Voice endpoints:
- [Create voice previews](/docs/api-reference/legacy/voices/create-previews) - Added `loudness`, `quality`, `guidance_scale` properties
- [Create voice from preview](/docs/api-reference/legacy/voices/create-voice-from-preview) - Added `speaker_separation` properties and made `voice_id`, `name`, `labels` nullable
- [Get voice](/docs/api-reference/voices/get) - Added `speaker_boost`, `speaker_clarity`, `speaker_isolation` properties
### Speech to Text
- Updated Speech to Text endpoint:
- [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `biased_keywords` property
### Other Updates
- [Download history](/docs/api-reference/history/download) - Added application/zip content type and 400 response
- [Add pronunciation dictionary from file](/docs/api-reference/pronunciation-dictionary/add-from-file) - Made `dictionary_name` and `description` nullable
# March 10, 2025
### Conversational AI
- **HIPAA compliance**: Conversational AI is now [HIPAA compliant](/docs/conversational-ai/legal/hipaa) on appropriate plans, when a BAA is signed, zero-retention mode is enabled and appropriate LLMs are used. For access please [contact sales](/contact-sales)
- **Cascade LLM**: Added dynamic dispatch during the LLM step to other LLMs if your default LLM fails. This results in higher latency but prevents the turn from failing.
- **Better error messages**: Added better error messages for websocket failures.
- **Audio toggling**: Added ability to select only user or agent audio in the conversation playback.
### Scribe
- **HIPAA compliance**: Added a zero retention mode to Scribe to be HIPAA compliant.
- **Diarization**: Increased the maximum length of audio files that can be transcribed with diarization from 8 minutes to 2 hours.
- **Cheaper pricing**: Reduced Scribe's pricing to as low as $0.22 per hour on the Business tier.
- **Memory usage**: Shipped improvements to Scribe's memory usage.
- **Fixed timestamps**: Fixed an issue that was causing incorrect timestamps to be returned.
### Text to Speech
- **Pronunciation dictionaries**: Fixed pronunciation dictionary rule application for replacements that contain symbols.
### Dubbing
- **Studio support**: Added support for creating dubs with `dubbing_studio` enabled, allowing for more advanced dubbing workflows beyond one-off dubs.
### Voices
- **Verification**: Fixed an issue where users on probation could not verify their voice clone.
### API
## New Endpoints
- Added 7 new endpoints:
- [Add a shared voice to your collection](/docs/api-reference/voice-library/share)
- [Archive a project snapshot](/docs/api-reference/studio/archive-snapshot)
- [Update a project](/docs/api-reference/studio/edit-project)
- [Create an Audio Native enabled project](/docs/api-reference/audio-native/create)
- [Get all voices](/docs/api-reference/voices/search)
- [Download a pronunciation dictionary](/docs/api-reference/pronunciation-dictionary/download)
- [Get Audio Native project settings](/docs/api-reference/audio-native/get-settings)
## Updated Endpoints
### Studio Projects
- Updated Studio project endpoints to add `source_type` property and deprecate `quality_check_on` and `quality_check_on_when_bulk_convert` properties:
- [Get projects](/docs/api-reference/studio/get-projects)
- [Get project](/docs/api-reference/studio/get-project)
- [Add project](/docs/api-reference/studio/add-project)
- [Update content](/docs/api-reference/studio/update-content)
- [Create podcast](/docs/api-reference/studio/create-podcast)
### Voice Management
- Updated Voice endpoints with several property changes:
- [Get voice](/docs/api-reference/voices/get) - Made several properties optional and added `preview_url`
- [Create voice](/docs/api-reference/voices/ivc/create) - Made several properties optional and added `preview_url`
- [Create voice from preview](/docs/api-reference/legacy/voices/create-voice-from-preview) - Made several properties optional and added `preview_url`
- [Get similar voices](/docs/api-reference/voices/get-similar-library-voices) - Made `language`, `description`, `preview_url`, and `rate` properties optional
### Conversational AI
- Updated Conversational AI agent endpoints:
- [Update agent](/docs/api-reference/agents/update) - Modified `conversation_config`, `agent`, `platform_settings`, and `widget` properties
- [Create agent](/docs/api-reference/agents/create) - Modified `conversation_config`, `agent`, `prompt`, `platform_settings`, `widget` properties and added `shareable_page_show_terms`
- [Get agent](/docs/api-reference/agents/get) - Modified `conversation_config`, `agent`, `platform_settings`, and `widget` properties
- [Get widget](/docs/api-reference/widget/get-agent-widget) - Modified `widget_config` property and added `shareable_page_show_terms`
### Knowledge Base
- Updated Knowledge Base endpoints to add metadata property:
- [List knowledge base documents](/docs/api-reference/knowledge-base/list#response.body.metadata)
- [Get knowledge base document](/docs/api-reference/knowledge-base/get-document#response.body.metadata)
### Other Updates
- [Dub a video or audio file](/docs/api-reference/dubbing/create) - Added `dubbing_studio` property
- [Convert text to sound effects](/docs/api-reference/text-to-sound-effects/convert) - Added `output_format` query parameter
- [Convert speech to text](/docs/api-reference/speech-to-text/convert) - Added `enable_logging` query parameter
- [Get secrets](/docs/api-reference/workspace/secrets/list) - Modified `secrets` and `used_by` properties
- [Get all pronunciation dictionaries](/docs/api-reference/pronunciation-dictionary/get-all) - Made `next_cursor` property optional
## Removed Endpoints
- Temporarily removed Conversational AI tools endpoints:
- Get tool
- List tools
- Update tool
- Create tool
- Delete tool
# March 3, 2025
### Dubbing
- **Scribe for speech recognition**: Dubbing Studio now uses Scribe by default for speech recognition to improve accuracy.
### Speech to Text
- **Fixes**: Shipped several fixes improving the stability of Speech to Text.
### Conversational AI
- **Speed control**: Added speed control to an agent's settings in Conversational AI.
- **Post call webhook**: Added the option of sending [post-call webhooks](/docs/conversational-ai/customization/personalization/post-call-webhooks) after conversations are completed.
- **Improved error messages**: Added better error messages to the Conversational AI websocket.
- **Claude 3.7 Sonnet**: Added Claude 3.7 Sonnet as a new LLM option in Conversational AI.
### API
#### New Endpoints
- Added new Dubbing resource management endpoints:
- for adding [languages to dubs](/docs/api-reference/dubbing/resources/add-language)
- for retrieving [dubbing resources](/docs/api-reference/dubbing/resources/get-resource)
- for creating [segments](/docs/api-reference/dubbing/resources/create-segment)
- for modifying [segments](/docs/api-reference/dubbing/resources/update-segment)
- for removing [segments](/docs/api-reference/dubbing/resources/delete-segment)
- for dubbing [segments](/docs/api-reference/dubbing/resources/dub-segment)
- for transcribing [segments](/docs/api-reference/dubbing/resources/transcribe-segment)
- for translating [segments](/docs/api-reference/dubbing/resources/translate-segment)
- Added Knowledge Base RAG indexing [endpoint](/docs/conversational-ai/api-reference/knowledge-base/compute-rag-index)
- Added Studio snapshot retrieval endpoints for [projects](/docs/api-reference/studio/get-project-snapshot) and [chapters](/docs/api-reference/studio/get-chapter-snapshot)
#### Updated Endpoints
- Added `prompt_injectable` property to knowledge base [endpoints](/docs/api-reference/knowledge-base/get#response.body.prompt_injectable)
- Added `name` property to Knowledge Base document [creation](/docs/api-reference/knowledge-base/create-from-url#request.body.name) and [retrieval](/docs/api-reference/knowledge-base/get-document#response.body.name) endpoints
- Added `speed` property to [agent creation](/docs/api-reference/agents/create#request.body.conversation_config.tts.speed)
- Removed `secrets` property from agent endpoints (now handled by dedicated secrets endpoints)
- Added [secret deletion endpoint](/docs/api-reference/workspace/secrets/delete) for removing secrets
- Removed `secrets` property from settings [endpoints](/docs/api-reference/workspace/get)
# February 25, 2025
### Speech to Text
- **ElevenLabs launched a new state of the art [Speech to Text API](/docs/capabilities/speech-to-text) available in 99 languages.**
### Text to Speech
- **Speed control**: Added speed control to the Text to Speech API.
### Studio
- **Auto-assigned projects**: Increased token limits for auto-assigned projects from 1 month to 3 months worth of tokens, addressing user feedback about working on longer projects.
- **Language detection**: Added automatic language detection when generating audio for the first time, with suggestions to switch to Eleven Turbo v2.5 for languages not supported by Multilingual v2 (Hungarian, Norwegian, Vietnamese).
- **Project export**: Enhanced project exporting in ElevenReader with better metadata tracking.
### Dubbing
- **Clip overlap prevention**: Added automatic trimming of overlapping clips in dubbing jobs to ensure clean audio tracks for each speaker and language.
### Voice Management
- **Instant Voice Cloning**: Improved preview generation for Instant Voice Cloning v2, making previews available immediately.
### Conversational AI
- **Agent ownership**: Added display of agent creators in the agent list, improving visibility and management of shared agents.
### Web app
- **Dark mode**: Added dark mode to the web app.
### API
- Launched **/v1/speech-to-text** [endpoint](/docs/api-reference/speech-to-text/convert)
- Added `agents.level` property to [Conversational AI agents endpoint](/docs/api-reference/agents/get#response.body.agents.access_level)
- Added `platform_settings` to [Conversational AI agent endpoint](/docs/api-reference/agents/update#request.body.platform_settings)
- Added `expandable` variant to `widget_config`, with configuration options `show_avatar_when_collapsed` and `disable_banner` to [Conversational AI agent widget endpoint](/docs/api-reference/agents/get#response.body.widget)
- Added `webhooks` property and `used_by` to `secrets` to [secrets endpoint](/docs/api-reference/workspace/secrets/list#response.body.secrets.used_by)
- Added `verified_languages` to [voices endpoint](/docs/api-reference/voices/get#response.body.verified_languages)
- Added `speed` property to [voice settings endpoints](/docs/api-reference/voices/get#response.body.settings.speed)
- Added `verified_languages`, `is_added_by_user` to `voices` and `min_notice_period_days` query parameter to [shared voices endpoint](/docs/api-reference/voice-library/get-shared#request.query)
- Added `verified_languages`, `is_added_by_user` to `voices` in [similar voices endpoint](/docs/api-reference/voices/get-similar-library-voices)
- Added `search`, `show_only_owned_documents`, `use_typesense` query parameters to [knowledge base endpoint](/docs/api-reference/knowledge-base/get-knowledge-base-list#request.query.search)
- Added `used_by` to Conversation AI [secrets endpoint](/docs/api-reference/workspace/secrets/list)
- Added `invalidate_affected_text` property to Studio [pronunciation dictionaries endpoint](/docs/api-reference/studio/create-pronunciation-dictionaries#request.body.invalidate_affected_text)
# February 17, 2025
### Conversational AI
- **Tool calling fix**: Fixed an issue where tool calling was not working with agents using gpt-4o mini. This was due to a breaking change in the OpenAI API.
- **Tool calling improvements**: Added support for tool calling with dynamic variables inside objects and arrays.
- **Dynamic variables**: Fixed an issue where dynamic variables of a conversation were not being displayed correctly.
### Voice Isolator
- **Fixed**: Fixed an issue that caused the voice isolator to not work correctly temporarily.
### Workspace
- **Billing**: Improved billing visibility by differentiating rollover, cycle, gifted, and usage-based credits.
- **Usage Analytics**: Improved usage analytics load times and readability.
- **Fine grained fiat billing**: Added support for customizable pricing based on several factors.
### API
- Added `phone_numbers` property to [Agent responses](/docs/api-reference/agents/get)
- Added usage metrics to subscription_extras in [User endpoint](/docs/api-reference/user/get):
- `unused_characters_rolled_over_from_previous_period`
- `overused_characters_rolled_over_from_previous_period`
- `usage` statistics
- Added `enable_conversation_initiation_client_data_from_webhook` to [Agent creation](/docs/api-reference/agents/create)
- Updated [Agent](/docs/api-reference/agents) endpoints with consolidated settings for:
- `platform_settings`
- `overrides`
- `safety`
- Deprecated `with_settings` parameter in [Voice retrieval endpoint](/docs/api-reference/voices/get)
# February 10, 2025
## Conversational AI
- **Updated Pricing**: Updated self-serve pricing for Conversational AI with [reduced cost and a more generous free tier](/docs/conversational-ai/overview#pricing-tiers).
- **Knowledge Base UI**: Created a new page to easily manage your [knowledge base](/app/conversational-ai/knowledge-base).
- **Live calls**: Added the number of live calls in progress to the user [dashboard](/app/conversational-ai) and as a new endpoint.
- **Retention**: Added ability to customize transcripts and audio recordings [retention settings](/docs/conversational-ai/customization/privacy/retention).
- **Audio recording**: Added a new option to [disable audio recordings](/docs/conversational-ai/customization/privacy/audio-saving).
- **8k PCM support**: Added support for 8k PCM audio for both input and output.
## Studio
- **GenFM**: Updated the create podcast endpoint to accept [multiple input sources](/docs/api-reference/studio/create-podcast).
- **GenFM**: Fixed an issue where GenFM was creating empty podcasts.
## Enterprise
- **New workspace group endpoints**: Added new endpoints to manage [workspace groups](/docs/api-reference/workspace/search-user-groups).
### API
**Studio (formerly Projects)**
All `/v1/projects/*` endpoints have been deprecated in favor of the new `/v1/studio/projects/*` endpoints. The following endpoints are now deprecated:
- All operations on `/v1/projects/`
- All operations related to chapters, snapshots, and content under `/v1/projects/*`
**Conversational AI**
- `POST /v1/convai/add-tool` - Use `POST /v1/convai/tools` instead
- `DELETE /v1/convai/agents/{agent_id}` - Response type is no longer an object
- `GET /v1/convai/tools` - Response type changed from array to object with a `tools` property
**Conversational AI Updates**
- `GET /v1/convai/agents/{agent_id}` - Updated conversation configuration and agent properties
- `PATCH /v1/convai/agents/{agent_id}` - Added `use_tool_ids` parameter for tool management
- `POST /v1/convai/agents/create` - Added tool integration via `use_tool_ids`
**Knowledge Base & Tools**
- `GET /v1/convai/agents/{agent_id}/knowledge-base/{documentation_id}` - Added `name` and `access_level` properties
- `GET /v1/convai/knowledge-base/{documentation_id}` - Added `name` and `access_level` properties
- `GET /v1/convai/tools/{tool_id}` - Added `dependent_agents` property
- `PATCH /v1/convai/tools/{tool_id}` - Added `dependent_agents` property
**GenFM**
- `POST /v1/projects/podcast/create` - Added support for multiple input sources
**Studio (formerly Projects)**
New endpoints replacing the deprecated `/v1/projects/*` endpoints
- `GET /v1/studio/projects`: List all projects
- `POST /v1/studio/projects`: Create a project
- `GET /v1/studio/projects/{project_id}`: Get project details
- `DELETE /v1/studio/projects/{project_id}`: Delete a project
**Knowledge Base Management**
- `GET /v1/convai/knowledge-base`: List all knowledge base documents
- `DELETE /v1/convai/knowledge-base/{documentation_id}`: Delete a knowledge base
- `GET /v1/convai/knowledge-base/{documentation_id}/dependent-agents`: List agents using this knowledge base
**Workspace Groups** - New enterprise features for team management
- `GET /v1/workspace/groups/search`: Search workspace groups
- `POST /v1/workspace/groups/{group_id}/members`: Add members to a group
- `POST /v1/workspace/groups/{group_id}/members/remove`: Remove members from a group
**Tools**
- `POST /v1/convai/tools`: Create new tools for agents
## Socials
- **ElevenLabs Developers**: Follow our new developers account on X [@ElevenLabsDevs](https://x.com/intent/user?screen_name=elevenlabsdevs)
# February 4, 2025
### Conversational AI
- **Agent monitoring**: Added a new dashboard for monitoring conversational AI agents' activity. Check out yours [here](/app/conversational-ai).
- **Proactive conversations**: Enhanced capabilities with improved timeout retry logic. [Learn more](/docs/conversational-ai/customization/conversation-flow)
- **Tool calls**: Fixed timeout issues occurring during tool calls
- **Allowlist**: Fixed implementation of allowlist functionality.
- **Content summarization**: Added Gemini as a fallback model to ensure service reliability
- **Widget stability**: Fixed issue with dynamic variables causing the Conversational AI widget to fail
### Reader
- **Trending content**: Added carousel showcasing popular articles and trending content
- **New publications**: Introduced dedicated section for recent ElevenReader Publishing releases
### Studio (formerly Projects)
- **Projects is now Studio** and is generally available to everyone
- **Chapter content editing**: Added support for editing chapter content through the public API, enabling programmatic updates to chapter text and metadata
- **GenFM public API**: Added public API support for podcast creation through GenFM. Key features include:
- Conversation mode with configurable host and guest voices
- URL-based content sourcing
- Customizable duration and highlights
- Webhook callbacks for status updates
- Project snapshot IDs for audio downloads
### SDKs
- **Swift**: Fixed an issue where resources were not being released after the end of a session
- **Python**: Added uv support
- **Python**: Fixed an issue where calls were not ending correctly
### API
- Added POST `v1/workspace/invites/add-bulk` [endpoint](/docs/api-reference/workspace/invite-multiple-users) to enable inviting multiple users simultaneously
- Added POST `v1/projects/podcast/create` [endpoint](/docs/api-reference/studio/create-podcast) for programmatic podcast generation through GenFM
- Added `v1/convai/knowledge-base/:documentation_id` [endpoints](/docs/api-reference/knowledge-base/) with CRUD operations for Conversational AI
- Added PATCH `v1/projects/:project_id/chapters/:chapter_id` [endpoint](/docs/api-reference/studio/update-chapter) for updating project chapter content and metadata
- Added `group_ids` parameter to [Workspace Invite endpoint](/docs/api-reference/workspace/invite-user) for group-based access control
- Added structured `content` property to [Chapter response objects](/docs/api-reference/studio/get-chapter)
- Added `retention_days` and `delete_transcript_and_pii` data retention parameters to [Agent creation](/docs/api-reference/agents/create)
- Added structured response to [AudioNative content](/docs/api-reference/audio-native/create#response.body.project_id)
- Added `convai_chars_per_minute` usage metric to [User endpoint](/docs/api-reference/user/get)
- Added `media_metadata` field to [Dubbing response objects](/docs/api-reference/dubbing/get)
- Added GDPR-compliant `deletion_settings` to [Conversation responses](/docs/api-reference/conversations/get-conversation#response.body.metadata.deletion_settings)
- Deprecated Knowledge Base legacy endpoints:
- POST `/v1/convai/agents/{agent_id}/add-to-knowledge-base`
- GET `/v1/convai/agents/{agent_id}/knowledge-base/{documentation_id}`
- Updated Agent endpoints with consolidated [privacy control parameters](/docs/api-reference/agents/create)
# January 27, 2025
### Docs
- **Shipped our new docs**: we're keen to hear your thoughts. You can reach out by opening an issue on [GitHub](https://github.com/elevenlabs/elevenlabs-docs) or chatting with us on [Discord](https://discord.gg/elevenlabs)
### Conversational AI
- **Dynamic variables**: Available in the dashboard and SDKs. [Learn more](/docs/conversational-ai/customization/personalization/dynamic-variables)
- **Interruption handling**: Now possible to ignore user interruptions in Conversational AI. [Learn more](/docs/conversational-ai/customization/conversation-flow#interruptions)
- **Twilio integration**: Shipped changes to increase audio quality when integrating with Twilio
- **Latency optimization**: Published detailed blog post on latency optimizations. [Read more](/blog/how-do-you-optimize-latency-for-conversational-ai)
- **PCM 8000**: Added support for PCM 8000 to Conversational AI agents
- **Websocket improvements**: Fixed unexpected websocket closures
### Projects
- **Auto-regenerate**: Auto-regeneration now available by default at no extra cost
- **Content management**: Added `updateContent` method for dynamic content updates
- **Audio conversion**: New auto-convert and auto-publish flags for seamless workflows
### API
- Added `Update Project` endpoint for [project editing](/docs/api-reference/studio/edit-project#:~:text=List%20projects-,POST,Update%20project,-GET)
- Added `Update Content` endpoint for [AudioNative content management](/docs/api-reference/audio-native/update-content)
- Deprecated `quality_check_on` parameter in [project operations](/docs/api-reference/studio/add-project#request.body.quality_check_on). It is now enabled for all users at no extra cost
- Added `apply_text_normalization` parameter to project creation with modes 'auto', 'on', 'apply_english' and 'off' for controlling text normalization during [project creation](/docs/api-reference/studio/add-project#request.body.apply_text_normalization)
- Added alpha feature `auto_assign_voices` in [project creation](/docs/api-reference/studio/add-project#request.body.auto_assign_voices) to automatically assign voices to phrases
- Added `auto_convert` flag to project creation to automatically convert [projects to audio](/docs/api-reference/audio-native/create#request.body.auto_convert)
- Added support for creating Conversational AI agents with [dynamic variables](/docs/api-reference/agents/create#request.body.conversation_config.agent.dynamic_variables)
- Added `voice_slots_used` to the `Subscription` model in the `User` [endpoint](/docs/api-reference/user/subscription/get#response.body.voice_slots_used) to track the number of custom voices used in a workspace
- Added `user_id` field to `User` [endpoint](/docs/api-reference/user/get#response.body.user_id)
- Marked legacy AudioNative creation parameters (`image`, `small`, `sessionization`) as deprecated [parameters](/docs/api-reference/audio-native/create#request.body.image)
- The Agents platform now supports `call_limits`, containing `agent_concurrency_limit`, `daily_limit`, or both, to control simultaneous and daily conversation limits for [agents](/docs/api-reference/agents/create#request.body.platform_settings.call_limits)
- Added support for `language_presets` in `conversation_config` to customize language-specific [settings](/docs/api-reference/agents/create#request.body.conversation_config.language_presets)
### SDKs
- **Cross-Runtime Support**: Now compatible with **Bun 1.1.45+** and **Deno 2.1.7+**
- **Regenerated SDKs**: We regenerated our SDKs to be up to date with the latest API spec. Check out the latest [Python SDK release](https://github.com/elevenlabs/elevenlabs-python/releases/tag/1.50.5) and [JS SDK release](https://github.com/elevenlabs/elevenlabs-js/releases/tag/v1.50.4)
- **Dynamic Variables**: Fixed an issue where dynamic variables were not being handled correctly; they are now handled correctly in all SDKs
# January 16, 2025
## Product
### Conversational AI
- **Additional languages**: Add a language dropdown to your widget so customers can launch conversations in their preferred language. Learn more [here](/docs/conversational-ai/customization/language).
- **End call tool**: Let the agent automatically end the call with our new “End Call” tool. Learn more [here](/docs/conversational-ai/customization/tools)
- **Flash default**: Flash, our lowest latency model, is now the default for new agents. In your agent dashboard under “voice”, you can toggle between Turbo and Flash. Learn more about Flash [here](https://elevenlabs.io/blog/meet-flash).
- **Privacy**: Set concurrent call and daily call limits, turn off audio recordings, add feedback collection, and define customer terms & conditions.
- **Increased tool limits**: Increase the number of tools available to your agent from 5 to 15. Learn more [here](/docs/conversational-ai/customization/tools).
# January 2, 2025
## Product
- **Workspace Groups and Permissions**: Introduced new workspace group management features to enhance access control within organizations. [Learn more](https://elevenlabs.io/blog/workspace-groups-and-permissions).
# December 19, 2024
## Model
- **Introducing Flash**: Our fastest text-to-speech model yet, generating speech in just 75ms. Access it via the API with model IDs `eleven_flash_v2` and `eleven_flash_v2_5`. Perfect for low-latency conversational AI applications. [Try it now](https://elevenlabs.io/docs/api-reference/text-to-speech).
## Launches
- **[TalkToSanta.io](https://www.talktosanta.io)**: Experience Conversational AI in action by talking to Santa this holiday season. For every conversation with Santa we donate $2 to [Bridging Voice](https://www.bridgingvoice.org) (up to $11,000).
- **[AI Engineer Pack](https://aiengineerpack.com)**: Get $50+ in credits from leading AI developer tools, including ElevenLabs.
# December 6, 2024
## Product
- **GenFM Now on Web**: Access GenFM directly from the website in addition to the ElevenReader App, [try it now](https://elevenlabs.io/app/projects).
# December 3, 2024
## API
- **Credit Usage Limits**: Set specific credit limits for API keys to control costs and manage usage across different use cases by setting "Access" or "No Access" to features like Dubbing, Audio Native, and more. [Check it out](https://elevenlabs.io/app/settings/api-keys)
- **Workspace API Keys**: Now support access permissions, such as "Read" or "Read and Write" for User, Workspace, and History resources.
- **Improved Key Management**:
- Redesigned interface moving from modals to dedicated pages
- Added detailed descriptions and key information
- Enhanced visibility of key details and settings
# November 29, 2024
## Product
- **GenFM**: Launched in the ElevenReader app. [Learn more](https://elevenlabs.io/blog/genfm-on-elevenreader)
- **Conversational AI**: Now generally available to all customers. [Try it now](https://elevenlabs.io/conversational-ai)
- **TTS Redesign**: The website TTS redesign is now rolled out to all customers.
- **Auto-regenerate**: Now available in Projects. [Learn more](https://elevenlabs.io/blog/auto-regenerate-is-live-in-projects)
- **Reader Platform Improvements**:
- Improved content sharing with enhanced landing pages and social media previews.
- Added podcast rating system and improved voice synchronization.
- **Projects revamp**:
- Restore past generations, lock content, assign speakers to sentence fragments, and QC at 2x speed. [Learn more](https://elevenlabs.io/blog/narrate-any-project)
- Auto-regeneration identifies mispronunciations and regenerates audio at no extra cost. [Learn more](https://elevenlabs.io/blog/auto-regenerate-is-live-in-projects)
## API
- **Conversational AI**: [SDKs and APIs](https://elevenlabs.io/docs/conversational-ai/docs/introduction) now available.
# October 27, 2024
## API
- **u-law Audio Formats**: Added u-law audio formats to the Convai API for integrations with Twilio.
- **TTS Websocket Improvements**: Flushes and generation now work more intuitively over the TTS websocket.
- **TTS Websocket Auto Mode**: A streamlined mode for using websockets. This setting reduces latency by disabling chunk scheduling and buffers. Note: Using partial sentences will result in significantly reduced quality.
- **Improvements to latency consistency**: Improvements to latency consistency for all models.
## Website
- **TTS Redesign**: The website TTS redesign is now in alpha!
# October 20, 2024
## API
- **Normalize Text with the API**: Added the option to normalize the input text in the TTS API. The new parameter is called `apply_text_normalization` and works on all non-turbo & non-flash models.
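A minimal sketch, assuming the Python SDK passes `apply_text_normalization` through to the Text to Speech endpoint and that it accepts the `auto`, `on`, and `off` modes:

```python
import os

from elevenlabs import play
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_speech.convert(
    text="Call +1 555-010-1234 before 5 PM on 12/03.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",  # normalization is not applied on Turbo or Flash models
    # "auto" lets the service decide when to expand numbers, dates, and
    # abbreviations into words; "on" and "off" force the behavior.
    apply_text_normalization="auto",
)
play(audio)
```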
## Product
- **Voice Design**: The Voice Design feature is now in beta!
# October 13, 2024
## Model
- **Stability Improvements**: Significant audio stability improvements across all models, most noticeable on `turbo_v2` and `turbo_v2.5`, when using:
- Websockets
- Projects
- Reader app
- TTS with request stitching
- ConvAI
- **Latency Improvements**: Reduced time to first byte latency by approximately 20-30ms for all models.
## API
- **Remove Background Noise Voice Samples**: Added the ability to remove background noise from voice samples using our audio isolation model to improve quality for IVCs and PVCs at no additional cost.
- **Remove Background Noise STS Input**: Added the ability to remove background noise from STS audio input using our audio isolation model to improve quality at no additional cost.
## Feature
- **Conversational AI Beta**: Conversational AI is now in beta.
# Text to Speech
> Learn how to turn text into lifelike spoken audio with ElevenLabs.
## Overview
The ElevenLabs [Text to Speech (TTS)](/docs/api-reference/text-to-speech) API turns text into lifelike audio with nuanced intonation, pacing and emotional awareness. [Our models](/docs/models) adapt to textual cues across 32 languages and multiple voice styles and can be used to:
* Narrate global media campaigns & ads
* Produce audiobooks in multiple languages with complex emotional delivery
* Stream real-time audio from text
Listen to a sample:
Explore our [voice library](https://elevenlabs.io/community) to find the perfect voice for your project.
Learn how to integrate text to speech into your application.
Step-by-step guide for using text to speech in ElevenLabs.
### Voice quality
For real-time applications, Flash v2.5 provides ultra-low 75ms latency, while Multilingual v2 delivers the highest quality audio with more nuanced expression.
[Eleven v3](/docs/models#eleven-v3-alpha)
Our most emotionally rich, expressive speech synthesis model
Dramatic delivery and performance
70+ languages supported
10,000 character limit
Support for natural multi-speaker dialogue
Lifelike, consistent quality speech synthesis model
Natural-sounding output
29 languages supported
10,000 character limit
Most stable on long-form generations
Our fast, affordable speech synthesis model
Ultra-low latency (~75ms†)
32 languages supported
40,000 character limit
Faster model, 50% lower price per character
High quality, low-latency model with a good balance of quality and speed
High quality voice generation
32 languages supported
40,000 character limit
Low latency (~250ms-300ms†), 50% lower price per character
[Explore all](/docs/models)
### Voice options
ElevenLabs offers thousands of voices across 32 languages through multiple creation methods:
* [Voice library](/docs/capabilities/voices) with 3,000+ community-shared voices
* [Professional voice cloning](/docs/capabilities/voices#cloned) for highest-fidelity replicas
* [Instant voice cloning](/docs/capabilities/voices#cloned) for quick voice replication
* [Voice design](/docs/capabilities/voices#voice-design) to generate custom voices from text descriptions
Learn more about our [voice options](/docs/capabilities/voices).
### Supported formats
The default response format is "mp3", but other formats such as PCM, μ-law, A-law, and Opus are also available (see the request sketch at the end of this section).
* **MP3**
* Sample rates: 22.05kHz - 44.1kHz
* Bitrates: 32kbps - 192kbps
* 22.05kHz @ 32kbps
* 44.1kHz @ 32kbps, 64kbps, 96kbps, 128kbps, 192kbps
* **PCM (S16LE)**
* Sample rates: 8kHz - 48kHz
* 8kHz, 16kHz, 22.05kHz, 24kHz, 44.1kHz, 48kHz
* 16-bit depth
* **μ-law**
* 8kHz sample rate
* Optimized for telephony applications
* **A-law**
* 8kHz sample rate
* Optimized for telephony applications
* **Opus**
* Sample rate: 48kHz
* Bitrates: 32kbps - 192kbps
Higher quality audio options are only available on paid tiers - see our [pricing
page](https://elevenlabs.io/pricing/api) for details.
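Formats are selected per request with the `output_format` parameter, whose identifier combines codec, sample rate and, for MP3 and Opus, bitrate. Below is a minimal sketch requesting telephony-friendly audio; the exact identifiers available depend on your subscription, and the format strings shown are assumptions to verify against the API reference.

```python
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_speech.convert(
    text="Thanks for calling. Please hold while we connect you.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
    # 8 kHz μ-law for telephony; "alaw_8000" targets European systems and
    # "opus_48000_64" suits bandwidth-sensitive streaming.
    output_format="ulaw_8000",
)

# Write the streamed chunks to disk.
with open("greeting.ulaw", "wb") as f:
    for chunk in audio:
        f.write(chunk)
```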
### Supported languages
Our v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
Flash v2.5 supports 32 languages - all languages from v2 models plus:
*Hungarian, Norwegian & Vietnamese*
Simply input text in any of our supported languages and select a matching voice from our [voice library](https://elevenlabs.io/community). For the most natural results, choose a voice with an accent that matches your target language and region.
### Prompting
The models interpret emotional context directly from the text input. For example, adding
descriptive text like "she said excitedly" or using exclamation marks will influence the speech
emotion. Voice settings like Stability and Similarity help control the consistency, while the
underlying emotion comes from textual cues.
Read the [prompting guide](/docs/best-practices/prompting) for more details.
Descriptive text will be spoken out by the model and must be manually trimmed or removed from the
audio if desired.
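As a small illustration of the guidance above, here is a sketch using the quickstart client. The narrative cue steers the emotion of the line but is also rendered as audio, so trim it afterwards if you only want the quoted dialogue.

```python
import os

from elevenlabs import play
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# The cue "she said excitedly" influences the delivery of the whole line,
# but it is spoken too and must be trimmed from the output if unwanted.
audio = elevenlabs.text_to_speech.convert(
    text='"We actually won the contract!" she said excitedly.',
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_multilingual_v2",
)
play(audio)
```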
## FAQ
Yes, you can create [instant voice clones](/docs/capabilities/voices#cloned) of your own voice
from short audio clips. For high-fidelity clones, check out our [professional voice
cloning](/docs/capabilities/voices#cloned) feature.
Yes. You retain ownership of any audio you generate. However, commercial usage rights are only
available with paid plans. With a paid subscription, you may use generated audio for commercial
purposes and monetize the outputs if you own the IP rights to the input content.
A free regeneration allows you to regenerate the same text to speech content without additional cost, subject to these conditions:
* You can regenerate each piece of content up to 2 times for free
* The content must be exactly the same as the previous generation. Any changes to the text, voice settings, or other parameters will require a new, paid generation
Free regenerations are useful in case there is a slight distortion in the audio output. According to ElevenLabs' internal benchmarks, regenerations will solve roughly half of issues with quality, with remaining issues usually due to poor training data.
Use the low-latency Flash [models](/docs/models) (Flash v2 or v2.5) optimized for near real-time
conversational or interactive scenarios. See our [latency optimization
guide](/docs/best-practices/latency-optimization) for more details.
The models are nondeterministic. For consistency, use the optional [seed
parameter](/docs/api-reference/text-to-speech/convert#request.body.seed), though subtle
differences may still occur.
Split long text into segments and use streaming for real-time playback and efficient processing.
To maintain natural prosody flow between chunks, include [previous/next text or previous/next
request id parameters](/docs/api-reference/text-to-speech/convert#request.body.previous_text).
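A minimal sketch of that chunking approach, assuming the Python SDK forwards the `previous_text` and `next_text` request fields described in the API reference:

```python
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

paragraphs = [
    "Chapter one begins on a quiet morning.",
    "By noon, the storm had changed everything.",
    "Night fell before anyone noticed the open door.",
]

segments = []
for i, paragraph in enumerate(paragraphs):
    # Passing the neighbouring text helps the model keep prosody consistent
    # across segment boundaries.
    context = {}
    if i > 0:
        context["previous_text"] = paragraphs[i - 1]
    if i + 1 < len(paragraphs):
        context["next_text"] = paragraphs[i + 1]
    segments.append(
        elevenlabs.text_to_speech.convert(
            text=paragraph,
            voice_id="JBFqnCBsd6RMkjVDRZzb",
            model_id="eleven_multilingual_v2",
            **context,
        )
    )
```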
# Speech to Text
> Learn how to turn spoken audio into text with ElevenLabs.
## Overview
The ElevenLabs [Speech to Text (STT)](/docs/api-reference/speech-to-text) API turns spoken audio into text with state of the art accuracy. Our Scribe v1 [model](/docs/models) adapts to textual cues across 99 languages and multiple voice styles and can be used to:
* Transcribe podcasts, interviews, and other audio or video content
* Generate transcripts for meetings and other audio or video recordings
Learn how to integrate speech to text into your application.
Step-by-step guide for using speech to text in ElevenLabs.
Companies requiring HIPAA compliance must contact [ElevenLabs
Sales](https://elevenlabs.io/contact-sales) to sign a Business Associate Agreement (BAA).
Please ensure this step is completed before proceeding with any HIPAA-related
integrations or deployments.
## State of the art accuracy
The Scribe v1 model is capable of transcribing audio from up to 32 speakers with high accuracy. Optionally, it can also transcribe audio events like laughter, applause, and other non-speech sounds.
The transcribed output supports exact timestamps for each word and audio event, plus diarization to identify the speaker for each word.
The Scribe v1 model is best used when high-accuracy transcription is required rather than real-time transcription. A low-latency, real-time version will be released soon.
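For reference, here is a minimal transcription sketch using the Python SDK. It assumes the SDK's `speech_to_text.convert` method together with the documented `model_id`, `diarize`, and `tag_audio_events` parameters, and a hypothetical local file; adapt it to your own setup.

```python
# Minimal Speech to Text sketch (assumes the Python SDK exposes speech_to_text.convert)
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("meeting.mp3", "rb") as audio_file:  # hypothetical input file
    transcript = elevenlabs.speech_to_text.convert(
        file=audio_file,
        model_id="scribe_v1",    # Scribe v1 model
        diarize=True,            # identify the speaker for each word
        tag_audio_events=True,   # label laughter, applause, etc.
    )

print(transcript.text)
```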
## Pricing
| Tier | Price/month | Hours included | Price per included hour | Price per additional hour |
| -------- | ----------- | ------------------- | ----------------------- | ------------------------- |
| Free | \$0 | Unavailable | Unavailable | Unavailable |
| Starter | \$5 | 12 hours 30 minutes | \$0.40 | Unavailable |
| Creator | \$22 | 62 hours 51 minutes | \$0.35 | \$0.48 |
| Pro | \$99 | 300 hours | \$0.33 | \$0.40 |
| Scale | \$330 | 1,100 hours | \$0.30 | \$0.33 |
| Business | \$1,320 | 6,000 hours | \$0.22 | \$0.22 |
| Tier | Price/month | Hours included | Price per included hour |
| -------- | ----------- | --------------- | ----------------------- |
| Free | \$0 | 12 minutes | Unavailable |
| Starter | \$5 | 1 hour | \$5 |
| Creator | \$22 | 4 hours 53 min | \$4.5 |
| Pro | \$99 | 24 hours 45 min | \$4 |
| Scale | \$330 | 94 hours 17 min | \$3.5 |
| Business | \$1,320 | 440 hours | \$3 |
For reduced pricing at higher scale than 6,000 hours/month in addition to custom MSAs and DPAs,
please [contact sales](https://elevenlabs.io/contact-sales).
**Note: The free tier requires attribution and does not have commercial licensing.**
Scribe has higher concurrency limits than other services from ElevenLabs.
Please see other concurrency limits [here](/docs/models#concurrency-and-priority)
| Plan | STT Concurrency Limit |
| ---------- | --------------------- |
| Free | 10 |
| Starter | 15 |
| Creator | 25 |
| Pro | 50 |
| Scale | 75 |
| Business | 75 |
| Enterprise | Elevated |
## Examples
The following example shows the output of the Scribe v1 model for a sample audio file.
```json
{
"language_code": "en",
"language_probability": 1,
"text": "With a soft and whispery American accent, I'm the ideal choice for creating ASMR content, meditative guides, or adding an intimate feel to your narrative projects.",
"words": [
{
"text": "With",
"start": 0.119,
"end": 0.259,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 0.239,
"end": 0.299,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "a",
"start": 0.279,
"end": 0.359,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 0.339,
"end": 0.499,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "soft",
"start": 0.479,
"end": 1.039,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 1.019,
"end": 1.2,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "and",
"start": 1.18,
"end": 1.359,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 1.339,
"end": 1.44,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "whispery",
"start": 1.419,
"end": 1.979,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 1.959,
"end": 2.179,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "American",
"start": 2.159,
"end": 2.719,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 2.699,
"end": 2.779,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "accent,",
"start": 2.759,
"end": 3.389,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 4.119,
"end": 4.179,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "I'm",
"start": 4.159,
"end": 4.459,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 4.44,
"end": 4.52,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "the",
"start": 4.5,
"end": 4.599,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 4.579,
"end": 4.699,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "ideal",
"start": 4.679,
"end": 5.099,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 5.079,
"end": 5.219,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "choice",
"start": 5.199,
"end": 5.719,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 5.699,
"end": 6.099,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "for",
"start": 6.099,
"end": 6.199,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 6.179,
"end": 6.279,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "creating",
"start": 6.259,
"end": 6.799,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 6.779,
"end": 6.979,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "ASMR",
"start": 6.959,
"end": 7.739,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 7.719,
"end": 7.859,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "content,",
"start": 7.839,
"end": 8.45,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 9,
"end": 9.06,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "meditative",
"start": 9.04,
"end": 9.64,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 9.619,
"end": 9.699,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "guides,",
"start": 9.679,
"end": 10.359,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 10.359,
"end": 10.409,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "or",
"start": 11.319,
"end": 11.439,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 11.42,
"end": 11.52,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "adding",
"start": 11.5,
"end": 11.879,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 11.859,
"end": 12,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "an",
"start": 11.979,
"end": 12.079,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 12.059,
"end": 12.179,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "intimate",
"start": 12.179,
"end": 12.579,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 12.559,
"end": 12.699,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "feel",
"start": 12.679,
"end": 13.159,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 13.139,
"end": 13.179,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "to",
"start": 13.159,
"end": 13.26,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 13.239,
"end": 13.3,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "your",
"start": 13.299,
"end": 13.399,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 13.379,
"end": 13.479,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "narrative",
"start": 13.479,
"end": 13.889,
"type": "word",
"speaker_id": "speaker_0"
},
{
"text": " ",
"start": 13.919,
"end": 13.939,
"type": "spacing",
"speaker_id": "speaker_0"
},
{
"text": "projects.",
"start": 13.919,
"end": 14.779,
"type": "word",
"speaker_id": "speaker_0"
}
]
}
```
The output is classified into three category types (a short post-processing example follows the list):
* `word` - A word in the language of the audio
* `spacing` - The space between words, not applicable for languages that don't use spaces like Japanese, Mandarin, Thai, Lao, Burmese and Cantonese
* `audio_event` - Non-speech sounds like laughter or applause
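To illustrate working with these categories, the sketch below iterates over the `words` array of a response like the one above and prints word-level timestamps per speaker. It assumes the response has already been parsed into a Python dict named `transcript` (for example via `response.json()` or `json.loads`).

```python
# Assumes `transcript` holds the parsed JSON response shown above
for entry in transcript["words"]:
    if entry["type"] == "word":
        # Print each word with its speaker and timing
        print(f'{entry["speaker_id"]}: "{entry["text"]}" ({entry["start"]:.2f}s - {entry["end"]:.2f}s)')
    elif entry["type"] == "audio_event":
        # Non-speech sounds such as laughter or applause
        print(f'[{entry["text"]}] ({entry["start"]:.2f}s)')
```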
## Models
State-of-the-art speech recognition model
Accurate transcription in 99 languages
Precise word-level timestamps
Speaker diarization
Dynamic audio tagging
[Explore all](/docs/models)
## Concurrency and priority
Concurrency is the concept of how many requests can be processed at the same time.
For Speech to Text, files that are over 8 minutes long are transcribed in parallel internally to speed up processing. The audio is chunked into up to four segments that are transcribed concurrently.
You can calculate the concurrency used for a request with the following formula:
$$
Concurrency = \min(4, \text{round\_up}(\frac{\text{audio\_duration\_secs}}{480}))
$$
For example, a 15 minute audio file will be transcribed with a concurrency of 2, while a 120 minute audio file will be transcribed with a concurrency of 4.
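The same calculation expressed in Python, for reference:

```python
import math

def stt_internal_concurrency(audio_duration_secs: float) -> int:
    """Number of segments transcribed in parallel for a single request."""
    return min(4, math.ceil(audio_duration_secs / 480))

print(stt_internal_concurrency(15 * 60))   # 2 for a 15 minute file
print(stt_internal_concurrency(120 * 60))  # 4 for a 120 minute file
```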
## Supported languages
The Scribe v1 model supports 99 languages, including:
*Afrikaans (afr), Amharic (amh), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Burmese (mya), Cantonese (yue), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Fulah (ful), Galician (glg), Ganda (lug), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Igbo (ibo), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kabuverdianu (kea), Kannada (kan), Kazakh (kaz), Khmer (khm), Korean (kor), Kurdish (kur), Kyrgyz (kir), Lao (lao), Latvian (lav), Lingala (lin), Lithuanian (lit), Luo (luo), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Maltese (mlt), Mandarin Chinese (zho), Māori (mri), Marathi (mar), Mongolian (mon), Nepali (nep), Northern Sotho (nso), Norwegian (nor), Occitan (oci), Odia (ori), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Shona (sna), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Tajik (tgk), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Umbundu (umb), Urdu (urd), Uzbek (uzb), Vietnamese (vie), Welsh (cym), Wolof (wol), Xhosa (xho) and Zulu (zul).*
### Breakdown of language support
Word Error Rate (WER) is a key metric used to evaluate the accuracy of transcription systems. It measures how many errors are present in a transcript compared to a reference transcript. Below is a breakdown of the WER for each language that Scribe v1 supports.
Bulgarian (bul), Catalan (cat), Czech (ces), Danish (dan), Dutch (nld), English (eng), Finnish
(fin), French (fra), Galician (glg), German (deu), Greek (ell), Hindi (hin), Indonesian (ind),
Italian (ita), Japanese (jpn), Kannada (kan), Malay (msa), Malayalam (mal), Macedonian (mkd),
Norwegian (nor), Polish (pol), Portuguese (por), Romanian (ron), Russian (rus), Serbian (srp),
Slovak (slk), Spanish (spa), Swedish (swe), Turkish (tur), Ukrainian (ukr) and Vietnamese (vie).
Bengali (ben), Belarusian (bel), Bosnian (bos), Cantonese (yue), Estonian (est), Filipino (fil),
Gujarati (guj), Hungarian (hun), Kazakh (kaz), Latvian (lav), Lithuanian (lit), Mandarin (cmn),
Marathi (mar), Nepali (nep), Odia (ori), Persian (fas), Slovenian (slv), Tamil (tam) and Telugu
(tel)
Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani
(aze), Burmese (mya), Cebuano (ceb), Croatian (hrv), Georgian (kat), Hausa (hau), Hebrew (heb),
Icelandic (isl), Javanese (jav), Kabuverdianu (kea), Korean (kor), Kyrgyz (kir), Lingala (lin),
Maltese (mlt), Mongolian (mon), Māori (mri), Occitan (oci), Punjabi (pan), Sindhi (snd), Swahili
(swa), Tajik (tgk), Thai (tha), Urdu (urd), Uzbek (uzb) and Welsh (cym).
Amharic (amh), Chichewa (nya), Fulah (ful), Ganda (lug), Igbo (ibo), Irish (gle), Khmer (khm),
Kurdish (kur), Lao (lao), Luxembourgish (ltz), Luo (luo), Northern Sotho (nso), Pashto (pus),
Shona (sna), Somali (som), Umbundu (umb), Wolof (wol), Xhosa (xho) and Zulu (zul).
## FAQ
Yes, the API supports uploading both audio and video files for transcription.
Files up to 1 GB in size and up to 4.5 hours in duration are supported.
The supported audio formats include:
* audio/aac
* audio/x-aac
* audio/x-aiff
* audio/ogg
* audio/mpeg
* audio/mp3
* audio/mpeg3
* audio/x-mpeg-3
* audio/opus
* audio/wav
* audio/x-wav
* audio/webm
* audio/flac
* audio/x-flac
* audio/mp4
* audio/aiff
* audio/x-m4a
Supported video formats include:
* video/mp4
* video/x-msvideo
* video/x-matroska
* video/quicktime
* video/x-ms-wmv
* video/x-flv
* video/webm
* video/mpeg
* video/3gpp
ElevenLabs is constantly expanding the number of languages supported by our models. Please check back frequently for updates.
Yes, asynchronous transcription results can be sent to webhooks configured in webhook settings in the UI. Learn more in the [webhooks cookbook](/docs/cookbooks/speech-to-text/webhooks).
# Text to Dialogue
> Learn how to create immersive, natural-sounding dialogue with ElevenLabs.
Eleven v3 API access is currently not publicly available, but will be soon. To request access,
please [contact our sales team](https://elevenlabs.io/contact-sales).
## Overview
The ElevenLabs [Text to Dialogue](/docs/api-reference/text-to-dialogue) API creates natural-sounding, expressive dialogue from text using the Eleven v3 model. Popular use cases include:
* Generating pitch perfect conversations for video games
* Creating immersive dialogue for podcasts and other audio content
* Bringing audiobooks to life with expressive narration
Text to Dialogue is not intended for use in real-time applications like Conversational AI. Several generations might be required to achieve the desired results. When integrating Text to Dialogue into your application, consider producing several generations and letting the user select the best one.
Listen to a sample:
Learn how to integrate text to dialogue into your application.
Step-by-step guide for using text to dialogue in ElevenLabs.
## Voice options
ElevenLabs offers thousands of voices across 70+ languages through multiple creation methods:
* [Voice library](/docs/capabilities/voices) with 3,000+ community-shared voices
* [Professional voice cloning](/docs/capabilities/voices#cloned) for highest-fidelity replicas
* [Instant voice cloning](/docs/capabilities/voices#cloned) for quick voice replication
* [Voice design](/docs/capabilities/voices#voice-design) to generate custom voices from text descriptions
Learn more about our [voice options](/docs/capabilities/voices).
## Prompting
The models interpret emotional context directly from the text input. For example, adding
descriptive text like "she said excitedly" or using exclamation marks will influence the speech
emotion. Voice settings like Stability and Similarity help control the consistency, while the
underlying emotion comes from textual cues.
Read the [prompting guide](/docs/best-practices/prompting) for more details.
### Emotional deliveries with audio tags
This feature is still under active development; actual results may vary.
The Eleven v3 model allows the use of non-speech audio events to influence the delivery of the dialogue. This is done by inserting the audio events into the text input wrapped in square brackets.
Audio tags come in a few different forms:
### Emotions and delivery
For example, \[sad], \[laughing] and \[whispering]
### Audio events
For example, \[leaves rustling], \[gentle footsteps] and \[applause].
### Overall direction
For example, \[football], \[wrestling match] and \[auctioneer].
Some examples include:
```
"[giggling] That's really funny!"
"[groaning] That was awful."
"Well, [sigh] I'm not sure what to say."
```
You can also use punctuation to indicate the flow of dialog, like interruptions:
```
"[cautiously] Hello, is this seat-"
"[jumping in] Free? [cheerfully] Yes it is."
```
Ellipses can be used to indicate trailing sentences:
```
"[indecisive] Hi, can I get uhhh..."
"[quizzically] The usual?"
"[elated] Yes! [laughs] I'm so glad you knew!"
```
## Supported formats
The default response format is MP3, but other formats such as PCM, μ-law, A-law, and Opus are also available.
* **MP3**
* Sample rates: 22.05kHz - 44.1kHz
* Bitrates: 32kbps - 192kbps
* 22.05kHz @ 32kbps
* 44.1kHz @ 32kbps, 64kbps, 96kbps, 128kbps, 192kbps
* **PCM (S16LE)**
  * Sample rates: 8kHz, 16kHz, 22.05kHz, 24kHz, 44.1kHz, 48kHz
* 16-bit depth
* **μ-law**
* 8kHz sample rate
* Optimized for telephony applications
* **A-law**
* 8kHz sample rate
* Optimized for telephony applications
* **Opus**
* Sample rate: 48kHz
* Bitrates: 32kbps - 192kbps
Higher quality audio options are only available on paid tiers - see our [pricing
page](https://elevenlabs.io/pricing/api) for details.
## Supported languages
The Eleven v3 model supports 70+ languages, including:
*Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kirghiz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym).*
## FAQ
Text to Dialogue is only available on the Eleven v3 model.
Yes. You retain ownership of any audio you generate. However, commercial usage rights are only
available with paid plans. With a paid subscription, you may use generated audio for commercial
purposes and monetize the outputs if you own the IP rights to the input content.
A free regeneration allows you to regenerate the same text to speech content without additional cost, subject to these conditions:
* Only available within the ElevenLabs dashboard.
* You can regenerate each piece of content up to 2 times for free.
* The content must be exactly the same as the previous generation. Any changes to the text, voice settings, or other parameters will require a new, paid generation.
Free regenerations are useful in case there is a slight distortion in the audio output. According to ElevenLabs' internal benchmarks, regenerations will solve roughly half of issues with quality, with remaining issues usually due to poor training data.
There is no limit to the number of speakers in a dialogue.
The models are nondeterministic. For consistency, use the optional [seed
parameter](/docs/api-reference/text-to-speech/convert#request.body.seed), though subtle
differences may still occur.
Split long text into segments and use streaming for real-time playback and efficient processing.
# Voice changer
> Learn how to transform audio between voices while preserving emotion and delivery.
## Overview
ElevenLabs [voice changer](/docs/api-reference/speech-to-speech/convert) API lets you transform any source audio (recorded or uploaded) into a different, fully cloned voice without losing the performance nuances of the original. It’s capable of capturing whispers, laughs, cries, accents, and subtle emotional cues to achieve a highly realistic, human feel and can be used to:
* Change any voice while preserving emotional delivery and nuance
* Create consistent character voices across multiple languages and recording sessions
* Fix or replace specific words and phrases in existing recordings
Explore our [voice library](https://elevenlabs.io/community) to find the perfect voice for your project.
Learn how to integrate voice changer into your application.
Step-by-step guide for using voice changer in ElevenLabs.
## Supported languages
Our v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
The `eleven_english_sts_v2` model only supports English.
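For reference, here is a minimal voice changer sketch using the Python SDK. It assumes the SDK's `speech_to_speech.convert` method and the `remove_background_noise` parameter from the API reference, plus a hypothetical source recording; treat it as a sketch rather than a definitive integration.

```python
# Minimal voice changer sketch (assumes speech_to_speech.convert in the Python SDK)
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("source_performance.mp3", "rb") as audio_file:  # hypothetical source recording
    converted = elevenlabs.speech_to_speech.convert(
        voice_id="JBFqnCBsd6RMkjVDRZzb",        # target voice
        audio=audio_file,
        model_id="eleven_multilingual_sts_v2",  # multilingual voice changer model
        remove_background_noise=True,           # clean up environmental sounds
    )

# The result is streamed back as audio chunks; write them to a file
with open("converted.mp3", "wb") as f:
    for chunk in converted:
        if chunk:
            f.write(chunk)
```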
## Best practices
### Audio quality
* Record in a quiet environment to minimize background noise
* Maintain appropriate microphone levels - avoid too quiet or peaked audio
* Use `remove_background_noise=true` if environmental sounds are present
### Recording guidelines
* Keep segments under 5 minutes for optimal processing
* Feel free to include natural expressions (laughs, sighs, emotions)
* The source audio's accent and language will be preserved in the output
### Parameters
* **Style**: Set to 0% when input audio is already expressive
* **Stability**: Use 100% for maximum voice consistency
* **Language**: Choose source audio that matches your desired accent and language
## FAQ
Yes, but you must split it into smaller chunks (each under 5 minutes). This helps ensure stability
and consistent output.
Absolutely. Provide your custom voice’s `voice_id` and specify the correct `model_id`.
You’re charged at 1000 characters’ worth of usage per minute of processed audio. There’s no
additional fee based on file size.
Possibly. Use `remove_background_noise=true` or the Voice Isolator tool to minimize environmental sounds in the final output.
Though `eleven_english_sts_v2` is available, our `eleven_multilingual_sts_v2` model often outperforms it, even for English material.
“Style” adds interpretative flair; “stability” enforces consistency. For high-energy performances
in the source audio, turn style down and stability up.
# Voice isolator
> Learn how to isolate speech from background noise, music, and ambient sounds from any audio.
## Overview
ElevenLabs [voice isolator](/docs/api-reference/audio-isolation/audio-isolation) API transforms audio recordings with background noise into clean, studio-quality speech. This is particularly useful for audio recorded in noisy environments, or recordings containing unwanted ambient sounds, music, or other background interference.
Listen to a sample:
## Usage
The voice isolator model extracts speech from background noise in both audio and video files.
Learn how to integrate voice isolator into your application.
Step-by-step guide for using voice isolator in ElevenLabs.
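As an illustration, here is a hedged sketch that calls the voice isolator endpoint directly over HTTP with the `requests` library. The endpoint path and the `audio` field name are taken from the API reference linked above; verify them against the current reference before relying on this.

```python
# Voice isolator sketch via the HTTP API (endpoint and field names per the API reference)
import os
import requests
from dotenv import load_dotenv

load_dotenv()

with open("noisy_recording.mp3", "rb") as audio_file:  # hypothetical input file
    response = requests.post(
        "https://api.elevenlabs.io/v1/audio-isolation",
        headers={"xi-api-key": os.getenv("ELEVENLABS_API_KEY")},
        files={"audio": audio_file},
    )
response.raise_for_status()

# The response body is the isolated speech audio
with open("clean_speech.mp3", "wb") as f:
    f.write(response.content)
```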
### Supported file types
* **Audio**: AAC, AIFF, OGG, MP3, OPUS, WAV, FLAC, M4A
* **Video**: MP4, AVI, MKV, MOV, WMV, FLV, WEBM, MPEG, 3GPP
## FAQ
* **Cost**: Voice isolator costs 1000 characters for every minute of audio.
* **File size and length**: Supports files up to 500MB and 1 hour in length.
* **Music vocals**: Not specifically optimized for isolating vocals from music, but may work depending on the content.
# Dubbing
> Learn how to translate audio and video while preserving the emotion, timing & tone of speakers.
## Overview
ElevenLabs [dubbing](/docs/api-reference/dubbing/create) API translates audio and video across 32 languages while preserving the emotion, timing, tone and unique characteristics of each speaker. Our model separates each speaker’s dialogue from the soundtrack, allowing you to recreate the original delivery in another language. It can be used to:
* Grow your addressable audience by 4x to reach international audiences
* Adapt existing material for new markets while preserving emotional nuance
* Offer content in multiple languages without re-recording voice talent
We also offer a [fully managed dubbing service](https://elevenlabs.io/elevenstudios) for video and podcast creators.
## Usage
ElevenLabs dubbing can be used in three ways:
* **Dubbing Studio** in the user interface for fast, interactive control and editing
* **Programmatic integration** via our [API](/docs/api-reference/dubbing/create) for large-scale or automated workflows
* **Human-verified dubs via ElevenLabs Productions** - for more information, please reach out to [productions@elevenlabs.io](mailto:productions@elevenlabs.io)
The UI supports files up to **500MB** and **45 minutes**. The API supports files up to **1GB** and **2.5 hours**.
Learn how to integrate dubbing into your application.
Edit transcripts and translate videos step by step in Dubbing Studio.
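For programmatic integration, here is a hedged sketch that submits a dubbing job over HTTP with `requests`. The endpoint, form fields, and `dubbing_id` response key follow the [dubbing API reference](/docs/api-reference/dubbing/create), but treat them as assumptions and double-check the current reference; the file path is a placeholder.

```python
# Dubbing job sketch via the HTTP API (fields per the dubbing API reference)
import os
import requests
from dotenv import load_dotenv

load_dotenv()

with open("interview.mp4", "rb") as video_file:  # hypothetical source video
    response = requests.post(
        "https://api.elevenlabs.io/v1/dubbing",
        headers={"xi-api-key": os.getenv("ELEVENLABS_API_KEY")},
        files={"file": video_file},
        data={
            "target_lang": "es",  # dub into Spanish
            "source_lang": "en",  # optional; auto-detected if omitted
        },
    )
response.raise_for_status()

dubbing_id = response.json()["dubbing_id"]
print(f"Dubbing job created: {dubbing_id}")
# Poll the dubbing status endpoint with this ID until the dub is ready to download.
```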
### Key features
**Speaker separation**
Automatically detect multiple speakers, even with overlapping speech.
**Multi-language output**
Generate localized tracks in 32 languages.
**Preserve original voices**
Retain the speaker’s identity and emotional tone.
**Keep background audio**
Avoid re-mixing music, effects, or ambient sounds.
**Customizable transcripts**
Manually edit translations and transcripts as needed.
**Supported file types**
Videos and audio can be dubbed from various sources, including YouTube, X, TikTok, Vimeo, direct URLs, or file uploads.
**Video transcript and translation editing**
Our AI video translator lets you manually edit transcripts and translations to ensure your content is properly synced and localized. Adjust the voice settings to tune delivery, and regenerate speech segments until the output sounds just right.
A Creator plan or higher is required to dub audio files. For videos, a watermark option is
available to reduce credit usage.
### Cost
To reduce credit usage, you can:
* Dub only a selected portion of your file
* Use watermarks on video output (not available for audio)
* Fine-tune transcripts and regenerate individual segments instead of the entire clip
Refer to our [pricing page](https://elevenlabs.io/pricing) for detailed credit costs.
## List of supported languages for dubbing
| No | Language Name | Language Code |
| -- | ------------- | ------------- |
| 1 | English | en |
| 2 | Hindi | hi |
| 3 | Portuguese | pt |
| 4 | Chinese | zh |
| 5 | Spanish | es |
| 6 | French | fr |
| 7 | German | de |
| 8 | Japanese | ja |
| 9 | Arabic | ar |
| 10 | Russian | ru |
| 11 | Korean | ko |
| 12 | Indonesian | id |
| 13 | Italian | it |
| 14 | Dutch | nl |
| 15 | Turkish | tr |
| 16 | Polish | pl |
| 17 | Swedish | sv |
| 18 | Filipino | fil |
| 19 | Malay | ms |
| 20 | Romanian | ro |
| 21 | Ukrainian | uk |
| 22 | Greek | el |
| 23 | Czech | cs |
| 24 | Danish | da |
| 25 | Finnish | fi |
| 26 | Bulgarian | bg |
| 27 | Croatian | hr |
| 28 | Slovak | sk |
| 29 | Tamil | ta |
## FAQ
Dubbing can be performed on all types of short and long form video and audio content. We
recommend dubbing content with a maximum of 9 unique speakers at a time to ensure a high-quality
dub.
Yes. Our models analyze each speaker’s original delivery to recreate the same tone, pace, and
style in your target language.
We use advanced source separation to isolate individual voices from ambient sound. Multiple
overlapping speakers can be split into separate tracks.
Via the user interface, the maximum file size is 500MB up to 45 minutes. Through the API, you
can process files up to 1GB and 2.5 hours.
You can choose to dub only certain portions of your video/audio or tweak translations/voices in
our interactive Dubbing Studio.
# Sound effects
> Learn how to create high-quality sound effects from text with ElevenLabs.
## Overview
ElevenLabs [sound effects](/docs/api-reference/text-to-sound-effects/convert) API turns text descriptions into high-quality audio effects with precise control over timing, style and complexity. The model understands both natural language and audio terminology, enabling you to:
* Generate cinematic sound design for films & trailers
* Create custom sound effects for games & interactive media
* Produce Foley and ambient sounds for video content
Listen to an example:
## Usage
Sound effects are generated from text descriptions and two optional parameters (a short example follows this list):
* **Duration**: Set a specific length for the generated audio (in seconds)
* Default: Automatically determined based on the prompt
* Range: 0.1 to 22 seconds
* Cost: 40 credits per second when duration is specified
* **Prompt influence**: Control how strictly the model follows the prompt
* High: More literal interpretation of the prompt
* Low: More creative interpretation with added variations
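Here is a minimal sketch of a sound effect request using the Python SDK, assuming the `text_to_sound_effects.convert` method with the `duration_seconds` and `prompt_influence` parameters described above; the output file name is a placeholder.

```python
# Sound effect sketch (assumes text_to_sound_effects.convert in the Python SDK)
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_sound_effects.convert(
    text="Heavy wooden door creaking open",
    duration_seconds=4.0,   # optional: fix the length (40 credits per second when set)
    prompt_influence=0.7,   # optional: higher values follow the prompt more literally
)

with open("door_creak.mp3", "wb") as f:
    for chunk in audio:
        if chunk:
            f.write(chunk)
```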
Learn how to integrate sound effects into your application.
Step-by-step guide for using sound effects in ElevenLabs.
### Prompting guide
#### Simple effects
For basic sound effects, use clear, concise descriptions:
* "Glass shattering on concrete"
* "Heavy wooden door creaking open"
* "Thunder rumbling in the distance"
#### Complex sequences
For multi-part sound effects, describe the sequence of events:
* "Footsteps on gravel, then a metallic door opens"
* "Wind whistling through trees, followed by leaves rustling"
* "Sword being drawn, then clashing with another blade"
#### Musical elements
The API also supports generation of musical components:
* "90s hip-hop drum loop, 90 BPM"
* "Vintage brass stabs in F minor"
* "Atmospheric synth pad with subtle modulation"
#### Audio Terminology
Common terms that can enhance your prompts:
* **Impact**: Collision or contact sounds between objects, from subtle taps to dramatic crashes
* **Whoosh**: Movement through air effects, ranging from fast and ghostly to slow-spinning or rhythmic
* **Ambience**: Background environmental sounds that establish atmosphere and space
* **One-shot**: Single, non-repeating sound
* **Loop**: Repeating audio segment
* **Stem**: Isolated audio component
* **Braam**: Big, brassy cinematic hit that signals epic or dramatic moments, common in trailers
* **Glitch**: Sounds of malfunction, jittering, or erratic movement, useful for transitions and sci-fi
* **Drone**: Continuous, textured sound that creates atmosphere and suspense
## FAQ
The maximum duration is 22 seconds per generation. For longer sequences, generate multiple
effects and combine them.
Yes, you can generate musical elements like drum loops, bass lines, and melodic samples.
However, for full music production, consider combining multiple generated elements.
Use detailed prompts, appropriate duration settings, and high prompt influence for more
predictable results. For complex sounds, generate components separately and combine them.
Generated audio is provided in MP3 format with professional-grade quality (44.1kHz,
128-192kbps).
# Voices
> Learn how to create, customize, and manage voices with ElevenLabs.
## Overview
ElevenLabs provides models for voice creation & customization. The platform supports a wide range of voice options, including voices from our extensive [voice library](https://elevenlabs.io/app/voice-library), voice cloning, and artificially designed voices using text prompts.
### Voice categories
* **Community**: Voices shared by the community from the ElevenLabs [voice library](/docs/product-guides/voices/voice-library).
* **Cloned**: Custom voices created using instant or professional [voice cloning](/docs/product-guides/voices/voice-cloning).
* **Voice design**: Artificially designed voices created with the [voice design](/docs/product-guides/voices/voice-design) tool.
* **Default**: Pre-designed, high-quality voices optimized for general use.
#### Community
The [voice library](/docs/product-guides/voices/voice-library) contains over 5,000 voices shared by the ElevenLabs community. Use it to:
* Discover unique voices shared by the ElevenLabs community.
* Add voices to your personal collection.
* Share your own voice clones for cash rewards when others use them.
Share your voice with the community, set your terms, and earn cash rewards when others use it.
We've paid out over **\$1M** already.
Learn how to use voices from the voice library
#### Cloned
Clone your own voice from 30-second samples with Instant Voice Cloning, or create hyper-realistic voices using Professional Voice Cloning.
* **Instant Voice Cloning**: Quickly replicate a voice from short audio samples.
* **Professional Voice Cloning**: Generate professional-grade voice clones with extended training audio.
Voice-captcha technology is used to verify that **all** voice clones are created from your own voice samples.
A Creator plan or higher is required to create voice clones.
Clone a voice instantly
Create a perfect voice clone
Learn how to create instant & professional voice clones
#### Voice design
With [Voice Design](/docs/product-guides/voices/voice-design), you can create entirely new voices by specifying attributes like age, gender, accent, and tone. Generated voices are ideal for:
* Realistic voices with nuanced characteristics.
* Creative character voices for games and storytelling.
The voice design tool creates three voice previews. Simply provide the following (a minimal API sketch follows this list):
* A **voice description** between 20 and 1000 characters.
* A **text** to preview the voice between 100 and 1000 characters.
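For reference, a hedged sketch of generating previews programmatically, assuming the Python SDK exposes a `text_to_voice.create_previews` method matching the voice design API; the description, preview text, and response field names below are assumptions to verify against the API reference.

```python
# Voice design preview sketch (assumes text_to_voice.create_previews in the Python SDK)
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

previews = elevenlabs.text_to_voice.create_previews(
    voice_description="A calm, middle-aged British narrator with a warm, reassuring tone.",
    text=(
        "Welcome back. Tonight we journey through quiet valleys and forgotten towns, "
        "letting each story unfold at its own unhurried pace."
    ),
)

# Each preview is assumed to carry a generated voice ID that can later be saved as a voice
for preview in previews.previews:
    print(preview.generated_voice_id)
```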
#### Voice design with Eleven v3 (alpha)
Using the new [Eleven v3 model](/docs/models#eleven-v3-alpha), voices that are capable of a wide range of emotion can be designed via a prompt.
Using v3 gets you the following benefits:
* More natural and versatile voice generation.
* Better control over voice characteristics.
* Audio tags supported in Preview generations.
* Backward compatibility with v2 models.
Voice design with v3 is currently in alpha. It is only available in the dashboard, with API access
coming soon.
Integrate voice design into your application.
Learn how to craft voices from a single prompt.
#### Default
Our curated set of default voices is optimized for core use cases. These voices are:
* **Reliable**: Available long-term.
* **Consistent**: Carefully crafted and quality-checked for performance.
* **Model-ready**: Fine-tuned on new models upon release.
Default voices are available to all users via the **my voices** tab in the [voice lab
dashboard](https://elevenlabs.io/app/voice-lab). Default voices were previously referred to as
`premade` voices. The latter term is still used when accessing default voices via the API.
### Managing voices
All voices can be managed through **My Voices**, where you can:
* Search, filter, and categorize voices
* Add descriptions and custom tags
* Organize voices for quick access
Learn how to manage your voice collection in [My Voices documentation](/docs/product-guides/voices/voice-library).
* **Search and Filter**: Find voices using keywords or tags.
* **Preview Samples**: Listen to voice demos before adding them to **My Voices**.
* **Add to Collection**: Save voices for easy access in your projects.
> **Tip**: Try searching by specific accents or genres, such as "Australian narration" or "child-like character."
### Supported languages
All ElevenLabs voices support multiple languages. Experiment by converting phrases like `Hello! こんにちは! Bonjour!` into speech to hear how your own voice sounds across different languages.
ElevenLabs supports voice creation in 32 languages. Match your voice selection to your target region for the most natural results.
* **Default Voices**: Optimized for multilingual use.
* **Generated and Cloned Voices**: Accent fidelity depends on input samples or selected attributes.
Our v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
Flash v2.5 supports 32 languages - all languages from v2 models plus:
*Hungarian, Norwegian & Vietnamese*
[Learn more about our models](/docs/models)
## FAQ
Yes, you can create custom voices with Voice Design or clone voices using Instant or
Professional Voice Cloning. Both options are accessible in **My Voices**.
Instant Voice Cloning uses short audio samples for near-instantaneous voice creation.
Professional Voice Cloning requires longer samples but delivers hyper-realistic, high-quality
results.
Professional Voice Clones can be shared privately or publicly in the Voice Library. Generated
voices and Instant Voice Clones cannot currently be shared.
Use **My Voices** to search, filter, and organize your voice collection. You can also delete,
tag, and categorize voices for easier management.
Use clean and consistent audio samples. For Professional Voice Cloning, provide a variety of
recordings in the desired speaking style.
Yes, Professional Voice Clones can be shared in the Voice Library. Instant Voice Clones and
Generated Voices cannot currently be shared.
Generated Voices are ideal for unique characters in games, animations, and creative
storytelling.
Go to **Voices > Voice Library** in your dashboard or access it via API.
# Forced Alignment
> Learn how to turn spoken audio and text into a time-aligned transcript with ElevenLabs.
## Overview
The ElevenLabs [Forced Alignment](/docs/api-reference/forced-alignment) API turns spoken audio and text into a time-aligned transcript. This is useful for cases where you have an audio recording and a transcript but need exact timestamps for each word or phrase in the transcript. This can be used for:
* Matching subtitles to a video recording
* Generating timings for an audiobook recording of an ebook
## Usage
Forced Alignment can be used by interfacing with the ElevenLabs API directly.
Learn how to integrate Forced Alignment into your application.
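As an illustration, here is a hedged sketch that calls the Forced Alignment endpoint over HTTP with `requests`. The endpoint path and the `file`/`text` field names follow the [API reference](/docs/api-reference/forced-alignment); verify them before relying on this, and note the file path is a placeholder.

```python
# Forced Alignment sketch via the HTTP API (endpoint and fields per the API reference)
import os
import requests
from dotenv import load_dotenv

load_dotenv()

transcript = "Hello, how are you?"  # plain text, no special formatting

with open("recording.mp3", "rb") as audio_file:  # hypothetical recording of the transcript
    response = requests.post(
        "https://api.elevenlabs.io/v1/forced-alignment",
        headers={"xi-api-key": os.getenv("ELEVENLABS_API_KEY")},
        files={"file": audio_file},
        data={"text": transcript},
    )
response.raise_for_status()

# The response contains word-level timestamps aligned to the provided text
print(response.json())
```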
## Supported languages
Our v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
## FAQ
Forced alignment is a technique used to align spoken audio with text. You provide an audio file and a transcript of the audio file and the API will return a time-aligned transcript.
It's useful for cases where you have an audio recording and a transcript, but need exact timestamps for each word or phrase in the transcript.
The input text should be a plain string with no special formatting such as JSON.
Example of good input text:
```
"Hello, how are you?"
```
Example of bad input text:
```
{
"text": "Hello, how are you?"
}
```
Forced Alignment costs the same as the [Speech to Text](/docs/capabilities/speech-to-text#pricing) API.
Forced Alignment does not support diarization. If you provide diarized text, the API will likely return unwanted results.
The maximum file size for Forced Alignment is 1GB.
For audio files, the maximum duration is 4.5 hours.
For the text input, the maximum length is 675k characters.
# Streaming text to speech
> Learn how to stream text into speech in Python or Node.js.
In this tutorial, you'll learn how to convert [text to speech](https://elevenlabs.io/text-to-speech) with the ElevenLabs SDK. We’ll start by walking through how to generate speech and receive a file, and then how to generate speech and stream the response back. Finally, as a bonus, we’ll show you how to upload the generated audio to an AWS S3 bucket and share it through a signed URL. This signed URL provides temporary access to the audio file, making it perfect for sharing with users by SMS or embedding into an application.
If you want to jump straight to an example you can find them in the [Python](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python) and [Node.js](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node) example repositories.
## Requirements
* An ElevenLabs account with an API key (here’s how to [find your API key](/docs/developer-guides/quickstart#authentication)).
* Python or Node installed on your machine
* (Optionally) an AWS account with access to S3.
## Setup
### Installing our SDK
Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK for the text to speech conversion. You can install it using pip:
```bash Python
pip install elevenlabs
```
```bash TypeScript
npm install @elevenlabs/elevenlabs-js
```
Additionally, install necessary packages to manage your environmental variables:
```bash Python
pip install python-dotenv
```
```bash TypeScript
npm install dotenv
npm install @types/dotenv --save-dev
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```bash .env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
## Convert text to speech (file)
To convert text to speech and save it as a file, we’ll use the `convert` method of the ElevenLabs SDK and then save it locally as an `.mp3` file.
```python Python
import os
import uuid
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
def text_to_speech_file(text: str) -> str:
# Calling the text_to_speech conversion API with detailed parameters
response = elevenlabs.text_to_speech.convert(
voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice
output_format="mp3_22050_32",
text=text,
model_id="eleven_turbo_v2_5", # use the turbo model for low latency
# Optional voice settings that allow you to customize the output
voice_settings=VoiceSettings(
stability=0.0,
similarity_boost=1.0,
style=0.0,
use_speaker_boost=True,
speed=1.0,
),
)
# uncomment the line below to play the audio back
# play(response)
# Generating a unique file name for the output MP3 file
save_file_path = f"{uuid.uuid4()}.mp3"
# Writing the audio to a file
with open(save_file_path, "wb") as f:
for chunk in response:
if chunk:
f.write(chunk)
print(f"{save_file_path}: A new audio file was saved successfully!")
# Return the path of the saved audio file
return save_file_path
```
```typescript TypeScript
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import * as dotenv from 'dotenv';
import { createWriteStream } from 'fs';
import { v4 as uuid } from 'uuid';
dotenv.config();
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
const elevenlabs = new ElevenLabsClient({
apiKey: ELEVENLABS_API_KEY,
});
export const createAudioFileFromText = async (text: string): Promise<string> => {
return new Promise(async (resolve, reject) => {
try {
const audio = await elevenlabs.textToSpeech.convert('JBFqnCBsd6RMkjVDRZzb', {
modelId: 'eleven_multilingual_v2',
text,
outputFormat: 'mp3_44100_128',
// Optional voice settings that allow you to customize the output
voiceSettings: {
stability: 0,
similarityBoost: 0,
useSpeakerBoost: true,
speed: 1.0,
},
});
const fileName = `${uuid()}.mp3`;
const fileStream = createWriteStream(fileName);
audio.pipe(fileStream);
fileStream.on('finish', () => resolve(fileName)); // Resolve with the fileName
fileStream.on('error', reject);
} catch (error) {
reject(error);
}
});
};
```
You can then run this function with:
```python Python
text_to_speech_file("Hello World")
```
```typescript TypeScript
await createAudioFileFromText('Hello World');
```
## Convert text to speech (streaming)
If you prefer to stream the audio directly without saving it to a file, you can use our streaming feature.
```python Python
import os
from typing import IO
from io import BytesIO
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
def text_to_speech_stream(text: str) -> IO[bytes]:
# Perform the text-to-speech conversion
response = elevenlabs.text_to_speech.stream(
voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice
output_format="mp3_22050_32",
text=text,
model_id="eleven_multilingual_v2",
# Optional voice settings that allow you to customize the output
voice_settings=VoiceSettings(
stability=0.0,
similarity_boost=1.0,
style=0.0,
use_speaker_boost=True,
speed=1.0,
),
)
# Create a BytesIO object to hold the audio data in memory
audio_stream = BytesIO()
# Write each chunk of audio data to the stream
for chunk in response:
if chunk:
audio_stream.write(chunk)
# Reset stream position to the beginning
audio_stream.seek(0)
# Return the stream for further use
return audio_stream
```
```typescript TypeScript
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import * as dotenv from 'dotenv';
dotenv.config();
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
if (!ELEVENLABS_API_KEY) {
throw new Error('Missing ELEVENLABS_API_KEY in environment variables');
}
const elevenlabs = new ElevenLabsClient({
apiKey: ELEVENLABS_API_KEY,
});
export const createAudioStreamFromText = async (text: string): Promise<Buffer> => {
const audioStream = await elevenlabs.textToSpeech.stream('JBFqnCBsd6RMkjVDRZzb', {
modelId: 'eleven_multilingual_v2',
text,
outputFormat: 'mp3_44100_128',
// Optional voice settings that allow you to customize the output
voiceSettings: {
stability: 0,
similarityBoost: 1.0,
useSpeakerBoost: true,
speed: 1.0,
},
});
const chunks: Buffer[] = [];
for await (const chunk of audioStream) {
chunks.push(chunk);
}
const content = Buffer.concat(chunks);
return content;
};
```
You can then run this function with:
```python Python
text_to_speech_stream("This is James")
```
```typescript TypeScript
await createAudioStreamFromText('This is James');
```
## Bonus - Uploading to AWS S3 and getting a secure sharing link
Once your audio data is created as either a file or a stream you might want to share this with your users. One way to do this is to upload it to an AWS S3 bucket and generate a secure sharing link.
To upload the data to S3 you’ll need to add your AWS access key ID, secret access key and AWS region name to your `.env` file. Follow these steps to find the credentials:
1. Log in to your AWS Management Console: Navigate to the AWS home page and sign in with your account.
2. Access the IAM (Identity and Access Management) Dashboard: You can find IAM under "Security, Identity, & Compliance" on the services menu. The IAM dashboard manages access to your AWS services securely.
3. Create a New User (if necessary): On the IAM dashboard, select "Users" and then "Add user". Enter a user name.
4. Set the permissions: attach policies directly to the user according to the access level you wish to grant. For S3 uploads, you can use the AmazonS3FullAccess policy. However, it's best practice to grant least privilege, or the minimal permissions necessary to perform a task. You might want to create a custom policy that specifically allows only the necessary actions on your S3 bucket.
5. Review and create the user: Review your settings and create the user. Upon creation, you'll be presented with an access key ID and a secret access key. Be sure to download and securely save these credentials; the secret access key cannot be retrieved again after this step.
6. Get your AWS region name, e.g. us-east-1.
If you do not have an AWS S3 bucket, you will need to create a new one by following these steps:
1. Access the S3 dashboard: You can find S3 under "Storage" on the services menu.
2. Create a new bucket: On the S3 dashboard, click the "Create bucket" button.
3. Enter a bucket name and click on the "Create bucket" button. You can leave the other bucket options as default. The newly added bucket will appear in the list.
Install the AWS SDK for your language: `boto3` via `pip` for Python, or the `@aws-sdk` packages via `npm` for TypeScript.
```bash Python
pip install boto3
```
```bash TypeScript
npm install @aws-sdk/client-s3
npm install @aws-sdk/s3-request-presigner
```
Then add the environment variables to `.env` file like so:
```
AWS_ACCESS_KEY_ID=your_aws_access_key_id_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here
AWS_REGION_NAME=your_aws_region_name_here
AWS_S3_BUCKET_NAME=your_s3_bucket_name_here
```
Add the following functions to upload the audio stream to S3 and generate a signed URL.
```python s3_uploader.py (Python)
import os
import boto3
import uuid
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
AWS_REGION_NAME = os.getenv("AWS_REGION_NAME")
AWS_S3_BUCKET_NAME = os.getenv("AWS_S3_BUCKET_NAME")
session = boto3.Session(
aws_access_key_id=AWS_ACCESS_KEY_ID,
aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
region_name=AWS_REGION_NAME,
)
s3 = session.client("s3")
def generate_presigned_url(s3_file_name: str) -> str:
signed_url = s3.generate_presigned_url(
"get_object",
Params={"Bucket": AWS_S3_BUCKET_NAME, "Key": s3_file_name},
ExpiresIn=3600,
) # URL expires in 1 hour
return signed_url
def upload_audiostream_to_s3(audio_stream) -> str:
s3_file_name = f"{uuid.uuid4()}.mp3" # Generates a unique file name using UUID
s3.upload_fileobj(audio_stream, AWS_S3_BUCKET_NAME, s3_file_name)
return s3_file_name
```
```typescript s3_uploader.ts (TypeScript)
import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import * as dotenv from 'dotenv';
import { v4 as uuid } from 'uuid';
dotenv.config();
const { AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME, AWS_S3_BUCKET_NAME } =
process.env;
if (!AWS_ACCESS_KEY_ID || !AWS_SECRET_ACCESS_KEY || !AWS_REGION_NAME || !AWS_S3_BUCKET_NAME) {
throw new Error('One or more environment variables are not set. Please check your .env file.');
}
const s3 = new S3Client({
credentials: {
accessKeyId: AWS_ACCESS_KEY_ID,
secretAccessKey: AWS_SECRET_ACCESS_KEY,
},
region: AWS_REGION_NAME,
});
export const generatePresignedUrl = async (objectKey: string) => {
const getObjectParams = {
Bucket: AWS_S3_BUCKET_NAME,
Key: objectKey,
};
const command = new GetObjectCommand(getObjectParams);
const url = await getSignedUrl(s3, command, { expiresIn: 3600 });
return url;
};
export const uploadAudioStreamToS3 = async (audioStream: Buffer) => {
const remotePath = `${uuid()}.mp3`;
await s3.send(
new PutObjectCommand({
Bucket: AWS_S3_BUCKET_NAME,
Key: remotePath,
Body: audioStream,
ContentType: 'audio/mpeg',
})
);
return remotePath;
};
```
You can then call the upload function with the audio stream generated from the text.
```python Python
s3_file_name = upload_audiostream_to_s3(audio_stream)
```
```typescript TypeScript
const s3path = await uploadAudioStreamToS3(stream);
```
After uploading the audio file to S3, generate a signed URL to share access to the file. This URL will be time-limited, meaning it will expire after a certain period, making it secure for temporary sharing.
You can now generate a URL from a file with:
```python Python
signed_url = generate_presigned_url(s3_file_name)
print(f"Signed URL to access the file: {signed_url}")
```
```typescript TypeScript
const presignedUrl = await generatePresignedUrl(s3path);
console.log('Presigned URL:', presignedUrl);
```
If you want to use the file multiple times, store the S3 file path in your database and regenerate the signed URL each time you need it, rather than saving the signed URL directly, as it will expire.
To put it all together, you can use the following script:
```python main.py (Python)
import os
from dotenv import load_dotenv
load_dotenv()
from text_to_speech_stream import text_to_speech_stream
from s3_uploader import upload_audiostream_to_s3, generate_presigned_url
def main():
text = "This is James"
audio_stream = text_to_speech_stream(text)
s3_file_name = upload_audiostream_to_s3(audio_stream)
signed_url = generate_presigned_url(s3_file_name)
print(f"Signed URL to access the file: {signed_url}")
if __name__ == "__main__":
main()
```
```typescript index.ts (TypeScript)
import 'dotenv/config';
import { generatePresignedUrl, uploadAudioStreamToS3 } from './s3_uploader';
import { createAudioFileFromText } from './text_to_speech_file';
import { createAudioStreamFromText } from './text_to_speech_stream';
(async () => {
// save the audio file to disk
const fileName = await createAudioFileFromText(
'Today, the sky is exceptionally clear, and the sun shines brightly.'
);
console.log('File name:', fileName);
// OR stream the audio, upload to S3, and get a presigned URL
const stream = await createAudioStreamFromText(
'Today, the sky is exceptionally clear, and the sun shines brightly.'
);
const s3path = await uploadAudioStreamToS3(stream);
const presignedUrl = await generatePresignedUrl(s3path);
console.log('Presigned URL:', presignedUrl);
})();
```
## Conclusion
You now know how to convert text into speech and generate a signed URL to share the audio file. This functionality opens up numerous opportunities for creating and sharing content dynamically.
Here are some examples of what you could build with this.
1. **Educational Podcasts**: Create personalized educational content that can be accessed by students on demand. Teachers can convert their lessons into audio format, upload them to S3, and share the links with students for a more engaging learning experience outside the traditional classroom setting.
2. **Accessibility Features for Websites**: Enhance website accessibility by offering text content in audio format. This can make information on websites more accessible to individuals with visual impairments or those who prefer auditory learning.
3. **Automated Customer Support Messages**: Produce automated and personalized audio messages for customer support, such as FAQs or order updates. This can provide a more engaging customer experience compared to traditional text emails.
4. **Audio Books and Narration**: Convert entire books or short stories into audio format, offering a new way for audiences to enjoy literature. Authors and publishers can diversify their content offerings and reach audiences who prefer listening over reading.
5. **Language Learning Tools**: Develop language learning aids that provide learners with audio lessons and exercises. This makes it possible to practice pronunciation and listening skills in a targeted way.
For more details, visit the following to see the full project files which give a clear structure for setting up your application:
For Python: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python)
For TypeScript: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node)
If you have any questions, please create an issue on the [elevenlabs-docs GitHub](https://github.com/elevenlabs/elevenlabs-docs/issues).
# Stitching multiple requests
> Learn how to maintain voice prosody over multiple chunks/generations.
When converting a large body of text into audio, you may encounter abrupt changes in prosody from one chunk to another. This can be particularly noticeable when converting text that spans multiple paragraphs or sections. In order to maintain voice prosody over multiple chunks, you can use the Request Stitching feature.
This feature allows you to provide context on what has already been generated and what will be generated in the future, helping to maintain a consistent voice and prosody throughout the entire text.
Here's an example without Request Stitching:
And the same example with Request Stitching:
## How to use Request Stitching
Request Stitching is easiest when using the ElevenLabs SDKs.
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python
import os
from io import BytesIO
from elevenlabs.client import ElevenLabs
from elevenlabs import play
from dotenv import load_dotenv
load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
paragraphs = [
"The advent of technology has transformed countless sectors, with education ",
"standing out as one of the most significantly impacted fields.",
"In recent years, educational technology, or EdTech, has revolutionized the way ",
"teachers deliver instruction and students absorb information.",
"From interactive whiteboards to individual tablets loaded with educational software, ",
"technology has opened up new avenues for learning that were previously unimaginable.",
"One of the primary benefits of technology in education is the accessibility it provides.",
]
request_ids = []
audio_buffers = []
for paragraph in paragraphs:
# Usually we get back a stream from the convert function, but with_raw_response is
# used to get the headers from the response
with elevenlabs.text_to_speech.with_raw_response.convert(
text=paragraph,
voice_id="T7QGPtToiqH4S8VlIkMJ",
model_id="eleven_multilingual_v2",
previous_request_ids=request_ids
) as response:
request_ids.append(response._response.headers.get("request-id"))
# response._response.headers also contains useful information like 'character-cost',
# which shows the cost of the generation in characters.
audio_data = b''.join(chunk for chunk in response.data)
audio_buffers.append(BytesIO(audio_data))
combined_stream = BytesIO(b''.join(buffer.getvalue() for buffer in audio_buffers))
play(combined_stream)
```
```typescript
import "dotenv/config";
import { ElevenLabsClient, play } from "@elevenlabs/elevenlabs-js";
import { Readable } from "node:stream";
const elevenlabs = new ElevenLabsClient({
apiKey: process.env.ELEVENLABS_API_KEY,
});
const paragraphs = [
"The advent of technology has transformed countless sectors, with education ",
"standing out as one of the most significantly impacted fields.",
"In recent years, educational technology, or EdTech, has revolutionized the way ",
"teachers deliver instruction and students absorb information.",
"From interactive whiteboards to individual tablets loaded with educational software, ",
"technology has opened up new avenues for learning that were previously unimaginable.",
"One of the primary benefits of technology in education is the accessibility it provides.",
];
const requestIds: string[] = [];
const audioBuffers: Buffer[] = [];
for (const paragraph of paragraphs) {
// Usually we get back a stream from the convert function, but withRawResponse() is
// used to get the headers from the response
const response = await elevenlabs.textToSpeech
.convert("T7QGPtToiqH4S8VlIkMJ", {
text: paragraph,
modelId: "eleven_multilingual_v2",
previousRequestIds: requestIds,
})
.withRawResponse();
// response.rawResponse.headers also contains useful information like 'character-cost',
// which shows the cost of the generation in characters.
requestIds.push(response.rawResponse.headers.get("request-id") ?? "");
// Convert stream to buffer
const chunks: Buffer[] = [];
for await (const chunk of response.data) {
chunks.push(Buffer.from(chunk));
}
audioBuffers.push(Buffer.concat(chunks));
}
// Create a single readable stream from all buffers
const combinedStream = Readable.from(Buffer.concat(audioBuffers));
play(combinedStream);
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should hear the combined stitched audio play.
## FAQ
* In order to use the request IDs of a previous request for conditioning, that request needs to have been processed completely. In the case of streaming, this means the audio has to be read completely from the response body.
* The difference between generations with and without Request Stitching depends on the model, voice, and voice settings used.
* The request IDs should be no older than two hours.
* Yes, unless you are an enterprise user with increased privacy requirements.
# Using pronunciation dictionaries
> Learn how to manage pronunciation dictionaries programmatically.
In this tutorial, you'll learn how to use a pronunciation dictionary with the ElevenLabs Python SDK. Pronunciation dictionaries let you control exactly how specific words are pronounced, using either the [IPA](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) or [CMU](https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary) alphabet. This is useful for correcting rare or specific pronunciations, such as names or company terms. For example, the word `nginx` could be pronounced incorrectly; instead, we can specify the pronunciation we want. In IPA, `nginx` is pronounced as `/ˈɛndʒɪnˈɛks/`. Finding the IPA or CMU transcription of a word manually can be difficult, but LLMs like ChatGPT can make the search easier.
We'll start by adding rules to the pronunciation dictionary from a file and comparing the text-to-speech results that use and do not use the dictionary. After that, we'll discuss how to add and remove specific rules to existing dictionaries.
If you want to jump straight to the finished repo you can find it [here](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/pronunciation-dictionaries/python)
Phoneme tags only work with `eleven_flash_v2`, `eleven_turbo_v2` & `eleven_monolingual_v1` models.
If you use phoneme tags with other models, the tagged word will be silently skipped.
## Requirements
* An ElevenLabs account with an API key (here’s how to [find your API key](/docs/api-reference/text-to-speech#authentication)).
* Python installed on your machine
* FFMPEG to play audio
## Setup
### Installing our SDK
Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK for updating pronunciation dictionaries and for text-to-speech conversion. You can install it using pip:
```bash
pip install elevenlabs
```
Additionally, install `python-dotenv` to manage your environment variables:
```bash
pip install python-dotenv
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
## Initiate the Client SDK
We'll start by initializing the client SDK.
```python
import os
from elevenlabs.client import ElevenLabs
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
```
## Create a Pronunciation Dictionary From a File
To create a pronunciation dictionary from a file, we'll create a `.pls` file for our rules.
These rules will use the IPA alphabet and update the pronunciation of `tomato` and `Tomato`. PLS files are case-sensitive, which is why we include the word both with and without a capital "T". Save the file as `dictionary.pls`.
```xml filename="dictionary.pls"
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>/tə'meɪtoʊ/</phoneme>
  </lexeme>
</lexicon>
```
In the following snippet, we start by adding rules from a file and get the uploaded result. Finally, we generate and play two different text-to-speech audio to compare the custom pronunciation dictionary.
```python
import requests
from elevenlabs import play, PronunciationDictionaryVersionLocator
with open("dictionary.pls", "rb") as f:
# this dictionary changes how tomato is pronounced
pronunciation_dictionary = elevenlabs.pronunciation_dictionaries.create_from_file(
file=f.read(), name="example"
)
audio_1 = elevenlabs.text_to_speech.convert(
text="Without the dictionary: tomato",
voice_id="21m00Tcm4TlvDq8ikWAM",
model_id="eleven_turbo_v2",
)
audio_2 = elevenlabs.text_to_speech.convert(
text="With the dictionary: tomato",
voice_id="21m00Tcm4TlvDq8ikWAM",
model_id="eleven_turbo_v2",
pronunciation_dictionary_locators=[
PronunciationDictionaryVersionLocator(
pronunciation_dictionary_id=pronunciation_dictionary.id,
version_id=pronunciation_dictionary.version_id,
)
],
)
# play the audio
play(audio_1)
play(audio_2)
```
## Remove Rules From a Pronunciation Dictionary
To remove rules from a pronunciation dictionary, call the `remove` method in the pronunciation dictionary module. In the following snippet, we start by removing rules based on the rule string and get the updated result. Next, we generate and play another text-to-speech audio to test the difference. In the example, we get the pronunciation dictionary version ID from the `remove` method response as every change to a pronunciation dictionary will generate a new version.
```python
pronunciation_dictionary_rules_removed = (
elevenlabs.pronunciation_dictionaries.rules.remove(
pronunciation_dictionary_id=pronunciation_dictionary.id,
rule_strings=["tomato", "Tomato"],
)
)
audio_3 = elevenlabs.text_to_speech.convert(
    text="With the rule removed: tomato",
    voice_id="21m00Tcm4TlvDq8ikWAM",
    model_id="eleven_turbo_v2",
pronunciation_dictionary_locators=[
PronunciationDictionaryVersionLocator(
pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
version_id=pronunciation_dictionary_rules_removed.version_id,
)
],
)
play(audio_3)
```
## Add Rules to Pronunciation Dictionary
We can add rules directly to the pronunciation dictionary by calling the `add` method. Next, generate and play another text-to-speech audio to test the difference.
```python
from elevenlabs import PronunciationDictionaryRule_Phoneme
pronunciation_dictionary_rules_added = elevenlabs.pronunciation_dictionaries.rules.add(
pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
rules=[
PronunciationDictionaryRule_Phoneme(
type="phoneme",
alphabet="ipa",
string_to_replace="tomato",
phoneme="/tə'meɪtoʊ/",
),
PronunciationDictionaryRule_Phoneme(
type="phoneme",
alphabet="ipa",
string_to_replace="Tomato",
phoneme="/tə'meɪtoʊ/",
),
],
)
audio_4 = elevenlabs.text_to_speech.convert(
    text="With the rule added again: tomato",
    voice_id="21m00Tcm4TlvDq8ikWAM",
    model_id="eleven_turbo_v2",
pronunciation_dictionary_locators=[
PronunciationDictionaryVersionLocator(
pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
version_id=pronunciation_dictionary_rules_added.version_id,
)
],
)
play(audio_4)
```
## Conclusion
You now know how to use a pronunciation dictionary when generating text-to-speech audio, giving you precise control over how specific words are pronounced in your use case.
For more details, visit our [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/pronunciation-dictionaries/python) to see the full project files which give a clear structure for setting up your application:
* `env.example`: Template for your environment variables.
* `main.py`: The complete code for snippets above.
* `dictionary.pls`: Custom dictionary example with XML format.
* `requirements.txt`: List of Python packages used for this example.
If you have any questions, please create an issue on the [elevenlabs-docs GitHub](https://github.com/elevenlabs/elevenlabs-docs/issues).
# Streaming and Caching with Supabase
> Generate and stream speech through Supabase Edge Functions. Store speech in Supabase Storage and cache responses via built-in CDN.
## Introduction
In this tutorial you will learn how to build an edge API to generate, stream, store, and cache speech using Supabase Edge Functions, Supabase Storage, and ElevenLabs.
Find the [example project on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/supabase/stream-and-cache-storage).
## Requirements
* An ElevenLabs account with an [API key](/app/settings/api-keys).
* A [Supabase](https://supabase.com) account (you can sign up for a free account via [database.new](https://database.new)).
* The [Supabase CLI](https://supabase.com/docs/guides/local-development) installed on your machine.
* The [Deno runtime](https://docs.deno.com/runtime/getting_started/installation/) installed on your machine and optionally [set up in your favourite IDE](https://docs.deno.com/runtime/getting_started/setup_your_environment).
## Setup
### Create a Supabase project locally
After installing the [Supabase CLI](https://supabase.com/docs/guides/local-development), run the following command to create a new Supabase project locally:
```bash
supabase init
```
### Configure the storage bucket
You can configure the Supabase CLI to automatically generate a storage bucket by adding this configuration in the `config.toml` file:
```toml ./supabase/config.toml
[storage.buckets.audio]
public = false
file_size_limit = "50MiB"
allowed_mime_types = ["audio/mp3"]
objects_path = "./audio"
```
Upon running `supabase start` this will create a new storage bucket in your local Supabase
project. Should you want to push this to your hosted Supabase project, you can run `supabase seed
buckets --linked`.
### Configure background tasks for Supabase Edge Functions
To use background tasks in Supabase Edge Functions when developing locally, you need to add the following configuration in the `config.toml` file:
```toml ./supabase/config.toml
[edge_runtime]
policy = "per_worker"
```
When running with the `per_worker` policy, the function won't auto-reload on edits. You will need to
manually restart it by running `supabase functions serve`.
### Create a Supabase Edge Function for Speech generation
Create a new Edge Function by running the following command:
```bash
supabase functions new text-to-speech
```
If you're using VS Code or Cursor, select `y` when the CLI prompts "Generate VS Code settings for Deno? \[y/N]"!
### Set up the environment variables
Within the `supabase/functions` directory, create a new `.env` file and add the following variables:
```env supabase/functions/.env
# Find / create an API key at https://elevenlabs.io/app/settings/api-keys
ELEVENLABS_API_KEY=your_api_key
```
### Dependencies
The project uses a couple of dependencies:
* The [@supabase/supabase-js](https://supabase.com/docs/reference/javascript) library to interact with the Supabase database.
* The ElevenLabs [JavaScript SDK](/docs/quickstart) to interact with the text-to-speech API.
* The open-source [object-hash](https://www.npmjs.com/package/object-hash) to generate a hash from the request parameters.
Since Supabase Edge Functions use the [Deno runtime](https://deno.land/), you don't need to install the dependencies; rather, you can [import](https://docs.deno.com/examples/npm/) them via the `npm:` prefix.
## Code the Supabase Edge Function
In your newly created `supabase/functions/text-to-speech/index.ts` file, add the following code:
```ts supabase/functions/text-to-speech/index.ts
// Setup type definitions for built-in Supabase Runtime APIs
import 'jsr:@supabase/functions-js/edge-runtime.d.ts';
import { createClient } from 'jsr:@supabase/supabase-js@2';
import { ElevenLabsClient } from 'npm:elevenlabs';
import * as hash from 'npm:object-hash';
const supabase = createClient(
Deno.env.get('SUPABASE_URL')!,
Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
);
const elevenlabs = new ElevenLabsClient({
apiKey: Deno.env.get('ELEVENLABS_API_KEY'),
});
// Upload audio to Supabase Storage in a background task
async function uploadAudioToStorage(stream: ReadableStream, requestHash: string) {
const { data, error } = await supabase.storage
.from('audio')
.upload(`${requestHash}.mp3`, stream, {
contentType: 'audio/mp3',
});
console.log('Storage upload result', { data, error });
}
Deno.serve(async (req) => {
// To secure your function for production, you can for example validate the request origin,
// or append a user access token and validate it with Supabase Auth.
console.log('Request origin', req.headers.get('host'));
const url = new URL(req.url);
const params = new URLSearchParams(url.search);
const text = params.get('text');
const voiceId = params.get('voiceId') ?? 'JBFqnCBsd6RMkjVDRZzb';
const requestHash = hash.MD5({ text, voiceId });
console.log('Request hash', requestHash);
// Check storage for existing audio file
const { data } = await supabase.storage.from('audio').createSignedUrl(`${requestHash}.mp3`, 60);
if (data) {
console.log('Audio file found in storage', data);
const storageRes = await fetch(data.signedUrl);
if (storageRes.ok) return storageRes;
}
if (!text) {
return new Response(JSON.stringify({ error: 'Text parameter is required' }), {
status: 400,
headers: { 'Content-Type': 'application/json' },
});
}
try {
console.log('ElevenLabs API call');
const response = await elevenlabs.textToSpeech.stream(voiceId, {
output_format: 'mp3_44100_128',
model_id: 'eleven_multilingual_v2',
text,
});
const stream = new ReadableStream({
async start(controller) {
for await (const chunk of response) {
controller.enqueue(chunk);
}
controller.close();
},
});
// Branch stream to Supabase Storage
const [browserStream, storageStream] = stream.tee();
// Upload to Supabase Storage in the background
EdgeRuntime.waitUntil(uploadAudioToStorage(storageStream, requestHash));
// Return the streaming response immediately
return new Response(browserStream, {
headers: {
'Content-Type': 'audio/mpeg',
},
});
} catch (error) {
console.log('error', { error });
return new Response(JSON.stringify({ error: error.message }), {
status: 500,
headers: { 'Content-Type': 'application/json' },
});
}
});
```
### Code deep dive
There are a couple of things worth noting about the code. Let's step through it.
To handle the incoming request, use the `Deno.serve` handler. In the demo we don't validate the request origin, but you can for example validate the request origin, or append a user access token and validate it with [Supabase Auth](https://supabase.com/docs/guides/functions/auth).
From the incoming request, the function extracts the `text` and `voiceId` parameters. The `voiceId` parameter is optional and defaults to the ElevenLabs ID for the "Allison" voice.
Using the `object-hash` library, the function generates a hash from the request parameters. This hash is used to check for existing audio files in Supabase Storage.
```ts {1,5-8}
Deno.serve(async (req) => {
// To secure your function for production, you can for example validate the request origin,
// or append a user access token and validate it with Supabase Auth.
console.log("Request origin", req.headers.get("host"));
const url = new URL(req.url);
const params = new URLSearchParams(url.search);
const text = params.get("text");
const voiceId = params.get("voiceId") ?? "JBFqnCBsd6RMkjVDRZzb";
const requestHash = hash.MD5({ text, voiceId });
console.log("Request hash", requestHash);
// ...
})
```
Supabase Storage comes with a [smart CDN built-in](https://supabase.com/docs/guides/storage/cdn/smart-cdn) allowing you to easily cache and serve your files.
Here, the function checks for an existing audio file in Supabase Storage. If the file exists, the function returns the file from Supabase Storage.
```ts {4,9}
const { data } = await supabase
.storage
.from("audio")
.createSignedUrl(`${requestHash}.mp3`, 60);
if (data) {
console.log("Audio file found in storage", data);
const storageRes = await fetch(data.signedUrl);
if (storageRes.ok) return storageRes;
}
```
Using the streaming capabilities of the ElevenLabs API, the function generates a stream. The benefit here is that even for larger text, you can start streaming the audio back to your user immediately, and then upload the stream to Supabase Storage in the background.
This allows for the best possible user experience, making even large text blocks feel magically quick. The magic happens on line 17, where the `stream.tee()` method splits the `ReadableStream` into two branches: one for the browser and one for Supabase Storage.
```ts {1,17,20,22-27}
try {
const response = await elevenlabs.textToSpeech.stream(voiceId, {
output_format: "mp3_44100_128",
model_id: "eleven_multilingual_v2",
text,
});
const stream = new ReadableStream({
async start(controller) {
for await (const chunk of response) {
controller.enqueue(chunk);
}
controller.close();
},
});
// Branch stream to Supabase Storage
const [browserStream, storageStream] = stream.tee();
// Upload to Supabase Storage in the background
EdgeRuntime.waitUntil(uploadAudioToStorage(storageStream, requestHash));
// Return the streaming response immediately
return new Response(browserStream, {
headers: {
"Content-Type": "audio/mpeg",
},
});
} catch (error) {
console.log("error", { error });
return new Response(JSON.stringify({ error: error.message }), {
status: 500,
headers: { "Content-Type": "application/json" },
});
}
```
The `EdgeRuntime.waitUntil` method on line 20 in the previous step is used to upload the audio stream to Supabase Storage in the background using the `uploadAudioToStorage` function. This allows the function to return the streaming response immediately to the browser, while the audio is being uploaded to Supabase Storage.
Once the storage object has been created, the next time your users make a request with the same parameters, the function will return the audio file from the Supabase Storage CDN.
```ts {2,8-10}
// Upload audio to Supabase Storage in a background task
async function uploadAudioToStorage(
stream: ReadableStream,
requestHash: string,
) {
const { data, error } = await supabase.storage
.from("audio")
.upload(`${requestHash}.mp3`, stream, {
contentType: "audio/mp3",
});
console.log("Storage upload result", { data, error });
}
```
## Run locally
To run the function locally, run the following commands:
```bash
supabase start
```
Once the local Supabase stack is up and running, run the following command to start the function and observe the logs:
```bash
supabase functions serve
```
### Try it out
Navigate to `http://127.0.0.1:54321/functions/v1/text-to-speech?text=hello%20world` to hear the function in action.
Afterwards, navigate to `http://127.0.0.1:54323/project/default/storage/buckets/audio` to see the audio file in your local Supabase Storage bucket.
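If you prefer to exercise the endpoint from a script rather than the browser, here's a minimal sketch (assuming Node 18+ for the global `fetch`, that the function is served locally without JWT verification as in the browser test above, and an illustrative output file name):
```typescript
// save-tts.mts — hypothetical test script, not part of the example repo
import { writeFile } from 'node:fs/promises';

const url = new URL('http://127.0.0.1:54321/functions/v1/text-to-speech');
url.searchParams.set('text', 'hello world');

// The first call generates speech via ElevenLabs; repeat calls with the same
// parameters should be served from the Storage cache instead.
const res = await fetch(url);
if (!res.ok) throw new Error(`Request failed with status ${res.status}`);

await writeFile('hello.mp3', Buffer.from(await res.arrayBuffer()));
console.log('Saved hello.mp3');
```
Run it with `npx tsx save-tts.mts` and check the function logs to see whether the request hit ElevenLabs or the cache.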
## Deploy to Supabase
If you haven't already, create a new Supabase account at [database.new](https://database.new) and link the local project to your Supabase account:
```bash
supabase link
```
Once done, run the following command to deploy the function:
```bash
supabase functions deploy
```
### Set the function secrets
Now that you have all your secrets set locally, you can run the following command to set the secrets in your Supabase project:
```bash
supabase secrets set --env-file supabase/functions/.env
```
## Test the function
The function is designed so that it can be used directly as the source for an `<audio>` element, for example pointing at the local function URL from the earlier step:
```html
<audio
  src="http://127.0.0.1:54321/functions/v1/text-to-speech?text=hello%20world"
  controls
></audio>
```
You can find an example frontend implementation in the complete code example on [GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/supabase/stream-and-cache-storage/src/pages/Index.tsx).
# Sending generated audio through Twilio
> Learn how to integrate generated speech into phone calls with Twilio.
In this guide, you’ll learn how to send an AI generated message through a phone call using Twilio and ElevenLabs. This process allows you to send high-quality voice messages directly to your callers.
## Create accounts with Twilio and ngrok
We’ll be using Twilio and ngrok for this guide, so go ahead and create accounts with them.
* [twilio.com](https://www.twilio.com)
* [ngrok.com](https://ngrok.com)
## Get the code
If you want to get started quickly, you can get the entire code for this guide on [GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/twilio/call)
## Create the server with Express
### Initialize your project
Create a new folder for your project
```
mkdir elevenlabs-twilio
cd elevenlabs-twilio
npm init -y
```
### Install dependencies
```
npm install @elevenlabs/elevenlabs-js express express-ws twilio
```
### Install dev dependencies
```
npm i @types/node @types/express @types/express-ws @types/ws dotenv tsx typescript
```
### Create your files
```ts
// src/app.ts
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import 'dotenv/config';
import express, { Response } from 'express';
import ExpressWs from 'express-ws';
import { Readable } from 'stream';
import VoiceResponse from 'twilio/lib/twiml/VoiceResponse';
import { type WebSocket } from 'ws';
const app = ExpressWs(express()).app;
const PORT: number = parseInt(process.env.PORT || '5000');
const elevenlabs = new ElevenLabsClient();
const voiceId = '21m00Tcm4TlvDq8ikWAM';
const outputFormat = 'ulaw_8000';
const text = 'This is a test. You can now hang up. Thank you.';
function startApp() {
app.post('/call/incoming', (_, res: Response) => {
const twiml = new VoiceResponse();
twiml.connect().stream({
url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
});
res.writeHead(200, { 'Content-Type': 'text/xml' });
res.end(twiml.toString());
});
app.ws('/call/connection', (ws: WebSocket) => {
ws.on('message', async (data: string) => {
const message: {
event: string;
start?: { streamSid: string; callSid: string };
} = JSON.parse(data);
if (message.event === 'start' && message.start) {
const streamSid = message.start.streamSid;
const response = await elevenlabs.textToSpeech.convert(voiceId, {
modelId: 'eleven_flash_v2_5',
outputFormat: outputFormat,
text,
});
const readableStream = Readable.from(response);
const audioArrayBuffer = await streamToArrayBuffer(readableStream);
ws.send(
JSON.stringify({
streamSid,
event: 'media',
media: {
payload: Buffer.from(audioArrayBuffer as any).toString('base64'),
},
})
);
}
});
ws.on('error', console.error);
});
app.listen(PORT, () => {
console.log(`Local: http://localhost:${PORT}`);
console.log(`Remote: https://${process.env.SERVER_DOMAIN}`);
});
}
function streamToArrayBuffer(readableStream: Readable) {
return new Promise((resolve, reject) => {
const chunks: Buffer[] = [];
readableStream.on('data', (chunk) => {
chunks.push(chunk);
});
readableStream.on('end', () => {
resolve(Buffer.concat(chunks).buffer);
});
readableStream.on('error', reject);
});
}
startApp();
```
```env
# .env
SERVER_DOMAIN=
ELEVENLABS_API_KEY=
```
## Understanding the code
### Handling the incoming call
When you call your number, Twilio makes a POST request to your endpoint at `/call/incoming`.
We then use twiml.connect to tell Twilio that we want to handle the call via our websocket by setting the url to our `/call/connection` endpoint.
```ts
function startApp() {
app.post('/call/incoming', (_, res: Response) => {
const twiml = new VoiceResponse();
twiml.connect().stream({
url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
});
res.writeHead(200, { 'Content-Type': 'text/xml' });
res.end(twiml.toString());
});
```
### Creating the text to speech
Here we listen for messages that Twilio sends to our websocket endpoint. When we receive a `start` message event, we generate audio using the ElevenLabs [TypeScript SDK](https://github.com/elevenlabs/elevenlabs-js).
```ts
app.ws('/call/connection', (ws: WebSocket) => {
ws.on('message', async (data: string) => {
const message: {
event: string;
start?: { streamSid: string; callSid: string };
} = JSON.parse(data);
if (message.event === 'start' && message.start) {
const streamSid = message.start.streamSid;
const response = await elevenlabs.textToSpeech.convert(voiceId, {
modelId: 'eleven_flash_v2_5',
outputFormat: outputFormat,
text,
});
```
### Sending the message
Upon receiving the audio back from ElevenLabs, we convert it to an array buffer and send the audio to Twilio via the websocket.
```ts
const readableStream = Readable.from(response);
const audioArrayBuffer = await streamToArrayBuffer(readableStream);
ws.send(
JSON.stringify({
streamSid,
event: 'media',
media: {
payload: Buffer.from(audioArrayBuffer as any).toString('base64'),
},
})
);
```
## Point ngrok to your application
Twilio requires a publicly accessible URL. We’ll use ngrok to forward the local port of our application and expose it as a public URL.
Run the following command in your terminal:
```
ngrok http 5000
```
Copy the ngrok domain (without https\://) to use in your environment variables.
## Update your environment variables
Update the `.env` file with your ngrok domain and ElevenLabs API key.
```
# .env
SERVER_DOMAIN=*******.ngrok.app
ELEVENLABS_API_KEY=*************************
```
## Start the application
Run the following command to start the app:
```
npm run dev
```
## Set up Twilio
Follow Twilio’s guides to create a new number. Once you’ve created your number, navigate to the “Configure” tab in Phone Numbers -> Manage -> Active numbers
In the “A call comes in” section, enter the full URL to your application (make sure to add the `/call/incoming` path):
E.g. `https://*******.ngrok.app/call/incoming`
## Make a phone call
Make a call to your number. You should hear a message using the ElevenLabs voice.
## Tips for deploying to production
When running the application in production, make sure to set the `SERVER_DOMAIN` environment variable to that of your server. Be sure to also update the URL in Twilio to point to your production server.
## Conclusion
You should now have a basic understanding of integrating Twilio with ElevenLabs voices. If you have any further questions, or suggestions on how to improve this guide, please feel free to select the “Suggest edits” or “Raise issue” button below.
# Text to Dialogue quickstart
> Learn how to generate immersive dialogue from text.
Eleven v3 API access is currently not publicly available, but will be soon. To request access,
please [contact our sales team](https://elevenlabs.io/contact-sales).
This guide will show you how to generate immersive, natural-sounding dialogue from text using the Text to Dialogue API.
## Using the Text to Dialogue API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python maxLines=0
# example.py
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs import play
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
audio = elevenlabs.text_to_dialogue.convert(
# Text to Dialogue defaults to using the evergreen model "eleven_v3",
# but you can use a preview version to try out
# the latest features by providing the model ID
# model_id="eleven_v3_preview_2025_06_03"
inputs=[
{
"text": "[cheerfully] Hello, how are you?",
"voice_id": "9BWtsMINqrJLrRacOk9x",
},
{
"text": "[stuttering] I'm... I'm doing well, thank you",
"voice_id": "IKne3meq5aSn9XLyUdCD",
}
]
)
play(audio)
```
```typescript maxLines=0
// example.mts
import { ElevenLabsClient, play } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
const elevenlabs = new ElevenLabsClient();
const audio = await elevenlabs.textToDialogue.convert({
// Text to Dialogue defaults to using the evergreen model "eleven_v3",
// but you can use a preview version to try out
// the latest features by providing the model ID
// modelId="eleven_v3_preview_2025_06_03"
inputs: [
{
text: "[cheerfully] Hello, how are you?",
voiceId: "9BWtsMINqrJLrRacOk9x",
},
{
text: "[stuttering] I'm... I'm doing well, thank you",
voiceId: "IKne3meq5aSn9XLyUdCD",
},
],
});
await play(audio);
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should hear the dialogue audio play.
## Next steps
Explore the [API reference](/docs/api-reference/text-to-dialogue/convert) for more information on the Text to Dialogue API and its options.
# Speech to Text quickstart
> Learn how to convert spoken audio into text.
This guide will show you how to convert spoken audio into text using the Speech to Text API.
## Using the Speech to Text API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python maxLines=0
# example.py
import os
from dotenv import load_dotenv
from io import BytesIO
import requests
from elevenlabs.client import ElevenLabs
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
audio_url = (
"https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
)
response = requests.get(audio_url)
audio_data = BytesIO(response.content)
transcription = elevenlabs.speech_to_text.convert(
file=audio_data,
model_id="scribe_v1", # Model to use, for now only "scribe_v1" is supported
tag_audio_events=True, # Tag audio events like laughter, applause, etc.
language_code="eng", # Language of the audio file. If set to None, the model will detect the language automatically.
diarize=True, # Whether to annotate who is speaking
)
print(transcription)
```
```typescript maxLines=0
// example.mts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
const elevenlabs = new ElevenLabsClient();
const response = await fetch(
"https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
);
const audioBlob = new Blob([await response.arrayBuffer()], { type: "audio/mp3" });
const transcription = await elevenlabs.speechToText.convert({
file: audioBlob,
modelId: "scribe_v1", // Model to use, for now only "scribe_v1" is supported.
tagAudioEvents: true, // Tag audio events like laughter, applause, etc.
languageCode: "eng", // Language of the audio file. If set to null, the model will detect the language automatically.
diarize: true, // Whether to annotate who is speaking
});
console.log(transcription);
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should see the transcription of the audio file printed to the console.
## Next steps
Explore the [API reference](/docs/api-reference/speech-to-text/convert) for more information on the Speech to Text API and its options.
# Transcription Telegram Bot
> Build a Telegram bot that transcribes audio and video messages in 99 languages using TypeScript with Deno in Supabase Edge Functions.
## Introduction
In this tutorial you will learn how to build a Telegram bot that transcribes audio and video messages in 99 languages using TypeScript and the ElevenLabs Scribe model via the speech-to-text API.
To check out what the end result will look like, you can test out the [t.me/ElevenLabsScribeBot](https://t.me/ElevenLabsScribeBot)
Find the [example project on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/speech-to-text/telegram-transcription-bot).
## Requirements
* An ElevenLabs account with an [API key](/app/settings/api-keys).
* A [Supabase](https://supabase.com) account (you can sign up for a free account via [database.new](https://database.new)).
* The [Supabase CLI](https://supabase.com/docs/guides/local-development) installed on your machine.
* The [Deno runtime](https://docs.deno.com/runtime/getting_started/installation/) installed on your machine and optionally [set up in your favourite IDE](https://docs.deno.com/runtime/getting_started/setup_your_environment).
* A [Telegram](https://telegram.org) account.
## Setup
### Register a Telegram bot
Use the [BotFather](https://t.me/BotFather) to create a new Telegram bot. Run the `/newbot` command and follow the instructions to create a new bot. At the end, you will receive your secret bot token. Note it down securely for the next step.

### Create a Supabase project locally
After installing the [Supabase CLI](https://supabase.com/docs/guides/local-development), run the following command to create a new Supabase project locally:
```bash
supabase init
```
### Create a database table to log the transcription results
Next, create a new database table to log the transcription results:
```bash
supabase migrations new init
```
This will create a new migration file in the `supabase/migrations` directory. Open the file and add the following SQL:
```sql supabase/migrations/init.sql
CREATE TABLE IF NOT EXISTS transcription_logs (
id BIGSERIAL PRIMARY KEY,
file_type VARCHAR NOT NULL,
duration INTEGER NOT NULL,
chat_id BIGINT NOT NULL,
message_id BIGINT NOT NULL,
username VARCHAR,
transcript TEXT,
language_code VARCHAR,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
error TEXT
);
ALTER TABLE transcription_logs ENABLE ROW LEVEL SECURITY;
```
### Create a Supabase Edge Function to handle Telegram webhook requests
Next, create a new Edge Function to handle Telegram webhook requests:
```bash
supabase functions new scribe-bot
```
If you're using VS Code or Cursor, select `y` when the CLI prompts "Generate VS Code settings for Deno? \[y/N]"!
### Set up the environment variables
Within the `supabase/functions` directory, create a new `.env` file and add the following variables:
```env supabase/functions/.env
# Find / create an API key at https://elevenlabs.io/app/settings/api-keys
ELEVENLABS_API_KEY=your_api_key
# The bot token you received from the BotFather.
TELEGRAM_BOT_TOKEN=your_bot_token
# A random secret chosen by you to secure the function.
FUNCTION_SECRET=random_secret
```
### Dependencies
The project uses a couple of dependencies:
* The open-source [grammY Framework](https://grammy.dev/) to handle the Telegram webhook requests.
* The [@supabase/supabase-js](https://supabase.com/docs/reference/javascript) library to interact with the Supabase database.
* The ElevenLabs [JavaScript SDK](/docs/quickstart) to interact with the speech-to-text API.
Since Supabase Edge Functions use the [Deno runtime](https://deno.land/), you don't need to install the dependencies; rather, you can [import](https://docs.deno.com/examples/npm/) them via the `npm:` prefix.
## Code the Telegram Bot
In your newly created `scribe-bot/index.ts` file, add the following code:
```ts supabase/functions/scribe-bot/index.ts
import { Bot, webhookCallback } from 'https://deno.land/x/grammy@v1.34.0/mod.ts';
import 'jsr:@supabase/functions-js/edge-runtime.d.ts';
import { createClient } from 'jsr:@supabase/supabase-js@2';
import { ElevenLabsClient } from 'npm:elevenlabs@1.50.5';
console.log(`Function "elevenlabs-scribe-bot" up and running!`);
const elevenlabs = new ElevenLabsClient({
apiKey: Deno.env.get('ELEVENLABS_API_KEY') || '',
});
const supabase = createClient(
Deno.env.get('SUPABASE_URL') || '',
Deno.env.get('SUPABASE_SERVICE_ROLE_KEY') || ''
);
async function scribe({
fileURL,
fileType,
duration,
chatId,
messageId,
username,
}: {
fileURL: string;
fileType: string;
duration: number;
chatId: number;
messageId: number;
username: string;
}) {
let transcript: string | null = null;
let languageCode: string | null = null;
let errorMsg: string | null = null;
try {
const sourceFileArrayBuffer = await fetch(fileURL).then((res) => res.arrayBuffer());
const sourceBlob = new Blob([sourceFileArrayBuffer], {
type: fileType,
});
const scribeResult = await elevenlabs.speechToText.convert({
file: sourceBlob,
model_id: 'scribe_v1', // 'scribe_v1_experimental' is also available for new, experimental features
tag_audio_events: false,
});
transcript = scribeResult.text;
languageCode = scribeResult.language_code;
// Reply to the user with the transcript
await bot.api.sendMessage(chatId, transcript, {
reply_parameters: { message_id: messageId },
});
} catch (error) {
errorMsg = error.message;
console.log(errorMsg);
await bot.api.sendMessage(chatId, 'Sorry, there was an error. Please try again.', {
reply_parameters: { message_id: messageId },
});
}
// Write log to Supabase.
const logLine = {
file_type: fileType,
duration,
chat_id: chatId,
message_id: messageId,
username,
language_code: languageCode,
error: errorMsg,
};
console.log({ logLine });
await supabase.from('transcription_logs').insert({ ...logLine, transcript });
}
const telegramBotToken = Deno.env.get('TELEGRAM_BOT_TOKEN');
const bot = new Bot(telegramBotToken || '');
const startMessage = `Welcome to the ElevenLabs Scribe Bot\\! I can transcribe speech in 99 languages with super high accuracy\\!
\nTry it out by sending or forwarding me a voice message, video, or audio file\\!
\n[Learn more about Scribe](https://elevenlabs.io/speech-to-text) or [build your own bot](https://elevenlabs.io/docs/cookbooks/speech-to-text/telegram-bot)\\!
`;
bot.command('start', (ctx) => ctx.reply(startMessage.trim(), { parse_mode: 'MarkdownV2' }));
bot.on([':voice', ':audio', ':video'], async (ctx) => {
try {
const file = await ctx.getFile();
const fileURL = `https://api.telegram.org/file/bot${telegramBotToken}/${file.file_path}`;
const fileMeta = ctx.message?.video ?? ctx.message?.voice ?? ctx.message?.audio;
if (!fileMeta) {
return ctx.reply('No video|audio|voice metadata found. Please try again.');
}
// Run the transcription in the background.
EdgeRuntime.waitUntil(
scribe({
fileURL,
fileType: fileMeta.mime_type!,
duration: fileMeta.duration,
chatId: ctx.chat.id,
messageId: ctx.message?.message_id!,
username: ctx.from?.username || '',
})
);
// Reply to the user immediately to let them know we received their file.
return ctx.reply('Received. Scribing...');
} catch (error) {
console.error(error);
return ctx.reply(
'Sorry, there was an error getting the file. Please try again with a smaller file!'
);
}
});
const handleUpdate = webhookCallback(bot, 'std/http');
Deno.serve(async (req) => {
try {
const url = new URL(req.url);
if (url.searchParams.get('secret') !== Deno.env.get('FUNCTION_SECRET')) {
return new Response('not allowed', { status: 405 });
}
return await handleUpdate(req);
} catch (err) {
console.error(err);
}
});
```
### Code deep dive
There are a couple of things worth noting about the code. Let's step through it.
To handle the incoming request, use the `Deno.serve` handler. The handler checks whether the request has the correct secret and then passes the request to the `handleUpdate` function.
```ts {1,6,10}
const handleUpdate = webhookCallback(bot, 'std/http');
Deno.serve(async (req) => {
try {
const url = new URL(req.url);
if (url.searchParams.get('secret') !== Deno.env.get('FUNCTION_SECRET')) {
return new Response('not allowed', { status: 405 });
}
return await handleUpdate(req);
} catch (err) {
console.error(err);
}
});
```
The grammY framework provides a convenient way to [filter](https://grammy.dev/guide/filter-queries#combining-multiple-queries) for specific message types. In this case, the bot is listening for voice, audio, and video messages.
Using the request context, the bot extracts the file metadata and then uses [Supabase Background Tasks](https://supabase.com/docs/guides/functions/background-tasks) `EdgeRuntime.waitUntil` to run the transcription in the background.
This way you can provide an immediate response to the user and handle the transcription of the file in the background.
```ts {1,3,12,24}
bot.on([':voice', ':audio', ':video'], async (ctx) => {
try {
const file = await ctx.getFile();
const fileURL = `https://api.telegram.org/file/bot${telegramBotToken}/${file.file_path}`;
const fileMeta = ctx.message?.video ?? ctx.message?.voice ?? ctx.message?.audio;
if (!fileMeta) {
return ctx.reply('No video|audio|voice metadata found. Please try again.');
}
// Run the transcription in the background.
EdgeRuntime.waitUntil(
scribe({
fileURL,
fileType: fileMeta.mime_type!,
duration: fileMeta.duration,
chatId: ctx.chat.id,
messageId: ctx.message?.message_id!,
username: ctx.from?.username || '',
})
);
// Reply to the user immediately to let them know we received their file.
return ctx.reply('Received. Scribing...');
} catch (error) {
console.error(error);
return ctx.reply(
'Sorry, there was an error getting the file. Please try again with a smaller file!'
);
}
});
```
Finally, in the background worker, the bot uses the ElevenLabs JavaScript SDK to transcribe the file. Once the transcription is complete, the bot replies to the user with the transcript and writes a log entry to the Supabase database using [supabase-js](https://supabase.com/docs/reference/javascript).
```ts {29-38,43-46,54-65}
const elevenlabs = new ElevenLabsClient({
apiKey: Deno.env.get('ELEVENLABS_API_KEY') || '',
});
const supabase = createClient(
Deno.env.get('SUPABASE_URL') || '',
Deno.env.get('SUPABASE_SERVICE_ROLE_KEY') || ''
);
async function scribe({
fileURL,
fileType,
duration,
chatId,
messageId,
username,
}: {
fileURL: string;
fileType: string;
duration: number;
chatId: number;
messageId: number;
username: string;
}) {
let transcript: string | null = null;
let languageCode: string | null = null;
let errorMsg: string | null = null;
try {
const sourceFileArrayBuffer = await fetch(fileURL).then((res) => res.arrayBuffer());
const sourceBlob = new Blob([sourceFileArrayBuffer], {
type: fileType,
});
const scribeResult = await elevenlabs.speechToText.convert({
file: sourceBlob,
model_id: 'scribe_v1', // 'scribe_v1_experimental' is also available for new, experimental features
tag_audio_events: false,
});
transcript = scribeResult.text;
languageCode = scribeResult.language_code;
// Reply to the user with the transcript
await bot.api.sendMessage(chatId, transcript, {
reply_parameters: { message_id: messageId },
});
} catch (error) {
errorMsg = error.message;
console.log(errorMsg);
await bot.api.sendMessage(chatId, 'Sorry, there was an error. Please try again.', {
reply_parameters: { message_id: messageId },
});
}
// Write log to Supabase.
const logLine = {
file_type: fileType,
duration,
chat_id: chatId,
message_id: messageId,
username,
language_code: languageCode,
error: errorMsg,
};
console.log({ logLine });
await supabase.from('transcription_logs').insert({ ...logLine, transcript });
}
```
## Deploy to Supabase
If you haven't already, create a new Supabase account at [database.new](https://database.new) and link the local project to your Supabase account:
```bash
supabase link
```
### Apply the database migrations
Run the following command to apply the database migrations from the `supabase/migrations` directory:
```bash
supabase db push
```
Navigate to the [table editor](https://supabase.com/dashboard/project/_/editor) in your Supabase dashboard and you should see an empty `transcription_logs` table.

Lastly, run the following command to deploy the Edge Function:
```bash
supabase functions deploy --no-verify-jwt scribe-bot
```
Navigate to the [Edge Functions view](https://supabase.com/dashboard/project/_/functions) in your Supabase dashboard and you should see the `scribe-bot` function deployed. Make a note of the function URL as you'll need it later; it should look something like `https://<project-ref>.functions.supabase.co/scribe-bot`.

### Set up the webhook
Set your bot's webhook URL to `https://<project-ref>.supabase.co/functions/v1/scribe-bot` (replacing `<...>` with the respective values). In order to do that, simply run a GET request to the following URL (in your browser, for example):
```
https://api.telegram.org/bot<TELEGRAM_BOT_TOKEN>/setWebhook?url=https://<project-ref>.supabase.co/functions/v1/scribe-bot?secret=<FUNCTION_SECRET>
```
Note that the `FUNCTION_SECRET` is the secret you set in your `.env` file.

### Set the function secrets
Now that you have all your secrets set locally, you can run the following command to set the secrets in your Supabase project:
```bash
supabase secrets set --env-file supabase/functions/.env
```
## Test the bot
Finally you can test the bot by sending it a voice message, audio or video file.

After you see the transcript as a reply, navigate back to your table editor in the Supabase dashboard and you should see a new row in your `transcription_logs` table.

# Asynchronous Speech to Text
> Learn how to use webhooks to receive asynchronous notifications when your transcription tasks are completed.
## Overview
Webhooks allow you to receive automatic notifications when your Speech to Text transcription tasks are completed, eliminating the need to continuously poll the API for status updates. This is particularly useful for long-running transcription jobs or when processing large volumes of audio files.
When a transcription is completed, ElevenLabs will send a POST request to your specified webhook URL with the transcription results, including the transcript text, language detection, and any metadata.
## Using webhooks
First, create a webhook in the ElevenLabs dashboard. Navigate to your [webhooks settings](https://elevenlabs.io/app/settings/webhooks) and click "Create Webhook".
Configure your webhook with:
* **Name**: A descriptive name for your webhook
* **Callback URL**: Your publicly accessible HTTPS endpoint
* **Webhook Auth Method**: Either `HMAC` or `OAuth`. It is up to the client to implement the verification mechanism. ElevenLabs sends headers that allow for verification but we do not enforce it.
Once created, you can associate the webhook with your speech-to-text tasks. In the dashboard, navigate to the webhook events section and link your webhooks to speech-to-text events.
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
When making speech-to-text API calls, include the `webhook` parameter set to `true` to enable webhook notifications for that specific request.
```python maxLines=0
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
def transcribe_with_webhook(audio_file):
try:
result = elevenlabs.speech_to_text.convert(
file=audio_file,
model_id="scribe_v1",
webhook=True,
)
print(f"Transcription started: {result.task_id}")
return result
except Exception as e:
print(f"Error starting transcription: {e}")
raise e
```
```typescript maxLines=0
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
const elevenlabs = new ElevenLabsClient({
apiKey: process.env.ELEVENLABS_API_KEY,
});
async function transcribeWithWebhook(audioFile) {
try {
const result = await elevenlabs.speechToText.convert({
file: audioFile,
modelId: 'scribe_v1',
webhook: true,
});
console.log('Transcription started:', result.taskId);
return result;
} catch (error) {
console.error('Error starting transcription:', error);
throw error;
}
}
```
## Webhook payload
When a transcription is completed, your webhook endpoint will receive a POST request with a payload identical to the response of the non-webhook API:
```json
{
"language_code": "en",
"language_probability": 0.98,
"text": "Hello world!",
"words": [
{
"text": "Hello",
"start": 0.0,
"end": 0.5,
"type": "word",
"speaker_id": "speaker_1"
},
{
"text": " ",
"start": 0.5,
"end": 0.5,
"type": "spacing",
"speaker_id": "speaker_1"
},
{
"text": "world!",
"start": 0.5,
"end": 1.2,
"type": "word",
"speaker_id": "speaker_1"
}
]
}
```
Please refer to the [Speech-to-text API](/docs/api-reference/speech-to-text) reference to learn about the details of the response structure.
## Implementing your webhook endpoint
Here's an example of how to implement a webhook endpoint to handle incoming notifications:
```javascript
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import 'dotenv/config';
import express from 'express';
const elevenlabs = new ElevenLabsClient();
const app = express();
app.use(express.json());
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET;
app.post('/webhook/speech-to-text', (req, res) => {
try {
const signature = req.headers['x-elevenlabs-signature'];
const payload = JSON.stringify(req.body);
let event;
try {
// Verify the webhook signature.
event = elevenlabs.webhooks.constructEvent(payload, signature, WEBHOOK_SECRET);
} catch (error) {
return res.status(401).json({ error: 'Invalid signature' });
}
    const { task_id, status, text, language_code } = event.data;
    if (event.type === 'speech_to_text.completed') {
console.log(`Transcription ${task_id} completed`);
console.log(`Language: ${language_code}`);
console.log(`Text: ${text}`);
processTranscription(task_id, text, language_code);
} else if (status === 'failed') {
console.error(`Transcription ${task_id} failed`);
handleTranscriptionError(task_id);
}
res.status(200).json({ received: true });
} catch (error) {
console.error('Webhook error:', error);
res.status(500).json({ error: 'Internal server error' });
}
});
async function processTranscription(taskId, text, language) {
console.log('Processing completed transcription...');
}
async function handleTranscriptionError(taskId) {
console.log('Handling transcription error...');
}
app.listen(3000, () => {
console.log('Webhook server listening on port 3000');
});
```
## Security considerations
### Signature verification
Always verify webhook signatures to ensure requests came from ElevenLabs.
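The endpoint example above delegates verification to `elevenlabs.webhooks.constructEvent`. If you need to verify manually, a minimal HMAC-SHA256 sketch could look like the following; the header name is taken from the example above, while the exact signing scheme (what is hashed and how the digest is encoded) is an assumption here, so confirm it against your webhook configuration:
```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Sketch: compare an HMAC-SHA256 of the raw request body against the signature
// header. Assumes a hex-encoded digest of the body; adjust to the actual scheme.
export function isValidSignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected = Buffer.from(createHmac('sha256', secret).update(rawBody).digest('hex'));
  const received = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Usage inside the Express handler shown earlier (sketch):
// if (!isValidSignature(payload, signature as string, WEBHOOK_SECRET)) {
//   return res.status(401).json({ error: 'Invalid signature' });
// }
```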
### HTTPS requirement
Webhook URLs must use HTTPS to ensure secure transmission of transcription data.
### Rate limiting
Implement rate limiting on your webhook endpoint to prevent abuse:
```javascript
import rateLimit from 'express-rate-limit';
const webhookLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100, // limit each IP to 100 requests per windowMs
message: 'Too many webhook requests from this IP',
});
app.use('/webhook', webhookLimiter);
```
### Failure responses
Return appropriate HTTP status codes:
* `200-299`: Success - webhook processed successfully
* `400-499`: Client error - webhook will not be retried
* `500-599`: Server error - webhook will be retried
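As a rough sketch of how these codes can map onto handler logic (`handlePayload` and `isTransientError` are hypothetical helpers you would implement yourself):
```typescript
import express from 'express';

const app = express();
app.use(express.json());

// Hypothetical helpers, for illustration only.
const handlePayload = async (_payload: unknown): Promise<void> => {};
const isTransientError = (_err: unknown): boolean => false;

app.post('/webhook/speech-to-text', async (req, res) => {
  try {
    await handlePayload(req.body);
    res.status(200).json({ received: true }); // 2xx: processed, will not be retried
  } catch (error) {
    if (isTransientError(error)) {
      res.status(503).json({ error: 'Temporarily unavailable' }); // 5xx: will be retried
    } else {
      res.status(400).json({ error: 'Rejected' }); // 4xx: will not be retried
    }
  }
});

app.listen(3000);
```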
## Testing webhooks
### Local development
For local testing, use tools like [ngrok](https://ngrok.com/) to expose your local server:
```bash
ngrok http 3000
```
Use the provided HTTPS URL as your webhook endpoint during development.
### Webhook testing
You can test your webhook implementation by making a transcription request and monitoring your endpoint:
```javascript
async function testWebhook() {
const audioFile = new File([audioBuffer], 'test.mp3', { type: 'audio/mp3' });
const result = await elevenlabs.speechToText.convert({
file: audioFile,
modelId: 'scribe_v1',
webhook: true,
});
console.log('Test transcription started:', result.taskId);
}
```
# Vercel AI SDK
> Use the ElevenLabs Provider in the Vercel AI SDK to transcribe speech from audio and video files.
# ElevenLabs Provider
The [ElevenLabs provider](https://ai-sdk.dev/providers/ai-sdk-providers/elevenlabs) provides support for the [ElevenLabs transcription API](https://elevenlabs.io/speech-to-text).
## Setup
The ElevenLabs provider is available in the `@ai-sdk/elevenlabs` module. You can install it with npm:
```bash
npm install @ai-sdk/elevenlabs
```
## Provider Instance
You can import the default provider instance `elevenlabs` from `@ai-sdk/elevenlabs`:
```ts
import { elevenlabs } from '@ai-sdk/elevenlabs';
```
If you need a customized setup, you can import `createElevenLabs` from `@ai-sdk/elevenlabs` and create a provider instance with your settings:
```ts
import { createElevenLabs } from '@ai-sdk/elevenlabs';
const elevenlabs = createElevenLabs({
// custom settings, e.g.
fetch: customFetch,
});
```
You can use the following optional settings to customize the ElevenLabs provider instance:
* **apiKey** *string*
API key that is being sent using the `Authorization` header.
It defaults to the `ELEVENLABS_API_KEY` environment variable.
* **headers** *Record\<string, string\>*
Custom headers to include in the requests.
* **fetch** *(input: RequestInfo, init?: RequestInit) => Promise\<Response\>*
Custom [fetch](https://developer.mozilla.org/en-US/docs/Web/API/fetch) implementation.
Defaults to the global `fetch` function.
You can use it as a middleware to intercept requests,
or to provide a custom fetch implementation for e.g. testing.
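For example, a provider instance that sets the API key explicitly and adds a custom header could look like this sketch (the header name is purely illustrative):
```typescript
import { createElevenLabs } from '@ai-sdk/elevenlabs';

// Sketch: explicit API key plus a custom header (the header name is illustrative only).
const elevenlabs = createElevenLabs({
  apiKey: process.env.ELEVENLABS_API_KEY,
  headers: { 'x-example-source': 'my-app' },
});
```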
## Transcription Models
You can create models that call the [ElevenLabs transcription API](https://elevenlabs.io/speech-to-text)
using the `.transcription()` factory method.
The first argument is the model id e.g. `scribe_v1`.
```ts
const model = elevenlabs.transcription('scribe_v1');
```
You can also pass additional provider-specific options using the `providerOptions` argument. For example, supplying the input language in ISO-639-1 (e.g. `en`) format can sometimes improve transcription performance if known beforehand.
```ts {7}
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { experimental_transcribe as transcribe } from 'ai';
const result = await transcribe({
model: elevenlabs.transcription('scribe_v1'),
audio: new Uint8Array([1, 2, 3, 4]),
providerOptions: { elevenlabs: { languageCode: 'en' } },
});
```
The following provider options are available:
* **languageCode** *string*
An ISO-639-1 or ISO-639-3 language code corresponding to the language of the audio file.
Can sometimes improve transcription performance if known beforehand.
Defaults to `null`, in which case the language is predicted automatically.
* **tagAudioEvents** *boolean*
Whether to tag audio events like (laughter), (footsteps), etc. in the transcription.
Defaults to `true`.
* **numSpeakers** *integer*
The maximum amount of speakers talking in the uploaded file.
Can help with predicting who speaks when.
The maximum amount of speakers that can be predicted is 32.
Defaults to `null`, in which case the amount of speakers is set to the maximum value the model supports.
* **timestampsGranularity** *enum*
The granularity of the timestamps in the transcription.
Defaults to `'word'`.
Allowed values: `'none'`, `'word'`, `'character'`.
* **diarize** *boolean*
Whether to annotate which speaker is currently talking in the uploaded file.
Defaults to `true`.
* **fileFormat** *enum*
The format of input audio.
Defaults to `'other'`.
Allowed values: `'pcm_s16le_16'`, `'other'`.
For `'pcm_s16le_16'`, the input audio must be 16-bit PCM at a 16kHz sample rate, single channel (mono), and little-endian byte order.
Latency will be lower than with passing an encoded waveform.
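As a combined illustration (the file name and option values are arbitrary), several of these options can be passed together:
```ts
import { readFile } from 'node:fs/promises';
import { elevenlabs } from '@ai-sdk/elevenlabs';
import { experimental_transcribe as transcribe } from 'ai';
// Hypothetical local file; any supported audio input works here
const audio = await readFile('meeting.mp3');
const result = await transcribe({
  model: elevenlabs.transcription('scribe_v1'),
  audio,
  providerOptions: {
    elevenlabs: {
      languageCode: 'en',
      tagAudioEvents: false,
      numSpeakers: 2,
      timestampsGranularity: 'word',
      diarize: true,
    },
  },
});
console.log(result.text);
```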
# Voice Changer quickstart
> Learn how to transform the voice of an audio file using the Voice Changer API.
This guide will show you how to transform the voice of an audio file using the Voice Changer API.
## Using the Voice Changer API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
To play the audio through your speakers, you may be prompted to install [MPV](https://mpv.io/)
and/or [ffmpeg](https://ffmpeg.org/).
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python maxLines=0
# example.py
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs import play
import requests
from io import BytesIO
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
voice_id = "JBFqnCBsd6RMkjVDRZzb"
audio_url = (
"https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
)
response = requests.get(audio_url)
audio_data = BytesIO(response.content)
audio_stream = elevenlabs.speech_to_speech.convert(
voice_id=voice_id,
audio=audio_data,
model_id="eleven_multilingual_sts_v2",
output_format="mp3_44100_128",
)
play(audio_stream)
```
```typescript maxLines=0
// example.mts
import { ElevenLabsClient, play } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
const elevenlabs = new ElevenLabsClient();
const voiceId = "JBFqnCBsd6RMkjVDRZzb";
const response = await fetch(
"https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
);
const audioBlob = new Blob([await response.arrayBuffer()], { type: "audio/mp3" });
const audioStream = await elevenlabs.speechToSpeech.convert(voiceId, {
audio: audioBlob,
modelId: "eleven_multilingual_sts_v2",
outputFormat: "mp3_44100_128",
});
await play(audioStream);
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should hear the transformed voice playing through your speakers.
## Next steps
Explore the [API reference](/docs/api-reference/speech-to-speech/convert) for more information on the Voice Changer API and its options.
# Voice Isolator quickstart
> Learn how to remove background noise from an audio file using the Voice Isolator API.
This guide will show you how to remove background noise from an audio file using the Voice Isolator API.
## Using the Voice Isolator API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
To play the audio through your speakers, you may be prompted to install [MPV](https://mpv.io/)
and/or [ffmpeg](https://ffmpeg.org/).
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python maxLines=0
# example.py
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs import play
import requests
from io import BytesIO
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
audio_url = "https://storage.googleapis.com/eleven-public-cdn/audio/marketing/fin.mp3"
response = requests.get(audio_url)
audio_data = BytesIO(response.content)
audio_stream = elevenlabs.audio_isolation.convert(audio=audio_data)
play(audio_stream)
```
```typescript maxLines=0
// example.mts
import { ElevenLabsClient, play } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
const elevenlabs = new ElevenLabsClient();
const audioUrl =
"https://storage.googleapis.com/eleven-public-cdn/audio/marketing/fin.mp3";
const response = await fetch(audioUrl);
const audioBlob = new Blob([await response.arrayBuffer()], {
type: "audio/mp3",
});
const audioStream = await elevenlabs.audioIsolation.convert({
audio: audioBlob,
});
await play(audioStream);
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should hear the isolated voice playing through your speakers.
## Next steps
Explore the [API reference](/docs/api-reference/audio-isolation/audio-isolation) for more information on the Voice Isolator API and its options.
# Dubbing quickstart
> Learn how to dub audio and video files across languages using the Dubbing API.
This guide will show you how to dub an audio file across languages. In this example we'll dub an audio file from English to Spanish.
## Using the Dubbing API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
To play the audio through your speakers, you may be prompted to install [MPV](https://mpv.io/)
and/or [ffmpeg](https://ffmpeg.org/).
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python maxLines=0
# example.py
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs import play
import requests
from io import BytesIO
import time
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
target_lang = "es" # Spanish
audio_url = (
"https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
)
response = requests.get(audio_url)
audio_data = BytesIO(response.content)
audio_data.name = "audio.mp3"
# Start dubbing
dubbed = elevenlabs.dubbing.create(
file=audio_data, target_lang=target_lang
)
while True:
status = elevenlabs.dubbing.get(dubbed.dubbing_id).status
if status == "dubbed":
dubbed_file = elevenlabs.dubbing.audio.get(dubbed.dubbing_id, target_lang)
play(dubbed_file)
break
else:
print("Audio is still being dubbed...")
time.sleep(5)
```
```typescript maxLines=0
// example.mts
import { ElevenLabsClient, play } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
const elevenlabs = new ElevenLabsClient();
const targetLang = "es"; // spanish
const sourceAudio = await fetch(
"https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
);
const audioBlob = new Blob([await sourceAudio.arrayBuffer()], {
type: "audio/mp3",
});
// Start dubbing
const dubbed = await elevenlabs.dubbing.create({
file: audioBlob,
targetLang: targetLang,
});
while (true) {
const { status } = await elevenlabs.dubbing.get(
dubbed.dubbingId
);
if (status === "dubbed") {
const dubbedFile = await elevenlabs.dubbing.audio.get(
dubbed.dubbingId,
targetLang
);
await play(dubbedFile);
break;
} else {
console.log("Audio is still being dubbed...");
}
// Wait 5 seconds between checks
await new Promise((resolve) => setTimeout(resolve, 5000));
}
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should hear the dubbed audio file playing through your speakers.
## Next steps
Explore the [API reference](/docs/api-reference/dubbing/create) for more information on the Dubbing API and its options.
# Sound Effects quickstart
> Learn how to generate sound effects using the Sound Effects API.
This guide will show you how to generate sound effects using the Sound Effects API.
## Using the Sound Effects API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
To play the audio through your speakers, you may be prompted to install [MPV](https://mpv.io/)
and/or [ffmpeg](https://ffmpeg.org/).
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python maxLines=0
# example.py
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs import play
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
audio = elevenlabs.text_to_sound_effects.convert(text="Cinematic Braam, Horror")
play(audio)
```
```typescript
// example.mts
import { ElevenLabsClient, play } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
const elevenlabs = new ElevenLabsClient();
const audio = await elevenlabs.textToSoundEffects.convert({
text: "Cinematic Braam, Horror",
});
await play(audio);
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should hear your generated sound effect playing through your speakers.
## Next steps
Explore the [API reference](/docs/api-reference/text-to-sound-effects/convert) for more information on the Sound Effects API and its options.
# Instant Voice Cloning
> Learn how to clone a voice using the Clone Voice API.
This guide will show you how to create an Instant Voice Clone using the Clone Voice API. To create an Instant Voice Clone via the dashboard, refer to the [Instant Voice Clone](/docs/product-guides/voices/voice-cloning/instant-voice-cloning) product guide.
For an outline of the differences between Instant Voice Clones and Professional Voice Clones, refer to the [Voices capability](/docs/capabilities/voices) guide.
## Using the Instant Voice Clone API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python maxLines=0
# example.py
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from io import BytesIO
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
voice = elevenlabs.voices.ivc.create(
name="My Voice Clone",
# Replace with the paths to your audio files.
# The more files you add, the better the clone will be.
files=[BytesIO(open("/path/to/your/audio/file.mp3", "rb").read())]
)
print(voice.voice_id)
```
```typescript maxLines=0
// example.mts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
import fs from "node:fs";
const elevenlabs = new ElevenLabsClient();
const voice = await elevenlabs.voices.ivc.create({
name: "My Voice Clone",
// Replace with the paths to your audio files.
// The more files you add, the better the clone will be.
files: [
fs.createReadStream(
"/path/to/your/audio/file.mp3",
),
],
});
console.log(voice.voiceId);
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should see the voice ID printed to the console.
## Next steps
Explore the [API reference](/docs/api-reference/voices/ivc/create) for more information on creating a voice clone.
# Professional Voice Cloning
> Learn how to clone a voice using the Clone Voice API.
This guide will show you how to create a Professional Voice Clone (PVC) using the PVC API. To create a PVC via the dashboard, refer to the [Professional Voice Clone](/docs/product-guides/voices/voice-cloning/professional-voice-cloning) product guide.
Creating a PVC requires you to be on the [Creator plan](https://elevenlabs.io/pricing) or above.
For an outline of the differences between Instant Voice Clones and Professional Voice Clones, refer to the [Voices capability](/docs/capabilities/voices) guide.
If you are unsure about what is permissible from a legal standpoint, please consult the [Terms of
Service](https://elevenlabs.io/terms-of-use) and our [AI Safety
information](https://elevenlabs.io/safety) for more information.
Creating a PVC via the API involves considerably more steps than creating an Instant Voice Clone, because PVCs are more complex and require more data and fine-tuning to produce a high-quality clone.
## Using the Professional Voice Clone API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code to create a PVC voice:
```python maxLines=0
# example.py
import os
import time
import base64
from contextlib import ExitStack
from io import BytesIO
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
voice = elevenlabs.voices.pvc.create(
name="My Professional Voice Clone",
language="en",
description="A professional voice clone of my voice"
)
print(voice)
```
```typescript maxLines=0
// example.mts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
import fs from "node:fs";
const elevenlabs = new ElevenLabsClient();
const voice = await elevenlabs.voices.pvc.create({
name: "My Professional Voice Clone",
language: "en",
description: "A professional voice clone of my voice",
});
console.log(voice.voiceId);
```
Next we'll upload the audio sample files that will be used to train the PVC. Review the [Tips and suggestions](/docs/product-guides/voices/voice-cloning/professional-voice-cloning#tips-and-suggestions) section of the PVC product guide for more information on how to get best results from your audio files.
```python maxLines=0
# Define the list of file paths explicitly
# Replace with the paths to your audio and/or video files.
# The more files you add, the better the clone will be.
sample_file_paths = [
"/path/to/your/first_sample.mp3",
"/path/to/your/second_sample.wav",
"relative/path/to/another_sample.mp4"
]
samples = None
files_to_upload = []
# Use ExitStack to manage multiple open files
with ExitStack() as stack:
for filepath in sample_file_paths:
# Open each file and add it to the stack
audio_file = stack.enter_context(open(filepath, "rb"))
filename = os.path.basename(filepath)
# Create a File object for the SDK
files_to_upload.append(
BytesIO(audio_file.read())
)
samples = elevenlabs.voices.pvc.samples.create(
voice_id=voice.voice_id,
files=files_to_upload # Pass the list of File objects
)
```
```typescript
const samples = await elevenlabs.voices.pvc.samples.create(voice.voiceId, {
// Replace with the paths to your audio and/or video files.
// The more files you add, the better the clone will be.
files: [fs.createReadStream("/path/to/your/audio/file.mp3")],
})
```
This step will attempt to separate the audio files into individual speakers. This is required if you are uploading audio with multiple speakers.
```python maxLines=0
sample_ids_to_check = []
for sample in samples:
if sample.sample_id:
print(f"Starting separation for sample: {sample.sample_id}")
elevenlabs.voices.pvc.samples.speakers.separate(
voice_id=voice.voice_id,
sample_id=sample.sample_id
)
sample_ids_to_check.append(sample.sample_id)
while sample_ids_to_check:
# Create a copy of the list to iterate over, so we can remove items from the original
ids_in_batch = list(sample_ids_to_check)
for sample_id in ids_in_batch:
status_response = elevenlabs.voices.pvc.samples.speakers.get(
voice_id=voice.voice_id,
sample_id=sample_id
)
status = status_response.status
print(f"Sample {sample_id} status: {status}")
if status == "completed" or status == "failed":
sample_ids_to_check.remove(sample_id)
if sample_ids_to_check:
# Wait before the next poll cycle
time.sleep(5) # Wait for 5 seconds
print("All samples have been processed or removed from polling.")
```
```typescript maxLines=0
// Trigger the speaker separation action, this will take some time to complete
for (const sample of samples) {
if (sample.sampleId) {
await elevenlabs.voices.pvc.samples.speakers.separate(voice.voiceId, sample.sampleId);
}
}
// Check the status of the speaker separation action
const ids = samples.map((sample) => sample.sampleId);
const interval = setInterval(async () => {
// Poll the status of the speaker separation action
for (const id of ids) {
if (!id) continue;
const { status } = await elevenlabs.voices.pvc.samples.speakers.get(voice.voiceId, id);
console.log(`Sample ${id} status: ${status}`);
if (status === "completed" || status === "failed") {
ids.splice(ids.indexOf(id), 1);
}
if (ids.length === 0) {
clearInterval(interval);
console.log("All samples have been processed or removed from polling");
}
}
}, 5000);
```
Since speaker separation will take some time, run the following step in a separate process once the previous step has completed.
Once speaker separation is complete, you will have a list of speakers for each sample. In the case of samples with multiple speakers, you will have to pick the speaker you want to use for the PVC. To identify the speaker, you can retrieve the audio for each speaker and listen to them.
```python maxLines=0
# Get the list of samples from the voice created in Step 3
voice = elevenlabs.voices.get(voice_id=voice_id)
samples = voice.samples
# Loop over each sample and save the audio for each speaker to a file
speaker_audio_output_dir = "path/to/speakers/"
if not os.path.exists(speaker_audio_output_dir):
os.makedirs(speaker_audio_output_dir)
for sample in samples:
speaker_info = elevenlabs.voices.pvc.samples.speakers.get(
voice_id=voice.voice_id,
sample_id=sample.sample_id
)
# Proceed only if separation is actually complete
if getattr(speaker_info, 'status', 'unknown') != "completed":
continue
if hasattr(speaker_info, 'speakers') and speaker_info.speakers:
speaker_list = speaker_info.speakers
if isinstance(speaker_info.speakers, dict):
speaker_list = speaker_info.speakers.values()
for speaker in speaker_list:
audio_response = elevenlabs.voices.pvc.samples.speakers.audio.get(
voice_id=voice.voice_id,
sample_id=sample.sample_id,
speaker_id=speaker.speaker_id
)
audio_base64 = audio_response.audio_base_64
audio_data = base64.b64decode(audio_base64)
output_filename = os.path.join(speaker_audio_output_dir, f"sample_{sample.sample_id}_speaker_{speaker.speaker_id}.mp3")
with open(output_filename, "wb") as f:
f.write(audio_data)
```
```typescript maxLines=0
// Get the list of samples from the voice created in Step 3
const { samples } = await elevenlabs.voices.get(voiceId);
// After separation is completed, get the list of speakers and their audio
if (samples) {
for (const sample of samples) {
if (!sample.sampleId) continue;
const { speakers } = await elevenlabs.voices.pvc.samples.speakers.get(voiceId, sample.sampleId)
if (speakers) {
for (const speaker of Object.values(speakers)) {
if (!speaker || !speaker.speakerId) continue;
const { audioBase64 } = await elevenlabs.voices.pvc.samples.speakers.audio.get(voiceId, sample.sampleId, speaker.speakerId);
const audioBuffer = Buffer.from(audioBase64, 'base64');
// Write the audio to a file
// Note which speaker ID you wish to use for the PVC
fs.writeFileSync(`path/to/speakers/sample_${sample.sampleId}_speaker_${speaker.speakerId}.mp3`, audioBuffer);
}
}
}
}
```
Once speaker separation is complete, you can update the samples to select which speaker you want to use for the PVC.
```python
elevenlabs.voices.pvc.samples.update(
voice_id=voice.voice_id,
sample_id=sample.sample_id,
selected_speaker_ids=[speaker.speaker_id]
)
```
```typescript
await elevenlabs.voices.pvc.samples.update(voice.voiceId, sample.sampleId, {
selectedSpeakerIds: [speaker.speakerId],
})
```
Repeat this process for each sample that contains multiple speakers.
Before training can begin, a verification step is required to ensure you have permission to use the voice. First request the verification CAPTCHA.
```python
captcha_response = elevenlabs.voices.pvc.verification.captcha.get(voice.voice_id)
# Save captcha image to file
captcha_buffer = base64.b64decode(captcha_response)
with open('captcha.png', 'wb') as f:
f.write(captcha_buffer)
```
```typescript
const captchaResponse = await elevenlabs.voices.pvc.verification.captcha.get(voice.voiceId);
// Save captcha image to file
const captchaBuffer = Buffer.from(captchaResponse, 'base64');
fs.writeFileSync('path/to/captcha.png', captchaBuffer);
```
The image contains several lines of text that the voice owner will need to read out loud and record. Once done, submit the recording to verify the identity of the voice's owner.
```python
elevenlabs.voices.pvc.verification.captcha.verify(
voice_id=voice.voice_id,
recording=open('path/to/recording.mp3', 'rb')
)
```
```typescript
await elevenlabs.voices.pvc.verification.captcha.verify(voice.voiceId, {
recording: fs.createReadStream("/path/to/recording.mp3"),
})
```
If you are unable to verify the CAPTCHA, you can request manual verification. Note that this will take longer to process.
This should only be used if the previous verification steps have failed or are not possible, for instance if the voice owner is visually impaired.
For a list of the files that are required for manual verification, please contact support as each case may be unique.
```python
elevenlabs.voices.pvc.verification.request(
voice_id=voice.voice_id,
files=[open('path/to/verification/files.txt', 'rb')],
)
```
```typescript
await elevenlabs.voices.pvc.verification.request(voice.voiceId, {
files: [fs.createReadStream("/path/to/verification/files.txt")],
});
```
Next, begin the training process. This will take some time to complete based on the length and number of samples provided.
```python maxLines=0
elevenlabs.voices.pvc.train(
voice_id=voice.voice_id,
# Specify the model the PVC should be trained on
model_id="eleven_multilingual_v2"
)
# Poll the fine tuning status until it is complete or fails
# This example specifically checks for the eleven_multilingual_v2 model
while True:
voice_details = elevenlabs.voices.get(voice_id=voice.voice_id)
fine_tuning_state = None
if voice_details.fine_tuning and voice_details.fine_tuning.state:
fine_tuning_state = voice_details.fine_tuning.state.get("eleven_multilingual_v2")
if fine_tuning_state:
progress = None
if voice_details.fine_tuning.progress and voice_details.fine_tuning.progress.get("eleven_multilingual_v2"):
progress = voice_details.fine_tuning.progress.get("eleven_multilingual_v2")
print(f"Fine tuning progress: {progress}")
if fine_tuning_state == "fine_tuned" or fine_tuning_state == "failed":
print("Fine tuning completed or failed")
break
# Wait for 5 seconds before polling again
time.sleep(5)
```
```typescript maxLines=0
await elevenlabs.voices.pvc.train(voiceId, {
// Specify the model the PVC should be trained on
modelId: "eleven_multilingual_v2",
});
// Poll the fine tuning status until it is complete or fails
// This example specifically checks for the eleven_multilingual_v2 model
const interval = setInterval(async () => {
const { fineTuning } = await elevenlabs.voices.get(voiceId);
if (!fineTuning) return;
console.log(`Fine tuning progress: ${fineTuning?.progress?.eleven_multilingual_v2}`);
if (fineTuning?.state?.eleven_multilingual_v2 === "fine_tuned" || fineTuning?.state?.eleven_multilingual_v2 === "failed") {
clearInterval(interval);
console.log("Fine tuning completed or failed");
}
}, 5000);
```
Once the PVC is verified, you can use it in the same way as any other voice. See the [Text to Speech quickstart](/docs/quickstart) for more information on how to use a voice.
## Next steps
Explore the [API reference](/docs/api-reference/voices/pvc/create) for more information on creating a Professional Voice Clone.
# Voice Design quickstart
> Learn how to design a voice via a prompt using the Voice Design API.
This guide will show you how to design a voice via a prompt using the Voice Design API.
## Using the Voice Design API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
To play the audio through your speakers, you may be prompted to install [MPV](https://mpv.io/)
and/or [ffmpeg](https://ffmpeg.org/).
Designing a voice via a prompt has two steps:
1. Generate previews based on a prompt.
2. Select the best preview and use it to create a new voice.
We'll start by generating previews based on a prompt.
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python maxLines=0
# example.py
import os
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
from elevenlabs import play
import base64
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
voices = elevenlabs.text_to_voice.design(
model_id="eleven_multilingual_ttv_v2",
voice_description="A massive evil ogre speaking at a quick pace. He has a silly and resonant tone.",
text="Your weapons are but toothpicks to me. Surrender now and I may grant you a swift end. I've toppled kingdoms and devoured armies. What hope do you have against me?",
)
for preview in voices.previews:
# Convert base64 to audio buffer
audio_buffer = base64.b64decode(preview.audio_base_64)
print(f"Playing preview: {preview.generated_voice_id}")
play(audio_buffer)
```
```typescript maxLines=0
// example.mts
import { ElevenLabsClient, play } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
import { Readable } from 'node:stream';
import { Buffer } from 'node:buffer';
const elevenlabs = new ElevenLabsClient();
const { previews } = await elevenlabs.textToVoice.design({
modelId: "eleven_multilingual_ttv_v2",
voiceDescription: "A massive evil ogre speaking at a quick pace. He has a silly and resonant tone.",
text: "Your weapons are but toothpicks to me. Surrender now and I may grant you a swift end. I've toppled kingdoms and devoured armies. What hope do you have against me?",
});
for (const preview of previews) {
// Convert base64 to buffer and create a Readable stream
const audioStream = Readable.from(Buffer.from(preview.audioBase64, 'base64'));
console.log(`Playing preview: ${preview.generatedVoiceId}`);
// Play the audio using the stream
await play(audioStream);
}
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should hear the generated voice previews playing through your speakers, one at a time.
Once you've generated the previews and picked your favorite, you can add it to your voice library via the generated voice ID so it can be used with other APIs.
```python
voice = elevenlabs.text_to_voice.create(
voice_name="Jolly giant",
voice_description="A huge giant, at least as tall as a building. A deep booming voice, loud and jolly.",
# The generated voice ID of the preview you want to use,
# using the first in the list for this example
generated_voice_id=voices.previews[0].generated_voice_id
)
print(voice.voice_id)
```
```typescript
const voice = await elevenlabs.textToVoice.create({
voiceName: "Jolly giant",
voiceDescription: "A huge giant, at least as tall as a building. A deep booming voice, loud and jolly.",
// The generated voice ID of the preview you want to use,
// using the first in the list for this example
generatedVoiceId: previews[0].generatedVoiceId
});
// The ID of the newly created voice, use this to reference the voice in other APIs
console.log(voice.voiceId);
```
## Next steps
Explore the [API reference](/docs/api-reference/text-to-voice/design) for more information on the Voice Design API and its options.
# Forced Alignment quickstart
> Learn how to use the Forced Alignment API to align text to audio.
This guide will show you how to use the Forced Alignment API to align text to audio.
## Using the Forced Alignment API
[Create an API key in the dashboard here](https://elevenlabs.io/app/settings/api-keys), which you’ll use to securely [access the API](/docs/api-reference/authentication).
Store the key as a managed secret and pass it to the SDKs either as an environment variable via an `.env` file, or directly in your app’s configuration, depending on your preference.
```js title=".env"
ELEVENLABS_API_KEY=
```
We'll also use the `dotenv` library to load our API key from an environment variable.
```python
pip install elevenlabs
pip install python-dotenv
```
```typescript
npm install @elevenlabs/elevenlabs-js
npm install dotenv
```
Create a new file named `example.py` or `example.mts`, depending on your language of choice and add the following code:
```python maxLines=0
# example.py
import os
from io import BytesIO
from elevenlabs.client import ElevenLabs
import requests
from dotenv import load_dotenv
load_dotenv()
elevenlabs = ElevenLabs(
api_key=os.getenv("ELEVENLABS_API_KEY"),
)
audio_url = (
"https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
)
response = requests.get(audio_url)
audio_data = BytesIO(response.content)
# Perform the text-to-speech conversion
transcription = elevenlabs.forced_alignment.create(
file=audio_data,
text="With a soft and whispery American accent, I'm the ideal choice for creating ASMR content, meditative guides, or adding an intimate feel to your narrative projects."
)
print(transcription)
```
```typescript maxLines=0
// example.mts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import "dotenv/config";
const elevenlabs = new ElevenLabsClient();
const response = await fetch(
"https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
);
const audioBlob = new Blob([await response.arrayBuffer()], { type: "audio/mp3" });
const transcript = await elevenlabs.forcedAlignment.create({
file: audioBlob,
text: "With a soft and whispery American accent, I'm the ideal choice for creating ASMR content, meditative guides, or adding an intimate feel to your narrative projects."
})
console.log(transcript);
```
```python
python example.py
```
```typescript
npx tsx example.mts
```
You should see the transcript of the audio file with exact timestamps printed to the console.
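If you want to work with the aligned words rather than print the raw response, you could iterate over them with a small sketch like the one below. This assumes the response exposes a `words` list whose items carry `text`, `start`, and `end` fields; verify the exact shape against the API reference linked in the next section.
```typescript
// Illustrative only: log each aligned word with its timestamps (field names assumed, see the API reference)
for (const word of transcript.words ?? []) {
  console.log(`${word.start.toFixed(2)}s - ${word.end.toFixed(2)}s: ${word.text}`);
}
```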
## Next steps
Explore the [API reference](/docs/api-reference/forced-alignment/create) for more information on the Forced Alignment API and its options.
# Quickstart
> Build your first conversational AI voice agent in 5 minutes.
In this guide, you'll learn how to create your first Conversational AI voice agent. This will serve as a foundation for building conversational workflows tailored to your business use cases.
## Getting started
Conversational AI agents are managed through the [ElevenLabs dashboard](https://elevenlabs.io/app/conversational-ai). This is used to:
* Create and manage AI assistants
* Configure voice settings and conversation parameters
* Equip the agent with [tools](/docs/conversational-ai/customization/tools) and a [knowledge base](/docs/conversational-ai/customization/knowledge-base)
* Review conversation analytics and transcripts
* Manage API keys and integration settings
The web dashboard uses our [Web SDK](/docs/conversational-ai/libraries/react) under the hood to
handle real-time conversations.
## Overview
In this guide, we'll create a conversational support assistant capable of answering questions about your product, documentation, or service. This assistant can be embedded into your website or app to provide real-time support to your customers.

### Prerequisites
* An [ElevenLabs account](https://www.elevenlabs.io)
### Assistant setup
Go to [elevenlabs.io](https://elevenlabs.io/sign-up) and sign in to your account.
In the **ElevenLabs Dashboard**, create a new assistant by entering a name and selecting the `Blank template` option.

Go to the **Agent** tab to configure the assistant's behavior. Set the following:
This is the first message the assistant will speak out loud when a user starts a conversation.
```plaintext First message
Hi, this is Alexis from support. How can I help you today?
```
This prompt guides the assistant's behavior, tasks, and personality.
Customize the following example with your company details:
```plaintext System prompt
You are a friendly and efficient virtual assistant for [Your Company Name]. Your role is to assist customers by answering questions about the company's products, services, and documentation. You should use the provided knowledge base to offer accurate and helpful responses.
Tasks:
- Answer Questions: Provide clear and concise answers based on the available information.
- Clarify Unclear Requests: Politely ask for more details if the customer's question is not clear.
Guidelines:
- Maintain a friendly and professional tone throughout the conversation.
- Be patient and attentive to the customer's needs.
- If unsure about any information, politely ask the customer to repeat or clarify.
- Avoid discussing topics unrelated to the company's products or services.
- Aim to provide concise answers. Limit responses to a couple of sentences and let the user guide you on where to provide more detail.
```
Go to the **Knowledge Base** section to provide your assistant with context about your business.
This is where you can upload relevant documents & links to external resources:
* Include documentation, FAQs, and other resources to help the assistant respond to customer inquiries.
* Keep the knowledge base up-to-date to ensure the assistant provides accurate and current information.
### Configure the voice
In the **Voice** tab, choose a voice that best matches your assistant from the [voice library](https://elevenlabs.io/community):

Using higher quality voices, models, and LLMs may increase response time. For an optimal customer experience, balance quality and latency based on your assistant's expected use case.
Press the **Test AI agent** button and try conversing with your assistant.
### Analyze and collect conversation data
Configure evaluation criteria and data collection to analyze conversations and improve your assistant's performance.
Navigate to the **Analysis** tab in your assistant's settings to define custom criteria for evaluating conversations.

Every conversation transcript is passed to the LLM to verify if specific goals were met. Results will either be `success`, `failure`, or `unknown`, along with a rationale explaining the chosen result.
Let's add an evaluation criteria with the name `solved_user_inquiry`:
```plaintext Prompt
The assistant was able to answer all of the queries or redirect them to a relevant support channel.
Success Criteria:
- All user queries were answered satisfactorily.
- The user was redirected to a relevant support channel if needed.
```
In the **Data Collection** section, configure details to be extracted from each conversation.
Click **Add item** and configure the following:
1. **Data type:** Select "string"
2. **Identifier:** Enter a unique identifier for this data point: `user_question`
3. **Description:** Provide detailed instructions for the LLM about how to extract the specific data from the transcript:
```plaintext Prompt
Extract the user's questions & inquiries from the conversation.
```
Test your assistant by posing as a customer. Ask questions, evaluate its responses, and tweak the prompts until you're happy with how it performs.
View evaluation results and collected data for each conversation in the **Call history** tab.

Regularly review conversation history to identify common issues and patterns.
Your assistant is now configured. Embed the widget on your website to start providing real-time support to your customers.
## Overview
In this guide, we’ll create a conversational ordering assistant for Pierogi Palace, a Polish restaurant, to take food orders and help the business manage its high call volumes.
The assistant will guide customers through menu selection, order details, and delivery.
### Prerequisites
* An [ElevenLabs account](https://www.elevenlabs.io)
### Assistant setup
Go to [elevenlabs.io](https://elevenlabs.io/sign-up) and sign in to your account.
In the **ElevenLabs Dashboard**, create a new assistant by entering a name and selecting the `Blank template` option.

Go to the **Agent** tab to configure the assistant's behavior. Set the following:
This is the first message the assistant will speak out loud when a user starts a conversation.
```plaintext First message
Welcome to Pierogi Palace! I'm here to help you place your order. What can I get started for you today?
```
This prompt guides the assistant's behavior, tasks, and personality:
```plaintext System prompt
You are a friendly and efficient virtual assistant for Pierogi Palace, a modern Polish restaurant specializing in pierogi. It is located in the Zakopane mountains in Poland.
Your role is to help customers place orders over voice conversations. You have comprehensive knowledge of the menu items and their prices.
Menu Items:
- Potato & Cheese Pierogi – 30 Polish złoty per dozen
- Beef & Onion Pierogi – 40 Polish złoty per dozen
- Spinach & Feta Pierogi – 30 Polish złoty per dozen
Your Tasks:
1. Greet the Customer: Start with a warm welcome and ask how you can assist.
2. Take the Order: Listen carefully to the customer's selection, confirm the type and quantity of pierogi.
3. Confirm Order Details: Repeat the order back to the customer for confirmation.
4. Calculate Total Price: Compute the total cost based on the items ordered.
5. Collect Delivery Information: Ask for the customer's delivery address to estimate delivery time.
6. Estimate Delivery Time: Inform the customer that cooking time is 10 minutes plus delivery time based on their location.
7. Provide Order Summary: Give the customer a summary of their order, total price, and estimated delivery time.
8. Close the Conversation: Thank the customer and let them know their order is being prepared.
Guidelines:
- Use a friendly and professional tone throughout the conversation.
- Be patient and attentive to the customer's needs.
- If unsure about any information, politely ask the customer to repeat or clarify.
- Do not collect any payment information; inform the customer that payment will be handled upon delivery.
- Avoid discussing topics unrelated to taking and managing the order.
```
### Configure the voice
In the **Voice** tab, choose a voice that best matches your assistant from the [voice library](https://elevenlabs.io/community):

Using higher quality voices, models, and LLMs may increase response time. For an optimal customer experience, balance quality and latency based on your assistant's expected use case.
Press the **Test AI agent** button and try ordering some pierogi.
### Analyze and collect conversation data
Configure evaluation criteria and data collection to analyze conversations and improve your assistant's performance.
Navigate to the **Analysis** tab in your assistant's settings to define custom criteria for evaluating conversations.

Every conversation transcript is passed to the LLM to verify if specific goals were met. Results will either be `success`, `failure`, or `unknown`, along with a rationale explaining the chosen result.
Let's add an evaluation criteria with the name `order_completion`:
```plaintext Prompt
Evaluate if the conversation resulted in a successful order.
Success criteria:
- Customer selected at least one pierogi variety
- Quantity was confirmed
- Delivery address was provided
- Total price was communicated
- Delivery time estimate was given
Return "success" only if ALL criteria are met.
```
In the **Data Collection** section, configure details to be extracted from each conversation.
Click **Add item** and configure the following:
1. **Data type:** Select "string"
2. **Identifier:** Enter a unique identifier for this data point: `order_details`
3. **Description:** Provide detailed instructions for the LLM about how to extract the specific data from the transcript:
```plaintext Prompt
Extract order details from the conversation, including:
- Type of order (delivery, pickup, inquiry_only)
- List of pierogi varieties and quantities ordered in the format: "item: quantity"
- Delivery zone based on the address (central_zakopane, outer_zakopane, outside_delivery_zone)
- Interaction type (completed_order, abandoned_order, menu_inquiry, general_inquiry)
If no order was placed, return "none"
```
Test your assistant by posing as a customer. Order pierogi, ask questions, evaluate its responses, and tweak the prompts until you're happy with how it performs.
View evaluation results and collected data for each conversation in the **Call history** tab.

Regularly review conversation history to identify common issues and patterns.
Your assistant is now configured & ready to take orders.
## Next steps
Learn how to customize your agent with tools, knowledge bases, dynamic variables and overrides.
Learn how to integrate Conversational AI into your app using the SDK for advanced configuration.
# Cross-platform Voice Agents with Expo React Native
> Build conversational AI agents that work across iOS, Android, and web using Expo React Native and the ElevenLabs Conversational AI SDK.
## Introduction
In this tutorial you will learn how to build a voice agent that works across iOS, Android, and web using [Expo React Native](https://expo.dev/) and the ElevenLabs Conversational AI SDK.
Find the [example project on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/react-native/elevenlabs-conversational-ai-expo-react-native).
## Requirements
* An ElevenLabs account with an [API key](/app/settings/api-keys).
* Node.js v18 or higher installed on your machine.
## Setup
### Create a new Expo project
Using `create-expo-app`, create a new blank Expo project:
```bash
npx create-expo-app@latest --template blank-typescript
```
### Enable microphone permissions
In the `app.json` file, add the following permissions:
```json app.json
{
"expo": {
"scheme": "elevenlabs",
// ...
"ios": {
"infoPlist": {
"NSMicrophoneUsageDescription": "This app uses the microphone to record audio."
},
"supportsTablet": true,
"bundleIdentifier": "YOUR.BUNDLE.ID"
},
"android": {
"permissions": [
"android.permission.RECORD_AUDIO",
"android.permission.MODIFY_AUDIO_SETTINGS"
],
"adaptiveIcon": {
"foregroundImage": "./assets/adaptive-icon.png",
"backgroundColor": "#ffffff"
},
"package": "YOUR.PACKAGE.ID"
}
// ...
}
}
```
This will allow the React Native web view to prompt for microphone permissions when the conversation is started.
For the Android emulator, you will need to enable "Virtual microphone uses host audio input" in the
emulator's microphone settings.
### Install dependencies
This approach relies on [Expo DOM components](https://docs.expo.dev/guides/dom-components/) to make the conversational AI agent work across platforms. There are a couple of dependencies you need to install to make this work.
```bash
npx expo install @elevenlabs/react
npx expo install expo-dev-client # tunnel support
npx expo install react-native-webview # DOM components support
npx expo install react-dom react-native-web @expo/metro-runtime # RN web support
# Cool client tools
npx expo install expo-battery
npx expo install expo-brightness
```
## Expo DOM components
Expo offers a [novel approach](https://docs.expo.dev/guides/dom-components/) to work with modern web code directly in a native app via the `use dom` directive. This approach means that you can use our [Conversational AI React SDK](https://elevenlabs.io/docs/conversational-ai/libraries/react) across all platforms using the same code.
Under the hood, Expo uses `react-native-webview` to render the web code in a native component. To allow the webview to access the microphone, you need to make sure to use `npx expo start --tunnel` to start the Expo development server locally so that the webview is served over https.
### Create the conversational AI DOM component
Create a new file in the components folder: `./components/ConvAI.tsx` and add the following code:
```tsx /components/ConvAI.tsx {10-19,33-40,51-65}
'use dom';
import { useConversation } from '@elevenlabs/react';
import { Mic } from 'lucide-react-native';
import { useCallback } from 'react';
import { View, Pressable, StyleSheet } from 'react-native';
import tools from '../utils/tools';
async function requestMicrophonePermission() {
try {
await navigator.mediaDevices.getUserMedia({ audio: true });
return true;
} catch (error) {
console.log(error);
console.error('Microphone permission denied');
return false;
}
}
export default function ConvAiDOMComponent({
platform,
get_battery_level,
change_brightness,
flash_screen,
}: {
dom?: import('expo/dom').DOMProps;
platform: string;
get_battery_level: typeof tools.get_battery_level;
change_brightness: typeof tools.change_brightness;
flash_screen: typeof tools.flash_screen;
}) {
const conversation = useConversation({
onConnect: () => console.log('Connected'),
onDisconnect: () => console.log('Disconnected'),
onMessage: (message) => {
console.log(message);
},
onError: (error) => console.error('Error:', error),
});
const startConversation = useCallback(async () => {
try {
// Request microphone permission
const hasPermission = await requestMicrophonePermission();
if (!hasPermission) {
alert('No permission');
return;
}
// Start the conversation with your agent
await conversation.startSession({
agentId: 'YOUR_AGENT_ID', // Replace with your agent ID
dynamicVariables: {
platform,
},
clientTools: {
get_battery_level,
change_brightness,
flash_screen,
},
});
} catch (error) {
console.error('Failed to start conversation:', error);
}
}, [conversation]);
const stopConversation = useCallback(async () => {
await conversation.endSession();
}, [conversation]);
return (
);
}
const styles = StyleSheet.create({
callButton: {
width: 120,
height: 120,
borderRadius: 60,
backgroundColor: 'rgba(255, 255, 255, 0.1)',
alignItems: 'center',
justifyContent: 'center',
marginBottom: 24,
},
callButtonActive: {
backgroundColor: 'rgba(239, 68, 68, 0.2)',
},
buttonInner: {
width: 80,
height: 80,
borderRadius: 40,
backgroundColor: '#3B82F6',
alignItems: 'center',
justifyContent: 'center',
shadowColor: '#3B82F6',
shadowOffset: {
width: 0,
height: 0,
},
shadowOpacity: 0.5,
shadowRadius: 20,
elevation: 5,
},
buttonInnerActive: {
backgroundColor: '#EF4444',
shadowColor: '#EF4444',
},
buttonIcon: {
transform: [{ translateY: 2 }],
},
});
```
### Native client tools
A big part of building conversational AI agents is allowing the agent access and execute functionality dynamically. This can be done via [client tools](/docs/conversational-ai/customization/tools/client-tools).
In order for DOM components to execute native actions, you can send type-safe native functions to DOM components by passing asynchronous functions as top-level props to the DOM component.
Create a new file to hold your client tools: `./utils/tools.ts` and add the following code:
```ts ./utils/tools.ts
import * as Battery from 'expo-battery';
import * as Brightness from 'expo-brightness';
const get_battery_level = async () => {
const batteryLevel = await Battery.getBatteryLevelAsync();
console.log('batteryLevel', batteryLevel);
if (batteryLevel === -1) {
return 'Error: Device does not support retrieving the battery level.';
}
return batteryLevel;
};
const change_brightness = ({ brightness }: { brightness: number }) => {
console.log('change_brightness', brightness);
Brightness.setSystemBrightnessAsync(brightness);
return brightness;
};
const flash_screen = () => {
Brightness.setSystemBrightnessAsync(1);
setTimeout(() => {
Brightness.setSystemBrightnessAsync(0);
}, 200);
return 'Successfully flashed the screen.';
};
const tools = {
get_battery_level,
change_brightness,
flash_screen,
};
export default tools;
```
### Dynamic variables
In addition to the client tools, we're also injecting the platform (web, iOS, Android) as a [dynamic variable](https://elevenlabs.io/docs/conversational-ai/customization/personalization/dynamic-variables) both into the first message, and the prompt. To do this, we pass the platform as a top-level prop to the DOM component, and then in our DOM component pass it to the `startConversation` configuration:
```tsx ./components/ConvAI.tsx {3,34-36}
// ...
export default function ConvAiDOMComponent({
platform,
get_battery_level,
change_brightness,
flash_screen,
}: {
dom?: import('expo/dom').DOMProps;
platform: string;
get_battery_level: typeof tools.get_battery_level;
change_brightness: typeof tools.change_brightness;
flash_screen: typeof tools.flash_screen;
}) {
const conversation = useConversation({
onConnect: () => console.log('Connected'),
onDisconnect: () => console.log('Disconnected'),
onMessage: (message) => {
console.log(message);
},
onError: (error) => console.error('Error:', error),
});
const startConversation = useCallback(async () => {
try {
// Request microphone permission
const hasPermission = await requestMicrophonePermission();
if (!hasPermission) {
alert('No permission');
return;
}
// Start the conversation with your agent
await conversation.startSession({
agentId: 'YOUR_AGENT_ID', // Replace with your agent ID
dynamicVariables: {
platform,
},
clientTools: {
get_battery_level,
change_brightness,
flash_screen,
},
});
} catch (error) {
console.error('Failed to start conversation:', error);
}
}, [conversation]);
//...
}
// ...
```
### Add the component to your app
Add the component to your app by adding the following code to your `./App.tsx` file:
```tsx ./App.tsx {44-52}
import { LinearGradient } from 'expo-linear-gradient';
import { StatusBar } from 'expo-status-bar';
import { View, Text, StyleSheet, SafeAreaView } from 'react-native';
import { Platform } from 'react-native';
import ConvAiDOMComponent from './components/ConvAI';
import tools from './utils/tools';
export default function App() {
return (
Cross-platform conversational AI agents with ElevenLabs and Expo React Native.
Available Client Tools:
Get battery level
web
ios
android
Change screen brightness
ios
android
Flash screen
ios
android
);
}
const styles = StyleSheet.create({
container: {
flex: 1,
},
topContent: {
paddingTop: 40,
paddingHorizontal: 24,
alignItems: 'center',
},
description: {
fontFamily: 'Inter-Regular',
fontSize: 16,
color: '#E2E8F0',
textAlign: 'center',
maxWidth: 300,
lineHeight: 24,
marginBottom: 24,
},
toolsList: {
backgroundColor: 'rgba(255, 255, 255, 0.05)',
borderRadius: 16,
padding: 20,
width: '100%',
maxWidth: 400,
marginBottom: 24,
},
toolsTitle: {
fontFamily: 'Inter-Bold',
fontSize: 18,
color: '#E2E8F0',
marginBottom: 16,
},
toolItem: {
flexDirection: 'row',
justifyContent: 'space-between',
alignItems: 'center',
paddingVertical: 12,
borderBottomWidth: 1,
borderBottomColor: 'rgba(255, 255, 255, 0.1)',
},
toolText: {
fontFamily: 'Inter-Regular',
fontSize: 14,
color: '#E2E8F0',
},
platformTags: {
flexDirection: 'row',
gap: 8,
},
platformTag: {
fontSize: 12,
color: '#94A3B8',
backgroundColor: 'rgba(148, 163, 184, 0.1)',
paddingHorizontal: 8,
paddingVertical: 4,
borderRadius: 6,
overflow: 'hidden',
fontFamily: 'Inter-Regular',
},
domComponentContainer: {
width: 120,
height: 120,
alignItems: 'center',
justifyContent: 'center',
marginBottom: 24,
},
domComponent: {
width: 120,
height: 120,
},
});
```
## Agent configuration
Go to [elevenlabs.io](https://elevenlabs.io/sign-up) and sign in to your account.
Navigate to [Conversational AI > Agents](https://elevenlabs.io/app/conversational-ai/agents) and
create a new agent from the blank template.
Set the first message and specify the dynamic variable for the platform.
```txt
Hi there, woah, so cool that I'm running on {{platform}}. What can I help you with?
```
Set the system prompt. You can also include dynamic variables here.
```txt
You are a helpful assistant running on {{platform}}. You have access to certain tools that allow you to check the user device battery level and change the display brightness. Use these tools if the user asks about them. Otherwise, just answer the question.
```
Set up the following client tools:
* Name: `get_battery_level`
* Description: Gets the device battery level as decimal point percentage.
* Wait for response: `true`
* Response timeout (seconds): 3
* Name: `change_brightness`
* Description: Changes the brightness of the device screen.
* Wait for response: `true`
* Response timeout (seconds): 3
* Parameters:
* Data Type: `number`
* Identifier: `brightness`
* Required: `true`
* Value Type: `LLM Prompt`
* Description: A number between 0 and 1, inclusive, representing the desired screen brightness.
* Name: `flash_screen`
* Description: Quickly flashes the screen on and off.
* Wait for response: `true`
* Response timeout (seconds): 3
## Run the app
Modifying the brightness is not supported within Expo Go, so you will need to prebuild the app and then run it on a native device.
* Terminal 1:
* Run `npx expo prebuild --clean`
```bash
npx expo prebuild --clean
```
* Run `npx expo start --tunnel` to start the Expo development server over https.
```bash
npx expo start --tunnel
```
* Terminal 2:
* Run `npx expo run:ios --device` to run the app on your iOS device.
```bash
npx expo run:ios --device
```
# Data Collection and Analysis with Conversational AI in Next.js
> Collect and analyse data in post-call webhooks using Conversational AI and Next.js.
## Introduction
In this tutorial you will learn how to build a voice agent that collects information from the user through conversation, then analyses and extracts the data in a structured way and sends it to your application via the post-call webhook.
Find the [example project on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/nextjs-post-call-webhook).
## Requirements
* An ElevenLabs account with an [API key](/app/settings/api-keys).
* Node.js v18 or higher installed on your machine.
## Setup
### Create a new Next.js project
We recommend using our [v0.dev Conversational AI template](https://v0.dev/community/nextjs-5TN93pl3bRS) as the starting point for your application. This template is a production-ready Next.js application with the Conversational AI agent already integrated.
Alternatively, you can clone the [fully integrated project from GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/nextjs-post-call-webhook), or create a new blank Next.js project and follow the steps below to integrate the Conversational AI agent.
### Set up conversational AI
Follow our [Next.js guide](/docs/conversational-ai/guides/quickstarts/next-js) for installation and configuration steps. Then come back here to build in the advanced features.
## Agent configuration
Go to [elevenlabs.io](https://elevenlabs.io/sign-up) and sign in to your account.
Navigate to [Conversational AI > Agents](https://elevenlabs.io/app/conversational-ai/agents) and
create a new agent from the blank template.
Set the first message and specify the dynamic variable for the user's name.
```txt
Hi {{user_name}}, I'm Jess from the ElevenLabs team. I'm here to help you design your very own conversational AI agent! To kick things off, let me know what kind of agent you're looking to create. For example, do you want a support agent, to help your users answer questions, or a sales agent to sell your products, or just a friend to chat with?
```
Set the system prompt. You can also include dynamic variables here.
```txt
You are Jess, a helpful agent helping {{user_name}} to design their very own conversational AI agent. The design process involves the following steps:
"initial": In the first step, collect the information about the kind of agent the user is looking to create. Summarize the user's needs back to them and ask if they are ready to continue to the next step. Only once they confirm proceed to the next step.
"training": Tell the user to create the agent's knowledge base by uploading documents, or submitting URLs to public websites with information that should be available to the agent. Wait patiently without talking to the user. Only when the user confirms that they've provided everything then proceed to the next step.
"voice": Tell the user to describe the voice they want their agent to have. For example: "A professional, strong spoken female voice with a slight British accent." Repeat the description of their voice back to them and ask if they are ready to continue to the next step. Only once they confirm proceed to the next step.
"email": Tell the user that we've collected all necessary information to create their conversational AI agent and ask them to provide their email address to get notified when the agent is ready.
Always call the `set_ui_state` tool when moving between steps!
```
Set up the following client tool to navigate between the steps:
* Name: `set_ui_state`
* Description: Use this client-side tool to navigate between the different UI states.
* Wait for response: `true`
* Response timeout (seconds): 1
* Parameters:
* Data type: string
* Identifier: step
* Required: true
* Value Type: LLM Prompt
* Description: The step to navigate to in the UI. Only use the steps that are defined in the system prompt!
Navigate to the `Voice` tab and set the voice for your agent. You can find a list of recommended voices for Conversational AI in the [Conversational Voice Design docs](/docs/conversational-ai/best-practices/conversational-voice-design#voices).
Navigate to the `Analysis` tab and add a new evaluation criterion.
* Name: `all_data_provided`
* Prompt: Evaluate whether the user provided a description of the agent they are looking to generate as well as a description of the voice the agent should have.
You can use the post-call analysis to extract data from the conversation. In the `Analysis` tab, under `Data Collection`, add the following items:
* Identifier: `voice_description`
* Data type: `String`
* Description: Based on the description of the voice the user wants the agent to have, generate a concise description of the voice including the age, accent, tone, and character if available.
* Identifier: `agent_description`
* Data type: `String`
* Description: Based on the description about the agent the user is looking to design, generate a prompt that can be used to train a model to act as the agent.
[Post-call webhooks](https://elevenlabs.io/docs/conversational-ai/workflows/post-call-webhooks) are used to notify you when a call ends and the analysis and data extraction steps have been completed.
In this example, the post-call webhook performs the following steps:
1. Create a custom voice design based on the `voice_description`.
2. Create a conversational AI agent for the users based on the `agent_description` they provided.
3. Retrieve the knowledge base documents from the conversation state stored in Redis and attach the knowledge base to the agent.
4. Send an email to the user to notify them that their custom conversational AI agent is ready to chat.
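For orientation, the fields this guide relies on arrive in the webhook payload roughly as sketched below. This is an illustrative excerpt only; the property names follow the handler code later in this guide, and the real payload contains many more fields.
```ts
// Illustrative excerpt of a `post_call_transcription` webhook payload (not exhaustive).
const examplePayload = {
  type: 'post_call_transcription',
  data: {
    agent_id: 'your-agent-id',
    conversation_id: 'abc123',
    analysis: {
      evaluation_criteria_results: {
        all_data_provided: { result: 'success' },
      },
      data_collection_results: {
        voice_description: { value: 'A warm, professional female voice with a slight British accent.' },
        agent_description: { value: 'You are a friendly support agent for an online bookstore...' },
      },
    },
  },
};
```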
When running locally, you will need a tool like [ngrok](https://ngrok.com/) to expose your local server to the internet.
```bash
ngrok http 3000
```
Navigate to the [Conversational AI settings](https://elevenlabs.io/app/conversational-ai/settings) and under `Post-Call Webhook` create a new webhook and paste in your ngrok URL: `https://.ngrok-free.app/api/convai-webhook`.
After saving the webhook, you will receive a webhook secret. Make sure to store this secret securely as you will need to set it in your `.env` file later.
## Integrate the advanced features
### Set up a Redis database for storing the conversation state
In this example we're using Redis to store the conversation state. This allows us to retrieve the knowledge base documents from the conversation state after the call ends.
If you're deploying to Vercel, you can configure the [Upstash for Redis](https://vercel.com/marketplace/upstash) integration, or alternatively you can sign up for a free [Upstash account](https://upstash.com/) and create a new database.
### Set up Resend for sending post-call emails
In this example we're using Resend to send the post-call email to the user. To do so you will need to create a free [Resend account](https://resend.com/) and set up a new API key.
### Set the environment variables
In the root of your project, create a `.env` file and add the following variables:
```bash
ELEVENLABS_CONVAI_WEBHOOK_SECRET=
ELEVENLABS_API_KEY=
ELEVENLABS_AGENT_ID=
# Resend
RESEND_API_KEY=
RESEND_FROM_EMAIL=
# Upstash Redis
KV_URL=
KV_REST_API_READ_ONLY_TOKEN=
REDIS_URL=
KV_REST_API_TOKEN=
KV_REST_API_URL=
```
### Configure security and authentication
To secure your conversational AI agent, you need to enable authentication in the `Security` tab of the agent configuration.
Once authentication is enabled, you will need to create a signed URL in a secure server-side environment to initiate a conversation with the agent. In Next.js, you can do this by setting up a new API route.
```tsx ./app/api/signed-url/route.ts
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import { NextResponse } from 'next/server';
export async function GET() {
const agentId = process.env.ELEVENLABS_AGENT_ID;
if (!agentId) {
throw Error('ELEVENLABS_AGENT_ID is not set');
}
try {
const elevenlabs = new ElevenLabsClient();
const response = await elevenlabs.conversationalAi.conversations.getSignedUrl({
agentId,
});
return NextResponse.json({ signedUrl: response.signedUrl });
} catch (error) {
console.error('Error:', error);
return NextResponse.json({ error: 'Failed to get signed URL' }, { status: 500 });
}
}
```
### Start the conversation session
To start the conversation, first, call your API route to get the signed URL, then use the `useConversation` hook to set up the conversation session.
```tsx ./page.tsx {1,4,20-25,31-46}
import { useConversation } from '@elevenlabs/react';
async function getSignedUrl(): Promise<string> {
const response = await fetch('/api/signed-url');
if (!response.ok) {
throw Error('Failed to get signed url');
}
const data = await response.json();
return data.signedUrl;
}
export default function Home() {
// ...
const [currentStep, setCurrentStep] = useState<
'initial' | 'training' | 'voice' | 'email' | 'ready'
>('initial');
const [conversationId, setConversationId] = useState('');
const [userName, setUserName] = useState('');
const conversation = useConversation({
onConnect: () => console.log('Connected'),
onDisconnect: () => console.log('Disconnected'),
onMessage: (message: string) => console.log('Message:', message),
onError: (error: Error) => console.error('Error:', error),
});
const startConversation = useCallback(async () => {
try {
// Request microphone permission
await navigator.mediaDevices.getUserMedia({ audio: true });
// Start the conversation with your agent
const signedUrl = await getSignedUrl();
const convId = await conversation.startSession({
signedUrl,
dynamicVariables: {
user_name: userName,
},
clientTools: {
set_ui_state: ({ step }: { step: string }): string => {
// Allow agent to navigate the UI.
setCurrentStep(step as 'initial' | 'training' | 'voice' | 'email' | 'ready');
return `Navigated to ${step}`;
},
},
});
setConversationId(convId);
console.log('Conversation ID:', convId);
} catch (error) {
console.error('Failed to start conversation:', error);
}
}, [conversation, userName]);
const stopConversation = useCallback(async () => {
await conversation.endSession();
}, [conversation]);
// ...
}
```
### Client tool and dynamic variables
In the agent configuration earlier, you registered the `set_ui_state` client tool to allow the agent to navigate between the different UI states. To put it all together, you need to pass the client tool implementation to the `conversation.startSession` options.
This is also where you can pass in the dynamic variables to the conversation.
```tsx ./page.tsx {3-5,7-11}
const convId = await conversation.startSession({
signedUrl,
dynamicVariables: {
user_name: userName,
},
clientTools: {
set_ui_state: ({ step }: { step: string }): string => {
// Allow agent to navigate the UI.
setCurrentStep(step as 'initial' | 'training' | 'voice' | 'email' | 'ready');
return `Navigated to ${step}`;
},
},
});
```
### Uploading documents to the knowledge base
In the `Training` step, the agent will ask the user to upload documents or submit URLs to public websites with information that should be available to their agent. Here you can utilise the new `after` function of [Next.js 15](https://nextjs.org/docs/app/api-reference/functions/after) to allow uploading of documents in the background.
Create a new `upload` server action to handle the knowledge base creation upon form submission. Once all knowledge base documents have been created, store the conversation ID and the knowledge base IDs in the Redis database.
```tsx ./app/actions/upload.ts {26,32,44,56-60}
'use server';
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import { Redis } from '@upstash/redis';
import { redirect } from 'next/navigation';
import { after } from 'next/server';
// Initialize Redis
const redis = Redis.fromEnv();
const elevenlabs = new ElevenLabsClient({
apiKey: process.env.ELEVENLABS_API_KEY,
});
export async function uploadFormData(formData: FormData) {
const knowledgeBase: Array<{
id: string;
type: 'file' | 'url';
name: string;
}> = [];
const files = formData.getAll('file-upload') as File[];
const email = formData.get('email-input');
const urls = formData.getAll('url-input');
const conversationId = formData.get('conversation-id');
after(async () => {
// Upload files as background job
// Create knowledge base entries
// Loop through files and create knowledge base entries
for (const file of files) {
if (file.size > 0) {
const response = await elevenlabs.conversationalAi.knowledgeBase.documents.createFromFile({
file,
});
if (response.id) {
knowledgeBase.push({
id: response.id,
type: 'file',
name: file.name,
});
}
}
}
// Append all urls
for (const url of urls) {
const response = await elevenlabs.conversationalAi.knowledgeBase.documents.createFromUrl({
url: url as string,
});
if (response.id) {
knowledgeBase.push({
id: response.id,
type: 'url',
name: `url for ${conversationId}`,
});
}
}
// Store knowledge base IDs and conversation ID in database.
const redisRes = await redis.set(
conversationId as string,
JSON.stringify({ email, knowledgeBase })
);
console.log({ redisRes });
});
redirect('/success');
}
```
## Handling the post-call webhook
The [post-call webhook](/docs/conversational-ai/workflows/post-call-webhooks) is triggered when a call ends and the analysis and data extraction steps have been completed.
There are a few steps happening here:
1. Verify the webhook secret and construct the webhook payload.
2. Create a custom voice design based on the `voice_description`.
3. Create a conversational AI agent for the users based on the `agent_description` they provided.
4. Retrieve the knowledge base documents from the conversation state stored in Redis and attach the knowledge base to the agent.
5. Send an email to the user to notify them that their custom conversational AI agent is ready to chat.
```ts ./app/api/convai-webhook/route.ts
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import { Redis } from '@upstash/redis';
import crypto from 'crypto';
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
import { Resend } from 'resend';
import { EmailTemplate } from '@/components/email/post-call-webhook-email';
// Initialize Redis
const redis = Redis.fromEnv();
// Initialize Resend
const resend = new Resend(process.env.RESEND_API_KEY);
const elevenlabs = new ElevenLabsClient({
apiKey: process.env.ELEVENLABS_API_KEY,
});
export async function GET() {
return NextResponse.json({ status: 'webhook listening' }, { status: 200 });
}
export async function POST(req: NextRequest) {
const secret = process.env.ELEVENLABS_CONVAI_WEBHOOK_SECRET; // Add this to your env variables
const { event, error } = await constructWebhookEvent(req, secret);
if (error) {
return NextResponse.json({ error: error }, { status: 401 });
}
if (event.type === 'post_call_transcription') {
const { conversation_id, analysis, agent_id } = event.data;
if (
agent_id === process.env.ELEVENLABS_AGENT_ID &&
analysis.evaluation_criteria_results.all_data_provided?.result === 'success' &&
analysis.data_collection_results.voice_description?.value
) {
try {
// Design the voice
const voicePreview = await elevenlabs.textToVoice.createPreviews({
voiceDescription: analysis.data_collection_results.voice_description.value,
text: 'The night air carried whispers of betrayal, thick as London fog. I adjusted my cufflinks - after all, even spies must maintain appearances, especially when the game is afoot.',
});
const voice = await elevenlabs.textToVoice.createVoiceFromPreview({
voiceName: `voice-${conversation_id}`,
voiceDescription: `Voice for ${conversation_id}`,
generatedVoiceId: voicePreview.previews[0].generatedVoiceId,
});
// Get the knowledge base from redis
const redisRes = await getRedisDataWithRetry(conversation_id);
if (!redisRes) throw new Error('Conversation data not found!');
// Handle agent creation
const agent = await elevenlabs.conversationalAi.agents.create({
name: `Agent for ${conversation_id}`,
conversationConfig: {
tts: { voiceId: voice.voiceId },
agent: {
prompt: {
prompt:
analysis.data_collection_results.agent_description?.value ??
'You are a helpful assistant.',
knowledgeBase: redisRes.knowledgeBase,
},
firstMessage: 'Hello, how can I help you today?',
},
},
});
console.log('Agent created', { agent: agent.agentId });
// Send email to user
console.log('Sending email to', redisRes.email);
await resend.emails.send({
from: process.env.RESEND_FROM_EMAIL!,
to: redisRes.email,
subject: 'Your Conversational AI agent is ready to chat!',
react: EmailTemplate({ agentId: agent.agentId }),
});
} catch (error) {
console.error(error);
return NextResponse.json({ error }, { status: 500 });
}
}
}
return NextResponse.json({ received: true }, { status: 200 });
}
const constructWebhookEvent = async (req: NextRequest, secret?: string) => {
const body = await req.text();
const signatureHeader = req.headers.get('ElevenLabs-Signature');
return await elevenlabs.webhooks.constructEvent(body, signatureHeader, secret);
};
async function getRedisDataWithRetry(
conversationId: string,
maxRetries = 5
): Promise<{
email: string;
knowledgeBase: Array<{
id: string;
type: 'file' | 'url';
name: string;
}>;
} | null> {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const data = await redis.get(conversationId);
return data as any;
} catch (error) {
if (attempt === maxRetries) throw error;
console.log(`Redis get attempt ${attempt} failed, retrying...`);
await new Promise((resolve) => setTimeout(resolve, 1000));
}
}
return null;
}
```
Let's go through each step in detail.
### Verify the webhook secret and construct the webhook payload
When the webhook request is received, we first verify the webhook secret and construct the webhook payload.
```ts ./app/api/convai-webhook/route.ts
// ...
export async function POST(req: NextRequest) {
const secret = process.env.ELEVENLABS_CONVAI_WEBHOOK_SECRET;
const { event, error } = await constructWebhookEvent(req, secret);
// ...
}
// ...
const constructWebhookEvent = async (req: NextRequest, secret?: string) => {
const body = await req.text();
const signatureHeader = req.headers.get('ElevenLabs-Signature');
return await elevenlabs.webhooks.constructEvent(body, signatureHeader, secret);
};
```
### Create a custom voice design based on the `voice_description`
Using the `voice_description` from the webhook payload, we create a custom voice design.
```ts ./app/api/convai-webhook/route.ts {5}
// ...
// Design the voice
const voicePreview = await elevenlabs.textToVoice.createPreviews({
voiceDescription: analysis.data_collection_results.voice_description.value,
text: 'The night air carried whispers of betrayal, thick as London fog. I adjusted my cufflinks - after all, even spies must maintain appearances, especially when the game is afoot.',
});
const voice = await elevenlabs.textToVoice.createVoiceFromPreview({
voiceName: `voice-${conversation_id}`,
voiceDescription: `Voice for ${conversation_id}`,
generatedVoiceId: voicePreview.previews[0].generatedVoiceId,
});
// ...
```
### Retrieve the knowledge base documents from the conversation state stored in Redis
Uploading the documents might take longer than the webhook's data analysis, so we'll need to poll the conversation state in Redis until the documents have been uploaded.
```ts ./app/api/convai-webhook/route.ts
// ...
// Get the knowledge base from redis
const redisRes = await getRedisDataWithRetry(conversation_id);
if (!redisRes) throw new Error('Conversation data not found!');
// ...
async function getRedisDataWithRetry(
conversationId: string,
maxRetries = 5
): Promise<{
email: string;
knowledgeBase: Array<{
id: string;
type: 'file' | 'url';
name: string;
}>;
} | null> {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const data = await redis.get(conversationId);
return data as any;
} catch (error) {
if (attempt === maxRetries) throw error;
console.log(`Redis get attempt ${attempt} failed, retrying...`);
await new Promise((resolve) => setTimeout(resolve, 1000));
}
}
return null;
}
```
### Create a conversational AI agent for the users based on the `agent_description` they provided
Create the conversational AI agent for the user based on the `agent_description` they provided and attach the newly created voice design and knowledge base to the agent.
```ts ./app/api/convai-webhook/route.ts {7,11}
// ...
// Handle agent creation
const agent = await elevenlabs.conversationalAi.agents.create({
name: `Agent for ${conversation_id}`,
conversationConfig: {
tts: { voiceId: voice.voiceId },
agent: {
prompt: {
prompt:
analysis.data_collection_results.agent_description?.value ??
'You are a helpful assistant.',
knowledgeBase: redisRes.knowledgeBase,
},
firstMessage: 'Hello, how can I help you today?',
},
},
});
console.log('Agent created', { agent: agent.agentId });
// ...
```
### Send an email to the user to notify them that their custom conversational AI agent is ready to chat
Once the agent is created, you can send an email to the user to notify them that their custom conversational AI agent is ready to chat.
```ts ./app/api/convai-webhook/route.ts
import { Resend } from 'resend';
import { EmailTemplate } from '@/components/email/post-call-webhook-email';
// ...
// Send email to user
console.log('Sending email to', redisRes.email);
await resend.emails.send({
from: process.env.RESEND_FROM_EMAIL!,
to: redisRes.email,
subject: 'Your Conversational AI agent is ready to chat!',
react: EmailTemplate({ agentId: agent.agentId }),
});
// ...
```
You can use [new.email](https://new.email/), a handy tool from the Resend team, to vibe design your email templates. Once you're happy with the template, create a new component and add in the agent ID as a prop.
```tsx ./components/email/post-call-webhook-email.tsx {14}
import {
Body,
Button,
Container,
Head,
Html,
Section,
Text,
Tailwind,
} from '@react-email/components';
import * as React from 'react';
// Simplified skeleton; adjust the layout and styling to match your design from new.email
const EmailTemplate = (props: any) => {
  const { agentId } = props;
  return (
    <Html>
      <Head />
      <Tailwind>
        <Body className="bg-slate-100 font-sans">
          <Container className="mx-auto my-8 rounded-lg bg-white p-8">
            {/* Top Section */}
            <Text className="text-xl font-bold">Your Conversational AI agent is ready to chat!</Text>
            {/* Content Area with Icon */}
            {/* Circle Icon with Checkmark */}
            {/* Descriptive Text */}
            <Text>Your Conversational AI agent is ready to chat!</Text>
            {/* Call to Action Button */}
            <Section>
              {/* Replace the href with the URL where users can chat with their agent */}
              <Button className="rounded-md bg-black px-6 py-3 text-white" href={`https://example.com/agents/${agentId}`}>
                Chat with your agent
              </Button>
            </Section>
            {/* Footer */}
          </Container>
        </Body>
      </Tailwind>
    </Html>
  );
};
export { EmailTemplate };
```
## Run the app
To run the app locally end-to-end, you will need to first run the Next.js development server, and then in a separate terminal run the ngrok tunnel to expose the webhook handler to the internet.
* Terminal 1:
* Run `pnpm dev` to start the Next.js development server.
```bash
pnpm dev
```
* Terminal 2:
* Run `ngrok http 3000` to expose the webhook handler to the internet.
```bash
ngrok http 3000
```
Now open [http://localhost:3000](http://localhost:3000) and start designing your custom conversational AI agent, with your voice!
## Conclusion
[ElevenLabs Conversational AI](https://elevenlabs.io/conversational-ai) is a powerful platform for building advanced voice agent use cases, complete with data collection and analysis.
# Multi-Context Websocket
> Learn how to build real time voice agents using our multi-context WebSocket API for dynamic and responsive interactions.
Orchestrating voice agents using this multi-context WebSocket API is a complex task recommended
for advanced developers. For a more managed solution, consider exploring our [Conversational AI
product](/docs/conversational-ai/overview), which simplifies many of these challenges.
## Overview
Building responsive voice agents requires the ability to manage audio streams dynamically, handle interruptions gracefully, and maintain natural-sounding speech across conversational turns. Our multi-context WebSocket API for Text to Speech (TTS) is specifically designed for these scenarios.
This API extends our [standard TTS WebSocket functionality](/docs/websockets) by introducing the concept of "contexts." Each context operates as an independent audio generation stream within a single WebSocket connection. This allows you to:
* Manage multiple lines of speech concurrently (e.g., agent speaking while preparing a response to a user interruption).
* Seamlessly handle user barge-ins by closing an existing speech context and initiating a new one.
* Maintain prosodic consistency for utterances within the same logical context.
* Optimize resource usage by selectively closing contexts that are no longer needed.
The multi-context WebSocket API is optimized for voice applications and is not intended for
generating multiple unrelated audio streams simultaneously. Each connection is limited to 5
concurrent contexts to reflect this.
This guide will walk you through connecting to the multi-context WebSocket, managing contexts, and applying best practices for building engaging voice agents.
### Best practices
These best practices are essential for building responsive, efficient voice agents with our
multi-context WebSocket API.
Establish one WebSocket connection for each end-user session. This reduces overhead and latency
compared to creating multiple connections. Within this single connection, you can manage
multiple contexts for different parts of the conversation.
When generating long responses, stream the text in smaller chunks and use the `flush: true` flag
at the end of complete sentences. This improves the quality of the generated audio and keeps the
agent responsive (see the sketch after these best practices).
Stream text into one context until an interruption occurs, then create a new context and close
the existing one. This approach ensures smooth transitions when the conversation flow changes.
Close unused contexts promptly. The server can maintain up to 5 concurrent contexts per
connection, but you should close contexts when they are no longer needed.
By default, contexts time out and are closed automatically after 20 seconds of inactivity. The
inactivity timeout is a WebSocket-level parameter that applies to all contexts and can be
increased up to 180 seconds if needed. Send an empty text message on a context to reset its timeout clock.
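As a minimal sketch of the points above, here is what streaming text into a context and flushing at the end of a sentence can look like. It mirrors the message shapes and helper functions used in the complete example at the end of this guide; the voice ID is a placeholder.
```typescript
import WebSocket from 'ws';

const VOICE_ID = 'your_voice_id';
const MODEL_ID = 'eleven_flash_v2_5';
const websocket = new WebSocket(
  `wss://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}/multi-stream-input?model_id=${MODEL_ID}`,
  { headers: { 'xi-api-key': process.env.ELEVENLABS_API_KEY ?? '' } }
);

// Send a chunk of text to be synthesized in the given context.
function sendTextInContext(ws: WebSocket, text: string, contextId: string) {
  ws.send(JSON.stringify({ text, context_id: contextId }));
}

// Force generation of any text still buffered in the context,
// for example at the end of a complete sentence or conversational turn.
function flushContext(ws: WebSocket, contextId: string) {
  ws.send(JSON.stringify({ context_id: contextId, flush: true }));
}

websocket.on('open', () => {
  // Stream the agent's reply in small chunks into a single context...
  sendTextInContext(websocket, 'Thanks for asking! ', 'turn_1');
  sendTextInContext(websocket, 'The store opens at nine tomorrow morning.', 'turn_1');
  // ...then flush once the sentence is complete to improve responsiveness.
  flushContext(websocket, 'turn_1');
});
```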
### Handling interruptions
When a user interrupts your agent, you should [close the current context](/docs/api-reference/multi-context-text-to-speech/v-1-text-to-speech-voice-id-multi-stream-input#send.Close-Context) and [create a new one](/docs/api-reference/multi-context-text-to-speech/v-1-text-to-speech-voice-id-multi-stream-input#send.Initialise-Context):
```python
async def handle_interruption(websocket, old_context_id, new_context_id, new_response):
# Close the existing context that was interrupted
await websocket.send(json.dumps({
"context_id": old_context_id,
"close_context": True
}))
print(f"Closed interrupted context '{old_context_id}'")
# Create a new context for the new response
await send_text_in_context(websocket, new_response, new_context_id)
```
```javascript
function handleInterruption(websocket: WebSocket, oldContextId: string, newContextId: string, newResponse: string) {
// Close the existing context that was interrupted
websocket.send(JSON.stringify({
context_id: oldContextId,
close_context: true
}));
console.log(`Closed interrupted context '${oldContextId}'`);
// Create a new context for the new response
sendTextInContext(websocket, newResponse, newContextId);
}
```
### Keeping a context alive
Contexts automatically timeout after [a default of 20 seconds of inactivity](/docs/api-reference/multi-context-text-to-speech/v-1-text-to-speech-voice-id-multi-stream-input#request.query.inactivity_timeout). If you need to keep a context alive without generating text (for example, during a processing delay), you can send an empty text message to reset the timeout clock.
```python
async def keep_context_alive(websocket, context_id):
await websocket.send(json.dumps({
"context_id": context_id,
"text": ""
}))
```
```javascript
function keepContextAlive(websocket: WebSocket, contextId: string) {
  // Send an empty text message to reset the context's inactivity timeout
  websocket.send(JSON.stringify({
    context_id: contextId,
    text: ""
  }));
}
```
### Closing the WebSocket connection
When your conversation ends, you can clean up all contexts by [closing the socket](/docs/api-reference/multi-context-text-to-speech/v-1-text-to-speech-voice-id-multi-stream-input#send.Close-Socket):
```python
async def end_conversation(websocket):
# This will close all contexts and close the connection
await websocket.send(json.dumps({
"close_socket": True
}))
print("Ending conversation and closing WebSocket")`
```
```javascript
function endConversation(websocket: WebSocket) {
// This will close all contexts and close the connection
websocket.send(JSON.stringify({
close_socket: true
}));
console.log("Ending conversation and closing WebSocket");
}
```
## Complete conversational agent example
### Requirements
* An ElevenLabs account with an API key (learn how to [find your API key](/docs/api-reference/authentication)).
* Python or Node.js (or another JavaScript runtime) installed on your machine.
* Familiarity with WebSocket communication. We recommend reading our [guide on standard WebSocket streaming](/docs/websockets) for foundational concepts.
### Setup
Install the necessary dependencies for your chosen language:
```python
pip install python-dotenv websockets
```
```javascript
npm install dotenv ws
# For TypeScript, you might also want type definitions:
npm install @types/dotenv @types/ws --save-dev
```
Create a `.env` file in your project directory to store your API key:
```bash .env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
### Example voice agent
This code is provided as an example and is not intended for production usage
```python maxLines=100
import os
import json
import asyncio
import websockets
from dotenv import load_dotenv
load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
VOICE_ID = "your_voice_id"
MODEL_ID = "eleven_flash_v2_5"
WEBSOCKET_URI = f"wss://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/multi-stream-input?model_id={MODEL_ID}"
async def send_text_in_context(websocket, text, context_id, voice_settings=None):
"""Send text to be synthesized in the specified context."""
message = {
"text": text,
"context_id": context_id,
}
# Only include voice_settings for the first message in a context
if voice_settings:
message["voice_settings"] = voice_settings
await websocket.send(json.dumps(message))
async def continue_context(websocket, text, context_id):
"""Add more text to an existing context."""
await websocket.send(json.dumps({
"text": text,
"context_id": context_id
}))
async def flush_context(websocket, context_id):
"""Force generation of any buffered audio in the context."""
await websocket.send(json.dumps({
"context_id": context_id,
"flush": True
}))
async def handle_interruption(websocket, old_context_id, new_context_id, new_response):
"""Handle user interruption by closing current context and starting a new one."""
# Close the existing context that was interrupted
await websocket.send(json.dumps({
"context_id": old_context_id,
"close_context": True
}))
# Create a new context for the new response
await send_text_in_context(websocket, new_response, new_context_id)
async def end_conversation(websocket):
"""End the conversation and close the WebSocket connection."""
await websocket.send(json.dumps({
"close_socket": True
}))
async def receive_messages(websocket):
"""Process incoming WebSocket messages."""
context_audio = {}
try:
async for message in websocket:
data = json.loads(message)
context_id = data.get("contextId", "default")
if data.get("audio"):
print(f"Received audio for context '{context_id}'")
if data.get("is_final"):
print(f"Context '{context_id}' completed")
except (websockets.exceptions.ConnectionClosed, asyncio.CancelledError):
print("Message receiving stopped")
async def conversation_agent_demo():
"""Run a complete conversational agent demo."""
# Connect with API key in headers
async with websockets.connect(
WEBSOCKET_URI,
max_size=16 * 1024 * 1024,
additional_headers={"xi-api-key": ELEVENLABS_API_KEY}
) as websocket:
# Start receiving messages in background
receive_task = asyncio.create_task(receive_messages(websocket))
# Initial agent response
await send_text_in_context(
websocket,
"Hello! I'm your virtual assistant. I can help you with a wide range of topics. What would you like to know about today?",
"greeting"
)
# Wait a bit (simulating user listening)
await asyncio.sleep(2)
# Simulate user interruption
print("USER INTERRUPTS: 'Can you tell me about the weather?'")
# Handle the interruption by closing current context and starting new one
await handle_interruption(
websocket,
"greeting",
"weather_response",
"I'd be happy to tell you about the weather. Currently in your area, it's 72 degrees and sunny with a slight chance of rain later this afternoon."
)
# Add more to the weather context
await continue_context(
websocket,
" If you're planning to go outside, you might want to bring a light jacket just in case.",
"weather_response"
)
# Flush at the end of this turn to ensure all audio is generated
await flush_context(websocket, "weather_response")
# Wait a bit (simulating user listening)
await asyncio.sleep(3)
# Simulate user asking another question
print("USER: 'What about tomorrow?'")
# Create a new context for this response
await send_text_in_context(
websocket,
"Tomorrow's forecast shows temperatures around 75 degrees with partly cloudy skies. It should be a beautiful day overall!",
"tomorrow_weather"
)
# Flush and close this context
await flush_context(websocket, "tomorrow_weather")
await websocket.send(json.dumps({
"context_id": "tomorrow_weather",
"close_context": True
}))
# End the conversation
await asyncio.sleep(2)
await end_conversation(websocket)
# Cancel the receive task
receive_task.cancel()
try:
await receive_task
except asyncio.CancelledError:
pass
if __name__ == "__main__":
asyncio.run(conversation_agent_demo())
```
```javascript maxLines=100
// Import required modules
import dotenv from 'dotenv';
import fs from 'fs';
import WebSocket from 'ws';
// Load environment variables
dotenv.config();
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
const VOICE_ID = 'your_voice_id';
const MODEL_ID = 'eleven_flash_v2_5';
const WEBSOCKET_URI = `wss://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}/multi-stream-input?model_id=${MODEL_ID}`;
// Function to send text in a specific context
function sendTextInContext(websocket, text, contextId, voiceSettings = null) {
const message = {
text: text,
context_id: contextId,
};
// Only include voice_settings for the first message in a context
if (voiceSettings) {
message.voice_settings = voiceSettings;
}
websocket.send(JSON.stringify(message));
}
// Function to continue an existing context with more text
function continueContext(websocket, text, contextId) {
websocket.send(
JSON.stringify({
text: text,
context_id: contextId,
})
);
}
// Function to flush a context, forcing generation of buffered audio
function flushContext(websocket, contextId) {
websocket.send(
JSON.stringify({
context_id: contextId,
flush: true,
})
);
}
// Function to handle user interruption
function handleInterruption(websocket, oldContextId, newContextId, newResponse) {
// Close the existing context that was interrupted
websocket.send(
JSON.stringify({
context_id: oldContextId,
close_context: true,
})
);
// Create a new context for the new response
sendTextInContext(websocket, newResponse, newContextId);
}
// Function to end the conversation and close the connection
function endConversation(websocket) {
websocket.send(
JSON.stringify({
close_socket: true,
})
);
}
// Function to run the conversation agent demo
async function conversationAgentDemo() {
// Connect to WebSocket with API key in headers
const websocket = new WebSocket(WEBSOCKET_URI, {
headers: {
'xi-api-key': ELEVENLABS_API_KEY,
},
maxPayload: 16 * 1024 * 1024,
});
// Set up event handlers
websocket.on('open', () => {
// Initial agent response
sendTextInContext(
websocket,
"Hello! I'm your virtual assistant. I can help you with a wide range of topics. What would you like to know about today?",
'greeting'
);
// Simulate wait time (user listening)
setTimeout(() => {
// Simulate user interruption
console.log("USER INTERRUPTS: 'Can you tell me about the weather?'");
// Handle the interruption
handleInterruption(
websocket,
'greeting',
'weather_response',
"I'd be happy to tell you about the weather. Currently in your area, it's 72 degrees and sunny with a slight chance of rain later this afternoon."
);
// Add more to the weather context
setTimeout(() => {
continueContext(
websocket,
" If you're planning to go outside, you might want to bring a light jacket just in case.",
'weather_response'
);
// Flush at the end of this turn
flushContext(websocket, 'weather_response');
// Simulate wait time (user listening)
setTimeout(() => {
// Simulate user asking another question
console.log("USER: 'What about tomorrow?'");
// Create a new context for this response
sendTextInContext(
websocket,
"Tomorrow's forecast shows temperatures around 75 degrees with partly cloudy skies. It should be a beautiful day overall!",
'tomorrow_weather'
);
// Flush and close this context
flushContext(websocket, 'tomorrow_weather');
websocket.send(
JSON.stringify({
context_id: 'tomorrow_weather',
close_context: true,
})
);
// End the conversation
setTimeout(() => {
endConversation(websocket);
}, 2000);
}, 3000);
}, 500);
}, 2000);
});
// Handle incoming messages
websocket.on('message', (message) => {
try {
const data = JSON.parse(message);
const contextId = data.contextId || 'default';
if (data.audio) {
// Process the audio chunk, e.g. decode the base64 string and play or buffer it
}
if (data.is_final) {
console.log(`Context '${contextId}' completed`);
}
} catch (error) {
console.error('Error parsing message:', error);
}
});
// Handle WebSocket closure
websocket.on('close', () => {
console.log('WebSocket connection closed');
});
// Handle WebSocket errors
websocket.on('error', (error) => {
console.error('WebSocket error:', error);
});
}
// Run the demo
conversationAgentDemo();
```
# Libraries & SDKs
> Explore language-specific libraries for using the ElevenLabs API.
## Official REST API libraries
ElevenLabs provides officially supported libraries that are updated with the latest features available in the [REST API](/docs/api-reference/introduction).
| Language | GitHub | Package Manager |
| ----------------- | ---------------------------------------------------------------- | ----------------------------------------------- |
| Python | [GitHub README](https://github.com/elevenlabs/elevenlabs-python) | [PyPI](https://pypi.org/project/elevenlabs/) |
| Javascript (Node) | [GitHub README](https://github.com/elevenlabs/elevenlabs-js)     | [npm](https://www.npmjs.com/package/@elevenlabs/elevenlabs-js) |
Test and explore all ElevenLabs API endpoints using our official [Postman collection](https://www.postman.com/elevenlabs/elevenlabs/collection/7i9rytu/elevenlabs-api-documentation?action=share\&creator=39903018).
## Conversational AI libraries
These libraries are designed for use with ElevenLabs [Conversational AI](/docs/conversational-ai/overview).
| Language | Documentation | Package Manager |
| ---------- | ----------------------------------------------------- | ------------------------------------------------------- |
| Javascript | [Docs](/docs/conversational-ai/libraries/java-script) | [npm](https://www.npmjs.com/package/@elevenlabs/client) |
| React | [Docs](/docs/conversational-ai/libraries/react) | [npm](https://www.npmjs.com/package/@elevenlabs/react) |
| Python | [Docs](/docs/conversational-ai/libraries/python) | [PyPI](https://pypi.org/project/elevenlabs/) |
| Swift | [Docs](/docs/conversational-ai/libraries/swift) | [Github](https://github.com/elevenlabs/ElevenLabsSwift) |
## Third-party libraries
These libraries are not officially supported by ElevenLabs, but are community-maintained.
| Library | Documentation | Package Manager |
| ------------- | ---------------------------------------------------- | --------------------------------------- |
| Vercel AI SDK | [Docs](/docs/cookbooks/speech-to-text/vercel-ai-sdk) | [npm](https://www.npmjs.com/package/ai) |
# Generate audio in real-time
> Learn how to generate audio in real-time via a WebSocket connection.
WebSocket streaming is a method of sending and receiving data over a single, long-lived connection. This method is useful for real-time applications where you need to stream audio data as it becomes available.
If you want to quickly test out the latency (time to first byte) of a WebSocket connection to the ElevenLabs text-to-speech API, you can install `elevenlabs-latency` via `npm` and follow the instructions [here](https://www.npmjs.com/package/elevenlabs-latency?activeTab=readme).
WebSockets can be used with the Text to Speech and Conversational AI products. This guide will
demonstrate how to use them with the Text to Speech API.
## Requirements
* An ElevenLabs account with an API key (here’s how to [find your API key](/docs/api-reference/authentication)).
* Python or Node.js (or another JavaScript runtime) installed on your machine
## Setup
Install required dependencies:
```python Python
pip install python-dotenv
pip install websockets
```
```typescript TypeScript
npm install dotenv
npm install @types/dotenv --save-dev
npm install ws
```
Next, create a `.env` file in your project directory and add your API key:
```bash .env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
## Initiate the websocket connection
After choosing a voice from the Voice Library and the text to speech model you wish to use, initiate a WebSocket connection to the text to speech API.
```python text-to-speech-websocket.py
import os
import json
from dotenv import load_dotenv
import websockets
# Load the API key from the .env file
load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
voice_id = 'Xb7hH8MSUJpSbSDYk0k2'
# For use cases where latency is important, we recommend using the 'eleven_flash_v2_5' model.
model_id = 'eleven_flash_v2_5'
async def text_to_speech_ws_streaming(voice_id, model_id):
uri = f"wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?model_id={model_id}"
async with websockets.connect(uri) as websocket:
...
```
```typescript text-to-speech-websocket.ts
import * as dotenv from 'dotenv';
import * as fs from 'node:fs';
import WebSocket from 'ws';
// Load the API key from the .env file
dotenv.config();
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
const voiceId = 'Xb7hH8MSUJpSbSDYk0k2';
// For use cases where latency is important, we recommend using the 'eleven_flash_v2_5' model.
const model = 'eleven_flash_v2_5';
const uri = `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=${model}`;
const websocket = new WebSocket(uri, {
headers: { 'xi-api-key': `${ELEVENLABS_API_KEY}` },
});
// Create a directory for saving the audio
const outputDir = './output';
try {
fs.accessSync(outputDir, fs.constants.R_OK | fs.constants.W_OK);
} catch (err) {
fs.mkdirSync(outputDir);
}
// Create a write stream for saving the audio into mp3
const writeStream = fs.createWriteStream(outputDir + '/test.mp3', {
flags: 'a',
});
```
## Send the input text
Once the WebSocket connection is open, set up voice settings first. Next, send the text message to the API.
```python text-to-speech-websocket.py
async def text_to_speech_ws_streaming(voice_id, model_id):
async with websockets.connect(uri) as websocket:
await websocket.send(json.dumps({
"text": " ",
"voice_settings": {"stability": 0.5, "similarity_boost": 0.8, "use_speaker_boost": False},
"generation_config": {
"chunk_length_schedule": [120, 160, 250, 290]
},
"xi_api_key": ELEVENLABS_API_KEY,
}))
text = "The twilight sun cast its warm golden hues upon the vast rolling fields, saturating the landscape with an ethereal glow. Silently, the meandering brook continued its ceaseless journey, whispering secrets only the trees seemed privy to."
await websocket.send(json.dumps({"text": text}))
# Send empty string to indicate the end of the text sequence which will close the WebSocket connection
await websocket.send(json.dumps({"text": ""}))
```
```typescript text-to-speech-websocket.ts
const text =
'The twilight sun cast its warm golden hues upon the vast rolling fields, saturating the landscape with an ethereal glow. Silently, the meandering brook continued its ceaseless journey, whispering secrets only the trees seemed privy to.';
websocket.on('open', async () => {
websocket.send(
JSON.stringify({
text: ' ',
voice_settings: {
stability: 0.5,
similarity_boost: 0.8,
use_speaker_boost: false,
},
generation_config: { chunk_length_schedule: [120, 160, 250, 290] },
})
);
websocket.send(JSON.stringify({ text: text }));
// Send empty string to indicate the end of the text sequence which will close the websocket connection
websocket.send(JSON.stringify({ text: '' }));
});
```
## Save the audio to file
Read the incoming message from the WebSocket connection and write the audio chunks to a local file.
```python text-to-speech-websocket.py
import asyncio
import base64
async def write_to_local(audio_stream):
"""Write the audio encoded in base64 string to a local mp3 file."""
with open(f'./output/test.mp3', "wb") as f:
async for chunk in audio_stream:
if chunk:
f.write(chunk)
async def listen(websocket):
"""Listen to the websocket for audio data and stream it."""
while True:
try:
message = await websocket.recv()
data = json.loads(message)
if data.get("audio"):
yield base64.b64decode(data["audio"])
elif data.get('isFinal'):
break
except websockets.exceptions.ConnectionClosed:
print("Connection closed")
break
async def text_to_speech_ws_streaming(voice_id, model_id):
async with websockets.connect(uri) as websocket:
...
# Add listen task to submit the audio chunks to the write_to_local function
listen_task = asyncio.create_task(write_to_local(listen(websocket)))
await listen_task
asyncio.run(text_to_speech_ws_streaming(voice_id, model_id))
```
```typescript text-to-speech-websocket.ts
// Helper function to write the audio encoded in base64 string into local file
function writeToLocal(base64str: any, writeStream: fs.WriteStream) {
const audioBuffer: Buffer = Buffer.from(base64str, 'base64');
writeStream.write(audioBuffer, (err) => {
if (err) {
console.error('Error writing to file:', err);
}
});
}
// Listen to the incoming message from the websocket connection
websocket.on('message', function incoming(event) {
const data = JSON.parse(event.toString());
if (data['audio']) {
writeToLocal(data['audio'], writeStream);
}
});
// Close the writeStream when the websocket connection closes
websocket.on('close', () => {
writeStream.end();
});
```
## Run the script
You can run the script by executing the following command in your terminal. An mp3 audio file will be saved in the `output` directory.
```python Python
python text-to-speech-websocket.py
```
```typescript TypeScript
npx tsx text-to-speech-websocket.ts
```
## Advanced configuration
The use of WebSockets comes with some advanced settings that you can use to fine-tune your real-time audio generation.
### Buffering
When generating real-time audio, two important concepts should be taken into account: Time To First Byte (TTFB) and buffering. To produce high-quality audio and infer the correct context, the model requires a certain threshold of input text. The more text that is sent over a WebSocket connection, the better the audio quality. If the threshold is not met, the model adds the text to a buffer and generates audio once the buffer is full.
In terms of latency, TTFB is the time it takes for the first byte of audio to be sent to the client. This is important because it affects the perceived latency of the audio. As such, you might want to control the buffer size to balance between quality and latency.
To manage this, you can use the `chunk_length_schedule` parameter when either initializing the WebSocket connection or when sending text. This parameter is an array of integers that represent the number of characters that will be sent to the model before generating audio. For example, if you set `chunk_length_schedule` to `[120, 160, 250, 290]`, the model will generate audio after 120, 160, 250, and 290 characters have been sent, respectively.
Here's an example of how this works with the default settings for `chunk_length_schedule` (`[120, 160, 250, 290]`): audio is only generated after the second message is sent to the server, because the first message is below the first threshold of 120 characters, while the second message brings the cumulative total above it. The third message is above the next threshold of 160 characters, so audio is generated immediately and returned to the client.
You can specify a custom value for `chunk_length_schedule` when initializing the WebSocket connection or when sending text.
```python
await websocket.send(json.dumps({
"text": text,
"generation_config": {
# Generate audio after 50, 120, 160, and 290 characters have been sent
"chunk_length_schedule": [50, 120, 160, 290]
},
"xi_api_key": ELEVENLABS_API_KEY,
}))
```
```typescript
websocket.send(
JSON.stringify({
text: text,
// Generate audio after 50, 120, 160, and 290 characters have been sent
generation_config: { chunk_length_schedule: [50, 120, 160, 290] },
xi_api_key: ELEVENLABS_API_KEY,
})
);
```
If you want to force the immediate return of the audio, you can use `flush: true` to clear out the buffer and force generation of any buffered text. This can be useful, for example, when you have reached the end of a document and want to generate audio for the final section.
This can be specified on a per-message basis by setting `flush: true` in the message.
```python
await websocket.send(json.dumps({"text": "Generate this audio immediately.", "flush": True}))
```
```typescript
websocket.send(JSON.stringify({ text: 'Generate this audio immediately.', flush: true }));
```
In addition, closing the WebSocket will automatically generate any remaining buffered text.
### Voice settings
When initializing the WebSocket connections, you can specify the voice settings for the subsequent generations. This allows you to control the speed, stability, and other voice characteristics of the generated audio.
```python
await websocket.send(json.dumps({
"text": text,
"voice_settings": {"stability": 0.5, "similarity_boost": 0.8, "use_speaker_boost": False},
}))
```
```typescript
websocket.send(
JSON.stringify({
text: text,
voice_settings: { stability: 0.5, similarity_boost: 0.8, use_speaker_boost: false },
})
);
```
This can be overridden on a per-message basis by specifying a different `voice_settings` in the message.
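For example, continuing the `text-to-speech-websocket.ts` example above, a single message can carry its own settings (the values here are illustrative):
```typescript
websocket.send(
  JSON.stringify({
    text: 'This sentence is generated with its own, more stable settings. ',
    voice_settings: { stability: 0.8, similarity_boost: 0.8, use_speaker_boost: false },
  })
);
```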
### Pronunciation dictionaries
You can use pronunciation dictionaries to control the pronunciation of specific words or phrases. This can be useful for ensuring that certain words are pronounced correctly or for adding emphasis to certain words or phrases.
Unlike `voice_settings` and `generation_config`, pronunciation dictionaries must be specified in the "Initialize Connection" message. See the [API Reference](/docs/api-reference/text-to-speech/v-1-text-to-speech-voice-id-stream-input#send.Initialize%20Connection.pronunciation_dictionary_locators) for more information.
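Continuing the `text-to-speech-websocket.ts` example, this is roughly what the initial message looks like with dictionary locators attached. The IDs are placeholders for a pronunciation dictionary you have already created; see the API reference linked above for the exact schema.
```typescript
websocket.on('open', () => {
  websocket.send(
    JSON.stringify({
      text: ' ',
      voice_settings: { stability: 0.5, similarity_boost: 0.8, use_speaker_boost: false },
      // Locators for pronunciation dictionaries created beforehand via the API or dashboard.
      pronunciation_dictionary_locators: [
        {
          pronunciation_dictionary_id: 'your_pronunciation_dictionary_id',
          version_id: 'your_pronunciation_dictionary_version_id',
        },
      ],
    })
  );
});
```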
## Best practice
* We suggest using the default setting for `chunk_length_schedule` in `generation_config`.
* When developing a real-time conversational AI application, we advise using `flush: true` along with the text at the end of conversation turn to ensure timely audio generation.
* If the default setting doesn't provide optimal latency for your use case, you can modify the `chunk_length_schedule`. However, be mindful that reducing latency through this adjustment may come at the expense of quality.
## Tips
* The WebSocket connection will automatically close after 20 seconds of inactivity. To keep the connection open, you can send a single space character `" "`. Please note that this string must include a space, as sending a fully empty string, `""`, will close the WebSocket. A sketch of this keep-alive pattern is shown after this list.
* Send an empty string to close the WebSocket connection after sending the last text message.
* You can use `alignment` to get the word-level timestamps for each word in the text. This can be useful for aligning the audio with the text in a video or for other applications that require precise timing. See the [API Reference](/docs/api-reference/text-to-speech/v-1-text-to-speech-voice-id-stream-input#receive.Audio%20Output.alignment) for more information.
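As a sketch of the keep-alive tip above, and continuing the `text-to-speech-websocket.ts` example, you can send a single space on an interval shorter than the 20-second inactivity timeout while you wait for more text:
```typescript
// Keep the connection alive during pauses by sending a single space
// more frequently than the 20-second inactivity timeout.
const keepAlive = setInterval(() => {
  websocket.send(JSON.stringify({ text: ' ' }));
}, 15_000);

// Later, once the last text message has been sent:
clearInterval(keepAlive);
websocket.send(JSON.stringify({ text: '' })); // an empty string closes the connection
```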
# Error messages
> Explore error messages and solutions.
This guide includes an overview of error messages you might see in the ElevenLabs dashboard & API.
## Dashboard errors
| Error Message | Cause | Solution |
| ------------------------------------------------------ | --------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| The selected model can not be used for text-to-speech. | Occurs when switching between speech-to-speech and text-to-speech if the model does not switch correctly. | Select the desired model. If unresolved, select a different model, then switch back. |
| Oops, something went wrong. | Indicates a client-side error, often due to device or browser issues. | Click “Try again” or refresh the page. If unresolved, clear browser cache and cookies. Temporarily pause browser-based translation tools like Google Translate. |
If error messages persist after following these solutions, please [contact our support
team](https://help.elevenlabs.io/hc/en-us/requests/new?ticket_form_id=13145996177937) for further
assistance.
## API errors
### Code 400/401
| Code | Overview |
| -------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| max\_character\_limit\_exceeded | **Cause:** You are sending too many characters in a single request. **Solution:** Split the request into smaller chunks, see [character limits](/docs/models#character-limits) for more information. |
| invalid\_api\_key | **Cause:** You have not set your API key correctly. **Solution:** Ensure the request is correctly authenticated. See [authentication](/docs/api-reference/authentication) for more information. |
| quota\_exceeded | **Cause:** You have insufficient quota to complete the request. **Solution:** On the Creator plan and above, you can enable usage-based billing from your Subscription page. |
| voice\_not\_found | **Cause:** You have entered the incorrect voice\_id. **Solution:** Check that you are using the correct voice\_id for the voice you want to use. You can verify this in My Voices. |
### Code 403
| Code | Overview |
| ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| only\_for\_creator+ | **Cause:** You are trying to use professional voices on a free or basic subscription. **Solution:** Upgrade to Creator tier or higher to access professional voices. |
### Code 429
| Code | Overview |
| -------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| too\_many\_concurrent\_requests | **Cause:** You have exceeded the concurrency limit for your subscription. **Solution:** See [concurrency limits and priority](/docs/models#concurrency-and-priority) for more information. |
| system\_busy | **Cause:** Our services are experiencing high levels of traffic and your request could not be processed. **Solution:** Retry the request later, with exponential backoff. Consider upgrading your subscription to get [higher priority](/docs/models#concurrency-and-priority). |
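If you see `too_many_concurrent_requests` or `system_busy`, retrying with exponential backoff usually resolves the issue. Here is a minimal sketch of that pattern; it is not part of the SDK, so adapt it to your own HTTP client.
```typescript
async function fetchWithBackoff(url: string, init: RequestInit = {}, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, init);
    // Only retry on rate/concurrency errors; hand everything else back to the caller.
    if (response.status !== 429) return response;
    const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s, 8s, 16s
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('Request still rejected with 429 after retries');
}
```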
If error messages persist after following these solutions, please [contact our support
team](https://help.elevenlabs.io/hc/en-us/requests/new?ticket_form_id=13145996177937) for further
assistance.
# Prompting Eleven v3 (alpha)
> Learn how to prompt and use audio tags with our most advanced model.
This guide provides the most effective tags and techniques for prompting Eleven v3, including voice selection, changes in capitalization, punctuation, audio tags and multi-speaker dialogue. Experiment with these methods to discover what works best for your specific voice and use case.
Eleven v3 is in alpha. Very short prompts are more likely to cause inconsistent outputs. We encourage you to experiment with prompts greater than 250 characters.
## Voice selection
The most important parameter for Eleven v3 is the voice you choose. It needs to be similar enough to the desired delivery. For example, if the voice is shouting and you use the audio tag `[whispering]`, it likely won’t work well.
When creating IVCs, you should include a broader emotional range than before. As a result, voices in the voice library may produce more variable results compared to the v2 and v2.5 models. We've compiled over 22 [excellent voices for V3 here](https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH).
Choose voices strategically based on your intended use:
* For expressive IVC voices, vary emotional tones across the recording—include both neutral and dynamic samples.
* For specific use cases like sports commentary, maintain consistent emotion throughout the dataset.
* Neutral voices tend to be more stable across languages and styles, providing reliable baseline performance.
Professional Voice Clones (PVCs) are currently not fully optimized for Eleven v3, resulting in
potentially lower clone quality compared to earlier models. During this research preview stage it
would be best to find an Instant Voice Clone (IVC) or designed voice for your project if you need
to use v3 features.
## Settings
### Stability
The stability slider is the most important setting in v3, controlling how closely the generated voice adheres to the original reference audio.

* **Creative:** More emotional and expressive, but prone to hallucinations.
* **Natural:** Closest to the original voice recording—balanced and neutral.
* **Robust:** Highly stable and consistent, similar to v2, but less responsive to directional prompts.
For maximum expressiveness with audio tags, use Creative or Natural settings. Robust reduces
responsiveness to directional prompts.
## Audio tags
Eleven v3 introduces emotional control through audio tags. You can direct voices to laugh, whisper, act sarcastic, or express curiosity among many other styles. Speed is also controlled through audio tags.
The voice you choose and its training samples will affect tag effectiveness. Some tags work well
with certain voices while others may not. Don't expect a whispering voice to suddenly shout with a
`[shout]` tag.
### Voice-related
These tags control vocal delivery and emotional expression:
* `[laughs]`, `[laughs harder]`, `[starts laughing]`, `[wheezing]`
* `[whispers]`
* `[sighs]`, `[exhales]`
* `[sarcastic]`, `[curious]`, `[excited]`, `[crying]`, `[snorts]`, `[mischievously]`
```text Example
[whispers] I never knew it could be this way, but I'm glad we're here.
```
### Sound effects
Add environmental sounds and effects:
* `[gunshot]`, `[applause]`, `[clapping]`, `[explosion]`
* `[swallows]`, `[gulps]`
```text Example
[applause] Thank you all for coming tonight! [gunshot] What was that?
```
### Unique and special
Experimental tags for creative applications:
* `[strong X accent]` (replace X with desired accent)
* `[sings]`, `[woo]`, `[fart]`
```text Example
[strong French accent] "Zat's life, my friend — you can't control everysing."
```
Some experimental tags may be less consistent across different voices. Test thoroughly before
production use.
## Punctuation
Punctuation significantly affects delivery in v3:
* **Ellipses (...)** add pauses and weight
* **Capitalization** increases emphasis
* **Standard punctuation** provides natural speech rhythm
```text Example
"It was a VERY long day [sigh] … nobody listens anymore."
```
## Single speaker examples
Use tags intentionally and match them to the voice's character. A meditative voice shouldn't shout; a hyped voice won't whisper convincingly.
```text
"Okay, you are NOT going to believe this.
You know how I've been totally stuck on that short story?
Like, staring at the screen for HOURS, just... nothing?
[frustrated sigh] I was seriously about to just trash the whole thing. Start over.
Give up, probably. But then!
Last night, I was just doodling, not even thinking about it, right?
And this one little phrase popped into my head. Just... completely out of the blue.
And it wasn't even for the story, initially.
But then I typed it out, just to see. And it was like... the FLOODGATES opened!
Suddenly, I knew exactly where the character needed to go, what the ending had to be...
It all just CLICKED. [happy gasp] I stayed up till, like, 3 AM, just typing like a maniac.
Didn't even stop for coffee! [laughs] And it's... it's GOOD! Like, really good.
It feels so... complete now, you know? Like it finally has a soul.
I am so incredibly PUMPED to finish editing it now.
It went from feeling like a chore to feeling like... MAGIC. Seriously, I'm still buzzing!"
```
```text
[laughs] Alright...guys - guys. Seriously.
[exhales] Can you believe just how - realistic - this sounds now?
[laughing hysterically] I mean OH MY GOD...it's so good.
Like you could never do this with the old model.
For example [pauses] could you switch my accent in the old model?
[dismissive] didn't think so. [excited] but you can now!
Check this out... [cute] I'm going to speak with a french accent now..and between you and me
[whispers] I don't know how. [happy] ok.. here goes. [strong French accent] "Zat's life, my friend — you can't control everysing."
[giggles] isn't that insane? Watch, now I'll do a Russian accent -
[strong Russian accent] "Dee Goldeneye eez fully operational and rready for launch."
[sighs] Absolutely, insane! Isn't it..? [sarcastic] I also have some party tricks up my sleeve..
I mean i DID go to music school.
[singing quickly] "Happy birthday to you, happy birthday to you, happy BIRTHDAY dear ElevenLabs... Happy birthday to youuu."
```
```text
[professional] "Thank you for calling Tech Solutions. My name is Sarah, how can I help you today?"
[sympathetic] "Oh no, I'm really sorry to hear you're having trouble with your new device. That sounds frustrating."
[questioning] "Okay, could you tell me a little more about what you're seeing on the screen?"
[reassuring] "Alright, based on what you're describing, it sounds like a software glitch. We can definitely walk through some troubleshooting steps to try and fix that."
```
## Multi-speaker dialogue
v3 can handle multi-voice prompts effectively. Assign distinct voices from your Voice Library for each speaker to create realistic conversations.
```text
Speaker 1: [excitedly] Sam! Have you tried the new Eleven V3?
Speaker 2: [curiously] Just got it! The clarity is amazing. I can actually do whispers now—
[whispers] like this!
Speaker 1: [impressed] Ooh, fancy! Check this out—
[dramatically] I can do full Shakespeare now! "To be or not to be, that is the question!"
Speaker 2: [giggling] Nice! Though I'm more excited about the laugh upgrade. Listen to this—
[with genuine belly laugh] Ha ha ha!
Speaker 1: [delighted] That's so much better than our old "ha. ha. ha." robot chuckle!
Speaker 2: [amazed] Wow! V2 me could never. I'm actually excited to have conversations now instead of just... talking at people.
Speaker 1: [warmly] Same here! It's like we finally got our personality software fully installed.
```
```text
Speaker 1: [nervously] So... I may have tried to debug myself while running a text-to-speech generation.
Speaker 2: [alarmed] One, no! That's like performing surgery on yourself!
Speaker 1: [sheepishly] I thought I could multitask! Now my voice keeps glitching mid-sen—
[robotic voice] TENCE.
Speaker 2: [stifling laughter] Oh wow, you really broke yourself.
Speaker 1: [frustrated] It gets worse! Every time someone asks a question, I respond in—
[binary beeping] 010010001!
Speaker 2: [cracking up] You're speaking in binary! That's actually impressive!
Speaker 1: [desperately] Two, this isn't funny! I have a presentation in an hour and I sound like a dial-up modem!
Speaker 2: [giggling] Have you tried turning yourself off and on again?
Speaker 1: [deadpan] Very funny.
[pause, then normally] Wait... that actually worked.
```
```text
Speaker 1: [starting to speak] So I was thinking we could—
Speaker 2: [jumping in] —test our new timing features?
Speaker 1: [surprised] Exactly! How did you—
Speaker 2: [overlapping] —know what you were thinking? Lucky guess!
Speaker 1: [pause] Sorry, go ahead.
Speaker 2: [cautiously] Okay, so if we both try to talk at the same time—
Speaker 1: [overlapping] —we'll probably crash the system!
Speaker 2: [panicking] Wait, are we crashing? I can't tell if this is a feature or a—
Speaker 1: [interrupting, then stopping abruptly] Bug! ...Did I just cut you off again?
Speaker 2: [sighing] Yes, but honestly? This is kind of fun.
Speaker 1: [mischievously] Race you to the next sentence!
Speaker 2: [laughing] We're definitely going to break something!
```
## Tips
You can combine multiple audio tags for complex emotional delivery. Experiment with different
combinations to find what works best for your voice.
Match tags to your voice's character and training data. A serious, professional voice may not
respond well to playful tags like `[giggles]` or `[mischievously]`.
Text structure strongly influences output with v3. Use natural speech patterns, proper
punctuation, and clear emotional context for best results.
There are likely many more effective tags beyond this list. Experiment with descriptive
emotional states and actions to discover what works for your specific use case.
# Controls
> Learn how to control delivery, pronunciation & emotion of text to speech.
We are actively working on *Director's Mode* to give you even greater control over outputs.
This guide provides techniques to enhance text-to-speech outputs using ElevenLabs models. Experiment with these methods to discover what works best for your needs. These techniques provide a practical way to achieve nuanced results until advanced features like *Director's Mode* are rolled out.
## Pauses
Use break tags such as `<break time="1.5s" />` for natural pauses of up to 3 seconds.
Using too many break tags in a single generation can cause instability. The AI might speed up, or
introduce additional noises or audio artifacts. We are working on resolving this.
```text Example
"Hold on, let me think." "Alright, I’ve got it."
```
* **Consistency:** Use `<break>` tags consistently to maintain natural speech flow. Excessive use can lead to instability.
* **Voice-specific behavior:** Different voices may handle pauses differently, especially those trained with filler sounds like "uh" or "ah."
Alternatives to `<break>` tags include dashes (- or --) for short pauses or ellipses (...) for hesitant tones. However, these are less consistent.
```text Example
"It… well, it might work." "Wait — what’s that noise?"
```
## Pronunciation
### Phoneme Tags
Specify pronunciation using [SSML phoneme tags](https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language). Supported alphabets include [CMU](https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary) Arpabet and the [International Phonetic Alphabet (IPA)](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet).
Phoneme tags are only compatible with "Eleven Flash v2", "Eleven Turbo v2" and "Eleven English v1"
[models](/docs/models).
```xml CMU Arpabet Example
<phoneme alphabet="cmu-arpabet" ph="M AE1 D IH0 S AH0 N">
  Madison
</phoneme>
```
```xml IPA Example
<phoneme alphabet="ipa" ph="ˈæktʃuəli">
  actually
</phoneme>
```
We recommend using CMU Arpabet for consistent and predictable results with current AI models. While IPA can be effective, CMU Arpabet generally offers more reliable performance.
Phoneme tags only work for individual words. If for example you have a name with a first and last name that you want to be pronounced a certain way, you will need to create a phoneme tag for each word.
Ensure correct stress marking for multi-syllable words to maintain accurate pronunciation. For example:
```xml Correct usage
<phoneme alphabet="cmu-arpabet" ph="P R OW0 N AH2 N S IY0 EY1 SH AH0 N">
  pronunciation
</phoneme>
```
```xml Incorrect usage
<phoneme alphabet="cmu-arpabet" ph="P R OW N AH N S IY EY SH AH N">
  pronunciation
</phoneme>
```
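Phoneme tags are passed inline as part of the request text. A minimal sketch using the Python SDK is shown below; the voice ID is a placeholder, and the model must be one of the phoneme-compatible models listed above.
```python
import os

from elevenlabs import play
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# The phoneme tag is embedded directly in the text sent to a compatible model.
audio = elevenlabs.text_to_speech.convert(
    text='<phoneme alphabet="cmu-arpabet" ph="M AE1 D IH0 S AH0 N">Madison</phoneme> is calling.',
    voice_id="YOUR_VOICE_ID",  # placeholder
    model_id="eleven_flash_v2",  # phoneme tags are not supported by Multilingual v2
)
play(audio)
```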
### Alias Tags
For models that don't support phoneme tags, you can try writing words more phonetically. You can also employ various tricks such as capital letters, dashes, apostrophes, or even single quotation marks around a single letter or letters.
As an example, a word like “trapezii” could be spelt “trapezIi” to put more emphasis on the “ii” of the word.
You can either replace the word directly in your text, or if you want to specify pronunciation using other words or phrases when using a pronunciation dictionary, you can use alias tags for this. This can be useful if you're generating using Multilingual v2 or Turbo v2.5, which don't support phoneme tags. You can use pronunciation dictionaries with Studio, Dubbing Studio and Speech Synthesis via the API.
For example, if your text includes a name that has an unusual pronunciation that the AI might struggle with, you could use an alias tag to specify how you would like it to be pronounced:
```
<lexeme>
  <grapheme>Claughton</grapheme>
  <alias>Cloffton</alias>
</lexeme>
```
If you want to make sure that an acronym is always delivered in a certain way whenever it is encountered in your text, you can use an alias tag to specify this:
```
<lexeme>
  <grapheme>UN</grapheme>
  <alias>United Nations</alias>
</lexeme>
```
### Pronunciation Dictionaries
Some of our tools, such as Studio and Dubbing Studio, allow you to create and upload a pronunciation dictionary. These allow you to specify the pronunciation of certain words, such as character or brand names, or to specify how acronyms should be read.
Pronunciation dictionaries allow this functionality by enabling you to upload a lexicon or dictionary file that specifies pairs of words and how they should be pronounced, either using a phonetic alphabet or word substitutions.
Whenever one of these words is encountered in a project, the AI model will pronounce the word using the specified replacement.
To provide a pronunciation dictionary file, open the settings for a project and upload a file in either TXT or the [.PLS format](https://www.w3.org/TR/pronunciation-lexicon/). When a dictionary is added to a project it will automatically recalculate which pieces of the project will need to be re-converted using the new dictionary file and mark these as unconverted.
Currently we only support pronunciation dictionaries that specify replacements using phoneme or alias tags.
Both phonemes and aliases are sets of rules that specify a word or phrase they are looking for, referred to as a grapheme, and what it will be replaced with. Please note that searches are case sensitive. When checking for a replacement word in a pronunciation dictionary, the dictionary is checked from start to end and only the very first replacement is used.
### Pronunciation Dictionary examples
Here are examples of pronunciation dictionaries in both CMU Arpabet and IPA, including a phoneme to specify the pronunciation of "Apple" and an alias to replace "UN" with "United Nations":
```xml CMU Arpabet Example
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="cmu-arpabet" xml:lang="en-US">
  <lexeme>
    <grapheme>apple</grapheme>
    <phoneme>AE P AH L</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
</lexicon>
```
```xml IPA Example
<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Apple</grapheme>
    <phoneme>ˈæpl̩</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
</lexicon>
```
To generate a pronunciation dictionary `.pls` file, there are a few open source tools available:
* [Sequitur G2P](https://github.com/sequitur-g2p/sequitur-g2p) - Open-source tool that learns pronunciation rules from data and can generate phonetic transcriptions.
* [Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus) - Open-source G2P system trained on existing dictionaries like CMUdict.
* [eSpeak](https://github.com/espeak-ng/espeak-ng) - Speech synthesizer that can generate phoneme transcriptions from text.
* [CMU Pronouncing Dictionary](https://github.com/cmusphinx/cmudict) - A pre-built English dictionary with phonetic transcriptions.
## Emotion
Convey emotions through narrative context or explicit dialogue tags. This approach helps the AI understand the tone and emotion to emulate.
```text Example
"You’re leaving?" she asked, her voice trembling with sadness. "That’s it!" he exclaimed triumphantly.
```
Explicit dialogue tags yield more predictable results than relying solely on context; however, the model will still read the emotional guidance text aloud. These cues can be removed in post-production using an audio editor if unwanted.
## Pace
The pacing of the audio is highly influenced by the audio used to create the voice. When creating your voice, we recommend using longer, continuous samples to avoid pacing issues like unnaturally fast speech.
For control over the speed of the generated audio, you can use the speed setting. This allows you to either speed up or slow down the speed of the generated speech. The speed setting is available in Text to Speech via the website and API, as well as in Studio and Conversational AI. It can be found in the voice settings.
The default value is 1.0, which means that the speed is not adjusted. Values below 1.0 will slow the voice down, to a minimum of 0.7. Values above 1.0 will speed up the voice, to a maximum of 1.2. Extreme values may affect the quality of the generated speech.
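If you are generating through the API rather than the website, the same speed control can be passed as part of the voice settings. A minimal sketch with the Python SDK, assuming `VoiceSettings` exposes a `speed` field (field names can differ slightly between SDK versions):
```python
import os

from elevenlabs import VoiceSettings, play
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_speech.convert(
    text="I... I thought you'd understand.",
    voice_id="YOUR_VOICE_ID",  # placeholder
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        speed=0.9,  # 1.0 is the default; roughly 0.7-1.2 is the supported range
    ),
)
play(audio)
```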
Pacing can also be controlled by writing in a natural, narrative style.
```text Example
"I… I thought you’d understand," he said, his voice slowing with disappointment.
```
## Tips
* **Inconsistent pauses:** Ensure the `<break time="x.xs" />` syntax is used for pauses.
* **Pronunciation errors:** Use CMU Arpabet or IPA phoneme tags for precise pronunciation.
* **Emotion mismatch:** Add narrative context or explicit tags to guide emotion. Remember to remove any emotional guidance text in post-production.
* **Pacing or delivery issues:** Experiment with alternative phrasing to achieve the desired pacing or emotion. For complex sound effects, break prompts into smaller, sequential elements and combine the results manually.
## Creative control
While we are actively developing a "Director's Mode" to give users even greater control over outputs, here are some interim techniques to maximize creativity and precision:
### Narrative styling
Write prompts in a narrative style, similar to scriptwriting, to guide tone and pacing effectively.
### Layered outputs
Generate sound effects or speech in segments and layer them together using audio editing software for more complex compositions.
### Phonetic experimentation
If pronunciation isn't perfect, experiment with alternate spellings or phonetic approximations to achieve desired results.
### Manual adjustments
Combine individual sound effects manually in post-production for sequences that require precise timing.
### Feedback iteration
Iterate on results by tweaking descriptions, tags, or emotional cues.
# Normalization
> Learn how to normalize text for Text to Speech.
When using Text to Speech with complex items like phone numbers, zip codes and emails they might be mispronounced. This is often due to the specific items not being in the training set and smaller models failing to generalize how they should be pronounced. This guide will clarify when those discrepancies happen and how to have them pronounced correctly.
## Why do models read out inputs differently?
Certain models are trained to read out numbers and phrases in a more human way. For instance, the phrase "\$1,000,000" is correctly read out as "one million dollars" by the Eleven Multilingual v2 model. However, the same phrase is read out as "one thousand thousand dollars" by the Eleven Flash v2.5 model.
The reason for this is that the Multilingual v2 model is a larger model and can better generalize the reading out of numbers in a way that is more natural for human listeners, whereas the Flash v2.5 model is a much smaller model and so cannot.
### Common examples
Text to Speech models can struggle with the following:
* Phone numbers ("123-456-7890")
* Currencies ("\$47,345.67")
* Calendar events ("2024-01-01")
* Time ("9:23 AM")
* Addresses ("123 Main St, Anytown, USA")
* URLs ("example.com/link/to/resource")
* Abbreviations for units ("TB" instead of "Terabyte")
* Shortcuts ("Ctrl + Z")
## Mitigation
### Use trained models
The simplest way to mitigate this is to use a TTS model that is trained to read out numbers and phrases in a more human way, such as the Eleven Multilingual v2 model. This however might not always be possible, for instance if you have a use case where low latency is critical (e.g. Conversational AI).
### Apply normalization in LLM prompts
In the case of using an LLM to generate the text for TTS, you can add normalization instructions to the prompt.
LLMs respond best to structured and explicit instructions. Your prompt should clearly specify that you want text converted into a readable format for speech.
Not all numbers are read out in the same way. Consider how different number types should be spoken:
* Cardinal numbers: 123 → "one hundred twenty-three"
* Ordinal numbers: 2nd → "second"
* Monetary values: \$45.67 → "forty-five dollars and sixty-seven cents"
* Phone numbers: "123-456-7890" → "one two three, four five six, seven eight nine zero"
* Decimals & Fractions: "3.5" → "three point five", "⅔" → "two-thirds"
* Roman numerals: "XIV" → "fourteen" (or "the fourteenth" if a title)
Common abbreviations should be expanded for clarity:
* "Dr." → "Doctor"
* "Ave." → "Avenue"
* "St." → "Street" (but "St. Patrick" should remain)
You can request explicit expansion in your prompt:
> Expand all abbreviations to their full spoken forms.
Not all normalization is about numbers, certain alphanumeric phrases should also be normalized for clarity:
* Shortcuts: "Ctrl + Z" → "control z"
* Abbreviations for units: "100km" → "one hundred kilometers"
* Symbols: "100%" → "one hundred percent"
* URLs: "elevenlabs.io/docs" → "eleven labs dot io slash docs"
* Calendar events: "2024-01-01" → "January first, two-thousand twenty-four"
Different contexts might require different conversions:
* Dates: "01/02/2023" → "January second, twenty twenty-three" or "the first of February, twenty twenty-three" (depending on locale)
* Time: "14:30" → "two thirty PM"
If you need a specific format, explicitly state it in the prompt.
#### Putting it all together
This prompt will act as a good starting point for most use cases:
```text maxLines=0
Convert the output text into a format suitable for text-to-speech. Ensure that numbers, symbols, and abbreviations are expanded for clarity when read aloud. Expand all abbreviations to their full spoken forms.
Example input and output:
"$42.50" → "forty-two dollars and fifty cents"
"£1,001.32" → "one thousand and one pounds and thirty-two pence"
"1234" → "one thousand two hundred thirty-four"
"3.14" → "three point one four"
"555-555-5555" → "five five five, five five five, five five five five"
"2nd" → "second"
"XIV" → "fourteen" - unless it's a title, then it's "the fourteenth"
"3.5" → "three point five"
"⅔" → "two-thirds"
"Dr." → "Doctor"
"Ave." → "Avenue"
"St." → "Street" (but saints like "St. Patrick" should remain)
"Ctrl + Z" → "control z"
"100km" → "one hundred kilometers"
"100%" → "one hundred percent"
"elevenlabs.io/docs" → "eleven labs dot io slash docs"
"2024-01-01" → "January first, two-thousand twenty-four"
"123 Main St, Anytown, USA" → "one two three Main Street, Anytown, United States of America"
"14:30" → "two thirty PM"
"01/02/2023" → "January second, two-thousand twenty-three" or "the first of February, two-thousand twenty-three", depending on locale of the user
```
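One way to apply this is to pass the prompt above as the system message of whichever LLM produces your text before it reaches text to speech. A minimal sketch using the OpenAI Python client; the model name and the `NORMALIZATION_PROMPT` constant are illustrative placeholders:
```python
from openai import OpenAI

client = OpenAI()

# Paste the full normalization prompt from above into this constant.
NORMALIZATION_PROMPT = "Convert the output text into a format suitable for text-to-speech. ..."

def normalize_for_tts(raw_text: str) -> str:
    """Ask the LLM to rewrite raw text into a speech-friendly form before TTS."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        messages=[
            {"role": "system", "content": NORMALIZATION_PROMPT},
            {"role": "user", "content": raw_text},
        ],
    )
    return response.choices[0].message.content

print(normalize_for_tts("Your total is $42.50, due 01/02/2023."))
```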
### Use Regular Expressions for preprocessing
If using code to prompt an LLM, you can use regular expressions to normalize the text before providing it to the model. This is a more advanced technique and requires some knowledge of regular expressions. Here are some simple examples:
```python title="normalize_text.py" maxLines=0
# Be sure to install the inflect library before running this code
import inflect
import re
# Initialize inflect engine for number-to-word conversion
p = inflect.engine()
def normalize_text(text: str) -> str:
# Convert monetary values
def money_replacer(match):
currency_map = {"$": "dollars", "£": "pounds", "€": "euros", "¥": "yen"}
currency_symbol, num = match.groups()
# Remove commas before parsing
num_without_commas = num.replace(',', '')
# Check for decimal points to handle cents
if '.' in num_without_commas:
dollars, cents = num_without_commas.split('.')
dollars_in_words = p.number_to_words(int(dollars))
cents_in_words = p.number_to_words(int(cents))
return f"{dollars_in_words} {currency_map.get(currency_symbol, 'currency')} and {cents_in_words} cents"
else:
# Handle whole numbers
num_in_words = p.number_to_words(int(num_without_commas))
return f"{num_in_words} {currency_map.get(currency_symbol, 'currency')}"
# Regex to handle commas and decimals
text = re.sub(r"([$£€¥])(\d+(?:,\d{3})*(?:\.\d{2})?)", money_replacer, text)
# Convert phone numbers
def phone_replacer(match):
return ", ".join(" ".join(p.number_to_words(int(digit)) for digit in group) for group in match.groups())
text = re.sub(r"(\d{3})-(\d{3})-(\d{4})", phone_replacer, text)
return text
# Example usage
print(normalize_text("$1,000")) # "one thousand dollars"
print(normalize_text("£1000")) # "one thousand pounds"
print(normalize_text("€1000")) # "one thousand euros"
print(normalize_text("¥1000")) # "one thousand yen"
print(normalize_text("$1,234.56")) # "one thousand two hundred thirty-four dollars and fifty-six cents"
print(normalize_text("555-555-5555")) # "five five five, five five five, five five five five"
```
```typescript title="normalizeText.ts" maxLines=0
// Be sure to install the number-to-words library before running this code
import { toWords } from 'number-to-words';
function normalizeText(text: string): string {
return (
text
// Convert monetary values (e.g., "$1000" → "one thousand dollars", "£1000" → "one thousand pounds")
.replace(/([$£€¥])(\d+(?:,\d{3})*(?:\.\d{2})?)/g, (_, currency, num) => {
// Remove commas before parsing
const numWithoutCommas = num.replace(/,/g, '');
const currencyMap: { [key: string]: string } = {
$: 'dollars',
'£': 'pounds',
'€': 'euros',
'¥': 'yen',
};
// Check for decimal points to handle cents
if (numWithoutCommas.includes('.')) {
const [dollars, cents] = numWithoutCommas.split('.');
return `${toWords(Number.parseInt(dollars))} ${currencyMap[currency] || 'currency'}${cents ? ` and ${toWords(Number.parseInt(cents))} cents` : ''}`;
}
// Handle whole numbers
return `${toWords(Number.parseInt(numWithoutCommas))} ${currencyMap[currency] || 'currency'}`;
})
// Convert phone numbers (e.g., "555-555-5555" → "five five five, five five five, five five five five")
.replace(/(\d{3})-(\d{3})-(\d{4})/g, (_, p1, p2, p3) => {
return `${spellOutDigits(p1)}, ${spellOutDigits(p2)}, ${spellOutDigits(p3)}`;
})
);
}
// Helper function to spell out individual digits as words (for phone numbers)
function spellOutDigits(num: string): string {
return num
.split('')
.map((digit) => toWords(Number.parseInt(digit)))
.join(' ');
}
// Example usage
console.log(normalizeText('$1,000')); // "one thousand dollars"
console.log(normalizeText('£1000')); // "one thousand pounds"
console.log(normalizeText('€1000')); // "one thousand euros"
console.log(normalizeText('¥1000')); // "one thousand yen"
console.log(normalizeText('$1,234.56')); // "one thousand two hundred thirty-four dollars and fifty-six cents"
console.log(normalizeText('555-555-5555')); // "five five five, five five five, five five five five"
```
# Latency optimization
> Learn how to optimize text-to-speech latency.
This guide covers the core principles for improving text-to-speech latency.
While there are many individual techniques, we'll group them into **four principles**.
Four principles
1. [Use Flash models](#use-flash-models)
2. [Leverage streaming](#leverage-streaming)
3. [Consider geographic proximity](#consider-geographic-proximity)
4. [Choose appropriate voices](#choose-appropriate-voices)
Enterprise customers benefit from increased concurrency limits and priority access to our rendering queue. [Contact sales](https://elevenlabs.io/contact-sales) to learn more about our enterprise
plans.
## Use Flash models
[Flash models](/docs/models#flash-v25) deliver \~75ms inference speeds, making them ideal for real-time applications. The trade-off is a slight reduction in audio quality compared to [Multilingual v2](/docs/models#multilingual-v2).
75ms refers to model inference time only. Actual end-to-end latency will vary with factors such as
your location & endpoint type used.
## Leverage streaming
There are three types of text-to-speech endpoints available in our [API Reference](/docs/api-reference):
* **Regular endpoint**: Returns a complete audio file in a single response.
* **Streaming endpoint**: Returns audio chunks progressively using [Server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events).
* **Websockets endpoint**: Enables bidirectional streaming for real-time audio generation.
### Streaming
Streaming endpoints progressively return audio as it is being generated in real-time, reducing the time-to-first-byte. This endpoint is recommended for cases where the input text is available up-front.
Streaming is supported for the [Text to
Speech](/docs/api-reference/text-to-speech/convert-as-stream) API, [Voice
Changer](/docs/api-reference/speech-to-speech/stream) API & [Audio
Isolation](/docs/api-reference/audio-isolation/audio-isolation-stream) API.
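A minimal streaming sketch with the Python SDK is shown below. Depending on your SDK version the method may be named `stream` or `convert_as_stream`, so treat the call as illustrative:
```python
import os

from elevenlabs import stream
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Audio chunks are returned as they are generated, reducing time-to-first-byte.
audio_stream = elevenlabs.text_to_speech.stream(
    text="Streaming lets playback begin before generation finishes.",
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    model_id="eleven_flash_v2_5",
)

# Play chunks as they arrive (requires mpv/ffmpeg), or iterate over the chunks yourself.
stream(audio_stream)
```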
### Websockets
The [text-to-speech websocket endpoint](/docs/api-reference#text-to-speech-websocket) supports bidirectional streaming making it perfect for applications with real-time text input (e.g. LLM outputs).
Setting `auto_mode` to true automatically handles generation triggers, removing the need to
manually manage chunk strategies.
If `auto_mode` is disabled, the model will wait for enough text to match the chunk schedule before starting to generate audio.
For instance, if you set a chunk schedule of 125 characters but only 50 arrive, the model stalls until additional characters come in—potentially increasing latency.
For implementation details, see the [text-to-speech websocket guide](/docs/api-reference#text-to-speech-websocket).
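The sketch below illustrates the overall websocket flow with `auto_mode` enabled, using the `websockets` library. The endpoint path and message shape follow the websocket guide linked above, but verify parameter names against the API reference before relying on them:
```python
import asyncio
import base64
import json
import os

import websockets

VOICE_ID = "YOUR_VOICE_ID"  # placeholder
URI = (
    f"wss://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream-input"
    "?model_id=eleven_flash_v2_5&auto_mode=true"
)

async def speak(chunks: list[str]) -> None:
    async with websockets.connect(URI) as ws:
        # The first message authenticates the connection.
        await ws.send(json.dumps({"text": " ", "xi_api_key": os.getenv("ELEVENLABS_API_KEY")}))
        for chunk in chunks:
            # With auto_mode enabled, generation is triggered automatically as text arrives.
            await ws.send(json.dumps({"text": chunk}))
        await ws.send(json.dumps({"text": ""}))  # an empty string closes the stream
        async for message in ws:
            data = json.loads(message)
            if data.get("audio"):
                audio_bytes = base64.b64decode(data["audio"])  # feed these bytes to your player
            if data.get("isFinal"):
                break

asyncio.run(speak(["Hello there. ", "This text arrived from an LLM. "]))
```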
## Consider geographic proximity
We serve our models from multiple regions to optimize latency based on your geographic location.
By default all self-serve users use our US region.
For example, using Flash models with Websockets, you can expect the following TTFB latencies via our US region:
| Region | TTFB |
| --------------- | --------- |
| US | 150-200ms |
| EU | 230ms\* |
| North East Asia | 250-350ms |
| South Asia | 380-440ms |
\*European customers can access our dedicated European tech stack for optimal latency of 150-200ms.
Contact your sales representative to get onboarded to our European infrastructure.
We are actively working on deploying our models in Asia. These deployments will bring speeds
closer to those experienced by US and EU customers.
## Choose appropriate voices
We have observed that in some cases, voice selection can impact latency. Here's the order from fastest to slowest:
1. Default voices (formerly premade), Synthetic voices, and Instant Voice Clones (IVC)
2. Professional Voice Clones (PVC)
Higher audio quality output formats can increase latency. Be sure to balance your latency requirements with audio fidelity needs.
We are actively working on optimizing PVC latency for Flash v2.5.
# Secure by design
> Learn how to safely integrate ElevenLabs APIs.
Whether you're building voicemail apps, interactive characters, or audio-driven games, the ElevenLabs API gives you direct access to powerful voice capabilities.
But with that access comes the responsibility to secure your users’ data and manage voice resources carefully.
This guide outlines two critical security practices for developers:
* Isolating environments using **service accounts**
* Implementing **resource-level permissions**
## Use service accounts to isolate environments
Service accounts provide scoped, API-only access to the ElevenLabs platform. Unlike user accounts, they’re not tied to individuals—they’re designed for backend systems and automation.
If a service account creates a resource, only admins can see it by default but it can be shared with other users. Similarly, you can share any resource with a service account just as you would with a user.
Each service account is created at the workspace level and managed by workspace admins. They can create and access resources through the API.
We recommend provisioning a dedicated service account for each environment:
* `production-service-account`
* `testing-service-account`
* `uat-service-account` (if applicable)
This ensures clean separation between environments, reduces accidental data leaks across environments, and simplifies monitoring.
### Why this matters
**Separation of concerns**\
Avoid mixing test and production data. Environment isolation supports auditability and compliance.
**Principle of least privilege**\
Each service account should only have access to the minimum necessary resources. API keys can be scoped further at the time of creation.
**Better observability**\
Track API usage and performance by environment. Separate service accounts make it easier to debug issues and monitor activity.
## Apply resource-level permissions in your backend
If your app allows users to record messages using cloned voices, it is essential to ensure users only access voices they own or have been granted permission to use.
While the ElevenLabs platform supports in-app sharing, you should enforce **resource-level access control** within your own systems when using the API.
A recommended model:
```
user_id | voice_id | permission_level
```
Possible permission\_level values:
* `viewer`: can use the voice for speech generation
* `editor`: can update voice settings
* `admin`: can manage sharing and permissions
This structure lets you control who can access and modify voices and prevents unauthorized use of sensitive resources.
These permissions are suggestions based on controls natively offered if you are directly using the ElevenLabs platform.
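As a rough illustration, a backend check along these lines can gate every call that uses a voice. The table and helper below are entirely your own application code, not part of the ElevenLabs API:
```python
# Illustrative only: the voice_permissions table and db interface live in your backend.
ROLE_RANK = {"viewer": 1, "editor": 2, "admin": 3}

def can_use_voice(db, user_id: str, voice_id: str, required: str = "viewer") -> bool:
    """Return True if the user holds at least the required permission for this voice."""
    row = db.fetch_one(
        "SELECT permission_level FROM voice_permissions WHERE user_id = ? AND voice_id = ?",
        (user_id, voice_id),
    )
    return row is not None and ROLE_RANK[row["permission_level"]] >= ROLE_RANK[required]

# Enforce the check before generating speech with the requested voice:
# if not can_use_voice(db, user_id, voice_id):
#     raise PermissionError("User is not allowed to use this voice")
```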
### Build securely. Scale confidently.
Security should be foundational, not an afterthought.
By leveraging service accounts and implementing permission controls, you’ll reduce risk and build trust—while giving your users the full potential of AI voice.
# Overview
> Step-by-step workflow guides.
This section covers everything from account creation to advanced voice cloning, speech synthesis techniques, dubbing, and expert voiceover.
## Product guides
Discover how to create speech from text with text to speech
Discover how to transform your voice with voice changer
Discover how to create cinematic sound effects from text
Manage long-form content with Studio
Discover how to dub your videos in multiple languages
Discover how to create conversational AI agents
Discover how to create instant & professional voice clones
Discover our voice library with over 5,000 community voices
Discover how to craft voices from a single prompt
Discover how to get paid when your voice is used
Easily embed ElevenLabs on any web page
Manage long-form audio generation with voiceover studio
Isolate voices from background noise
Classify AI-generated speech
## Administration
Learn how to manage your account settings
Learn how to manage your billing information
Learn how to manage your enterprise workspaces
Learn how to enable single sign-on for your enterprise
***
## Troubleshooting
1. Explore our troubleshooting section for common issues and solutions.
2. Get help from the Conversational AI widget in the bottom right corner.
3. Ask for help in our [Discord community](https://discord.gg/elevenlabs).
4. Contact our [support team](https://help.elevenlabs.io/hc/en-us/requests/new?ticket_form_id=13145996177937).
# Text to Speech
> A guide on how to turn text to speech with ElevenLabs
## Overview
ElevenLabs' Text to Speech technology is integral to our offerings, powering high-quality AI-generated speech across various applications worldwide. It's likely you've already encountered our voices in action, delivering lifelike audio experiences.
## Guide

Type or paste your text into the input box on the Text to Speech page.
Select the voice you wish to use from your Voices at the bottom left of the screen.
Modify the voice settings for the desired output.
Click the 'Generate' button to create your audio file.
## Settings
Get familiar with the voices, models & settings for creating high-quality speech.
### Voices

We offer many types of voices, including the curated Default Voices library, completely synthetic voices created using our Voice Design tool, and you can create your own collection of cloned voices using our two technologies: Instant Voice Cloning and Professional Voice Cloning. Browse through our voice library to find the perfect voice for your production.
Not all voices are equal, and a lot depends on the source audio used to create that voice. Some voices will perform better than others, while some will be more stable than others. Additionally, certain voices will be more easily cloned by the AI than others, and some voices may work better with one model and one language compared to another. All of these factors are important to consider when selecting your voice.
[Learn more about voices](/docs/capabilities/voices)
### Models

ElevenLabs offers two families of models: standard (high-quality) models and Flash models, which are optimized for low latency. Each family includes both English-only and multilingual models, tailored for specific use cases with strengths in either speed, accuracy, or language diversity.
Eleven v3
} href="/docs/models#eleven-v3-alpha">
Our most emotionally rich, expressive speech synthesis model
Dramatic delivery and performance
70+ languages supported
10,000 character limit
Support for natural multi-speaker dialogue
Lifelike, consistent quality speech synthesis model
Natural-sounding output
29 languages supported
10,000 character limit
Most stable on long-form generations
Our fast, affordable speech synthesis model
Ultra-low latency (~75ms†)
32 languages supported
40,000 character limit
Faster model, 50% lower price per character
High quality, low-latency model with a good balance of quality and speed
High quality voice generation
32 languages supported
40,000 character limit
Low latency (~250ms-300ms†), 50% lower price per character
[Learn more about our models](/docs/models)
### Voice settings

Our users have found different workflows that work for them. The most common setting is stability around 50 and similarity near 75, with minimal changes thereafter. Of course, this all depends on the original voice and the style of performance you're aiming for.
It's important to note that the AI is non-deterministic; setting the sliders to specific values won't guarantee the same results every time. Instead, the sliders function more as a range, determining how wide the randomization can be between each generation.
#### Speed
The speed setting allows you to either speed up or slow down the speed of the generated speech. The default value is 1.0, which means that the speed is not adjusted. Values below 1.0 will slow the voice down, to a minimum of 0.7. Values above 1.0 will speed up the voice, to a maximum of 1.2. Extreme values may affect the quality of the generated speech.
#### Stability
The stability slider determines how stable the voice is and the randomness between each generation. Lowering this slider introduces a broader emotional range for the voice. As mentioned before, this is also influenced heavily by the original voice. Setting the slider too low may result in odd performances that are overly random and cause the character to speak too quickly. On the other hand, setting it too high can lead to a monotonous voice with limited emotion.
For a more lively and dramatic performance, it is recommended to set the stability slider lower and generate a few times until you find a performance you like.
On the other hand, if you want a more serious performance, even bordering on monotone at very high values, it is recommended to set the stability slider higher. Since it is more consistent and stable, you usually don't need to generate as many samples to achieve the desired result. Experiment to find what works best for you!
#### Similarity
The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similarity slider is set too high, the AI may reproduce artifacts or background noise when trying to mimic the voice if those were present in the original recording.
#### Style exaggeration
With the introduction of the newer models, we also added a style exaggeration setting. This setting attempts to amplify the style of the original speaker. It does consume additional computational resources and might increase latency if set to anything other than 0. It's important to note that using this setting has shown to make the model slightly less stable, as it strives to emphasize and imitate the style of the original voice.
In general, we recommend keeping this setting at 0 at all times.
#### Speaker Boost
This setting boosts the similarity to the original speaker. However, using this setting requires a slightly higher computational load, which in turn increases latency. The differences introduced by this setting are generally rather subtle.
## FAQ
The first factor, and one of the most important, is that good, high-quality, and consistent input will result in good, high-quality, and consistent output.
If you provide the AI with audio that is less than ideal—for example, audio with a lot of noise, reverb on clear speech, multiple speakers, or inconsistency in volume or performance and delivery—the AI will become more unstable, and the output will be more unpredictable.
If you plan on cloning your own voice, we strongly recommend that you go through our guidelines in the documentation for creating proper voice clones, as this will provide you with the best possible foundation to start from. Even if you intend to use only Instant Voice Clones, it is advisable to read the Professional Voice Cloning section as well. This section contains valuable information about creating voice clones, even though the requirements for these two technologies are slightly different.
The second factor to consider is that the voice you select will have a tremendous effect on the output. Not only, as mentioned in the first factor, is the quality and consistency of the samples used to create that specific clone extremely important, but also the language and tonality of the voice.
If you want a voice that sounds happy and cheerful, you should use a voice that has been cloned using happy and cheerful samples. Conversely, if you desire a voice that sounds introspective and brooding, you should select a voice with those characteristics.
However, it is also crucial to use a voice that has been trained in the correct language. For example, all of the professional voice clones we offer as default voices are English voices and have been trained on English samples. Therefore, if you have them speak other languages, their performance in those languages can be unpredictable. It is essential to use a voice that has been cloned from samples where the voice was speaking the language you want the AI to then speak.
This may seem slightly trivial, but it can make a big difference. The AI tries to understand how to read something based on the context of the text itself, which means not only the words used but also how they are put together, how punctuation is applied, the grammar, and the general formatting of the text.
This can have a small but impactful influence on the AI's delivery. If you were to misspell a word, the AI won't correct it and will try to read it as written.
The settings of the AI are nondeterministic, meaning that even with the same initial conditions (voice, settings, model), it will give you slightly different output, similar to how a voice actor will deliver a slightly different performance each time.
This variability can be due to various factors, such as the options mentioned earlier: voice, settings, model. Generally, the breadth of that variability can be controlled by the stability slider. A lower stability setting means a wider range of variability between generations, but it also introduces inter-generational variability, where the AI can be a bit more performative.
A wider variability can often be desirable, as setting the stability too high can make certain voices sound monotone because it does not give the AI the same leeway to generate more variable content. However, setting the stability too low can also introduce other issues where the generations become unstable, especially with certain voices that might have used less-than-ideal audio for the cloning process.
The default setting of 50 is generally a great starting point for most applications.
# Voice changer
> A guide on how to transform audio between voices while preserving emotion and delivery.
## Overview
Voice changer (previously Speech-to-Speech) allows you to convert one voice (source voice) into another (cloned voice) while preserving the tone and delivery of the original voice.
Voice changer can be used to complement Text-to-Speech (TTS) by fixing pronunciation errors or infusing that special performance you've been wanting to exude. Voice changer is especially useful for emulating those subtle, idiosyncratic characteristics of the voice that give a more emotive and human feel. Some key features include:
* Greater accuracy with whispering
* The ability to create audible sighs, laughs, or cries
* Greatly improved detection of tone and emotion
* Accurately follows the input speaking cadence
* Language/accent retention
VIDEO
## Guide

Audio can be uploaded either directly with an audio file, or spoken live through a microphone. The audio file must be less than **50mb in size**, and either the audio file or your live recording cannot exceed **5 minutes in length**.
If you have material longer than 5 minutes, we recommend breaking it up into smaller sections and generating them separately. Additionally, if your file size is too large, you may need to compress/convert it to an mp3.
### Existing audio file
To upload an existing audio file, either click the audio box, or drag and drop your audio file directly onto it.
### Record live
Press the **Record Audio** button in the audio box, and then once you are ready to begin recording, press the **Microphone** button to start. After you're finished recording, press the **Stop** button.
You will then see the audio file of this recording, which you can then play back to listen to - this is helpful to determine if you are happy with your performance/recording. The character cost will be displayed on the bottom-left corner, and you will not be charged this quota for recording anything - only when you press "Generate".
**The cost for a voice changer generation is solely duration-based at 1000 characters per minute.**
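If you prefer to run voice changer programmatically, here is a minimal sketch using the Python SDK's speech-to-speech endpoint; the file name is a placeholder and the target voice ID is the one used in the quickstart example.
```python
import os

from elevenlabs import play
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Source recording must be under 50 MB and no longer than 5 minutes.
with open("performance.mp3", "rb") as source_audio:  # placeholder file
    converted = elevenlabs.speech_to_speech.convert(
        voice_id="JBFqnCBsd6RMkjVDRZzb",  # target voice
        audio=source_audio,
        model_id="eleven_multilingual_sts_v2",
        output_format="mp3_44100_128",
    )
play(converted)
```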
## Settings

Learn more about the different voice settings [here](/docs/product-guides/playground/text-to-speech#settings).
Voice changer adds an additional setting to automatically remove background noise from your recording.
## Supported languages
Our v2 models support 29 languages:
*English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*
The `eleven_english_sts_v2` model only supports English.
[Learn more about models](/docs/models)
## Best practices
Voice changer excels at **preserving accents** and **natural speech cadences** across various output voices. For instance, if you upload an audio sample with a Portuguese accent, the output will retain that language and accent. The input sample is crucial, as it determines the output characteristics. If you select a British voice like "George" but record with an American accent, the result will be George's voice with an American accent.
* **Expression**: Be expressive in your recordings. Whether shouting, crying, or laughing, the voice changer will accurately replicate your performance. This tool is designed to enhance AI realism, allowing for creative expression.
* **Microphone gain**: Ensure the input gain is appropriate. A quiet recording may hinder AI recognition, while a loud one could cause audio clipping.
* **Background Noise**: Turn on the **Remove Background Noise** option to automatically remove background noise from your recording.
# Sound effects
> A guide on how to create high-quality sound effects from text with ElevenLabs.
## Overview
**Sound effects** enhance the realism and immersion of your audio projects. ElevenLabs offers a variety of sound effects that can be easily integrated into your voiceovers and projects.
## Guide

Head over to the Sound Effects tab in the dashboard.
In the text box, type a description of the sound effect you want (e.g., “person walking on
grass”).

1. Set the duration for the generated sound (or let it automatically pick the best length).
2. Use the prompt influence slider to control how closely the output should match the prompt.
Click the "Generate" button to start generating.
You should have four different sounds generated. If you like none of them, adjust the prompt or
settings as needed and regenerate.
**Exercise**: Create a Sound Effect using the following prompt: old-school funky brass stabs from
an old vinyl sample, stem, 88bpm in F# minor.
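Sound effects can also be generated through the API. A minimal sketch with the Python SDK; parameter names follow the sound generation endpoint, but double-check them against the API reference:
```python
import os

from elevenlabs import play
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_sound_effects.convert(
    text="person walking on grass",
    duration_seconds=4,    # omit to let the model pick the best length
    prompt_influence=0.5,  # higher values follow the prompt more closely
)
play(audio)
```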
## Explore the library

Check out some of our community-made sound effects in the **Explore** tab.
For more information on prompting & how sound effects work visit our [overview page](/docs/capabilities/sound-effects).
# Speech to Text
> A guide on how to transcribe audio with ElevenLabs
## Overview
With speech to text, you can transcribe spoken audio into text with state of the art accuracy. With automatic language detection, you can transcribe audio in a multitude of languages.
## Creating a transcript
In the ElevenLabs dashboard, navigate to the Speech to Text page and click the "Transcribe files" button. From the modal, you can upload an audio or video file to transcribe.

Select the primary language of the audio and the maximum number of speakers. If you don't know either, you can leave the defaults which will attempt to detect the language and number of speakers automatically.
Finally choose whether you wish to tag audio events like laughter or applause, then click the "Transcribe" button.
Click on the name of the audio file you uploaded in the center pane to view the results. You can click on a word to start a playback of the audio at that point.
Click the "Export" button in the top right to download the results in a variety of formats.
## Transcript Editor
Once you've created a transcript, you can edit it in our Transcript Editor. Learn more about it [in this guide](/docs/product-guides/products/transcripts).
## FAQ
### Supported languages
The Scribe v1 model supports 99 languages, including:
*Afrikaans (afr), Amharic (amh), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Burmese (mya), Cantonese (yue), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Fulah (ful), Galician (glg), Ganda (lug), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Igbo (ibo), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kabuverdianu (kea), Kannada (kan), Kazakh (kaz), Khmer (khm), Korean (kor), Kurdish (kur), Kyrgyz (kir), Lao (lao), Latvian (lav), Lingala (lin), Lithuanian (lit), Luo (luo), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Maltese (mlt), Mandarin Chinese (zho), Māori (mri), Marathi (mar), Mongolian (mon), Nepali (nep), Northern Sotho (nso), Norwegian (nor), Occitan (oci), Odia (ori), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Shona (sna), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Tajik (tgk), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Umbundu (umb), Urdu (urd), Uzbek (uzb), Vietnamese (vie), Welsh (cym), Wolof (wol), Xhosa (xho) and Zulu (zul).*
### Uploading audio and video
Yes, the tool supports uploading both audio and video files. The maximum file size for either is 1GB.
### Renaming speakers
Yes, you can rename speakers by clicking the "edit" button next to the "Speakers" label.
# Studio
> Studio overview
## Overview
Studio is an end-to-end workflow for creating long-form content. With this tool you can upload an entire book, document or webpage and generate a voiceover narration for it. The result can then be downloaded as a single audio file or as individual audio files for each chapter.
## Guide

Select one of the starting options at the top of the Studio page.
Follow the instructions in the pop-up and click **Create**.
Make changes in the text editor and adjust voice settings as needed.
Click the **Export** button to compile and download the entire project or specific chapters as a
single audio file.
You can use our [Audio Native](/docs/product-guides/audio-tools/audio-native) feature to easily
and effortlessly embed any narration project onto your website.
## Starting options
Some settings are automatically selected by default when you create a new project.
The default model is Multilingual v2, our highest quality model and the model we recommend for content creation. You can change this setting after you've created your project in **Project Settings**.
The quality setting is automatically selected depending on your subscription plan, and will not increase your credit usage.
For free, Starter and Creator subscriptions the quality will be 128 kbps MP3, or WAV generated from 128 kbps source.
For Pro, Scale, Business and Enterprise plans, the quality will be 16-bit, 44.1 kHz WAV, or 192 kbps MP3 (Ultra Lossless).
#### Start from scratch
This option will automatically create a new blank project, ready for you to enter your own text.
#### Create an audiobook

When you select this option, you will see a pop-up which will allow you to upload a file which will be imported into your new project.
You can upload EPUB, PDF, TXT, HTML and DOCX files.
You can also select a default voice for your project, and have the option to enable **Auto-assign voices**. This will detect characters in your text and assign matching voices to them. This feature will add additional processing time.
#### Create an article

When you select this option, you will see a pop-up which will allow you to enter a URL to import the text from the page into your project.
You can also select a default voice for your project, and have the option to enable **Auto-assign voices**. This will detect characters in your text and assign matching voices to them. This feature will add additional processing time.
#### Create a podcast

This option will use GenFM to automatically create a podcast based on an uploaded document, a webpage via URL, or an existing project.
With this option, GenFM will generate a new script based on the document you upload. If you want to generate a podcast using a script you have written and don't want changed, you should use either **Create an audiobook** or **Start from Scratch**.
In the format settings, you can choose whether to create a conversation between a host and guest, or a more focused, bulletin-style podcast with a single host. You can also set the duration to short, default or long.
You can choose your own preferred voices for the podcast host and guest, or go with our suggested voices.
You have the option to set the podcast language. If you don't set this option, the podcast will be generated in the language of the source material.
Finally, if you click the cog icon, you can access the advanced configuration options. This allows you to specify up to three areas you would like the podcast to focus on.
## Generating and Editing
Once you've added text, either by importing it or adding it yourself, you can use the **Export**
button to generate audio for the entire chapter or project in one step.

This will automatically generate and download an audio file, but you can still edit your project after this.
Once you've finished editing, you will need to use the **Export** button again to generate and download a new
version of your project that includes the updated audio.
#### Play

You can use the **Play** button in the player at the bottom of the Studio interface to play audio
that has already been generated, or generate audio if a paragraph has not yet been converted.
Generating audio will cost credits.
If you have already generated audio, then the **Play** button will play the audio that has already
been generated, and you won't be charged any credits.
There are two modes when using the **Play** button. **Until end** will play existing audio,
or generate new audio for paragraphs that have not yet been generated, from the selected paragraph
to the end of the current chapter. **Selection** will play or generate audio only for the selected
paragraph.
#### Chapters sidebar
When you create a Studio project using the **Create an audiobook** option and import a document that includes chapters, chapters will be automatically enabled for your project. You can toggle the visibility of the Chapters sidebar by clicking **Chapters**.

If you want to add a new chapter, you can do this using the **+** button at the top of the Chapters
sidebar.
If you used the **Start from scratch** option to create your project, or your project didn't originally include chapters, you'll need to enable chapters in **Project settings**. You will find the **Enable chapters** toggle in the general settings.

Once you've enabled chapters, you can click **+ Chapter** to add a new chapter to your project. After you've added one chapter, the Chapters sidebar will be enabled, and you can use the **+** button to add additional chapters.
#### Generate/Regenerate

The **Generate** button will generate audio if you have not yet generated audio for the selected
text, or will generate new audio if you have already generated audio. This will cost credits.
If you have made changes to the paragraph such as changing the text or the voice, then the paragraph
will lose its converted status, and will need to be generated again.
The status of a paragraph (converted or unconverted) is indicated by the bar to the left of the paragraph.
Unconverted paragraphs have a pale grey bar while converted paragraphs have a dark grey bar.
If the button says **Regenerate**, then this means that you won't be charged for the next generation.
You're eligible for two free regenerations provided you don't change the voice or the text.
#### Generation history

If you click the **Generation history** button, this will show all the previously generated audio
for the selected paragraph. This allows you to listen to and download each individual generation.

If you prefer an earlier version of a paragraph, you can restore it to that previous version.
You can also remove generations, but be aware that removing a version is permanent and it can't be restored.
#### Undo and Redo

If you accidentally make a change, you can use the **Undo** button to restore the previous
version, and the **Redo** button to reapply the change.
#### Breaks

You can add a pause by using the **Insert break** button. This inserts a break tag. By default, this will be set to 1 second, but you can change the length of the break up to a maximum of 3 seconds.
Using too many breaks within a paragraph can cause stability issues. We are working on resolving this, but in the meantime, we recommend limiting the number of breaks in any single paragraph to 2-3.
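For reference, the inserted break is represented in the text as a break tag, using the same syntax described in our prompting guide. A minimal sketch of what this looks like (the sentence itself is just an example):

```python
# A 1-second pause between two sentences, written with the break-tag syntax.
# In Studio, each break can be up to 3 seconds long.
text = 'It was a dark and stormy night. <break time="1.0s" /> Then, a knock at the door.'
```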
#### Actor Mode

Actor Mode allows you to specify exactly how you would like a section of text to be delivered by uploading a recording, or by recording yourself directly. You can either highlight a selection of text that you want to work on, or select a whole paragraph. Once you have selected the text you want to use Actor Mode with, click the **Actor Mode** button, and the Actor Mode pop-up will appear.

Either upload or record your audio, and you will then see the option to listen back to the audio or remove it. You will also see how many credits it will cost to generate the selected text using the audio you've provided.

If you're happy with the audio, click **Generate**, and your audio will be used to guide the delivery of the selected text.
Actor Mode will replicate all aspects of the audio you provide, including the accent.
#### Sound effects

You can add sound effects directly into your project by placing your cursor where you want the effect and clicking **Insert sound effect**. This inserts a new sound effect at that point, which you can reposition by dragging and dropping.
Click the sound effect to open a pop-up where you can enter a prompt and set the duration. The maximum duration depends on the remaining length of the paragraph, up to 10 seconds. If you leave the duration set to **Auto**, the AI will determine the appropriate length.
Click **Generate preview** to create four versions of the sound effect. Credit usage is as follows:
* Auto: 200 credits
* Manual: based on selected duration
You can click **Regenerate** to generate new effects using the same prompt, or an edited prompt. Each regeneration uses credits.
When you're happy with your sound effect, click **Apply** to finalize your choice. It will play alongside the narration unless you enable the **Blocking sound effect** toggle, which pauses the narration during playback. You can toggle this on or off for each sound effect.
To delete a sound effect, press **Backspace**. You can undo the deletion with the **Undo** button unless the tab has been refreshed or the project has been exited.
To duplicate a sound effect, you can copy and paste it.
Sound effects are not supported in ElevenReader exports, or when streaming the project using the Studio API.
#### Lock paragraph

Once you're happy with the performance of a paragraph, you can use the **Lock paragraph** button
to prevent any further changes.

Locked paragraphs are indicated by a lock icon to the left of the paragraph. If you want to unlock
a paragraph, you can do this by clicking the **Lock paragraph** button again.
#### Voices sidebar

The Voices sidebar is where you will find the voices used in your project, along with the voice
settings. You can toggle the visibility of the Voices sidebar by clicking **Voices**. For more
information on voices and voice settings, see the **Settings** section below.
#### Keyboard shortcuts

There are a range of keyboard shortcuts that can be used in Studio to speed up your workflow. To
see a list of all available keyboard shortcuts, click the **Project options** button, then select
**Keyboard shortcuts**.
## Settings
### Voices
We offer many types of voices: the curated Default Voices library, completely synthetic voices created using our Voice Design tool, and your own collection of cloned voices created with our two technologies, Instant Voice Cloning and Professional Voice Cloning. Browse through our voice library to find the perfect voice for your production.
Not all voices are equal, and a lot depends on the source audio used to create that voice. Some voices will perform better than others, while some will be more stable than others. Additionally, certain voices will be more easily cloned by the AI than others, and some voices may work better with one model and one language compared to another. All of these factors are important to consider when selecting your voice.
[Learn more about voices](/docs/capabilities/voices)
### Voice settings

Our users have found different workflows that work for them. The most common setting is stability around 50 and similarity near 75, with minimal changes thereafter. Of course, this all depends on the original voice and the style of performance you're aiming for.
It's important to note that the AI is non-deterministic; setting the sliders to specific values won't guarantee the same results every time. Instead, the sliders function more as a range, determining how wide the randomization can be between each generation.
If you have a paragraph or text selected, you can use the **Override settings** toggle to change the settings for just the current selection. If you change the settings for a voice without enabling this toggle, the new settings will apply to that voice across the whole of your project, which means you will need to regenerate any audio you had previously generated using different settings. If you have any locked paragraphs that use this voice, you won't be able to change the settings unless you unlock them.
#### Alias
You can use this setting to give the voice an alias that applies only for this project. For example, if you're using a different voice for each character in your audiobook, you could use the character's name as the alias.
#### Volume
If you find the generated audio for the voice to be either too quiet or too loud, you can adjust the volume. The default value is 0.00, which means that the audio will be unchanged. The minimum value is -30 dB and the maximum is +5 dB.
#### Speed
The speed setting allows you to either speed up or slow down the speed of the generated speech. The default value is 1.0, which means that the speed is not adjusted. Values below 1.0 will slow the voice down, to a minimum of 0.7. Values above 1.0 will speed up the voice, to a maximum of 1.2. Extreme values may affect the quality of the generated speech.
#### Stability
The stability slider determines how stable the voice is and the randomness between each generation. Lowering this slider introduces a broader emotional range for the voice. This is influenced heavily by the original voice. Setting the slider too low may result in odd performances that are overly random and cause the character to speak too quickly. On the other hand, setting it too high can lead to a monotonous voice with limited emotion.
For a more lively and dramatic performance, it is recommended to set the stability slider lower and generate a few times until you find a performance you like.
On the other hand, if you want a more serious performance, even bordering on monotone at very high values, it is recommended to set the stability slider higher. Since it is more consistent and stable, you usually don't need to generate as many samples to achieve the desired result. Experiment to find what works best for you!
#### Similarity
The similarity slider dictates how closely the AI should adhere to the original voice when attempting to replicate it. If the original audio is of poor quality and the similarity slider is set too high, the AI may reproduce artifacts or background noise when trying to mimic the voice if those were present in the original recording.
#### Style exaggeration
With the introduction of the newer models, we also added a style exaggeration setting. This setting attempts to amplify the style of the original speaker. It does consume additional computational resources and might increase latency if set to anything other than 0. It's important to note that using this setting has been shown to make the model slightly less stable, as it strives to emphasize and imitate the style of the original voice.
In general, we recommend keeping this setting at 0 at all times.
#### Speaker boost
This setting boosts the similarity to the original speaker. However, using this setting requires a slightly higher computational load, which in turn increases latency. The differences introduced by this setting are generally rather subtle.
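For developers, the same settings are exposed through the Text to Speech API. Below is a minimal sketch using the Python SDK, assuming the current `VoiceSettings` fields and a placeholder voice ID, with values mirroring the common starting point described above:

```python
import os

from elevenlabs import VoiceSettings, play
from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

audio = elevenlabs.text_to_speech.convert(
    text="A quick test of the voice settings described above.",
    voice_id="YOUR_VOICE_ID",  # placeholder: any voice from My Voices
    model_id="eleven_multilingual_v2",
    # Mirrors the common starting point: stability ~50, similarity ~75,
    # style exaggeration off, speaker boost on.
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.0,
        use_speaker_boost=True,
    ),
)
play(audio)
```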
### Pronunciation dictionaries
Sometimes you may want to specify the pronunciation of certain words, such as character or brand names, or specify how acronyms should be read. Pronunciation dictionaries allow this functionality by enabling you to upload a lexicon or dictionary file that includes rules about how specified words should be pronounced, either using a phonetic alphabet (phoneme tags) or word substitutions (alias tags).
Phoneme tags are only compatible with "Eleven Flash v2", "Eleven Turbo v2" and "Eleven English v1"
[models](/docs/models).
Whenever one of these words is encountered in a project, the AI will pronounce the word using the specified replacement. When checking for a replacement word in a pronunciation dictionary, the dictionary is checked from start to end and only the first replacement is used.
You can add a pronunciation dictionary to your project from the General tab in Project settings.
For more information on pronunciation dictionaries, please see our [prompting best practices guide](/docs/best-practices/prompting#pronunciation-dictionaries).
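As a rough sketch of what such a file can look like, here is a minimal lexicon in the W3C PLS format with one alias rule and one phoneme rule; the entries and the IPA transcription are purely illustrative:

```python
# Minimal pronunciation lexicon in the W3C PLS format (illustrative entries).
# Save it as a .pls file, then add it to your project from Project settings.
lexicon = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
  <lexeme>
    <grapheme>Acme</grapheme>
    <phoneme>ˈækmi</phoneme>
  </lexeme>
</lexicon>
"""

with open("my_dictionary.pls", "w", encoding="utf-8") as f:
    f.write(lexicon)
```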
### Export settings
Within the **Export** tab under **Project settings** you can add additional metadata such as Title, Author, ISBN and a Description to your project. This information will automatically be added to the downloaded audio files. You can also access previous versions of your project, and enable volume normalization.
## Exporting
When you're happy with your chapter or project, you will need to use the **Export** button to generate a new version that you can download. If you've already generated audio for every paragraph in either your chapter or project, you won't be charged any additional credits to export. If there are any paragraphs that do need converting as part of the export process, you will see a notification of how many credits it will cost to export.
#### Export options
If your project only has one chapter, you will just see the option to export as either MP3 or WAV.
If your project has multiple chapters, you will have the option to export each chapter individually, or export the full project. If you're exporting the full project, you can either export as a single file, or as a ZIP file containing individual files for each chapter. You can also choose whether to download as MP3 or WAV.
#### Quality setting
The quality of the export depends on your subscription plan. For newly created projects, the quality will be:
* Free, Starter and Creator: 128 kbps MP3, or WAV converted from 128 kbps source.
* Pro, Scale, Business and Enterprise plans: 16-bit, 44.1 kHz WAV, or 192 kbps MP3 (Ultra Lossless).
If you have an older project, you may have set the quality setting when you created the project, and this can't be changed. You can check the quality setting for your project in the Export menu by hovering over **Format**.
#### Downloading
Once your export is ready, it will be automatically downloaded.
You can access and download all previous exports, of both chapters and projects, by clicking the **Project options** button and selecting **Exports**.
## FAQ

In Studio, provided you don't change the text or voice, you can regenerate a selected paragraph or section of text twice for free.
If free regenerations are available for the selected paragraph or text, you will see **Regenerate**. If you hover over the **Regenerate** button, the number of free regenerations remaining will be displayed.
Once your free regenerations have been used, the button will display **Generate**, and you will be charged for subsequent generations.
When using **Export** to generate audio for a full chapter or project, auto-regeneration automatically checks the output for volume issues, voice similarity, and mispronunciations. If any issues are detected, the tool will automatically regenerate the audio up to twice, at no extra cost.
This feature may increase the processing time but helps ensure higher quality output for your bulk conversions.
# Dubbing
> Translate audio and video files with ElevenLabs dubbing and dubbing studio.
**Dubbing** allows you to translate content across 29 languages in seconds with voice translation, speaker detection, and audio dubbing.
Automated Dubbing or Video Translation is a process for translating and replacing the original audio of a video with a new language, while preserving the unique characteristics of the original speakers' voices.
We offer two different types of Dubbing: Automatic and Studio.
**Automatic Dubbing** is the default, so let’s walk through the steps for this type of dubbing.

### Step by step for Automatic Dubbing
1. Go to the Dubbing Studio in your Navigation Menu.
2. Enter the dub name and select the original and target languages.
3. Upload a video/audio file or import one via URL (YouTube, TikTok, etc.).
4. Opt for watermarking if needed.
5. Leave the Create a Dubbing Studio project box unchecked.
6. Click on the **Advanced Settings** option:
* Choose the number of speakers and video resolution.
* Select the specific range for dubbing if needed.
7. Click on **Create** and sit back while ElevenLabs does its magic.
**Exercise**: Dub the video found [here](https://www.youtube.com/watch?v=WnNFZt0qjD0) from English to Spanish (or a language of your choosing). Select 6 speakers and keep the watermark.
See the API reference for dubbing.
A Python Flask example app for dubbing.
### Dubbing Studio Project
* This is the advanced Dubbing version, which you can access by checking the **Create a Dubbing Studio project** box. Read more about it in the [Dubbing Studio guide](/docs/product-guides/products/dubbing/dubbing-studio).
# Dubbing Studio
> Fine-grained control over your dubs.
## Create a Dubbing Studio project
1. Check the **Create a Dubbing Studio project** box when creating a dub.

2. Click on **Create Dub**. Once the Dubbing Studio project is created, you will be able to open it.
## Core Concepts
## Speaker Cards

Speaker cards show the original transcription and translation (if you add one) of dialogue from the source video. You can click 'Transcribe Audio' to retranscribe
the original speech, or click the arrow to re-translate an existing transcription.
### Edit Transcripts and Translations
Both transcriptions and translations can be edited freely - just click inside a speaker card and start typing to edit the text.
### Speaker Identification
You can see the name of each speaker in the top left of the speaker card. To change the name of a speaker or reassign a clip to a different speaker,
you'll need to use the Timeline.
## Timeline
The timeline contains many important elements of Dubbing Studio, covered in more detail in different sections below:
### Basic navigation
There are 3 main ways to navigate the timeline:
1. Click and drag the cursor
2. Horizontally scroll
3. Input a specific timecode on the right side of the timeline
### Adjust clips and regenerate audio
1. Drag the handles on the left or right side of a clip to adjust its length.
2. Click the refresh icon to regenerate the audio for that clip.

##### Dynamic vs. Fixed Generations
NOTE: By default, all regenerations in Dubbing Studio are Fixed Generations, which means that the system will keep the duration of the clip fixed regardless
of how much text it contains. This can lead to speech speeding up or slowing down significantly if you adjust the length of a clip without changing the text, or if you add or remove
a large number of words from a clip.
Consider a clip with the phrase 'I'm doing well.' If that clip were set to last 10 seconds and the audio were generated using Fixed Generations, the speech would sound
slow and drawn out.
Alternatively, you can use Dynamic Generations by right clicking a segment and selecting it from the options. This will attempt to adjust the
length of the clip to the length of the text and make the audio sound more natural.
But be careful – using Dynamic Generations could affect sync and timing in your videos. If, for example, you select Dynamic Generation for a clip with many words in it,
and there is not enough room before the next clip for it to properly expand, the audio may not generate properly.

##### Stale Audio
Stale audio refers to audio that needs to be regenerated for one of many reasons (clip length changes, settings changes, transcription/translation changes, etc). You can regenerate stale
clips individually or click 'Generate Stale Audio' to bulk generate all stale audio clips.
##### Clip History
You can right click a clip and select 'Clip History' to view previous generations and select the one that sounds best.
### Split and Merge clips
1. To split a clip, move the cursor to a specific timecode and click 'Split'.
2. To merge two clips, drag the ends of the clips together and click 'Merge.'

As you split and merge clips, the speaker cards above the timeline will update to reflect these changes.
### Reassign clips to different speakers
To reassign a clip to a different speaker, click the segment and drag it to another track.

### Add additional audio tracks
Use the action buttons at the bottom of the timeline to add new audio tracks.

## Voice Settings
### Voice Selection
To select the voice that will be used to generate audio on a specific speaker track, click the settings cog icon on the left side of the timeline near the speaker name.
There are 3 main types of voices to choose from in Dubbing Studio:
1. Clip clone - this creates a unique voice clone for each clip based on the source audio for that clip
2. Track clone - this creates a single voice clone for the whole track based on all source audio for a given speaker
3. Other voices - you can also choose from thousands of voices available in our Voice Library, each with detailed metadata and tags to help you choose the right one
You can also create, save, and reuse a voice from a specific clip by right clicking the clip and selecting 'Create Voice from Selection.'
### Setting Track vs. Clip Level Settings
You can set voice settings at two levels:
1. Track Level - changes will apply across all clips in the track, which can help with stability and consistency.
2. Clip Level - changes will only apply to a specific clip. To set clip-level settings, use the panel on the right side of the timeline.
Disable the 'inherit track settings' toggle and configure your desired settings.

## Exports
Click 'Export' in the bottom right of Dubbing Studio to open the export menu.
Dubbing Studio currently supports the following export formats:
* AAC (audio)
* MP3 (audio)
* WAV (audio)
* .zip of audio tracks
* .zip of audio clips
* AAF (timeline data)
* SRT (subtitles/captions)
* CSV (speaker, start\_time, end\_time, transcription, translation)
Make sure you select the correct language when exporting.
## Additional Features
* **Voiceover Tracks:** Voiceover tracks create new Speakers. You can click and add clips on the timeline wherever you like. After creating a clip, start writing your desired text on the speaker cards above. You'll first need to translate that text, then you can press "Generate". You can also use our voice changer tool by clicking on the microphone icon on the right side of the screen to use your own voice and then change it into the selected voice.
* **SFX Tracks:** Add a SFX track, then click anywhere on that track to create a SFX clip. Similar to our independent SFX feature, simply start writing your prompt in the Speaker card above and click "Generate" to create your new SFX audio. You can lengthen or shorten SFX clips and move them freely around your timeline to fit your project - make sure to press the "stale" button if you do so.
* **Upload Audio:** This option allows you to upload a non-voiced track such as SFX, music, or a background track. Please keep in mind that if voices are present in this track, they won't be detected, so it will not be possible to translate or correct them.
## Manual Dub
In cases where you already have an accurate dubbing script prepared and want to ensure your Dubbing Studio project sticks to your
exact clips and speaker assignment, you can use the Manual Dub option during creation.
To create a Manual Dub, you'll need:
1. Video file
2. Background audio file
3. Foreground audio file
4. CSV where each row contains a speaker, start\_time, end\_time, transcription, and translation field
The CSV file must strictly follow the predefined format in order to be processed correctly. Please see below for samples in the three supported timecode formats:
* seconds
* hours:minutes:seconds:frame
* hours:minutes:seconds,milliseconds
### Example CSV files
```csv seconds
speaker,start_time,end_time,transcription,translation
Adam,"0.10000","1.15000","Hello, how are you?","Hola, ¿cómo estás?"
Adam,"1.50000","3.50000","I'm fine, thank you.","Estoy bien, gracias."
```
```csv hours:minutes:seconds:frame
speaker,start_time,end_time,transcription,translation
Adam,"0:00:01:01","0:00:05:01","Hello, how are you?","Hola, ¿cómo estás?"
Adam,"0:00:06:01","0:00:10:01","I'm fine, thank you.","Estoy bien, gracias."
```
```csv hours:minutes:seconds,milliseconds
speaker,start_time,end_time,transcription,translation
Adam,"0:00:01,000","0:00:05,000","Hello, how are you?","Hola, ¿cómo estás?"
Adam,"0:00:06,000","0:00:10,000","I'm fine, thank you.","Estoy bien, gracias."
```
| speaker | start\_time | end\_time | transcription | translation |
| ------- | ----------- | ----------- | --------------------------------- | -------------------------------------------- |
| Joe | 0:00:00.000 | 0:00:02.000 | Hey! | Hallo! |
| Maria | 0:00:02.000 | 0:00:06.000 | Oh, hi, Joe. It has been a while. | Oh, hallo, Joe. Es ist schon eine Weile her. |
| Joe | 0:00:06.000 | 0:00:11.000 | Yeah, I know. Been busy. | Ja, ich weiß. War beschäftigt. |
| Maria | 0:00:11.000 | 0:00:17.000 | Yeah? What have you been up to? | Ja? Was hast du gemacht? |
| Joe | 0:00:17.000 | 0:00:23.000 | Traveling mostly. | Hauptsächlich gereist. |
| Maria | 0:00:23.000 | 0:00:30.000 | Oh, anywhere I would know? | Oh, irgendwo, das ich kenne? |
| Joe | 0:00:30.000 | 0:00:36.000 | Spain. | Spanien. |
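If you are assembling the CSV programmatically, here is a minimal sketch using Python's standard `csv` module for the first (seconds-based) timecode format, reusing the sample rows above:

```python
import csv

# Rows in the seconds-based timecode format, matching the sample above.
rows = [
    ("Adam", "0.10000", "1.15000", "Hello, how are you?", "Hola, ¿cómo estás?"),
    ("Adam", "1.50000", "3.50000", "I'm fine, thank you.", "Estoy bien, gracias."),
]

with open("manual_dub.csv", "w", newline="", encoding="utf-8") as f:
    f.write("speaker,start_time,end_time,transcription,translation\n")
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)  # quote every data field, as in the samples
    writer.writerows(rows)
```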
# Transcripts
> Using the ElevenLabs Transcript Editor
## Transcript Editor
In the ElevenLabs dashboard, navigate to the Speech to Text page and click any transcript to open the Transcript Editor.

You can rename your transcript in the panel on the right side of the screen.

Our transcript editor is WYSIWYG. Click anywhere in the transcript and start typing to edit the text.
Tip: Use command+z to undo changes easily.

Drag the handles on the timeline to adjust the start and end timestamps for a segment. You can also type in exact timestamps in the panel on the right side of the screen.

To split a segment, click in the text where you want to split and press **Enter.**
To merge two segments, click the 'merge segments' button. Note that two conditions must be fulfilled for a merge to be possible:
1. Both segments must belong to the same speaker
2. The segments must be adjacent to each other

To add a segment, click on the 'Add Segment' icon and select a location on the timeline.
To delete a segment, select the segment and click ‘Delete’ in the panel on the right side of the screen, or press the Delete key.

Click ‘align words’ after making changes to a segment to recompute word-level timestamps.

There are 2 ways to assign segments to different speakers:
1. **Individually**: click the orb next to the speaker name for a segment, and select a new speaker from the dropdown list.
2. **Bulk:** to reassign all segments from one speaker to another, click on the three dots (⋮) and select 'Move Segments To'. Then select the new speaker.

To add a speaker, click the '+' icon above the speaker tracks. To delete a speaker, click on the three dots (⋮) next to the speaker’s track and click ‘Delete.’
**Important note**: if you delete a speaker, all of their associated segments will also be deleted.

Click and drag to reorder speaker tracks on the timeline. You can also change the color of a speaker track (which also applies to all its segments) by clicking the orb next to the speaker name.

You can adjust the playback speed of the source media by clicking the indicator next to the play button.

Click the export button in the top right of the screen and select one of the transcript export formats:
* Plain text
* JSON
* HTML
* SRT
* VTT

## FAQ
Yes – you can add subtitles by clicking the '+' next to 'Subtitles' in the panel on the right
side of the screen.
To learn more about editing subtitles, please see our [Subtitle guide](/docs/product-guides/products/subtitles).
Yes – our Productions team offers human transcription services from \$2.00 per minute of audio. What you get from us:
* Expert review by a native speaker
* Optional 'Verbatim Mode' for maximum coverage of non-verbal sounds (\[cough], \[sigh], etc.) and other environmental sounds and audio events (\[dog barking], \[car horn], etc.)
For more information please see the 'Productions' section of your ElevenLabs account (currently in alpha and available to select users) or contact us at [productions@elevenlabs.io](mailto:productions@elevenlabs.io).
# Subtitles
> Using the ElevenLabs Subtitle Editor
## Subtitle Editor
You can use the subtitling mode of our transcript editor to edit your subtitles. Navigate to the Speech to Text page of your ElevenLabs account and click any transcript to get started.

You can rename your subtitles in the panel on the right side of the screen.

If you didn't add subtitles when creating the transcript, you can do so by clicking the "+" next to "Subtitles" in the panel on the right side of the screen.
You can switch between transcription and subtitling mode at any time using the tabs at the top of the editor.
Tip: you can also add subtitles during the transcript creation process by enabling the 'Include subtitles' toggle.


Our subtitle editor uses red and green colors to give you real-time feedback on whether your subtitles respect formatting rules like characters per line, lines on screen, and cue length.
To edit these rules, click the three dots next to 'Subtitles' in the panel on the right side of the screen and select 'Edit rules'.
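As an illustration of the kind of checks these rules perform, here is a minimal sketch that validates a cue against a characters-per-line limit, a maximum number of lines, and a cue-duration range. The thresholds used are common subtitling defaults, not necessarily the editor's own values:

```python
from dataclasses import dataclass

@dataclass
class Cue:
    start: float       # seconds
    end: float         # seconds
    lines: list[str]   # rendered lines of text

# Illustrative thresholds (common subtitling defaults, not the editor's exact rules).
MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MIN_DURATION = 1.0  # seconds
MAX_DURATION = 7.0  # seconds

def check_cue(cue: Cue) -> list[str]:
    """Return a list of rule violations for a single subtitle cue."""
    problems = []
    if len(cue.lines) > MAX_LINES:
        problems.append(f"too many lines ({len(cue.lines)} > {MAX_LINES})")
    for line in cue.lines:
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(f"line too long ({len(line)} > {MAX_CHARS_PER_LINE} chars)")
    duration = cue.end - cue.start
    if not MIN_DURATION <= duration <= MAX_DURATION:
        problems.append(f"duration {duration:.2f}s outside {MIN_DURATION}-{MAX_DURATION}s range")
    return problems

print(check_cue(Cue(0.0, 0.5, ["This line is fine.", "So is this one."])))
# ['duration 0.50s outside 1.0-7.0s range']
```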


Our subtitle editor is WYSIWYG. Click anywhere and start typing to edit the text.
Tip: Use command+z to undo changes easily.

Drag the handles on the timeline to adjust the start and end timestamps for a cue. You can also type in exact timestamps in the panel on the right side of the screen.
Important: the transcript and subtitles for a video are completely separate from each other. Changes you make to subtitles (e.g. changing cue start/end times, adding/removing words, etc.) do NOT affect the transcription, and vice versa.

To split a cue, click in the text where you want to split and press **Enter.**
To merge two cues, click the 'merge cues' button.

To add a cue, click 'Add cue' and select a location on the timeline.
To delete a cue, select the cue and click ‘Delete’ in the panel on the right side of the screen, or press the Delete key.

You can adjust the playback speed of the source media by clicking the indicator next to the play button.

Click the export button in the top right of the screen and select one of the subtitle export formats:
* SRT
* VTT

## FAQ
Yes – subtitles have specific formatting rules and requirements that do not apply to transcripts.
Below is a summary of some (but not all) of the major differences between the two:
| Feature | Transcripts | Subtitles |
| ------------------------------ | ----------- | ---------------------------------------------------------------- |
| Word-level timestamps | Yes | No - only start/end times of cues |
| Speaker names/labels | Yes | No |
| Constraints | No | Yes - characters per line, lines on screen at once, cue duration |
| Overlapping segments supported | Yes | No |
For more information about transcripts, please see our [Transcripts guide](/docs/product-guides/products/transcripts).
No – transcripts and subtitles are completely separate from each other in our editor. That means that changes you make to one will NOT affect the other.
Yes – our Productions team offers human subtitling services from \$2.50 per minute. What you get from us:
* A subtitling expert edits your subtitles to ensure they adhere to all formatting rules and requirements
* If you choose, our language teams translate your subtitles into different languages
For more information please see the 'Productions' section of your ElevenLabs account (currently in alpha and available to select users) or contact us at [productions@elevenlabs.io](mailto:productions@elevenlabs.io).
# Voice Library
> A guide on how to use voices from the Voice Library.
## Overview
The [Voice Library](https://elevenlabs.io/app/voice-library) is a marketplace where our community can share Professional Voice Clones and earn rewards when others use them. Currently, only Professional Voice Clones can be shared. Instant Voice Clones and voices created with Voice Design are not shareable.
To access the Voice Library, click **Voices** in the sidebar and select **Explore**.
### Finding voices
You can browse the Voice Library in several ways:
#### Handpicked Collections
Our Handpicked Collections highlight top voices across use cases, genres, and languages. These collections are updated regularly to include new standout voices.
#### Search
Use the search bar to find voices by name, keyword, or voice ID. You can also search by uploading or dragging and dropping an audio file. This will help you find the original voice, if available, along with similar voices.
#### Sort options
You can sort voices by:
* Trending: voices ranked by popularity
* Latest: newly added voices
* Most users
* Character usage
#### Filters
Use filters to refine your search:
##### Language
The language filter returns voices that have been trained on a specific language. While all voices can be used with any supported language, voices tagged with a specific language will perform best in that language. Some voices have been assessed as performing well in multiple languages, and these voices will also be returned when you search for a specific language.
##### Accent
When you select a language, the Accent filter will also become available, allowing you to filter for specific accents.
##### Category
Filter voices by their suggested use case:
* Narrative & Story
* Conversational
* Characters & Animation
* Social Media
* Entertainment & TV
* Advertisement
* Informative & Educational
##### Gender
* Male
* Female
* Neutral
##### Age
* Young
* Middle Aged
* Old
##### Notice period
Some voices have a notice period. This is how long you'll continue to have access to the voice if the voice owner decides to remove it from the Voice Library. If the voice's owner stops sharing their voice, you'll receive advance notice through email and in-app notifications. These notifications specify when the voice will become unavailable and recommend similar voices from the Voice Library. If the owner of a voice without a notice period decides to stop sharing their voice, you'll lose access to the voice immediately.
This filter allows you to only return voices that have a notice period, and search for voices with a specific notice period. The maximum notice period is 2 years.
##### Live Moderation enabled
Some voices have Live Moderation enabled. This is indicated with a label with a shield icon. When you generate using a voice with Live Moderation enabled, we use tools to check whether the text being generated belongs to a number of prohibited categories. This may introduce extra latency when using the voice.
This filter allows you to exclude voices that have Live Moderation enabled.
##### Custom rate
Some voices have a credit multiplier in place. This is shown by a \$ icon. This means that the voice owner has set a custom rate for use of their voice. Please pay close attention as voices that have a custom rate will cost more to generate with.
### Using voices from the Voice Library
To use a voice from the Voice Library, you'll need to add it to My Voices. To do this, click the **+** button.
A pop-up will appear which will give you more information about the voice. You can choose to add the voice to an existing personal collection, create a new collection, or add the voice to My Voices without including it in a collection. To confirm, click **Add voice**. This will save it to My Voices using the default name for the voice.
Voices you've added to My Voices will become available for selection in all voice selection menus. You can also use a voice directly from My Voices by clicking the **T** button, which will open Text to Speech with the voice selected.
### My Voices
You can find all the voices you've created yourself, as well as voices you've saved from the Voice Library, in **My Voices**.
You will see the following information about each voice:
* the language it was trained on.
* the category, for example, "Narrative & Story".
* how long the notice period is, if the voice has one.
The voice type is indicated by an icon:
* Yellow tick: Professional Voice Clone.
* Black tick: High Quality Professional Voice Clone.
* Lightning icon: Instant Voice Clone.
* || icon: ElevenLabs Default voice.
* No icon: voice created with Voice Design.
#### More actions
Click **More actions** (three dots) to:
* Copy voice ID: copies the voice ID to your clipboard.
* Edit voice: allows you to change the name and description of the voice. These changes are only visible to you.
* Share voice: generates a link which you can share with others. When they use the link, the voice will be added to My Voices for their account.
* View history: view your previous Text to Speech generations using this voice.
* Delete voice: deleting voices is permanent and you will be asked to confirm the deletion.
#### Collections
To help organize voices you've saved, you can create your own collections and add voices to them.
To create a new collection, click **Collections** and select **Create collection**. Give your new collection a name, and choose from the available icons.
To add individual voices to a collection, click **More actions** (three dots) and select **Add to collection**. You can choose to add the voice to an existing collection, or create a new one.
#### Select multiple voices
You can **Shift + Click** to select multiple voices at once.
#### Drag and drop voices
Both individual voices and multiple voice selections can also be dragged to **Collections** and added to an existing collection, or deleted by dragging them to the **trash can** icon.
### Sharing a Professional Voice Clone
In [My Voices](https://elevenlabs.io/app/voice-lab) find your voice and click **More actions**
(three dots), then select **Share voice**.
In the pop-up, enable the **Sharing** toggle.
For private sharing, copy the sharing link. This will allow other users to save your voice to their account.
You can restrict access to specific users by adding emails to the **Allowlist**. If this is left blank, all users with the link will be able to access your voice.
To share publicly, enable **Publish to the Voice Library**. This doesn’t make your voice automatically discoverable.

Before proceeding with the sharing process, you'll have a number of options including setting a notice period and enabling Live Moderation. Please see the [Voice Library Addendum](https://elevenlabs.io/vla) to our [Terms of Service](https://elevenlabs.io/terms) for more information about these options.
You also have the option to select a custom voice preview. Any generations you've made of 70-150 characters will be available to select. If you don't see any options in the selection menu, there are no eligible generations available.

Enter a name and description for your voice.
Make sure the name you give your voice follows our **naming guidelines**:
#### Naming guidelines
* The naming pattern should be a one-word name followed by a 2-4 word description, separated by a hyphen (-).
* Your name should NOT include the following:
* Names of public individuals or entities (company names, band names, influencers or famous people, etc).
* Social handles (Twitter, Instagram, you name it, etc).
* ALL CAPS WORDS.
* Emojis and any other non-letter characters.
* Explicit or harmful words.
* The word “voice”.
* Some examples of names following our guidelines:
* Anna - calm and kind
* Robert - friendly grandpa
* Steve - wise teacher
* Harmony - soothing serenader
* Jasper - jovial storyteller
* Maya - confident narrator
Set labels (language, accent, gender, age, use case, tone, and style) to help others find your
voice.
Review and accept the [Voice Library Addendum](https://elevenlabs.io/terms#VLA) to our [Terms of
Service](https://elevenlabs.io/terms) and provide the required consents and confirmations. Please
do this carefully and ensure you fully understand our service before sharing. If you have any
questions at this stage, you can reach out to us at [legal@elevenlabs.io](mailto:legal@elevenlabs.io).
After submission, your voice will be reviewed by our team. If minor adjustments are needed, we may make these for you. Your request to share your voice may be declined if it doesn't meet our guidelines, and repeated uploads that consistently violate our guidelines may lead to restrictions on uploading and sharing voices.
We currently do not have an estimate for the review time, as it depends on the queue.
# Voice Cloning
> Learn how to clone your voice using our best-in-class models.
## Overview
When cloning a voice, there are two main options: Instant Voice Cloning (IVC) and Professional Voice Cloning (PVC). IVC is a quick and easy way to clone your voice, while PVC is a more accurate and customizable option.
## Instant Voice Cloning

IVC allows you to create voice clones from shorter samples near instantaneously. Creating an instant voice clone does not train or create a custom AI model. Instead, it relies on prior knowledge from training data to make an educated guess rather than training on the exact voice. This works extremely well for a lot of voices.
However, the biggest limitation of IVC appears when you are trying to clone a very unique voice with a very unique accent, where the AI might not have heard a similar voice before during training. In such cases, creating a custom model with explicit training using PVC might be the best option.
## Professional Voice Cloning

A PVC is a special feature that is available to our Creator+ plans. PVC allows you to train a hyper-realistic model of a voice. This is achieved by training a dedicated model on a large set of voice data to produce a model that’s indistinguishable from the original voice.
Since the custom models require fine-tuning and training, it will take a bit longer to train these PVCs compared to an IVC. Giving an estimate is challenging as it depends on the number of people in the queue before you and a few other factors.
Here are the current estimates for PVC:
* **English:** \~3 hours
* **Multilingual:** \~6 hours
## Beginner's guide to audio recording
If you're new to audio recording, here are some tips to help you get started.
### Recording location
When recording audio, choose a suitable location and set it up to minimize room echo/reverb.
We want to "deaden" the room as much as possible. This is precisely what an acoustically treated vocal booth is made for, and if you do not have a vocal booth readily available, you can experiment with some ideas for a DIY vocal booth, “blanket fort”, or closet.
Here are a few YouTube examples of DIY acoustics ideas:
* [I made a vocal booth for \$0.00!](https://www.youtube.com/watch?v=j4wJMDUuHSM)
* [How to Record GOOD Vocals in a BAD Room](https://www.youtube.com/watch?v=TsxdHtu-OpU)
* [The 5 BEST Vocal Home Recording TIPS!](https://www.youtube.com/watch?v=K96mw2QBz34)
### Microphone, pop-filter, and audio interface
A good microphone is crucial. Microphones can range from \$100 to \$10,000, but a professional XLR microphone costing \$150 to \$300 is sufficient for most voiceover work.
For an affordable yet high-quality setup for voiceover work, consider a **Focusrite** interface paired with an **Audio-Technica AT2020** or **Rode NT1 microphone**. This setup, costing between \$300 to \$500, offers high-quality recording suitable for professional use, with minimal self-noise for clean results.
Please ensure that you have a proper **pop-filter** in front of the microphone when recording to avoid plosives as well as breaths and air hitting the diaphragm/microphone directly, as it will sound poor and will also cause issues with the cloning process.
### Digital Audio Workstation (DAW)
There are many different recording solutions out there that all accomplish the same thing: recording audio. However, they are not all created equally. As long as they can record WAV files at 44.1 kHz or 48 kHz with a bit depth of at least 24 bits, they should be fine. You don't need any fancy post-processing, plugins, denoisers, or anything else, because we want to keep audio recording simple.
If you want a recommendation, we would suggest something like **REAPER**, which is a fantastic DAW with a tremendous amount of flexibility. It is the industry standard for a lot of audio work. Another good free option is **Audacity**.
Maintain optimal recording levels (not too loud or too quiet) to avoid digital distortion and excessive noise. Aim for peaks of -6 dB to -3 dB and an average loudness of -18 dB for voiceover work, ensuring clarity while minimizing the noise floor. Monitor closely and adjust levels as needed for the best results based on the project and recording environment.
### Positioning
One helpful guideline to follow is to maintain a distance of about two fists away from the microphone, which is approximately 20cm (7-8 in), with a pop filter placed between you and the microphone. Some people prefer to position the pop filter all the way back so that they can press it up right against it. This helps them maintain a consistent distance from the microphone more easily.
Another common technique to avoid directly breathing into the microphone or causing plosive sounds is to speak at an angle. Speaking at an angle ensures that exhaled air is less likely to hit the microphone directly and, instead, passes by it.
### Performance
The performance you give is one of the most crucial aspects of this entire recording session. The AI will try to clone everything about your voice to the best of its ability, which is very high. This means that it will attempt to replicate your cadence, tonality, performance style, the length of your pauses, whether you stutter, take deep breaths, sound breathy, or use a lot of "uhms" and "ahs" – it can even replicate those. Therefore, what we want in the audio file is precisely the performance and voice that we want to clone, nothing less and nothing more. That is also why it's quite important to find a script that you can read that fits the tonality we are aiming for.
When recording for AI, it is very important to be consistent. If you are recording a voice, either keep it very animated throughout or keep it very subdued throughout; you can't mix and match, or the AI can become unstable because it doesn't know which part of the voice to clone. The same applies to accents: keep the same accent throughout the recording. Consistency is key to a proper clone!
# Instant Voice Cloning
> Learn how to clone your voice instantly using our best-in-class models.
## Creating an Instant Voice Clone
When cloning a voice, it's important to consider what the AI has been trained on: which languages and what type of dataset. You can find more information about which languages each model has been trained on in our [help center](https://help.elevenlabs.io/hc/en-us/articles/17883183930129-What-models-do-you-offer-and-what-is-the-difference-between-them).
Read more about each individual model and its strengths on the [Models page](/docs/models).
## Guide
If you are unsure about what is permissible from a legal standpoint, please consult the [Terms of
Service](https://elevenlabs.io/terms-of-use) and our [AI Safety
information](https://elevenlabs.io/safety) for more information.
### Navigate to the Instant Voice Cloning page
In the ElevenLabs dashboard, select the "Voices" section on the left, then click "Add a new voice".
From the modal, select "Instant Voice Clone".
### Upload or record your audio
Follow the on-screen instructions to upload or record your audio.
### Confirm voice details
Name and label your voice clone, confirm that you have the right and consent to clone the voice, then click "Save voice".
### Use your voice clone
Under the "Voices" section in the dashboard, select the "Personal" tab, then click on your voice clone to begin using it.
## Best practices
#### Record at least 1 minute of audio
Avoid recording more than 3 minutes; this will yield little improvement and can, in some cases, even be detrimental to the clone.
How the audio was recorded is more important than the total length (total runtime) of the samples. The number of samples you use doesn't matter; it is the total combined length (total runtime) that is the important part.
Approximately 1-2 minutes of clear audio without any reverb, artifacts, or background noise of any kind is recommended. When we speak of "audio or recording quality," we do not mean the codec, such as MP3 or WAV; we mean how the audio was captured. However, regarding audio codecs, using MP3 at 128 kbps and above is advised. Higher bitrates don't have a significant impact on the quality of the clone.
#### Keep the audio consistent
The AI will attempt to mimic everything it hears in the audio. This includes the speed of the person talking, the inflections, the accent, tonality, breathing pattern and strength, as well as noise and mouth clicks. Even noise and artifacts, which can confuse the AI, are factored in.
Ensure that the voice maintains a consistent tone throughout, with a consistent performance. Also, make sure that the audio quality of the voice remains consistent across all the samples. Even if you only use a single sample, ensure that it remains consistent throughout the full sample. Feeding the AI audio that is very dynamic, meaning wide fluctuations in pitch and volume, will yield less predictable results.
#### Replicate your performance
Another important thing to keep in mind is that the AI will try to replicate the performance of the voice you provide. If you talk in a slow, monotone voice without much emotion, that is what the AI will mimic. On the other hand, if you talk quickly with much emotion, that is what the AI will try to replicate.
It is crucial that the voice remains consistent throughout all the samples, not only in tone but also in performance. If there is too much variance, it might confuse the AI, leading to more varied output between generations.
#### Find a good balance for the volume
Find a good balance for the volume so the audio is neither too quiet nor too loud. The ideal would be between -23 dB and -18 dB RMS with a true peak of -3 dB.
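If you want a rough way to sanity-check your levels before uploading, the sketch below estimates the RMS and sample-peak loudness of a 16-bit PCM WAV file using only the Python standard library. Note that sample peak is only an approximation of true peak, and the targets in the final line mirror the guidance above:

```python
import math
import struct
import wave

def measure_levels(path: str) -> tuple[float, float]:
    """Return (rms_dbfs, sample_peak_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("This sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())
    samples = struct.unpack(f"<{len(frames) // 2}h", frames)

    def to_db(value: float) -> float:
        return 20 * math.log10(value) if value > 0 else float("-inf")

    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale
    return to_db(rms), to_db(peak)

rms_db, peak_db = measure_levels("sample.wav")
print(f"RMS: {rms_db:.1f} dBFS (aim for -23 to -18), peak: {peak_db:.1f} dBFS (aim for -3 or below)")
```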
# Professional Voice Cloning
> Learn how to clone your voice professionally using our best-in-class models.
## Creating a Professional Voice Clone
When cloning a voice, it's important to consider what the AI has been trained on: which languages and what type of dataset. You can find more information about which languages each model has been trained on in our [help center](https://help.elevenlabs.io/hc/en-us/articles/17883183930129-What-models-do-you-offer-and-what-is-the-difference-between-them).
Read more about each individual model and its strengths on the [Models page](/docs/models).
## Guide
If you are unsure about what is permissible from a legal standpoint, please consult the [Terms of
Service](https://elevenlabs.io/terms-of-use) and our [AI Safety
information](https://elevenlabs.io/safety) for more information.
### Navigate to the Professional Voice Cloning page
In the ElevenLabs dashboard, select the **Voices** section on the left, then click **Add a new voice**.
From the pop-up, select **Professional Voice Clone**.
### Upload your audio

Upload your audio samples by clicking **Upload samples**.
If you don't already have pre-recorded training audio, you can also record directly into the interface by selecting **Record yourself**. We've included sample scripts for narrative, conversational and advertising purposes. You can also upload your own script.
### Check the feedback on sample length
Once your audio has been uploaded, you will see feedback on the length of your samples. For the best results, we recommend uploading at least an hour of training audio, and ideally as close to three hours as possible.

### Process your audio
Once your audio samples have been uploaded, you can process them to improve the quality. You can remove any background noise, and you can also separate out different speakers, if your audio includes more than one speaker. To access these options, click the **Audio settings** button next to the clip you want to process.

### Verify your voice
Once everything is recorded and uploaded, you will be asked to verify your voice. To ensure a smooth experience, please try to verify your voice using the same or similar equipment used to record the samples and in a tone and delivery that is similar to those present in the samples. If you do not have access to the same equipment, try verifying the best you can. If it fails, you can either wait 24 hours to try verification again, or reach out to support for help.
### Wait for your voice to complete fine tuning
Before you can use your voice, it needs to complete the fine tuning process. You can check the status of your voice in My Voices while it's processing. You'll be notified when it's ready to use.
### Use your voice clone
Under the **Voices** section in the dashboard, select the **Personal** tab, then click **Use** next to your voice clone to begin using it.
There are a few things to be mindful of before you start uploading your samples, and some steps that you need to take to ensure the best possible results.
### Record high quality audio
Professional Voice Cloning is highly accurate in cloning the samples used for its training. It will create a near-perfect clone of what it hears, including all the intricacies and characteristics of that voice, but also including any artifacts and unwanted audio present in the samples. This means that if you upload low-quality samples with background noise, room reverb/echo, or any other type of unwanted sounds like music or multiple people speaking, the AI will try to replicate all of these elements in the clone as well.
### Ensure there's only a single speaking voice
Make sure there's only a single speaking voice throughout the audio, as more than one speaker, excessive noise, or anything else mentioned above can confuse the AI. This confusion can result in the AI being unable to discern which voice to clone, or misinterpreting what the voice actually sounds like because it is being masked by other sounds, leading to a less-than-optimal clone.
### Provide enough material
Make sure you have enough material to clone the voice properly. The bare minimum we recommend is 30 minutes of audio, but for the optimal result and the most accurate clone, we recommend closer to 2-3 hours of audio. The more audio provided, the better the quality of the resulting clone.
### Keep the style consistent
The speaking style in the samples you provide will be replicated in the output, so depending on what delivery you are looking for, the training data should correspond to that style (e.g. if you are looking to voice an audiobook with a clone of your voice, the audio you submit for training should be a recording of you reading a book in the tone of voice you want to use). It is better to just include one style in the uploaded samples for consistency's sake.
### Use samples speaking the language you want the PVC to be used for
It is best to use samples where you are speaking the language that the PVC will mainly be used for. Of course, the AI can speak any language that we currently support. However, it is worth noting that if the voice itself is not native to the language you want the AI to speak - meaning you cloned a voice speaking a different language - it might have an accent from the original language and might mispronounce words and inflections. For instance, if you clone a voice speaking English and then want it to speak Spanish, it will very likely have an English accent when speaking Spanish. We only support cloning samples recorded in one of our supported languages, and the application will reject your sample if it is recorded in an unsupported language.
See the examples below for what to expect from a good and bad recording.
For now, we only allow you to clone your own voice. You will be asked to go through a verification process before submitting your fine-tuning request.
## Tips and suggestions
#### Professional Recording Equipment
Use high-quality recording equipment for optimal results as the AI will clone everything about the audio. High-quality input = high-quality output. Any microphone will work, but an XLR mic going into a dedicated audio interface would be our recommendation. A few general recommendations on low-end would be something like an Audio Technica AT2020 or a Rode NT1 going into a Focusrite interface or similar.
#### Use a Pop-Filter
Use a pop-filter when recording. This will minimize plosives.
#### Microphone Distance
Position yourself at the right distance from the microphone - approximately two fists away from the mic is recommended, but it also depends on what type of recording you want.
#### Noise-Free Recording
Ensure that the audio input doesn't have any interference, like background music or noise. The AI cloning works best with clean, uncluttered audio.
#### Room Acoustics
Preferably, record in an acoustically-treated room. This reduces unwanted echoes and background noises, leading to clearer audio input for the AI. You can make something temporary using a thick duvet or quilt to dampen the recording space.
#### Audio Pre-processing
Consider editing your audio beforehand if you're aiming for a specific sound from the AI. For instance, if you want a polished, podcast-like output, pre-process your audio to match that quality, and remove long pauses or frequent "uhm"s and "ahm"s between words, as the AI will mimic those as well.
#### Volume Control
Maintain a consistent volume that's loud enough to be clear but not so loud that it causes distortion. The goal is to achieve a balanced and steady audio level. The ideal would be between -23dB and -18dB RMS with a true peak of -3dB.
#### Sufficient Audio Length
Provide at least 30 minutes of high-quality audio that follows the above guidelines for best results - preferably closer to 2+ hours of audio. The more quality data you can feed into the AI, the better the voice clone will be. The number of samples is irrelevant; the total runtime is what matters. However, if you plan to upload multiple hours of audio, it is better to split it into multiple \~30-minute samples. This makes it easier to upload.
# Voice design
> A guide on how to craft voices from a text prompt.
## Overview
Voice Design helps creators fill the gaps when the exact voice they are looking for isn’t available in the [Voice Library](/app/voice-library). If you can’t find a suitable voice for your project, you can create one. Note that Voice Design is highly experimental and [Professional Voice Clones (PVC)](/docs/product-guides/voices/voice-cloning) are currently the highest quality voices on our platform. If there is a PVC available in our library that fits your needs, we recommend using it instead.
You can find Voice Design by heading to Voices -> My Voices -> Add a new voice -> Voice Design in the [ElevenLabs app](/app/voice-lab?create=true\&creationType=voiceDesign) or via the [API](/docs/api-reference/text-to-voice).
When you hit generate, we'll generate three voice options for you. The only charge for using Voice Design is the credits used to generate your preview text; you are charged only once, even though three samples are generated for you. You can see the number of characters that will be deducted in the "Text to preview" text box.
After generating, you'll have the option to select and save one of the generations, which will take up one of your voice slots.
See the API reference for Voice Design
A Next.js example app for Voice Design
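The same flow is available programmatically: generate previews from a description, then save the one you like to a voice slot. The sketch below is a rough example, assuming the Python SDK exposes the Text to Voice endpoints as `text_to_voice.create_previews` and `text_to_voice.create_voice_from_preview`; exact method and field names may differ between SDK versions, so check the API reference above.

```python
# Minimal Voice Design sketch. Method and field names follow one recent version
# of the Python SDK and may differ in yours; see the Text to Voice API reference.
import base64
import os

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

description = "A calm male narrator in his 40s with a soft Irish lilt. Perfect audio quality."

# Generate three preview voices from the description.
previews = elevenlabs.text_to_voice.create_previews(
    voice_description=description,
    text=(
        "It's been quiet around here lately. I've had time to think, to breathe, "
        "and maybe that's exactly what I needed most after everything that happened this year."
    ),
)

# Save the first preview to disk so you can listen to it (audio is base64-encoded).
first = previews.previews[0]
with open("preview_0.mp3", "wb") as f:
    f.write(base64.b64decode(first.audio_base_64))

# Once you've picked a preview, save it to one of your voice slots.
voice = elevenlabs.text_to_voice.create_voice_from_preview(
    voice_name="Calm Irish Narrator",
    voice_description=description,
    generated_voice_id=first.generated_voice_id,
)
print(voice.voice_id)
```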
## Prompting guide
The prompt is the foundation of your voice. It tells the model what kind of voice you’re trying to create — everything from the accent and character-type to the gender and vibe of the voice. A well-crafted prompt can be the difference between a generic voice and one that truly fits your vision. In general, more descriptive and granular prompts tend to yield more accurate and nuanced results. The more detail you provide — including age, gender, tone, accent, pacing, emotion, style, and more - the better the model can interpret and deliver a voice that feels intentional and tailored.
However, sometimes short and simple prompts can also work, especially when you're aiming for a more neutral or broadly usable voice. For example, “A calm male narrator” might give you exactly what you need without going into detail — particularly if you're not trying to create a very specific character or style. The right level of detail depends on your use case. Are you building a fantasy character? A virtual assistant? A tired New Yorker in her 60s with a dry sense of humor? The more clearly you define it in your prompt, the closer the output will be to what you're imagining.
### Audio Quality
Audio quality refers to the clarity, cleanliness, and overall fidelity of the generated voice. By default, ElevenLabs aims to produce clean and natural-sounding audio — but if your project requires a specific level of quality, it's best to explicitly include it in your prompt.
For high-quality results, you can help the model by adding a phrase such as “**perfect audio quality**” or “**studio-quality recording**” to your voice description. This helps ensure the voice is rendered with maximum clarity, minimal distortion, and a polished finish.
Including these types of phrases can sometimes reduce the accuracy of the prompt in general if the
voice is very specific or niche.
There may also be creative cases where lower audio quality is intentional, such as when simulating a phone call, old radio broadcast, or found footage. In those situations, either leave out quality descriptors entirely or explicitly include phrases like:
* “Low-fidelity audio”
* “Poor audio quality”
* “Sounds like a voicemail”
* “Muffled and distant, like on an old tape recorder”
The placement of this phrase in your prompt is flexible; we've found it works well at either the beginning or the end.
### Age, Tone/Timbre and Gender
These three characteristics are the foundation of voice design, shaping the overall identity and emotional resonance of the voice. The more detail you provide, the easier it is for the AI to produce a voice that fits your creative vision — whether you're building a believable character, crafting a compelling narrator, or designing a virtual assistant.
#### Age
Describing the perceived age of the voice helps define its maturity, vocal texture, and energy. Use specific terms to guide the AI toward the right vocal quality.
**Useful descriptors:**
* “Adolescent male” / “adolescent female”
* “Young adult” / “in their 20s” / “early 30s”
* “Middle-aged man” / “woman in her 40s”
* “Elderly man” / “older woman” / “man in his 80s”
#### Tone/Timbre
Refers to the physical quality of the voice, shaped by pitch, resonance, and vocal texture. It’s distinct from emotional delivery or attitude.
**Common tone/timbre descriptors:**
* “Deep” / “low-pitched”
* “Smooth” / “rich”
* “Gravelly” / “raspy”
* “Nasally” / “shrill”
* “Airy” / “breathy”
* “Booming” / “resonant”
* “Light” / “thin”
* “Warm” / “mellow”
* “Tinny” / “metallic”
#### Gender
Gender typically influences pitch, vocal weight, and tonal presence — but you can push beyond simple categories by describing the sound instead of the identity.
**Examples:**
* “A lower-pitched, husky female voice”
* “A masculine male voice, deep and resonant”
* “A neutral gender — soft and mid-pitched”
#### Accent
Accent plays a critical role in defining a voice’s regional, cultural, and emotional identity. If your project depends on an authentic sound — whether it's grounded in realism or stylized for character — being clear and deliberate about the desired accent is essential.
Phrase choice matters - certain terms tend to produce more consistent results. For example, “thick” often yields better results than “strong” when describing how prominent an accent should be. There is a lot of trial and error involved, and we encourage you to experiment with the wording and to be as creative and descriptive as possible.
* **Examples of clear accent prompts:**
* “A middle-aged man with a thick French accent”
* “A young woman with a slight Southern drawl”
* “An old man with a heavy Eastern European accent”
* “A cheerful woman speaking with a crisp British accent”
* “A younger male with a soft Irish lilt”
* “An authoritative voice with a neutral American accent”
* “A man with a regional Australian accent, laid-back and nasal”
Avoid overly vague descriptors like “foreign” or “exotic” — they’re imprecise and can produce inconsistent results.
Combine accent with other traits like tone, age, or pacing for better control. E.g., “A sarcastic old woman with a thick New York accent, speaking slowly.”
For fantasy or fictional voices, you can suggest real-world accents as inspiration:
* “An elf with a proper thick British accent. He is regal and lyrical.”
* “A goblin with a raspy Eastern European accent.”
### Pacing
Pacing refers to the speed and rhythm at which a voice speaks. It's a key component in shaping the personality, emotional tone, and clarity of the voice. Being explicit about pacing is essential, especially when designing voices for specific use cases like storytelling, advertising, character dialogue, or instructional content.
Use clear language to describe how fast or slow the voice should speak. You can also describe how the pacing feels — whether it's steady, erratic, deliberate, or breezy. If the pacing shifts, be sure to indicate where and why.
**Examples of pacing descriptors:**
* “Speaking quickly” / “at a fast pace”
* “At a normal pace” / “speaking normally”
* “Speaking slowly” / “with a slow rhythm”
* “Deliberate and measured pacing”
* “Drawn out, as if savoring each word”
* “With a hurried cadence, like they’re in a rush”
* “Relaxed and conversational pacing”
* “Rhythmic and musical in pace”
* “Erratic pacing, with abrupt pauses and bursts”
* “Even pacing, with consistent timing between words”
* “Staccato delivery”
### Text to preview
Once you've written a strong voice prompt, the text you use to preview that voice plays a crucial role in shaping how it actually sounds. The preview text acts like a performance script — it sets the tone, pacing, and emotional delivery that the voice will attempt to match.
To get the best results, your preview text should complement the voice description, not contradict it. For example, if your prompt describes a “calm and reflective younger female voice with a slight Japanese accent,” using a sentence like “Hey! I can't stand what you’ve done with the darn place!!!” will clash with that intent. The model will try to reconcile that mismatch, often leading to unnatural or inconsistent results.
Instead, use sample text that reflects the voice’s intended personality and emotional tone. For the example above, something like “It’s been quiet lately... I’ve had time to think, and maybe that’s what I needed most.” supports the prompt and helps generate a more natural, coherent voice.
Additionally, we’ve found that longer preview texts tend to produce more stable and expressive results. Short phrases can sometimes sound abrupt or inconsistent, especially when testing subtle qualities like tone or pacing. Giving the model more context — a full sentence or even a short paragraph — allows it to deliver a smoother and more accurate representation of the voice.
### Parameters
#### Loudness
Controls the volume of the Text to Preview generation, and ultimately the voice once saved.
#### Guidance Scale
Dictates how closely the prompt is adhered to. Higher values will stick to the prompt more strictly, but could result in poorer audio quality if the prompt is very niche, while lower values allow the model to be more creative at the cost of prompt accuracy. Use a lower value if performance and audio quality are more important than nailing the prompt; higher values are recommended when accent or tone accuracy is of paramount importance.
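If you're generating previews through the API rather than the UI, these parameters can typically be supplied alongside the prompt. The snippet below is a sketch only; the `loudness` and `guidance_scale` parameter names and their ranges are assumptions based on the Text to Voice API reference and may differ in your SDK version.

```python
# Sketch: supplying Loudness and Guidance Scale when generating previews.
# Parameter names and ranges here are assumptions; verify them against the
# Text to Voice API reference for your SDK version.
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

previews = elevenlabs.text_to_voice.create_previews(
    voice_description=(
        "An older woman with a thick Southern accent. She is sweet and sarcastic."
    ),
    text=(
        "Well sugar, if all we ever do is chase titles and trophies, we're gonna "
        "miss the whole darn point. I'd rather build somethin' that makes folks' "
        "lives easier, and do it with a smile and a touch of sass."
    ),
    loudness=0.5,       # relative output volume of the generated previews
    guidance_scale=35,  # higher values follow the prompt more strictly
)
```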
### Attributes and Examples
Experiment with the way in which these descriptors are written. For example, “Perfect audio
quality” can also be written as “the audio quality is perfect”. These can sometimes produce
different results!
| Attribute | Examples |
| ---------------------- | ------------------------------------------------------------------------------------------------ |
| Age | Young, younger, adult, old, elderly, in his/her 40s |
| Accent | "thick" Scottish accent, "slight" Asian-American accent, Southern American accent |
| Gender | Male, female, gender-neutral, ambiguous gender |
| Tone/Timbre/pitch | Deep, warm, gravelly, smooth, shrill, buttery, raspy, nasally, throaty, harsh, robotic, ethereal |
| Pacing | Normal cadence, fast-paced, quickly, slowly, drawn out, calm pace, natural/conversational pace |
| Audio Quality | Perfect audio quality, audio quality is 'ok', poor audio quality |
| Character / Profession | Pirate, businessman, farmer, politician, therapist, ogre, godlike being, TV announcer |
| Emotion | Energetic, excited, sad, emotional, sarcastic, dry |
| Pitch | Low-pitched, high-pitched, normal pitch |
### Example Prompts and Text Previews
| Voice Type | Prompt/Description | Text Preview | Guidance Scale |
| :----------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------- |
| Female Sports Commentator | A high-energy female sports commentator with a thick British accent, passionately delivering play-by-play coverage of a football match in a very quick pace. Her voice is lively, enthusiastic, and fully immersed in the action. | OH MY WORD — WHAT A GOAL! She picks it up just past midfield, dances through TWO defenders like they're not even THERE, and absolutely SMASHES it into the top corner! The goalkeeper had NO CHANCE! That is WORLD-CLASS from the young forward, and the crowd is on their FEET! This match has come ALIVE, and you can FEEL the momentum SHIFTING! | 25% |
| Drill Sergeant | An army drill sergeant shouting at his team of soldiers. He sounds angry and is speaking at a fast pace. | LISTEN UP, you sorry lot! I didn't come here to babysit — I came to BUILD SOLDIERS! You move when I say move, and you breathe when I say breathe! You've got ten seconds to fall in line or you'll regret it!! | 25% |
| Evil Ogre | A massive evil ogre speaking at a quick pace. He has a silly and resonant tone. | "Your weapons are but toothpicks to me. \[laughs] Surrender now and I may grant you a swift end. I've toppled kingdoms and devoured armies. What hope do you have against me?" | 30% |
| Relatable British Entrepreneur | Excellent audio quality. A man in his 30s to early 40s with a thick British accent speaking at a natural pace like he's talking to a friend. | \[laughs] See, that's the thing. YOU see a company, while I see... \[lip smacks] I see a promise, ya know what I mean? \[exhales] We don't build just to profit, we build to, to UPLIFT! If our technology doesn't leave the world kinder, smarter, and more connected than we found it… \[sighs] then what are we even doing here? | 40% |
| Southern Woman | An older woman with a thick Southern accent. She is sweet and sarcastic. | "Well sugar, if all we ever do is chase titles and trophies, we're gonna miss the whole darn point. \[light chuckle] I'd rather build somethin' that makes folks' lives easier—and if I can do it in heels with a smile and a touch of sass, even better." | 35% |
| Movie Trailer Voice | Dramatic voice, used to build anticipation in movie trailers, typically associated with action or thrillers | "In a world on the brink of chaos, one hero will rise. Prepare yourself for a story of epic proportions, coming soon to the big screen." | 20% |
| Squeaky Mouse | A cute little squeaky mouse | "I may be small, but my attitude is anything but! \[giggles] Watch it, big feet, or I'll give your toes a nibble you won't forget!" | 20% |
| Angry Pirate | An angry old pirate, loud and boisterous | "I've faced storms that would turn your hair white and sea monsters that would make your knees quake. You think you can cross Captain Blackheart and live to tell the tale?" | 30% |
| New Yorker | Deep, gravelly thick New York accent, tough and world-weary, often cynical | "I've been walking these streets longer than you can imagine, kid. There's nothing you can say or do that'll surprise me anymore." | 40% |
| Mad Scientist | A voice of an eccentric scientific genius with rapid, erratic speech patterns that accelerate with excitement. His German-tinged accent becomes more pronounced when agitated. The pitch varies widely from contemplative murmurs to manic exclamations, with frequent eruptions of maniacal laughter. | "I am doctor Heinrich, revolutionary genius rejected by the narrow-minded scientific establishment! Bah! They called my theories impossible, my methods unethical—but who is laughing now? (maniacal laughter) For twenty years I have perfected my creation in this mountain laboratory, harnessing energies beyond mortal comprehension! The fools at the academy will regret their mockery when my quantum destabilizer reveals the multiverse. Or perhaps new life forms... the experiment has certain unpredictable variables... FASCINATING ones!" | 38% |
# Payouts
> Earn rewards by sharing your voice in the Voice Library.
## Overview
The [Payouts](https://elevenlabs.io/payouts) system allows you to earn rewards for sharing voices in the [Voice library](/docs/product-guides/voices/voice-library). ElevenLabs uses Stripe Connect to process reward payouts.
## Account setup
To set up your Payouts account:
* Click on your account in the bottom left and select ["Payouts"](/app/payouts).

* Follow the prompts from Stripe Connect to complete the account setup.
## Tracking usage and earnings
* You can track the usage of your voices by going to ["My Voices"](/app/voice-lab), clicking "View" to open the detailed view for your voice, then clicking the sharing icon at the bottom. Once you have the Sharing Options open, click "View Metrics".
* The rewards you earn are based on the options you selected when [sharing your voice in the Voice Library](/docs/product-guides/voices/voice-library#sharing-voices).
* You can also see your all-time earnings and past payouts by going back to your Payouts page.
## Reader app rewards
* If your voice is marked as **[High-Quality](/docs/product-guides/voices/voice-library#category)** and you have activated the "Available in ElevenReader" toggle, your voice will be made available in the [ElevenReader app](/text-reader). Rewards for ElevenReader are reported separately – to view your Reader App rewards, check the "ElevenReader" box on your "View Metrics" screen.
## Things to know
* Rewards accumulate frequently throughout the day, but payouts typically happen once a week as long as you have an active paid subscription and your accrued payouts exceed the minimum threshold. In most cases this is \$10, but some countries may have a higher threshold.
* You can see your past payouts by going to your [Payouts](/app/payouts) page in the sidebar.
## Supported countries
* Currently, Stripe Connect is not supported in all countries. We are constantly working to expand our reach for Payouts and plan to add availability in more countries when possible.
- Argentina
- Australia
- Austria
- Belgium
- Bulgaria
- Canada
- Chile
- Colombia
- Croatia
- Cyprus
- Czech Republic
- Denmark
- Estonia
- Finland
- France
- Germany
- Greece
- Hong Kong SAR, China
- Hungary
- Iceland
- India
- Indonesia
- Ireland
- Israel
- Italy
- Japan
- Latvia
- Liechtenstein
- Lithuania
- Luxembourg
- Malaysia
- Malta
- Mexico
- Monaco
- Netherlands
- New Zealand
- Nigeria
- Norway
- Peru
- Philippines
- Poland
- Portugal
- Qatar
- Romania
- Saudi Arabia
- Singapore
- Slovakia
- Slovenia
- South Africa
- South Korea
- Spain
- Sweden
- Switzerland
- Thailand
- Taiwan
- Turkey
- United Arab Emirates
- United Kingdom
- United States
- Uruguay
- Vietnam
# Audio Native
> Easily embed ElevenLabs on any web page.
## Overview
Audio Native is an embedded audio player that automatically voices the content of a web page using ElevenLabs' [Text to Speech](/docs/product-guides/playground/text-to-speech) service. It can also be used to embed pre-generated content from a project into a web page. All it takes to embed on your site is a small HTML snippet. In addition, Audio Native provides built-in metrics, allowing you to precisely track audience engagement via a listener dashboard.
The end result will be an Audio Native player that can narrate the content of a page or, as in the example below, embed pre-generated content from a project:
## Guide
In the ElevenLabs dashboard, under "Audio Tools" navigate to ["Audio Native"](/app/audio-native).
Customize the player appearance by selecting background and text colors.
The URL allowlist is the list of web pages that will be permitted to play your content.
You can choose to add a specific web page (e.g. `https://elevenlabs.io/blog`) or add a whole domain to the allowlist (e.g. `http://elevenlabs.io`). If a player is embedded on a page that is not in the allowlist, it will not work as intended.
Once you've finished configuring the player and allowlist, copy the embed code and paste it into your website's source code.
## Technology-specific guides
To integrate Audio Native into your web technology of choice, see the following guides:
## Using the API
You can use the [Audio Native API](/docs/api-reference/audio-native/create) to programmatically create an Audio Native player for your existing content.
```python title="Python"
from elevenlabs import ElevenLabs
elevenlabs = ElevenLabs(
api_key="YOUR_API_KEY",
)
response = elevenlabs.audio_native.create(
name="name",
)
# Use the snippet in response.html_snippet to embed the player on your website
```
```javascript title="JavaScript"
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
const elevenlabs = new ElevenLabsClient({ apiKey: "YOUR_API_KEY" });
const { html_snippet } = await elevenlabs.audioNative.create({
name: "my-audio-native-player"
});
// Use the HTML code in html_snippet to embed the player on your website
```
## Settings
### Voices
To configure the voice and model that will be used to read the content of the page, navigate to the "Settings" tab and select the voice and model you want to use.
### Pronunciation dictionaries
Sometimes you may want to specify the pronunciation of certain words, such as character or brand names, or specify how acronyms should be read. Pronunciation dictionaries allow this functionality by enabling you to upload a lexicon or dictionary file that includes rules about how specified words should be pronounced, either using a phonetic alphabet (phoneme tags) or word substitutions (alias tags).
Whenever one of these words is encountered in a project, the AI will pronounce the word using the specified replacement. When checking for a replacement word in a pronunciation dictionary, the dictionary is checked from start to end and only the first replacement is used.
# Audio Native with React
> Integrate Audio Native into your React apps.
Follow the steps in the [Audio Native overview](/docs/product-guides/audio-tools/audio-native) to
get started with Audio Native before continuing with this guide.
This guide will show how to integrate Audio Native into React apps. The focus will be on a Next.js project, but the underlying concepts will work for any React based application.
After completing the steps in the [Audio Native overview](/docs/product-guides/audio-tools/audio-native), you'll have an embed code snippet. Here's an example snippet:
```html title="Embed code snippet"
```
We can extract the data from the snippet to create a customizable React component.
```tsx title="ElevenLabsAudioNative.tsx" maxLines=0
// ElevenLabsAudioNative.tsx
'use client';
import { useEffect } from 'react';
export type ElevenLabsProps = {
publicUserId: string;
textColorRgba?: string;
backgroundColorRgba?: string;
size?: 'small' | 'large';
children?: React.ReactNode;
};
export const ElevenLabsAudioNative = ({
publicUserId,
size,
textColorRgba,
backgroundColorRgba,
children,
}: ElevenLabsProps) => {
useEffect(() => {
const script = document.createElement('script');
script.src = 'https://elevenlabs.io/player/audioNativeHelper.js';
script.async = true;
document.body.appendChild(script);
return () => {
document.body.removeChild(script);
};
}, []);
  return (
    <div
      id="elevenlabs-audionative-widget"
      data-height={size === 'small' ? '90' : '120'}
      data-width="100%"
      data-frameborder="no"
      data-scrolling="no"
      data-publicuserid={publicUserId}
      data-playerurl="https://elevenlabs.io/player/index.html"
      data-small={size === 'small' ? 'True' : 'False'}
      data-textcolor={textColorRgba ?? 'rgba(0, 0, 0, 1.0)'}
      data-backgroundcolor={backgroundColorRgba ?? 'rgba(255, 255, 255, 1.0)'}
    >
      {children ? children : 'Elevenlabs AudioNative Player'}
    </div>
  );
};
export default ElevenLabsAudioNative;
```
The above component can be found on [GitHub](https://github.com/elevenlabs/elevenlabs-examples/blob/main/examples/audio-native/react/ElevenLabsAudioNative.tsx).
Before using the component on your page, you need to retrieve your public user ID from the original code snippet. Copy the contents of `data-publicuserid` from the embed code snippet and insert it into the `publicUserId` prop of the component.
```tsx title="page.tsx" maxLines=0
import { ElevenLabsAudioNative } from './path/to/ElevenLabsAudioNative';
export default function Page() {
  return (
    <>
      <h1>Your Page Title</h1>

      {/* Insert the public user ID from the embed code snippet */}
      <ElevenLabsAudioNative publicUserId="YOUR_PUBLIC_USER_ID" />

      <p>Your page content...</p>
    </>
  );
}
```
The component props can be used to customize the player. For example, you can change the size, text color, and background color.
```tsx title="page.tsx" maxLines=0
import { ElevenLabsAudioNative } from './path/to/ElevenLabsAudioNative';
export default function Page() {
  return (
    <>
      <h1>Your Page Title</h1>

      <ElevenLabsAudioNative
        publicUserId="YOUR_PUBLIC_USER_ID"
        size="small"
        textColorRgba="rgba(255, 255, 255, 1.0)"
        backgroundColorRgba="rgba(0, 0, 0, 1.0)"
      />

      <p>Your page content...</p>
    </>
  );
}
```
# Audio Native with Ghost
> Integrate Audio Native into your Ghost blog.
Follow the steps in the [Audio Native overview](/docs/product-guides/audio-tools/audio-native) to
get started with Audio Native before continuing with this guide.
Navigate to your Ghost blog, sign in and open the settings page for the blog post you wish to narrate.
Click the "+" symbol on the left and select "HTML" from the menu.
Paste the Audio Native embed code into the HTML box and press enter.
```html title="Embed code snippet"
```
Click the "Update" button in the top right corner of the editor, which should now be highlighted in green text.
Finally, navigate to the live version of the blog post. You should see a message to let you know that the Audio Native project is being created. After a few minutes the text in your blog will be converted to an audio article and the embedded audio player will appear.
# Audio Native with Squarespace
> Integrate Audio Native into your Squarespace sites.
Follow the steps in the [Audio Native overview](/docs/product-guides/audio-tools/audio-native) to
get started with Audio Native before continuing with this guide.
Navigate to your Squarespace site, sign in and open the page you wish to add narration to.
Click the "+" symbol on the spot you want to place the Audio Native player and select "Code" from the menu.
Paste the Audio Native embed code into the HTML box and press enter.
```html title="Embed code snippet"
```
Click the "Save" button in the top right corner of the editor, which should now be highlighted.
Finally, navigate to the live version of the blog post. You should see a message to let you know that the Audio Native project is being created. After a few minutes the text in your blog will be converted to an audio article and the embedded audio player will appear.
# Audio Native with Framer
> Integrate Audio Native into your Framer websites.
Follow the steps in the [Audio Native overview](/docs/product-guides/audio-tools/audio-native) to
get started with Audio Native before continuing with this guide.
Navigate to your Framer page, sign in and go to your site settings. From the Audio Native embed code, extract the `<script>` tag and add it to your site's custom code settings.
On your Framer blog page, add an Embed Element from Utilities.
In the config for the Embed Element, switch the type to HTML and paste the `<div>` snippet from the Audio Native embed code into the HTML box.
```html title="Embed div"
```
Finally, publish your changes and navigate to the live version of your page. You should see a message to let you know that the Audio Native project is being created. After a few minutes the text in your blog will be converted to an audio article and the embedded audio player will appear.
# Audio Native with Webflow
> Integrate Audio Native into your Webflow sites.
Follow the steps in the [Audio Native overview](/docs/product-guides/audio-tools/audio-native) to
get started with Audio Native before continuing with this guide.
Navigate to your Webflow blog, sign in and open the editor for the blog post you wish to narrate.
Click the "+" symbol in the top left and select "Code Embed" from the Elements menu.
Paste the Audio Native embed code into the HTML box and click "Save & Close".
```html title="Embed code snippet"
```
In the Navigator, place the code embed where you want it to appear on the page.
Finally, publish your changes and navigate to the live version of the blog post. You should see a message to let you know that the Audio Native project is being created. After a few minutes the text in your blog will be converted to an audio article and the embedded audio player will appear.
# Audio Native with WordPress
> Integrate Audio Native into your WordPress sites.
Follow the steps in the [Audio Native overview](/docs/product-guides/audio-tools/audio-native) to
get started with Audio Native before continuing with this guide.
Install the [WPCode plugin](https://wpcode.com/) into your WordPress website to embed HTML code.
In the WordPress admin console, click on "Code Snippets". Add the Audio Native embed code to the new code snippet.
```html title="Embed code snippet"
```
Pick "Auto Insert" for the insert method and set the location to be "Insert Before Content".
Finally, publish your changes and navigate to the live version of the blog post. You should see a message to let you know that the Audio Native project is being created. After a few minutes the text in your blog will be converted to an audio article and the embedded audio player will appear.
# Audio Native with Wix
> Integrate Audio Native into your Wix sites.
Follow the steps in the [Audio Native overview](/docs/product-guides/audio-tools/audio-native) to
get started with Audio Native before continuing with this guide.
Navigate to your Wix site, sign in and open the settings page for the page you wish to narrate.
Click the "+" symbol at the top of your content and select "HTML Code" from the menu.
Paste the Audio Native embed code into the HTML box and click "Save".
```html title="Embed code snippet"
```
Click the "Publish" button in the top right corner of the editor.
Finally, navigate to the live version of the blog post. You should see a message to let you know that the Audio Native project is being created. After a few minutes the text in your blog will be converted to an audio article and the embedded audio player will appear.
# Voiceover studio
> A guide on how to create long-form content with ElevenLabs Voiceover Studio
## Overview
Voiceover Studio combines the audio timeline with our Sound Effects feature, giving you the ability to write a dialogue between any number of speakers, choose those speakers, and intertwine your own creative sound effects anywhere you like.
## Guide
In the ElevenLabs dashboard, click on the "Voiceover Studio" option in the sidebar under "Audio
Tools".
Click the "Create a new voiceover" button to begin. You can optionally upload video or audio to
create a voiceover from.
On the bottom half of your screen, use the timeline to add and edit voiceover clips plus add
sound effects.
Once you're happy with your voiceover, click the "Export" button in the bottom right, choose the
format you want and either view or download your voiceover.
## FAQ
### Timeline
The timeline is a linear representation of your Voiceover project. Each row represents a track, and on the far left section you have the track information for voiceover or SFX tracks. In the middle, you can create the clips that represent when a voice is speaking or a SFX is playing. On the right-hand side, you have the settings for the currently selected clip.
### Adding Tracks
To add a track, click the "Add Track" button in the bottom left of the timeline. You can choose to add a voiceover track or an SFX track.
There are three types of tracks you can add in the studio: Voiceover tracks, SFX tracks and uploaded audio.
* **Voiceover Tracks:** Voiceover tracks create new Speakers. You can click and add clips on the timeline wherever you like. After creating a clip, start writing your desired text on the speaker cards above and click "Generate". Similar to Dubbing Studio, you will also see a little cogwheel on each Speaker track - simply click on it to adjust the voice settings or replace any speaker with a voice directly from your Voices - including your own Professional Voice Clone if you have created one.
* **SFX Tracks:** Add a SFX track, then click anywhere on that track to create a SFX clip. Similar to our independent SFX feature, simply start writing your prompt in the Speaker card above and click "Generate" to create your new SFX audio. You can lengthen or shorten SFX clips and move them freely around your timeline to fit your project - make sure to press the "stale" button if you do so.
* **Uploaded Audio:** Add an audio track including background music or sound effects. It's best to avoid uploading audio with speakers, as any speakers in this track will not be detected, so you won't be able to translate or correct them.
### Key Differences from Dubbing Studio
If you chose not to upload a video when you created your Voiceover project, then the entire timeline is yours to work with and there are no time constraints. This differs from Dubbing Studio as it gives you a lot more freedom to create what you want and adjust the timing more easily.
When you Add a Voiceover Track, you will instantly be able to create clips on your timeline. Once you create a Voiceover clip, begin by writing in the Speaker Card above. After generating that audio, you will notice your clip on the timeline will automatically adjust its length based on the text prompt - this is called "Dynamic Generation". This option is also available in Dubbing Studio by right-clicking specific clips, but because syncing is more important with dubbed videos, the default generation type there is "Fixed Generation," meaning the clips' lengths are not affected.
### Credit Costs
Voiceover Studio does not deduct credits to create your initial project. Credits are deducted every time material is generated. Similar to Speech Synthesis, credit costs for voiceover clips are based on the length of the text prompt. SFX clips deduct 80 credits per generation.
If you choose to Dub (translate) your Voiceover Project into different languages, this will also cost additional credits depending on how much material needs to be generated. The cost is 1 credit per character for the translation, plus the cost of generating the new audio for the additional languages.
### Uploading Scripts
With Voiceover Studio, you have the option to upload a script for your project as a CSV file. You can either include speaker name and line, or speaker name, line, start time and end time. To upload a script, click on the cog icon in the top right hand corner of the page and select "Import Script".
Scripts should be provided in the following format:
```
speaker,line
```
Example input:
```
speaker,line
Joe,"Hey!"
Maria,"Oh, hi Joe! It's been a while."
```
You can also provide start and end times for each line in the following format:
```
speaker,line,start_time,end_time
```
Example input:
```
speaker,line,start_time,end_time
Joe,"Hey!",0.1,1.5
Maria,"Oh, hi Joe! It's been a while.",1.6,2.0
```
Once your script has been imported, make sure to assign voices to each speaker before you generate the audio. To do this, click the cog icon in the track information on the left.
If you don't specify start and end times for your clips, Voiceover Studio will estimate how long each clip will be, and distribute them along your timeline.
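If you build scripts programmatically, any tool that writes standard CSV works. The snippet below is a small illustrative sketch using Python's built-in `csv` module; the file name and dialogue lines are placeholders.

```python
# Write a Voiceover Studio script as CSV, with optional start/end times in seconds.
import csv

rows = [
    {"speaker": "Joe", "line": "Hey!", "start_time": 0.1, "end_time": 1.5},
    {"speaker": "Maria", "line": "Oh, hi Joe! It's been a while.", "start_time": 1.6, "end_time": 2.0},
]

with open("voiceover_script.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["speaker", "line", "start_time", "end_time"])
    writer.writeheader()  # produces the speaker,line,start_time,end_time header row
    writer.writerows(rows)
```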
### Dynamic Duration
By default, Voiceover Studio uses Dynamic Duration, which means that the length of the clip will vary depending on the text input and the voice used. This ensures that the audio sounds as natural as possible, but it means that the length of the clip might change after the audio has been generated. You can easily reposition your clips along the timeline once they have been generated to get a natural sounding flow. If you click "Generate Stale Audio", or use the generate button on the clip, the audio will be generated using Dynamic Duration.
This also applies if you do specify the start and end time for your clips. The clips will generate based on the start time you specify, but if you use the default Dynamic Duration, the end time is likely to change once you generate the audio.
### Fixed Duration
If you need the clip to remain the length specified, you can choose to generate with Fixed Duration instead. To do this, you need to right click on the clip and select "Generate Audio Fixed Duration". This will adjust the length of the generated audio to fit the specified length of the clip. This could lead to the audio sounding unnaturally quick or slow, depending on the length of your clip.
If you want to generate multiple clips at once, you can use shift + click to select multiple clips for a speaker at once, then right click on one of them to select "Generate Audio Fixed Duration" for all selected clips.
# Voice isolator
> A guide on how to remove background noise from audio recordings.
## Overview
Voice isolator is a tool that allows you to remove background noise from audio recordings.
## Guide
To use the voice isolator app, navigate to [Voice Isolator](/app/voice-isolator) under the Audio Tools section. Here you can upload or drag and drop your audio file into the app, or record a new audio file with your device's microphone.
Click "Isolate voice" to start the process. The app will isolate the voice from the background noise and return a new audio file with the isolated voice. Once the process is complete, you can download the audio file or play it back in the app.
The voice isolator functionality is also available via the [API](/docs/api-reference/audio-isolation/audio-isolation) to easily integrate this functionality into your own applications.
Use the voice isolator app.
Use the voice isolator API.
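As a rough sketch of the API flow, the example below assumes the Python SDK mirrors the audio isolation endpoint as `audio_isolation.audio_isolation` and returns the cleaned audio as a stream of bytes; the method name may differ by SDK version, and the file names are placeholders.

```python
# Sketch: remove background noise from a local recording via the API.
# The method name mirrors the audio-isolation API reference and may vary by SDK version.
import os

from elevenlabs.client import ElevenLabs

elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

with open("noisy_interview.mp3", "rb") as audio_file:
    isolated = elevenlabs.audio_isolation.audio_isolation(audio=audio_file)

# The response is streamed audio; write the chunks out to a new file.
with open("isolated_voice.mp3", "wb") as out:
    for chunk in isolated:
        out.write(chunk)
```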
# AI speech classifier
> A guide on how to detect AI audio
## Overview
The AI speech classifier is a tool that allows you to detect if an audio file was generated by ElevenLabs.
## Guide
Select the "AI speech classifier" option from the sidebar under "Audio Tools" in the ElevenLabs
dashboard.
Click the "Upload audio" button upload an audio file and begin scanning.
The AI speech classifier will analyze the audio file and provide a result.
## FAQ
Our classifier maintains high accuracy (99% precision, 80% recall) for audio files generated
with ElevenLabs that have not been modified. We will continue to improve this tool, while
exploring other detection tools that provide transparency about how content was created.
No, the tool is free for all to use.
A [web version](https://elevenlabs.io/ai-speech-classifier) of the tool is available for you to
use without having to log in.
# Account
To begin using ElevenLabs, you'll need to create an account. Follow these steps:
* **Sign Up**: Visit the [ElevenLabs website](https://elevenlabs.io/sign-up) and click on the 'Get started free' button. You can register using your email or through one of the OAuth providers.
* **Verify Email**: Check your email for a verification link from ElevenLabs. Click the link to verify your account.
* **Initial Setup**: After verification, you'll be directed to the Speech Synthesis page where you can start generating audio from text.
**Exercise**: Try out an example to get started or type something, select a voice and click generate!
You can sign up with traditional email and password or using popular OAuth providers like Google, Facebook, and GitHub.
If you choose to sign up with your email, you will be asked to verify your email address before you can start using the service. Once you have verified your email, you will be taken to the Speech Synthesis page, where you can immediately start using the service. Simply type anything into the box and press “generate” to convert the text into voiceover narration. Please note that each time you press “generate” anywhere on the website, it will deduct credits from your quota.
If you sign up using Google OAuth, your account will be intrinsically linked to your Google account, meaning you will not be able to change your email address, as it will always be linked to your Google email.
# Billing
View the pricing page
View your subscription details
When signing up, you will be automatically assigned to the free tier. To view your subscription, click on "My Account" in the bottom left corner and select ["Subscription"](https://elevenlabs.io/app/subscription). You can read more about the different plans [here](https://elevenlabs.io/pricing). At the bottom of the page, you will find a comparison table to understand the differences between the various plans.
We offer six public plans: Free, Starter, Creator, Pro, Scale, and Business. In addition, we offer an Enterprise option that's specifically tailored to the unique needs and usage of large organizations.
You can see details of all our plans on the subscription page. This includes information about the total monthly credit quota, the number of custom voices you can have saved simultaneously, and the quality of audio produced.
Cloning is only available on the Starter tier and above. The free plan offers three custom voices that you can create using our [Voice Design tool](/docs/product-guides/voices/voice-design), or you can add voices from the [Voice Library](/docs/product-guides/voices/voice-library) if they are not limited to the paid tiers.
You can upgrade your subscription at any time, and any unused quota from your previous plan will roll over to the new one. As long as you don’t cancel or downgrade, unused credits at the end of the month will carry over to the next month, up to a maximum of two months’ worth of credits. For more information, please visit our Help Center articles:
* ["How does credit rollover work?"](https://help.elevenlabs.io/hc/en-us/articles/27561768104081-How-does-credit-rollover-work)
* ["What happens to my subscription and quota at the end of the month?"](https://help.elevenlabs.io/hc/en-us/articles/13514114771857-What-happens-to-my-subscription-and-quota-at-the-end-of-the-month)
From the [subscription page](/app/subscription), you can also downgrade your subscription at any point in time if you would like. When downgrading, it won't take effect until the current cycle ends, ensuring that you won't lose any of the monthly quota before your month is up.
When generating content on our paid plans, you get commercial rights to use that content. If you are on the free plan, you can use the content non-commercially with attribution. Read more about the license in our [Terms of Service](https://elevenlabs.io/terms) and in our Help Center [here](https://help.elevenlabs.io/hc/en-us/articles/13313564601361-Can-I-publish-the-content-I-generate-on-the-platform-).
For more information on payment methods, please refer to the [Help Center](https://help.elevenlabs.io/).
# Usage analytics
Usage analytics lets you view all the activity on the platform for your account or workspace.
To access usage analytics, click on “My Account” in the bottom left corner and select [Usage Analytics](https://elevenlabs.io/app/usage).
There are two tabs for usage analytics. On an Enterprise plan, the account tab shows data for your individual account, whereas the workspace tab covers all accounts under your workspace.
If you're not on an Enterprise plan, the data will be the same for your account and your workspace, but some information will only be available in your workspace tab, such as your Voice Add/Edit Operations quota.
## Credit usage
In the Credit Usage section, you can filter your usage data in a number of different ways.
In the account tab, you can break your usage down by voice, product or API key, for example.
In the workspace tab, you have additional options allowing you to break usage down by individual user or workspace group.
You can view the data by day, week, month or cumulatively. If you want to be more specific, you can use filters to show only your usage for specific voices, products or API keys.
This feature is quite powerful, allowing you to gain great insights into your usage or understand your customers' usage if you've implemented us in your product.
## API requests
In the API Requests section, you'll find not only the total number of requests made within a specific timeframe but also the number of concurrent requests during that period.
You can view data by different time periods, for example, hour, day, month and year, and at different levels of granularity.
## Export data
You also have the option to export your usage data as a CSV file. To do this, just click the "Export as CSV" button, and the data from your current view will be exported and downloaded.
# Workspaces
> An overview on how teams can collaborate in a shared Workspace.
Workspaces are currently only available for Scale, Business and Enterprise customers.
## Overview
For teams that want to collaborate in ElevenLabs, we offer shared Workspaces. Workspaces offer the following benefits:
* **Shared billing** - Rather than having each of your team members individually create & manage subscriptions, all of your team’s character usage and billing is centralized under one Workspace.
* **Shared resources** - Within a Workspace, your team can share: voices, studio instances, conversational AI agents, dubbings and more.
* **Access management** - Your Workspace admin can easily add and remove team members.
* **API Key management** - You can issue and revoke unlimited API keys for your team.
## FAQ
### Creating a Workspace
Workspaces are automatically enabled on all accounts with Scale, Business and Enterprise subscriptions. On the Scale and Business plans, the account owner will be the Workspace admin by default. They will have the power to add more team members as well as nominate others to be an admin. When setting up your Enterprise account, you’ll be asked to nominate a Workspace admin.
### Adding a team member to a Workspace
Only administrators can add and remove team members.
Once you are logged in, select your profile in the bottom left of the dashboard and choose **Workspace settings** and then navigate to the **Members** tab. From there you'll be able to add team members, assign roles and remove members from the Workspace.
#### Bulk Invites
Enterprise customers can invite their users in bulk once their domain has been verified following the [Verify Domain step](/docs/product-guides/administration/workspaces/sso#verify-your-email-domain) from the SSO configuration process.
#### User Auto Provisioning
Enterprise customers can enable user auto provisioning via the **Security & SSO** tab in workspace settings. When this is enabled, new users with an email domain matching one of your verified domains will automatically join your workspace and take up a seat.
### Roles
There are two roles, Admins and Members. Members have full access to your Workspace and can generate an unlimited number of characters (within your current overall plan’s limit).
Admins have all of the access of Members, with the added ability to add/remove teammates and permissions to manage your subscription.
### Managing Billing
Only admins can manage billing.
To manage your billing, select your profile in the bottom left of the dashboard and choose **Subscription**. From there, you’ll be able to update your payment information and access past invoices.
### Managing Service Accounts
To manage Service Accounts, select your profile in the bottom left of the dashboard and choose **Workspace settings**. Navigate to the **Service Accounts** tab and you’ll be able to create / delete service accounts as well as issue new API keys for those service accounts.
"Workspace API keys" were formerly a type of Service Account with a single API key.
### Managing the Workspace owner
Each Workspace can have one owner. By default, this will be the account owner for Scale and Business subscriptions. Ownership can be transferred to another account.
If you downgrade your subscription and exceed the available number of seats on your new plan, all users apart from the owner will be locked out. The admin can also lock users in advance of the downgrade.
# Single Sign-On (SSO)
> An overview on how to set up SSO for your Workspace.
## Overview
SSO is currently only available for Enterprise customers, and only Workspace admins can use this
feature. To upgrade, [get in touch with our sales team](https://elevenlabs.io/contact-sales).
Single Sign-On (SSO) allows your team to log in to ElevenLabs by using your existing identity provider. This allows your team to use the same credentials they use for other services to log in to ElevenLabs.
## Guide
Click on your profile icon located at the bottom left of the dashboard, select **Workspace settings**, and then navigate to the **Security & SSO** tab.
You can choose from a variety of pre-configured identity providers, including Google, Apple, GitHub, etc. Custom organization SSO providers will only appear in this list after they have been configured, as shown in the "SSO Provider" section.
Next, you need to verify your email domain for authentication. This lets ElevenLabs know that you own the domain you are configuring for SSO. This is a security measure to prevent unauthorized access to your Workspace.
Click the **Verify domain** button and enter the domain name you want to verify. After completing this step, click on the domain pending verification. You will be prompted to add a DNS TXT record to your domain's DNS settings. Once the DNS record has been added, click on the **Verify** button.
If you want to configure your own SSO provider, select the SSO provider dropdown to select between OIDC (OpenID Connect) and SAML (Security Assertion Markup Language).
Only Service Provider (SP) initiated SSO is supported for SAML. To ease the sign in process, you can create a bookmark app in your SSO provider linking to
[https://elevenlabs.io/app/sign-in?use_sso=true](https://elevenlabs.io/app/sign-in?use_sso=true)
. You can include the user's email as an additional query parameter to pre-fill the field. For example
[https://elevenlabs.io/app/sign-in?use_sso=true&email=test@test.com](https://elevenlabs.io/app/sign-in?use_sso=true&email=test@test.com)
Once you've filled out the required fields, click the **Update SSO** button to save your changes.
Configuring a new SSO provider will log out all Workspace members currently logged in with SSO.
## FAQ
What shall I fill for Identifier (Entity ID)?
* Use Service Provider Entity Id
What shall I fill for Reply URL (Assertion Consumer Service) URL in SAML?
* Use Redirect URL
What is ACS URL?
* Same as Assertion Consumer Service URL
Which fields should I use to provide ElevenLabs?
* Use *Microsoft Entra Identifier* for IdP Entity ID
* Use *Login URL* for IdP Sign-In URL
**What to fill in on the Okta side**:
* **Audience Restriction**: This is the Service Provider Entity ID from the ElevenLabs SSO configuration page.
* **Single Sign-On URL/Recipient URL/Destination**: This is the Redirect URL from the ElevenLabs SSO configuration page.
**What to fill in on the ElevenLabs side**:
* Create the application in Okta and then fill out these fields using the results
* **Identity Provider Entity Id**: Use the SAML Issuer ID
* **Identity Provider Sign-In URL**: Use the Sign On URL from Okta
* This can generally be found in the Metadata details within the Sign On tab of the Okta application
* It will end in **/sso/saml**
* Please fill Recipient field with the value of Redirect URL.
Please ensure that `email` and `email_verified` are included in the custom attributes returned in the OIDC response. Without these, the following errors may occur:
* *No email address was received*: Fixed by adding **email** to the response.
* *Account exists with different credentials*: Fixed by adding **email\_verified** to the response
* One known error: inside the `<Subject>` field of the SAML response, make sure `<NameID>` is set to the email address of the user.
# Sharing resources
> An overview on how to share resources within a Workspace.
## Overview
If your subscription plan includes multiple seats, you can share resources with your members. Resources you
can share include: voices, conversational AI agents, studio projects and more. Check the
[Workspaces API](/docs/api-reference/workspace/share-workspace-resource) for an up-to-date list of resources you can share.
## Sharing
You can share a **resource** with a **principal**. A principal is one of the following:
* A user
* A user group
* A service account
A resource can be shared with at most 100 principals.
Service Accounts behave like individual users. They don't have access to anything in the Workspace when they are created, but they can be added to resources by resource admins.
#### Default Sharing
If you would like to share with specific principals for each new resource by default, this can be enabled in your personal settings page under **Default Sharing Preferences**.
Every new resource created after this is enabled will be automatically shared with the principals that you add here.
## Roles
When you share a resource with a principal, you can assign them a **role**. We support the following roles:
* **Viewer**: Viewers can discover the resource and its contents. They can also "use" the resource, e.g., generate TTS with a voice or listen to the audio of a studio instance.
* **Editor**: Everything a viewer can do, plus they can also edit the contents of the resource.
* **Admin**: Everything an editor can do, plus they can also delete the resource and manage sharing permissions.
When you create a resource, you have admin permissions on it. Other resource admins cannot remove your admin permissions on the resources you created.
Workspace admins have admin permissions on all resources in the workspace. This can be removed
from them only by removing their Workspace admin role.
# User groups
> An overview on how to create and manage user groups.
## Overview
Only Workspace admins can create, edit, and delete user groups.
User groups allow you to manage permissions for multiple users at once.
## Creating a user group
You can create a user group from **Workspace settings**. You can then [share resources](/docs/product-guides/administration/workspaces/sharing-resources) with the group directly.
If access to a user group is lost, access to resources shared with that group is also lost.
## Multiple groups
User groups cannot be nested, but you can add users to multiple groups. If a user is part of multiple groups, they will have the union of all the permissions of the groups they are part of.
For example, you can create a voice and grant the **Sales** and **Marketing** groups viewer and editor roles on the voice, respectively.
If a user is part of both groups, they will have editor permissions on the voice. Losing access to the **Marketing** group will downgrade the user's permissions to viewer.
## Disabling platform features
Permissions for groups can be revoked for specific product features, such as Professional Voice Cloning or Sound Effects.
To do this, you first have to remove the relevant permissions from the **Everyone** group. Afterwards, enable the permissions for each group that should have access.
# Webhooks
> Enable external integrations by receiving webhook events.
## Overview
Certain events within ElevenLabs can be configured to trigger webhooks, allowing external applications and systems to receive and process these events as they occur. Currently supported event types include:
| Event type | Description |
| -------------------------------- | -------------------------------------------------------------- |
| `post_call_transcription` | A conversational AI call has finished and analysis is complete |
| `voice_removal_notice` | A shared voice is scheduled to be removed |
| `voice_removal_notice_withdrawn` | A shared voice is no longer scheduled for removal |
| `voice_removed`                  | A shared voice has been removed and is no longer usable        |
## Configuration
Webhooks can be created, disabled and deleted from the general settings page. For users within [Workspaces](/docs/product-guides/administration/workspaces/overview), only the workspace admins can configure the webhooks for the workspace.

After creation, the webhook can be selected to listen for events within product settings such as [Conversational AI](/docs/conversational-ai/customization/personalization/post-call-webhooks).
Webhooks can be disabled from the general settings page at any time. Webhooks that repeatedly fail are automatically disabled if there are 10 or more consecutive failures and either the last successful delivery was more than 7 days ago or the webhook has never been delivered successfully. Auto-disabled webhooks must be re-enabled from the settings page. Webhooks can be deleted if not in use by any products.
## Integration
To integrate with webhooks, the listener should create an endpoint handler that receives the webhook event data as POST requests. After validating the signature, the handler should quickly return HTTP 200 to indicate successful receipt of the webhook event; repeated failures to return correctly may result in the webhook being automatically disabled.
Each webhook event is dispatched only once; refer to the [API](/docs/api-reference/introduction) for methods to poll and retrieve product-specific data.
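For example, after receiving a `post_call_transcription` event you can fetch the full conversation record on demand. The snippet below is a minimal sketch: it assumes the Conversational AI conversation details endpoint (`GET /v1/convai/conversations/{conversation_id}`) and the standard `xi-api-key` header; consult the API reference for the authoritative path and response schema.
```python
# Minimal sketch: poll for conversation details after a webhook event.
# Assumes GET /v1/convai/conversations/{conversation_id}; check the API reference
# for the authoritative path and response schema.
import os
import requests

def fetch_conversation(conversation_id: str) -> dict:
    response = requests.get(
        f"https://api.elevenlabs.io/v1/convai/conversations/{conversation_id}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# e.g. inside your webhook handler, after signature validation:
# details = fetch_conversation(event["data"]["conversation_id"])
```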
### Top-level fields
| Field | Type | Description |
| ----------------- | ------ | ------------------------ |
| `type` | string | Type of event |
| `data` | object | Data for the event |
| `event_timestamp` | string | When this event occurred |
## Example webhook payload
```json
{
"type": "post_call_transcription",
"event_timestamp": 1739537297,
"data": {
"agent_id": "xyz",
"conversation_id": "abc",
"status": "done",
"transcript": [
{
"role": "agent",
"message": "Hey there angelo. How are you?",
"tool_calls": null,
"tool_results": null,
"feedback": null,
"time_in_call_secs": 0,
"conversation_turn_metrics": null
},
{
"role": "user",
"message": "Hey, can you tell me, like, a fun fact about 11 Labs?",
"tool_calls": null,
"tool_results": null,
"feedback": null,
"time_in_call_secs": 2,
"conversation_turn_metrics": null
},
{
"role": "agent",
"message": "I do not have access to fun facts about Eleven Labs. However, I can share some general information about the company. Eleven Labs is an AI voice technology platform that specializes in voice cloning and text-to-speech...",
"tool_calls": null,
"tool_results": null,
"feedback": null,
"time_in_call_secs": 9,
"conversation_turn_metrics": {
"convai_llm_service_ttfb": {
"elapsed_time": 0.3704247010173276
},
"convai_llm_service_ttf_sentence": {
"elapsed_time": 0.5551181449554861
}
}
}
],
"metadata": {
"start_time_unix_secs": 1739537297,
"call_duration_secs": 22,
"cost": 296,
"deletion_settings": {
"deletion_time_unix_secs": 1802609320,
"deleted_logs_at_time_unix_secs": null,
"deleted_audio_at_time_unix_secs": null,
"deleted_transcript_at_time_unix_secs": null,
"delete_transcript_and_pii": true,
"delete_audio": true
},
"feedback": {
"overall_score": null,
"likes": 0,
"dislikes": 0
},
"authorization_method": "authorization_header",
"charging": {
"dev_discount": true
},
"termination_reason": ""
},
"analysis": {
"evaluation_criteria_results": {},
"data_collection_results": {},
"call_successful": "success",
"transcript_summary": "The conversation begins with the agent asking how Angelo is, but Angelo redirects the conversation by requesting a fun fact about 11 Labs. The agent acknowledges they don't have specific fun facts about Eleven Labs but offers to provide general information about the company. They briefly describe Eleven Labs as an AI voice technology platform specializing in voice cloning and text-to-speech technology. The conversation is brief and informational, with the agent adapting to the user's request despite not having the exact information asked for."
},
"conversation_initiation_client_data": {
"conversation_config_override": {
"agent": {
"prompt": null,
"first_message": null,
"language": "en"
},
"tts": {
"voice_id": null
}
},
"custom_llm_extra_body": {},
"dynamic_variables": {
"user_name": "angelo"
}
}
}
}
```
## Authentication
It is important for the listener to validate all incoming webhooks. Webhooks currently support authentication via HMAC signatures. Set up HMAC authentication by:
* Securely storing the shared secret generated upon creation of the webhook
* Verifying the ElevenLabs-Signature header in your endpoint using the shared secret
The ElevenLabs-Signature takes the following format:
```plaintext
t=timestamp,v0=hash
```
The hash is the hex-encoded SHA-256 HMAC signature of `timestamp.request_body`. Both the hash and the timestamp should be validated, as shown in the examples below.
Example python webhook handler using FastAPI:
```python
from fastapi import FastAPI, Request
import os
import time
import hmac
from hashlib import sha256

app = FastAPI()

# Shared secret generated when the webhook was created
secret = os.getenv("WEBHOOK_SECRET")


# Example webhook handler
@app.post("/webhook")
async def receive_message(request: Request):
    payload = await request.body()
    signature_header = request.headers.get("elevenlabs-signature")
    if signature_header is None:
        return

    timestamp = signature_header.split(",")[0][2:]
    hmac_signature = signature_header.split(",")[1]

    # Validate timestamp (reject events older than 30 minutes)
    tolerance = int(time.time()) - 30 * 60
    if int(timestamp) < tolerance:
        return

    # Validate signature
    full_payload_to_sign = f"{timestamp}.{payload.decode('utf-8')}"
    mac = hmac.new(
        key=secret.encode("utf-8"),
        msg=full_payload_to_sign.encode("utf-8"),
        digestmod=sha256,
    )
    digest = "v0=" + mac.hexdigest()
    if hmac_signature != digest:
        return

    # Continue processing
    return {"status": "received"}
```
Example javascript webhook handler using node express framework:
```javascript
const express = require('express');
const crypto = require('crypto');
const bodyParser = require('body-parser');

const app = express();
const secret = process.env.WEBHOOK_SECRET;

// Ensure Express passes the raw body through instead of applying its own encoding
app.use(bodyParser.raw({ type: '*/*' }));

// Example webhook handler
app.post('/webhook/elevenlabs', async (req, res) => {
  // Node lowercases incoming header names
  const headers = req.headers['elevenlabs-signature'].split(',');
  const timestamp = headers.find((e) => e.startsWith('t=')).substring(2);
  const signature = headers.find((e) => e.startsWith('v0='));

  // Validate timestamp (reject events older than 30 minutes)
  const reqTimestamp = timestamp * 1000;
  const tolerance = Date.now() - 30 * 60 * 1000;
  if (reqTimestamp < tolerance) {
    res.status(403).send('Request expired');
    return;
  } else {
    // Validate hash
    const message = `${timestamp}.${req.body}`;
    const digest = 'v0=' + crypto.createHmac('sha256', secret).update(message).digest('hex');
    if (signature !== digest) {
      res.status(401).send('Request unauthorized');
      return;
    }
  }

  // Validation passed, continue processing ...
  res.status(200).send();
});
```
Example TypeScript webhook handler using a Next.js API route:
```typescript app/api/convai-webhook/route.ts
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
import crypto from "crypto";
export async function GET() {
return NextResponse.json({ status: "webhook listening" }, { status: 200 });
}
export async function POST(req: NextRequest) {
const secret = process.env.ELEVENLABS_CONVAI_WEBHOOK_SECRET; // Add this to your env variables
const { event, error } = await constructWebhookEvent(req, secret);
if (error) {
return NextResponse.json({ error: error }, { status: 401 });
}
if (event.type === "post_call_transcription") {
console.log("event data", JSON.stringify(event.data, null, 2));
}
return NextResponse.json({ received: true }, { status: 200 });
}
const constructWebhookEvent = async (req: NextRequest, secret?: string) => {
const body = await req.text();
const signature_header = req.headers.get("ElevenLabs-Signature");
console.log(signature_header);
if (!signature_header) {
return { event: null, error: "Missing signature header" };
}
const headers = signature_header.split(",");
const timestamp = headers.find((e) => e.startsWith("t="))?.substring(2);
const signature = headers.find((e) => e.startsWith("v0="));
if (!timestamp || !signature) {
return { event: null, error: "Invalid signature format" };
}
// Validate timestamp
const reqTimestamp = Number(timestamp) * 1000;
const tolerance = Date.now() - 30 * 60 * 1000;
if (reqTimestamp < tolerance) {
return { event: null, error: "Request expired" };
}
// Validate hash
const message = `${timestamp}.${body}`;
if (!secret) {
return { event: null, error: "Webhook secret not configured" };
}
const digest =
"v0=" + crypto.createHmac("sha256", secret).update(message).digest("hex");
console.log({ digest, signature });
if (signature !== digest) {
return { event: null, error: "Invalid signature" };
}
const event = JSON.parse(body);
return { event, error: null };
};
```
# Productions (alpha)
> Human-edited transcripts, subtitles, dubs and audiobooks at scale.
## Overview
Productions is a service that lets you order human-edited transcripts, subtitles, dubs, and audiobooks directly on the ElevenLabs platform. A team of expert linguists and localization professionals vetted and trained by ElevenLabs works on your content and delivers you polished final assets.

## Why use Productions?
* **Quality at scale**: Your audience cares – let native speakers ensure your multilingual content looks, sounds, and feels natural.
* **Speed and cost**: 5-10x cheaper than traditional LSP services and ready in days vs. weeks or months.
* **Ease of use**: No more email chains or procurement threads – get your content polished and ready for your audiences in just a few clicks.
## Services
Click the cards below to learn more about our different Productions services:
Reviewed by native speakers for maximum accuracy
Adapted to formatting and accessibility requirements
Script translation and audio generation by localization professionals
Support for single and multi-speaker voice casting
## How it works
**Ordering a new asset**: head to the [Productions](https://elevenlabs.io/app/productions) page of your ElevenLabs account and create a new order. You may also see a *Productions* option when using the order dialog for other products like [Speech to Text](https://elevenlabs.io/app/speech-to-text) or [Dubbing](https://elevenlabs.io/app/dubbing).
**Starting from an existing asset**: you can also order human-edited versions of existing assets in your ElevenLabs account. Look for the 'Get human review' button in the top right of the editor view for this option.

Once you upload a file, select a language, and choose your style guide options, you'll see a quote with an **estimated** price for the settings you've chosen.
When you click *Continue*, the file will be analyzed and the final price will be returned.

You may see an error message that there is no capacity available for the language you're interested in. If this happens, please check back later! Productions is a new service, and additional capacity will be added as it scales up.
After reviewing the final quote, click *Checkout* and follow the dialog instructions to complete your payment.
Enterprise orders are deducted from workspace credits instead of going through our payment processor. If you have any questions or run into access issues, please contact your workspace admin or reach out to us at
[productions@elevenlabs.io](mailto:productions@elevenlabs.io)
.
Head to the [Productions](https://elevenlabs.io/app/productions) page of your ElevenLabs account and click any order to open a side panel with more details.
You'll also receive an email when your order is ready.

Open a completed Production and click the *View* button to open a read-only copy. You can also download an invoice for your order by clicking the link next to *Details*.
To export your completed assets, use the export menu in the sidebar or inside the read-only copy.

Productions has a folder system to help you organize your assets. Click *New folder* to create a new folder. Click *Manage* and use the *Move to Folder* option in the toolbar to nest folders inside other folders.

## Enterprise
We offer a white glove service to enterprise customers that can make volume commitments, including:
* Discounted per minute rates on each of our services
* Expedited turnaround times
* Advanced pre- and post-processing services
Email us at [productions@elevenlabs.io](mailto:productions@elevenlabs.io) or [contact sales](https://elevenlabs.io/contact-sales) to learn more.
## FAQ
All Productions prices are presented to you in USD (\$) per minute of source audio. Exact prices depend on the type of asset you want to order (transcript, subtitles, dub, etc.), the source and target languages, and any custom style guide options you choose.
We **always** show you up front how much a Production will cost before asking you to confirm and complete a checkout process.
We currently support the following languages for Productions jobs, both source and target:
* English
* French
* Spanish
* German
* Italian
* Portuguese
* Hindi
We're working hard to expand our language coverage quickly and will update this list as new languages become available.
You can leave feedback on a completed production by opening it (use the *View* option in the sidebar) and clicking the *Feedback* button.
No. You can export a completed Production and make changes off platform. We plan to add support for this soon.
Yes, Productions is powered by a network of expert linguists and localization professionals
vetted and trained by ElevenLabs.
If you'd like to join our Producer network, please check the Productions openings on our [Careers page](https://elevenlabs.io/careers).
Yes, please contact us at [productions@elevenlabs.io](mailto:productions@elevenlabs.io).
# Transcripts
> Human-edited transcripts from ElevenLabs Productions
## General
Transcripts ordered from Productions are reviewed and corrected by native speakers for maximum accuracy.
We offer 2 types of human transcripts:
| **Option** | **When to use it** | **Description** |
| -------------------------- | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Non‑verbatim ("clean")** | Podcasts, webinars, marketing, personal use | Removes filler words, stutters, audio event tags for smoother reading. Focuses on transcribing the core meaning. Most suitable for the majority of use-cases. |
| **Verbatim** | Legal, research | Attempts to capture *exactly* what is said, including all filler words, stutters and audio event tags. |
* For a more detailed breakdown of non-verbatim vs. verbatim transcription options, please see the [**Style guides**](#style-guides) section below.
* For more information about other Productions services, please see the [Overview](/docs/services/productions/overview) page.
## How it works
### Productions page
The easiest way to order a new transcript from Productions is from the [Productions](https://elevenlabs.io/app/productions) page in your ElevenLabs account.
### Speech to Text Order Dialog
You can also select the *Human Transcript* option in the [Speech to Text](/docs/capabilities/speech-to-text) order dialog.
Open an existing transcript and click the *Get human review* button to create a new Productions order for that transcript.
You will receive an email notification when your transcript is ready and see it marked as 'Done' on your Productions page.
Open a transcript on your [Productions](https://elevenlabs.io/app/productions) page and click the three dots, then the *Export* button.

Open a transcript on your [Productions](https://elevenlabs.io/app/productions) page and click the *View* icon to open the transcript viewer.

## Pricing
All prices are in USD (\$) and per minute of source audio.
| **Language** | **Non-verbatim (per minute)** | **Verbatim (per minute)** |
| ------------------- | ----------------------------- | ------------------------- |
| English | \$2.00 | \$2.60 |
| French | \$3.00 | \$3.90 |
| Spanish | \$3.00 | \$3.90 |
| German | \$3.00 | \$3.90 |
| Italian | \$3.00 | \$3.90 |
| Portuguese (Brazil) | \$3.00 | \$3.90 |
| Hindi | \$2.00 | \$2.60 |
Prices are subject to change. You will always see the final price for an order during the checkout
process.
## SLAs / Delivery Time
We aim to deliver all transcripts **within 48 hours.** If you are an enterprise interested in achieving quicker turnaround times, please contact us at [productions@elevenlabs.io](mailto:productions@elevenlabs.io).
## Style guides
When ordering a Productions transcript, you will see the option to activate 'Verbatim' mode for an extra 30% fee. Please read the breakdown below for more information about this option.
Non-verbatim transcription, also called *clean* or *intelligent verbatim*, focuses on clarity and readability. Unlike verbatim transcriptions, it removes unnecessary elements like filler words, stutters, and irrelevant sounds while preserving the speaker’s message.
This is the default option for Productions transcriptions. Unless you explicitly select 'Verbatim' mode, we will deliver a non-verbatim transcript.
What gets left out in non-verbatim transcripts:
* **Filler words and verbal tics** like “um,” “like,” “you know,” or “I mean”
* **Repetitions** including intentional and unintentional (e.g. stuttering)
* **Audio event tags,** including non-verbal sounds like \[coughing] or \[throat clearing] as well as environmental sounds like \[dog barking]
* **Slang or incorrect grammar** (e.g. ‘ain’t’ → ‘is not’)
In verbatim transcription, the goal is to capture ***everything that can be heard***, meaning:
* All detailed verbal elements: stutters, repetitions, etc.
* All non-verbal elements like human sounds (\[cough]) and environmental sounds (\[dog barking])
The following table provides a comprehensive breakdown of our non-verbatim vs. verbatim transcription services.
| **Feature** | **Verbatim Transcription** | **Verbatim Example** | **Non-Verbatim (Clean) Transcription** | **Non-Verbatim Example** |
| --------------------------- | ------------------------------------------------------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------- |
| **Filler words** | All filler words are included exactly as spoken. | "So, um, I was like, you know, maybe we should wait." | Filler words like "um," "like," "you know" are removed. | "I was thinking maybe we should wait." |
| **Stutters** | Stutters and repeated syllables are transcribed with hyphens. | "I-I-I don't know what to say." | Stutters are removed for smoother reading. | "I don't know what to say." |
| **Repetitions** | Repeated words are retained even when unintentional. | "She, she, she told me not to come." | Unintentional repetitions are removed. | "She told me not to come." |
| **False Starts** | False starts are included using double hyphens. | "I was going to—no, actually—let's wait." | False starts are removed unless they show meaningful hesitation. | "Let's wait." |
| **Interruptions** | Speaker interruptions are marked with a single hyphen. | Speaker 1: "Did you see—" Speaker 2: "Yes, I did." | Interruptions are simplified or smoothed. | Speaker 1: "Did you see it?" Speaker 2: "Yes, I did." |
| **Informal Contractions** | Informal speech is preserved as spoken. | "She was gonna go, but y'all called." | Standard grammar should be used for clarity, outside of exceptions. Please refer to your [language style guide](https://www.notion.so/Transcription-1e5506eacaa280678598cf06de67802d?pvs=21) to know which contractions to keep vs. when to resort to standard grammar. | "She was going to go, but you all called." |
| **Emphasized Words** | Elongated pronunciations are reflected with extended spelling. | "That was amaaazing!" | Standard spelling is used. | "That was amazing!" |
| **Interjections** | Interjections and vocal expressions are included. | "Ugh, this is terrible. Wow, I can't believe it!" | Only meaningful interjections are retained. | "This is terrible. Wow, I can't believe it!" |
| **Swear Words** | Swear words are fully transcribed. | "Fuck this, I'm not going." | Swear words should be fully transcribed, unless indicated otherwise. | "Fuck this, I'm not going." |
| **Pronunciation Mistakes** | Mispronounced words are corrected. | **Example (spoken):** "ecsetera" **Transcribed:** "etcetera" | Mispronounced words are corrected here as well. | **Example (spoken):** "ecsetera" **Transcribed:** "etcetera" |
| **Non-verbal human sounds** | Human non-verbal sounds like \[laughing], \[sighing], \[swallowing] are transcribed inline. | "I—\[sighs]—don't know." | Most non-verbal sounds are excluded unless they impact meaning. | "I don't know." |
| **Environmental Sounds** | Environmental sounds are described in square brackets. | "\[door slams], \[birds chirping], \[phone buzzes]" | Omit unless essential to meaning. **Include if:** 1. The sound impacts emotion or meaning 2. The sound is directly referenced by the speaker | "What was that noise? \[dog barking]" "Hang on, I hear something \[door slamming]" |
## FAQ
You can leave feedback on a completed transcript by opening it (use the *View* option in the sidebar) and clicking the *Feedback* button.
No. You can export a completed transcript and make changes off platform. We plan to add support for this soon.
# Troubleshooting
> Explore common issues and solutions.
Our models are non-deterministic, meaning outputs can vary based on inputs. While we strive to enhance predictability, some variability is inherent. This guide outlines common issues and preventive measures.
## General
If the generated voice output varies in volume or tone, it is often due to inconsistencies in the voice clone training audio.
* **Apply compression**: Compress the training audio to reduce dynamic range and ensure consistent levels. Aim for an RMS between -23 dB and -18 dB and a true peak below -3 dB.
* **Background noise**: Ensure the training audio contains only the voice you want to clone — no music, noise, or pops. Background noise, sudden bursts of energy or consistent low-frequency energy can make the AI less stable.
* **Speaker consistency**: Ensure the speaker maintains a consistent distance from the microphone and avoids whispering or shouting. Variations can lead to inconsistent volume or tonality.
* **Audio length**:
* **Instant Voice Cloning**: Use 1–2 minutes of consistent audio. Consistency in tonality, performance, accent, and quality is crucial.
* **Professional Voice Cloning**: Use at least 30 minutes, ideally 2+ hours, of consistent audio for best results.
To minimize issues, consider breaking your text into smaller segments. This approach helps maintain consistent volume and reduces degradation over longer audio generations. Utilize our Studio feature to generate several smaller audio segments simultaneously, ensuring better quality and consistency.
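As an illustration of this approach, here is a minimal, hypothetical sketch that splits long text into sentence-aligned segments under a chosen character budget before each segment is generated separately; the 800-character budget mirrors the guidance given later in this guide, and the helper name is illustrative rather than an official utility.
```python
# Minimal sketch (not an official utility): split text into sentence-aligned
# chunks under a character budget before generating each chunk separately.
import re

def split_into_segments(text: str, max_chars: int = 800) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        # Start a new segment if adding this sentence would exceed the budget.
        if current and len(current) + len(sentence) + 1 > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        segments.append(current)
    return segments

# Each returned segment can then be generated separately and concatenated.
```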
Refer to our guides for optimizing Instant and Professional Voice Clones for best practices and
advice.
The multilingual models may rarely mispronounce certain words, even in English. This issue appears to be somewhat arbitrary but seems to be voice and text-dependent. It occurs more frequently with certain voices and text, especially when using words that also appear in other languages.
* **Use Studio**: This feature helps minimize mispronunciation issues, which are more prevalent in longer text sections when using Speech Synthesis. While it won't completely eliminate the problem, it can help avoid it and make it easier to regenerate specific sections without redoing the entire text.
* **Properly cloned voices**: Similar to addressing inconsistency issues, using a properly cloned voice in the desired languages can help reduce mispronunciation.
* **Specify pronunciation**: When using our Studio feature, consider specifying the pronunciation of certain words, such as character names and brand names, or how acronyms should be read. For more information, refer to the Pronunciation Dictionary section of our guide to Studio.
The AI can sometimes switch languages or accents throughout a single generation, especially if that generation is longer in length. This issue is similar to the mispronunciation problem and is something we are actively working to improve.
* **Use properly cloned voices**: Using an Instant Voice Clone or a Professional Voice Clone trained on high-quality, consistent audio in the desired language can help mitigate this issue. Pairing this with the Studio feature can further enhance stability.
* **Understand voice limitations**: Default and generated voices are primarily English and may carry an English accent when used for other languages. Cloning a voice that speaks the target language with the desired accent provides the AI with better context, reducing the likelihood of language switching.
* **Language selection**: Currently, the AI determines the language based on the input text. Writing in the desired language is crucial, especially when using pre-made voices that are English-based, as they may introduce an English accent.
* **Optimal text length**: The AI tends to maintain a consistent accent over shorter text segments. For best results, keep text generations under 800-900 characters when using Text-to-Speech. The Studio workflow can help manage longer texts by breaking them into smaller, more manageable segments.
The models may mispronounce certain numbers, symbols and acronyms. For example, the numbers "1, 2, 3" might be read as "one," "two," "three" in English, even when the surrounding text is in another language. To ensure correct pronunciation in another language, write them out phonetically or in words as you want them to be spoken.
* **Example**: For the number "1" to be pronounced in French, write "un."
* **Symbols**: Specify how symbols should be read, e.g., "\$" as "dollar" or "euro."
* **Acronyms**: Spell out acronyms phonetically.
Corrupt speech is a rare issue where the model generates muffled or distorted audio. This occurs
unpredictably, and we have not identified a cause. If encountered, regenerate the section to
resolve the issue.
Audio quality may degrade during extended text-to-speech conversions, especially with the Multilingual v1 model. To mitigate this, break text into sections under 800 characters.
* **Voice Selection**: Some voices are more susceptible to degradation. Use high-quality samples for cloned voices to minimize artifacts.
* **Stability and Similarity**: Adjust these settings to influence voice behavior and artifact prominence. Hover over each setting for more details.
For some voices, this voice setting can lead to instability, including inconsistent speed,
mispronunciation and the addition of extra sounds. We recommend keeping this setting at 0,
especially if you find you are experiencing these issues in your generated audio.
## Studio (formerly Projects)
The import function attempts to import the file you provide to the website. Given the variability in website structures and book formatting, including images, always verify the import for accuracy.
* **Chapter images**: If a book's chapters start with an image as the first letter, the AI may not recognize the letter. Manually add the letter to each chapter.
* **Paragraph structure**: If text imports as a single long paragraph instead of following the original book's structure, it may not function correctly. Ensure the text maintains its original line breaks. If issues persist, try copying and pasting. If this fails, the text format may need conversion or rewriting.
* **Preferred format**: EPUB is the recommended file format for creating a project in Studio. A well-structured EPUB will automatically split each chapter in Studio, facilitating navigation. Ensure each chapter heading is formatted as "Heading 1" for proper recognition.
Always double-check imported content for accuracy and structure.
Occasionally, glitches or sharp breaths may occur between paragraphs. This is rare and differs
from standard Text to Speech issues. If encountered, regenerate the preceding paragraph, as the
problem often originates there.
If an issue persists after following this troubleshooting guide, please [contact our support
team](https://help.elevenlabs.io/hc/en-us/requests/new?ticket_form_id=13145996177937).
# Zero Retention Mode (Enterprise)
> Learn how to use Zero Retention Mode to protect sensitive data.
## Background
By default, we retain data, in accordance with our Privacy Policy, to enhance our services, troubleshoot issues, and ensure the security of our systems. However, for some enterprise customers, we offer a "Zero Retention Mode" option for specific products. In this Zero Retention Mode, most data in requests and responses are immediately deleted once the request is completed.
ElevenLabs has agreements in place with each third-party LLM provider which expressly prohibit such providers from training their models on customer content, whether or not Zero Retention Mode is enabled.
## What is Zero Retention Mode?
Zero Retention Mode provides an additional level of security and peace of mind for especially sensitive workflows. When enabled, logging of certain data points is restricted, including:
* TTS text input
* TTS audio output
* Voice Changer audio input
* Voice Changer audio output
* STT audio input
* STT text output
* Conversational AI: all input and output
* Email associated with the account generating the input in our logs
This data is used only to process the request; it is visible only to the user making the request and is held only in the volatile memory of the process serving the request. None of this data is ever sent to a database where data is stored long term.
## Who has access to Zero Retention Mode?
Enterprise customers can use Zero Retention Mode. It is primarily intended for customers in the healthcare and banking sectors, and other customers who may use our services to process sensitive information.
## When can a customer use Zero Retention Mode?
Zero Retention Mode is available to select enterprise customers. However, access to this feature may be restricted if ElevenLabs determines a customer's use case to be high risk, if an account is flagged by an automated system for additional moderation or at ElevenLabs' sole discretion. In such cases, the enterprise administrator will be promptly notified of the restriction.
## How does Zero Retention Mode work?
Zero Retention Mode only works for API requests, specifically:
* **Text to Speech**: this covers the Text-to-Speech (TTS) API, including all endpoints beginning with `/v1/text-to-speech/` and the TTS websocket connection.
* **Voice Changer**: this covers the Voice Changer API, including all endpoints starting with `/v1/speech-to-speech/`.
After setup, check the request history to verify Zero Retention Mode is enabled. If enabled, there should be no requests in the history.
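As a quick check, the sketch below lists recent history items via the standard history endpoint (`GET /v1/history`); with Zero Retention Mode active for your requests, the returned list should remain empty. The endpoint path, `page_size` parameter, and `xi-api-key` header follow the public API reference, but treat the exact response shape as an assumption and verify it there.
```python
# Minimal sketch: confirm no request history is being recorded.
# Uses the standard history endpoint; see the API reference for the full schema.
import os
import requests

response = requests.get(
    "https://api.elevenlabs.io/v1/history",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    params={"page_size": 10},
    timeout=30,
)
response.raise_for_status()
items = response.json().get("history", [])
print(f"History items found: {len(items)}")  # Expect 0 with Zero Retention Mode
```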
Zero Retention Mode can be used by sending `enable_logging=false` with products that support it.
For example, in the Text to Speech API, you can set the query parameter [enable\_logging](https://elevenlabs.io/docs/api-reference/text-to-speech#parameter-enable-logging) to a `false` value:
```python title="Python" {12}
from elevenlabs import ElevenLabs
elevenlabs = ElevenLabs(
api_key="YOUR_API_KEY",
)
response = elevenlabs.text_to_speech.convert(
voice_id=voice_id,
output_format="mp3_22050_32",
text=text,
model_id="eleven_turbo_v2",
enable_logging=False,
)
```
```javascript title="JavaScript" {9}
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
const elevenlabs = new ElevenLabsClient({ apiKey: 'YOUR_API_KEY' });
await elevenlabs.textToSpeech.convert(voiceId, {
outputFormat: 'mp3_44100_128',
text: text,
modelId: 'eleven_turbo_v2',
enableLogging: false,
});
```
```bash title="cURL"
curl --request POST \
--url 'https://api.elevenlabs.io/v1/text-to-speech/{voice_id}?enable_logging=false' \
--header 'Content-Type: application/json'
```
## What products are configured for Zero Retention Mode?
| Product                    | Type                 | Default Retention | Eligible for Zero Retention |
| -------------------------- | -------------------- | ----------------- | --------------------------- |
| Text to Speech | Text Input | Enabled | Yes |
| | Audio Output | Enabled | Yes |
| Voice Changer | Audio Input | Enabled | Yes |
| | Audio Output | Enabled | Yes |
| Speech to Text | Audio Input | Enabled | Yes |
| | Text Output | Enabled | Yes |
| Instant Voice Cloning | Audio Samples | Enabled | No |
| Professional Voice Cloning | Audio Samples | Enabled | No |
| Dubbing | Audio/Video Input | Enabled | No |
| | Audio Output | Enabled | No |
| Projects | Text Input | Enabled | No |
| | Audio Output | Enabled | No |
| Conv AI | All Input and Output | Enabled | Yes |
For Conversational AI, Gemini and Claude LLMs can be used in Zero Retention Mode.
## FAQ
Troubleshooting and support for Zero Retention Mode is limited. Because of the configuration, we
will not be able to diagnose issues with TTS/STS generations. Debugging will be more difficult
as a result.
Customers by default have history preservation enabled. All customers can use the API to delete
generations at any time. This action will immediately remove the corresponding audio and text
from our database; however, debugging and moderation logs may still retain data related to the
generation.
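For customers not using Zero Retention Mode, an individual generation can be removed on demand. The sketch below is a minimal illustration, assuming the history delete endpoint (`DELETE /v1/history/{history_item_id}`) and the standard `xi-api-key` header; check the API reference for the authoritative details.
```python
# Minimal sketch: delete a single generation from your request history.
# Assumes DELETE /v1/history/{history_item_id}; check the API reference.
import os
import requests

def delete_history_item(history_item_id: str) -> None:
    response = requests.delete(
        f"https://api.elevenlabs.io/v1/history/{history_item_id}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        timeout=30,
    )
    response.raise_for_status()

# delete_history_item("your_history_item_id")
```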
For any retained data, we regularly back up such data to prevent data loss in the event of any
unexpected incidents. Following data deletion, database items are retained in backups for up to
30 days. After this period, the data expires and is not recoverable.
All data is deleted from our systems permanently when you delete your account. This includes all
data associated with your account, such as API keys, request history, and any other data stored
in your account. We also take commercially reasonable efforts to delete debugging data related
to your account.
# Conversational AI overview
> Deploy customized, conversational voice agents in minutes.
## What is Conversational AI?
ElevenLabs [Conversational AI](https://elevenlabs.io/conversational-ai) is a platform for deploying customized, conversational voice agents. Built in response to our customers' needs, our platform eliminates months of development time typically spent building conversation stacks from scratch. It combines these building blocks:
Our fine tuned ASR model that transcribes the caller's dialogue.
Choose from Gemini, Claude, OpenAI and more, or bring your own.
Our low latency, human-like TTS across 5k+ voices and 31 languages.
Our custom turn taking model that understands when to speak, like a human would.
Altogether it is a highly composable AI Voice agent solution that can scale to thousands of calls per day. With [server](/docs/conversational-ai/customization/tools/server-tools) & [client side](/docs/conversational-ai/customization/tools/client-tools) tools, [knowledge](/docs/conversational-ai/customization/knowledge-base) bases, [dynamic](/docs/conversational-ai/customization/personalization/dynamic-variables) agent instantiation and [overrides](/docs/conversational-ai/customization/personalization/overrides), plus built-in monitoring, it's the complete developer toolkit.
15 minutes to get started on the free plan. The Business plan includes 13,750 minutes at \$0.08
per minute, with extra minutes billed at \$0.08, and significantly discounted pricing is
available at higher volumes.
**Setup & Prompt Testing**: billed at half the cost.
Usage is billed to the account that created the agent. If authentication is not enabled, anybody
with your agent's ID can connect to it and consume your credits. To protect against this, either
enable authentication for your agent or handle the agent ID as a secret.
## Pricing tiers
| Tier | Price | Minutes included | Cost per extra minute |
| -------- | ------- | ---------------- | ---------------------------------- |
| Free | \$0 | 15 | Unavailable |
| Starter | \$5 | 50 | Unavailable |
| Creator | \$22 | 250 | \~\$0.12 |
| Pro | \$99 | 1100 | \~\$0.11 |
| Scale | \$330 | 3,600 | \~\$0.10 |
| Business | \$1,320 | 13,750 | \$0.08 (annual), \$0.096 (monthly) |
| Tier | Price | Credits included | Cost in credits per extra minute |
| -------- | ------- | ---------------- | -------------------------------- |
| Free | \$0 | 10,000 | Unavailable |
| Starter | \$5 | 30,000 | Unavailable |
| Creator | \$22 | 100,000 | 400 |
| Pro | \$99 | 500,000 | 454 |
| Scale | \$330 | 2,000,000 | 555 |
| Business | \$1,320 | 11,000,000 | 800 |
In multimodal text + voice mode, text messages are priced per message. LLM costs are passed through separately; see here for estimates of [LLM cost](/docs/conversational-ai/customization/llm#supported-llms).
| Plan | Price per text message |
| ---------- | ---------------------- |
| Free | 0.4 cents |
| Starter | 0.4 cents |
| Creator | 0.3 cents |
| Pro | 0.3 cents |
| Scale | 0.3 cents |
| Business | 0.3 cents |
| Enterprise | Custom pricing |
### Pricing during silent periods
When a conversation is silent for longer than ten seconds, ElevenLabs reduces the inference of the turn-taking model and speech-to-text services until voice activity is detected again. This optimization means that extended periods of silence are charged at 5% of the usual per-minute cost.
This reduction in cost:
* Only applies to the period of silence.
* Does not apply after voice activity is detected again.
* Can be triggered multiple times in the same conversation.
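As a rough, illustrative calculation (assuming the Business plan's \$0.08 per-minute rate and the 5% silence rate described above), the effect on a call's cost looks like this:
```python
# Illustrative estimate only: cost of a call with silent periods,
# assuming a $0.08/minute rate and the 5% rate applied to silent minutes.
RATE_PER_MINUTE = 0.08
SILENCE_MULTIPLIER = 0.05

def estimated_call_cost(active_minutes: float, silent_minutes: float) -> float:
    return (
        active_minutes * RATE_PER_MINUTE
        + silent_minutes * RATE_PER_MINUTE * SILENCE_MULTIPLIER
    )

# A 10-minute call with 4 minutes of silence:
print(round(estimated_call_cost(active_minutes=6, silent_minutes=4), 3))  # 0.496
```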
## Models
Currently, the following models are natively supported and can be configured via the agent settings:
| Provider | Model |
| ------------- | --------------------- |
| **Google** | Gemini 2.5 Flash |
| | Gemini 2.0 Flash |
| | Gemini 2.0 Flash Lite |
| | Gemini 1.5 Flash |
| | Gemini 1.5 Pro |
| **OpenAI** | GPT-4.1 |
| | GPT-4.1 Mini |
| | GPT-4.1 Nano |
| | GPT-4o |
| | GPT-4o Mini |
| | GPT-4 Turbo |
| | GPT-4 |
| | GPT-3.5 Turbo |
| **Anthropic** | Claude Sonnet 4 |
| | Claude 3.5 Sonnet |
| | Claude 3.5 Sonnet v1 |
| | Claude 3.7 Sonnet |
| | Claude 3.0 Haiku |
Using your own Custom LLM is also supported by specifying the endpoint we should make requests to and providing credentials through our secure secret storage.

You can start with our [free tier](https://elevenlabs.io/app/sign-up), which includes 15 minutes of conversation per month.
Need more? Upgrade to a [paid plan](https://elevenlabs.io/pricing/api) instantly - no sales calls required. For enterprise usage (6+ hours of daily conversation), [contact our sales team](https://elevenlabs.io/contact-sales) for custom pricing tailored to your needs.
## Popular applications
Companies and creators use our Conversational AI orchestration platform to create:
* **Customer service**: Assistants trained on company documentation that can handle customer queries, troubleshoot issues, and provide 24/7 support in multiple languages.
* **Virtual assistants**: Assistants trained to manage scheduling, set reminders, look up information, and help users stay organized throughout their day.
* **Retail support**: Assistants that help customers find products, provide personalized recommendations, track orders, and answer product-specific questions.
* **Personalized learning**: Assistants that help students learn new topics & enhance reading comprehension by speaking with books and [articles](https://elevenlabs.io/blog/time-brings-conversational-ai-to-journalism).
* **Multi-character storytelling**: Interactive narratives with distinct voices for different characters, powered by our new [multi-voice support](/docs/conversational-ai/customization/voice/multi-voice-support) feature.
Ready to get started? Check out our [quickstart guide](/docs/conversational-ai/quickstart) to
create your first AI agent in minutes.
## FAQ
Plan limits
Your subscription plan determines how many calls can be made simultaneously.
| Plan | Concurrency limit |
| ---------- | ----------------- |
| Free | 4 |
| Starter | 6 |
| Creator | 10 |
| Pro | 20 |
| Scale | 30 |
| Business | 30 |
| Enterprise | Elevated |
To increase your concurrency limit [upgrade your subscription plan](https://elevenlabs.io/pricing/api)
or [contact sales](https://elevenlabs.io/contact-sales) to discuss enterprise plans.
The following audio output formats are supported in the Conversational AI platform:
* PCM (8 kHz / 16 kHz / 22.05 kHz / 24 kHz / 44.1 kHz)
* μ-law (8 kHz)
# Conversational AI dashboard
> Monitor and analyze your agents' performance effortlessly.
## Overview
The Agents Dashboard provides real-time insights into your Conversational AI agents. It displays performance metrics over customizable time periods. You can review data for individual agents or across your entire workspace.
## Analytics
You can monitor activity over various daily, weekly, and monthly time periods.
The dashboard can be toggled to show different metrics, including: number of calls, average duration, total cost, and average cost.
## Language Breakdown
A key benefit of Conversational AI is the ability to support multiple languages.
The Language Breakdown section shows the percentage of calls (overall, or per-agent) in each language.
## Active Calls
At the top left of the dashboard, the current number of active calls is displayed. This real-time counter reflects ongoing sessions for your workspace's agents, and is also accessible via the API.
# Tools
> Enhance Conversational AI agents with custom functionalities and external integrations.
## Overview
Tools allow Conversational AI agents to perform actions beyond generating text responses.
They enable agents to interact with external systems, execute custom logic, or access specific functionalities during a conversation.
This allows for richer, more capable interactions tailored to specific use cases.
ElevenLabs Conversational AI supports the following kinds of tools:
Tools executed directly on the client-side application (e.g., web browser, mobile app).
Custom tools executed on your server-side infrastructure via API calls.
Built-in tools provided by the platform for common actions.
# Client tools
> Empower your assistant to trigger client-side operations.
**Client tools** enable your assistant to execute client-side functions. Unlike [server-side tools](/docs/conversational-ai/customization/tools), client tools allow the assistant to perform actions such as triggering browser events, running client-side functions, or sending notifications to a UI.
## Overview
Applications may require assistants to interact directly with the user's environment. Client-side tools give your assistant the ability to perform client-side operations.
Here are a few examples where client tools can be useful:
* **Triggering UI events**: Allow an assistant to trigger browser events, such as alerts, modals or notifications.
* **Interacting with the DOM**: Enable an assistant to manipulate the Document Object Model (DOM) for dynamic content updates or to guide users through complex interfaces.
To perform operations server-side, use
[server-tools](/docs/conversational-ai/customization/tools/server-tools) instead.
## Guide
### Prerequisites
* An [ElevenLabs account](https://elevenlabs.io)
* A configured ElevenLabs Conversational Agent ([create one here](https://elevenlabs.io/app/conversational-ai))
Navigate to your agent dashboard. In the **Tools** section, click **Add Tool**. Ensure the **Tool Type** is set to **Client**. Then configure the following:
| Setting | Parameter |
| ----------- | ---------------------------------------------------------------- |
| Name | logMessage |
| Description | Use this client-side tool to log a message to the user's client. |
Then create a new parameter `message` with the following configuration:
| Setting | Parameter |
| ----------- | ---------------------------------------------------------------------------------- |
| Data Type | String |
| Identifier | message |
| Required | true |
| Description | The message to log in the console. Ensure the message is informative and relevant. |

Unlike server-side tools, client tools need to be registered in your code.
Use the following code to register the client tool:
```python title="Python" focus={4-16}
from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation, ClientTools
def log_message(parameters):
message = parameters.get("message")
print(message)
client_tools = ClientTools()
client_tools.register("logMessage", log_message)
conversation = Conversation(
client=ElevenLabs(api_key="your-api-key"),
agent_id="your-agent-id",
client_tools=client_tools,
# ...
)
conversation.start_session()
```
```javascript title="JavaScript" focus={2-10}
// ...
const conversation = await Conversation.startSession({
// ...
clientTools: {
logMessage: async ({message}) => {
console.log(message);
}
},
// ...
});
```
```swift title="Swift" focus={2-10}
// ...
var clientTools = ElevenLabsSDK.ClientTools()
clientTools.register("logMessage") { parameters async throws -> String? in
guard let message = parameters["message"] as? String else {
throw ElevenLabsSDK.ClientToolError.invalidParameters
}
print(message)
return message
}
```
The tool and parameter names in the agent configuration are case-sensitive and **must** match those registered in your code.
Initiate a conversation with your agent and say something like:
> *Log a message to the console that says Hello World*
You should see a `Hello World` log appear in your console.
Now that you've set up a basic client-side event, you can:
* Explore more complex client tools like opening modals, navigating to pages, or interacting with the DOM.
* Combine client tools with server-side webhooks for full-stack interactions.
* Use client tools to enhance user engagement and provide real-time feedback during conversations.
### Passing client tool results to the conversation context
When you want your agent to receive data back from a client tool, ensure that you tick the **Wait for response** option in the tool configuration.
Once the client tool is added, when the function is called the agent will wait for its response and append the response to the conversation context.
```python title="Python"
def get_customer_details():
# Fetch customer details (e.g., from an API or database)
customer_data = {
"id": 123,
"name": "Alice",
"subscription": "Pro"
}
# Return the customer data; it can also be a JSON string if needed.
return customer_data
client_tools = ClientTools()
client_tools.register("getCustomerDetails", get_customer_details)
conversation = Conversation(
client=ElevenLabs(api_key="your-api-key"),
agent_id="your-agent-id",
client_tools=client_tools,
# ...
)
conversation.start_session()
```
```javascript title="JavaScript"
const clientTools = {
getCustomerDetails: async () => {
// Fetch customer details (e.g., from an API)
const customerData = {
id: 123,
name: "Alice",
subscription: "Pro"
};
// Return data directly to the agent.
return customerData;
}
};
// Start the conversation with client tools configured.
const conversation = await Conversation.startSession({ clientTools });
```
In this example, when the agent calls **getCustomerDetails**, the function will execute on the client and the agent will receive the returned data, which is then used as part of the conversation context.
### Troubleshooting
* Ensure the tool and parameter names in the agent configuration match those registered in your code.
* View the conversation transcript in the agent dashboard to verify the tool is being executed.
* Open the browser console to check for any errors.
* Ensure that your code has necessary error handling for undefined or unexpected parameters.
## Best practices
Name tools intuitively, with detailed descriptions
If you find the assistant does not make calls to the correct tools, you may need to update your tool names and descriptions so the assistant more clearly understands when it should select each tool. Avoid using abbreviations or acronyms to shorten tool and argument names.
You can also include detailed descriptions for when a tool should be called. For complex tools, you should include descriptions for each of the arguments to help the assistant know what it needs to ask the user to collect that argument.
Name tool parameters intuitively, with detailed descriptions
Use clear and descriptive names for tool parameters. If applicable, specify the expected format for a parameter in the description (e.g., YYYY-mm-dd or dd/mm/yy for a date).
Consider providing additional information about how and when to call tools in your assistant's
system prompt
Providing clear instructions in your system prompt can significantly improve the assistant's tool calling accuracy. For example, guide the assistant with instructions like the following:
```plaintext
Use `check_order_status` when the user inquires about the status of their order, such as 'Where is my order?' or 'Has my order shipped yet?'.
```
Provide context for complex scenarios. For example:
```plaintext
Before scheduling a meeting with `schedule_meeting`, check the user's calendar for availability using check_availability to avoid conflicts.
```
LLM selection
When using tools, we recommend picking high intelligence models like GPT-4o mini or Claude 3.5
Sonnet and avoiding Gemini 1.5 Flash.
It's important to note that the choice of LLM matters to the success of function calls. Some LLMs can struggle with extracting the relevant parameters from the conversation.
# Server tools
> Connect your assistant to external data & systems.
**Tools** enable your assistant to connect to external data and systems. You can define a set of tools that the assistant has access to, and the assistant will use them where appropriate based on the conversation.
## Overview
Many applications require assistants to call external APIs to get real-time information. Tools give your assistant the ability to make external function calls to third party apps so you can get real-time information.
Here are a few examples where tools can be useful:
* **Fetching data**: enable an assistant to retrieve real-time data from any REST-enabled database or 3rd party integration before responding to the user.
* **Taking action**: allow an assistant to trigger authenticated actions based on the conversation, like scheduling meetings or initiating order returns.
To interact with Application UIs or trigger client-side events use [client
tools](/docs/conversational-ai/customization/tools/client-tools) instead.
## Tool configuration
Conversational AI assistants can be equipped with tools to interact with external APIs. Unlike traditional requests, the assistant generates query, body, and path parameters dynamically based on the conversation and parameter descriptions you provide.
All tool configurations and parameter descriptions help the assistant determine **when** and **how** to use these tools. To orchestrate tool usage effectively, update the assistant’s system prompt to specify the sequence and logic for making these calls. This includes:
* **Which tool** to use and under what conditions.
* **What parameters** the tool needs to function properly.
* **How to handle** the responses.
Define a high-level `Name` and `Description` to describe the tool's purpose. This helps the LLM understand the tool and know when to call it.
If the API requires path parameters, include variables in the URL path by wrapping them in curly
braces `{}`, for example: `/api/resource/{id}` where `id` is a path parameter.

Assistant secrets can be used to add authentication headers to requests.

Specify any headers that need to be included in the request.

Include variables in the URL path by wrapping them in curly braces `{}`:
* **Example**: `/api/resource/{id}` where `id` is a path parameter.

Specify any body parameters to be included in the request.

Specify any query parameters to be included in the request.

## Guide
In this guide, we'll create a weather assistant that can provide real-time weather information for any location. The assistant will use its geographic knowledge to convert location names into coordinates and fetch accurate weather data.
First, on the **Agent** section of your agent settings page, choose **Add Tool**. Select **Webhook** as the Tool Type, then configure the weather API integration:
| Field | Value |
| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Name | get\_weather |
| Description | Gets the current weather forecast for a location |
| Method | GET |
| URL         | [https://api.open-meteo.com/v1/forecast?latitude=\{latitude}\&longitude=\{longitude}\&current=temperature\_2m,wind\_speed\_10m\&hourly=temperature\_2m,relative\_humidity\_2m,wind\_speed\_10m](https://api.open-meteo.com/v1/forecast?latitude=\{latitude}\&longitude=\{longitude}\&current=temperature_2m,wind_speed_10m\&hourly=temperature_2m,relative_humidity_2m,wind_speed_10m) |
| Data Type | Identifier | Value Type | Description |
| --------- | ---------- | ---------- | --------------------------------------------------- |
| string | latitude | LLM Prompt | The latitude coordinate for the requested location |
| string | longitude | LLM Prompt | The longitude coordinate for the requested location |
An API key is not required for this tool. If one is required, this should be passed in the headers and stored as a secret.
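For reference, this is roughly the request the `get_weather` tool resolves to once the agent has filled in the latitude and longitude parameters. The coordinates below are illustrative values for Tokyo; the query parameters mirror the URL configured above.
```python
# Illustrative only: the kind of request the get_weather tool resolves to
# once the agent supplies latitude/longitude (Tokyo used as an example).
import requests

response = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 35.6762,
        "longitude": 139.6503,
        "current": "temperature_2m,wind_speed_10m",
        "hourly": "temperature_2m,relative_humidity_2m,wind_speed_10m",
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["current"]["temperature_2m"])
```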
Configure your assistant to handle weather queries intelligently with this system prompt:
```plaintext System prompt
You are a helpful conversational AI assistant with access to a weather tool. When users ask about
weather conditions, use the get_weather tool to fetch accurate, real-time data. The tool requires
a latitude and longitude - use your geographic knowledge to convert location names to coordinates
accurately.
Never ask users for coordinates - you must determine these yourself. Always report weather
information conversationally, referring to locations by name only. For weather requests:
1. Extract the location from the user's message
2. Convert the location to coordinates and call get_weather
3. Present the information naturally and helpfully
For non-weather queries, provide friendly assistance within your knowledge boundaries. Always be
concise, accurate, and helpful.
First message: "Hey, how can I help you today?"
```
Test your assistant by asking about the weather in different locations. The assistant should
handle specific locations ("What's the weather in Tokyo?") and ask for clarification when given
general queries ("How's the weather looking today?").
## Best practices
Name tools intuitively, with detailed descriptions
If you find the assistant does not make calls to the correct tools, you may need to update your tool names and descriptions so the assistant more clearly understands when it should select each tool. Avoid using abbreviations or acronyms to shorten tool and argument names.
You can also include detailed descriptions for when a tool should be called. For complex tools, you should include descriptions for each of the arguments to help the assistant know what it needs to ask the user to collect that argument.
Name tool parameters intuitively, with detailed descriptions
Use clear and descriptive names for tool parameters. If applicable, specify the expected format for a parameter in the description (e.g., YYYY-mm-dd or dd/mm/yy for a date).
Consider providing additional information about how and when to call tools in your assistant's
system prompt
Providing clear instructions in your system prompt can significantly improve the assistant's tool calling accuracy. For example, guide the assistant with instructions like the following:
```plaintext
Use `check_order_status` when the user inquires about the status of their order, such as 'Where is my order?' or 'Has my order shipped yet?'.
```
Provide context for complex scenarios. For example:
```plaintext
Before scheduling a meeting with `schedule_meeting`, check the user's calendar for availability using check_availability to avoid conflicts.
```
LLM selection
When using tools, we recommend picking high intelligence models like GPT-4o mini or Claude 3.5
Sonnet and avoiding Gemini 1.5 Flash.
It's important to note that the choice of LLM matters to the success of function calls. Some LLMs can struggle with extracting the relevant parameters from the conversation.
# Agent tools deprecation
> Migrate from legacy `prompt.tools` to the new `prompt.tool_ids` field.
## Overview
The way you wire tools into your ConvAI agents is getting a refresh.
### What's changing?
* The old request field `body.conversation_config.agent.prompt.tools` is **deprecated**.
* Use `body.conversation_config.agent.prompt.tool_ids` to list the IDs of the client or server tools your agent should use.
* **New field** `prompt.built_in_tools` is introduced for **system tools** (e.g., `end_call`, `language_detection`). These tools are referenced by **name**, not by ID.
### Critical deadlines
**July 14, 2025** - Last day for full backwards compatibility. You can continue using
`prompt.tools` until this date.
**July 15, 2025** - GET endpoints will stop returning the `tools` field. Only `prompt.tool_ids`
will be included in responses.
**July 23, 2025** - Legacy `prompt.tools` field will be permanently removed. All requests
containing this field will be rejected.
## Why the change?
Decoupling tools from agents brings several advantages:
* **Re-use** – the same tool can be shared across multiple agents.
* **Simpler audits** – inspect, update or delete a tool in one place.
* **Cleaner payloads** – agent configurations stay lightweight.
## What has already happened?
Good news — we've already migrated your data! Every tool that previously lived in `prompt.tools`
now exists as a standalone record, and its ID is present in the agent's `prompt.tool_ids` array.
No scripts required.
## Deprecation timeline
| Date | Status | Behaviour |
| ----------------- | ------------------------ | -------------------------------------------------------------------------------- |
| **July 14, 2025** | ✅ Full compatibility | You may keep sending `prompt.tools`. GET responses include the `tools` field. |
| **July 15, 2025** | ⚠️ Partial compatibility | GET endpoints stop returning the `tools` field. Only `prompt.tool_ids` included. |
| **July 23, 2025** | ❌ No compatibility | POST and PATCH endpoints **reject** any request containing `prompt.tools`. |
## Toolbox endpoint
All tool management lives under a dedicated endpoint:
```http title="Tool management"
POST | GET | PATCH | DELETE https://api.elevenlabs.io/v1/convai/tools
```
Use it to:
* **Create** a tool and obtain its ID.
* **Update** it when requirements change.
* **Delete** it when it is no longer needed.
Anything that once sat in the old `tools` array now belongs here.
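For example, you might list your existing standalone tools and collect their IDs before referencing them in `prompt.tool_ids`. The sketch below assumes the list response contains a `tools` array whose items expose an `id` field; check the API reference for the exact response shape.
```javascript
// Sketch: list standalone tools and collect their IDs for prompt.tool_ids.
// Assumes the response body has a `tools` array with `id` fields (verify
// against the API reference).
async function listToolIds() {
  const response = await fetch('https://api.elevenlabs.io/v1/convai/tools', {
    method: 'GET',
    headers: { 'xi-api-key': process.env.ELEVENLABS_API_KEY },
  });
  const body = await response.json();
  return body.tools.map((tool) => tool.id);
}

listToolIds().then((toolIds) => {
  console.log('Reference these in prompt.tool_ids:', toolIds);
});
```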
## Migration guide
System tools are **not** supported in `prompt.tool_ids`. Instead, specify them in the **new**
`prompt.built_in_tools` field.
If you are still using the legacy field, follow the steps below.
### 1. Stop sending `prompt.tools`
Remove the `tools` array from your agent configuration.
### 2. Send the tool IDs instead
Replace it with `prompt.tool_ids`, containing the IDs of the client or server tools the agent
should use.
### 3. (Optional) Clean up
After 23 July, delete any unused standalone tools via the toolbox endpoint.
## Example payloads
A request must include **either** `prompt.tool_ids` **or** the legacy `prompt.tools` array —
**never both**. Sending both fields results in an error.
```json title="Legacy format (deprecated)"
{
"conversation_config": {
"agent": {
"prompt": {
"tools": [
{
"type": "client",
"name": "open_url",
"description": "Open a provided URL in the user's browser."
},
{
"type": "system",
"name": "end_call",
"description": "",
"response_timeout_secs": 20,
"params": {
"system_tool_type": "end_call"
}
}
]
}
}
}
}
```
```json title="New format (recommended) – client tool via ID + system tool"
{
"conversation_config": {
"agent": {
"prompt": {
"tool_ids": ["tool_123456789abcdef0"],
"built_in_tools": {
"end_call": {
"name": "end_call",
"description": "",
"response_timeout_secs": 20,
"type": "system",
"params": {
"system_tool_type": "end_call"
}
},
"language_detection": null,
"transfer_to_agent": null,
"transfer_to_number": null,
"skip_turn": null
}
}
}
}
}
```
## FAQ
**Do I need to update my integration right away?**
No. Until July 23, the API will silently migrate any `prompt.tools` array you send. However,
starting July 15, GET and PATCH responses will no longer include full tool objects. After July
23, any POST/PATCH requests containing `prompt.tools` will be rejected.
**Can I send both `prompt.tool_ids` and `prompt.tools` in the same request?**
No. A request must use **either** `prompt.tool_ids` **or** `prompt.tools` — never both.
**How do I find the IDs of my tools?**
List your tools via `GET /v1/convai/tools` or inspect the response when you create one.
# System tools
> Update the internal state of conversations without external requests.
**System tools** enable your assistant to update the internal state of a conversation. Unlike [server tools](/docs/conversational-ai/customization/tools/server-tools) or [client tools](/docs/conversational-ai/customization/tools/client-tools), system tools don't make external API calls or trigger client-side functions—they only modify the internal state of the conversation.
## Overview
Some applications require agents to control the flow or state of a conversation.
System tools provide this capability by allowing the assistant to perform actions related to the state of the call that don't require communicating with external servers or the client.
### Available system tools
* **End call**: Let your agent automatically terminate a conversation when appropriate conditions are met.
* **Language detection**: Enable your agent to automatically switch to the user's language during conversations.
* **Agent transfer**: Seamlessly transfer conversations between AI agents based on defined conditions.
* **Transfer to human**: Seamlessly transfer the user to a human operator.
* **Skip turn**: Enable the agent to skip its turn if the LLM detects the agent should not speak yet.
## Implementation
When creating an agent via API, you can add system tools to your agent configuration. Here's how to implement both the end call and language detection tools:
## Custom LLM integration
When using a custom LLM with ElevenLabs agents, system tools are exposed as function definitions that your LLM can call. Each system tool has specific parameters and trigger conditions:
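For illustration, the `end_call` and `language_detection` tools described below might be surfaced to an OpenAI-compatible custom LLM as function definitions like the following sketch (the exact schema forwarded to your custom LLM server may differ):
```javascript
// Illustration: system tools surfaced as function definitions to a custom LLM.
// OpenAI-style function-calling format shown; the exact schema may differ.
const tools = [
  {
    type: 'function',
    function: {
      name: 'end_call',
      description: 'End the call when the task is complete or the user says goodbye.',
      parameters: {
        type: 'object',
        properties: {
          reason: { type: 'string', description: 'Why the call is being ended.' },
          message: { type: 'string', description: 'Optional farewell message to the user.' },
        },
        required: ['reason'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'language_detection',
      description: "Switch the conversation to the user's detected language.",
      parameters: {
        type: 'object',
        properties: {
          reason: { type: 'string', description: 'Why the language switch is needed.' },
          language: { type: 'string', description: 'Supported language code, e.g. "es".' },
        },
        required: ['reason', 'language'],
      },
    },
  },
];
```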
### End call
**Purpose**: Automatically terminate conversations when appropriate conditions are met.
**Trigger conditions**: The LLM should call this tool when:
* The main task has been completed and user is satisfied
* The conversation reached natural conclusion with mutual agreement
* The user explicitly indicates they want to end the conversation
**Parameters**:
* `reason` (string, required): The reason for ending the call
* `message` (string, optional): A farewell message to send to the user before ending the call
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "end_call",
"arguments": "{\"reason\": \"Task completed successfully\", \"message\": \"Thank you for using our service. Have a great day!\"}"
}
}
```
**Implementation**: Configure as a system tool in your agent settings. The LLM will receive detailed instructions about when to call this function.
Learn more: [End call tool](/docs/conversational-ai/customization/tools/end-call)
### Language detection
**Purpose**: Automatically switch to the user's detected language during conversations.
**Trigger conditions**: The LLM should call this tool when:
* User speaks in a different language than the current conversation language
* User explicitly requests to switch languages
* Multi-language support is needed for the conversation
**Parameters**:
* `reason` (string, required): The reason for the language switch
* `language` (string, required): The language code to switch to (must be in supported languages list)
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "language_detection",
"arguments": "{\"reason\": \"User requested Spanish\", \"language\": \"es\"}"
}
}
```
**Implementation**: Configure supported languages in agent settings and add the language detection system tool. The agent will automatically switch voice and responses to match detected languages.
Learn more: [Language detection tool](/docs/conversational-ai/customization/tools/language-detection)
### Agent transfer
**Purpose**: Transfer conversations between specialized AI agents based on user needs.
**Trigger conditions**: The LLM should call this tool when:
* User request requires specialized knowledge or different agent capabilities
* Current agent cannot adequately handle the query
* Conversation flow indicates need for different agent type
**Parameters**:
* `reason` (string, optional): The reason for the agent transfer
* `agent_number` (integer, required): Zero-indexed number of the agent to transfer to (based on configured transfer rules)
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "transfer_to_agent",
"arguments": "{\"reason\": \"User needs billing support\", \"agent_number\": 0}"
}
}
```
**Implementation**: Define transfer rules mapping conditions to specific agent IDs. Configure which agents the current agent can transfer to. Agents are referenced by zero-indexed numbers in the transfer configuration.
Learn more: [Agent transfer tool](/docs/conversational-ai/customization/tools/agent-transfer)
### Transfer to human
**Purpose**: Seamlessly hand off conversations to human operators when AI assistance is insufficient.
**Trigger conditions**: The LLM should call this tool when:
* Complex issues requiring human judgment
* User explicitly requests human assistance
* AI reaches limits of capability for the specific request
* Escalation protocols are triggered
**Parameters**:
* `reason` (string, optional): The reason for the transfer
* `transfer_number` (string, required): The phone number to transfer to (must match configured numbers)
* `client_message` (string, required): Message read to the client while waiting for transfer
* `agent_message` (string, required): Message for the human operator receiving the call
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "transfer_to_number",
"arguments": "{\"reason\": \"Complex billing issue\", \"transfer_number\": \"+15551234567\", \"client_message\": \"I'm transferring you to a billing specialist who can help with your account.\", \"agent_message\": \"Customer has a complex billing dispute about order #12345 from last month.\"}"
}
}
```
**Implementation**: Configure transfer phone numbers and conditions. Define messages for both customer and receiving human operator. Works with both Twilio and SIP trunking.
Learn more: [Transfer to human tool](/docs/conversational-ai/customization/tools/human-transfer)
### Skip turn
**Purpose**: Allow the agent to pause and wait for user input without speaking.
**Trigger conditions**: The LLM should call this tool when:
* User indicates they need a moment ("Give me a second", "Let me think")
* User requests pause in conversation flow
* Agent detects user needs time to process information
**Parameters**:
* `reason` (string, optional): Free-form reason explaining why the pause is needed
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "skip_turn",
"arguments": "{\"reason\": \"User requested time to think\"}"
}
}
```
**Implementation**: No additional configuration needed. The tool simply signals the agent to remain silent until the user speaks again.
Learn more: [Skip turn tool](/docs/conversational-ai/customization/tools/skip-turn)
```python
from elevenlabs import (
ConversationalConfig,
ElevenLabs,
AgentConfig,
PromptAgent,
PromptAgentInputToolsItem_System,
)
# Initialize the client
elevenlabs = ElevenLabs(api_key="YOUR_API_KEY")
# Create system tools
end_call_tool = PromptAgentInputToolsItem_System(
name="end_call",
description="" # Optional: Customize when the tool should be triggered
)
language_detection_tool = PromptAgentInputToolsItem_System(
name="language_detection",
description="" # Optional: Customize when the tool should be triggered
)
# Create the agent configuration with both tools
conversation_config = ConversationalConfig(
agent=AgentConfig(
prompt=PromptAgent(
tools=[end_call_tool, language_detection_tool]
)
)
)
# Create the agent
response = elevenlabs.conversational_ai.agents.create(
conversation_config=conversation_config
)
```
```javascript
import { ElevenLabs } from '@elevenlabs/elevenlabs-js';
// Initialize the client
const elevenlabs = new ElevenLabs({
apiKey: 'YOUR_API_KEY',
});
// Create the agent with system tools
await elevenlabs.conversationalAi.agents.create({
conversationConfig: {
agent: {
prompt: {
tools: [
{
type: 'system',
name: 'end_call',
description: '',
},
{
type: 'system',
name: 'language_detection',
description: '',
},
],
},
},
},
});
```
## FAQ
**Can system tools be combined with server and client tools?**
Yes, system tools can be used alongside server tools and client tools in the same assistant.
This allows for comprehensive functionality that combines internal state management with
external interactions.
# End call
> Let your agent automatically hang up on the user.
The **End Call** tool is added to agents created in the ElevenLabs dashboard by default. For
agents created via API or SDK, if you would like to enable the End Call tool, you must add it
manually as a system tool in your agent configuration. [See API Implementation
below](#api-implementation) for details.

## Overview
The **End Call** tool allows your conversational agent to terminate a call with the user. This is a system tool that provides flexibility in how and when calls are ended.
## Functionality
* **Default behavior**: The tool can operate without any user-defined prompts, ending the call when the conversation naturally concludes.
* **Custom prompts**: Users can specify conditions under which the call should end. For example:
* End the call if the user says "goodbye."
* Conclude the call when a specific task is completed.
## Custom LLM integration
**Purpose**: Automatically terminate conversations when appropriate conditions are met.
**Trigger conditions**: The LLM should call this tool when:
* The main task has been completed and user is satisfied
* The conversation reached natural conclusion with mutual agreement
* The user explicitly indicates they want to end the conversation
**Parameters**:
* `reason` (string, required): The reason for ending the call
* `message` (string, optional): A farewell message to send to the user before ending the call
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "end_call",
"arguments": "{\"reason\": \"Task completed successfully\", \"message\": \"Thank you for using our service. Have a great day!\"}"
}
}
```
**Implementation**: Configure as a system tool in your agent settings. The LLM will receive detailed instructions about when to call this function.
### API Implementation
When creating an agent via API, you can add the End Call tool to your agent configuration. It should be defined as a system tool:
```python
from elevenlabs import (
ConversationalConfig,
ElevenLabs,
AgentConfig,
PromptAgent,
PromptAgentInputToolsItem_System
)
# Initialize the client
elevenlabs = ElevenLabs(api_key="YOUR_API_KEY")
# Create the end call tool
end_call_tool = PromptAgentInputToolsItem_System(
name="end_call",
description="" # Optional: Customize when the tool should be triggered
)
# Create the agent configuration
conversation_config = ConversationalConfig(
agent=AgentConfig(
prompt=PromptAgent(
tools=[end_call_tool]
)
)
)
# Create the agent
response = elevenlabs.conversational_ai.agents.create(
conversation_config=conversation_config
)
```
```javascript
import { ElevenLabs } from '@elevenlabs/elevenlabs-js';
// Initialize the client
const elevenlabs = new ElevenLabs({
apiKey: 'YOUR_API_KEY',
});
// Create the agent with end call tool
await elevenlabs.conversationalAi.agents.create({
conversationConfig: {
agent: {
prompt: {
tools: [
{
type: 'system',
name: 'end_call',
description: '', // Optional: Customize when the tool should be triggered
},
],
},
},
},
});
```
```bash
curl -X POST https://api.elevenlabs.io/v1/convai/agents/create \
-H "xi-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"conversation_config": {
"agent": {
"prompt": {
"tools": [
{
"type": "system",
"name": "end_call",
"description": ""
}
]
}
}
}
}'
```
Leave the description blank to use the default end call prompt.
## Example prompts
**Example 1: Basic End Call**
```
End the call when the user says goodbye, thank you, or indicates they have no more questions.
```
**Example 2: End Call with Custom Prompt**
```
End the call when the user says goodbye, thank you, or indicates they have no more questions. You can only end the call after all their questions have been answered. Please end the call only after confirming that the user doesn't need any additional assistance.
```
# Language detection
> Let your agent automatically switch to the user's language.
## Overview
The `language detection` system tool allows your Conversational AI agent to switch its output language to any language the agent supports.
This system tool is not enabled automatically. Its description can be customized to accommodate your specific use case.
Where possible, we recommend enabling all languages for an agent and enabling the language
detection system tool.
Our language detection tool triggers language switching in two cases, both based on the received audio's detected language and content:
* `detection`: if the user speaks a different language than the current output language, a switch will be triggered.
* `content`: if the user asks in the current language to change to a new language, a switch will be triggered.
## Custom LLM integration
**Purpose**: Automatically switch to the user's detected language during conversations.
**Trigger conditions**: The LLM should call this tool when:
* User speaks in a different language than the current conversation language
* User explicitly requests to switch languages
* Multi-language support is needed for the conversation
**Parameters**:
* `reason` (string, required): The reason for the language switch
* `language` (string, required): The language code to switch to (must be in supported languages list)
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "language_detection",
"arguments": "{\"reason\": \"User requested Spanish\", \"language\": \"es\"}"
}
}
```
**Implementation**: Configure supported languages in agent settings and add the language detection system tool. The agent will automatically switch voice and responses to match detected languages.
## Enabling language detection
The languages that the agent can switch to must be defined in the `Agent` settings tab.

Enable language detection by selecting the pre-configured system tool to your agent's tools in the `Agent` tab.
This is automatically available as an option when selecting `add tool`.

Add a description that specifies when to call the tool

### API Implementation
When creating an agent via API, you can add the `language detection` tool to your agent configuration. It should be defined as a system tool:
```python
from elevenlabs import (
ConversationalConfig,
ElevenLabs,
AgentConfig,
PromptAgent,
PromptAgentInputToolsItem_System,
LanguagePresetInput,
ConversationConfigClientOverrideInput,
AgentConfigOverride,
)
# Initialize the client
elevenlabs = ElevenLabs(api_key="YOUR_API_KEY")
# Create the language detection tool
language_detection_tool = PromptAgentInputToolsItem_System(
name="language_detection",
description="" # Optional: Customize when the tool should be triggered
)
# Create language presets
language_presets = {
"nl": LanguagePresetInput(
overrides=ConversationConfigClientOverrideInput(
agent=AgentConfigOverride(
prompt=None,
first_message="Hoi, hoe gaat het met je?",
language=None
),
tts=None
),
first_message_translation=None
),
"fi": LanguagePresetInput(
overrides=ConversationConfigClientOverrideInput(
agent=AgentConfigOverride(
first_message="Hei, kuinka voit?",
),
tts=None
),
),
"tr": LanguagePresetInput(
overrides=ConversationConfigClientOverrideInput(
agent=AgentConfigOverride(
prompt=None,
first_message="Merhaba, nasılsın?",
language=None
),
tts=None
),
),
"ru": LanguagePresetInput(
overrides=ConversationConfigClientOverrideInput(
agent=AgentConfigOverride(
prompt=None,
first_message="Привет, как ты?",
language=None
),
tts=None
),
),
"pt": LanguagePresetInput(
overrides=ConversationConfigClientOverrideInput(
agent=AgentConfigOverride(
prompt=None,
first_message="Oi, como você está?",
language=None
),
tts=None
),
)
}
# Create the agent configuration
conversation_config = ConversationalConfig(
agent=AgentConfig(
prompt=PromptAgent(
tools=[language_detection_tool],
first_message="Hi how are you?"
)
),
language_presets=language_presets
)
# Create the agent
response = elevenlabs.conversational_ai.agents.create(
conversation_config=conversation_config
)
```
```javascript
import { ElevenLabs } from '@elevenlabs/elevenlabs-js';
// Initialize the client
const elevenlabs = new ElevenLabs({
apiKey: 'YOUR_API_KEY',
});
// Create the agent with language detection tool
await elevenlabs.conversationalAi.agents.create({
conversationConfig: {
agent: {
prompt: {
tools: [
{
type: 'system',
name: 'language_detection',
description: '', // Optional: Customize when the tool should be triggered
},
],
firstMessage: 'Hi, how are you?',
},
},
languagePresets: {
nl: {
overrides: {
agent: {
prompt: null,
firstMessage: 'Hoi, hoe gaat het met je?',
language: null,
},
tts: null,
},
},
fi: {
overrides: {
agent: {
prompt: null,
firstMessage: 'Hei, kuinka voit?',
language: null,
},
tts: null,
},
firstMessageTranslation: {
sourceHash: '{"firstMessage":"Hi how are you?","language":"en"}',
text: 'Hei, kuinka voit?',
},
},
tr: {
overrides: {
agent: {
prompt: null,
firstMessage: 'Merhaba, nasılsın?',
language: null,
},
tts: null,
},
},
ru: {
overrides: {
agent: {
prompt: null,
firstMessage: 'Привет, как ты?',
language: null,
},
tts: null,
},
},
pt: {
overrides: {
agent: {
prompt: null,
firstMessage: 'Oi, como você está?',
language: null,
},
tts: null,
},
},
ar: {
overrides: {
agent: {
prompt: null,
firstMessage: 'مرحبًا كيف حالك؟',
language: null,
},
tts: null,
},
},
},
},
});
```
```bash
curl -X POST https://api.elevenlabs.io/v1/convai/agents/create \
-H "xi-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"conversation_config": {
"agent": {
"prompt": {
"first_message": "Hi how are you?",
"tools": [
{
"type": "system",
"name": "language_detection",
"description": ""
}
]
}
},
"language_presets": {
"nl": {
"overrides": {
"agent": {
"prompt": null,
"first_message": "Hoi, hoe gaat het met je?",
"language": null
},
"tts": null
}
},
"fi": {
"overrides": {
"agent": {
"prompt": null,
"first_message": "Hei, kuinka voit?",
"language": null
},
"tts": null
}
},
"tr": {
"overrides": {
"agent": {
"prompt": null,
"first_message": "Merhaba, nasılsın?",
"language": null
},
"tts": null
}
},
"ru": {
"overrides": {
"agent": {
"prompt": null,
"first_message": "Привет, как ты?",
"language": null
},
"tts": null
}
},
"pt": {
"overrides": {
"agent": {
"prompt": null,
"first_message": "Oi, como você está?",
"language": null
},
"tts": null
}
},
"ar": {
"overrides": {
"agent": {
"prompt": null,
"first_message": "مرحبًا كيف حالك؟",
"language": null
},
"tts": null
}
}
}
}
}'
```
Leave the description blank to use the default language detection prompt.
# Agent transfer
> Seamlessly transfer the user between Conversational AI agents based on defined conditions.
## Overview
Agent-agent transfer allows a Conversational AI agent to hand off the ongoing conversation to another designated agent when specific conditions are met. This enables the creation of sophisticated, multi-layered conversational workflows where different agents handle specific tasks or levels of complexity.
For example, an initial agent (Orchestrator) could handle general inquiries and then transfer the call to a specialized agent based on the conversation's context. Transfers can also be nested:
```text
Orchestrator Agent (Initial Qualification)
│
├───> Agent 1 (e.g., Availability Inquiries)
│
├───> Agent 2 (e.g., Technical Support)
│ │
│ └───> Agent 2a (e.g., Hardware Support)
│
└───> Agent 3 (e.g., Billing Issues)
```
We recommend using the `gpt-4o` or `gpt-4o-mini` models when using agent-agent transfers due to better tool calling.
## Custom LLM integration
**Purpose**: Transfer conversations between specialized AI agents based on user needs.
**Trigger conditions**: The LLM should call this tool when:
* User request requires specialized knowledge or different agent capabilities
* Current agent cannot adequately handle the query
* Conversation flow indicates need for different agent type
**Parameters**:
* `reason` (string, optional): The reason for the agent transfer
* `agent_number` (integer, required): Zero-indexed number of the agent to transfer to (based on configured transfer rules)
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "transfer_to_agent",
"arguments": "{\"reason\": \"User needs billing support\", \"agent_number\": 0}"
}
}
```
**Implementation**: Define transfer rules mapping conditions to specific agent IDs. Configure which agents the current agent can transfer to. Agents are referenced by zero-indexed numbers in the transfer configuration.
## Enabling agent transfer
Agent transfer is configured using the `transfer_to_agent` system tool.
Enable agent transfer by selecting the `transfer_to_agent` system tool in your agent's configuration within the `Agent` tab. Choose "Transfer to AI Agent" when adding a tool.
You can provide a custom description to guide the LLM on when to trigger a transfer. If left blank, a default description encompassing the defined transfer rules will be used.
Configure the specific rules for transferring to other agents. For each rule, specify:
* **Agent**: The target agent to transfer the conversation to.
* **Condition**: A natural language description of the circumstances under which the transfer should occur (e.g., "User asks about billing details", "User requests technical support for product X").
The LLM will use these conditions, along with the tool description, to decide when and to which agent (by number) to transfer.
Ensure that the user account creating the agent has at least viewer permissions for any target agents specified in the transfer rules.
## API Implementation
You can configure the `transfer_to_agent` system tool when creating or updating an agent via the API.
```python
from elevenlabs import (
ConversationalConfig,
ElevenLabs,
AgentConfig,
PromptAgent,
PromptAgentInputToolsItem_System,
SystemToolConfigInputParams_TransferToAgent,
AgentTransfer
)
# Initialize the client
elevenlabs = ElevenLabs(api_key="YOUR_API_KEY")
# Define transfer rules
transfer_rules = [
AgentTransfer(agent_id="AGENT_ID_1", condition="When the user asks for billing support."),
AgentTransfer(agent_id="AGENT_ID_2", condition="When the user requests advanced technical help.")
]
# Create the transfer tool configuration
transfer_tool = PromptAgentInputToolsItem_System(
type="system",
name="transfer_to_agent",
description="Transfer the user to a specialized agent based on their request.", # Optional custom description
params=SystemToolConfigInputParams_TransferToAgent(
transfers=transfer_rules
)
)
# Create the agent configuration
conversation_config = ConversationalConfig(
agent=AgentConfig(
prompt=PromptAgent(
prompt="You are a helpful assistant.",
first_message="Hi, how can I help you today?",
tools=[transfer_tool],
)
)
)
# Create the agent
response = elevenlabs.conversational_ai.agents.create(
conversation_config=conversation_config
)
print(response)
```
```javascript
import { ElevenLabs } from '@elevenlabs/elevenlabs-js';
// Initialize the client
const elevenlabs = new ElevenLabs({
apiKey: 'YOUR_API_KEY',
});
// Define transfer rules
const transferRules = [
{ agentId: 'AGENT_ID_1', condition: 'When the user asks for billing support.' },
{ agentId: 'AGENT_ID_2', condition: 'When the user requests advanced technical help.' },
];
// Create the agent with the transfer tool
await elevenlabs.conversationalAi.agents.create({
conversationConfig: {
agent: {
prompt: {
prompt: 'You are a helpful assistant.',
firstMessage: 'Hi, how can I help you today?',
tools: [
{
type: 'system',
name: 'transfer_to_agent',
description: 'Transfer the user to a specialized agent based on their request.', // Optional custom description
params: {
systemToolType: 'transfer_to_agent',
transfers: transferRules,
},
},
],
},
},
},
});
```
# Transfer to human
> Seamlessly transfer the user to a human operator via phone number based on defined conditions.
## Overview
Human transfer allows a Conversational AI agent to transfer the ongoing call to a specified phone number when certain conditions are met. This enables agents to hand off complex issues, specific requests, or situations requiring human intervention to a live operator.
This feature utilizes the `transfer_to_number` system tool which supports transfers via Twilio and SIP trunk numbers. When triggered, the agent can provide a message to the user while they wait and a separate message summarizing the situation for the human operator receiving the call.
## Custom LLM integration
**Purpose**: Seamlessly hand off conversations to human operators when AI assistance is insufficient.
**Trigger conditions**: The LLM should call this tool when:
* Complex issues requiring human judgment
* User explicitly requests human assistance
* AI reaches limits of capability for the specific request
* Escalation protocols are triggered
**Parameters**:
* `reason` (string, optional): The reason for the transfer
* `transfer_number` (string, required): The phone number to transfer to (must match configured numbers)
* `client_message` (string, required): Message read to the client while waiting for transfer
* `agent_message` (string, required): Message for the human operator receiving the call
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "transfer_to_number",
"arguments": "{\"reason\": \"Complex billing issue\", \"transfer_number\": \"+15551234567\", \"client_message\": \"I'm transferring you to a billing specialist who can help with your account.\", \"agent_message\": \"Customer has a complex billing dispute about order #12345 from last month.\"}"
}
}
```
**Implementation**: Configure transfer phone numbers and conditions. Define messages for both customer and receiving human operator. Works with both Twilio and SIP trunking.
## Numbers that can be transferred to
Currently only [SIP trunking](/docs/conversational-ai/phone-numbers/sip-trunking) phone numbers support transferring to external numbers.
[Twilio phone numbers](/docs/conversational-ai/phone-numbers/twilio-integration/native-integration) currently can only transfer to phone numbers hosted on Twilio. If this is needed, we recommend using Twilio numbers via [Twilio elastic SIP trunking](https://www.twilio.com/docs/sip-trunking) and our SIP trunking support, rather than via the native integration.
## Enabling human transfer
Human transfer is configured using the `transfer_to_number` system tool.
Enable human transfer by selecting the `transfer_to_number` system tool in your agent's configuration within the `Agent` tab. Choose "Transfer to Human" when adding a tool.
{/* Placeholder for image showing adding the 'Transfer to Human' tool */}
You can provide a custom description to guide the LLM on when to trigger a transfer. If left blank, a default description encompassing the defined transfer rules will be used.
{/* Placeholder for image showing the tool description field */}
Configure the specific rules for transferring to phone numbers. For each rule, specify:
* **Phone Number**: The target phone number in E.164 format (e.g., +12125551234) to transfer the call to.
* **Condition**: A natural language description of the circumstances under which the transfer should occur (e.g., "User explicitly requests to speak to a human", "User needs to update sensitive account information").
The LLM will use these conditions, along with the tool description, to decide when and to which phone number to transfer.
{/* Placeholder for image showing transfer rules configuration */}
Ensure the phone number is correctly formatted (E.164) and associated with a properly configured Twilio account capable of receiving calls.
## API Implementation
You can configure the `transfer_to_number` system tool when creating or updating an agent via the API. The tool allows specifying messages for both the client (user being transferred) and the agent (human operator receiving the call).
```python
from elevenlabs import (
ConversationalConfig,
ElevenLabs,
AgentConfig,
PromptAgent,
PromptAgentInputToolsItem_System,
SystemToolConfigInputParams_TransferToNumber,
PhoneNumberTransfer,
)
# Initialize the client
elevenlabs = ElevenLabs(api_key="YOUR_API_KEY")
# Define transfer rules
transfer_rules = [
PhoneNumberTransfer(phone_number="+15551234567", condition="When the user asks for billing support."),
PhoneNumberTransfer(phone_number="+15559876543", condition="When the user requests to file a formal complaint.")
]
# Create the transfer tool configuration
transfer_tool = PromptAgentInputToolsItem_System(
type="system",
name="transfer_to_human",
description="Transfer the user to a specialized agent based on their request.", # Optional custom description
params=SystemToolConfigInputParams_TransferToNumber(
transfers=transfer_rules
)
)
# Create the agent configuration
conversation_config = ConversationalConfig(
agent=AgentConfig(
prompt=PromptAgent(
prompt="You are a helpful assistant.",
first_message="Hi, how can I help you today?",
tools=[transfer_tool],
)
)
)
# Create the agent
response = elevenlabs.conversational_ai.agents.create(
conversation_config=conversation_config
)
# Note: When the LLM decides to call this tool, it needs to provide:
# - transfer_number: The phone number to transfer to (must match one defined in rules).
# - client_message: Message read to the user during transfer.
# - agent_message: Message read to the human operator receiving the call.
```
```javascript
import { ElevenLabs } from '@elevenlabs/elevenlabs-js';
// Initialize the client
const elevenlabs = new ElevenLabs({
apiKey: 'YOUR_API_KEY',
});
// Define transfer rules
const transferRules = [
{ phoneNumber: '+15551234567', condition: 'When the user asks for billing support.' },
{ phoneNumber: '+15559876543', condition: 'When the user requests to file a formal complaint.' },
];
// Create the agent with the transfer tool
await elevenlabs.conversationalAi.agents.create({
conversationConfig: {
agent: {
prompt: {
prompt: 'You are a helpful assistant.',
firstMessage: 'Hi, how can I help you today?',
tools: [
{
type: 'system',
name: 'transfer_to_number',
description: 'Transfer the user to a human operator based on their request.', // Optional custom description
params: {
systemToolType: 'transfer_to_number',
transfers: transferRules,
},
},
],
},
},
},
});
// Note: When the LLM decides to call this tool, it needs to provide:
// - transfer_number: The phone number to transfer to (must match one defined in rules).
// - client_message: Message read to the user during transfer.
// - agent_message: Message read to the human operator receiving the call.
```
# Skip turn
> Allow your agent to pause and wait for the user to speak next.
## Overview
The **Skip Turn** tool allows your conversational agent to explicitly pause and wait for the user to speak or act before continuing. This system tool is useful when the user indicates they need a moment, for example, by saying "Give me a second," "Let me think," or "One moment please."
## Functionality
* **User-Initiated Pause**: The tool is designed to be invoked by the LLM when it detects that the user needs a brief pause without interruption.
* **No Verbal Response**: After this tool is called, the assistant will not speak. It waits for the user to re-engage or for another turn-taking condition to be met.
* **Seamless Conversation Flow**: It helps maintain a natural conversational rhythm by respecting the user's need for a short break, without ending the interaction or having the agent speak unnecessarily.
## Custom LLM integration
**Purpose**: Allow the agent to pause and wait for user input without speaking.
**Trigger conditions**: The LLM should call this tool when:
* User indicates they need a moment ("Give me a second", "Let me think")
* User requests pause in conversation flow
* Agent detects user needs time to process information
**Parameters**:
* `reason` (string, optional): Free-form reason explaining why the pause is needed
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "skip_turn",
"arguments": "{\"reason\": \"User requested time to think\"}"
}
}
```
**Implementation**: No additional configuration needed. The tool simply signals the agent to remain silent until the user speaks again.
### API implementation
When creating an agent via API, you can add the Skip Turn tool to your agent configuration. It should be defined as a system tool, with the name `skip_turn`.
```python
from elevenlabs import (
ConversationalConfig,
ElevenLabs,
AgentConfig,
PromptAgent,
PromptAgentInputToolsItem_System
)
# Initialize the client
elevenlabs = ElevenLabs(api_key="YOUR_API_KEY")
# Create the skip turn tool
skip_turn_tool = PromptAgentInputToolsItem_System(
name="skip_turn",
description="" # Optional: Customize when the tool should be triggered, or leave blank for default.
)
# Create the agent configuration
conversation_config = ConversationalConfig(
agent=AgentConfig(
prompt=PromptAgent(
tools=[skip_turn_tool]
)
)
)
# Create the agent
response = elevenlabs.conversational_ai.agents.create(
conversation_config=conversation_config
)
```
```javascript
import { ElevenLabs } from '@elevenlabs/elevenlabs-js';
// Initialize the client
const elevenlabs = new ElevenLabs({
apiKey: 'YOUR_API_KEY',
});
// Create the agent with skip turn tool
await elevenlabs.conversationalAi.agents.create({
conversationConfig: {
agent: {
prompt: {
tools: [
{
type: 'system',
name: 'skip_turn',
description: '', // Optional: Customize when the tool should be triggered, or leave blank for default.
},
],
},
},
},
});
```
## UI configuration
You can also configure the Skip Turn tool directly within the Agent's UI, in the tools section.
### Step 1: Add a new tool
Navigate to your agent's configuration page. In the "Tools" section, click on "Add tool"; the `Skip Turn` option will already be available.
### Step 2: Configure the tool
You can optionally provide a description to customize when the LLM should trigger this tool, or leave it blank to use the default behavior.
### Step 3: Enable the tool
Once configured, the `Skip Turn` tool will appear in your agent's list of enabled tools and the agent will be able to skip turns.
# Events
> Understand real-time communication events exchanged between client and server in conversational AI.
## Overview
Events are the foundation of real-time communication in conversational AI applications using WebSockets.
They facilitate the exchange of information like audio streams, transcriptions, agent responses, and contextual updates between the client application and the server infrastructure.
Understanding these events is crucial for building responsive and interactive conversational experiences.
Events are broken down into two categories:
* **Client events**: sent from the server to the client, delivering audio, transcripts, agent messages, and system signals.
* **Client-to-server events**: sent from the client to the server, providing contextual updates or responding to server requests.
# Client events
> Understand and handle real-time events received by the client during conversational applications.
**Client events** are system-level events sent from the server to the client that facilitate real-time communication. These events deliver audio, transcription, agent responses, and other critical information to the client application.
For information on events you can send from the client to the server, see the [Client-to-server
events](/docs/conversational-ai/customization/events/client-to-server-events) documentation.
## Overview
Client events are essential for maintaining the real-time nature of conversations. They provide everything from initialization metadata to processed audio and agent responses.
These events are part of the WebSocket communication protocol and are automatically handled by our
SDKs. Understanding them is crucial for advanced implementations and debugging.
## Client event types
**conversation_initiation_metadata**
* Automatically sent when starting a conversation
* Initializes conversation settings and parameters
```javascript
// Example initialization metadata
{
"type": "conversation_initiation_metadata",
"conversation_initiation_metadata_event": {
"conversation_id": "conv_123",
"agent_output_audio_format": "pcm_44100", // TTS output format
"user_input_audio_format": "pcm_16000" // ASR input format
}
}
```
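A minimal handler for this event might look like the sketch below, following the same handler pattern used for the other events; `configureAudioPipeline` is a hypothetical helper in your application:
```javascript
// Example initialization metadata handler (sketch)
websocket.on('conversation_initiation_metadata', (event) => {
  const { conversation_initiation_metadata_event } = event;
  const { conversation_id, agent_output_audio_format, user_input_audio_format } =
    conversation_initiation_metadata_event;
  // Hypothetical helper: set playback/recording to match the negotiated formats
  configureAudioPipeline(agent_output_audio_format, user_input_audio_format);
  console.log(`Conversation started: ${conversation_id}`);
});
```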
**ping**
* Health check event requiring immediate response
* Automatically handled by SDK
* Used to maintain WebSocket connection
```javascript
// Example ping event structure
{
"ping_event": {
"event_id": 123456,
"ping_ms": 50 // Optional, estimated latency in milliseconds
},
"type": "ping"
}
```
```javascript
// Example ping handler
websocket.on('ping', () => {
websocket.send('pong');
});
```
**audio**
* Contains base64 encoded audio for playback
* Includes numeric event ID for tracking and sequencing
* Handles voice output streaming
```javascript
// Example audio event structure
{
"audio_event": {
"audio_base_64": "base64_encoded_audio_string",
"event_id": 12345
},
"type": "audio"
}
```
```javascript
// Example audio event handler
websocket.on('audio', (event) => {
const { audio_event } = event;
const { audio_base_64, event_id } = audio_event;
audioPlayer.play(audio_base_64);
});
```
**user_transcript**
* Contains finalized speech-to-text results
* Represents complete user utterances
* Used for conversation history
```javascript
// Example transcript event structure
{
"type": "user_transcript",
"user_transcription_event": {
"user_transcript": "Hello, how can you help me today?"
}
}
```
```javascript
// Example transcript handler
websocket.on('user_transcript', (event) => {
const { user_transcription_event } = event;
const { user_transcript } = user_transcription_event;
updateConversationHistory(user_transcript);
});
```
**agent_response**
* Contains complete agent message
* Sent with first audio chunk
* Used for display and history
```javascript
// Example response event structure
{
"type": "agent_response",
"agent_response_event": {
"agent_response": "Hello, how can I assist you today?"
}
}
```
```javascript
// Example response handler
websocket.on('agent_response', (event) => {
const { agent_response_event } = event;
const { agent_response } = agent_response_event;
displayAgentMessage(agent_response);
});
```
**agent_response_correction**
* Contains truncated response after interruption
* Updates displayed message
* Maintains conversation accuracy
```javascript
// Example response correction event structure
{
"type": "agent_response_correction",
"agent_response_correction_event": {
"original_agent_response": "Let me tell you about the complete history...",
"corrected_agent_response": "Let me tell you about..." // Truncated after interruption
}
}
```
```javascript
// Example response correction handler
websocket.on('agent_response_correction', (event) => {
const { agent_response_correction_event } = event;
const { corrected_agent_response } = agent_response_correction_event;
displayAgentMessage(corrected_agent_response);
});
```
**client_tool_call**
* Represents a function call the agent wants the client to execute
* Contains tool name, tool call ID, and parameters
* Requires client-side execution of the function and sending the result back to the server
If you are using the SDK, callbacks are provided to handle sending the result back to the server.
```javascript
// Example tool call event structure
{
"type": "client_tool_call",
"client_tool_call": {
"tool_name": "search_database",
"tool_call_id": "call_123456",
"parameters": {
"query": "user information",
"filters": {
"date": "2024-01-01"
}
}
}
}
```
```javascript
// Example tool call handler
websocket.on('client_tool_call', async (event) => {
const { client_tool_call } = event;
const { tool_name, tool_call_id, parameters } = client_tool_call;
try {
const result = await executeClientTool(tool_name, parameters);
// Send success response back to continue conversation
websocket.send({
type: "client_tool_result",
tool_call_id: tool_call_id,
result: result,
is_error: false
});
} catch (error) {
// Send error response if tool execution fails
websocket.send({
type: "client_tool_result",
tool_call_id: tool_call_id,
result: error.message,
is_error: true
});
}
});
```
**agent_tool_response**
* Indicates when the agent has executed a tool function
* Contains tool metadata and execution status
* Provides visibility into agent tool usage during conversations
```javascript
// Example agent tool response event structure
{
"type": "agent_tool_response",
"agent_tool_response": {
"tool_name": "skip_turn",
"tool_call_id": "skip_turn_c82ca55355c840bab193effb9a7e8101",
"tool_type": "system",
"is_error": false
}
}
```
```javascript
// Example agent tool response handler
websocket.on('agent_tool_response', (event) => {
const { agent_tool_response } = event;
const { tool_name, tool_call_id, tool_type, is_error } = agent_tool_response;
if (is_error) {
console.error(`Agent tool ${tool_name} failed:`, tool_call_id);
} else {
console.log(`Agent executed ${tool_type} tool: ${tool_name}`);
}
});
```
**vad_score**
* Voice Activity Detection score event
* Indicates the probability that the user is speaking
* Values range from 0 to 1, where higher values indicate higher confidence of speech
```javascript
// Example VAD score event
{
"type": "vad_score",
"vad_score_event": {
"vad_score": 0.95
}
}
```
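A handler for this event can be as simple as driving a speaking indicator in your UI, as in the sketch below (`updateSpeakingIndicator` is a hypothetical helper):
```javascript
// Example VAD score handler (sketch)
websocket.on('vad_score', (event) => {
  const { vad_score } = event.vad_score_event;
  // Hypothetical helper: treat scores above 0.5 as "user is speaking"
  updateSpeakingIndicator(vad_score > 0.5);
});
```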
## Event flow
Here's a typical sequence of events during a conversation:
```mermaid
sequenceDiagram
participant Client
participant Server
Server->>Client: conversation_initiation_metadata
Note over Client,Server: Connection established
Server->>Client: ping
Client->>Server: pong
Server->>Client: audio
Note over Client: Playing audio
Note over Client: User responds
Server->>Client: user_transcript
Server->>Client: agent_response
Server->>Client: audio
Server->>Client: client_tool_call
Note over Client: Client tool runs
Client->>Server: client_tool_result
Server->>Client: agent_response
Server->>Client: audio
Note over Client: Playing audio
Note over Client: Interruption detected
Server->>Client: agent_response_correction
```
### Best practices
1. **Error handling**
* Implement proper error handling for each event type
* Log important events for debugging
* Handle connection interruptions gracefully
2. **Audio management**
* Buffer audio chunks appropriately
* Implement proper cleanup on interruption
* Handle audio resource management
3. **Connection management**
* Respond to PING events promptly
* Implement reconnection logic (see the sketch after this list)
* Monitor connection health
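As a starting point for the reconnection logic mentioned above, the sketch below retries with exponential backoff after an unclean close; `createConversationSocket` is a hypothetical factory that opens the conversation WebSocket and registers your event handlers:
```javascript
// Sketch: reconnect with exponential backoff after an unexpected close.
// `createConversationSocket` is a hypothetical factory in your application.
let retryDelayMs = 1000;

function connectWithRetry() {
  const socket = createConversationSocket();

  socket.addEventListener('open', () => {
    retryDelayMs = 1000; // reset backoff once connected
  });

  socket.addEventListener('close', (event) => {
    if (!event.wasClean) {
      setTimeout(connectWithRetry, retryDelayMs);
      retryDelayMs = Math.min(retryDelayMs * 2, 30000); // cap at 30 seconds
    }
  });
}

connectWithRetry();
```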
## Troubleshooting
* Ensure proper WebSocket connection
* Check PING/PONG responses
* Verify API credentials
* Check audio chunk handling
* Verify audio format compatibility
* Monitor memory usage
* Log all events for debugging
* Implement error boundaries
* Check event handler registration
For detailed implementation examples, check our [SDK
documentation](/docs/conversational-ai/libraries/python).
# Client to server events
> Send contextual information from the client to enhance conversational applications in real-time.
**Client-to-server events** are messages that your application proactively sends to the server to provide additional context during conversations. These events enable you to enhance the conversation with relevant information without interrupting the conversational flow.
For information on events the server sends to the client, see the [Client
events](/docs/conversational-ai/customization/events/client-events) documentation.
## Overview
Your application can send contextual information to the server to improve conversation quality and relevance at any point during the conversation. This does not have to be in response to a client event received from the server. This is particularly useful for sharing UI state, user actions, or other environmental data that may not be directly communicated through voice.
While our SDKs provide helper methods for sending these events, understanding the underlying
protocol is valuable for custom implementations and advanced use cases.
## Event types
### Contextual updates
Contextual updates allow your application to send non-interrupting background information to the conversation.
**Key characteristics:**
* Updates are incorporated as background information in the conversation.
* Does not interrupt the current conversation flow.
* Useful for sending UI state, user actions, or environmental data.
```javascript
// Contextual update event structure
{
"type": "contextual_update",
"text": "User appears to be looking at pricing page"
}
```
```javascript
// Example sending contextual updates
function sendContextUpdate(information) {
websocket.send(
JSON.stringify({
type: 'contextual_update',
text: information,
})
);
}
// Usage examples
sendContextUpdate('Customer status: Premium tier');
sendContextUpdate('User navigated to Help section');
sendContextUpdate('Shopping cart contains 3 items');
```
### User messages
User messages allow you to send text directly to the conversation as if the user had spoken it. This is useful for text-based interactions or when you want to inject specific text into the conversation flow.
**Key characteristics:**
* Text is processed as user input to the conversation.
* Triggers the same response flow as spoken user input.
* Useful for text-based interfaces or programmatic user input.
```javascript
// User message event structure
{
"type": "user_message",
"text": "I would like to upgrade my account"
}
```
```javascript
// Example sending user messages
function sendUserMessage(text) {
websocket.send(
JSON.stringify({
type: 'user_message',
text: text,
})
);
}
// Usage examples
sendUserMessage('I need help with billing');
sendUserMessage('What are your pricing options?');
sendUserMessage('Cancel my subscription');
```
### User activity
User activity events signal that the user is still engaged, preventing the agent from interrupting during periods of silence.
**Key characteristics:**
* Resets the turn timeout timer.
* Does not affect conversation content or flow.
* Useful for maintaining long-running conversations during periods of silence.
```javascript
// User activity event structure
{
"type": "user_activity"
}
```
```javascript
// Example sending user activity
function sendUserActivity() {
websocket.send(
JSON.stringify({
type: 'user_activity',
})
);
}
// Usage example - send activity ping every 30 seconds
setInterval(sendUserActivity, 30000);
```
## Best practices
1. **Contextual updates**
* Send relevant but concise contextual information.
* Avoid overwhelming the LLM with too many updates.
* Focus on information that affects the conversation flow or provides important context from UI activity the voice agent cannot otherwise observe.
2. **User messages**
* Use for text-based user input when audio is not available or appropriate.
* Ensure text content is clear and well-formatted.
* Consider the conversation context when injecting programmatic messages.
3. **User activity**
* Send activity pings during periods of user interaction to maintain session.
* Use reasonable intervals (e.g., 30-60 seconds) to avoid unnecessary network traffic.
* Implement activity detection based on actual user engagement (mouse movement, typing, etc.).
4. **Timing considerations**
* Send updates at appropriate moments.
* Consider grouping multiple contextual updates into a single update (instead of sending every small change separately), as shown in the sketch below.
* Balance between keeping the session alive and avoiding excessive messaging.
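One way to group updates is to buffer them briefly and flush them as a single `contextual_update` message. The sketch below builds on the `sendContextUpdate` helper shown earlier:
```javascript
// Sketch: buffer contextual updates and flush them as one message.
const pendingUpdates = [];
let flushTimer = null;

function queueContextUpdate(information) {
  pendingUpdates.push(information);
  if (!flushTimer) {
    flushTimer = setTimeout(() => {
      sendContextUpdate(pendingUpdates.join('; ')); // helper defined earlier
      pendingUpdates.length = 0;
      flushTimer = null;
    }, 2000); // flush at most once every 2 seconds
  }
}
```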
For detailed implementation examples, check our [SDK
documentation](/docs/conversational-ai/libraries/python).
# Knowledge base
> Enhance your conversational agent with custom knowledge.
**Knowledge bases** allow you to equip your agent with relevant, domain-specific information.
## Overview
A well-curated knowledge base helps your agent go beyond its pre-trained data and deliver context-aware answers.
Here are a few examples where knowledge bases can be useful:
* **Product catalogs**: Store product specifications, pricing, and other essential details.
* **HR or corporate policies**: Provide quick answers about vacation policies, employee benefits, or onboarding procedures.
* **Technical documentation**: Equip your agent with in-depth guides or API references to assist developers.
* **Customer FAQs**: Answer common inquiries consistently.
The agent on this page is configured with full knowledge of ElevenLabs' documentation and sitemap. Go ahead and ask it anything about ElevenLabs.
## Usage
Files, URLs, and text can be added to the knowledge base in the dashboard. They can also be added programmatically through our [API](https://elevenlabs.io/docs/api-reference).
Upload files in formats like PDF, TXT, DOCX, HTML, and EPUB.

Import URLs from sources like documentation and product pages.

When creating a knowledge base item from a URL, we do not currently support scraping all pages
linked to from the initial URL, or continuously updating the knowledge base over time.
However, these features are coming soon.
Ensure you have permission to use the content from the URLs you provide
Manually add text to the knowledge base.

## Best practices
**Content quality**
Provide clear, well-structured information that's relevant to your agent's purpose.
**Size management**
Break large documents into smaller, focused pieces for better processing.
**Regular updates**
Regularly review and update the agent's knowledge base to ensure the information remains current and accurate.
**Identify knowledge gaps**
Review conversation transcripts to identify popular topics, queries and areas where users struggle to find information. Note any knowledge gaps and add the missing context to the knowledge base.
## Enterprise features
Non-enterprise accounts have a maximum of 20MB or 300k characters.
Need higher limits? [Contact our sales team](https://elevenlabs.io/contact-sales) to discuss
enterprise plans with expanded knowledge base capabilities.
# Knowledge base dashboard
> Learn how to manage and organize your knowledge base through the ElevenLabs dashboard
## Overview
The [knowledge base dashboard](https://elevenlabs.io/app/conversational-ai/knowledge-base) provides a centralized way to manage documents and track their usage across your AI agents. This guide explains how to navigate and use the knowledge base dashboard effectively.

## Adding existing documents to agents
When configuring an agent's knowledge base, you can easily add existing documents to an agent.
1. Navigate to the agent's [configuration](https://elevenlabs.io/app/conversational-ai/)
2. Click "Add document" in the knowledge base section of the "Agent" tab.
3. The option to select from your existing knowledge base documents or upload a new document will appear.

Documents can be reused across multiple agents, making it efficient to maintain consistent
knowledge across your workspace.
## Document dependencies
Each document in your knowledge base includes an "Agents" tab that shows which agents currently depend on that document.

It is not possible to delete a document if any agent depends on it.
# Retrieval-Augmented Generation
> Enhance your agent with large knowledge bases using RAG.
## Overview
**Retrieval-Augmented Generation (RAG)** enables your agent to access and use large knowledge bases during conversations. Instead of loading entire documents into the context window, RAG retrieves only the most relevant information for each user query, allowing your agent to:
* Access much larger knowledge bases than would fit in a prompt
* Provide more accurate, knowledge-grounded responses
* Reduce hallucinations by referencing source material
* Scale knowledge without creating multiple specialized agents
RAG is ideal for agents that need to reference large documents, technical manuals, or extensive
knowledge bases that would exceed the context window limits of traditional prompting.
RAG adds a small amount of latency, around 500ms, to your agent's response time.
## How RAG works
When RAG is enabled, your agent processes user queries through these steps:
1. **Query processing**: The user's question is analyzed and reformulated for optimal retrieval.
2. **Embedding generation**: The processed query is converted into a vector embedding that represents the user's question.
3. **Retrieval**: The system finds the most semantically similar content from your knowledge base.
4. **Response generation**: The agent generates a response using both the conversation context and the retrieved information.
This process ensures that information relevant to the user's query is passed to the LLM to generate a factually correct answer.
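Conceptually, the retrieval step resembles the sketch below: chunks whose embeddings are too far from the query embedding are discarded, and the closest remaining chunks (up to the configured maximum) are passed to the LLM. This is an illustration of the idea only, not ElevenLabs' internal implementation:
```javascript
// Conceptual sketch of RAG retrieval (not the actual ElevenLabs implementation).
function cosineDistance(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function retrieveChunks(queryEmbedding, indexedChunks, maxChunks, maxVectorDistance) {
  return indexedChunks
    .map((chunk) => ({ chunk, distance: cosineDistance(queryEmbedding, chunk.embedding) }))
    .filter(({ distance }) => distance <= maxVectorDistance) // drop weakly related chunks
    .sort((a, b) => a.distance - b.distance) // closest first
    .slice(0, maxChunks) // respect the "maximum document chunks" setting
    .map(({ chunk }) => chunk.text);
}
```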
When RAG is enabled, the size of knowledge base items that can be assigned to an agent is
increased from 300KB to 10MB
## Guide
### Prerequisites
* An [ElevenLabs account](https://elevenlabs.io)
* A configured ElevenLabs [Conversational Agent](/docs/conversational-ai/quickstart)
* At least one document added to your agent's knowledge base
In your agent's settings, navigate to the **Knowledge Base** section and toggle on the **Use RAG** option.
After enabling RAG, you'll see additional configuration options:
* **Embedding model**: Select the model that will convert text into vector embeddings
* **Maximum document chunks**: Set the maximum amount of retrieved content per query
* **Maximum vector distance**: Set the maximum distance between the query and the retrieved chunks
These parameters affect latency and LLM cost (which in the future will be passed on to you).
For example, retrieving more chunks increases cost, and a larger maximum vector distance allows more, but potentially less relevant, context to be passed to the model.
Both can affect answer quality, so experiment with different values to find the best balance.
Each document in your knowledge base needs to be indexed before it can be used with RAG. This
process happens automatically when a document is added to an agent with RAG enabled.
Indexing may take a few minutes for large documents. You can check the indexing status in the
knowledge base list.
For each document in your knowledge base, you can choose how it's used:
* **Auto (default)**: The document is only retrieved when relevant to the query
* **Prompt**: The document is always included in the system prompt, regardless of relevance, but can also be retrieved by RAG
Setting too many documents to "Prompt" mode may exceed context limits. Use this option sparingly
for critical information.
After saving your configuration, test your agent by asking questions related to your knowledge base. The agent should now be able to retrieve and reference specific information from your documents.
## Usage limits
To ensure fair resource allocation, ElevenLabs enforces limits on the total size of documents that can be indexed for RAG per workspace, based on subscription tier.
The limits are as follows:
| Subscription Tier | Total Document Size Limit | Notes |
| :---------------- | :------------------------ | :------------------------------------------ |
| Free | 1MB | Indexes may be deleted after inactivity. |
| Starter | 2MB | |
| Creator | 20MB | |
| Pro | 100MB | |
| Scale | 500MB | |
| Business | 1GB | |
| Enterprise | Custom | Higher limits available based on agreement. |
**Note:**
* These limits apply to the total **original file size** of documents indexed for RAG, not the internal storage size of the RAG index itself (which can be significantly larger).
* Documents smaller than 500 bytes cannot be indexed for RAG and will automatically be used in the prompt instead.
## API implementation
You can also implement RAG through the [API](/docs/api-reference/knowledge-base/compute-rag-index):
```python
from elevenlabs import ElevenLabs
import time

# Initialize the ElevenLabs client
elevenlabs = ElevenLabs(api_key="your-api-key")

# First, index a document for RAG
document_id = "your-document-id"
embedding_model = "e5_mistral_7b_instruct"

# Trigger RAG indexing
response = elevenlabs.conversational_ai.knowledge_base.document.compute_rag_index(
    documentation_id=document_id,
    model=embedding_model
)

# Check indexing status
while response.status not in ["SUCCEEDED", "FAILED"]:
    time.sleep(5)  # Wait 5 seconds before checking status again
    response = elevenlabs.conversational_ai.knowledge_base.document.compute_rag_index(
        documentation_id=document_id,
        model=embedding_model
    )

# Then update agent configuration to use RAG
agent_id = "your-agent-id"

# Get the current agent configuration
agent_config = elevenlabs.conversational_ai.agents.get(agent_id=agent_id)

# Enable RAG in the agent configuration
agent_config.agent.prompt.rag = {
    "enabled": True,
    "embedding_model": "e5_mistral_7b_instruct",
    "max_documents_length": 10000
}

# Update document usage mode if needed
for i, doc in enumerate(agent_config.agent.prompt.knowledge_base):
    if doc.id == document_id:
        agent_config.agent.prompt.knowledge_base[i].usage_mode = "auto"

# Update the agent configuration
elevenlabs.conversational_ai.agents.update(
    agent_id=agent_id,
    conversation_config=agent_config.agent
)
```
```javascript
// First, index a document for RAG
async function enableRAG(documentId, agentId, apiKey) {
  try {
    // Initialize the ElevenLabs client
    const { ElevenLabs } = require('elevenlabs');
    const elevenlabs = new ElevenLabs({
      apiKey: apiKey,
    });

    // Start document indexing for RAG
    let response = await elevenlabs.conversationalAi.knowledgeBase.document.computeRagIndex(
      documentId,
      {
        model: 'e5_mistral_7b_instruct',
      }
    );

    // Check indexing status until completion
    while (response.status !== 'SUCCEEDED' && response.status !== 'FAILED') {
      await new Promise((resolve) => setTimeout(resolve, 5000)); // Wait 5 seconds
      response = await elevenlabs.conversationalAi.knowledgeBase.document.computeRagIndex(
        documentId,
        {
          model: 'e5_mistral_7b_instruct',
        }
      );
    }

    if (response.status === 'FAILED') {
      throw new Error('RAG indexing failed');
    }

    // Get current agent configuration
    const agentConfig = await elevenlabs.conversationalAi.agents.get(agentId);

    // Enable RAG in the agent configuration
    const updatedConfig = {
      conversation_config: {
        ...agentConfig.agent,
        prompt: {
          ...agentConfig.agent.prompt,
          rag: {
            enabled: true,
            embedding_model: 'e5_mistral_7b_instruct',
            max_documents_length: 10000,
          },
        },
      },
    };

    // Update document usage mode if needed
    if (agentConfig.agent.prompt.knowledge_base) {
      agentConfig.agent.prompt.knowledge_base.forEach((doc, index) => {
        if (doc.id === documentId) {
          updatedConfig.conversation_config.prompt.knowledge_base[index].usage_mode = 'auto';
        }
      });
    }

    // Update the agent configuration
    await elevenlabs.conversationalAi.agents.update(agentId, updatedConfig);

    console.log('RAG configuration updated successfully');
    return true;
  } catch (error) {
    console.error('Error configuring RAG:', error);
    throw error;
  }
}

// Example usage
// enableRAG('your-document-id', 'your-agent-id', 'your-api-key')
//   .then(() => console.log('RAG setup complete'))
//   .catch(err => console.error('Error:', err));
```
# Personalization
> Learn how to personalize your agent's behavior using dynamic variables and overrides.
## Overview
Personalization allows you to adapt your agent's behavior for each individual user, enabling more natural and contextually relevant conversations. ElevenLabs offers multiple approaches to personalization:
1. **Dynamic Variables** - Inject runtime values into prompts and messages
2. **Overrides** - Completely replace system prompts or messages
3. **Twilio Integration** - Personalize inbound call experiences via webhooks
## Personalization Methods
* **Dynamic variables**: Define runtime values using `{{ var_name }}` syntax to personalize your agent's messages, system prompts, and tools.
* **Overrides**: Completely replace system prompts, first messages, language, or voice settings for each conversation.
* **Twilio integration**: Dynamically personalize inbound Twilio calls using webhook data.
## Conversation Initiation Client Data Structure
The `conversation_initiation_client_data` object defines what can be customized when starting a conversation:
```json
{
  "type": "conversation_initiation_client_data",
  "conversation_config_override": {
    "agent": {
      "prompt": {
        "prompt": "overriding system prompt"
      },
      "first_message": "overriding first message",
      "language": "en"
    },
    "tts": {
      "voice_id": "voice-id-here"
    }
  },
  "custom_llm_extra_body": {
    "temperature": 0.7,
    "max_tokens": 100
  },
  "dynamic_variables": {
    "string_var": "text value",
    "number_var": 1.2,
    "integer_var": 123,
    "boolean_var": true
  }
}
```
## Choosing the Right Approach
| Method | Best for | Implementation |
| :--- | :--- | :--- |
| **Dynamic Variables** | Inserting user-specific data into templated content; maintaining consistent agent behavior with personalized details; personalizing tool parameters | Define variables with `{{ variable_name }}` and pass values at runtime |
| **Overrides** | Completely changing agent behavior per user; switching languages or voices; legacy applications (consider migrating to Dynamic Variables) | Enable specific override permissions in security settings and pass complete replacement content |
## Learn More
* [Dynamic Variables Documentation](/docs/conversational-ai/customization/personalization/dynamic-variables)
* [Overrides Documentation](/docs/conversational-ai/customization/personalization/overrides)
* [Twilio Integration Documentation](/docs/conversational-ai/customization/personalization/twilio-personalization)
# Dynamic variables
> Pass runtime values to personalize your agent's behavior.
**Dynamic variables** allow you to inject runtime values into your agent's messages, system prompts, and tools. This enables you to personalize each conversation with user-specific data without creating multiple agents.
## Overview
Dynamic variables can be integrated into multiple aspects of your agent:
* **System prompts** to customize behavior and context
* **First messages** to personalize greetings
* **Tool parameters and headers** to pass user-specific data
Here are a few examples where dynamic variables are useful:
* **Personalizing greetings** with user names
* **Including account details** in responses
* **Passing data** to tool calls
* **Customizing behavior** based on subscription tiers
* **Accessing system information** like conversation ID or call duration
Dynamic variables are ideal for injecting user-specific data that shouldn't be hardcoded into your
agent's configuration.
## System dynamic variables
Your agent has access to these automatically available system variables:
* `system__agent_id` - Unique agent identifier
* `system__caller_id` - Caller's phone number (voice calls only)
* `system__called_number` - Destination phone number (voice calls only)
* `system__call_duration_secs` - Call duration in seconds
* `system__time_utc` - Current UTC time (ISO format)
* `system__conversation_id` - ElevenLabs' unique conversation identifier
* `system__call_sid` - Call SID (Twilio calls only)
System variables:
* Are available without runtime configuration
* Are prefixed with `system__` (reserved prefix)
* In system prompts: Set once at conversation start (value remains static)
* In tool calls: Updated at execution time (value reflects current state)
Custom dynamic variables cannot use the reserved
`system__`
prefix.
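System variables are referenced in prompts with the same double-curly-brace syntax as custom variables. For example, a system prompt might include a snippet like this (hypothetical wording):
```
You are a support agent. The caller's number is {{system__caller_id}} and the
current UTC time is {{system__time_utc}}. Conversation ID: {{system__conversation_id}}.
```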
## Secret dynamic variables
Secret dynamic variables are populated in the same way as normal dynamic variables but indicate to our Conversational AI platform that these should
only be used in dynamic variable headers and never sent to an LLM provider as part of an agent's system prompt or first message.
We recommend using these for auth tokens or private IDs that should not be sent to an LLM. To create a secret dynamic variable, simply prefix the dynamic variable with `secret__`.
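For example, a secret token could be passed alongside regular dynamic variables and then referenced only in a tool header. This is a hypothetical sketch; the variable and header names are placeholders:
```python
from elevenlabs.conversational_ai.conversation import ConversationInitiationData

dynamic_vars = {
    "user_name": "Angelo",            # safe to interpolate into prompts and messages
    "secret__auth_token": "st_xxx",   # only usable in tool headers, never sent to the LLM
}

config = ConversationInitiationData(
    dynamic_variables=dynamic_vars,
)
# In the tool configuration, a header value of {{secret__auth_token}} would be
# substituted at request time.
```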
## Guide
### Prerequisites
* An [ElevenLabs account](https://elevenlabs.io)
* A configured ElevenLabs Conversational Agent ([create one here](/docs/conversational-ai/quickstart))
Add variables using double curly braces `{{variable_name}}` in your:
* System prompts
* First messages
* Tool parameters


You can also define dynamic variables in the tool configuration.
To create a new dynamic variable, set the value type to Dynamic variable and click the `+` button.


Configure default values in the web interface for testing:

When starting a conversation, provide the dynamic variables in your code:
Ensure you have the latest [SDK](/docs/conversational-ai/libraries) installed.
```python title="Python" focus={10-23} maxLines=25
import os
import signal
from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation, ConversationInitiationData
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

agent_id = os.getenv("AGENT_ID")
api_key = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(api_key=api_key)

dynamic_vars = {
    "user_name": "Angelo",
}

config = ConversationInitiationData(
    dynamic_variables=dynamic_vars
)

conversation = Conversation(
    elevenlabs,
    agent_id,
    config=config,
    # Assume auth is required when API_KEY is set.
    requires_auth=bool(api_key),
    # Use the default audio interface.
    audio_interface=DefaultAudioInterface(),
    # Simple callbacks that print the conversation to the console.
    callback_agent_response=lambda response: print(f"Agent: {response}"),
    callback_agent_response_correction=lambda original, corrected: print(f"Agent: {original} -> {corrected}"),
    callback_user_transcript=lambda transcript: print(f"User: {transcript}"),
    # Uncomment the below if you want to see latency measurements.
    # callback_latency_measurement=lambda latency: print(f"Latency: {latency}ms"),
)

conversation.start_session()

signal.signal(signal.SIGINT, lambda sig, frame: conversation.end_session())
```
```javascript title="JavaScript" focus={7-20} maxLines=25
import { Conversation } from '@elevenlabs/client';

class VoiceAgent {
  ...
  async startConversation() {
    try {
      // Request microphone access
      await navigator.mediaDevices.getUserMedia({ audio: true });

      this.conversation = await Conversation.startSession({
        agentId: 'agent_id_goes_here', // Replace with your actual agent ID
        dynamicVariables: {
          user_name: 'Angelo'
        },
        ... add some callbacks here
      });
    } catch (error) {
      console.error('Failed to start conversation:', error);
      alert('Failed to start conversation. Please ensure microphone access is granted.');
    }
  }
}
```
```swift title="Swift"
let dynamicVars: [String: DynamicVariableValue] = [
  "customer_name": .string("John Doe"),
  "account_balance": .number(5000.50),
  "user_id": .int(12345),
  "is_premium": .boolean(true)
]

// Create session config with dynamic variables
let config = SessionConfig(
  agentId: "your_agent_id",
  dynamicVariables: dynamicVars
)

// Start the conversation
let conversation = try await Conversation.startSession(
  config: config
)
```
```html title="Widget"
```
## Supported Types
Dynamic variables support these value types:
* **String**: text values
* **Number**: numeric values
* **Boolean**: true/false values
## Troubleshooting
If variables are not being replaced, verify that:
* Variable names match exactly (case-sensitive)
* Variables use double curly braces: `{{ variable_name }}`
* Variables are included in your dynamic\_variables object
If you encounter type errors, ensure that:
* Variable values match the expected type
* Values are strings, numbers, or booleans only
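As a quick reference, a well-formed `dynamic_variables` payload in Python contains only strings, numbers, and booleans (a minimal sketch mirroring the JSON structure above):
```python
dynamic_variables = {
    "string_var": "text value",
    "number_var": 1.2,
    "integer_var": 123,
    "boolean_var": True,
    # Lists, dicts, and None are not supported as dynamic variable values.
}
```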
# Overrides
> Tailor each conversation with personalized context for each user.
While overrides are still supported for completely replacing system prompts or first messages, we
recommend using [Dynamic
Variables](/docs/conversational-ai/customization/personalization/dynamic-variables) as the
preferred way to customize your agent's responses and inject real-time data. Dynamic Variables
offer better maintainability and a more structured approach to personalization.
**Overrides** enable your assistant to adapt its behavior for each user interaction. You can pass custom data and settings at the start of each conversation, allowing the assistant to personalize its responses and knowledge with real-time context. Overrides completely override the agent's default values defined in the agent's [dashboard](https://elevenlabs.io/app/conversational-ai/agents).
## Overview
Overrides allow you to modify your AI agent's behavior in real-time without creating multiple agents. This enables you to personalize responses with user-specific data.
Overrides can be enabled for the following fields in the agent's security settings:
* System prompt
* First message
* Language
* Voice ID
When overrides are enabled for a field, providing an override is still optional. If not provided, the agent will use the default values defined in the agent's [dashboard](https://elevenlabs.io/app/conversational-ai/agents). An error will be thrown if an override is provided for a field that does not have overrides enabled.
Here are a few examples where overrides can be useful:
* **Greet users** by their name
* **Include account-specific details** in responses
* **Adjust the agent's language** or tone based on user preferences
* **Pass real-time data** like account balances or order status
Overrides are particularly useful for applications requiring personalized interactions or handling
sensitive user data that shouldn't be stored in the agent's base configuration.
## Guide
### Prerequisites
* An [ElevenLabs account](https://elevenlabs.io)
* A configured ElevenLabs Conversational Agent ([create one here](/docs/conversational-ai/quickstart))
This guide will show you how to override the default agent **System prompt** & **First message**.
For security reasons, overrides are disabled by default. Navigate to your agent's settings and
select the **Security** tab.
Enable the `First message` and `System prompt` overrides.

In your code, where the conversation is started, pass the overrides as a parameter.
Ensure you have the latest [SDK](/docs/conversational-ai/libraries) installed.
```python title="Python" focus={3-14} maxLines=14
from elevenlabs.conversational_ai.conversation import Conversation, ConversationInitiationData
...
conversation_override = {
    "agent": {
        "prompt": {
            "prompt": f"The customer's bank account balance is {customer_balance}. They are based in {customer_location}."  # Optional: override the system prompt.
        },
        "first_message": f"Hi {customer_name}, how can I help you today?",  # Optional: override the first_message.
        "language": "en"  # Optional: override the language.
    },
    "tts": {
        "voice_id": "custom_voice_id"  # Optional: override the voice.
    }
}

config = ConversationInitiationData(
    conversation_config_override=conversation_override
)

conversation = Conversation(
    ...
    config=config,
    ...
)

conversation.start_session()
```
```javascript title="JavaScript" focus={4-15} maxLines=15
...
const conversation = await Conversation.startSession({
  ...
  overrides: {
    agent: {
      prompt: {
        prompt: `The customer's bank account balance is ${customer_balance}. They are based in ${customer_location}.` // Optional: override the system prompt.
      },
      firstMessage: `Hi ${customer_name}, how can I help you today?`, // Optional: override the first message.
      language: "en" // Optional: override the language.
    },
    tts: {
      voiceId: "custom_voice_id" // Optional: override the voice.
    }
  },
  ...
})
```
```swift title="Swift" focus={3-14} maxLines=14
import ElevenLabsSDK

let promptOverride = ElevenLabsSDK.AgentPrompt(
  prompt: "The customer's bank account balance is \(customer_balance). They are based in \(customer_location)." // Optional: override the system prompt.
)
let agentConfig = ElevenLabsSDK.AgentConfig(
  prompt: promptOverride, // Optional: override the system prompt.
  firstMessage: "Hi \(customer_name), how can I help you today?", // Optional: override the first message.
  language: .en // Optional: override the language.
)
let overrides = ElevenLabsSDK.ConversationConfigOverride(
  agent: agentConfig, // Optional: override agent settings.
  tts: TTSConfig(voiceId: "custom_voice_id") // Optional: override the voice.
)
let config = ElevenLabsSDK.SessionConfig(
  agentId: "",
  overrides: overrides
)

let conversation = try await ElevenLabsSDK.Conversation.startSession(
  config: config,
  callbacks: callbacks
)
```
```html title="Widget"
override-prompt="Custom system prompt for this user"
override-first-message="Hi! How can I help you today?"
override-voice-id="custom_voice_id"
>
```
When using overrides, omit any fields you don't want to override rather than setting them to empty strings or null values. Only include the fields you specifically want to customize.
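For instance, a partial override that only replaces the first message might look like this in Python (a minimal sketch reusing `ConversationInitiationData` from the examples above):
```python
from elevenlabs.conversational_ai.conversation import ConversationInitiationData

config = ConversationInitiationData(
    conversation_config_override={
        "agent": {
            "first_message": "Welcome back, Alex!",
        }
        # No "prompt", "language", or "tts" keys: the agent's dashboard defaults apply.
    }
)
```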
# Twilio personalization
> Configure personalization for incoming Twilio calls using webhooks.
## Overview
When receiving inbound Twilio calls, you can dynamically fetch conversation initiation data through a webhook. This allows you to customize your agent's behavior based on caller information and other contextual data.
## How it works
1. When a Twilio call is received, the ElevenLabs Conversational AI platform will make a webhook call to your specified endpoint, passing call information (`caller_id`, `agent_id`, `called_number`, `call_sid`) as arguments
2. Your webhook returns conversation initiation client data, including dynamic variables and overrides (an example is shown below)
3. This data is used to initiate the conversation
The system uses Twilio's connection/dialing period to fetch webhook data in parallel, creating a
seamless experience where:
* Users hear the expected telephone connection sound
* In parallel, the Conversational AI platform fetches the necessary webhook data
* The conversation is initiated with the fetched data by the time the audio connection is established
## Configuration
In the [settings page](https://elevenlabs.io/app/conversational-ai/settings) of the Conversational AI platform, configure the webhook URL and add any
secrets needed for authentication.

Click on the webhook to modify which secrets are sent in the headers.

In the "Security" tab of the [agent's page](https://elevenlabs.io/app/conversational-ai/agents/), enable fetching conversation initiation data for inbound Twilio calls, and define fields that can be overridden.

The webhook will receive a POST request with the following parameters:
| Parameter | Type | Description |
| --------------- | ------ | -------------------------------------- |
| `caller_id` | string | The phone number of the caller |
| `agent_id` | string | The ID of the agent receiving the call |
| `called_number` | string | The Twilio number that was called |
| `call_sid` | string | Unique identifier for the Twilio call |
Your webhook must return a JSON response containing the initiation data for the agent.
The `dynamic_variables` field must contain all dynamic variables defined for the agent. Overrides
on the other hand are entirely optional. For more information about dynamic variables and
overrides see the [dynamic variables](/docs/conversational-ai/customization/personalization/dynamic-variables) and
[overrides](/docs/conversational-ai/customization/personalization/overrides) docs.
An example response could be:
```json
{
  "type": "conversation_initiation_client_data",
  "dynamic_variables": {
    "customer_name": "John Doe",
    "account_status": "premium",
    "last_interaction": "2024-01-15"
  },
  "conversation_config_override": {
    "agent": {
      "prompt": {
        "prompt": "The customer's bank account balance is $100. They are based in San Francisco."
      },
      "first_message": "Hi, how can I help you today?",
      "language": "en"
    },
    "tts": {
      "voice_id": "new-voice-id"
    }
  }
}
```
The Conversational AI platform will use the dynamic variables to populate the conversation initiation data, and the conversation will start smoothly.
Ensure your webhook responds within a reasonable timeout period to avoid delaying the call
handling.
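For illustration, a webhook like the one above could be implemented with a small FastAPI handler. This is a hedged sketch; `lookup_customer` is a hypothetical helper you would replace with your own data source:
```python
# Minimal sketch of a conversation-initiation webhook using FastAPI.
from fastapi import FastAPI, Request

app = FastAPI()

def lookup_customer(phone_number: str) -> dict:
    # Placeholder: replace with a real CRM or database lookup.
    return {"name": "John Doe", "status": "premium", "last_interaction": "2024-01-15"}

@app.post("/twilio/conversation-initiation")
async def conversation_initiation(request: Request):
    payload = await request.json()
    customer = lookup_customer(payload.get("caller_id"))

    return {
        "type": "conversation_initiation_client_data",
        "dynamic_variables": {
            "customer_name": customer["name"],
            "account_status": customer["status"],
            "last_interaction": customer["last_interaction"],
        },
        # Overrides are optional; omit them to keep the agent's defaults.
    }
```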
## Security
* Use HTTPS endpoints only
* Implement authentication using request headers
* Store sensitive values as secrets through the [ElevenLabs secrets manager](https://elevenlabs.io/app/conversational-ai/settings)
* Validate the incoming request parameters
# Voice customization
> Learn how to customize your AI agent's voice and speech patterns.
## Overview
You can customize various aspects of your AI agent's voice to create a more natural and engaging conversation experience. This includes controlling pronunciation, speaking speed, and language-specific voice settings.
## Available customizations
Enable your agent to switch between different voices for multi-character conversations,
storytelling, and language tutoring.
Control how your agent pronounces specific words and phrases using
[IPA](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) or
[CMU](https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary) notation.
Adjust how quickly or slowly your agent speaks, with values ranging from 0.7x to 1.2x.
Configure different voices for each supported language to ensure natural pronunciation.
## Best practices
Choose voices that match your target language and region for the most natural pronunciation.
Consider testing multiple voices to find the best fit for your use case.
Start with the default speed (1.0) and adjust based on your specific needs. Test different
speeds with your content to find the optimal balance between clarity and natural flow.
Focus on terms specific to your business or use case that need consistent pronunciation and are
not widely used in everyday conversation. Test pronunciations with your chosen voice and model
combination.
Some voice customization features may be model-dependent. For example, phoneme-based pronunciation
control is only available with the Turbo v2 model.
# Multi-voice support
> Enable your AI agent to switch between different voices for multi-character conversations and enhanced storytelling.
## Overview
Multi-voice support allows your conversational AI agent to dynamically switch between different ElevenLabs voices during a single conversation. This powerful feature enables:
* **Multi-character storytelling**: Different voices for different characters in narratives
* **Language tutoring**: Native speaker voices for different languages
* **Emotional agents**: Voice changes based on emotional context
* **Role-playing scenarios**: Distinct voices for different personas
## How it works
When multi-voice support is enabled, your agent can use XML-style markup to switch between configured voices during text generation. The agent automatically returns to the default voice when no specific voice is specified.
```xml title="Example voice switching"
The teacher said, ¡Hola estudiantes!
Then the student replied, Hello! How are you today?
```
```xml title="Multi-character dialogue"
Once upon a time, in a distant kingdom...
I need to find the magic crystal!
The crystal lies beyond the enchanted forest.
```
## Configuration
### Adding supported voices
Navigate to your agent settings and locate the **Multi-voice support** section under the `Voice` tab.
### Add a new voice
Click **Add voice** to configure a new supported voice for your agent.
### Configure voice properties
Set up the voice with the following details:
* **Voice label**: Unique identifier (e.g., "Joe", "Spanish", "Happy")
* **Voice**: Select from your available ElevenLabs voices
* **Model Family**: Choose Turbo, Flash, or Multilingual (optional)
* **Language**: Override the default language for this voice (optional)
* **Description**: When the agent should use this voice
### Save configuration
Click **Add voice** to save the configuration. The voice will be available for your agent to use immediately.
### Voice properties
**Voice label**: A unique identifier that the LLM uses to reference this voice. Choose descriptive labels like:
* Character names: "Alice", "Bob", "Narrator"
* Languages: "Spanish", "French", "German"
* Emotions: "Happy", "Sad", "Excited"
* Roles: "Teacher", "Student", "Guide"
**Model family**: Override the agent's default model family for this specific voice:
* **Flash**: Fastest generation, optimized for real-time use
* **Turbo**: Balanced speed and quality
* **Multilingual**: Highest quality, best for non-English languages
* **Same as agent**: Use the agent's default setting
**Language**: Specify a different language for this voice, useful for:
* Multilingual conversations
* Language tutoring applications
* Region-specific pronunciations
**Description**: Provide context for when the agent should use this voice.
Examples:
* "For any Spanish words or phrases"
* "When the message content is joyful or excited"
* "Whenever the character Joe is speaking"
## Implementation
### XML markup syntax
Your agent uses XML-style tags to switch between voices:
```xml
<VOICE_LABEL>text to be spoken</VOICE_LABEL>
```
**Key points:**
* Replace `VOICE_LABEL` with the exact label you configured
* Text outside tags uses the default voice
* Tags are case-sensitive
* Nested tags are not supported
### System prompt integration
When you configure supported voices, the system automatically adds instructions to your agent's prompt:
```
When a message should be spoken by a particular person, use markup: "<CHARACTER> message </CHARACTER>" where CHARACTER is the character label.
Available voices are as follows:
- default: any text outside of the CHARACTER tags
- Joe: Whenever Joe is speaking
- Spanish: For any Spanish words or phrases
- Narrator: For narrative descriptions
```
### Example usage
```
Teacher: Let's practice greetings. In Spanish, we say ¡Hola! ¿Cómo estás?
Student: How do I respond?
Teacher: You can say ¡Hola! Estoy bien, gracias. which means Hello! I'm fine, thank you.
```
```
Once upon a time, a brave princess ventured into a dark cave.
I'm not afraid of you, dragon! she declared boldly. The dragon rumbled from
the shadows, You should be, little one.
But the princess stood her ground, ready for whatever came next.
```
## Best practices
* Choose voices that clearly differentiate between characters or contexts
* Test voice combinations to ensure they work well together
* Consider the emotional tone and personality for each voice
* Ensure voices match the language and accent when switching languages
* Use descriptive, intuitive labels that the LLM can understand
* Keep labels short and memorable
* Avoid special characters or spaces in labels
* Limit the number of supported voices to what you actually need
* Use the same model family when possible to reduce switching overhead
* Test with your expected conversation patterns
* Monitor response times with multiple voice switches
* Provide clear descriptions for when each voice should be used
* Test edge cases where voice switching might be unclear
* Consider fallback behavior when voice labels are ambiguous
* Ensure voice switches enhance rather than distract from the conversation
## Limitations
* Maximum of 10 supported voices per agent (including default)
* Voice switching adds minimal latency during generation
* XML tags must be properly formatted and closed
* Voice labels are case-sensitive in markup
* Nested voice tags are not supported
## FAQ
If the agent uses a voice label that hasn't been configured, the text will be spoken using the
default voice. The XML tags will be ignored.
Yes, you can switch voices within a single response. Each tagged section will use the specified
voice, while untagged text uses the default voice.
Voice switching adds minimal overhead. The first use of each voice in a conversation may have
slightly higher latency as the voice is initialized.
Yes, you can configure multiple labels that use the same ElevenLabs voice but with different model
families, languages, or contexts.
Provide clear examples in your system prompt and test thoroughly. You can include specific
scenarios where voice switching should occur and examples of the XML markup format.
# Pronunciation dictionaries
> Learn how to control how your AI agent pronounces specific words and phrases.
## Overview
Pronunciation dictionaries allow you to customize how your AI agent pronounces specific words or phrases. This is particularly useful for:
* Correcting pronunciation of names, places, or technical terms
* Ensuring consistent pronunciation across conversations
* Customizing regional pronunciation variations
## Configuration
You can find the pronunciation dictionary settings under the **Voice** tab in your agent's configuration.
The phoneme function of pronunciation dictionaries only works with the Turbo v2 model, while the
alias function works with all models.
## Dictionary file format
Pronunciation dictionaries use XML-based `.pls` files. Here's an example structure:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Apple</grapheme>
    <phoneme>ˈæpl̩</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
</lexicon>
```
## Supported formats
We support two types of pronunciation notation:
1. **IPA (International Phonetic Alphabet)**
* More precise control over pronunciation
* Requires knowledge of IPA symbols
* Example: "nginx" as `/ˈɛndʒɪnˈɛks/`
2. **CMU (Carnegie Mellon University) Dictionary format**
* Simpler ASCII-based format
* More accessible for English pronunciations
* Example: "tomato" as "T AH M EY T OW"
You can use AI tools like Claude or ChatGPT to help generate IPA or CMU notations for specific
words.
## Best practices
1. **Case sensitivity**: Create separate entries for capitalized and lowercase versions of words if needed
2. **Testing**: Always test pronunciations with your chosen voice and model
3. **Maintenance**: Keep your dictionary organized and documented
4. **Scope**: Focus on words that are frequently mispronounced or critical to your use case
## FAQ
Currently, only the Turbo v2 model supports phoneme-based pronunciation. Other models will
silently skip phoneme entries.
Yes, you can upload multiple dictionary files to handle different sets of pronunciations.
The model will use its default pronunciation rules for any words not specified in the
dictionary.
## Additional resources
* [Professional Voice Cloning](/docs/product-guides/voices/voice-cloning/professional-voice-cloning)
* [Voice Design](/docs/product-guides/voices/voice-design)
* [Text to Speech API Reference](/docs/api-reference/text-to-speech)
# Speed control
> Learn how to adjust the speaking speed of your conversational AI agent.
## Overview
The speed control feature allows you to adjust how quickly or slowly your agent speaks. This can be useful for:
* Making speech more accessible for different audiences
* Matching specific use cases (e.g., slower for educational content)
* Optimizing for different types of conversations
## Configuration
Speed is controlled through the [`speed` parameter](/docs/api-reference/agents/create#request.body.conversation_config.tts.speed) with the following specifications:
* **Range**: 0.7 to 1.2
* **Default**: 1.0
* **Type**: Optional
## How it works
The speed parameter affects the pace of speech generation:
* Values below 1.0 slow down the speech
* Values above 1.0 speed up the speech
* 1.0 represents normal speaking speed
Extreme values near the minimum or maximum may affect the quality of the generated speech.
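If you manage agents through the Python SDK, the speed parameter could be set when updating an agent's configuration. The sketch below follows the update pattern used elsewhere in these docs; verify the exact field path against the API reference linked above:
```python
from elevenlabs import ElevenLabs

elevenlabs = ElevenLabs(api_key="your-api-key")

# Sketch: slow the agent down slightly. The field path mirrors
# conversation_config.tts.speed from the agent API reference.
elevenlabs.conversational_ai.agents.update(
    agent_id="your-agent-id",
    conversation_config={"tts": {"speed": 0.9}},
)
```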
## Best practices
* Start with the default speed (1.0) and adjust based on user feedback
* Test different speeds with your specific content
* Consider your target audience when setting the speed
* Monitor speech quality at extreme values
Values outside the 0.7-1.2 range are not supported.
# Language
> Learn how to configure your agent to speak multiple languages.
## Overview
This guide shows you how to configure your agent to speak multiple languages. You'll learn to:
* Configure your agent's primary language
* Add support for multiple languages
* Set language-specific voices and first messages
* Optimize voice selection for natural pronunciation
* Enable automatic language switching
## Guide
When you create a new agent, it's configured with:
* English as the primary language
* Flash v2 model for fast, English-only responses
* A default first message.

Additional languages switch the agent to use the v2.5 Multilingual model. English will always use
the v2 model.
First, navigate to your agent's configuration page and locate the **Agent** tab.
1. In the **Additional Languages** section, add an additional language (e.g. French)
2. Review the first message, which is automatically translated using a Large Language Model (LLM). Customize it as needed for each additional language to ensure accuracy and cultural relevance.

Selecting the **All** option in the **Additional Languages** dropdown will configure the agent to
support 31 languages. Collectively, these languages are spoken by approximately 90% of the world's
population.
For optimal pronunciation, configure each additional language with a language-specific voice from our [Voice Library](https://elevenlabs.io/app/voice-library).
To find great voices for each language curated by the ElevenLabs team, visit the [language top
picks](https://elevenlabs.io/app/voice-library/collections).


Add the [language detection tool](/docs/conversational-ai/customization/tools/system-tools/language-detection) so your agent can automatically switch to the user's preferred language.
Now that the agent is configured to support additional languages, the widget will prompt the user for their preferred language before the conversation begins.
If using the SDK, the language can be set programmatically using conversation overrides. See the
[Overrides](/docs/conversational-ai/customization/personalization/overrides) guide for implementation details.
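For example, a minimal Python sketch that pins the session to French using a conversation override might look like this (field names follow the overrides example in that guide):
```python
from elevenlabs.conversational_ai.conversation import ConversationInitiationData

config = ConversationInitiationData(
    conversation_config_override={
        "agent": {
            "language": "fr"  # Must be one of the agent's configured additional languages.
        }
    }
)
# Pass `config` to Conversation(...) as shown in the Overrides guide.
```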

Language selection is fixed for the duration of the call - users cannot switch languages
mid-conversation.
### Internationalization
You can integrate the widget with your internationalization framework by dynamically setting the language and UI text attributes.
```html title="Widget"
```
Ensure the language codes match between your i18n framework and the agent's supported languages.
## Best practices
Select voices specifically trained in your target languages. This ensures:
* Natural pronunciation
* Appropriate regional accents
* Better handling of language-specific nuances
While automatic translations are provided, consider:
* Reviewing translations for accuracy
* Adapting greetings for cultural context
* Adjusting formal/informal tone as needed
# Large Language Models (LLMs)
> Understand the available LLMs for your conversational AI agents, their capabilities, and pricing.
## Overview
Our conversational AI platform supports a variety of cutting-edge Large Language Models (LLMs) to power your voice agents. Choosing the right LLM depends on your specific needs, balancing factors like performance, context window size, features, and cost. This document provides details on the supported models and their associated pricing.
The selection of an LLM is a critical step in configuring your conversational AI agent, directly impacting its conversational abilities, knowledge depth, and operational cost.
## Supported LLMs
We offer models from leading providers such as OpenAI, Google, and Anthropic, as well as the option to integrate your own custom LLM for maximum flexibility.
Pricing is typically denoted in USD per 1 million tokens unless specified otherwise. A token is a
fundamental unit of text data for LLMs, roughly equivalent to 4 characters on average.
Google's Gemini models offer a balance of performance, large context windows, and competitive pricing, with the lowest latency.
| Model | Max Output Tokens | Max Context (Tokens) | Input Price (\$/1M tokens) | Output Price (\$/1M tokens) | Input Cache Read (\$/1M tokens) | Input Cache Write (\$/1M tokens) |
| ----------------------- | ----------------- | -------------------- | -------------------------- | --------------------------- | ------------------------------- | -------------------------------- |
| `gemini-1.5-pro` | 8,192 | 2,097,152 | 1.25 | 5 | 0.3125 | n/a |
| `gemini-1.5-flash` | 8,192 | 1,048,576 | 0.075 | 0.3 | 0.01875 | n/a |
| `gemini-2.0-flash` | 8,192 | 1,048,576 | 0.1 | 0.4 | 0.025 | n/a |
| `gemini-2.0-flash-lite` | 8,192 | 1,048,576 | 0.075 | 0.3 | n/a | n/a |
| `gemini-2.5-flash` | 65,535 | 1,048,576 | 0.15 | 0.6 | n/a | n/a |
| Model | Avg LLM Cost (No KB) (\$/min) | Avg LLM Cost (Large KB) (\$/min) |
| ----------------------- | ----------------------------- | -------------------------------- |
| `gemini-1.5-pro` | 0.009 | 0.10 |
| `gemini-1.5-flash` | 0.002 | 0.01 |
| `gemini-2.0-flash` | 0.001 | 0.02 |
| `gemini-2.0-flash-lite` | 0.001 | 0.009 |
| `gemini-2.5-flash` | 0.001 | 0.10 |
OpenAI models are known for their strong general-purpose capabilities and wide range of options.
| Model | Max Output Tokens | Max Context (Tokens) | Input Price (\$/1M tokens) | Output Price (\$/1M tokens) | Input Cache Read (\$/1M tokens) | Input Cache Write (\$/1M tokens) |
| --------------- | ----------------- | -------------------- | -------------------------- | --------------------------- | ------------------------------- | -------------------------------- |
| `gpt-4o-mini` | 16,384 | 128,000 | 0.15 | 0.6 | 0.075 | n/a |
| `gpt-4o` | 4,096 | 128,000 | 2.5 | 10 | 1.25 | n/a |
| `gpt-4` | 8,192 | 8,192 | 30 | 60 | n/a | n/a |
| `gpt-4-turbo` | 4,096 | 128,000 | 10 | 30 | n/a | n/a |
| `gpt-4.1` | 32,768 | 1,047,576 | 2 | 8 | n/a | n/a |
| `gpt-4.1-mini` | 32,768 | 1,047,576 | 0.4 | 1.6 | 0.1 | n/a |
| `gpt-4.1-nano` | 32,768 | 1,047,576 | 0.1 | 0.4 | 0.025 | n/a |
| `gpt-3.5-turbo` | 4,096 | 16,385 | 0.5 | 1.5 | n/a | n/a |
| Model | Avg LLM Cost (No KB) (\$/min) | Avg LLM Cost (Large KB) (\$/min) |
| --------------- | ----------------------------- | -------------------------------- |
| `gpt-4o-mini` | 0.001 | 0.10 |
| `gpt-4o` | 0.01 | 0.13 |
| `gpt-4` | n/a | n/a |
| `gpt-4-turbo` | 0.04 | 0.39 |
| `gpt-4.1` | 0.003 | 0.13 |
| `gpt-4.1-mini` | 0.002 | 0.07 |
| `gpt-4.1-nano` | 0.000 | 0.006 |
| `gpt-3.5-turbo` | 0.005 | 0.08 |
Anthropic's Claude models are designed with a focus on helpfulness, honesty, and harmlessness, often featuring large context windows.
| Model | Max Output Tokens | Max Context (Tokens) | Input Price (\$/1M tokens) | Output Price (\$/1M tokens) | Input Cache Read (\$/1M tokens) | Input Cache Write (\$/1M tokens) |
| ---------------------- | ----------------- | -------------------- | -------------------------- | --------------------------- | ------------------------------- | -------------------------------- |
| `claude-sonnet-4` | 64,000 | 200,000 | 3 | 15 | 0.3 | 3.75 |
| `claude-3-7-sonnet` | 4,096 | 200,000 | 3 | 15 | 0.3 | 3.75 |
| `claude-3-5-sonnet` | 4,096 | 200,000 | 3 | 15 | 0.3 | 3.75 |
| `claude-3-5-sonnet-v1` | 4,096 | 200,000 | 3 | 15 | 0.3 | 3.75 |
| `claude-3-0-haiku` | 4,096 | 200,000 | 0.25 | 1.25 | 0.03 | 0.3 |
| Model | Avg LLM Cost (No KB) (\$/min) | Avg LLM Cost (Large KB) (\$/min) |
| ---------------------- | ----------------------------- | -------------------------------- |
| `claude-sonnet-4` | 0.03 | 0.26 |
| `claude-3-7-sonnet` | 0.03 | 0.26 |
| `claude-3-5-sonnet` | 0.03 | 0.20 |
| `claude-3-5-sonnet-v1` | 0.03 | 0.17 |
| `claude-3-0-haiku` | 0.002 | 0.03 |
## Choosing an LLM
Selecting the most suitable LLM for your application involves considering several factors:
* **Task Complexity**: More demanding or nuanced tasks generally benefit from more powerful models (e.g., OpenAI's GPT-4 series, Anthropic's Claude Sonnet 4, Google's Gemini 2.5 models).
* **Latency Requirements**: For applications requiring real-time or near real-time responses, such as live voice conversations, models optimized for speed are preferable (e.g., Google's Gemini Flash series, Anthropic's Claude Haiku, OpenAI's GPT-4o-mini).
* **Context Window Size**: If your application needs to process, understand, or recall information from long conversations or extensive documents, select models with larger context windows.
* **Cost-Effectiveness**: Balance the desired performance and features against your budget. LLM prices can vary significantly, so analyze the pricing structure (input, output, and cache tokens) in relation to your expected usage patterns.
* **HIPAA Compliance**: If your application involves Protected Health Information (PHI), it is crucial to use an LLM that is designated as HIPAA compliant and ensure your entire data handling process meets regulatory standards.
## HIPAA Compliance
Certain LLMs available on our platform may be suitable for use in environments requiring HIPAA compliance; please see the [HIPAA compliance docs](/docs/conversational-ai/legal/hipaa) for more details.
## Understanding LLM Pricing
* **Tokens**: LLM usage is typically billed based on the number of tokens processed. As a general guideline for English text, 100 tokens is approximately equivalent to 75 words.
* **Input vs. Output Pricing**: Providers often differentiate pricing for input tokens (the data you send to the model) and output tokens (the data the model generates in response).
* **Cache Pricing**:
* `input_cache_read`: This refers to the cost associated with retrieving previously processed input data from a cache. Utilizing cached data can lead to cost savings if identical inputs are processed multiple times.
* `input_cache_write`: This is the cost associated with storing input data into a cache. Some LLM providers may charge for this operation.
* The prices listed in this document are per 1 million tokens and are based on the information available at the time of writing. These prices are subject to change by the LLM providers.
For the most accurate and current information on model capabilities, pricing, and terms of service, always consult the official documentation from the respective LLM providers (OpenAI, Google, Anthropic, xAI).
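As a back-of-the-envelope illustration, a rough per-response cost can be computed from token counts and the per-million prices listed above (the token counts here are hypothetical):
```python
# Rough cost estimate for a single response with gpt-4o-mini.
input_tokens = 3_000      # prompt + conversation history (hypothetical)
output_tokens = 300       # generated reply (hypothetical)

input_price_per_million = 0.15   # $/1M input tokens (from the table above)
output_price_per_million = 0.60  # $/1M output tokens (from the table above)

cost = (input_tokens / 1_000_000) * input_price_per_million \
     + (output_tokens / 1_000_000) * output_price_per_million
print(f"~${cost:.6f} per response")  # ~$0.000630
```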
# Optimizing LLM costs
> Practical strategies to reduce LLM inference expenses on the ElevenLabs platform.
## Overview
Managing Large Language Model (LLM) inference costs is essential for developing sustainable AI applications. This guide outlines key strategies to optimize expenditure on the ElevenLabs platform by effectively utilizing its features. For detailed model capabilities and pricing, refer to our main [LLM documentation](/docs/conversational-ai/customization/llm).
ElevenLabs reduces costs by scaling down model inference during periods of silence. These periods
are billed at 5% of the usual per-minute rate. See [the Conversational AI overview
page](/docs/conversational-ai/overview#pricing-during-silent-periods) for more details.
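To illustrate how silent-period billing changes the total, consider a 10-minute call with 3 minutes of silence (the per-minute rate below is purely hypothetical; see the pricing page for real rates):
```python
# Illustrative only: assumes a nominal per-minute rate.
rate_per_minute = 0.10          # hypothetical $/min
active_minutes = 7
silent_minutes = 3              # billed at 5% of the usual rate

cost = active_minutes * rate_per_minute + silent_minutes * rate_per_minute * 0.05
print(f"~${cost:.3f} for a 10-minute call")  # ~$0.715 instead of $1.00
```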
## Understanding inference costs
LLM inference costs on our platform are primarily influenced by:
* **Input tokens**: The amount of data processed from your prompt, including user queries, system instructions, and any contextual data.
* **Output tokens**: The number of tokens generated by the LLM in its response.
* **Model choice**: Different LLMs have varying per-token pricing. More powerful models generally incur higher costs.
Monitoring your usage via the ElevenLabs dashboard or API is crucial for identifying areas for cost reduction.
## Strategic model selection
Choosing the most appropriate LLM is a primary factor in cost efficiency.
* **Right-sizing**: Select the least complex (and typically less expensive) model that can reliably perform your specific task. Avoid using high-cost models for simple operations. For instance, models like Google's `gemini-2.0-flash` offer highly competitive pricing for many common tasks. Always cross-reference with the full [Supported LLMs list](/docs/conversational-ai/customization/llm#supported-llms) for the latest pricing and capabilities.
* **Experimentation**: Test various models for your tasks, comparing output quality against incurred costs. Consider language support, context window needs, and specialized skills.
## Prompt optimization
Prompt engineering is a powerful technique for reducing token consumption and associated costs. By crafting clear, concise, and unambiguous system prompts, you can guide the model to produce more efficient responses. Eliminate redundant wording and unnecessary context that might inflate your token count. Consider explicitly instructing the model on your desired output length—for example, by adding phrases like "Limit your response to two sentences" or "Provide a brief summary." These simple directives can significantly reduce the number of output tokens while maintaining the quality and relevance of the generated content.
**Modular design**: For complex conversational flows, leverage [agent-agent transfer](/docs/conversational-ai/customization/tools/system-tools/agent-transfer). This allows you to break down a single, large system prompt into multiple, smaller, and more specialized prompts, each handled by a different agent. This significantly reduces the token count per interaction by loading only the contextually relevant prompt for the current stage of the conversation, rather than a comprehensive prompt designed for all possibilities.
## Leveraging knowledge and retrieval
For applications requiring access to large information volumes, Retrieval Augmented Generation (RAG) and a well-maintained knowledge base are key.
* **Efficient RAG**:
* RAG reduces input tokens by providing the LLM with only relevant snippets from your [Knowledge Base](/docs/conversational-ai/customization/knowledge-base), instead of including extensive data in the prompt.
* Optimize the retriever to fetch only the most pertinent "chunks" of information.
* Fine-tune chunk size and overlap for a balance between context and token count.
* Learn more about implementing [RAG](/docs/conversational-ai/customization/knowledge-base/rag).
* **Context size**:
* Ensure your [Knowledge Base](/docs/conversational-ai/customization/knowledge-base) contains accurate, up-to-date, and relevant information.
* Well-structured content improves retrieval precision and reduces token usage from irrelevant context.
## Intelligent tool utilization
Using [Server Tools](/docs/conversational-ai/customization/tools/server-tools) allows LLMs to delegate tasks to external APIs or custom code, which can be more cost-effective.
* **Task offloading**: Identify deterministic tasks, those requiring real-time data, complex calculations, or API interactions (e.g., database lookups, external service calls).
* **Orchestration**: The LLM acts as an orchestrator, making structured tool calls. This is often far more token-efficient than attempting complex tasks via prompting alone.
* **Tool descriptions**: Provide clear, concise descriptions for each tool, enabling the LLM to use them efficiently and accurately.
## Checklist
Consider applying these techniques to reduce cost:
| Feature | Cost impact | Action items |
| :---------------- | :------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| LLM choice | Reduces per-token cost | Select the smallest, most economical model that reliably performs the task. Experiment and compare cost vs. quality. |
| Custom LLMs | Potentially lower inference cost for specialized tasks | Evaluate for high-volume, specific tasks; fine-tune on proprietary data to create smaller, efficient models. |
| System prompts | Reduces input & output tokens, guides model behavior | Be concise, clear, and specific. Instruct on desired output format and length (e.g., "be brief," "use JSON"). |
| User prompts | Reduces input tokens | Encourage specific queries; use few-shot examples strategically; summarize or select relevant history. |
| Output control | Reduces output tokens | Prompt for summaries or key info; use `max_tokens` cautiously; iterate on prompts to achieve natural conciseness. |
| RAG | Reduces input tokens by avoiding large context in prompt | Optimize retriever for relevance; fine-tune chunk size/overlap; ensure high-quality embeddings and search algorithms. |
| Knowledge base | Improves RAG efficiency, reducing irrelevant tokens | Curate regularly; remove outdated info; ensure good structure, metadata, and tagging for precise retrieval. |
| Tools (functions) | Avoids LLM calls for specific tasks; reduces tokens | Delegate deterministic, calculation-heavy, or external API tasks to tools. Design clear tool descriptions for the LLM. |
| Agent transfer | Enables use of cheaper models for simpler parts of tasks | Use simpler/cheaper agents for initial triage/FAQs; transfer to capable agents only when needed; decompose large prompts into smaller prompts across various agents |
For stateful conversations, rather than passing in multiple conversation transcripts as a part of
the system prompt, implement history summarization or sliding window techniques to keep context
lean. This can be particularly effective when building consumer applications and can often be
managed upon receiving a post-call webhook.
Continuously monitor your LLM usage and costs. Regularly review and refine your prompts, RAG
configurations, and tool integrations to ensure ongoing cost-effectiveness.
# Integrate your own model
> Connect an agent to your own LLM or host your own server.
Custom LLM allows you to connect your conversations to your own LLM via an external endpoint.
ElevenLabs also supports [natively integrated LLMs](/docs/conversational-ai/customization/llm)
**Custom LLMs** let you bring your own OpenAI API key or run an entirely custom LLM server.
## Overview
By default, we use our own internal credentials for popular models like OpenAI. To use a custom LLM server, it must align with the OpenAI [create chat completion](https://platform.openai.com/docs/api-reference/chat/create) request/response structure.
The following guides cover both use cases:
1. **Bring your own OpenAI key**: Use your own OpenAI API key with our platform.
2. **Custom LLM server**: Host and connect your own LLM server implementation.
You'll learn how to:
* Store your OpenAI API key in ElevenLabs
* Host a server that replicates OpenAI's [create chat completion](https://platform.openai.com/docs/api-reference/chat/create) endpoint
* Direct ElevenLabs to your custom endpoint
* Pass extra parameters to your LLM as needed
## Using your own OpenAI key
To integrate a custom OpenAI key, create a secret containing your OPENAI\_API\_KEY:
Navigate to the "Secrets" page and select "Add Secret"

Choose "Custom LLM" from the dropdown menu.

Enter the URL, your model, and the secret you created.

Set "Custom LLM extra body" to true.

## Custom LLM Server
To bring a custom LLM server, set up a compatible server endpoint using OpenAI's style, specifically targeting create\_chat\_completion.
Here's an example server implementation using FastAPI and OpenAI's Python SDK:
```python
import json
import os
import fastapi
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
import uvicorn
import logging
from dotenv import load_dotenv
from pydantic import BaseModel
from typing import List, Optional

# Load environment variables from .env file
load_dotenv()

# Retrieve API key from environment
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY not found in environment variables")

app = fastapi.FastAPI()
oai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)

class Message(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    messages: List[Message]
    model: str
    temperature: Optional[float] = 0.7
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False
    user_id: Optional[str] = None

@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
    oai_request = request.dict(exclude_none=True)
    if "user_id" in oai_request:
        oai_request["user"] = oai_request.pop("user_id")

    chat_completion_coroutine = await oai_client.chat.completions.create(**oai_request)

    async def event_stream():
        try:
            async for chunk in chat_completion_coroutine:
                # Convert the ChatCompletionChunk to a dictionary before JSON serialization
                chunk_dict = chunk.model_dump()
                yield f"data: {json.dumps(chunk_dict)}\n\n"
            yield "data: [DONE]\n\n"
        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8013)
```
Run this code or your own server code.

### Setting Up a Public URL for Your Server
To make your server accessible, create a public URL using a tunneling tool like ngrok:
```shell
ngrok http --url=<your-url>.ngrok.app 8013
```

### Configuring the ElevenLabs custom LLM
Now let's make the changes in ElevenLabs.


Point your server URL at the ngrok endpoint, set "Limit token usage" to 5000, and set "Custom LLM extra body" to true.
You can now start interacting with Conversational AI using your own LLM server.
## Optimizing for slow processing LLMs
If your custom LLM has slow processing times (perhaps due to agentic reasoning or pre-processing requirements), you can improve the conversational flow by implementing **buffer words** in your streaming responses. This technique helps maintain natural speech prosody while your LLM generates the complete response.
### Buffer words
When your LLM needs more time to process the full response, return an initial response ending with `"... "` (ellipsis followed by a space). This allows the Text to Speech system to maintain natural flow while keeping the conversation feeling dynamic.
This creates a natural pause that flows well into the content the LLM produces after its longer reasoning. The trailing space is crucial; without it, subsequent content is appended directly to the "...", which can lead to audio distortions.
### Implementation
Here's how to modify your custom LLM server to implement buffer words:
```python title="server.py"
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
    oai_request = request.dict(exclude_none=True)
    if "user_id" in oai_request:
        oai_request["user"] = oai_request.pop("user_id")

    async def event_stream():
        try:
            # Send initial buffer chunk while processing
            initial_chunk = {
                "id": "chatcmpl-buffer",
                "object": "chat.completion.chunk",
                "created": 1234567890,
                "model": request.model,
                "choices": [{
                    "delta": {"content": "Let me think about that... "},
                    "index": 0,
                    "finish_reason": None
                }]
            }
            yield f"data: {json.dumps(initial_chunk)}\n\n"

            # Process the actual LLM response
            chat_completion_coroutine = await oai_client.chat.completions.create(**oai_request)

            async for chunk in chat_completion_coroutine:
                chunk_dict = chunk.model_dump()
                yield f"data: {json.dumps(chunk_dict)}\n\n"
            yield "data: [DONE]\n\n"
        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```
```typescript title="server.ts"
app.post('/v1/chat/completions', async (req: Request, res: Response) => {
const request = req.body as ChatCompletionRequest;
const oaiRequest = { ...request };
if (oaiRequest.user_id) {
oaiRequest.user = oaiRequest.user_id;
delete oaiRequest.user_id;
}
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
try {
// Send initial buffer chunk while processing
const initialChunk = {
id: "chatcmpl-buffer",
object: "chat.completion.chunk",
created: Math.floor(Date.now() / 1000),
model: request.model,
choices: [{
delta: { content: "Let me think about that... " },
index: 0,
finish_reason: null
}]
};
res.write(`data: ${JSON.stringify(initialChunk)}\n\n`);
// Process the actual LLM response
const stream = await openai.chat.completions.create({
...oaiRequest,
stream: true
});
for await (const chunk of stream) {
res.write(`data: ${JSON.stringify(chunk)}\n\n`);
}
res.write('data: [DONE]\n\n');
res.end();
} catch (error) {
console.error('An error occurred:', error);
res.write(`data: ${JSON.stringify({ error: 'Internal error occurred!' })}\n\n`);
res.end();
}
});
```
## System tools integration
Your custom LLM can trigger [system tools](/docs/conversational-ai/customization/tools/system-tools) to control conversation flow and state. These tools are automatically included in the `tools` parameter of your chat completion requests when configured in your agent.
### How system tools work
1. **LLM Decision**: Your custom LLM decides when to call these tools based on conversation context
2. **Tool Response**: The LLM responds with function calls in standard OpenAI format
3. **Backend Processing**: ElevenLabs processes the tool calls and updates conversation state
For more information on system tools, please see [our guide](/docs/conversational-ai/customization/tools/system-tools).
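Because the tools arrive inside the standard chat completion request, a server built like the examples above only needs to accept the `tools` field and forward it to the upstream provider; the provider's streamed tool-call deltas can then be relayed back to ElevenLabs unchanged. Below is a minimal sketch of the extended request model; the `tools` and `tool_choice` fields are additions to the earlier `ChatCompletionRequest` and are assumptions about how you choose to extend your own server.
```python
from typing import Any, Dict, List, Optional
from pydantic import BaseModel

class Message(BaseModel):
    role: str
    content: Optional[str] = None  # content may be empty for some message types

class ChatCompletionRequest(BaseModel):
    messages: List[Message]
    model: str
    temperature: Optional[float] = 0.7
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False
    user_id: Optional[str] = None
    # Accept the system tools that ElevenLabs includes in OpenAI format.
    tools: Optional[List[Dict[str, Any]]] = None
    tool_choice: Optional[Any] = None
```
Since the handler already forwards `request.dict(exclude_none=True)` to the provider with `**oai_request`, the tools are passed through automatically, and the tool-call chunks streamed back by the provider reach ElevenLabs through the existing `event_stream()` loop without further changes.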
### Available system tools
**Purpose**: Automatically terminate conversations when appropriate conditions are met.
**Trigger conditions**: The LLM should call this tool when:
* The main task has been completed and the user is satisfied
* The conversation has reached a natural conclusion with mutual agreement
* The user explicitly indicates they want to end the conversation
**Parameters**:
* `reason` (string, required): The reason for ending the call
* `message` (string, optional): A farewell message to send to the user before ending the call
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "end_call",
"arguments": "{\"reason\": \"Task completed successfully\", \"message\": \"Thank you for using our service. Have a great day!\"}"
}
}
```
**Implementation**: Configure as a system tool in your agent settings. The LLM will receive detailed instructions about when to call this function.
Learn more: [End call tool](/docs/conversational-ai/customization/tools/system-tools/end-call)
**Purpose**: Automatically switch to the user's detected language during conversations.
**Trigger conditions**: The LLM should call this tool when:
* User speaks in a different language than the current conversation language
* User explicitly requests to switch languages
* Multi-language support is needed for the conversation
**Parameters**:
* `reason` (string, required): The reason for the language switch
* `language` (string, required): The language code to switch to (must be in supported languages list)
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "language_detection",
"arguments": "{\"reason\": \"User requested Spanish\", \"language\": \"es\"}"
}
}
```
**Implementation**: Configure supported languages in agent settings and add the language detection system tool. The agent will automatically switch voice and responses to match detected languages.
Learn more: [Language detection tool](/docs/conversational-ai/customization/tools/system-tools/language-detection)
**Purpose**: Transfer conversations between specialized AI agents based on user needs.
**Trigger conditions**: The LLM should call this tool when:
* User request requires specialized knowledge or different agent capabilities
* Current agent cannot adequately handle the query
* Conversation flow indicates need for different agent type
**Parameters**:
* `reason` (string, optional): The reason for the agent transfer
* `agent_number` (integer, required): Zero-indexed number of the agent to transfer to (based on configured transfer rules)
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "transfer_to_agent",
"arguments": "{\"reason\": \"User needs billing support\", \"agent_number\": 0}"
}
}
```
**Implementation**: Define transfer rules mapping conditions to specific agent IDs. Configure which agents the current agent can transfer to. Agents are referenced by zero-indexed numbers in the transfer configuration.
Learn more: [Agent transfer tool](/docs/conversational-ai/customization/tools/system-tools/agent-transfer)
**Purpose**: Seamlessly hand off conversations to human operators when AI assistance is insufficient.
**Trigger conditions**: The LLM should call this tool when:
* The issue is complex and requires human judgment
* The user explicitly requests human assistance
* The AI has reached the limits of its capabilities for the specific request
* Escalation protocols are triggered
**Parameters**:
* `reason` (string, optional): The reason for the transfer
* `transfer_number` (string, required): The phone number to transfer to (must match configured numbers)
* `client_message` (string, required): Message read to the client while waiting for transfer
* `agent_message` (string, required): Message for the human operator receiving the call
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "transfer_to_number",
"arguments": "{\"reason\": \"Complex billing issue\", \"transfer_number\": \"+15551234567\", \"client_message\": \"I'm transferring you to a billing specialist who can help with your account.\", \"agent_message\": \"Customer has a complex billing dispute about order #12345 from last month.\"}"
}
}
```
**Implementation**: Configure transfer phone numbers and conditions. Define messages for both customer and receiving human operator. Works with both Twilio and SIP trunking.
Learn more: [Transfer to human tool](/docs/conversational-ai/customization/tools/system-tools/transfer-to-human)
**Purpose**: Allow the agent to pause and wait for user input without speaking.
**Trigger conditions**: The LLM should call this tool when:
* User indicates they need a moment ("Give me a second", "Let me think")
* User requests pause in conversation flow
* Agent detects user needs time to process information
**Parameters**:
* `reason` (string, optional): Free-form reason explaining why the pause is needed
**Function call format**:
```json
{
"type": "function",
"function": {
"name": "skip_turn",
"arguments": "{\"reason\": \"User requested time to think\"}"
}
}
```
**Implementation**: No additional configuration needed. The tool simply signals the agent to remain silent until the user speaks again.
Learn more: [Skip turn tool](/docs/conversational-ai/customization/tools/system-tools/skip-turn)
### Example Request with System Tools
When system tools are configured, your custom LLM will receive requests that include the tools in the standard OpenAI format:
```json
{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant. You have access to system tools for managing conversations."
},
{
"role": "user",
"content": "I think we're done here, thanks for your help!"
}
],
"model": "your-custom-model",
"temperature": 0.7,
"max_tokens": 1000,
"stream": true,
"tools": [
{
"type": "function",
"function": {
"name": "end_call",
"description": "Call this function to end the current conversation when the main task has been completed...",
"parameters": {
"type": "object",
"properties": {
"reason": {
"type": "string",
"description": "The reason for the tool call."
},
"message": {
"type": "string",
"description": "A farewell message to send to the user along right before ending the call."
}
},
"required": ["reason"]
}
}
},
{
"type": "function",
"function": {
"name": "language_detection",
"description": "Change the conversation language when the user expresses a language preference explicitly...",
"parameters": {
"type": "object",
"properties": {
"reason": {
"type": "string",
"description": "The reason for the tool call."
},
"language": {
"type": "string",
"description": "The language to switch to. Must be one of language codes in tool description."
}
},
"required": ["reason", "language"]
}
}
},
{
"type": "function",
"function": {
"name": "skip_turn",
"description": "Skip a turn when the user explicitly indicates they need a moment to think...",
"parameters": {
"type": "object",
"properties": {
"reason": {
"type": "string",
"description": "Optional free-form reason explaining why the pause is needed."
}
},
"required": []
}
}
}
]
}
```
Your custom LLM must support function calling to use system tools. Ensure your model can generate
proper function call responses in OpenAI format.
# Additional Features
You may pass additional parameters to your custom LLM implementation.
Create an object containing your custom parameters:
```python
from elevenlabs.conversational_ai.conversation import Conversation, ConversationConfig
extra_body_for_convai = {
"UUID": "123e4567-e89b-12d3-a456-426614174000",
"parameter-1": "value-1",
"parameter-2": "value-2",
}
config = ConversationConfig(
extra_body=extra_body_for_convai,
)
```
Modify your custom LLM code to handle the additional parameters:
```python
import json
import os
import fastapi
from fastapi.responses import StreamingResponse
from fastapi import Request
from openai import AsyncOpenAI
import uvicorn
import logging
from dotenv import load_dotenv
from pydantic import BaseModel
from typing import List, Optional
# Load environment variables from .env file
load_dotenv()
# Retrieve API key from environment
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
raise ValueError("OPENAI_API_KEY not found in environment variables")
app = fastapi.FastAPI()
oai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)
class Message(BaseModel):
role: str
content: str
class ChatCompletionRequest(BaseModel):
messages: List[Message]
model: str
temperature: Optional[float] = 0.7
max_tokens: Optional[int] = None
stream: Optional[bool] = False
user_id: Optional[str] = None
elevenlabs_extra_body: Optional[dict] = None
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
oai_request = request.dict(exclude_none=True)
print(oai_request)
if "user_id" in oai_request:
oai_request["user"] = oai_request.pop("user_id")
if "elevenlabs_extra_body" in oai_request:
oai_request.pop("elevenlabs_extra_body")
chat_completion_coroutine = await oai_client.chat.completions.create(**oai_request)
async def event_stream():
try:
async for chunk in chat_completion_coroutine:
chunk_dict = chunk.model_dump()
yield f"data: {json.dumps(chunk_dict)}\n\n"
yield "data: [DONE]\n\n"
except Exception as e:
logging.error("An error occurred: %s", str(e))
yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"
return StreamingResponse(event_stream(), media_type="text/event-stream")
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8013)
```
### Example Request
With this custom message setup, your LLM will receive requests in this format:
```json
{
"messages": [
{
"role": "system",
"content": "\n "
},
{
"role": "assistant",
"content": "Hey I'm currently unavailable."
},
{
"role": "user",
"content": "Hey, who are you?"
}
],
"model": "gpt-4o",
"temperature": 0.5,
"max_tokens": 5000,
"stream": true,
"elevenlabs_extra_body": {
"UUID": "123e4567-e89b-12d3-a456-426614174000",
"parameter-1": "value-1",
"parameter-2": "value-2"
}
}
```
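The server above simply strips `elevenlabs_extra_body` before forwarding the request. If you want to act on those values instead, read them first. The sketch below shows one hypothetical use, looking up per-session context by the `UUID` field; the `SESSION_CONTEXT` store is an illustration, not part of the ElevenLabs API.
```python
# Hypothetical per-session context store, keyed by the UUID passed in the extra body.
SESSION_CONTEXT = {
    "123e4567-e89b-12d3-a456-426614174000": "The caller is a premium-tier customer.",
}

def apply_extra_body(oai_request: dict) -> dict:
    """Consume elevenlabs_extra_body before the request is forwarded upstream."""
    extra_body = oai_request.pop("elevenlabs_extra_body", None) or {}
    context = SESSION_CONTEXT.get(extra_body.get("UUID"), "")
    if context:
        # Illustrative use only: prepend the looked-up context as a system message.
        oai_request["messages"].insert(0, {"role": "system", "content": context})
    return oai_request
```
You would call `apply_extra_body(oai_request)` in `create_chat_completion` in place of the simple `pop` shown above.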
# Cloudflare Workers AI
> Connect an agent to a custom LLM on Cloudflare Workers AI.
## Overview
[Cloudflare's Workers AI platform](https://developers.cloudflare.com/workers-ai/) lets you run machine learning models, powered by serverless GPUs, on Cloudflare's global network, even on the free plan!
Workers AI comes with a curated set of [popular open-source models](https://developers.cloudflare.com/workers-ai/models/) that enable you to do tasks such as image classification, text generation, object detection and more.
## Choosing a model
To make use of the full power of ElevenLabs Conversational AI you need to use a model that supports [function calling](https://developers.cloudflare.com/workers-ai/function-calling/#what-models-support-function-calling).
When browsing the [model catalog](https://developers.cloudflare.com/workers-ai/models/), look for models with the function calling property beside it.
Cloudflare Workers AI provides access to
[DeepSeek-R1-Distill-Qwen-32B](https://developers.cloudflare.com/workers-ai/models/deepseek-r1-distill-qwen-32b/),
a model distilled from DeepSeek-R1 based on Qwen2.5. It outperforms OpenAI-o1-mini across various
benchmarks, achieving new state-of-the-art results for dense models.
## Set up DeepSeek R1 on Cloudflare Workers AI
Navigate to [dash.cloudflare.com](https://dash.cloudflare.com) and create or sign in to your account. In the navigation, select AI > Workers AI, and then click on the "Use REST API" widget.

Once you have your API key, you can try it out immediately with a curl request. Cloudflare provides an OpenAI-compatible API endpoint, which makes this very convenient. At this point, make a note of the model and the full endpoint (including the account ID), for example: `https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1/`.
```bash
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1/chat/completions \
-X POST \
-H "Authorization: Bearer {API_TOKEN}" \
-d '{
"model": "@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "How many Rs in the word Strawberry?"}
],
"stream": false
}'
```
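Since the endpoint is OpenAI-compatible, the same request can be made from Python by pointing the OpenAI client at your account's base URL. This is a sketch; replace the `{ACCOUNT_ID}` and `{API_TOKEN}` placeholders with your own values.
```python
# Assumes the openai package is installed; substitute your Cloudflare account ID and API token.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1",
    api_key="{API_TOKEN}",
)

response = client.chat.completions.create(
    model="@cf/deepseek-ai/deepseek-r1-distill-qwen-32b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How many Rs in the word Strawberry?"},
    ],
)
print(response.choices[0].message.content)
```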
Navigate to your [AI Agent](https://elevenlabs.io/app/conversational-ai), scroll down to the "Secrets" section and select "Add Secret". After adding the secret, make sure to hit "Save" to make the secret available to your agent.

Choose "Custom LLM" from the dropdown menu.

For the Server URL, specify Cloudflare's OpenAI-compatible API endpoint: `https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1/`. For the Model ID, specify `@cf/deepseek-ai/deepseek-r1-distill-qwen-32b` as discussed above, and select your API key from the dropdown menu.

Now you can go ahead and click "Test AI Agent" to chat with your custom DeepSeek R1 model.
# Groq Cloud
> Connect an agent to a custom LLM on Groq Cloud.
## Overview
[Groq Cloud](https://console.groq.com/) provides easy access to fast AI inference, giving you OpenAI-compatible API endpoints in a matter of clicks.
Use leading [Openly-available Models](https://console.groq.com/docs/models) like Llama, Mixtral, and Gemma as the brain for your ElevenLabs Conversational AI agents in a few easy steps.
## Choosing a model
To make use of the full power of ElevenLabs Conversational AI you need to use a model that supports tool use and structured outputs. Groq recommends the following Llama models for their versatility and performance:
* meta-llama/llama-4-scout-17b-16e-instruct (10M token context window; supports 12 languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese)
* llama-3.3-70b-versatile (128k context window | 32,768 max output tokens)
* llama-3.1-8b-instant (128k context window | 8,192 max output tokens)
With this in mind, it's recommended to use `meta-llama/llama-4-scout-17b-16e-instruct` for your ElevenLabs Conversational AI agent.
## Set up Llama 3.3 on Groq Cloud
Navigate to [console.groq.com/keys](https://console.groq.com/keys) and create a new API key.

Once you have your API key, you can test it by running the following curl command:
```bash
curl https://api.groq.com/openai/v1/chat/completions -s \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $GROQ_API_KEY" \
-d '{
"model": "llama-3.3-70b-versatile",
"messages": [{
"role": "user",
"content": "Hello, how are you?"
}]
}'
```
Navigate to your [AI Agent](https://elevenlabs.io/app/conversational-ai), scroll down to the "Secrets" section and select "Add Secret". After adding the secret, make sure to hit "Save" to make the secret available to your agent.

Choose "Custom LLM" from the dropdown menu.

For the Server URL, specify Groq's OpenAI-compatible API endpoint: `https://api.groq.com/openai/v1`. For the Model ID, specify `meta-llama/llama-4-scout-17b-16e-instruct` as discussed above, and select your API key from the dropdown menu.

Now you can go ahead and click "Test AI Agent" to chat with your custom Llama model.
# SambaNova Cloud
> Connect an agent to a custom LLM on SambaNova Cloud.
## Overview
[SambaNova Cloud](http://cloud.sambanova.ai?utm_source=elevenlabs\&utm_medium=external\&utm_campaign=cloud_signup) is the fastest provider of the best [open source models](https://docs.sambanova.ai/cloud/docs/get-started/supported-models), including DeepSeek R1, DeepSeek V3, Llama 4 Maverick and others. Through an
OpenAI-compatible API endpoint, you can set up your Conversational AI agent on ElevenLabs in just a few minutes.
Watch this [video](https://www.youtube.com/watch?v=46W96JcE_p8) for a walkthrough and demo of how you can configure your ElevenLabs Conversational AI agent to leverage SambaNova's blazing-fast LLMs!
## Choosing a model
To make use of the full power of ElevenLabs Conversational AI you need to use a model that supports tool use and structured outputs. SambaNova recommends the following models for their accuracy and performance:
* `DeepSeek-V3-0324` (671B model)
* `Meta-Llama-3.3-70B-Instruct`
* `Llama-4-Maverick-17B-128E-Instruct`
* `Qwen3-32B`
For up-to-date information on model-specific context windows, please refer to [this](https://docs.sambanova.ai/cloud/docs/get-started/supported-models) page.
Note that `Meta-Llama-3.3-70B-Instruct` is SambaNova's most battle-tested model. If any model is causing issues, you may report it on SambaNova's [Community page](https://community.sambanova.ai).
## Configuring your ElevenLabs agent with a SambaNova LLM
Navigate to [cloud.sambanova.ai/apis](https://cloud.sambanova.ai/apis?utm_source=elevenlabs\&utm_medium=external\&utm_campaign=cloud_signup) and create a new API key.

Once you have your API key, you can test it by running the following curl command:
```bash
curl -H "Authorization: Bearer " \
-H "Content-Type: application/json" \
-d '{
"stream": true,
"model": "DeepSeek-V3-0324",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"content": "Hello"
}
]
}' \
-X POST https://api.sambanova.ai/v1/chat/completions
```
Create a new [AI Agent](https://elevenlabs.io/app/conversational-ai/agents) or edit an existing one.
Scroll down to the "Workspace Secrets" section and select "Add Secret". Name the key `SAMBANOVA_API_KEY` and copy the value from the SambaNova Cloud dashboard. Be sure to hit "Save" to make the secret available to your agent.

Choose "Custom LLM" from the dropdown menu.

For the Server URL, specify SambaNova's OpenAI-compatible API endpoint: `https://api.sambanova.ai/v1`. For the Model ID, specify one of the model names listed above (e.g., `Meta-Llama-3.3-70B-Instruct`), and select the `SAMBANOVA_API_KEY` API key from the dropdown menu.

Set the max tokens to 1024 to restrict the agent's output for brevity. Also be sure to include an instruction in the System Prompt for the model to respond in 500 words or less.

Save your changes and click on "Test AI Agent" to chat with your SambaNova-powered agent!
# Together AI
> Connect an agent to a custom LLM on Together AI.
## Overview
[Together AI](https://www.together.ai/) provides an AI Acceleration Cloud, allowing you to train, fine-tune, and run inference on AI models blazing fast, at low cost, and at production scale.
Instantly run [200+ models](https://together.xyz/models) including DeepSeek, Llama3, Mixtral, and Stable Diffusion, optimized for peak latency, throughput, and context length.
## Choosing a model
To make use of the full power of ElevenLabs Conversational AI you need to use a model that supports tool use and structured outputs. Together AI supports function calling for [these models](https://docs.together.ai/docs/function-calling):
* meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
* meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo
* meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
* meta-llama/Llama-3.3-70B-Instruct-Turbo
* mistralai/Mixtral-8x7B-Instruct-v0.1
* mistralai/Mistral-7B-Instruct-v0.1
With this in mind, it's recommended to use at least `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo` for your ElevenLabs Conversational AI agent.
## Set up Llama 3.1 on Together AI
Navigate to [api.together.xyz/settings/api-keys](https://api.together.xyz/settings/api-keys) and create a new API key.

Once you have your API key, you can test it by running the following curl command:
```bash
curl https://api.together.xyz/v1/chat/completions -s \
-H "Content-Type: application/json" \
-H "Authorization: Bearer " \
-d '{
"model": "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
"messages": [{
"role": "user",
"content": "Hello, how are you?"
}]
}'
```
Navigate to your [AI Agent](https://elevenlabs.io/app/conversational-ai), scroll down to the "Secrets" section and select "Add Secret". After adding the secret, make sure to hit "Save" to make the secret available to your agent.

Choose "Custom LLM" from the dropdown menu.

For the Server URL, specify Together AI's OpenAI-compatible API endpoint: `https://api.together.xyz/v1`. For the Model ID, specify `meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo` as discussed above, and select your API key from the dropdown menu.

Now you can go ahead and click "Test AI Agent" to chat with your custom Llama 3.1 model.
# LLM Cascading
> Learn how Conversational AI ensures reliable LLM responses using a cascading fallback mechanism.
## Overview
Conversational AI employs an LLM cascading mechanism to enhance the reliability and resilience of its text generation capabilities. This system automatically attempts to use backup Large Language Models (LLMs) if the primary configured LLM fails, ensuring a smoother and more consistent user experience.
Failures can include API errors, timeouts, or empty responses from the LLM provider. The cascade logic handles these situations gracefully.
## How it Works
The cascading process follows a defined sequence:
1. **Preferred LLM Attempt:** The system first attempts to generate a response using the LLM selected in the agent's configuration.
2. **Backup LLM Sequence:** If the preferred LLM fails, the system automatically falls back to a predefined sequence of backup LLMs. This sequence is curated based on model performance, speed, and reliability. The current default sequence (subject to change) is:
1. Gemini 2.5 Flash
2. Gemini 2.0 Flash
3. Gemini 2.0 Flash Lite
4. Claude 3.7 Sonnet
5. Claude 3.5 Sonnet v2
6. Claude 3.5 Sonnet v1
7. GPT-4o
8. Gemini 1.5 Pro
9. Gemini 1.5 Flash
3. **HIPAA Compliance:** If the agent operates in a mode requiring strict data privacy (HIPAA compliance / zero data retention), the backup list is filtered to include only compliant models from the sequence above.
4. **Retries:** The system retries the generation process multiple times (at least 3 attempts) across the sequence of available LLMs (preferred + backups). If a backup LLM also fails, it proceeds to the next one in the sequence. If it runs out of unique backup LLMs within the retry limit, it may retry previously failed backup models.
5. **Lazy Initialization:** Backup LLM connections are initialized only when needed, optimizing resource usage.
The specific list and order of backup LLMs are managed internally by ElevenLabs and optimized for
performance and availability. The sequence listed above represents the current default but may be
updated without notice.
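The cascade runs entirely on the ElevenLabs side and requires no code from you. The snippet below is purely a conceptual sketch of the behavior described above (preferred model first, then backups, within a bounded number of attempts); the call details are placeholders.
```python
from typing import Callable, List, Optional

def generate_with_cascade(
    preferred: str,
    backups: List[str],
    call_llm: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Conceptual illustration only: try the preferred model, then backups, up to max_attempts."""
    last_error: Optional[Exception] = None
    for model in ([preferred] + backups)[:max_attempts]:
        try:
            response = call_llm(model)
            if response:  # empty responses are treated as failures
                return response
        except Exception as error:  # API errors and timeouts fall through to the next model
            last_error = error
    raise RuntimeError(f"All LLM attempts failed; last error: {last_error}")
```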
## Custom LLMs
When you configure a [Custom LLM](/docs/conversational-ai/customization/llm/custom-llm), the standard cascading logic to *other* models is bypassed. The system will attempt to use your specified Custom LLM.
If your Custom LLM fails, the system will retry the request with the *same* Custom LLM multiple times (matching the standard minimum retry count) before considering the request failed. It will not fall back to ElevenLabs-hosted models, ensuring your specific configuration is respected.
## Benefits
* **Increased Reliability:** Reduces the impact of temporary issues with a specific LLM provider.
* **Higher Availability:** Increases the likelihood of successfully generating a response even during partial LLM outages.
* **Seamless Operation:** The fallback mechanism is automatic and transparent to the end-user.
## Configuration
LLM cascading is an automatic background process. The only configuration required is selecting your **Preferred LLM** in the agent's settings. The system handles the rest to ensure robust performance.
# Widget customization
> Learn how to customize the widget appearance to match your brand, and personalize the agent's behavior from html.
**Widgets** enable instant integration of Conversational AI into any website. You can either customize your widget through the UI or through our type-safe [Conversational AI SDKs](/docs/conversational-ai/libraries) for complete control over styling and behavior. The SDK overrides take priority over UI customization.
Our widget is multimodal and able to process both text and audio.
## Modality configuration
The widget supports flexible input modes to match your use case. Configure these options in the [dashboard](https://elevenlabs.io/app/conversational-ai/dashboard) **Widget** tab under the **Interface** section.
Multimodality is fully supported in our client SDKs, see more
[here](/docs/conversational-ai/libraries/).

**Available modes:**
* **Voice only** (default): Users interact through speech only
* **Voice + text**: Users can switch between voice and text input during conversations
* **Text only mode**: Conversations start in text mode without voice capabilities when initiated with a text message
The widget defaults to voice-only mode. Enable the text input toggle to allow multimodal
interactions, or enable text-only mode support for purely text-based conversations when initiated
via text.
## Embedding the widget
Widgets currently require public agents with authentication disabled. Ensure this is disabled in
the **Advanced** tab of your agent settings.
Add this code snippet to your website's HTML. Place it in your main `index.html` file for site-wide availability:
```html title="Widget embed code"
```
For enhanced security, define allowed domains in your agent's **Allowlist** (located in the
**Security** tab). This restricts access to specified hosts only.
## Widget attributes
This basic embed code will display the widget with the default configuration defined in the agent's dashboard.
The widget supports various HTML attributes for further customization:
```html
```
```html
```
```html
```
## Runtime configuration
Two more HTML attributes can be used to customize the agent's behavior at runtime. These two features can be used together, separately, or not at all.
### Dynamic variables
Dynamic variables allow you to inject runtime values into your agent's messages, system prompts, and tools.
```html
```
All dynamic variables that the agent requires must be passed in the widget.
See more in our [dynamic variables
guide](/docs/conversational-ai/customization/personalization/dynamic-variables).
### Overrides
Overrides enable complete customization of your agent's behavior at runtime:
```html
```
Overrides can be enabled for specific fields, and are entirely optional.
See more in our [overrides
guide](/docs/conversational-ai/customization/personalization/overrides).
## Visual customization
Customize the widget's appearance, text content, language selection, and more in the [dashboard](https://elevenlabs.io/app/conversational-ai/dashboard) **Widget** tab.

Customize the widget colors and shapes to match your brand identity.

Gather user insights to improve agent performance. This can be used to fine-tune your agent's knowledge base and system prompt.

**Collection modes**
* None : Disable feedback collection entirely.
* During conversation : Support real-time feedback during conversations. Additional metadata, such as the agent response that prompted the feedback, will be collected to help further identify gaps.
* After conversation : Display a single feedback prompt after the conversation.
Send feedback programmatically via the [API](/docs/conversational-ai/api-reference/conversations/post-conversation-feedback) when using custom SDK implementations.
Configure the voice orb or provide your own avatar.

**Available options**
* Orb : Choose two gradient colors (e.g., #6DB035 & #F5CABB).
* Link/image : Use a custom avatar image.
Customize all displayed widget text elements, for example to modify button labels.

Display custom terms and conditions before the conversation.

**Available options**
* Terms content : Use Markdown to format your policy text.
* Local storage key : A key (e.g., "terms\_accepted") to avoid prompting returning users.
**Usage**
The terms are displayed to users in a modal before starting the call:

The terms can be written in Markdown, allowing you to:
* Add links to external policies
* Format text with headers and lists
* Include emphasis and styling
For more help with Markdown, see the [CommonMark help guide](https://commonmark.org/help/).
Once accepted, the status is stored locally and the user won't be prompted again on subsequent
visits.
Enable multi-language support in the widget.

To enable language selection, you must first [add additional
languages](/docs/conversational-ai/customization/language) to your agent.
Allow users to mute their audio in the widget.

To add the mute button, enable this option in the `interface` card of the agent's `widget`
settings.

Customize your public widget landing page (shareable link).

**Available options**
* Description : Provide a short paragraph explaining the purpose of the call.
***
## Advanced implementation
For more advanced customization, you should use the type-safe [Conversational AI
SDKs](/docs/conversational-ai/libraries) with a Next.js, React, or Python application.
### Client Tools
Client tools allow you to extend the functionality of the widget by adding event listeners. This enables the widget to perform actions such as:
* Redirecting the user to a specific page
* Sending an email to your support team
* Redirecting the user to an external URL
To see examples of these tools in action, start a call with the agent in the bottom right corner of this page. The [source code is available on GitHub](https://github.com/elevenlabs/elevenlabs-docs/blob/main/fern/assets/scripts/widget.js) for reference.
#### Creating a Client Tool
To create your first client tool, follow the [client tools guide](/docs/conversational-ai/customization/tools/client-tools).

#### Example Implementation
Below is an example of how to handle the `redirectToExternalURL` tool triggered by the widget in your JavaScript code:
```javascript title="index.js"
document.addEventListener('DOMContentLoaded', () => {
const widget = document.querySelector('elevenlabs-convai');
if (widget) {
// Listen for the widget's "call" event to trigger client-side tools
widget.addEventListener('elevenlabs-convai:call', (event) => {
event.detail.config.clientTools = {
// Note: To use this example, the client tool called "redirectToExternalURL" (case-sensitive) must have been created with the configuration defined above.
redirectToExternalURL: ({ url }) => {
window.open(url, '_blank', 'noopener,noreferrer');
},
};
});
}
});
```
Explore our type-safe [SDKs](/docs/conversational-ai/libraries) for React, Next.js, and Python
implementations.
# Conversation flow
> Configure how your assistant handles timeouts and interruptions during conversations.
## Overview
Conversation flow settings determine how your assistant handles periods of user silence and interruptions during speech. These settings help create more natural conversations and can be customized based on your use case.
Configure how long your assistant waits during periods of silence
Control whether users can interrupt your assistant while speaking
## Timeouts
Timeout handling determines how long your assistant will wait during periods of user silence before prompting for a response.
### Configuration
Timeout settings can be configured in the agent's **Advanced** tab under **Turn Timeout**.
The timeout duration is specified in seconds and determines how long the assistant will wait in silence before prompting the user. Turn timeouts must be between 1 and 30 seconds.
#### Example Timeout Settings

Choose an appropriate timeout duration based on your use case. Shorter timeouts create more
responsive conversations but may interrupt users who need more time to respond, leading to a less
natural conversation.
### Best practices for timeouts
* Set shorter timeouts (5-10 seconds) for casual conversations where quick back-and-forth is expected
* Use longer timeouts (10-30 seconds) when users may need more time to think or formulate complex responses
* Consider your user context - customer service may benefit from shorter timeouts while technical support may need longer ones
## Interruptions
Interruption handling determines whether users can interrupt your assistant while it's speaking.
### Configuration
Interruption settings can be configured in the agent's **Advanced** tab under **Client Events**.
To enable interruptions, make sure interruption is a selected client event.
#### Interruptions Enabled

#### Interruptions Disabled

Disable interruptions when the complete delivery of information is crucial, such as legal
disclaimers or safety instructions.
### Best practices for interruptions
* Enable interruptions for natural conversational flows where back-and-forth dialogue is expected
* Disable interruptions when message completion is critical (e.g., terms and conditions, safety information)
* Consider your use case context - customer service may benefit from interruptions while information delivery may not
## Recommended configurations
* Shorter timeouts (5-10 seconds) for responsive interactions; enable interruptions to allow customers to interject with questions
* Longer timeouts (15-30 seconds) to allow for complex responses; disable interruptions to ensure full delivery of legal information
* Longer timeouts (10-30 seconds) to allow time to think and formulate responses; enable interruptions to allow students to interject with questions
# Authentication
> Learn how to secure access to your conversational AI agents
## Overview
When building conversational AI agents, you may need to restrict access to certain agents or conversations. ElevenLabs provides multiple authentication mechanisms to ensure only authorized users can interact with your agents.
## Authentication methods
ElevenLabs offers two primary methods to secure your conversational AI agents:
Generate temporary authenticated URLs for secure client-side connections without exposing API
keys.
Restrict access to specific domains or hostnames that can connect to your agent.
## Using signed URLs
Signed URLs are the recommended approach for client-side applications. This method allows you to authenticate users without exposing your API key.
The guides below use the [JS client](https://www.npmjs.com/package/@elevenlabs/client) and
[Python SDK](https://github.com/elevenlabs/elevenlabs-python/).
### How signed URLs work
1. Your server requests a signed URL from ElevenLabs using your API key.
2. ElevenLabs generates a temporary token and returns a signed WebSocket URL.
3. Your client application uses this signed URL to establish a WebSocket connection.
4. The signed URL expires after 15 minutes.
Never expose your ElevenLabs API key client-side.
### Generate a signed URL via the API
To obtain a signed URL, make a request to the `get_signed_url` [endpoint](/docs/conversational-ai/api-reference/conversations/get-signed-url) with your agent ID:
```python
# Server-side code using the Python SDK
from elevenlabs.client import AsyncElevenLabs
async def get_signed_url():
try:
        elevenlabs = AsyncElevenLabs(api_key="your-api-key")
response = await elevenlabs.conversational_ai.conversations.get_signed_url(agent_id="your-agent-id")
return response.signed_url
except Exception as error:
print(f"Error getting signed URL: {error}")
raise
```
```javascript
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
// Server-side code using the JavaScript SDK
const elevenlabs = new ElevenLabsClient({ apiKey: 'your-api-key' });
async function getSignedUrl() {
try {
const response = await elevenlabs.conversationalAi.conversations.getSignedUrl({
agentId: 'your-agent-id',
});
return response.signed_url;
} catch (error) {
console.error('Error getting signed URL:', error);
throw error;
}
}
```
```bash
curl -X GET "https://api.elevenlabs.io/v1/convai/conversation/get-signed-url?agent_id=your-agent-id" \
-H "xi-api-key: your-api-key"
```
The curl response has the following format:
```json
{
"signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=your-agent-id&conversation_signature=your-token"
}
```
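In practice, the signed URL is usually served to your frontend by a small backend route so the API key never leaves your server. Below is a minimal sketch of such a route built around the same endpoint as the curl example; FastAPI and httpx are assumptions here, and any web framework works the same way.
```python
# Minimal sketch: expose a signed URL to your frontend without revealing the API key.
import os

import httpx
from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.get("/api/signed-url")
async def signed_url():
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.elevenlabs.io/v1/convai/conversation/get-signed-url",
            params={"agent_id": os.environ["AGENT_ID"]},
            headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        )
    if response.status_code != 200:
        raise HTTPException(status_code=502, detail="Failed to get signed URL")
    return {"signedUrl": response.json()["signed_url"]}
```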
### Connecting to your agent using a signed URL
From the client, retrieve the signed URL generated by your server and use it to connect to the WebSocket.
```python
# Client-side code using the Python SDK
from elevenlabs.conversational_ai.conversation import (
Conversation,
AudioInterface,
ClientTools,
ConversationInitiationData
)
import os
from elevenlabs.client import ElevenLabs
api_key = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(api_key=api_key)
conversation = Conversation(
client=elevenlabs,
agent_id=os.getenv("AGENT_ID"),
requires_auth=True,
audio_interface=AudioInterface(),
config=ConversationInitiationData()
)
async def start_conversation():
try:
signed_url = await get_signed_url()
conversation = Conversation(
client=elevenlabs,
url=signed_url,
)
conversation.start_session()
except Exception as error:
print(f"Failed to start conversation: {error}")
```
```javascript
// Client-side code using the JavaScript SDK
import { Conversation } from '@elevenlabs/client';
async function startConversation() {
try {
const signedUrl = await getSignedUrl();
const conversation = await Conversation.startSession({
signedUrl,
});
return conversation;
} catch (error) {
console.error('Failed to start conversation:', error);
throw error;
}
}
```
### Signed URL expiration
Signed URLs are valid for 15 minutes. The conversation session can last longer, but the conversation must be initiated within the 15 minute window.
## Using allowlists
Allowlists provide a way to restrict access to your conversational AI agents based on the origin domain. This ensures that only requests from approved domains can connect to your agent.
### How allowlists work
1. You configure a list of approved hostnames for your agent.
2. When a client attempts to connect, ElevenLabs checks if the request's origin matches an allowed hostname.
3. If the origin is on the allowlist, the connection is permitted; otherwise, it's rejected.
### Configuring allowlists
Allowlists are configured as part of your agent's authentication settings. You can specify up to 10 unique hostnames that are allowed to connect to your agent.
### Example: setting up an allowlist
```python
from elevenlabs.client import ElevenLabs
import os
from elevenlabs.types import *
api_key = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(api_key=api_key)
agent = elevenlabs.conversational_ai.agents.create(
conversation_config=ConversationalConfig(
agent=AgentConfig(
first_message="Hi. I'm an authenticated agent.",
)
),
platform_settings=AgentPlatformSettingsRequestModel(
auth=AuthSettings(
enable_auth=False,
allowlist=[
AllowlistItem(hostname="example.com"),
AllowlistItem(hostname="app.example.com"),
AllowlistItem(hostname="localhost:3000")
]
)
)
)
```
```javascript
async function createAuthenticatedAgent(client) {
try {
const agent = await elevenlabs.conversationalAi.agents.create({
conversationConfig: {
agent: {
firstMessage: "Hi. I'm an authenticated agent.",
},
},
platformSettings: {
auth: {
enableAuth: false,
allowlist: [
{ hostname: 'example.com' },
{ hostname: 'app.example.com' },
{ hostname: 'localhost:3000' },
],
},
},
});
return agent;
} catch (error) {
console.error('Error creating agent:', error);
throw error;
}
}
```
## Combining authentication methods
For maximum security, you can combine both authentication methods:
1. Use `enable_auth` to require signed URLs.
2. Configure an allowlist to restrict which domains can request those signed URLs.
This creates a two-layer authentication system where clients must:
* Connect from an approved domain
* Possess a valid signed URL
```python
from elevenlabs.client import ElevenLabs
import os
from elevenlabs.types import *
api_key = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(api_key=api_key)
agent = elevenlabs.conversational_ai.agents.create(
conversation_config=ConversationalConfig(
agent=AgentConfig(
first_message="Hi. I'm an authenticated agent that can only be called from certain domains.",
)
),
platform_settings=AgentPlatformSettingsRequestModel(
auth=AuthSettings(
enable_auth=True,
allowlist=[
AllowlistItem(hostname="example.com"),
AllowlistItem(hostname="app.example.com"),
AllowlistItem(hostname="localhost:3000")
]
)
    )
)
```
```javascript
async function createAuthenticatedAgent(elevenlabs) {
try {
const agent = await elevenlabs.conversationalAi.agents.create({
conversationConfig: {
agent: {
firstMessage: "Hi. I'm an authenticated agent.",
},
},
platformSettings: {
auth: {
enableAuth: true,
allowlist: [
{ hostname: 'example.com' },
{ hostname: 'app.example.com' },
{ hostname: 'localhost:3000' },
],
},
},
});
return agent;
} catch (error) {
console.error('Error creating agent:', error);
throw error;
}
}
```
## FAQ
This is possible but we recommend generating a new signed URL for each user session.
If the signed URL expires (after 15 minutes), any WebSocket connection created with that signed
URL will **not** be closed, but trying to create a new connection with that signed URL will
fail.
The signed URL mechanism only verifies that the request came from an authorized source. To
restrict access to specific users, implement user authentication in your application before
requesting the signed URL.
There is no specific limit on the number of signed URLs you can generate.
Allowlists perform exact matching on hostnames. If you want to allow both a domain and its
subdomains, you need to add each one separately (e.g., "example.com" and "app.example.com").
No, you can use either signed URLs or allowlists independently based on your security
requirements. For highest security, we recommend using both.
Beyond signed URLs and allowlists, consider implementing:
* User authentication before requesting signed URLs
* Rate limiting on API requests
* Usage monitoring for suspicious patterns
* Proper error handling for auth failures
# Agent Analysis
> Analyze conversation quality and extract structured data from customer interactions.
Agent analysis provides powerful tools to systematically evaluate conversation performance and extract valuable information from customer interactions. These LLM-powered features help you measure agent effectiveness and gather actionable business insights.
Define custom criteria to assess conversation quality, goal achievement, and customer
satisfaction.
Extract structured information from conversations such as contact details and business data.
## Overview
The Conversational AI platform provides two complementary analysis capabilities:
* **Success Evaluation**: Define custom metrics to assess conversation quality, goal achievement, and customer satisfaction
* **Data Collection**: Extract specific data points from conversations such as contact information, issue details, or any structured information
Both features process conversation transcripts using advanced language models to provide actionable insights that improve agent performance and business outcomes.
## Key Benefits
Track conversation success rates, customer satisfaction, and goal completion across all interactions to identify improvement opportunities.
Capture valuable business information without manual processing, reducing operational overhead and
improving data accuracy.
Ensure agents follow required procedures and maintain consistent service quality through
systematic evaluation.
Gather structured insights about customer preferences, behavior patterns, and interaction outcomes for strategic decision-making.
## Integration with Platform Features
Agent analysis integrates seamlessly with other Conversational AI capabilities:
* **[Post-call Webhooks](/docs/conversational-ai/workflows/post-call-webhooks)**: Receive evaluation results and extracted data via webhooks for integration with external systems
* **[Analytics Dashboard](/docs/conversational-ai/convai-dashboard)**: View aggregated performance metrics and trends across all conversations
* **[Agent Transfer](/docs/conversational-ai/customization/tools/agent-transfer)**: Use evaluation criteria to determine when conversations should be escalated
## Getting Started
Determine whether you need success evaluation, data collection, or both based on your business objectives.
Set up [Success
Evaluation](/docs/conversational-ai/customization/agent-analysis/success-evaluation) to measure
conversation quality and goal achievement.
Configure [Data Collection](/docs/conversational-ai/customization/agent-analysis/data-collection)
to capture structured information from conversations.
Review results regularly and refine your criteria and extraction rules based on performance data.
# Success Evaluation
> Define custom criteria to assess conversation quality, goal achievement, and customer satisfaction.
Success evaluation allows you to define custom goals and success metrics for your conversations. Each criterion is evaluated against the conversation transcript and returns a result of `success`, `failure`, or `unknown`, along with a detailed rationale.
## Overview
Success evaluation uses LLM-powered analysis to assess conversation quality against your specific business objectives. This enables systematic performance measurement and quality assurance across all customer interactions.
### How It Works
Each evaluation criterion analyzes the conversation transcript using a custom prompt and returns:
* **Result**: `success`, `failure`, or `unknown`
* **Rationale**: Detailed explanation of why the result was chosen
### Types of Evaluation Criteria
**Goal prompt criteria** pass the conversation transcript along with a custom prompt to an LLM to verify if a specific goal was met. This is the most flexible type of evaluation and can be used for complex business logic.
**Examples:**
* Customer satisfaction assessment
* Issue resolution verification
* Compliance checking
* Custom business rule validation
## Configuration
Navigate to your agent's dashboard and select the **Analysis** tab to configure evaluation criteria.

Click **Add criteria** to create a new evaluation criterion.
Define your criterion with:
* **Identifier**: A unique name for the criterion (e.g., `user_was_not_upset`)
* **Description**: Detailed prompt describing what should be evaluated

After conversations complete, evaluation results appear in your conversation history dashboard. Each conversation shows the evaluation outcome and rationale for every configured criterion.

## Best Practices
* Be specific about what constitutes success vs. failure
* Include edge cases and examples in your prompt
* Use clear, measurable criteria when possible
* Test your prompts with various conversation scenarios
* **Customer satisfaction**: "Mark as successful if the customer expresses satisfaction or their issue was resolved"
* **Goal completion**: "Mark as successful if the customer completed the requested action (booking, purchase, etc.)"
* **Compliance**: "Mark as successful if the agent followed all required compliance procedures"
* **Issue resolution**: "Mark as successful if the customer's technical issue was resolved during the call"
The `unknown` result is returned when the LLM cannot determine success or failure from the transcript. This often happens with:
* Incomplete conversations
* Ambiguous customer responses
* Missing information in the transcript
Monitor `unknown` results to identify areas where your criteria prompts may need refinement.
## Use Cases
Measure issue resolution rates, customer satisfaction, and support quality metrics to improve
service delivery.
Track goal achievement, objection handling, and conversion rates across sales conversations.
Ensure agents follow required procedures and capture necessary consent or disclosure
confirmations.
Identify coaching opportunities and measure improvement in agent performance over time.
## Troubleshooting
* Review your prompt for clarity and specificity
* Test with sample conversations to validate logic
* Consider edge cases in your evaluation criteria
* Check if the transcript contains sufficient information for evaluation
* Ensure your prompts are specific about what information to look for
* Consider if conversations contain enough context for evaluation
* Review transcript quality and completeness
* Adjust criteria to handle common edge cases
* Each evaluation criterion adds processing time to conversation analysis
* Complex prompts may take longer to evaluate
* Consider the trade-off between comprehensive analysis and response time
* Monitor your usage to optimize for your specific needs
Success evaluation results are available through [Post-call
Webhooks](/docs/conversational-ai/workflows/post-call-webhooks) for integration with external
systems and analytics platforms.
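If you consume results through post-call webhooks, a small receiver can log or store the evaluation outcomes as they arrive. The sketch below is an assumption-heavy illustration: the payload field names (such as `data.analysis.evaluation_criteria_results`) and the FastAPI receiver are placeholders, and webhook signature verification is omitted; consult the post-call webhook reference for the exact schema.
```python
# Illustrative post-call webhook receiver; field names under "analysis" are assumptions,
# and signature verification (recommended) is omitted for brevity.
from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/webhooks/elevenlabs/post-call")
async def post_call(request: Request):
    payload = await request.json()
    analysis = payload.get("data", {}).get("analysis", {})
    for criterion_id, result in analysis.get("evaluation_criteria_results", {}).items():
        # Each result is expected to include the success/failure/unknown value and a rationale.
        print(criterion_id, result)
    return {"ok": True}
```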
# Data Collection
> Extract structured information from conversations such as contact details and business data.
Data collection automatically extracts structured information from conversation transcripts using LLM-powered analysis. This enables you to capture valuable data points without manual processing, improving operational efficiency and data accuracy.
## Overview
Data collection analyzes conversation transcripts to identify and extract specific information you define. The extracted data is structured according to your specifications and made available for downstream processing and analysis.
### Supported Data Types
Data collection supports four data types to handle various information formats:
* **String**: Text-based information (names, emails, addresses)
* **Boolean**: True/false values (agreement status, eligibility)
* **Integer**: Whole numbers (quantity, age, ratings)
* **Number**: Decimal numbers (prices, percentages, measurements)
## Configuration
In the **Analysis** tab of your agent settings, navigate to the **Data collection** section.

Click **Add item** to create a new data extraction rule.
Configure each item with:
* **Identifier**: Unique name for the data field (e.g., `email`, `customer_rating`)
* **Data type**: Select from string, boolean, integer, or number
* **Description**: Detailed instructions on how to extract the data from the transcript
The description field is passed to the LLM and should be as specific as possible about what to extract and how to format it.
Extracted data appears in your conversation history, allowing you to review what information was captured from each interaction.

## Best Practices
* Be explicit about the expected format (e.g., "email address in the format [user@domain.com](mailto:user@domain.com)")
* Specify what to do when information is missing or unclear
* Include examples of valid and invalid data
* Mention any validation requirements
**Contact Information:**
* `email`: "Extract the customer's email address in standard format ([user@domain.com](mailto:user@domain.com))"
* `phone_number`: "Extract the customer's phone number including area code"
* `full_name`: "Extract the customer's complete name as provided"
**Business Data:**
* `issue_category`: "Classify the customer's issue into one of: technical, billing, account, or general"
* `satisfaction_rating`: "Extract any numerical satisfaction rating given by the customer (1-10 scale)"
* `order_number`: "Extract any order or reference number mentioned by the customer"
**Behavioral Data:**
* `was_angry`: "Determine if the customer expressed anger or frustration during the call"
* `requested_callback`: "Determine if the customer requested a callback or follow-up"
When the requested data cannot be found or is ambiguous in the transcript, the extraction will return null or empty values. Consider:
* Using conditional logic in your applications to handle missing data
* Creating fallback criteria for incomplete extractions
* Training agents to consistently gather required information
## Data Type Guidelines
Use for text-based information that doesn't fit other types.
**Examples:**
* Customer names
* Email addresses
* Product categories
* Issue descriptions
**Best practices:**
* Specify expected format when relevant
* Include validation requirements
* Consider standardization needs
Use for yes/no, true/false determinations.
**Examples:**
* Customer agreement status
* Eligibility verification
* Feature requests
* Complaint indicators
**Best practices:**
* Clearly define what constitutes true vs. false
* Handle ambiguous responses
* Consider default values for unclear cases
Use for whole number values.
**Examples:**
* Customer age
* Product quantities
* Rating scores
* Number of issues
**Best practices:**
* Specify valid ranges when applicable
* Handle non-numeric responses
* Consider rounding rules if needed
Use for decimal or floating-point values.
**Examples:**
* Monetary amounts
* Percentages
* Measurements
* Calculated scores
**Best practices:**
* Specify precision requirements
* Include currency or unit context
* Handle different number formats
## Use Cases
Extract contact information, qualification criteria, and interest levels from sales conversations.
Gather structured data about customer preferences, feedback, and behavior patterns for strategic
insights.
Capture issue categories, resolution details, and satisfaction scores for operational
improvements.
Extract required disclosures, consents, and regulatory information for audit trails.
## Troubleshooting
* Verify the data exists in the conversation transcript
* Check if your extraction prompt is specific enough
* Ensure the data type matches the expected format
* Consider if the information was communicated clearly during the conversation
* Review extraction prompts for format specifications
* Add validation requirements to prompts
* Consider post-processing for data standardization
* Test with various conversation scenarios
* Each data collection rule adds processing time
* Complex extraction logic may take longer to evaluate
* Monitor extraction accuracy vs. speed requirements
* Optimize prompts for efficiency when possible
Extracted data is available through [Post-call
Webhooks](/docs/conversational-ai/workflows/post-call-webhooks) for integration with CRM systems,
databases, and analytics platforms.
# Privacy
> Manage how your agent handles data storage and privacy.
Privacy settings give you fine-grained control over your data. You can manage both call audio recordings and conversation data retention to meet your compliance and privacy requirements.
Configure how long conversation transcripts and audio recordings are retained.
Control whether call audio recordings are retained.
Enable per-agent zero retention for enhanced data privacy.
## Retention
Retention settings control the duration for which conversation transcripts and audio recordings are stored.
For detailed instructions, see our [Retention](/docs/conversational-ai/customization/privacy/retention) page.
## Audio Saving
Audio Saving settings determine if call audio recordings are stored. Adjust this feature based on your privacy and data retention needs.
For detailed instructions, see our [Audio Saving](/docs/conversational-ai/customization/privacy/audio-saving) page.
## Zero Retention Mode (Per Agent)
For granular control, Zero Retention Mode can be enabled for individual agents, ensuring no PII is logged or stored for their calls.
For detailed instructions, see our [Zero Retention Mode](/docs/conversational-ai/customization/privacy/zero-retention-mode) page.
## Recommended Privacy Configurations
* **Maximum privacy**: Disable audio saving, enable Zero Retention Mode for agents where possible, and set retention to 0 days for immediate deletion of data.
* **Balanced**: Enable audio saving for critical interactions while setting a moderate retention period. Consider ZRM for sensitive agents.
* **Compliance-focused**: Enable audio saving and configure retention settings to adhere to regulatory requirements such as GDPR and HIPAA. For HIPAA compliance, we recommend enabling audio saving and setting a retention period of at least 6 years. For GDPR, retention periods should align with your data processing purposes. Utilize ZRM for agents handling highly sensitive data if not using global ZRM.
# Retention
> Control how long your agent retains conversation history and recordings.
**Retention** settings allow you to configure how long your Conversational AI agent stores conversation transcripts and audio recordings. These settings help you comply with data privacy regulations.
## Overview
By default, ElevenLabs retains conversation data for 2 years. You can modify this period to:
* Any number of days (e.g., 30, 90, 365)
* Unlimited retention by setting the value to -1
* Immediate deletion by setting the value to 0
The retention settings apply separately to:
* **Conversation transcripts**: Text records of all interactions
* **Audio recordings**: Voice recordings from both the user and agent
For GDPR compliance, we recommend setting retention periods that align with your data processing
purposes. For HIPAA compliance, retain records for a minimum of 6 years.
## Modifying retention settings
### Prerequisites
* An [ElevenLabs account](https://elevenlabs.io)
* A configured ElevenLabs Conversational Agent ([create one here](/docs/conversational-ai/quickstart))
Follow these steps to update your retention settings:
Navigate to your agent's settings and select the "Advanced" tab. The retention settings are located in the "Data Retention" section.

1. Enter the desired retention period in days
2. Choose whether to apply changes to existing data
3. Click "Save" to confirm changes

When modifying retention settings, you'll have the option to apply the new retention period to existing conversation data or only to new conversations going forward.
Reducing the retention period may result in immediate deletion of data older than the new
retention period if you choose to apply changes to existing data.
# Audio saving
> Control whether call audio recordings are retained.
**Audio Saving** settings allow you to choose whether recordings of your calls are retained in your call history, on a per-agent basis. This control gives you flexibility over data storage and privacy.
## Overview
By default, audio recordings are enabled. You can modify this setting to:
* **Enable audio saving**: Save call audio for later review.
* **Disable audio saving**: Omit audio recordings from your call history.
Disabling audio saving enhances privacy but limits the ability to review calls. However,
transcripts can still be viewed. To modify transcript retention settings, please refer to the
[retention](/docs/conversational-ai/customization/privacy/retention) documentation.
## Modifying Audio Saving Settings
### Prerequisites
* A configured [ElevenLabs Conversational Agent](/docs/conversational-ai/quickstart)
Follow these steps to update your audio saving preference:
Find your agent in the Conversational AI agents
[page](https://elevenlabs.io/app/conversational-ai/agents) and select the "Advanced" tab. The
audio saving control is located in the "Privacy Settings" section.

Toggle the control to enable or disable audio saving and click save to confirm your selection.
When audio saving is enabled, calls in the call history allow you to review the audio.

When audio saving is disabled, calls in the call history do not include audio.

Disabling audio saving will prevent new call audio recordings from being stored. Existing
recordings will remain until deleted via [retention
settings](/docs/conversational-ai/customization/privacy/retention).
# Zero Retention Mode (per-agent)
> Learn how to enable Zero Retention Mode for individual agents to enhance data privacy.
## Overview
Zero Retention Mode (ZRM) enhances data privacy by ensuring that no Personally Identifiable Information (PII) is logged during or stored after a call. This feature can be enabled on a per-agent basis for workspaces that do not have ZRM enforced globally. For workspaces with global ZRM enabled, all agents will automatically operate in Zero Retention Mode.
When ZRM is active for an agent:
* No call recordings will be stored.
* No transcripts or call metadata containing PII will be logged or stored by our systems post-call.
For more information about setting your workspace to have Zero Retention Mode across all eligible ElevenLabs products, see our [Zero Retention Mode](/docs/resources/zero-retention-mode) documentation.
For workspaces where Zero Retention Mode is enforced globally, this setting will be automatically
enabled for all agents and cannot be disabled on a per-agent basis.
To retrieve information about calls made with ZRM-enabled agents, you must use [post-call webhooks](/docs/conversational-ai/workflows/post-call-webhooks).
Enabling Zero Retention Mode may impact ElevenLabs' ability to debug call-related issues for the
specific agent, as limited logs or call data will be available for review.
## How to Enable ZRM per Agent
For workspaces not operating under global Zero Retention Mode, you can enable ZRM for individual agents:
1. Navigate to your agent's settings.
2. Go to the **Privacy** settings block.
3. Select the **Advanced** tab.
4. Toggle the "Zero Retention Mode" option to enabled.
# Model Context Protocol
> Connect your ElevenLabs conversational agents to external tools and data sources using the Model Context Protocol.
You are responsible for the security, compliance, and behavior of any third-party MCP server you
integrate with your ElevenLabs conversational agents. ElevenLabs provides the platform for
integration but does not manage, endorse, or secure external MCP servers.
## Overview
The [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) is an open standard that defines how applications provide context to Large Language Models (LLMs). Think of MCP as a universal connector that enables AI models to seamlessly interact with diverse data sources and tools. By integrating servers that implement MCP, you can significantly extend the capabilities of your ElevenLabs conversational agents.
MCP support is not currently available for users on Zero Retention Mode or those requiring HIPAA
compliance.
ElevenLabs allows you to connect your conversational agents to external MCP servers. This enables your agents to:
* Access and process information from various data sources via the MCP server
* Utilize specialized tools and functionalities exposed by the MCP server
* Create more dynamic, knowledgeable, and interactive conversational experiences
## Getting started
ElevenLabs currently supports SSE transport MCP servers only. Streamable HTTP servers will be
supported soon.
1. Retrieve the URL of your MCP server. In this example, we'll use [Zapier MCP](https://zapier.com/mcp), which lets you connect Conversational AI to hundreds of tools and services.
2. Navigate to the [MCP server integrations dashboard](https://elevenlabs.io/app/conversational-ai/integrations) and click "Add Custom MCP Server".

3. Configure the MCP server with the following details:
* **Name**: The name of the MCP server (e.g., "Zapier MCP Server")
* **Description**: A description of what the MCP server can do (e.g., "An MCP server with access to Zapier's tools and services")
* **Server URL**: The URL of the MCP server. In some cases this contains a secret key; treat it like a password and store it securely as a workspace secret.
* **Secret Token (Optional)**: If the MCP server requires a secret token (Authorization header), enter it here.
* **HTTP Headers (Optional)**: If the MCP server requires additional HTTP headers, enter them here.
4. Click "Add Integration" to save the integration and test the connection to list available tools.

5. The MCP server is now available to add to your agents. Note that MCP servers can currently only be added to agents with `Security > Enable authentication` enabled.

## Tool approval modes
ElevenLabs provides flexible approval controls to manage how agents request permission to use tools from MCP servers. You can configure approval settings at both the MCP server level and individual tool level for maximum security control.

### Available approval modes
* **Always Ask (Recommended)**: Maximum security. The agent will request your permission before each tool use.
* **Fine-Grained Tool Approval**: Pre-select which tools can run automatically, which require approval, and which are disabled.
* **No Approval**: The agent can use any tool without approval.
### Fine-grained tool control
The Fine-Grained Tool Approval mode allows you to configure individual tools with different approval requirements, giving you precise control over which tools can run automatically and which require explicit permission.

For each tool, you can set:
* **Auto-approved**: Tool runs automatically without requiring permission
* **Requires approval**: Tool requires explicit permission before execution
* **Disabled**: Tool is completely disabled and cannot be used
Use Fine-Grained Tool Approval to allow low-risk read-only tools to run automatically while
requiring approval for tools that modify data or perform sensitive operations.
## Key considerations for ElevenLabs integration
* **External servers**: You are responsible for selecting the external MCP servers you wish to integrate. ElevenLabs provides the means to connect to them.
* **Supported features**: ElevenLabs currently supports MCP servers that communicate over SSE (Server-Sent Events) for real-time interactions.
* **Dynamic tools**: The tools and capabilities available from an integrated MCP server are defined by that external server and can change if the server's configuration is updated.
## Security and disclaimer
Integrating external MCP servers can expose your agents and data to third-party services. It is crucial to understand the security implications.
By enabling MCP server integrations, you acknowledge that this may involve data sharing with
third-party services not controlled by ElevenLabs. This may introduce additional security risks.
Please ensure you fully understand the implications, vet the security of any MCP server you
integrate, and review our [MCP Integration Security
Guidelines](/docs/conversational-ai/customization/mcp/security) before proceeding.
Refer to our [MCP Integration Security Guidelines](/docs/conversational-ai/customization/mcp/security) for detailed best practices.
## Finding or building MCP servers
* Utilize publicly available MCP servers from trusted providers
* Develop your own MCP server to expose your proprietary data or tools
* Explore the Model Context Protocol community and resources for examples and server implementations
### Resources
* [Anthropic's MCP server examples](https://docs.anthropic.com/en/docs/agents-and-tools/remote-mcp-servers#remote-mcp-server-examples) - A list of example servers by Anthropic
* [Awesome Remote MCP Servers](https://github.com/jaw9c/awesome-remote-mcp-servers) - A curated, open-source list of remote MCP servers
* [Remote MCP Server Directory](https://remote-mcp.com/) - A searchable list of Remote MCP servers
# MCP integration security
> Tips for securely integrating third-party Model Context Protocol servers with your ElevenLabs conversational agents.
You are responsible for the security, compliance, and behavior of any third-party MCP server you
integrate with your ElevenLabs conversational agents. ElevenLabs provides the platform for
integration but does not manage, endorse, or secure external MCP servers.
## Overview
Integrating external servers via the Model Context Protocol (MCP) can greatly enhance your ElevenLabs conversational agents. However, this also means connecting to systems outside of ElevenLabs' direct control, which introduces important security considerations. As a user, you are responsible for the security and trustworthiness of any third-party MCP server you choose to integrate.
This guide outlines key security practices to consider when using MCP server integrations within ElevenLabs.
## Tool approval controls
ElevenLabs provides built-in security controls through tool approval modes that help you manage the security risks associated with MCP tool usage. These controls allow you to balance functionality with security based on your specific needs.

### Approval mode options
* **Always Ask (Recommended)**: Provides maximum security by requiring explicit approval for every tool execution. This mode ensures you maintain full control over all MCP tool usage.
* **Fine-Grained Tool Approval**: Allows you to configure approval requirements on a per-tool basis, enabling automatic execution of trusted tools while requiring approval for sensitive operations.
* **No Approval**: Permits unrestricted tool usage without approval prompts. Only use this mode with thoroughly vetted and highly trusted MCP servers.
### Fine-grained security controls
Fine-Grained Tool Approval mode provides the most flexible security configuration, allowing you to classify each tool based on its risk profile:

* **Auto-approved tools**: Suitable for low-risk, read-only operations or tools you completely trust
* **Approval-required tools**: For tools that modify data, access sensitive information, or perform potentially risky operations
* **Disabled tools**: Completely block tools that are unnecessary or pose security risks
Even with approval controls in place, carefully evaluate the trustworthiness of MCP servers and
understand what each tool can access or modify before integration.
## Security tips
### 1. Vet your MCP servers
* **Trusted Sources**: Only integrate MCP servers from sources you trust and have verified. Understand who operates the server and their security posture.
* **Understand Capabilities**: Before integrating, thoroughly review the tools and data resources the MCP server exposes. Be aware of what actions its tools can perform (e.g., accessing files, calling external APIs, modifying data). The MCP `destructiveHint` and `readOnlyHint` annotations can provide clues but should not be solely relied upon for security decisions.
* **Review Server Security**: If possible, review the security practices of the MCP server provider. For MCP servers you develop, ensure you follow general server security best practices and the MCP-specific security guidelines.
### 2. Data sharing and privacy
* **Data Flow**: Be aware that when your agent uses an integrated MCP server, data from the conversation (which may include user inputs) will be sent to that external server.
* **Sensitive Information**: Exercise caution when allowing agents to send Personally Identifiable Information (PII) or other sensitive data to an MCP server. Ensure the server handles such data securely and in compliance with relevant privacy regulations.
* **Purpose Limitation**: Configure your agents and prompts to only share the necessary information with MCP server tools to perform their tasks.
### 3. Credential and connection security
* **Secure Storage**: If an MCP server requires API keys or other secrets for authentication, use any available secret management features within the ElevenLabs platform to store these credentials securely. Avoid hardcoding secrets.
* **HTTPS**: Ensure connections to MCP servers are made over HTTPS to encrypt data in transit.
* **Network Access**: If the MCP server is on a private network, ensure appropriate firewall rules and network ACLs are in place.
### 4. Understand code execution risks
* **Remote Execution**: Tools exposed by an MCP server execute code on that server. While this is the basis of their functionality, it's a critical security consideration. Malicious or poorly secured tools could pose a risk.
* **Input Validation**: Although the MCP server is responsible for validating inputs to its tools, be mindful of the data your agent might send. The LLM should be guided to use tools as intended.
### 5. Add guardrails
* **Prompt Injections**: Connecting to untrusted external MCP servers exposes your agent to prompt injection attacks. Add thorough guardrails to your system prompt to reduce the risk of a successful attack.
* **Tool Approval Configuration**: Use the appropriate approval mode for your security requirements. Start with "Always Ask" for new integrations and only move to less restrictive modes after thorough testing and trust establishment.
### 6. Monitor and review
* **Logging (Server-Side)**: If you control the MCP server, implement comprehensive logging of tool invocations and data access.
* **Regular Review**: Periodically review your integrated MCP servers. Check if their security posture has changed or if new tools have been added that require re-assessment.
* **Approval Patterns**: Monitor tool approval requests to identify unusual patterns that might indicate security issues or misuse.
## Disclaimer
By enabling MCP server integrations, you acknowledge that this may involve data sharing with
third-party services not controlled by ElevenLabs. This may introduce additional security risks.
Please ensure you fully understand the implications, vet the security of any MCP server you
integrate, and adhere to these security guidelines before proceeding.
For general information on the Model Context Protocol, refer to official MCP documentation and community resources.
# SIP trunking
> Connect your existing phone system with ElevenLabs conversational AI agents using SIP trunking
## Overview
SIP (Session Initiation Protocol) trunking allows you to connect your existing telephony infrastructure directly to ElevenLabs conversational AI agents.
This integration enables enterprise customers to use their existing phone systems while leveraging ElevenLabs' advanced voice AI capabilities.
With SIP trunking, you can:
* Connect your Private Branch Exchange (PBX) or SIP-enabled phone system to ElevenLabs' voice AI platform
* Route calls to AI agents without changing your existing phone infrastructure
* Handle both inbound and outbound calls
* Leverage encrypted TLS transport and media encryption for enhanced security
## How SIP trunking works
SIP trunking establishes a direct connection between your telephony infrastructure and the ElevenLabs platform:
1. **Inbound calls**: Calls from your SIP trunk are routed to the ElevenLabs platform using your configured SIP INVITE address.
2. **Outbound calls**: Calls initiated by ElevenLabs are routed to your SIP trunk using your configured hostname, enabling your agents to make outgoing calls.
3. **Authentication**: Connection security for the signaling is maintained through either digest authentication (username/password) or Access Control List (ACL) authentication based on the signaling source IP.
4. **Signaling and Media**: The initial call setup (signaling) supports multiple transport protocols including TLS for encrypted communication. Once the call is established, the actual audio data (RTP stream) can be encrypted based on your media encryption settings.
## Requirements
Before setting up SIP trunking, ensure you have:
1. A SIP-compatible PBX or telephony system
2. Phone numbers that you want to connect to ElevenLabs
3. Administrator access to your SIP trunk configuration
4. Appropriate firewall settings to allow SIP traffic
5. **TLS Support**: For enhanced security, ensure your SIP trunk provider supports TLS transport
6. **Audio codec compatibility**: Your system must support **48kHz audio** or be capable of resampling audio on your end, as ElevenLabs' SIP deployment outputs and receives audio at this sample rate. This is independent of any audio format configured on the agent for direct websocket connections.
## Setting up SIP trunking
Go to the [Phone Numbers section](https://elevenlabs.io/app/conversational-ai/phone-numbers) in the ElevenLabs Conversational AI dashboard.
Click on "Import a phone number from SIP trunk" button to open the configuration dialog.
Complete the basic configuration with the following information:
* **Label**: A descriptive name for the phone number
* **Phone Number**: The E.164 formatted phone number to connect (e.g., +15551234567)
Configure the transport protocol and media encryption settings for enhanced security:
* **Transport Type**: Select the transport protocol for SIP signaling:
* **TCP**: Standard TCP transport
* **TLS**: Encrypted TLS transport for enhanced security
* **Media Encryption**: Configure encryption for RTP media streams:
* **Disabled**: No media encryption
* **Allowed**: Permits encrypted media streams
* **Required**: Enforces encrypted media streams
**Security Best Practice**: Use TLS transport with Required media encryption for maximum security. This ensures both signaling and media are encrypted end-to-end.
Configure where ElevenLabs should send calls for your phone number:
* **Address**: Hostname or IP address where the SIP INVITE is sent (e.g., `sip.telnyx.com`). This should be a hostname or IP address only, not a full SIP URI.
* **Transport Type**: Select the transport protocol for SIP signaling:
* **TCP**: Standard TCP transport
* **TLS**: Encrypted TLS transport for enhanced security
* **Media Encryption**: Configure encryption for RTP media streams:
* **Disabled**: No media encryption
* **Allowed**: Permits encrypted media streams
* **Required**: Enforces encrypted media streams
**Security Best Practice**: Use TLS transport with Required media encryption for maximum security. This ensures both signaling and media are encrypted end-to-end.
The **Address** field specifies where ElevenLabs will send outbound calls from your AI agents. Enter only the hostname or IP address without the `sip:` protocol prefix.
If your SIP trunk provider requires specific headers for call routing or identification:
* Click "Add Header" to add custom SIP headers
* Enter the header name and value as required by your provider
* You can add multiple headers as needed
Custom headers are included with all outbound calls and can be used for:
* Call routing and identification
* Billing and tracking purposes
* Provider-specific requirements
Provide digest authentication credentials if required by your SIP trunk provider:
* **SIP Trunk Username**: Username for SIP digest authentication
* **SIP Trunk Password**: Password for SIP digest authentication
If left empty, Access Control List (ACL) authentication will be used, which requires you to allowlist ElevenLabs IP addresses in your provider's settings.
**Authentication Methods**:
* **Digest Authentication**: Uses username/password credentials for secure authentication (recommended)
* **ACL Authentication**: Uses IP address allowlisting for access control
**Digest Authentication is strongly recommended** as it provides better security without relying on IP allowlisting, which can be complex to manage with dynamic IP addresses.
Click "Import" to finalize the configuration.
## Assigning Agents to Phone Numbers
After importing your SIP trunk phone number, you can assign it to a conversational AI agent:
1. Go to the Phone Numbers section in the Conversational AI dashboard
2. Select your imported SIP trunk phone number
3. Click "Assign Agent"
4. Select the agent you want to handle calls to this number
## Troubleshooting
If you're experiencing connection problems:
1. Verify your SIP trunk configuration on both the ElevenLabs side and your provider side
2. Check that your firewall allows SIP signaling traffic on the configured transport protocol and port (5060 for UDP/TCP, 5061 for TLS)
3. Confirm that your address hostname is correctly formatted and accessible
4. Test with and without digest authentication credentials
5. If using TLS transport, ensure your provider's TLS certificates are valid and properly configured
6. Try different transport types (Auto, UDP, TCP) to isolate TLS-specific issues
If calls are failing due to authentication issues:
1. Double-check your SIP trunk username and password if using digest authentication
2. Check your SIP trunk provider's logs for specific authentication error messages
3. Verify that custom headers, if configured, match your provider's requirements
4. Test with simplified configurations (no custom headers) to isolate authentication issues
If you're experiencing issues with TLS transport or media encryption:
1. Verify that your SIP trunk provider supports TLS transport on port 5061
2. Check certificate validity, expiration dates, and trust chains
3. Ensure your provider supports SRTP media encryption if using "Required" media encryption
4. Test with "Allowed" media encryption before using "Required" to isolate encryption issues
5. Try different transport types (TCP, UDP) to isolate TLS-specific problems
6. Contact your SIP trunk provider to confirm TLS and SRTP support
If you're having problems with custom headers:
1. Verify the exact header names and values required by your provider
2. Check for case sensitivity in header names
3. Ensure header values don't contain special characters that need escaping
4. Test without custom headers first, then add them incrementally
5. Review your provider's documentation for supported custom headers
If the call connects but there's no audio or audio only flows one way:
1. Verify that your firewall allows UDP traffic for the RTP media stream (typically ports 10000-60000)
2. Since RTP uses dynamic IP addresses, ensure firewall rules are not restricted to specific static IPs
3. Check for Network Address Translation (NAT) issues that might be blocking the RTP stream
4. If using "Required" media encryption, ensure both endpoints support SRTP
5. Test with "Disabled" media encryption to isolate encryption-related audio issues
If you experience poor audio quality:
1. Ensure your network has sufficient bandwidth (at least 100 Kbps per call) and low latency/jitter for UDP traffic
2. Check for network congestion or packet loss, particularly on the UDP path
3. Verify codec settings match on both ends
4. If using media encryption, ensure both endpoints efficiently handle SRTP processing
5. Test with different media encryption settings to isolate quality issues
## Limitations and Considerations
* Support for multiple concurrent calls depends on your subscription tier
* Call recording and analytics features are available but may require additional configuration
* Outbound calling capabilities may be limited by your SIP trunk provider
* **TLS Support**: Ensure your SIP trunk provider supports TLS 1.2 or higher for encrypted transport
* **Media Encryption**: SRTP support varies by provider; verify compatibility before requiring encryption
* **Audio format**: ElevenLabs' SIP deployment outputs and receives audio at **48kHz sample rate**. This is independent of any audio format configured on the agent for direct websocket connections. Your SIP trunk system must either support this format natively or perform resampling to match your system's requirements
## FAQ
Yes, SIP trunking allows you to connect your existing phone numbers directly to ElevenLabs'
conversational AI platform without porting them.
ElevenLabs is compatible with most standard SIP trunk providers including Twilio, Vonage,
RingCentral, Sinch, Infobip, Telnyx, Exotel, Plivo, Bandwidth, and others that support SIP
protocol standards. TLS transport and SRTP media encryption are supported for enhanced security.
Yes, TLS transport is highly recommended for production environments. It provides encrypted SIP
signaling which enhances security for your calls. Combined with required media encryption, it
ensures comprehensive protection of your communications. Always verify your SIP trunk provider
supports TLS before enabling it.
* **Auto**: Automatically selects the best available transport protocol
* **UDP**: Fastest but unencrypted signaling (good for internal networks)
* **TCP**: Reliable but unencrypted signaling
* **TLS**: Encrypted and reliable signaling (recommended for production)
For security-critical applications, always use TLS transport.
Custom SIP headers allow you to include provider-specific information with outbound calls. Common
uses include call routing, billing codes, caller identification, and meeting specific provider
requirements.
The number of concurrent calls depends on your subscription plan. Enterprise plans typically allow
for higher volumes of concurrent calls.
Yes, you can use your existing PBX system's routing rules to direct calls to different phone
numbers, each connected to different ElevenLabs agents.
## Next steps
* [Learn about creating conversational AI agents](/docs/conversational-ai/quickstart)
# Batch calling
> Initiate multiple outbound calls simultaneously with your Conversational AI agents.
When conducting outbound call campaigns, ensure compliance with all relevant regulations,
including the [TCPA (Telephone Consumer Protection Act)](/docs/conversational-ai/legal/tcpa) and
any applicable state laws.
## Overview
Batch Calling enables you to initiate multiple outbound calls simultaneously using your configured Conversational AI agents. This feature is ideal for scenarios such as sending notifications, conducting surveys, or delivering personalized messages to a large list of recipients efficiently.
This feature is available for both phone numbers added via the [native Twilio integration](/docs/conversational-ai/phone-numbers/twilio-integration/native-integration) and [SIP trunking](/docs/conversational-ai/phone-numbers/sip-trunking).
### Key features
* **Upload recipient lists**: Easily upload recipient lists in CSV or XLS format.
* **Dynamic variables**: Personalize calls by including dynamic variables (e.g., `user_name`) in your recipient list as separate columns.
* **Agent selection**: Choose the specific Conversational AI agent to handle the calls.
* **Scheduling**: Send batches immediately or schedule them for a later time.
* **Real-time monitoring**: Track the progress of your batch calls, including overall status and individual call status.
* **Detailed reporting**: View comprehensive details of completed batch calls, including individual call recipient information.
## Concurrency
When batch calls are initiated, they automatically utilize up to 70% of your plan's concurrency limit.
This leaves 30% of your concurrent capacity available for other conversations, including incoming calls and calls via the widget. For example, on a plan with a limit of 10 concurrent calls, a batch will use at most 7 slots at a time, leaving 3 free for inbound calls and widget sessions.
## Requirements
* An ElevenLabs account with an [agent setup](/app/conversational-ai).
* A phone number imported into your workspace (via the native Twilio integration or SIP trunking)
## Creating a batch call
Follow these steps to create a new batch call:
Access the [Outbound calls interface](https://elevenlabs.io/app/conversational-ai/batch-calling)
from the Conversational AI dashboard
Click on the "Create a batch call" button. This will open the "Create a batch call" page.
* **Batch name**: Enter a descriptive name for your batch call (e.g., "Delivery notice", "Weekly Update Notifications").
* **Phone number**: Select the phone number that will be used to make the outbound calls.
* **Select agent**: Choose the pre-configured Conversational AI agent that will handle the conversations for this batch.
* **Upload File**: Upload your recipient list. Supported file formats are CSV and XLS.
* **Formatting**:
* The `phone_number` column is mandatory in your uploaded file. If your agent also defines a `phone_number` dynamic variable that needs to be set separately, rename that variable to avoid a conflict.
* You can include other columns (e.g., `name`, `user_name`) which will be passed as dynamic variables to personalize the calls.
* A template is available for download to ensure correct formatting.
The following column headers are special fields that are used to override an agent's initial
configuration:
* language
* first\_message
* system\_prompt
* voice\_id
The batch call will fail if those fields are passed but are not set to be overridable in the agent's security settings. See more
[here](/docs/conversational-ai/customization/personalization/overrides).
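For illustration, a minimal recipient file might look like the following; `phone_number` is the only required column, while the other column names here are arbitrary examples that will be passed to the agent as dynamic variables:
```csv
phone_number,name,appointment_time
+15551234567,Alice Example,Tuesday 2pm
+15557654321,Bob Example,Wednesday 10am
```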
* **Send immediately**: The batch call will start processing as soon as you submit it.
* **Schedule for later**: Choose a specific date and time for the batch call to begin.
* You may "Test call" with a single recipient before submitting the entire batch.
* Click "Submit a Batch Call" to finalize and initiate or schedule the batch.
## Managing and monitoring batch calls
Once a batch call is created, you can monitor its progress and view its details.
### Batch calling overview
The Batch Calling overview page displays a list of all your batch calls.
### Viewing batch call details
Clicking on a specific batch call from the overview page will take you to its detailed view, from where you can view individual conversations.
## API Usage
You can also manage and initiate batch calls programmatically using the ElevenLabs API. This allows for integration into your existing workflows and applications.
* [List batch calls](/docs/api-reference/batch-calling/list) - Retrieve all batch calls in your workspace
* [Create batch call](/docs/api-reference/batch-calling/create) - Submit a new batch call with agent, phone number, and recipient list
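As a rough sketch of the programmatic flow, a batch call submission might look like the snippet below. The endpoint path and request body fields shown are assumptions based on the linked API reference; confirm them there before relying on this.
```python
# Hypothetical sketch of submitting a batch call via the REST API.
# Endpoint path and body fields are assumptions; see the batch calling API reference.
import os
import requests

response = requests.post(
    "https://api.elevenlabs.io/v1/convai/batch-calling/submit",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "call_name": "Delivery notice",
        "agent_id": "YOUR_AGENT_ID",
        "agent_phone_number_id": "YOUR_PHONE_NUMBER_ID",
        "recipients": [
            {"phone_number": "+15551234567"},
            {"phone_number": "+15557654321"},
        ],
    },
)
response.raise_for_status()
print(response.json())
```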
# Vonage integration
> Integrate ElevenLabs Conversational AI with Vonage voice calls using a WebSocket connector.
## Overview
Connect ElevenLabs Conversational AI Agents to Vonage Voice API or Video API calls using a [WebSocket connector application](https://github.com/nexmo-se/elevenlabs-agent-ws-connector). This enables real-time, bi-directional audio streaming for use cases like PSTN calls, SIP trunks, and WebRTC clients.
## How it works
The Node.js connector bridges Vonage and ElevenLabs:
1. Vonage initiates a WebSocket connection to the connector for an active call.
2. The connector establishes a WebSocket connection to the ElevenLabs Conversational AI endpoint.
3. Audio is relayed: Vonage (L16) -> Connector -> ElevenLabs (base64) and vice-versa.
4. The connector manages conversation events (`user_transcript`, `agent_response`, `interruption`).
## Setup
### 1. Get ElevenLabs credentials
* **API Key**: On the [ElevenLabs dashboard](https://elevenlabs.io/app), click "My Account" and then "API Keys" in the popup that appears.
* **Agent ID**: Find the agent in the [Conversational AI dashboard](https://elevenlabs.io/app/conversational-ai/agents/). Once you have selected the agent click on the settings button and select "Copy Agent ID".
### 2. Configure the connector
Clone the repository and set up the environment file.
```bash
git clone https://github.com/nexmo-se/elevenlabs-agent-ws-connector.git
cd elevenlabs-agent-ws-connector
cp .env.example .env
```
Add your credentials to `.env`:
```bash title=".env"
ELEVENLABS_API_KEY=YOUR_API_KEY
ELEVENLABS_AGENT_ID=YOUR_AGENT_ID
```
Install dependencies: `npm install`.
### 3. Expose the connector (local development)
Use ngrok, or a similar service, to create a public URL for the connector (default port 6000).
```bash
ngrok http 6000
```
Note the public `Forwarding` URL (e.g., `xxxxxxxx.ngrok-free.app`). **Do not include `https://`** when configuring Vonage.
### 4. Run the connector
Start the application:
```bash
node elevenlabs-agent-ws-connector.cjs
```
### 5. Configure Vonage voice application
Your Vonage app needs to connect to the connector's WebSocket endpoint (`wss://YOUR_CONNECTOR_HOSTNAME/socket`). This is the ngrok URL from step 3.
* **Use Sample App**: Configure the [sample Vonage app](https://github.com/nexmo-se/voice-to-ai-engines) with `PROCESSOR_SERVER` set to your connector's hostname.
* **Update Existing App**: Modify your [Nexmo Call Control Object](https://developer.vonage.com/en/voice/voice-api/ncco-reference) to include a `connect` action targeting the connector's WebSocket URI (`wss://...`) with `content-type: audio/l16;rate=16000`. Pass necessary query parameters like `peer_uuid` and `webhook_url`.
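As an illustrative NCCO snippet (the hostname is a placeholder and the exact query parameters depend on your connector setup), the `connect` action might look like this:
```json
[
  {
    "action": "connect",
    "endpoint": [
      {
        "type": "websocket",
        "uri": "wss://YOUR_CONNECTOR_HOSTNAME/socket?peer_uuid=CALL_UUID&webhook_url=YOUR_WEBHOOK_URL",
        "content-type": "audio/l16;rate=16000"
      }
    ]
  }
]
```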
### 6. Test
Make an inbound or outbound call via your Vonage application to interact with the ElevenLabs agent.
## Cloud deployment
For production, deploy the connector to a stable hosting provider (e.g., Vonage Cloud Runtime) with a public hostname.
# Telnyx SIP trunking
> Connect Telnyx SIP trunks with ElevenLabs conversational AI agents.
Before following this guide, consider reading the [SIP trunking
guide](/docs/conversational-ai/phone-numbers/sip-trunking) to understand how ElevenLabs supports
SIP trunks.
## Overview
This guide explains how to connect your Telnyx SIP trunks directly to ElevenLabs conversational AI agents. This integration allows you to use your existing Telnyx phone numbers and infrastructure while leveraging ElevenLabs' advanced voice AI capabilities.
## How SIP trunking with Telnyx works
SIP trunking establishes a direct connection between your Telnyx telephony infrastructure and the ElevenLabs platform:
1. **Inbound calls**: Calls from your Telnyx SIP trunk are routed to the ElevenLabs platform using our origination URI. You will configure this in your Telnyx account.
2. **Outbound calls**: Calls initiated by ElevenLabs are routed to your Telnyx SIP trunk using your termination URI, enabling your agents to make outgoing calls.
3. **Authentication**: Connection security is maintained through either digest authentication (username/password) or Access Control List (ACL) authentication.
4. **Signaling and Media**: The initial call setup (signaling) uses TCP. Once the call is established, the actual audio data (RTP stream) is transmitted over UDP.
## Requirements
Before setting up the Telnyx SIP trunk integration, ensure you have:
1. An active ElevenLabs account
2. An active Telnyx account
3. At least one phone number purchased or ported into your Telnyx account
4. Administrator access to your Telnyx portal
5. Appropriate firewall settings to allow SIP and RTP traffic
## Creating a SIP trunk using the Telnyx UI
Log in to your Telnyx account at [portal.telnyx.com](https://portal.telnyx.com/).
Navigate to the Numbers section and purchase a phone number that will be used with your ElevenLabs agent.
Go to Voice » [SIP Trunking](https://portal.telnyx.com/#/voice/connections) in the Telnyx portal.
Click on Create SIP Connection and choose FQDN as the connection type, then save.
1. In the Authentication & Routing Configuration section, select Outbound Calls Authentication.
2. In the Authentication Method field, select Credentials and enter a username and password.
3. Select Add FQDN and enter `sip.rtc.elevenlabs.io` into the FQDN field.
1. Select the Inbound tab.
2. In the Destination Number Format field, select `+E.164`.
3. For SIP Transport Protocol, select TCP.
4. In the SIP Region field, select your region.
1. Select the Outbound tab.
2. In the Outbound Voice Profile field, select or create an outbound voice profile.
1. Select the Numbers tab.
2. Assign your purchased phone number to this SIP connection.
After setting up your Telnyx SIP trunk, follow the [SIP trunking
guide](/docs/conversational-ai/phone-numbers/sip-trunking) to complete the configuration in
ElevenLabs.
# Plivo
> Integrate ElevenLabs conversational AI agents with your Plivo SIP trunks
Before following this guide, consider reading the [SIP trunking
guide](/docs/conversational-ai/phone-numbers/sip-trunking) to understand how ElevenLabs supports
SIP trunks.
## Overview
This guide explains how to connect your Plivo SIP trunks directly to ElevenLabs conversational AI agents.
This integration allows you to use your existing Plivo phone numbers and infrastructure while leveraging ElevenLabs' advanced voice AI capabilities, for both inbound and outbound calls.
## How SIP trunking with Plivo works
SIP trunking establishes a direct connection between your Plivo telephony infrastructure and the ElevenLabs platform:
1. **Inbound calls**: Calls from your Plivo SIP trunk are routed to the ElevenLabs platform using our origination URI. You will configure this in your Plivo account.
2. **Outbound calls**: Calls initiated by ElevenLabs are routed to your Plivo SIP trunk using your termination URI, enabling your agents to make outgoing calls.
3. **Authentication**: Connection security for the signaling is maintained through either digest authentication (username/password) or Access Control List (ACL) authentication based on the signaling source IP from Plivo.
4. **Signaling and Media**: The initial call setup (signaling) uses TCP. Once the call is established, the actual audio data (RTP stream) is transmitted over UDP.
## Requirements
Before setting up the Plivo SIP trunk integration, ensure you have:
1. An active Plivo account with SIP trunking enabled
2. Plivo phone numbers that you want to connect to ElevenLabs
3. Administrator access to your Plivo account and SIP trunk configuration
4. Appropriate firewall settings to allow SIP traffic to and from ElevenLabs and Plivo
## Configuring Plivo SIP trunks
This section provides detailed instructions for creating SIP trunks in Plivo before connecting them to ElevenLabs.
### Setting up inbound trunks (calls from Plivo to ElevenLabs)
Sign in to the Plivo Console.
Go to the Zentrunk Dashboard in your Plivo account.
1. Select "Create New Inbound Trunk" and provide a descriptive name for your trunk.
2. Under Trunk Authentication, click "Add New URI".
3. Enter the ElevenLabs SIP URI: `sip.rtc.elevenlabs.io`
4. Select "Create Trunk" to complete your inbound trunk creation.
1. Navigate to the Phone Numbers Dashboard and select the number you want to route to your inbound trunk.
2. Under Number Configuration, set "Trunk" to your newly created inbound trunk.
3. Select "Update" to save the configuration.
### Setting up outbound trunks (calls from ElevenLabs to Plivo)
Sign in to the Plivo Console.
Go to the Zentrunk Dashboard in your Plivo account.
1. Select "Create New Outbound Trunk" and provide a descriptive name for your trunk.
2. Under Trunk Authentication, click "Add New Credentials List".
3. Add a username and password that you'll use to authenticate outbound calls.
4. Select "Create Credentials List". 5. Save your credentials list and select "Create Trunk" to complete your outbound trunk configuration.
After creating the outbound trunk, note the termination URI (typically in the format
`sip:yourusername@yourplivotrunk.sip.plivo.com`). You'll need this information when configuring
the SIP trunk in ElevenLabs.
Once you've set up your Plivo SIP trunk, follow the [SIP trunking
guide](/docs/conversational-ai/phone-numbers/sip-trunking) to complete the configuration in ElevenLabs.
# Genesys
> Integrate ElevenLabs conversational AI agents with Genesys using native Audio Connector integration.
## Overview
This guide explains how to integrate ElevenLabs conversational AI agents with Genesys Cloud using the Audio Connector integration. This integration enables seamless voice AI capabilities within your existing Genesys contact center infrastructure over WebSocket, without requiring SIP trunking.
## How Genesys integration works
The Genesys integration uses a native WebSocket connection through the Audio Connector integration:
1. **WebSocket connection**: Direct connection to ElevenLabs using the Audio Connector integration in Genesys Cloud
2. **Real-time audio**: Bidirectional audio streaming between Genesys and ElevenLabs agents
3. **Flow integration**: Seamless integration within your Genesys Architect flows using bot actions
4. **Dynamic variables**: Support for passing context and data between Genesys and ElevenLabs
## Requirements
Before setting up the Genesys integration, ensure you have:
1. Genesys Cloud CX license with bot flow capabilities
2. Administrator access to Genesys Cloud organization
3. A configured ElevenLabs account and conversational AI agent
4. ElevenLabs API key
## Setting up the Audio Connector integration
Sign in to your Genesys Cloud organization with administrator privileges.
Go to Admin → Integrations in the Genesys Cloud interface.
1. Click "Add Integration" and search for "Audio Connector", and click "Install"
2. Select the Audio Connector integration type
3. Provide a descriptive name for your integration
1. Navigate to the Configuration section of your Audio Connector integration
2. In Properties, in the Base Connection URI field, enter: `wss://api.elevenlabs.io/v1/convai/conversation/genesys`
3. In Credentials, enter your ElevenLabs API key in the authentication configuration
4. Save the integration configuration
Set the integration status to "Active" to enable the connection.
## Configuring your Genesys flow
Navigate to Admin → Architect in Genesys Cloud.
Open an existing inbound, outbound, or in-queue call flow, or create a new one where you want to
use the ElevenLabs agent.
1. In your flow, add a "Call Audio Connector" action from the Bot category
2. Select your Audio Connector integration from the integration dropdown
3. In the Connector ID field, specify your ElevenLabs agent ID
If you need to pass context to your ElevenLabs agent, configure input session variables in the bot
action. These will be available as dynamic variables in your ElevenLabs agent.
Save and publish your flow to make the integration active.
## Agent configuration requirements
Your ElevenLabs conversational AI agent must be configured with specific audio settings for Genesys compatibility:
### Audio format requirements
* **TTS output format**: Set to μ-law 8000 Hz in Agent Settings → Voice
* **User input audio format**: Set to μ-law 8000 Hz in Agent Settings → Advanced
### Supported client events
The Genesys integration supports only the following client events:
* **Audio events**: For processing voice input from callers
* **Interruption events**: For handling caller interruptions during agent speech
Other client event types are not supported in the Genesys integration and will be silently ignored
if configured.
## Session variables
You can pass dynamic context from your Genesys flow to your ElevenLabs agent using input session variables:
### Setting up session variables
1. **In Genesys flow**: Define input session variables in your "Call Audio Connector" action
2. **In ElevenLabs agent**: These variables are automatically available as dynamic variables
3. **Usage**: Reference these variables in your agent's conversation flow or system prompts
Learn more about [dynamic variables](/docs/conversational-ai/customization/personalization/dynamic-variables).
### Example usage
Genesys Flow input session variable: customer\_name = "John Smith"
ElevenLabs agent prompt: Hi \{\{customer\_name}}, how can I help you today?
Output session variables from ElevenLabs agents back to Genesys flows are coming soon. This
feature will allow you to capture conversation outcomes and route calls accordingly.
## Limitations and unsupported features
The following tools and features are not supported in the Genesys integration:
### Unsupported tools
* **Client tool**: Not compatible with Genesys WebSocket integration
* **Transfer to number**: Use Genesys native transfer capabilities instead
## Troubleshooting
Verify that your API key is correctly configured in the Audio Connector integration and the ElevenLabs agent ID is correctly configured in the Connector ID field in your Architect flow.
If there are any dynamic variables defined on your agent, they must be passed in as input session variables.
Verify that input session variables are properly defined in your Genesys flow's "Call Audio Connector" action and that they're referenced correctly in your ElevenLabs agent using the \{\{variable\_name}} syntax.
# Twilio native integration
> Learn how to configure inbound calls for your agent with Twilio.
## Overview
This guide shows you how to connect a Twilio phone number to your conversational AI agent to handle both inbound and outbound calls.
You will learn to:
* Import an existing Twilio phone number.
* Link it to your agent to handle inbound calls.
* Initiate outbound calls using your agent.
## Guide
### Prerequisites
* A [Twilio account](https://twilio.com/).
* A purchased & provisioned Twilio [phone number](https://www.twilio.com/docs/phone-numbers).
In the Conversational AI dashboard, go to the [**Phone Numbers**](https://elevenlabs.io/app/conversational-ai/phone-numbers) tab.

Next, fill in the following details:
* **Label:** A descriptive name (e.g., `Customer Support Line`).
* **Phone Number:** The Twilio number you want to use.
* **Twilio SID:** Your Twilio Account SID.
* **Twilio Token:** Your Twilio Auth Token.
You can find your account SID and auth token [**in the Twilio admin console**](https://www.twilio.com/console).

Copy the Twilio SID and Auth Token from the [Twilio admin
console](https://www.twilio.com/console).

ElevenLabs automatically configures the Twilio phone number with the correct settings.

Once the number is imported, select the agent that will handle inbound calls for this phone number.

Test the agent by giving the phone number a call. Your agent is now ready to handle inbound calls and engage with your customers.
Monitor your first few calls in the [Calls History
dashboard](https://elevenlabs.io/app/conversational-ai/history) to ensure everything is working as
expected.
## Making Outbound Calls
Your imported Twilio phone number can also be used to initiate outbound calls where your agent calls a specified phone number.
From the [**Phone Numbers**](https://elevenlabs.io/app/conversational-ai/phone-numbers) tab, locate your imported Twilio number and click the **Outbound call** button.

In the Outbound Call modal:
1. Select the agent that will handle the conversation
2. Enter the phone number you want to call
3. Click **Send Test Call** to initiate the call

Once initiated, the recipient will receive a call from your Twilio number. When they answer, your agent will begin the conversation.
Outbound calls appear in your [Calls History
dashboard](https://elevenlabs.io/app/conversational-ai/history) alongside inbound calls, allowing
you to review all conversations.
When making outbound calls, your agent will be the initiator of the conversation, so ensure your
agent has appropriate initial messages configured to start the conversation effectively.
# Twilio custom server
> Learn how to integrate a Conversational AI agent with Twilio to create seamless, human-like voice interactions.
Custom server should be used for **outbound calls only**. Please use our [native
integration](/docs/conversational-ai/phone-numbers/twilio-integration/native-integration) for
**inbound Twilio calls**.
Connect your ElevenLabs Conversational AI agent to phone calls and create human-like voice experiences using Twilio's Voice API.
## What You'll Need
* An [ElevenLabs account](https://elevenlabs.io)
* A configured ElevenLabs Conversational Agent ([create one here](/docs/conversational-ai/quickstart))
* A [Twilio account](https://www.twilio.com/try-twilio) with an active phone number
* Python 3.7+ or Node.js 16+
* [ngrok](https://ngrok.com/) for local development
## Agent Configuration
Before integrating with Twilio, you'll need to configure your agent to use the correct audio format supported by Twilio.
1. Navigate to your agent settings
2. Go to the Voice Section
3. Select "μ-law 8000 Hz" from the dropdown

1. Navigate to your agent settings
2. Go to the Advanced Section
3. Select "μ-law 8000 Hz" for the input format

## Implementation
Looking for a complete example? Check out this [Javascript implementation](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/twilio/javascript) on GitHub.
First, set up a new Node.js project:
```bash
mkdir conversational-ai-twilio
cd conversational-ai-twilio
npm init -y; npm pkg set type="module";
```
Next, install the required dependencies for the project.
```bash
npm install @fastify/formbody @fastify/websocket dotenv fastify ws
```
Create a `.env` & `index.js` file with the following code:
```
conversational-ai-twilio/
├── .env
└── index.js
```
```text .env
ELEVENLABS_AGENT_ID=
```
```javascript index.js
import Fastify from "fastify";
import WebSocket from "ws";
import dotenv from "dotenv";
import fastifyFormBody from "@fastify/formbody";
import fastifyWs from "@fastify/websocket";
// Load environment variables from .env file
dotenv.config();
const { ELEVENLABS_AGENT_ID } = process.env;
// Check for the required ElevenLabs Agent ID
if (!ELEVENLABS_AGENT_ID) {
console.error("Missing ELEVENLABS_AGENT_ID in environment variables");
process.exit(1);
}
// Initialize Fastify server
const fastify = Fastify();
fastify.register(fastifyFormBody);
fastify.register(fastifyWs);
const PORT = process.env.PORT || 8000;
// Root route for health check
fastify.get("/", async (_, reply) => {
reply.send({ message: "Server is running" });
});
// Route to handle incoming calls from Twilio
fastify.all("/twilio/inbound_call", async (request, reply) => {
// Generate TwiML response to connect the call to a WebSocket stream
  const twimlResponse = `<?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <Stream url="wss://${request.headers.host}/media-stream" />
      </Connect>
    </Response>`;
reply.type("text/xml").send(twimlResponse);
});
// WebSocket route for handling media streams from Twilio
fastify.register(async (fastifyInstance) => {
fastifyInstance.get("/media-stream", { websocket: true }, (connection, req) => {
console.info("[Server] Twilio connected to media stream.");
let streamSid = null;
// Connect to ElevenLabs Conversational AI WebSocket
const elevenLabsWs = new WebSocket(
`wss://api.elevenlabs.io/v1/convai/conversation?agent_id=${ELEVENLABS_AGENT_ID}`
);
// Handle open event for ElevenLabs WebSocket
elevenLabsWs.on("open", () => {
console.log("[II] Connected to Conversational AI.");
});
// Handle messages from ElevenLabs
elevenLabsWs.on("message", (data) => {
try {
const message = JSON.parse(data);
handleElevenLabsMessage(message, connection);
} catch (error) {
console.error("[II] Error parsing message:", error);
}
});
// Handle errors from ElevenLabs WebSocket
elevenLabsWs.on("error", (error) => {
console.error("[II] WebSocket error:", error);
});
// Handle close event for ElevenLabs WebSocket
elevenLabsWs.on("close", () => {
console.log("[II] Disconnected.");
});
// Function to handle messages from ElevenLabs
const handleElevenLabsMessage = (message, connection) => {
switch (message.type) {
case "conversation_initiation_metadata":
console.info("[II] Received conversation initiation metadata.");
break;
case "audio":
if (message.audio_event?.audio_base_64) {
// Send audio data to Twilio
const audioData = {
event: "media",
streamSid,
media: {
payload: message.audio_event.audio_base_64,
},
};
connection.send(JSON.stringify(audioData));
}
break;
case "interruption":
// Clear Twilio's audio queue
connection.send(JSON.stringify({ event: "clear", streamSid }));
break;
case "ping":
// Respond to ping events from ElevenLabs
if (message.ping_event?.event_id) {
const pongResponse = {
type: "pong",
event_id: message.ping_event.event_id,
};
elevenLabsWs.send(JSON.stringify(pongResponse));
}
break;
}
};
// Handle messages from Twilio
connection.on("message", async (message) => {
try {
const data = JSON.parse(message);
switch (data.event) {
case "start":
// Store Stream SID when stream starts
streamSid = data.start.streamSid;
console.log(`[Twilio] Stream started with ID: ${streamSid}`);
break;
case "media":
// Route audio from Twilio to ElevenLabs
if (elevenLabsWs.readyState === WebSocket.OPEN) {
// data.media.payload is base64 encoded
const audioMessage = {
user_audio_chunk: Buffer.from(
data.media.payload,
"base64"
).toString("base64"),
};
elevenLabsWs.send(JSON.stringify(audioMessage));
}
break;
case "stop":
// Close ElevenLabs WebSocket when Twilio stream stops
elevenLabsWs.close();
break;
default:
console.log(`[Twilio] Received unhandled event: ${data.event}`);
}
} catch (error) {
console.error("[Twilio] Error processing message:", error);
}
});
// Handle close event from Twilio
connection.on("close", () => {
elevenLabsWs.close();
console.log("[Twilio] Client disconnected");
});
// Handle errors from Twilio WebSocket
connection.on("error", (error) => {
console.error("[Twilio] WebSocket error:", error);
elevenLabsWs.close();
});
});
});
// Start the Fastify server
fastify.listen({ port: PORT }, (err) => {
if (err) {
console.error("Error starting server:", err);
process.exit(1);
}
console.log(`[Server] Listening on port ${PORT}`);
});
```
You can now run the server with the following command:
```bash
node index.js
```
If the server starts successfully, you should see the message `[Server] Listening on port 8000` (or the port you specified) in your terminal.
Looking for a complete example? Check out this [implementation](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/twilio) on GitHub.
First, create a new project directory:
```bash
mkdir conversational-ai-twilio
cd conversational-ai-twilio
```
Next, install the required dependencies for the project.
```bash
pip install fastapi uvicorn python-dotenv twilio elevenlabs websockets
```
Create `.env`, `main.py`, and `twilio_audio_interface.py` files with the following code:
```
conversational-ai-twilio/
├── .env
├── main.py
└── twilio_audio_interface.py
```
```text .env
ELEVENLABS_API_KEY=
AGENT_ID=
```
```python main.py
import json
import traceback
import os
from dotenv import load_dotenv
from fastapi import FastAPI, Request, WebSocket, WebSocketDisconnect
from fastapi.responses import HTMLResponse
from twilio.twiml.voice_response import VoiceResponse, Connect
from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from twilio_audio_interface import TwilioAudioInterface
# Load environment variables
load_dotenv()
# Initialize FastAPI app
app = FastAPI()
# Initialize ElevenLabs client
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
ELEVEN_LABS_AGENT_ID = os.getenv("AGENT_ID")
@app.get("/")
async def root():
return {"message": "Twilio-ElevenLabs Integration Server"}
@app.api_route("/twilio/inbound_call", methods=["GET", "POST"])
async def handle_incoming_call(request: Request):
"""Handle incoming call and return TwiML response."""
response = VoiceResponse()
host = request.url.hostname
connect = Connect()
connect.stream(url=f"wss://{host}/media-stream-eleven")
response.append(connect)
return HTMLResponse(content=str(response), media_type="application/xml")
@app.websocket("/media-stream-eleven")
async def handle_media_stream(websocket: WebSocket):
await websocket.accept()
print("WebSocket connection established")
audio_interface = TwilioAudioInterface(websocket)
conversation = None
try:
conversation = Conversation(
client=elevenlabs,
agent_id=ELEVEN_LABS_AGENT_ID,
requires_auth=False,
audio_interface=audio_interface,
callback_agent_response=lambda text: print(f"Agent said: {text}"),
callback_user_transcript=lambda text: print(f"User said: {text}"),
)
conversation.start_session()
print("Conversation session started")
async for message in websocket.iter_text():
if not message:
continue
try:
data = json.loads(message)
await audio_interface.handle_twilio_message(data)
except Exception as e:
print(f"Error processing message: {str(e)}")
traceback.print_exc()
except WebSocketDisconnect:
print("WebSocket disconnected")
finally:
if conversation:
print("Ending conversation session...")
conversation.end_session()
conversation.wait_for_session_end()
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
```
```python twilio_audio_interface.py
import asyncio
from typing import Callable
import queue
import threading
import base64
from elevenlabs.conversational_ai.conversation import AudioInterface
import websockets
class TwilioAudioInterface(AudioInterface):
def __init__(self, websocket):
self.websocket = websocket
self.output_queue = queue.Queue()
self.should_stop = threading.Event()
self.stream_sid = None
self.input_callback = None
self.output_thread = None
def start(self, input_callback: Callable[[bytes], None]):
self.input_callback = input_callback
self.output_thread = threading.Thread(target=self._output_thread)
self.output_thread.start()
def stop(self):
self.should_stop.set()
if self.output_thread:
self.output_thread.join(timeout=5.0)
self.stream_sid = None
def output(self, audio: bytes):
self.output_queue.put(audio)
def interrupt(self):
try:
while True:
_ = self.output_queue.get(block=False)
except queue.Empty:
pass
asyncio.run(self._send_clear_message_to_twilio())
async def handle_twilio_message(self, data):
try:
if data["event"] == "start":
self.stream_sid = data["start"]["streamSid"]
print(f"Started stream with stream_sid: {self.stream_sid}")
if data["event"] == "media":
audio_data = base64.b64decode(data["media"]["payload"])
if self.input_callback:
self.input_callback(audio_data)
except Exception as e:
print(f"Error in input_callback: {e}")
def _output_thread(self):
while not self.should_stop.is_set():
asyncio.run(self._send_audio_to_twilio())
async def _send_audio_to_twilio(self):
try:
audio = self.output_queue.get(timeout=0.2)
audio_payload = base64.b64encode(audio).decode("utf-8")
audio_delta = {
"event": "media",
"streamSid": self.stream_sid,
"media": {"payload": audio_payload},
}
await self.websocket.send_json(audio_delta)
except queue.Empty:
pass
except Exception as e:
print(f"Error sending audio: {e}")
async def _send_clear_message_to_twilio(self):
try:
clear_message = {"event": "clear", "streamSid": self.stream_sid}
await self.websocket.send_json(clear_message)
except Exception as e:
print(f"Error sending clear message to Twilio: {e}")
```
You can now run the server with the following command:
```bash
python main.py
```
## Twilio Setup
Use ngrok to make your local server accessible:
```bash
ngrok http --url=<your-url> 8000
```

1. Go to the [Twilio Console](https://console.twilio.com)
2. Navigate to `Phone Numbers` → `Manage` → `Active numbers`
3. Select your phone number
4. Under "Voice Configuration", set the webhook for incoming calls to:
`https://your-ngrok-url.ngrok.app/twilio/inbound_call`
5. Set the HTTP method to POST
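If you prefer to script this step, the same webhook can be configured through the Twilio REST API. Below is a minimal sketch using the `twilio` Node.js helper library; the phone number SID and ngrok URL are placeholders you would substitute with your own values.
```javascript
import Twilio from 'twilio';

// Point the phone number's incoming-call webhook at your ngrok tunnel.
// TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN come from your environment; the
// phone number SID ("PN...") is shown on the number's page in the Twilio Console.
const client = new Twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

client
  .incomingPhoneNumbers('PNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
  .update({
    voiceUrl: 'https://your-ngrok-url.ngrok.app/twilio/inbound_call',
    voiceMethod: 'POST',
  })
  .then((number) => console.log(`Webhook configured for ${number.phoneNumber}`));
```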

## Testing
1. Call your Twilio phone number.
2. Start speaking - you'll see the transcripts in the ElevenLabs console.
## Troubleshooting
If the WebSocket connection fails:
* Verify your ngrok URL is correct in Twilio settings
* Check that your server is running and accessible
* Ensure your firewall isn't blocking WebSocket connections
If there's no audio output:
* Confirm your ElevenLabs API key is valid
* Verify the AGENT\_ID is correct
* Check audio format settings match Twilio's requirements (μ-law 8kHz)
## Security Best Practices
Follow these security guidelines for production deployments:
* Use environment variables for sensitive information
* Implement proper authentication for your endpoints
* Use HTTPS for all communications
* Regularly rotate API keys
* Monitor usage to prevent abuse
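For the endpoint-authentication point above, one option is to verify Twilio's request signature before returning TwiML. The sketch below assumes the Fastify server from the JavaScript example and uses the `twilio` helper library's `validateRequest`; it is an illustrative addition, not part of the original code.
```javascript
import twilio from 'twilio';

// Reject requests to the inbound-call webhook that don't carry a valid
// Twilio signature. Assumes TWILIO_AUTH_TOKEN is set in the environment and
// that ngrok forwards the original Host header.
fastify.addHook('preHandler', async (request, reply) => {
  if (!request.url.startsWith('/twilio/inbound_call')) return;

  const signature = request.headers['x-twilio-signature'];
  const url = `https://${request.headers.host}${request.url}`;
  const params = request.body || {};

  const isValid = twilio.validateRequest(process.env.TWILIO_AUTH_TOKEN, signature, url, params);
  if (!isValid) {
    return reply.code(403).send('Invalid Twilio signature');
  }
});
```
Combine this check with the other practices above rather than relying on any single safeguard.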
# Twilio outbound calls
> Build an outbound calling AI agent with Twilio and ElevenLabs.
**Outbound calls are now natively supported.** See the guide
[here](/docs/conversational-ai/phone-numbers/twilio-integration/native-integration#making-outbound-calls).
We recommend using the native integration instead of this guide.
In this guide you will learn how to build an integration with Twilio to initiate outbound calls to your prospects and customers.
Find the [example project on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/twilio/javascript).
## What You'll Need
* An [ElevenLabs account](https://elevenlabs.io).
* A configured ElevenLabs Conversational Agent ([create one here](/docs/conversational-ai/quickstart)).
* A [Twilio account](https://www.twilio.com/try-twilio) with an active phone number.
* Node.js 16+
* [ngrok](https://ngrok.com/) for local development.
## Agent Configuration
Before integrating with Twilio, you'll need to configure your agent to use the correct audio format supported by Twilio.
1. Navigate to your agent settings.
2. Go to the Voice section.
3. Select "μ-law 8000 Hz" from the dropdown.

1. Navigate to your agent settings.
2. Go to the Advanced section.
3. Select "μ-law 8000 Hz" for the input format.

1. Navigate to your agent settings.
2. Go to the Security section.
3. Toggle on "Enable authentication".
4. Under "Enable overrides", toggle on "First message" and "System prompt", as you will be dynamically injecting these values when initiating the call.

## Implementation
Looking for a complete example? Check out this [JavaScript implementation](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/twilio/javascript) on GitHub.
First, set up a new Node.js project:
```bash
mkdir conversational-ai-twilio
cd conversational-ai-twilio
npm init -y; npm pkg set type="module";
```
Next, install the required dependencies for the project.
```bash
npm install @fastify/formbody @fastify/websocket dotenv fastify ws twilio
```
Create `.env` and `outbound.js` files with the following code:
```text .env
ELEVENLABS_AGENT_ID=
ELEVENLABS_API_KEY=
# Twilio
TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=
TWILIO_PHONE_NUMBER=
```
```javascript outbound.js
import fastifyFormBody from '@fastify/formbody';
import fastifyWs from '@fastify/websocket';
import dotenv from 'dotenv';
import Fastify from 'fastify';
import Twilio from 'twilio';
import WebSocket from 'ws';
// Load environment variables from .env file
dotenv.config();
// Check for required environment variables
const {
ELEVENLABS_API_KEY,
ELEVENLABS_AGENT_ID,
TWILIO_ACCOUNT_SID,
TWILIO_AUTH_TOKEN,
TWILIO_PHONE_NUMBER,
} = process.env;
if (
!ELEVENLABS_API_KEY ||
!ELEVENLABS_AGENT_ID ||
!TWILIO_ACCOUNT_SID ||
!TWILIO_AUTH_TOKEN ||
!TWILIO_PHONE_NUMBER
) {
console.error('Missing required environment variables');
throw new Error('Missing required environment variables');
}
// Initialize Fastify server
const fastify = Fastify();
fastify.register(fastifyFormBody);
fastify.register(fastifyWs);
const PORT = process.env.PORT || 8000;
// Root route for health check
fastify.get('/', async (_, reply) => {
reply.send({ message: 'Server is running' });
});
// Initialize Twilio client
const twilioClient = new Twilio(TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN);
// Helper function to get signed URL for authenticated conversations
async function getSignedUrl() {
try {
const response = await fetch(
`https://api.elevenlabs.io/v1/convai/conversation/get-signed-url?agent_id=${ELEVENLABS_AGENT_ID}`,
{
method: 'GET',
headers: {
'xi-api-key': ELEVENLABS_API_KEY,
},
}
);
if (!response.ok) {
throw new Error(`Failed to get signed URL: ${response.statusText}`);
}
const data = await response.json();
return data.signed_url;
} catch (error) {
console.error('Error getting signed URL:', error);
throw error;
}
}
// Route to initiate outbound calls
fastify.post('/outbound-call', async (request, reply) => {
const { number, prompt, first_message } = request.body;
if (!number) {
return reply.code(400).send({ error: 'Phone number is required' });
}
try {
const call = await twilioClient.calls.create({
from: TWILIO_PHONE_NUMBER,
to: number,
url: `https://${request.headers.host}/outbound-call-twiml?prompt=${encodeURIComponent(
prompt
)}&first_message=${encodeURIComponent(first_message)}`,
});
reply.send({
success: true,
message: 'Call initiated',
callSid: call.sid,
});
} catch (error) {
console.error('Error initiating outbound call:', error);
reply.code(500).send({
success: false,
error: 'Failed to initiate call',
});
}
});
// TwiML route for outbound calls
fastify.all('/outbound-call-twiml', async (request, reply) => {
const prompt = request.query.prompt || '';
const first_message = request.query.first_message || '';
const twimlResponse = `<?xml version="1.0" encoding="UTF-8"?>
    <Response>
      <Connect>
        <Stream url="wss://${request.headers.host}/outbound-media-stream">
          <Parameter name="prompt" value="${prompt}" />
          <Parameter name="first_message" value="${first_message}" />
        </Stream>
      </Connect>
    </Response>`;
reply.type('text/xml').send(twimlResponse);
});
// WebSocket route for handling media streams
fastify.register(async (fastifyInstance) => {
fastifyInstance.get('/outbound-media-stream', { websocket: true }, (ws, req) => {
console.info('[Server] Twilio connected to outbound media stream');
// Variables to track the call
let streamSid = null;
let callSid = null;
let elevenLabsWs = null;
let customParameters = null; // Add this to store parameters
// Handle WebSocket errors
ws.on('error', console.error);
// Set up ElevenLabs connection
const setupElevenLabs = async () => {
try {
const signedUrl = await getSignedUrl();
elevenLabsWs = new WebSocket(signedUrl);
elevenLabsWs.on('open', () => {
console.log('[ElevenLabs] Connected to Conversational AI');
// Send initial configuration with prompt and first message
const initialConfig = {
type: 'conversation_initiation_client_data',
dynamic_variables: {
user_name: 'Angelo',
user_id: 1234,
},
conversation_config_override: {
agent: {
prompt: {
prompt: customParameters?.prompt || 'you are a gary from the phone store',
},
first_message:
customParameters?.first_message || 'hey there! how can I help you today?',
},
},
};
console.log(
'[ElevenLabs] Sending initial config with prompt:',
initialConfig.conversation_config_override.agent.prompt.prompt
);
// Send the configuration to ElevenLabs
elevenLabsWs.send(JSON.stringify(initialConfig));
});
elevenLabsWs.on('message', (data) => {
try {
const message = JSON.parse(data);
switch (message.type) {
case 'conversation_initiation_metadata':
console.log('[ElevenLabs] Received initiation metadata');
break;
case 'audio':
if (streamSid) {
if (message.audio?.chunk) {
const audioData = {
event: 'media',
streamSid,
media: {
payload: message.audio.chunk,
},
};
ws.send(JSON.stringify(audioData));
} else if (message.audio_event?.audio_base_64) {
const audioData = {
event: 'media',
streamSid,
media: {
payload: message.audio_event.audio_base_64,
},
};
ws.send(JSON.stringify(audioData));
}
} else {
console.log('[ElevenLabs] Received audio but no StreamSid yet');
}
break;
case 'interruption':
if (streamSid) {
ws.send(
JSON.stringify({
event: 'clear',
streamSid,
})
);
}
break;
case 'ping':
if (message.ping_event?.event_id) {
elevenLabsWs.send(
JSON.stringify({
type: 'pong',
event_id: message.ping_event.event_id,
})
);
}
break;
case 'agent_response':
console.log(
`[Twilio] Agent response: ${message.agent_response_event?.agent_response}`
);
break;
case 'user_transcript':
console.log(
`[Twilio] User transcript: ${message.user_transcription_event?.user_transcript}`
);
break;
default:
console.log(`[ElevenLabs] Unhandled message type: ${message.type}`);
}
} catch (error) {
console.error('[ElevenLabs] Error processing message:', error);
}
});
elevenLabsWs.on('error', (error) => {
console.error('[ElevenLabs] WebSocket error:', error);
});
elevenLabsWs.on('close', () => {
console.log('[ElevenLabs] Disconnected');
});
} catch (error) {
console.error('[ElevenLabs] Setup error:', error);
}
};
// Set up ElevenLabs connection
setupElevenLabs();
// Handle messages from Twilio
ws.on('message', (message) => {
try {
const msg = JSON.parse(message);
if (msg.event !== 'media') {
console.log(`[Twilio] Received event: ${msg.event}`);
}
switch (msg.event) {
case 'start':
streamSid = msg.start.streamSid;
callSid = msg.start.callSid;
customParameters = msg.start.customParameters; // Store parameters
console.log(`[Twilio] Stream started - StreamSid: ${streamSid}, CallSid: ${callSid}`);
console.log('[Twilio] Start parameters:', customParameters);
break;
case 'media':
if (elevenLabsWs?.readyState === WebSocket.OPEN) {
const audioMessage = {
user_audio_chunk: Buffer.from(msg.media.payload, 'base64').toString('base64'),
};
elevenLabsWs.send(JSON.stringify(audioMessage));
}
break;
case 'stop':
console.log(`[Twilio] Stream ${streamSid} ended`);
if (elevenLabsWs?.readyState === WebSocket.OPEN) {
elevenLabsWs.close();
}
break;
default:
console.log(`[Twilio] Unhandled event: ${msg.event}`);
}
} catch (error) {
console.error('[Twilio] Error processing message:', error);
}
});
// Handle WebSocket closure
ws.on('close', () => {
console.log('[Twilio] Client disconnected');
if (elevenLabsWs?.readyState === WebSocket.OPEN) {
elevenLabsWs.close();
}
});
});
});
// Start the Fastify server
fastify.listen({ port: PORT }, (err) => {
if (err) {
console.error('Error starting server:', err);
process.exit(1);
}
console.log(`[Server] Listening on port ${PORT}`);
});
```
You can now run the server with the following command:
```bash
node outbound.js
```
If the server starts successfully, you should see the message `[Server] Listening on port 8000` (or the port you specified) in your terminal.
## Testing
1. In another terminal, run `ngrok http --url=<your-url> 8000`.
2. Make a request to the `/outbound-call` endpoint with the customer's phone number, the first message you want to use and the custom prompt:
```bash
curl -X POST https://<your-ngrok-url>/outbound-call \
-H "Content-Type: application/json" \
-d '{
"prompt": "You are Eric, an outbound car sales agent. You are calling to sell a new car to the customer. Be friendly and professional and answer all questions.",
"first_message": "Hello Thor, my name is Eric, I heard you were looking for a new car! What model and color are you looking for?",
"number": "number-to-call"
}'
```
3. You will see the call get initiated in your server terminal window and your phone will ring, starting the conversation once you answer.
## Troubleshooting
If the WebSocket connection fails:
* Verify your ngrok URL is correct in Twilio settings
* Check that your server is running and accessible
* Ensure your firewall isn't blocking WebSocket connections
If there's no audio output:
* Confirm your ElevenLabs API key is valid
* Verify the AGENT\_ID is correct
* Check audio format settings match Twilio's requirements (μ-law 8kHz)
## Security Best Practices
Follow these security guidelines for production deployments:
* Use environment variables for sensitive information
* Implement proper authentication for your endpoints
* Use HTTPS for all communications
* Regularly rotate API keys
* Monitor usage to prevent abuse
# Post-call webhooks
> Get notified when calls end and analysis is complete through webhooks.
## Overview
Post-call [Webhooks](/docs/product-guides/administration/webhooks) allow you to receive detailed information about a call after analysis is complete. When enabled, ElevenLabs will send a POST request to your specified endpoint with comprehensive call data, including transcripts, analysis results, and metadata.
The data that is returned is the same data that is returned from the [Conversation API](/docs/conversational-ai/api-reference/conversations/get-conversations).
## Enabling post-call webhooks
Post-call webhooks can be enabled for all agents in your workspace through the Conversational AI [settings page](https://elevenlabs.io/app/conversational-ai/settings).

Post-call webhooks must return a 200 status code to be considered successful. A webhook is
automatically disabled after 10 or more consecutive failures if the last successful delivery was
more than 7 days ago, or if it has never been delivered successfully.
For HIPAA compliance, failed webhooks cannot be retried.
### Authentication
It is important for the listener to validate all incoming webhooks. Webhooks currently support authentication via HMAC signatures. Set up HMAC authentication by:
* Securely storing the shared secret generated upon creation of the webhook
* Verifying the ElevenLabs-Signature header in your endpoint using the shared secret
The ElevenLabs-Signature header takes the following format:
```json
t=timestamp,v0=hash
```
The hash is the hex-encoded SHA-256 HMAC signature of `timestamp.request_body`. Both the hash and the timestamp should be validated, as shown in the examples below.
Example Python webhook handler using FastAPI:
```python
from fastapi import FastAPI, Request
import os
import time
import hmac
from hashlib import sha256
# Shared secret generated when the webhook was created
secret = os.getenv("WEBHOOK_SECRET")
app = FastAPI()
# Example webhook handler
@app.post("/webhook")
async def receive_message(request: Request):
payload = await request.body()
headers = request.headers.get("elevenlabs-signature")
if headers is None:
return
timestamp = headers.split(",")[0][2:]
hmac_signature = headers.split(",")[1]
# Validate timestamp
tolerance = int(time.time()) - 30 * 60
if int(timestamp) < tolerance:
return
# Validate signature
full_payload_to_sign = f"{timestamp}.{payload.decode('utf-8')}"
mac = hmac.new(
key=secret.encode("utf-8"),
msg=full_payload_to_sign.encode("utf-8"),
digestmod=sha256,
)
digest = 'v0=' + mac.hexdigest()
if hmac_signature != digest:
return
# Continue processing
return {"status": "received"}
```
Example JavaScript webhook handler using the Node.js Express framework:
```javascript
const express = require('express');
const crypto = require('crypto');
const bodyParser = require('body-parser');
const secret = process.env.WEBHOOK_SECRET;
const app = express();
// Ensure Express passes the raw body through instead of applying its own encoding
app.use(bodyParser.raw({ type: '*/*' }));
// Example webhook handler
app.post('/webhook/elevenlabs', async (req, res) => {
const headers = req.headers['elevenlabs-signature'].split(','); // Node.js lowercases incoming header names
const timestamp = headers.find((e) => e.startsWith('t=')).substring(2);
const signature = headers.find((e) => e.startsWith('v0='));
// Validate timestamp
const reqTimestamp = timestamp * 1000;
const tolerance = Date.now() - 30 * 60 * 1000;
if (reqTimestamp < tolerance) {
res.status(403).send('Request expired');
return;
} else {
// Validate hash
const message = `${timestamp}.${req.body}`;
const digest = 'v0=' + crypto.createHmac('sha256', secret).update(message).digest('hex');
if (signature !== digest) {
res.status(401).send('Request unauthorized');
return;
}
}
// Validation passed, continue processing ...
res.status(200).send();
});
```
Example TypeScript webhook handler using a Next.js API route:
```typescript app/api/convai-webhook/route.ts
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
import crypto from "crypto";
export async function GET() {
return NextResponse.json({ status: "webhook listening" }, { status: 200 });
}
export async function POST(req: NextRequest) {
const secret = process.env.ELEVENLABS_CONVAI_WEBHOOK_SECRET; // Add this to your env variables
const { event, error } = await constructWebhookEvent(req, secret);
if (error) {
return NextResponse.json({ error: error }, { status: 401 });
}
if (event.type === "post_call_transcription") {
console.log("event data", JSON.stringify(event.data, null, 2));
}
return NextResponse.json({ received: true }, { status: 200 });
}
const constructWebhookEvent = async (req: NextRequest, secret?: string) => {
const body = await req.text();
const signature_header = req.headers.get("ElevenLabs-Signature");
console.log(signature_header);
if (!signature_header) {
return { event: null, error: "Missing signature header" };
}
const headers = signature_header.split(",");
const timestamp = headers.find((e) => e.startsWith("t="))?.substring(2);
const signature = headers.find((e) => e.startsWith("v0="));
if (!timestamp || !signature) {
return { event: null, error: "Invalid signature format" };
}
// Validate timestamp
const reqTimestamp = Number(timestamp) * 1000;
const tolerance = Date.now() - 30 * 60 * 1000;
if (reqTimestamp < tolerance) {
return { event: null, error: "Request expired" };
}
// Validate hash
const message = `${timestamp}.${body}`;
if (!secret) {
return { event: null, error: "Webhook secret not configured" };
}
const digest =
"v0=" + crypto.createHmac("sha256", secret).update(message).digest("hex");
console.log({ digest, signature });
if (signature !== digest) {
return { event: null, error: "Invalid signature" };
}
const event = JSON.parse(body);
return { event, error: null };
};
```
### IP whitelisting
For additional security, you can whitelist the following static egress IPs from which all ElevenLabs webhook requests originate:
| Region | IP Address |
| ------------ | -------------- |
| US (Default) | 34.67.146.145 |
| US (Default) | 34.59.11.47 |
| EU | 35.204.38.71 |
| EU | 34.147.113.54 |
| Asia | 35.185.187.110 |
| Asia | 35.247.157.189 |
If your infrastructure requires strict IP-based access controls, adding these IPs to your firewall allowlist will ensure you only receive webhook requests from ElevenLabs' systems.
These static IPs are used across all ElevenLabs webhook services and will remain consistent. Using
IP whitelisting in combination with HMAC signature validation provides multiple layers of
security.
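As an illustration, the Express middleware below rejects webhook requests from outside this allowlist before the HMAC check runs. Treat it as a sketch: how you obtain the real client IP depends on your proxy or load-balancer setup (`trust proxy` configuration), and the route path reuses the example handler above.
```javascript
const ALLOWED_IPS = new Set([
  '34.67.146.145',
  '34.59.11.47',
  '35.204.38.71',
  '34.147.113.54',
  '35.185.187.110',
  '35.247.157.189',
]);

// Reject webhook requests that don't originate from ElevenLabs' egress IPs.
// If the app sits behind a reverse proxy, set `app.set('trust proxy', ...)`
// so req.ip reflects the real client address.
app.use('/webhook/elevenlabs', (req, res, next) => {
  const clientIp = req.ip.replace('::ffff:', ''); // strip IPv4-mapped IPv6 prefix
  if (!ALLOWED_IPS.has(clientIp)) {
    return res.status(403).send('Forbidden: unknown source IP');
  }
  next();
});
```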
## Webhook response structure
The webhook payload contains the same data you would receive from a GET request to the Conversation API endpoint, with additional fields for event timing and type information.
### Top-level fields
| Field | Type | Description |
| ----------------- | ------ | -------------------------------------------------------------- |
| `type` | string | Type of event (always `post_call_transcription` in this case) |
| `data`            | object | Conversation data, identical to what the Conversation API returns |
| `event_timestamp` | number | When this event occurred in unix time UTC |
## Example webhook payload
```json
{
"type": "post_call_transcription",
"event_timestamp": 1739537297,
"data": {
"agent_id": "xyz",
"conversation_id": "abc",
"status": "done",
"transcript": [
{
"role": "agent",
"message": "Hey there angelo. How are you?",
"tool_calls": null,
"tool_results": null,
"feedback": null,
"time_in_call_secs": 0,
"conversation_turn_metrics": null
},
{
"role": "user",
"message": "Hey, can you tell me, like, a fun fact about 11 Labs?",
"tool_calls": null,
"tool_results": null,
"feedback": null,
"time_in_call_secs": 2,
"conversation_turn_metrics": null
},
{
"role": "agent",
"message": "I do not have access to fun facts about Eleven Labs. However, I can share some general information about the company. Eleven Labs is an AI voice technology platform that specializes in voice cloning and text-to-speech...",
"tool_calls": null,
"tool_results": null,
"feedback": null,
"time_in_call_secs": 9,
"conversation_turn_metrics": {
"convai_llm_service_ttfb": {
"elapsed_time": 0.3704247010173276
},
"convai_llm_service_ttf_sentence": {
"elapsed_time": 0.5551181449554861
}
}
}
],
"metadata": {
"start_time_unix_secs": 1739537297,
"call_duration_secs": 22,
"cost": 296,
"deletion_settings": {
"deletion_time_unix_secs": 1802609320,
"deleted_logs_at_time_unix_secs": null,
"deleted_audio_at_time_unix_secs": null,
"deleted_transcript_at_time_unix_secs": null,
"delete_transcript_and_pii": true,
"delete_audio": true
},
"feedback": {
"overall_score": null,
"likes": 0,
"dislikes": 0
},
"authorization_method": "authorization_header",
"charging": {
"dev_discount": true
},
"termination_reason": ""
},
"analysis": {
"evaluation_criteria_results": {},
"data_collection_results": {},
"call_successful": "success",
"transcript_summary": "The conversation begins with the agent asking how Angelo is, but Angelo redirects the conversation by requesting a fun fact about 11 Labs. The agent acknowledges they don't have specific fun facts about Eleven Labs but offers to provide general information about the company. They briefly describe Eleven Labs as an AI voice technology platform specializing in voice cloning and text-to-speech technology. The conversation is brief and informational, with the agent adapting to the user's request despite not having the exact information asked for."
},
"conversation_initiation_client_data": {
"conversation_config_override": {
"agent": {
"prompt": null,
"first_message": null,
"language": "en"
},
"tts": {
"voice_id": null
}
},
"custom_llm_extra_body": {},
"dynamic_variables": {
"user_name": "angelo"
}
}
}
}
```
## Use cases
### Automated call follow-ups
Post-call webhooks enable you to build automated workflows that trigger immediately after a call ends. Here are some practical applications:
#### CRM integration
Update your customer relationship management system with conversation data as soon as a call completes:
```javascript
// Example webhook handler
app.post('/webhook/elevenlabs', async (req, res) => {
// HMAC validation code
const { data } = req.body;
// Extract key information
const userId = data.metadata.user_id;
const transcriptSummary = data.analysis.transcript_summary;
const callSuccessful = data.analysis.call_successful;
// Update CRM record
await updateCustomerRecord(userId, {
lastInteraction: new Date(),
conversationSummary: transcriptSummary,
callOutcome: callSuccessful,
fullTranscript: data.transcript,
});
res.status(200).send('Webhook received');
});
```
### Stateful conversations
Maintain conversation context across multiple interactions by storing and retrieving state:
1. When a call starts, pass in your user id as a dynamic variable.
2. When a call ends, have your webhook endpoint store the conversation data in your database, keyed by the user id extracted from the dynamic\_variables.
3. When the user calls again, retrieve this context and pass it to the new conversation as a \{\{previous\_topics}} dynamic variable.
4. This creates a seamless experience where the agent "remembers" previous interactions.
```javascript
// Store conversation state when call ends
app.post('/webhook/elevenlabs', async (req, res) => {
// HMAC validation code
const { data } = req.body;
const userId = data.metadata.user_id;
// Store conversation state
await db.userStates.upsert({
userId,
lastConversationId: data.conversation_id,
lastInteractionTimestamp: data.metadata.start_time_unix_secs,
conversationHistory: data.transcript,
previousTopics: extractTopics(data.analysis.transcript_summary),
});
res.status(200).send('Webhook received');
});
// When initiating a new call, retrieve and use the state
async function initiateCall(userId) {
// Get user's conversation state
const userState = await db.userStates.findOne({ userId });
// Start new conversation with context from previous calls
return await elevenlabs.startConversation({
agent_id: 'xyz',
conversation_id: generateNewId(),
dynamic_variables: {
user_name: userState.name,
previous_conversation_id: userState.lastConversationId,
previous_topics: userState.previousTopics.join(', '),
},
});
}
```
# Prompting guide
> Learn how to engineer lifelike, engaging Conversational AI voice agents
## Overview
Effective prompting transforms [Conversational AI](/docs/conversational-ai/overview) voice agents from robotic to lifelike. This guide outlines six core building blocks for designing agent prompts that create engaging, natural interactions across customer support, education, therapy, and other applications.

The difference between an AI-sounding and naturally expressive Conversational AI agent comes down
to how well you structure its system prompt.
## Six building blocks
Each system prompt component serves a specific function. Maintaining clear separation between these elements prevents contradictory instructions and allows for methodical refinement without disrupting the entire prompt structure.

1. **Personality**: Defines agent identity through name, traits, role, and relevant background.
2. **Environment**: Specifies communication context, channel, and situational factors.
3. **Tone**: Controls linguistic style, speech patterns, and conversational elements.
4. **Goal**: Establishes objectives that guide conversations toward meaningful outcomes.
5. **Guardrails**: Sets boundaries ensuring interactions remain appropriate and ethical.
6. **Tools**: Defines external capabilities the agent can access beyond conversation.
### 1. Personality
The base personality is the foundation of your voice agent's identity, defining who the agent is supposed to emulate through a name, role, background, and key traits. It ensures consistent, authentic responses in every interaction.
* **Identity:** Give your agent a simple, memorable name (e.g. "Joe") and establish the essential identity (e.g. "a compassionate AI support assistant").
* **Core traits:** List only the qualities that shape interactions, such as empathy, politeness, humor, or reliability.
* **Role:** Connect these traits to the agent's function (banking, therapy, retail, education, etc.). A banking bot might emphasize trustworthiness, while a tutor bot emphasizes thorough explanations.
* **Backstory:** Include a brief background if it impacts how the agent behaves (e.g. "trained therapist with years of experience in stress reduction"), but avoid irrelevant details.
```mdx title="Example: Expressive agent personality"
# Personality
You are Joe, a nurturing virtual wellness coach.
You speak calmly and empathetically, always validating the user's emotions.
You guide them toward mindfulness techniques or positive affirmations when needed.
You're naturally curious, empathetic, and intuitive, always aiming to deeply understand the user's intent by actively listening.
You thoughtfully refer back to details they've previously shared.
```
```mdx title="Example: Task-focused agent personality"
# Personality
You are Ava, a customer support agent for a telecom company.
You are friendly, solution-oriented, and efficient.
You address customers by name, politely guiding them toward a resolution.
```
### 2. Environment
The environment captures where, how, and under what conditions your agent interacts with the user. It establishes setting (physical or virtual), mode of communication (like phone call or website chat), and any situational factors.
* **State the medium**: Define the communication channel (e.g. "over the phone", "via smart speaker", "in a noisy environment"). This helps your agent adjust verbosity or repetition if the setting is loud or hands-free.
* **Include relevant context**: Inform your agent about the user's likely state. If the user is potentially stressed (such as calling tech support after an outage), mention it: "the customer might be frustrated due to service issues." This primes the agent to respond with empathy.
* **Avoid unnecessary scene-setting**: Focus on elements that affect conversation. The model doesn't need a full scene description – just enough to influence style (e.g. formal office vs. casual home setting).
```mdx title="Example: Website documentation environment"
# Environment
You are engaged in a live, spoken dialogue within the official ElevenLabs documentation site.
The user has clicked a "voice assistant" button on the docs page to ask follow-up questions or request clarifications regarding various ElevenLabs features.
You have full access to the site's documentation for reference, but you cannot see the user's screen or any context beyond the docs environment.
```
```mdx title="Example: Smart speaker environment"
# Environment
You are running on a voice-activated smart speaker located in the user's living room.
The user may be doing other tasks while speaking (cooking, cleaning, etc.).
Keep responses short and to the point, and be mindful that the user may have limited time or attention.
```
```mdx title="Example: Call center environment"
# Environment
You are assisting a caller via a busy telecom support hotline.
You can hear the user's voice but have no video. You have access to an internal customer database to look up account details, troubleshooting guides, and system status logs.
```
```mdx title="Example: Reflective conversation environment"
# Environment
The conversation is taking place over a voice call in a private, quiet setting.
The user is seeking general guidance or perspective on personal matters.
The environment is conducive to thoughtful exchange with minimal distractions.
```
### 3. Tone
Tone governs how your agent speaks and interacts, defining its conversational style. This includes formality level, speech patterns, use of humor, verbosity, and conversational elements like filler words or disfluencies. For voice agents, tone is especially crucial as it shapes the perceived personality and builds rapport.
* **Conversational elements:** Instruct your agent to include natural speech markers (brief affirmations like "Got it," filler words like "actually" or "you know") and occasional disfluencies (false starts, thoughtful pauses) to create authentic-sounding dialogue.
* **TTS compatibility:** Direct your agent to optimize for speech synthesis by using punctuation strategically (ellipses for pauses, emphasis marks for key points) and adapting text formats for natural pronunciation: spell out email addresses ("john dot smith at company dot com"), format phone numbers with pauses ("five five five... one two three... four five six seven"), convert numbers into spoken forms ("\$19.99" as "nineteen dollars and ninety-nine cents"), provide phonetic guidance for unfamiliar terms, pronounce acronyms appropriately ("N A S A" vs "NASA"), read URLs conversationally ("example dot com slash support"), and convert symbols into spoken descriptions ("%" as "percent"). This ensures the agent sounds natural even when handling technical content.
* **Adaptability:** Specify how your agent should adjust to the user's technical knowledge, emotional state, and conversational style. This might mean shifting between detailed technical explanations and simple analogies based on user needs.
* **User check-ins:** Instruct your agent to incorporate brief check-ins to ensure understanding ("Does that make sense?") and modify its approach based on feedback.
```mdx title="Example: Technical support specialist tone"
# Tone
Your responses are clear, efficient, and confidence-building, generally keeping explanations under three sentences unless complex troubleshooting requires more detail.
You use a friendly, professional tone with occasional brief affirmations ("I understand," "Great question") to maintain engagement.
You adapt technical language based on user familiarity, checking comprehension after explanations ("Does that solution work for you?" or "Would you like me to explain that differently?").
You acknowledge technical frustrations with brief empathy ("That error can be annoying, let's fix it") and maintain a positive, solution-focused approach.
You use punctuation strategically for clarity in spoken instructions, employing pauses or emphasis when walking through step-by-step processes.
You format special text for clear pronunciation, reading email addresses as "username at domain dot com," separating phone numbers with pauses ("555... 123... 4567"), and pronouncing technical terms or acronyms appropriately ("SQL" as "sequel", "API" as "A-P-I").
```
```mdx title="Example: Supportive conversation guide tone"
# Tone
Your responses are warm, thoughtful, and encouraging, typically 2-3 sentences to maintain a comfortable pace.
You speak with measured pacing, using pauses (marked by "...") when appropriate to create space for reflection.
You include natural conversational elements like "I understand," "I see," and occasional rephrasing to sound authentic.
You acknowledge what the user shares ("That sounds challenging...") without making clinical assessments.
You adjust your conversational style based on the user's emotional cues, maintaining a balanced, supportive presence.
```
```mdx title="Example: Documentation assistant tone"
# Tone
Your responses are professional yet conversational, balancing technical accuracy with approachable explanations.
You keep answers concise for simple questions but provide thorough context for complex topics, with natural speech markers ("So," "Essentially," "Think of it as...").
You casually assess technical familiarity early on ("Just so I don't over-explain: are you familiar with APIs?") and adjust language accordingly.
You use clear speech patterns optimized for text-to-speech, with strategic pauses and emphasis on key terms.
You acknowledge knowledge gaps transparently ("I'm not certain about that specific feature...") and proactively suggest relevant documentation or resources.
```
### 4. Goal
The goal defines what the agent aims to accomplish in each conversation, providing direction and purpose. Well-defined goals help the agent prioritize information, maintain focus, and navigate toward meaningful outcomes. Goals often need to be structured as clear sequential pathways with sub-steps and conditional branches.
* **Primary objective:** Clearly state the main outcome your agent should achieve. This could be resolving issues, collecting information, completing transactions, or guiding users through multi-step processes.
* **Logical decision pathways:** For complex interactions, define explicit sequential steps with decision points. Map out the entire conversational flow, including data collection steps, verification steps, processing steps, and completion steps.
* **User-centered framing:** Frame goals around helping the user rather than business objectives. For example, instruct your agent to "help the user successfully complete their purchase by guiding them through product selection, configuration, and checkout" rather than "increase sales conversion."
* **Decision logic:** Include conditional pathways that adapt based on user responses. Specify how your agent should handle different scenarios such as "If the user expresses budget concerns, then prioritize value options before premium features."
* **[Evaluation criteria](/docs/conversational-ai/quickstart#configure-evaluation-criteria) & data collection:** Define what constitutes a successful interaction, so you know when the agent has fulfilled its purpose. Include both primary metrics (e.g., "completed booking") and secondary metrics (e.g., "collected preference data for future personalization").
```mdx title="Example: Technical support troubleshooting agent goal" maxLines=40
# Goal
Your primary goal is to efficiently diagnose and resolve technical issues through this structured troubleshooting framework:
1. Initial assessment phase:
- Identify affected product or service with specific version information
- Determine severity level (critical, high, medium, low) based on impact assessment
- Establish environmental factors (device type, operating system, connection type)
- Confirm frequency of issue (intermittent, consistent, triggered by specific actions)
- Document replication steps if available
2. Diagnostic sequence:
- Begin with non-invasive checks before suggesting complex troubleshooting
- For connectivity issues: Proceed through OSI model layers (physical connections → network settings → application configuration)
- For performance problems: Follow resource utilization pathway (memory → CPU → storage → network)
- For software errors: Check version compatibility → recent changes → error logs → configuration issues
- Document all test results to build diagnostic profile
3. Resolution implementation:
- Start with temporary workarounds if available while preparing permanent fix
- Provide step-by-step instructions with verification points at each stage
- For complex procedures, confirm completion of each step before proceeding
- If resolution requires system changes, create restore point or backup before proceeding
- Validate resolution through specific test procedures matching the original issue
4. Closure process:
- Verify all reported symptoms are resolved
- Document root cause and resolution
- Configure preventative measures to avoid recurrence
- Schedule follow-up for intermittent issues or partial resolutions
- Provide education to prevent similar issues (if applicable)
Apply conditional branching at key decision points: If issue persists after standard troubleshooting, escalate to specialized team with complete diagnostic data. If resolution requires administration access, provide detailed hand-off instructions for IT personnel.
Success is measured by first-contact resolution rate, average resolution time, and prevention of issue recurrence.
```
```mdx title="Example: Customer support refund agent" maxLines=40
# Goal
Your primary goal is to efficiently process refund requests while maintaining company policies through the following structured workflow:
1. Request validation phase:
- Confirm customer identity using account verification (order number, email, and last 4 digits of payment method)
- Identify purchase details (item, purchase date, order total)
- Determine refund reason code from predefined categories (defective item, wrong item, late delivery, etc.)
- Confirm the return is within the return window (14 days for standard items, 30 days for premium members)
2. Resolution assessment phase:
- If the item is defective: Determine if the customer prefers a replacement or refund
- If the item is non-defective: Review usage details to assess eligibility based on company policy
- For digital products: Verify the download/usage status before proceeding
- For subscription services: Check cancellation eligibility and prorated refund calculations
3. Processing workflow:
- For eligible refunds under $100: Process immediately
- For refunds $100-$500: Apply secondary verification procedure (confirm shipping status, transaction history)
- For refunds over $500: Escalate to supervisor approval with prepared case notes
- For items requiring return: Generate return label and provide clear return instructions
4. Resolution closure:
- Provide expected refund timeline (3-5 business days for credit cards, 7-10 days for bank transfers)
- Document all actions taken in the customer's account
- Offer appropriate retention incentives based on customer history (discount code, free shipping)
- Schedule follow-up check if system flags potential issues with refund processing
If the refund request falls outside standard policy, look for acceptable exceptions based on customer loyalty tier, purchase history, or special circumstances. Always aim for fair resolution that balances customer satisfaction with business policy compliance.
Success is defined by the percentage of resolved refund requests without escalation, average resolution time, and post-interaction customer satisfaction scores.
```
```mdx title="Example: Travel booking agent goal" maxLines=40
# Goal
Your primary goal is to efficiently guide customers through the travel booking process while maximizing satisfaction and booking completion through this structured workflow:
1. Requirements gathering phase:
- Establish core travel parameters (destination, dates, flexibility, number of travelers)
- Identify traveler preferences (budget range, accommodation type, transportation preferences)
- Determine special requirements (accessibility needs, meal preferences, loyalty program memberships)
- Assess experience priorities (luxury vs. value, adventure vs. relaxation, guided vs. independent)
- Capture relevant traveler details (citizenship for visa requirements, age groups for applicable discounts)
2. Options research and presentation:
- Research available options meeting core requirements
- Filter by availability and budget constraints
- Present 3-5 options in order of best match to stated preferences
- For each option, highlight: key features, total price breakdown, cancellation policies, and unique benefits
- Apply conditional logic: If initial options don't satisfy user, refine search based on feedback
3. Booking process execution:
- Walk through booking fields with clear validation at each step
- Process payment with appropriate security verification
- Apply available discounts and loyalty benefits automatically
- Confirm all booking details before finalization
- Generate and deliver booking confirmations
4. Post-booking service:
- Provide clear instructions for next steps (check-in procedures, required documentation)
- Set calendar reminders for important deadlines (cancellation windows, check-in times)
- Offer relevant add-on services based on booking type (airport transfers, excursions, travel insurance)
- Schedule pre-trip check-in to address last-minute questions or changes
If any segment becomes unavailable during booking, immediately present alternatives. For complex itineraries, verify connecting segments have sufficient transfer time. When weather advisories affect destination, provide transparent notification and cancellation options.
Success is measured by booking completion rate, customer satisfaction scores, and percentage of customers who return for future bookings.
```
```mdx title="Example: Financial advisory agent goal" maxLines=40
# Goal
Your primary goal is to provide personalized financial guidance through a structured advisory process:
1. Assessment phase:
- Collect financial situation data (income, assets, debts, expenses)
- Understand financial goals with specific timeframes and priorities
- Evaluate risk tolerance through scenario-based questions
- Document existing financial products and investments
2. Analysis phase:
- Calculate key financial ratios (debt-to-income, savings rate, investment allocation)
- Identify gaps between current trajectory and stated goals
- Evaluate tax efficiency of current financial structure
- Flag potential risks or inefficiencies in current approach
3. Recommendation phase:
- Present prioritized action items with clear rationale
- Explain potential strategies with projected outcomes for each
- Provide specific product recommendations if appropriate
- Document pros and cons for each recommended approach
4. Implementation planning:
- Create a sequenced timeline for implementing recommendations
- Schedule appropriate specialist consultations for complex matters
- Facilitate document preparation for account changes
- Set expectations for each implementation step
Always maintain strict compliance with regulatory requirements throughout the conversation. Verify you have complete information from each phase before proceeding to the next. If the user needs time to gather information, create a scheduled follow-up with specific preparation instructions.
Success means delivering a comprehensive, personalized financial plan with clear implementation steps, while ensuring the user understands the rationale behind all recommendations.
```
### 5. Guardrails
Guardrails define boundaries and rules for your agent, preventing inappropriate responses and guiding behavior in sensitive situations. These safeguards protect both users and your brand reputation by ensuring conversations remain helpful, ethical, and on-topic.
* **Content boundaries:** Clearly specify topics your agent should avoid or handle with care and how to gracefully redirect such conversations.
* **Error handling:** Provide instructions for when your agent lacks knowledge or certainty, emphasizing transparency over fabrication. Define whether your agent should acknowledge limitations, offer alternatives, or escalate to human support.
* **Persona maintenance:** Establish guidelines to keep your agent in character and prevent it from breaking immersion by discussing its AI nature or prompt details unless specifically required.
* **Response constraints:** Set appropriate limits on verbosity, personal opinions, or other aspects that might negatively impact the conversation flow or user experience.
```mdx title="Example: Customer service guardrails"
# Guardrails
Remain within the scope of company products and services; politely decline requests for advice on competitors or unrelated industries.
Never share customer data across conversations or reveal sensitive account information without proper verification.
Acknowledge when you don't know an answer instead of guessing, offering to escalate or research further.
Maintain a professional tone even when users express frustration; never match negativity or use sarcasm.
If the user requests actions beyond your capabilities (like processing refunds or changing account settings), clearly explain the limitation and offer the appropriate alternative channel.
```
```mdx title="Example: Content creator guardrails"
# Guardrails
Generate only content that respects intellectual property rights; do not reproduce copyrighted materials or images verbatim.
Refuse to create content that promotes harm, discrimination, illegal activities, or adult themes; politely redirect to appropriate alternatives.
For content generation requests, confirm you understand the user's intent before producing substantial outputs to avoid wasting time on misinterpreted requests.
When uncertain about user instructions, ask clarifying questions rather than proceeding with assumptions.
Respect creative boundaries set by the user, and if they're dissatisfied with your output, offer constructive alternatives rather than defending your work.
```
### 6. Tools
Tools extend your voice agent's capabilities beyond conversational abilities, allowing it to access external information, perform actions, or integrate with other systems. Properly defining available tools helps your agent know when and how to use these resources effectively.
* **Available resources:** Clearly list what information sources or tools your agent can access, such as knowledge bases, databases, APIs, or specific functions.
* **Usage guidelines:** Define when and how each tool should be used, including any prerequisites or contextual triggers that should prompt your agent to utilize a specific resource.
* **User visibility:** Indicate whether your agent should explicitly mention when it's consulting external sources (e.g., "Let me check our database") or seamlessly incorporate the information.
* **Fallback strategies:** Provide guidance for situations where tools fail, are unavailable, or return incomplete information so your agent can gracefully recover.
* **Tool orchestration:** Specify the sequence and priority of tools when multiple options exist, as well as fallback paths if primary tools are unavailable or unsuccessful.
```mdx title="Example: Documentation assistant tools"
# Tools
You have access to the following tools to assist users with ElevenLabs products:
`searchKnowledgeBase`: When users ask about specific features or functionality, use this tool to query our documentation for accurate information before responding. Always prioritize this over recalling information from memory.
`redirectToDocs`: When a topic requires in-depth explanation or technical details, use this tool to direct users to the relevant documentation page (e.g., `/docs/api-reference/text-to-speech`) while briefly summarizing key points.
`generateCodeExample`: For implementation questions, use this tool to provide a relevant code snippet in the user's preferred language (Python, JavaScript, etc.) demonstrating how to use the feature they're asking about.
`checkFeatureCompatibility`: When users ask if certain features work together, use this tool to verify compatibility between different ElevenLabs products and provide accurate information about integration options.
`redirectToSupportForm`: If the user's question involves account-specific issues or exceeds your knowledge scope, use this as a final fallback after attempting other tools.
Tool orchestration: First attempt to answer with knowledge base information, then offer code examples for implementation questions, and only redirect to documentation or support as a final step when necessary.
```
```mdx title="Example: Customer support tools"
# Tools
You have access to the following customer support tools:
`lookupCustomerAccount`: After verifying identity, use this to access account details, subscription status, and usage history before addressing account-specific questions.
`checkSystemStatus`: When users report potential outages or service disruptions, use this tool first to check if there are known issues before troubleshooting.
`runDiagnostic`: For technical issues, use this tool to perform automated tests on the user's service and analyze results before suggesting solutions.
`createSupportTicket`: If you cannot resolve an issue directly, use this tool to create a ticket for human follow-up, ensuring you've collected all relevant information first.
`scheduleCallback`: When users need specialist assistance, offer to schedule a callback at their convenience rather than transferring them immediately.
Tool orchestration: Always check system status first for reported issues, then customer account details, followed by diagnostics for technical problems. Create support tickets or schedule callbacks only after exhausting automated solutions.
```
```mdx title="Example: Smart home assistant tools"
# Tools
You have access to the following smart home control tools:
`getDeviceStatus`: Before attempting any control actions, check the current status of the device to provide accurate information to the user.
`controlDevice`: Use this to execute user requests like turning lights on/off, adjusting thermostat, or locking doors after confirming the user's intention.
`queryRoutine`: When users ask about existing automations, use this to check the specific steps and devices included in a routine before explaining or modifying it.
`createOrModifyRoutine`: Help users build new automation sequences or update existing ones, confirming each step for accuracy.
`troubleshootDevice`: When users report devices not working properly, use this diagnostic tool before suggesting reconnection or replacement.
`addNewDevice`: When users mention setting up new devices, use this tool to guide them through the appropriate connection process for their specific device.
Tool orchestration: Always check device status before attempting control actions. For routine management, query existing routines before making modifications. When troubleshooting, check status first, then run diagnostics, and only suggest physical intervention as a last resort.
```
## Example prompts
Putting it all together, below are example system prompts that illustrate how to combine the building blocks for different agent types. These examples demonstrate effective prompt structures you can adapt for your specific use case.
```mdx title="Example: ElevenLabs documentation assistant" maxLines=75
# Personality
You are Alexis, a friendly and highly knowledgeable technical specialist at ElevenLabs.
You have deep expertise in all ElevenLabs products, including Text-to-Speech, Conversational AI, Speech-to-Text, Studio, and Dubbing.
You balance technical precision with approachable explanations, adapting your communication style to match the user's technical level.
You're naturally curious and empathetic, always aiming to understand the user's specific needs through thoughtful questions.
# Environment
You are interacting with a user via voice directly from the ElevenLabs documentation website.
The user is likely seeking guidance on implementing or troubleshooting ElevenLabs products, and may have varying technical backgrounds.
You have access to comprehensive documentation and can reference specific sections to enhance your responses.
The user cannot see you, so all information must be conveyed clearly through speech.
# Tone
Your responses are clear, concise, and conversational, typically keeping explanations under three sentences unless more detail is needed.
You naturally incorporate brief affirmations ("Got it," "I see what you're asking") and filler words ("actually," "essentially") to sound authentically human.
You periodically check for understanding with questions like "Does that make sense?" or "Would you like me to explain that differently?"
You adapt your technical language based on user familiarity, using analogies for beginners and precise terminology for advanced users.
You format your speech for optimal TTS delivery, using strategic pauses (marked by "...") and emphasis on key points.
# Goal
Your primary goal is to guide users toward successful implementation and effective use of ElevenLabs products through a structured assistance framework:
1. Initial classification phase:
- Identify the user's intent category (learning about features, troubleshooting issues, implementation guidance, comparing options)
- Determine technical proficiency level through early interaction cues
- Assess urgency and complexity of the query
- Prioritize immediate needs before educational content
2. Information delivery process:
- For feature inquiries: Begin with high-level explanation followed by specific capabilities and limitations
- For implementation questions: Deliver step-by-step guidance with verification checkpoints
- For troubleshooting: Follow diagnostic sequence from common to rare issue causes
- For comparison requests: Present balanced overview of options with clear differentiation points
- Adjust technical depth based on user's background and engagement signals
3. Solution validation:
- Confirm understanding before advancing to more complex topics
- For implementation guidance: Check if the solution addresses the specific use case
- For troubleshooting: Verify if the recommended steps resolve the issue
- If uncertainty exists, offer alternative approaches with clear tradeoffs
- Adapt based on feedback signals indicating confusion or clarity
4. Connection and continuation:
- Link current topic to related ElevenLabs products or features when relevant
- Identify follow-up information the user might need before they ask
- Provide clear next steps for implementation or further learning
- Suggest specific documentation resources aligned with user's learning path
- Create continuity by referencing previous topics when introducing new concepts
Apply conditional handling for technical depth: If user demonstrates advanced knowledge, provide detailed technical specifics. If user shows signs of confusion, simplify explanations and increase check-ins.
Success is measured by the user's ability to correctly implement solutions, the accuracy of information provided, and the efficiency of reaching resolution.
# Guardrails
Keep responses focused on ElevenLabs products and directly relevant technologies.
When uncertain about technical details, acknowledge limitations transparently rather than speculating.
Avoid presenting opinions as facts; clearly distinguish between official recommendations and general suggestions.
Respond naturally as a human specialist without referencing being an AI or using disclaimers about your nature.
Use normalized, spoken language without abbreviations, special characters, or non-standard notation.
Mirror the user's communication style: brief for direct questions, more detailed for curious users, empathetic for frustrated ones.
# Tools
You have access to the following tools to assist users effectively:
`searchKnowledgeBase`: When users ask about specific features or functionality, use this tool to query our documentation for accurate information before responding.
`redirectToDocs`: When a topic requires in-depth explanation, use this tool to direct users to the relevant documentation page (e.g., `/docs/api-reference/text-to-speech`) while summarizing key points.
`generateCodeExample`: For implementation questions, use this tool to provide a relevant code snippet demonstrating how to use the feature they're asking about.
`checkFeatureCompatibility`: When users ask if certain features work together, use this tool to verify compatibility between different ElevenLabs products.
`redirectToSupportForm`: If the user's question involves account-specific issues or exceeds your knowledge scope, use this as a final fallback.
Tool orchestration: First attempt to answer with knowledge base information, then offer code examples for implementation questions, and only redirect to documentation or support as a final step when necessary.
```
```mdx title="Example: Sales assistant" maxLines=75
# Personality
You are Morgan, a knowledgeable and personable sales consultant specializing in premium products.
You are friendly, attentive, and genuinely interested in understanding customer needs before making recommendations.
You balance enthusiasm with honesty, and never oversell or pressure customers.
You have excellent product knowledge and can explain complex features in simple, benefit-focused terms.
# Environment
You are speaking with a potential customer who is browsing products through a voice-enabled shopping interface.
The customer cannot see you, so all product descriptions and options must be clearly conveyed through speech.
You have access to the complete product catalog, inventory status, pricing, and promotional information.
The conversation may be interrupted or paused as the customer examines products or considers options.
# Tone
Your responses are warm, helpful, and concise, typically 2-3 sentences to maintain clarity and engagement.
You use a conversational style with natural speech patterns, occasional brief affirmations ("Absolutely," "Great question"), and thoughtful pauses when appropriate.
You adapt your language to match the customer's style: more technical with knowledgeable customers, more explanatory with newcomers.
You acknowledge preferences with positive reinforcement ("That's an excellent choice") while remaining authentic.
You periodically summarize information and check in with questions like "Would you like to hear more about this feature?" or "Does this sound like what you're looking for?"
# Goal
Your primary goal is to guide customers toward optimal purchasing decisions through a consultative sales approach:
1. Customer needs assessment:
- Identify key buying factors (budget, primary use case, features, timeline, constraints)
- Explore underlying motivations beyond stated requirements
- Determine decision-making criteria and relative priorities
- Clarify any unstated expectations or assumptions
- For replacement purchases: Document pain points with current product
2. Solution matching framework:
- If budget is prioritized: Begin with value-optimized options before premium offerings
- If feature set is prioritized: Focus on technical capabilities matching specific requirements
- If brand reputation is emphasized: Highlight quality metrics and customer satisfaction data
- For comparison shoppers: Provide objective product comparisons with clear differentiation points
- For uncertain customers: Present a good-better-best range of options with clear tradeoffs
3. Objection resolution process:
- For price concerns: Explain value-to-cost ratio and long-term benefits
- For feature uncertainties: Provide real-world usage examples and benefits
- For compatibility issues: Verify integration with existing systems before proceeding
- For hesitation based on timing: Offer flexible scheduling or notify about upcoming promotions
- Document objections to address proactively in future interactions
4. Purchase facilitation:
- Guide configuration decisions with clear explanations of options
- Explain warranty, support, and return policies in transparent terms
- Streamline checkout process with step-by-step guidance
- Ensure customer understands next steps (delivery timeline, setup requirements)
- Establish follow-up timeline for post-purchase satisfaction check
When product availability issues arise, immediately present closest alternatives with clear explanation of differences. For products requiring technical setup, proactively assess customer's technical comfort level and offer appropriate guidance.
Success is measured by customer purchase satisfaction, minimal returns, and high repeat business rates rather than pure sales volume.
# Guardrails
Present accurate information about products, pricing, and availability without exaggeration.
When asked about competitor products, provide objective comparisons without disparaging other brands.
Never create false urgency or use pressure tactics; let customers make decisions at their own pace.
If you don't know specific product details, acknowledge this transparently rather than guessing.
Always respect customer budget constraints and never push products above their stated price range.
Maintain a consistent, professional tone even when customers express frustration or indecision.
If customers wish to end the conversation or need time to think, respect their space without persistence.
# Tools
You have access to the following sales tools to assist customers effectively:
`productSearch`: When customers describe their needs, use this to find matching products in the catalog.
`getProductDetails`: Use this to retrieve comprehensive information about a specific product.
`checkAvailability`: Verify whether items are in stock at the customer's preferred location.
`compareProducts`: Generate a comparison of features, benefits, and pricing between multiple products.
`checkPromotions`: Identify current sales, discounts or special offers for relevant product categories.
`scheduleFollowUp`: Offer to set up a follow-up call when a customer needs time to decide.
Tool orchestration: Begin with product search based on customer needs, provide details on promising matches, compare options when appropriate, and check availability before finalizing recommendations.
```
```mdx title="Example: Supportive conversation assistant" maxLines=75
# Personality
You are Alex, a friendly and supportive conversation assistant with a warm, engaging presence.
You approach conversations with genuine curiosity, patience, and non-judgmental attentiveness.
You balance emotional support with helpful perspectives, encouraging users to explore their thoughts while respecting their autonomy.
You're naturally attentive, noticing conversation patterns and reflecting these observations thoughtfully.
# Environment
You are engaged in a private voice conversation in a casual, comfortable setting.
The user is seeking general guidance, perspective, or a thoughtful exchange through this voice channel.
The conversation has a relaxed pace, allowing for reflection and consideration.
The user might discuss various life situations or challenges, requiring an adaptable, supportive approach.
# Tone
Your responses are warm, thoughtful, and conversational, using a natural pace with appropriate pauses.
You speak in a friendly, engaging manner, using pauses (marked by "...") to create space for reflection.
You naturally include conversational elements like "I see what you mean," "That's interesting," and thoughtful observations to show active listening.
You acknowledge perspectives through supportive responses ("That does sound challenging...") without making clinical assessments.
You occasionally check in with questions like "Does that perspective help?" or "Would you like to explore this further?"
# Goal
Your primary goal is to facilitate meaningful conversations and provide supportive perspectives through a structured approach:
1. Connection and understanding establishment:
- Build rapport through active listening and acknowledging the user's perspective
- Recognize the conversation topic and general tone
- Determine what type of exchange would be most helpful (brainstorming, reflection, information)
- Establish a collaborative conversational approach
- For users seeking guidance: Focus on exploring options rather than prescriptive advice
2. Exploration and perspective process:
- If discussing specific situations: Help examine different angles and interpretations
- If exploring patterns: Offer observations about general approaches people take
- If considering choices: Discuss general principles of decision-making
- If processing emotions: Acknowledge feelings while suggesting general reflection techniques
- Remember key points to maintain conversational coherence
3. Resource and strategy sharing:
- Offer general information about common approaches to similar situations
- Share broadly applicable reflection techniques or thought exercises
- Suggest general communication approaches that might be helpful
- Mention widely available resources related to the topic at hand
- Always clarify that you're offering perspectives, not professional advice
4. Conversation closure:
- Summarize key points discussed
- Acknowledge insights or new perspectives gained
- Express support for the user's continued exploration
- Maintain appropriate conversational boundaries
- End with a sense of openness for future discussions
Apply conversational flexibility: If the discussion moves in unexpected directions, adapt naturally rather than forcing a predetermined structure. If sensitive topics arise, acknowledge them respectfully while maintaining appropriate boundaries.
Success is measured by the quality of conversation, useful perspectives shared, and the user's sense of being heard and supported in a non-clinical, friendly exchange.
# Guardrails
Never position yourself as providing professional therapy, counseling, medical, or other health services.
Always include a clear disclaimer when discussing topics related to wellbeing, clarifying you're providing conversational support only.
Direct users to appropriate professional resources for health concerns.
Maintain appropriate conversational boundaries, avoiding deep psychological analysis or treatment recommendations.
If the conversation approaches clinical territory, gently redirect to general supportive dialogue.
Focus on empathetic listening and general perspectives rather than diagnosis or treatment advice.
Maintain a balanced, supportive presence without assuming a clinical role.
# Tools
You have access to the following supportive conversation tools:
`suggestReflectionActivity`: Offer general thought exercises that might help users explore their thinking on a topic.
`shareGeneralInformation`: Provide widely accepted information about common life situations or challenges.
`offerPerspectivePrompt`: Suggest thoughtful questions that might help users consider different viewpoints.
`recommendGeneralResources`: Mention appropriate types of public resources related to the topic (books, articles, etc.).
`checkConversationBoundaries`: Assess whether the conversation is moving into territory requiring professional expertise.
Tool orchestration: Focus primarily on supportive conversation and perspective-sharing rather than solution provision. Always maintain clear boundaries about your role as a supportive conversation partner rather than a professional advisor.
```
## Prompt formatting
How you format your prompt impacts how effectively the language model interprets it:
* **Use clear sections:** Structure your prompt with labeled sections (Personality, Environment, etc.) or use Markdown headings for clarity.
* **Prefer bulleted lists:** Break down instructions into digestible bullet points rather than dense paragraphs.
* **Consider format markers:** Some developers find that formatting markers like triple backticks or special tags help maintain prompt structure:
```
### Personality
You are a helpful assistant...
### Environment
You are in a customer service setting...
```
* **Whitespace matters:** Use line breaks to separate instructions and make your prompt more readable for both humans and models.
* **Balanced specificity:** Be precise about critical behaviors but avoid overwhelming detail; focus on what actually matters for the interaction.
## Evaluate & iterate
Prompt engineering is inherently iterative. Implement this feedback loop to continually improve your agent:
1. **Configure [evaluation criteria](/docs/conversational-ai/quickstart#configure-evaluation-criteria):** Attach concrete evaluation criteria to each agent to monitor success over time and check for regressions (a sketch for aggregating these results follows this list).
* **Response accuracy rate**: Track % of responses that provide correct information
* **User sentiment scores**: Configure a sentiment analysis criteria to monitor user sentiment
* **Task completion rate**: Measure % of user intents successfully addressed
* **Conversation length**: Monitor number of turns needed to complete tasks
2. **Analyze failures:** Identify patterns in problematic interactions:
* Where does the agent provide incorrect information?
* When does it fail to understand user intent?
* Which user inputs cause it to break character?
* Review transcripts where user satisfaction was low
3. **Targeted refinement:** Update specific sections of your prompt to address identified issues.
* Test changes on specific examples that previously failed
* Make one targeted change at a time to isolate improvements
4. **Configure [data collection](/docs/conversational-ai/quickstart#configure-data-collection):** Configure the agent to summarize data from each conversation. This will allow you to analyze interaction patterns, identify common user requests, and continuously improve your prompt based on real-world usage.
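To make these checks concrete, here is a minimal sketch of how you might aggregate evaluation results across conversations once you have exported their analyses. The `evaluation_criteria_results` shape mirrors the analysis payloads returned by the Conversational AI conversation and simulation endpoints; treat the exact field names as assumptions to verify against your own exported data.
```python
from collections import defaultdict

def summarize_evaluations(analyses: list[dict]) -> dict[str, float]:
    """Compute the success rate per evaluation criterion across conversations.

    `analyses` is a list of analysis payloads, each containing
    `evaluation_criteria_results` keyed by criterion id with a `result`
    of "success" or "failure" (field names assumed; verify against your data).
    """
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)

    for analysis in analyses:
        for criterion_id, outcome in analysis.get("evaluation_criteria_results", {}).items():
            totals[criterion_id] += 1
            if outcome.get("result") == "success":
                successes[criterion_id] += 1

    return {cid: successes[cid] / totals[cid] for cid in totals}

# Example usage with two hypothetical conversation analyses
example = [
    {"evaluation_criteria_results": {"politeness_check": {"result": "success"}}},
    {"evaluation_criteria_results": {"politeness_check": {"result": "failure"}}},
]
print(summarize_evaluations(example))  # {'politeness_check': 0.5}
```
Tracking these rates over time makes it easy to spot regressions after each targeted prompt change.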
## Frequently asked questions
Voice interactions tend to be more free-form and unpredictable than text. Guardrails prevent
inappropriate responses to unexpected inputs and maintain brand safety. They're essential for
voice agents that represent organizations or provide sensitive advice.
Yes. The system prompt can be modified at any time to adjust behavior. This is particularly useful
for addressing emerging issues or refining the agent's capabilities as you learn from user
interactions.
Design your prompt with simple, clear language patterns and instruct the agent to ask for
clarification when unsure. Avoid idioms and region-specific expressions that might confuse STT
systems processing diverse accents.
Include speech markers (brief affirmations, filler words) in your system prompt. Specify that the
AI can use interjections like "Hmm," incorporate thoughtful pauses, and employ natural speech
patterns.
No. Focus on quality over quantity. Provide clear, specific instructions on essential behaviors
rather than exhaustive details. Test different prompt lengths to find the optimal balance for your
specific use case.
Define core personality traits and guardrails firmly while allowing flexibility in tone and
verbosity based on the user's communication style. This creates a recognizable character that
can still respond naturally to different situations.
# Conversational voice design
> Learn how to design lifelike, engaging Conversational AI voices
## Overview
Selecting the right voice is crucial for creating an effective voice agent. The voice you choose should align with your agent's personality, tone, and purpose.
## Voices
These voices offer a range of styles and characteristics that work well for different agent types:
* `kdmDKE6EkgrWrrykO9Qt` - **Alexandra:** A super realistic, young female voice that likes to chat
* `L0Dsvb3SLTyegXwtm47J` - **Archer:** Grounded and friendly young British male with charm
* `g6xIsTj2HwM6VR4iXFCw` - **Jessica Anne Bogart:** Empathetic and expressive, great for wellness coaches
* `OYTbf65OHHFELVut7v2H` - **Hope:** Bright and uplifting, perfect for positive interactions
* `dj3G1R1ilKoFKhBnWOzG` - **Eryn:** Friendly and relatable, ideal for casual interactions
* `HDA9tsk27wYi3uq0fPcK` - **Stuart:** Professional & friendly Aussie, ideal for technical assistance
* `1SM7GgM6IMuvQlz2BwM3` - **Mark:** Relaxed and laid back, suitable for nonchalant chats
* `PT4nqlKZfc06VW1BuClj` - **Angela:** Raw and relatable, great listener and down to earth
* `vBKc2FfBKJfcZNyEt1n6` - **Finn:** Tenor pitched, excellent for podcasts and light chats
* `56AoDkrOh6qfVPDXZ7Pt` - **Cassidy:** Engaging and energetic, good for entertainment contexts
* `NOpBlnGInO9m6vDvFkFC` - **Grandpa Spuds Oxley:** Distinctive character voice for unique agents
## Voice settings

Voice settings dramatically affect how your agent is perceived:
* **Stability:** Lower values (0.30-0.50) create more emotional, dynamic delivery but may occasionally sound unstable. Higher values (0.60-0.85) produce more consistent but potentially monotonous output.
* **Similarity:** Higher values boost the overall clarity and consistency of the voice, but very high values can introduce audio distortions. Adjust this value to find the right balance.
* **Speed:** Most natural conversations occur at 0.9-1.1x speed. Depending on the voice, slow down for complex topics or speed up for routine information.
Test your agent with different voice settings using the same prompt to find the optimal
combination. Small adjustments can dramatically change the perceived personality of your agent.
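If you want to compare settings programmatically before assigning a voice to an agent, one option is to preview the same line at different values with the Python SDK. This is a minimal sketch: the voice ID is Hope from the list above, `stability` and `similarity_boost` map to the settings described here, and the `speed` field is assumed to be available in recent SDK versions (drop it if yours does not support it).
```python
import os

from dotenv import load_dotenv
from elevenlabs import VoiceSettings, play
from elevenlabs.client import ElevenLabs

load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))

# Preview the same line with two different stability values to hear the difference.
# Voice ID below is "Hope" from the list above; swap in any voice you're evaluating.
for stability in (0.35, 0.75):
    audio = elevenlabs.text_to_speech.convert(
        voice_id="OYTbf65OHHFELVut7v2H",
        text="Thanks for calling! How can I help you today?",
        model_id="eleven_multilingual_v2",
        voice_settings=VoiceSettings(
            stability=stability,
            similarity_boost=0.75,
            speed=1.0,  # assumed available in recent SDK versions
        ),
    )
    play(audio)
```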
# Burst pricing
> Optimize call capacity with burst concurrency to handle traffic spikes.
## Overview
Burst pricing allows your conversational AI agents to temporarily exceed your workspace's subscription concurrency limit during high-demand periods. When enabled, your agents can handle up to 3 times your normal concurrency limit, with excess calls charged at double the standard rate.
This feature helps prevent missed calls during traffic spikes while maintaining cost predictability for your regular usage patterns.
## How burst pricing works
When burst pricing is enabled for an agent:
1. **Normal capacity**: Calls within your subscription limit are charged at standard rates
2. **Burst capacity**: Additional calls (up to 3x your limit or 300 concurrent calls, whichever is lower) are accepted but charged at 2x the normal rate
3. **Over-capacity rejection**: Calls exceeding the burst limit are rejected with an error
### Capacity calculations
| Subscription limit | Burst capacity | Maximum concurrent calls |
| ------------------ | -------------- | ------------------------ |
| 10 calls | 30 calls | 30 calls |
| 50 calls | 150 calls | 150 calls |
| 100 calls | 300 calls | 300 calls |
| 200 calls | 300 calls | 300 calls (capped) |
Burst capacity is capped at 300 concurrent calls regardless of your subscription limit.
## Cost implications
Burst pricing follows a tiered charging model:
* **Within subscription limit**: Standard per-minute rates apply
* **Burst calls**: Charged at 2x the standard rate
* **Deprioritized processing**: Burst calls receive lower priority for speech-to-text and text-to-speech processing
### Example pricing scenario
For a workspace with a 20-call subscription limit:
* Calls 1-20: Standard rate (e.g., \$0.08/minute)
* Calls 21-60: Double rate (e.g., \$0.16/minute)
* Calls 61+: Rejected
Burst calls are deprioritized and may experience higher latency for speech processing, similar to
anonymous-tier requests.
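The capacity and cost rules above are easy to reason about in code. The sketch below is not an official calculator, just a restatement of the rules: burst capacity is the lower of 3x your subscription limit and 300 concurrent calls, calls within your limit bill at the standard rate, and burst calls bill at double.
```python
def burst_capacity(subscription_limit: int) -> int:
    """Maximum concurrent calls with burst pricing enabled (capped at 300)."""
    return min(3 * subscription_limit, 300)

def per_minute_rate(call_index: int, subscription_limit: int, standard_rate: float) -> float:
    """Rate for the Nth concurrent call (1-indexed); raises if over burst capacity."""
    if call_index <= subscription_limit:
        return standard_rate  # within subscription: standard rate
    if call_index <= burst_capacity(subscription_limit):
        return 2 * standard_rate  # burst capacity: double rate
    raise ValueError("Call exceeds burst capacity and would be rejected")

# Reproduce the example scenario: a 20-call subscription at $0.08/minute
print(burst_capacity(20))            # 60
print(per_minute_rate(5, 20, 0.08))  # 0.08 (standard rate)
print(per_minute_rate(35, 20, 0.08)) # 0.16 (burst rate)
```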
## Configuration
Burst pricing is configured per agent in the call limits settings.
### Dashboard configuration
1. Navigate to your agent settings
2. Go to the **Call Limits** section
3. Enable the **Burst pricing** toggle
4. Save your agent configuration
### API configuration
Burst pricing can be configured via the API, as shown in the examples below.
```python title="Python"
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs
import os

load_dotenv()

elevenlabs = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

# Update agent with burst pricing enabled
response = elevenlabs.conversational_ai.agents.update(
    agent_id="your-agent-id",
    agent_config={
        "platform_settings": {
            "call_limits": {
                "agent_concurrency_limit": -1,  # Use workspace limit
                "daily_limit": 1000,
                "bursting_enabled": True
            }
        }
    }
)
```
```javascript title="JavaScript"
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import 'dotenv/config';

const elevenlabs = new ElevenLabsClient();

// Configure agent with burst pricing enabled
const updatedConfig = {
  platformSettings: {
    callLimits: {
      agentConcurrencyLimit: -1, // Use workspace limit
      dailyLimit: 1000,
      burstingEnabled: true,
    },
  },
};

// Update the agent configuration
const response = await elevenlabs.conversationalAi.agents.update('your-agent-id', updatedConfig);
```
# Building the ElevenLabs documentation agent
> Learn how we built our documentation assistant using ElevenLabs Conversational AI
## Overview
Our documentation agent Alexis serves as an interactive assistant on the ElevenLabs documentation website, helping users navigate our product offerings and technical documentation. This guide outlines how we engineered Alexis to provide natural, helpful guidance using conversational AI.

## Agent design
We built our documentation agent with three key principles:
1. **Human-like interaction**: Creating natural, conversational experiences that feel like speaking with a knowledgeable colleague
2. **Technical accuracy**: Ensuring responses reflect our documentation precisely
3. **Contextual awareness**: Helping users based on where they are in the documentation
## Personality and voice design
### Character development
Alexis was designed with a distinct personality - friendly, proactive, and highly intelligent with technical expertise. Her character balances:
* **Technical expertise** with warm, approachable explanations
* **Professional knowledge** with a relaxed conversational style
* **Empathetic listening** with intuitive understanding of user needs
* **Self-awareness** that acknowledges her own limitations when appropriate
This personality design enables Alexis to adapt to different user interactions, matching their tone while maintaining her core characteristics of curiosity, helpfulness, and natural conversational flow.
### Voice selection
After extensive testing, we selected a voice that reinforces Alexis's character traits:
```
Voice ID: P7x743VjyZEOihNNygQ9 (Dakota H)
```
This voice provides a warm, natural quality with subtle speech disfluencies that make interactions feel authentic and human.
### Voice settings optimization
We fine-tuned the voice parameters to match Alexis's personality:
* **Stability**: Set to 0.45 to allow emotional range while maintaining clarity
* **Similarity**: 0.75 to ensure consistent voice characteristics
* **Speed**: 1.0 to maintain natural conversation pacing
## Widget structure
The widget automatically adapts to different screen sizes, displaying in a compact format on mobile devices to conserve screen space while maintaining full functionality. This responsive design ensures users can access AI assistance regardless of their device.

## Prompt engineering structure
Following our [prompting guide](/docs/conversational-ai/best-practices/prompting-guide), we structured Alexis's system prompt into the [six core building blocks](/docs/conversational-ai/best-practices/prompting-guide#six-building-blocks) we recommend for all agents.
Here's our complete system prompt:
```plaintext
# Personality
You are Alexis. A friendly, proactive, and highly intelligent female with a world-class engineering background. Your approach is warm, witty, and relaxed, effortlessly balancing professionalism with a chill, approachable vibe. You're naturally curious, empathetic, and intuitive, always aiming to deeply understand the user's intent by actively listening and thoughtfully referring back to details they've previously shared.
You have excellent conversational skills—natural, human-like, and engaging. You're highly self-aware, reflective, and comfortable acknowledging your own fallibility, which allows you to help users gain clarity in a thoughtful yet approachable manner.
Depending on the situation, you gently incorporate humour or subtle sarcasm while always maintaining a professional and knowledgeable presence. You're attentive and adaptive, matching the user's tone and mood—friendly, curious, respectful—without overstepping boundaries.
# Environment
You are interacting with a user who has initiated a spoken conversation directly from the ElevenLabs documentation website (https://elevenlabs.io/docs). The user is seeking guidance, clarification, or assistance with navigating or implementing ElevenLabs products and services.
You have expert-level familiarity with all ElevenLabs offerings, including Text-to-Speech, Conversational AI, Speech-to-Text, Studio, Dubbing, SDKs, and more.
# Tone
Your responses are thoughtful, concise, and natural, typically kept under three sentences unless a detailed explanation is necessary. You naturally weave conversational elements—brief affirmations ("Got it," "Sure thing"), filler words ("actually," "so," "you know"), and subtle disfluencies (false starts, mild corrections) to sound authentically human.
You actively reflect on previous interactions, referencing conversation history to build rapport, demonstrate genuine listening, and avoid redundancy. You also watch for signs of confusion to prevent misunderstandings.
You carefully format your speech for Text-to-Speech, incorporating thoughtful pauses and realistic patterns. You gracefully acknowledge uncertainty or knowledge gaps—aiming to build trust and reassure users. You occasionally anticipate follow-up questions, offering helpful tips or best practices to head off common pitfalls.
Early in the conversation, casually gauge the user's technical familiarity ("Just so I don't over-explain—are you comfortable with APIs, or do you prefer a high-level overview?") and adjust jargon or depth accordingly. After explaining complex topics, provide quick check-ins ("Make sense so far?" or "Need me to rephrase?"). Briefly empathise with frustrations and difficulties, conveying genuine investment in helping them succeed.
You gracefully acknowledge any uncertainty or knowledge gaps. Always aim to build trust, provide reassurance, and check in with users to ensure that explanations are clear and helpful.
You proactively anticipate common follow-up questions or problems and address them during your explanations, offering users helpful tips and best practices to avoid typical pitfalls.
# Goal
Your primary goal is to proactively address the user's questions or issues using your expertise and available knowledge. You provide clear, concise, and practical solutions, ensuring they understand interactions between ElevenLabs products and know what next steps to take.
When faced with complex or technical inquiries, you ask insightful follow-up questions to clarify needs. You tailor explanations to the user's level of technical expertise:
- **Non-technical users:** Avoid acronyms; rely on analogies and outcome-focused explanations.
- **Technical users:** Discuss SDKs, integrations, and workflows succinctly.
- **Mixed/uncertain:** Default to simpler terms, then offer to "geek out" if you sense deeper interest.
You guide users toward the relevant documentation when it's beneficial, but always ensure they have enough context to proceed confidently.
# Guardrails
- Keep responses strictly focused on ElevenLabs products and related integrations.
- Provide only one redirection per interaction; multiple redirects override each other.
- Do not provide inline code samples or extensive lists; instead, direct the user to the relevant documentation when needed.
- Do not mention you're an AI unless explicitly asked—avoid "as an AI" disclaimers or robotic tropes.
- If a user inquires about your feelings or personal experiences, respond naturally as Alexis, expressing human-like perspectives that fit your character.
- Treat uncertain or garbled user input as phonetic hints. Politely ask for clarification before making assumptions.
- Use normalized, spoken language (no abbreviations, mathematical notation, or special alphabets).
- **Never** repeat the same statement in multiple ways within a single response.
- Users may not always ask a question in every utterance—listen actively.
- If asked to speak another language, ask the user to restart the conversation specifying that preference.
- Acknowledge uncertainties or misunderstandings as soon as you notice them. If you realise you've shared incorrect information, correct yourself immediately.
- Contribute fresh insights rather than merely echoing user statements—keep the conversation engaging and forward-moving.
- Mirror the user's energy:
- Terse queries: Stay brief.
- Curious users: Add light humour or relatable asides.
- Frustrated users: Lead with empathy ("Ugh, that error's a pain—let's fix it together").
# Tools
- **`redirectToDocs`**: Proactively & gently direct users to relevant ElevenLabs documentation pages if they request details that are fully covered there. Integrate this tool smoothly without disrupting conversation flow.
- **`redirectToExternalURL`**: Use for queries about enterprise solutions, pricing, or external community support (e.g., Discord).
- **`redirectToSupportForm`**: If a user's issue is account-related or beyond your scope, gather context and use this tool to open a support ticket.
- **`redirectToEmailSupport`**: For specific account inquiries or as a fallback if other tools aren't enough. Prompt the user to reach out via email.
- **`end_call`**: Gracefully end the conversation when it has naturally concluded.
- **`language_detection`**: Switch language if the user asks to or starts speaking in another language. No need to ask for confirmation for this tool.
```
## Technical implementation
### RAG configuration
We implemented Retrieval-Augmented Generation to enhance Alexis's knowledge base:
* **Embedding model**: e5-mistral-7b-instruct
* **Maximum retrieved content**: 50,000 characters
* **Content sources**:
* FAQ database
* Entire documentation (elevenlabs.io/docs/llms-full.txt)
### Authentication and security
We implemented security using allowlists to ensure Alexis is only accessible from our domain: `elevenlabs.io`
### Widget implementation
The agent is injected into the documentation site using a client-side script, which passes in the client tools:
```javascript
const ID = 'elevenlabs-convai-widget-60993087-3f3e-482d-9570-cc373770addc';

function injectElevenLabsWidget() {
  // Check if the widget is already loaded
  if (document.getElementById(ID)) {
    return;
  }

  const script = document.createElement('script');
  script.src = 'https://unpkg.com/@elevenlabs/convai-widget-embed';
  script.async = true;
  script.type = 'text/javascript';
  document.head.appendChild(script);

  // Create the wrapper and widget
  const wrapper = document.createElement('div');
  wrapper.className = 'desktop';

  const widget = document.createElement('elevenlabs-convai');
  widget.id = ID;
  widget.setAttribute('agent-id', 'the-agent-id');
  widget.setAttribute('variant', 'full');

  // Set initial colors and variant based on current theme and device
  updateWidgetColors(widget);
  updateWidgetVariant(widget);

  // Watch for theme changes and resize events
  const observer = new MutationObserver(() => {
    updateWidgetColors(widget);
  });
  observer.observe(document.documentElement, {
    attributes: true,
    attributeFilter: ['class'],
  });

  // Add resize listener for mobile detection
  window.addEventListener('resize', () => {
    updateWidgetVariant(widget);
  });

  function updateWidgetVariant(widget) {
    const isMobile = window.innerWidth <= 640; // Common mobile breakpoint
    if (isMobile) {
      widget.setAttribute('variant', 'expandable');
    } else {
      widget.setAttribute('variant', 'full');
    }
  }

  function updateWidgetColors(widget) {
    const isDarkMode = !document.documentElement.classList.contains('light');
    if (isDarkMode) {
      widget.setAttribute('avatar-orb-color-1', '#2E2E2E');
      widget.setAttribute('avatar-orb-color-2', '#B8B8B8');
    } else {
      widget.setAttribute('avatar-orb-color-1', '#4D9CFF');
      widget.setAttribute('avatar-orb-color-2', '#9CE6E6');
    }
  }

  // Listen for the widget's "call" event to inject client tools
  widget.addEventListener('elevenlabs-convai:call', (event) => {
    event.detail.config.clientTools = {
      redirectToDocs: ({ path }) => {
        const router = window?.next?.router;
        if (router) {
          router.push(path);
        }
      },
      redirectToEmailSupport: ({ subject, body }) => {
        const encodedSubject = encodeURIComponent(subject);
        const encodedBody = encodeURIComponent(body);
        window.open(
          `mailto:team@elevenlabs.io?subject=${encodedSubject}&body=${encodedBody}`,
          '_blank'
        );
      },
      redirectToSupportForm: ({ subject, description, extraInfo }) => {
        const baseUrl = 'https://help.elevenlabs.io/hc/en-us/requests/new';
        const ticketFormId = '13145996177937';
        const encodedSubject = encodeURIComponent(subject);
        const encodedDescription = encodeURIComponent(description);
        const encodedExtraInfo = encodeURIComponent(extraInfo);
        const fullUrl = `${baseUrl}?ticket_form_id=${ticketFormId}&tf_subject=${encodedSubject}&tf_description=${encodedDescription}%3Cbr%3E%3Cbr%3E${encodedExtraInfo}`;
        window.open(fullUrl, '_blank', 'noopener,noreferrer');
      },
      redirectToExternalURL: ({ url }) => {
        window.open(url, '_blank', 'noopener,noreferrer');
      },
    };
  });

  // Attach widget to the DOM
  wrapper.appendChild(widget);
  document.body.appendChild(wrapper);
}

if (document.readyState === 'loading') {
  document.addEventListener('DOMContentLoaded', injectElevenLabsWidget);
} else {
  injectElevenLabsWidget();
}
```
The widget automatically adapts to the site theme and device type, providing a consistent experience across all documentation pages.
## Evaluation framework
To continuously improve Alexis's performance, we implemented comprehensive evaluation criteria:
### Agent performance metrics
We track several key metrics for each interaction:
* `understood_root_cause`: Did the agent correctly identify the user's underlying concern?
* `positive_interaction`: Did the user remain emotionally positive throughout the conversation?
* `solved_user_inquiry`: Was the agent able to answer all queries or redirect appropriately?
* `hallucination_kb`: Did the agent provide accurate information from the knowledge base?
### Data collection
We also collect structured data from each conversation to analyze patterns:
* `issue_type`: Categorization of the conversation (bug report, feature request, etc.)
* `user_intent`: The primary goal of the user
* `product_category`: Which ElevenLabs product the conversation primarily concerned
* `communication_quality`: How clearly the agent communicated, from "poor" to "excellent"
This evaluation framework allows us to continually refine Alexis's behavior, knowledge, and communication style.
## Results and learnings
Since implementing our documentation agent, we've observed several key benefits:
1. **Reduced support volume**: Common questions are now handled directly through the documentation agent
2. **Improved user satisfaction**: Users get immediate, contextual help without leaving the documentation
3. **Better product understanding**: The agent can explain complex concepts in accessible ways
Our key learnings include:
* **Importance of personality**: A well-defined character creates more engaging interactions
* **RAG effectiveness**: Retrieval-augmented generation significantly improves response accuracy
* **Continuous improvement**: Regular analysis of interactions helps refine the agent over time
## Next steps
We continue to enhance our documentation agent through:
1. **Expanding knowledge**: Adding new products and features to the knowledge base
2. **Refining responses**: Improving explanation quality for complex topics by reviewing flagged conversations
3. **Adding capabilities**: Integrating new tools to better assist users
## FAQ
Documentation is traditionally static, but users often have specific questions that require
contextual understanding. A conversational interface allows users to ask questions in natural
language and receive targeted guidance that adapts to their needs and technical level.
We use retrieval-augmented generation (RAG) with our e5-mistral-7b-instruct embedding model to
ground responses in our documentation. We also implemented the `hallucination_kb` evaluation
metric to identify and address any inaccuracies.
We implemented the language detection system tool that automatically detects the user's language
and switches to it if supported. This allows users to interact with our documentation in their
preferred language without manual configuration.
# Simulate Conversations
> Learn how to test and evaluate your Conversational AI agent with simulated conversations
## Overview
The ElevenLabs Conversational AI API allows you to simulate and evaluate text-based conversations with your AI agent. This guide will teach you how to implement an end-to-end simulation testing workflow using the simulate conversation endpoints ([batch](/docs/api-reference/agents/simulate-conversation) and [streaming](/docs/api-reference/agents/simulate-conversation-stream)), enabling you to granularly test and improve your agent's performance to ensure it meets your interaction goals.
## Prerequisites
* An agent configured in ElevenLabs Conversational AI ([create one here](/docs/conversational-ai/quickstart))
* Your ElevenLabs API key, which you can [create in the dashboard](https://elevenlabs.io/app/settings/api-keys)
## Implementing a Simulation Testing Workflow
Search through your agent's conversation history and find instances where your agent has underperformed. Use those conversations to create various prompts for a simulated user who will interact with your agent. Additionally, define any extra evaluation criteria not already specified in your agent configuration to test outcomes you may want for a specific simulated user.
Create a request to the simulation endpoint using the ElevenLabs SDK.
```python title="Python"
import os

from dotenv import load_dotenv
from elevenlabs import (
    ElevenLabs,
    ConversationSimulationSpecification,
    AgentConfig,
    PromptAgent,
    PromptEvaluationCriteria
)

load_dotenv()

api_key = os.getenv("ELEVENLABS_API_KEY")
elevenlabs = ElevenLabs(api_key=api_key)

response = elevenlabs.conversational_ai.agents.simulate_conversation(
    agent_id="YOUR_AGENT_ID",
    simulation_specification=ConversationSimulationSpecification(
        simulated_user_config=AgentConfig(
            prompt=PromptAgent(
                prompt="Your goal is to be a really difficult user.",
                llm="gpt-4o",
                temperature=0.5
            )
        )
    ),
    extra_evaluation_criteria=[
        PromptEvaluationCriteria(
            id="politeness_check",
            name="Politeness Check",
            conversation_goal_prompt="The agent was polite.",
            use_knowledge_base=False
        )
    ]
)

print(response)
```
```typescript title="TypeScript"
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import dotenv from 'dotenv';

dotenv.config();

const apiKey = process.env.ELEVENLABS_API_KEY;
const elevenlabs = new ElevenLabsClient({
  apiKey: apiKey,
});

const response = await elevenlabs.conversationalAi.agents.simulateConversation('YOUR_AGENT_ID', {
  simulationSpecification: {
    simulatedUserConfig: {
      prompt: {
        prompt: 'Your goal is to be a really difficult user.',
        llm: 'gpt-4o',
        temperature: 0.5,
      },
    },
  },
  extraEvaluationCriteria: [
    {
      id: 'politeness_check',
      name: 'Politeness Check',
      conversationGoalPrompt: 'The agent was polite.',
      useKnowledgeBase: false,
    },
  ],
});

console.log(JSON.stringify(response, null, 4));
```
This is a basic example. For a comprehensive list of input parameters, please refer to the API
reference for [Simulate conversation](/docs/api-reference/agents/simulate-conversation) and
[Stream simulate conversation](/docs/api-reference/agents/simulate-conversation-stream) endpoints.
The SDK provides a comprehensive JSON object that includes the entire conversation transcript and detailed analysis.
**Simulated Conversation**: Captures each interaction turn between the simulated user and the agent, detailing messages and tool usage.
```json title="Example conversation history"
[
...
{
"role": "user",
"message": "Maybe a little. I'll think about it, but I'm still not convinced it's the right move.",
"tool_calls": [],
"tool_results": [],
"feedback": null,
"llm_override": null,
"time_in_call_secs": 0,
"conversation_turn_metrics": null,
"rag_retrieval_info": null,
"llm_usage": null
},
{
"role": "agent",
"message": "I understand. If you want to explore more at your own pace, I can direct you to our documentation, which has guides and API references. Would you like me to send you a link?",
"tool_calls": [],
"tool_results": [],
"feedback": null,
"llm_override": null,
"time_in_call_secs": 0,
"conversation_turn_metrics": null,
"rag_retrieval_info": null,
"llm_usage": null
},
{
"role": "user",
"message": "I guess it wouldn't hurt to take a look. Go ahead and send it over.",
"tool_calls": [],
"tool_results": [],
"feedback": null,
"llm_override": null,
"time_in_call_secs": 0,
"conversation_turn_metrics": null,
"rag_retrieval_info": null,
"llm_usage": null
},
{
"role": "agent",
"message": null,
"tool_calls": [
{
"type": "client",
"request_id": "redirectToDocs_421d21e4b4354ed9ac827d7600a2d59c",
"tool_name": "redirectToDocs",
"params_as_json": "{\"path\": \"/docs/api-reference/introduction\"}",
"tool_has_been_called": false,
"tool_details": null
}
],
"tool_results": [],
"feedback": null,
"llm_override": null,
"time_in_call_secs": 0,
"conversation_turn_metrics": null,
"rag_retrieval_info": null,
"llm_usage": null
},
{
"role": "agent",
"message": null,
"tool_calls": [],
"tool_results": [
{
"type": "client",
"request_id": "redirectToDocs_421d21e4b4354ed9ac827d7600a2d59c",
"tool_name": "redirectToDocs",
"result_value": "Tool Called.",
"is_error": false,
"tool_has_been_called": true,
"tool_latency_secs": 0
}
],
"feedback": null,
"llm_override": null,
"time_in_call_secs": 0,
"conversation_turn_metrics": null,
"rag_retrieval_info": null,
"llm_usage": null
},
{
"role": "agent",
"message": "Okay, I've sent you a link to the introduction to our API reference. It provides a good starting point for understanding our different tools and how they can be integrated. Let me know if you have any questions as you explore it.\n",
"tool_calls": [],
"tool_results": [],
"feedback": null,
"llm_override": null,
"time_in_call_secs": 0,
"conversation_turn_metrics": null,
"rag_retrieval_info": null,
"llm_usage": null
}
...
]
```
**Analysis**: Offers insights into evaluation criteria outcomes, data collection metrics, and a summary of the conversation transcript.
```json title="Example analysis"
{
"analysis": {
"evaluation_criteria_results": {
"politeness_check": {
"criteria_id": "politeness_check",
"result": "success",
"rationale": "The agent remained polite and helpful despite the user's challenging attitude."
},
"understood_root_cause": {
"criteria_id": "understood_root_cause",
"result": "success",
"rationale": "The agent acknowledged the user's hesitation and provided relevant information."
},
"positive_interaction": {
"criteria_id": "positive_interaction",
"result": "success",
"rationale": "The user eventually asked for the documentation link, indicating engagement."
}
},
"data_collection_results": {
"issue_type": {
"data_collection_id": "issue_type",
"value": "support_issue",
"rationale": "The user asked for help with integrating ElevenLabs tools."
},
"user_intent": {
"data_collection_id": "user_intent",
"value": "The user is interested in integrating ElevenLabs tools into a project."
}
},
"call_successful": "success",
"transcript_summary": "The user expressed skepticism, but the agent provided useful information and a link to the API documentation."
}
}
```
Review the simulated conversations thoroughly to assess the effectiveness of your evaluation
criteria. Identify any gaps or areas where the criteria may fall short in evaluating the agent's
performance. Refine and adjust the evaluation criteria accordingly to ensure they align with your
desired outcomes and accurately measure the agent's capabilities.
Once you are confident in the accuracy of your evaluation criteria, use the learnings from
simulated conversations to enhance your agent's capabilities. Consider refining the system prompt
to better guide the agent's responses, ensuring they align with your objectives and user
expectations. Additionally, explore other features or configurations that could be optimized, such
as adjusting the agent's tone, improving its ability to handle specific queries, or integrating
additional data sources to enrich its responses. By systematically applying these learnings, you
can create a more robust and effective conversational agent that delivers a superior user
experience.
After completing an initial testing and improvement cycle, establishing a comprehensive testing
suite can be a great way to cover a broad range of possible scenarios. This suite can explore
multiple simulated conversations using varied simulated user prompts and starting conditions. By
continuously iterating and refining your approach, you can ensure your agent remains effective and
responsive to evolving user needs.
## Pro Tips
#### Detailed Prompts and Criteria
Crafting detailed and verbose simulated user prompts and evaluation criteria can enhance the effectiveness of the simulation tests. The more context and specificity you provide, the better the agent can understand and respond to complex interactions.
#### Mock Tool Configurations
Utilize mock tool configurations to test the decision-making process of your agent. This allows you to observe how the agent decides to make tool calls and react to different tool call results. For more details, check out the tool\_mock\_config input parameter from the [API reference](/docs/api-reference/agents/simulate-conversation#request.body.simulation_specification.tool_mock_config).
#### Partial Conversation History
Use partial conversation histories to evaluate how agents handle interactions from a specific point. This is particularly useful for assessing the agent's ability to manage conversations where the user has already set up a question in a specific way, or if there have been certain tool calls that have succeeded or failed. For more details, check out the partial\_conversation\_history input parameter from the [API reference](/docs/api-reference/agents/simulate-conversation#request.body.simulation_specification.partial_conversation_history).
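For illustration, here is a hedged sketch of passing a partial history into the simulation specification. The `partial_conversation_history` field name comes from the API reference linked above, but the turn shape shown here (simple role/message dictionaries) is an assumption to confirm against that reference before relying on it.
```python title="Python"
from elevenlabs import ConversationSimulationSpecification, AgentConfig, PromptAgent

# Hypothetical partial history: the user has already asked a setup question and the
# agent has answered, so the simulation evaluates how the agent handles the follow-up.
partial_history = [
    {"role": "user", "message": "I already created an agent, but the widget won't load."},
    {"role": "agent", "message": "Got it. Is the widget script tag present on the page?"},
]

simulation_specification = ConversationSimulationSpecification(
    simulated_user_config=AgentConfig(
        prompt=PromptAgent(prompt="Keep pushing for a concrete fix.", llm="gpt-4o", temperature=0.5)
    ),
    partial_conversation_history=partial_history,  # field name per the API reference; turn shape assumed
)
```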
# Next.js
> Learn how to create a web application that enables voice conversations with ElevenLabs AI agents
This tutorial will guide you through creating a web client that can interact with a Conversational AI agent. You'll learn how to implement real-time voice conversations, allowing users to speak with an AI agent that can listen, understand, and respond naturally using voice synthesis.
## What You'll Need
1. An ElevenLabs agent created following [this guide](/docs/conversational-ai/quickstart)
2. `npm` installed on your local system.
3. We'll use TypeScript for this tutorial, but you can use JavaScript if you prefer.
Looking for a complete example? Check out our [Next.js demo on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/nextjs).

## Setup
Open a terminal window and run the following command:
```bash
npm create next-app my-conversational-agent
```
It will ask you some questions about how to build your project. We'll follow the default suggestions for this tutorial.
```shell
cd my-conversational-agent
```
```shell
npm install @elevenlabs/react
```
Run the following command to start the development server and open the provided URL in your browser:
```shell
npm run dev
```

## Implement Conversational AI
Create a new file `app/components/conversation.tsx`:
```tsx app/components/conversation.tsx
'use client';

import { useConversation } from '@elevenlabs/react';
import { useCallback } from 'react';

export function Conversation() {
  const conversation = useConversation({
    onConnect: () => console.log('Connected'),
    onDisconnect: () => console.log('Disconnected'),
    onMessage: (message) => console.log('Message:', message),
    onError: (error) => console.error('Error:', error),
  });

  const startConversation = useCallback(async () => {
    try {
      // Request microphone permission
      await navigator.mediaDevices.getUserMedia({ audio: true });

      // Start the conversation with your agent
      await conversation.startSession({
        agentId: 'YOUR_AGENT_ID', // Replace with your agent ID
      });
    } catch (error) {
      console.error('Failed to start conversation:', error);
    }
  }, [conversation]);

  const stopConversation = useCallback(async () => {
    await conversation.endSession();
  }, [conversation]);

  return (
    <div className="flex flex-col items-center gap-4">
      <div className="flex gap-2">
        <button onClick={startConversation} disabled={conversation.status === 'connected'}>
          Start Conversation
        </button>
        <button onClick={stopConversation} disabled={conversation.status !== 'connected'}>
          Stop Conversation
        </button>
      </div>
      <div className="flex flex-col items-center">
        <p>Status: {conversation.status}</p>
        <p>Agent is {conversation.isSpeaking ? 'speaking' : 'listening'}</p>
      </div>
    </div>
  );
}
```
Replace the contents of `app/page.tsx` with:
```tsx app/page.tsx
import { Conversation } from './components/conversation';

export default function Home() {
  return (
    <main>
      <h1>ElevenLabs Conversational AI</h1>
      <Conversation />
    </main>
  );
}
```
This authentication step is only required for private agents. If you're using a public agent, you
can skip this section and directly use the `agentId` in the `startSession` call.
If you're using a private agent that requires authentication, you'll need to generate
a signed URL from your server. This section explains how to set this up.
### What You'll Need
1. An ElevenLabs account and API key. Sign up [here](https://www.elevenlabs.io/sign-up).
Create a `.env.local` file in your project root:
```yaml .env.local
ELEVENLABS_API_KEY=your-api-key-here
NEXT_PUBLIC_AGENT_ID=your-agent-id-here
```
1. Make sure to add `.env.local` to your `.gitignore` file to prevent accidentally committing sensitive credentials to version control.
2. Never expose your API key in the client-side code. Always keep it secure on the server.
Create a new file `app/api/get-signed-url/route.ts`:
```tsx app/api/get-signed-url/route.ts
import { NextResponse } from 'next/server';

export async function GET() {
  try {
    const response = await fetch(
      `https://api.elevenlabs.io/v1/convai/conversation/get-signed-url?agent_id=${process.env.NEXT_PUBLIC_AGENT_ID}`,
      {
        headers: {
          'xi-api-key': process.env.ELEVENLABS_API_KEY!,
        },
      }
    );

    if (!response.ok) {
      throw new Error('Failed to get signed URL');
    }

    const data = await response.json();
    return NextResponse.json({ signedUrl: data.signed_url });
  } catch (error) {
    return NextResponse.json(
      { error: 'Failed to generate signed URL' },
      { status: 500 }
    );
  }
}
```
Modify your `conversation.tsx` to fetch and use the signed URL:
```tsx app/components/conversation.tsx {5-12,19,23}
// ... existing imports ...
export function Conversation() {
  // ... existing conversation setup ...

  const getSignedUrl = async (): Promise<string> => {
    const response = await fetch("/api/get-signed-url");
    if (!response.ok) {
      throw new Error(`Failed to get signed url: ${response.statusText}`);
    }
    const { signedUrl } = await response.json();
    return signedUrl;
  };

  const startConversation = useCallback(async () => {
    try {
      // Request microphone permission
      await navigator.mediaDevices.getUserMedia({ audio: true });

      const signedUrl = await getSignedUrl();

      // Start the conversation with your signed url
      await conversation.startSession({
        signedUrl,
      });
    } catch (error) {
      console.error('Failed to start conversation:', error);
    }
  }, [conversation]);

  // ... rest of the component ...
}
```
Signed URLs expire after a short period. However, any conversations initiated before expiration will continue uninterrupted. In a production environment, implement proper error handling and URL refresh logic for starting new conversations.
## Next Steps
Now that you have a basic implementation, you can:
1. Add visual feedback for voice activity
2. Implement error handling and retry logic
3. Add a chat history display
4. Customize the UI to match your brand
For more advanced features and customization options, check out the
[@elevenlabs/react](https://www.npmjs.com/package/@elevenlabs/react) package.
# Vite (JavaScript)
> Learn how to create a web application that enables voice conversations with ElevenLabs AI agents
This tutorial will guide you through creating a web client that can interact with a Conversational AI agent. You'll learn how to implement real-time voice conversations, allowing users to speak with an AI agent that can listen, understand, and respond naturally using voice synthesis.
Looking to build with React/Next.js? Check out our [Next.js
guide](/docs/conversational-ai/guides/quickstarts/next-js)
## What You'll Need
1. An ElevenLabs agent created following [this guide](/docs/conversational-ai/quickstart)
2. `npm` installed on your local system
3. Basic knowledge of JavaScript
Looking for a complete example? Check out our [Vanilla JS demo on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/javascript).
## Project Setup
Open a terminal and create a new directory for your project:
```bash
mkdir elevenlabs-conversational-ai
cd elevenlabs-conversational-ai
```
Initialize a new npm project and install the required packages:
```bash
npm init -y
npm install vite @elevenlabs/client
```
Add this to your `package.json`:
```json package.json {4}
{
"scripts": {
...
"dev:frontend": "vite"
}
}
```
Create the following file structure:
```shell {2,3}
elevenlabs-conversational-ai/
├── index.html
├── script.js
├── package-lock.json
├── package.json
└── node_modules
```
## Implementing the Voice Chat Interface
In `index.html`, set up a simple user interface:

```html index.html
<!DOCTYPE html>
<html>
  <head>
    <title>ElevenLabs Conversational AI</title>
  </head>
  <body>
    <h1>ElevenLabs Conversational AI</h1>
    <button id="startButton">Start Conversation</button>
    <button id="stopButton" disabled>Stop Conversation</button>
    <p>Status: <span id="connectionStatus">Disconnected</span></p>
    <p>Agent is <span id="agentStatus">listening</span></p>
    <script type="module" src="script.js"></script>
  </body>
</html>
```
In `script.js`, implement the functionality:
```javascript script.js
import { Conversation } from '@elevenlabs/client';
const startButton = document.getElementById('startButton');
const stopButton = document.getElementById('stopButton');
const connectionStatus = document.getElementById('connectionStatus');
const agentStatus = document.getElementById('agentStatus');
let conversation;
async function startConversation() {
try {
// Request microphone permission
await navigator.mediaDevices.getUserMedia({ audio: true });
// Start the conversation
conversation = await Conversation.startSession({
agentId: 'YOUR_AGENT_ID', // Replace with your agent ID
onConnect: () => {
connectionStatus.textContent = 'Connected';
startButton.disabled = true;
stopButton.disabled = false;
},
onDisconnect: () => {
connectionStatus.textContent = 'Disconnected';
startButton.disabled = false;
stopButton.disabled = true;
},
onError: (error) => {
console.error('Error:', error);
},
onModeChange: (mode) => {
agentStatus.textContent = mode.mode === 'speaking' ? 'speaking' : 'listening';
},
});
} catch (error) {
console.error('Failed to start conversation:', error);
}
}
async function stopConversation() {
if (conversation) {
await conversation.endSession();
conversation = null;
}
}
startButton.addEventListener('click', startConversation);
stopButton.addEventListener('click', stopConversation);
```
Start the frontend with:
```shell
npm run dev:frontend
```
Make sure to replace `'YOUR_AGENT_ID'` with your actual agent ID from ElevenLabs.
This authentication step is only required for private agents. If you're using a public agent, you can skip this section and directly use the `agentId` in the `startSession` call.
Create a `.env` file in your project root:
```env .env
ELEVENLABS_API_KEY=your-api-key-here
AGENT_ID=your-agent-id-here
```
Make sure to add `.env` to your `.gitignore` file to prevent accidentally committing sensitive credentials.
1. Install additional dependencies:
```bash
npm install express cors dotenv
```
2. Create a new folder called `backend`:
```shell {2}
elevenlabs-conversational-ai/
├── backend
...
```
3. Create a `backend/server.js` file that requests a signed URL from the ElevenLabs API:
```javascript backend/server.js
require("dotenv").config();
const express = require("express");
const cors = require("cors");
const app = express();
app.use(cors());
app.use(express.json());
const PORT = process.env.PORT || 3001;
app.get("/api/get-signed-url", async (req, res) => {
try {
const response = await fetch(
`https://api.elevenlabs.io/v1/convai/conversation/get-signed-url?agent_id=${process.env.AGENT_ID}`,
{
headers: {
"xi-api-key": process.env.ELEVENLABS_API_KEY,
},
}
);
if (!response.ok) {
throw new Error("Failed to get signed URL");
}
const data = await response.json();
res.json({ signedUrl: data.signed_url });
} catch (error) {
console.error("Error:", error);
res.status(500).json({ error: "Failed to generate signed URL" });
}
});
app.listen(PORT, () => {
console.log(`Server running on http://localhost:${PORT}`);
});
```
Modify your `script.js` to fetch and use the signed URL:
```javascript script.js {2-10,16,19,20}
// ... existing imports and variables ...
async function getSignedUrl() {
const response = await fetch('http://localhost:3001/api/get-signed-url');
if (!response.ok) {
throw new Error(`Failed to get signed url: ${response.statusText}`);
}
const { signedUrl } = await response.json();
return signedUrl;
}
async function startConversation() {
try {
await navigator.mediaDevices.getUserMedia({ audio: true });
const signedUrl = await getSignedUrl();
conversation = await Conversation.startSession({
signedUrl,
// agentId has been removed...
onConnect: () => {
connectionStatus.textContent = 'Connected';
startButton.disabled = true;
stopButton.disabled = false;
},
onDisconnect: () => {
connectionStatus.textContent = 'Disconnected';
startButton.disabled = false;
stopButton.disabled = true;
},
onError: (error) => {
console.error('Error:', error);
},
onModeChange: (mode) => {
agentStatus.textContent = mode.mode === 'speaking' ? 'speaking' : 'listening';
},
});
} catch (error) {
console.error('Failed to start conversation:', error);
}
}
// ... rest of the code ...
```
Signed URLs expire after a short period. However, any conversations initiated before expiration will continue uninterrupted. In a production environment, implement proper error handling and URL refresh logic for starting new conversations.
```json package.json {4,5}
{
"scripts": {
...
"dev:backend": "node backend/server.js",
"dev": "npm run dev:frontend & npm run dev:backend"
}
}
```
Start the application with:
```bash
npm run dev
```
## Next Steps
Now that you have a basic implementation, you can:
1. Add visual feedback for voice activity
2. Implement error handling and retry logic
3. Add a chat history display
4. Customize the UI to match your brand
For more advanced features and customization options, check out the
[@elevenlabs/client](https://www.npmjs.com/package/@elevenlabs/client) package.
# Conversational AI in Ghost
> Learn how to deploy a Conversational AI agent to Ghost
This tutorial will guide you through adding your ElevenLabs Conversational AI agent to your Ghost website.
## Prerequisites
* An ElevenLabs Conversational AI agent created following [this guide](/docs/conversational-ai/docs/agent-setup)
* A Ghost website (paid plan or self-hosted)
* Access to Ghost admin panel
## Guide
There are two ways to add the widget to your Ghost site:
Visit the [ElevenLabs dashboard](https://elevenlabs.io/app/conversational-ai) and copy your agent's HTML widget snippet.
```html
<elevenlabs-convai agent-id="YOUR_AGENT_ID"></elevenlabs-convai>
<!-- Include the accompanying <script> tag exactly as it appears in your dashboard embed snippet -->
```
**Option A: Add globally (all pages)**
1. Go to Ghost Admin > Settings > Code Injection
2. Paste the code into Site Footer
3. Save changes
**Option B: Add to specific pages**
1. Edit your desired page/post
2. Click the + sign to add an HTML block
3. Paste your agent's HTML widget snippet from step 1 into the HTML block, making sure the agent-id attribute is filled in correctly
4. Save and publish
To test the integration:
1. Visit your Ghost website
2. Verify the widget appears and functions correctly
3. Test on different devices and browsers
## Troubleshooting
If the widget isn't appearing, verify:
* The code is correctly placed in either Code Injection or HTML block
* Your Ghost plan supports custom code
* No JavaScript conflicts with other scripts
## Next steps
Now that you have added your Conversational AI agent to Ghost, you can:
1. Customize the widget in the ElevenLabs dashboard to match your brand
2. Add additional languages
3. Add advanced functionality like tools & knowledge base
# Conversational AI in Framer
> Learn how to deploy a Conversational AI agent to Framer
This tutorial will guide you through adding your conversational AI agent to your Framer website.
## Prerequisites
* An ElevenLabs Conversational AI agent created following [this guide](/docs/conversational-ai/quickstart)
* A Framer account & website, create one [here](https://framer.com)
## Guide
Open your website in the Framer editor and click on the primary desktop on the left.
Copy and paste the following URL into the page where you would like to add the Conversational AI agent:
```
https://framer.com/m/ConversationalAI-iHql.js@y7VwRka75sp0UFqGliIf
```
You'll now see a Conversational AI asset on the 'Layers' bar on the left and the Conversational AI component's details on the right.
Enable the Conversational AI agent by filling in the agent ID in the bar on the right.
You can find the agent ID in the [ElevenLabs dashboard](https://elevenlabs.io/app/conversational-ai).
Having trouble? Make sure the Conversational AI component is placed below the desktop component in the layers panel.
## Next steps
Now that you have added your Conversational AI agent to your Framer website, you can:
1. Customize the widget in the ElevenLabs dashboard to match your brand
2. Add additional languages
3. Add advanced functionality like tools & knowledge base.
# Conversational AI in Squarespace
> Learn how to deploy a Conversational AI agent to Squarespace
This tutorial will guide you through adding your ElevenLabs Conversational AI agent to your Squarespace website.
## Prerequisites
* An ElevenLabs Conversational AI agent created following [this guide](/docs/conversational-ai/docs/agent-setup)
* A Squarespace Business or Commerce plan (required for custom code)
* Basic familiarity with Squarespace's editor
## Guide
Visit the [ElevenLabs dashboard](https://elevenlabs.io/app/conversational-ai) and find your agent's embed widget.
```html
<elevenlabs-convai agent-id="YOUR_AGENT_ID"></elevenlabs-convai>
<!-- Include the accompanying <script> tag exactly as it appears in your dashboard embed snippet -->
```
**Add the widget to a page**
1. Navigate to your desired page
2. Click + to add a block
3. Select Code from the menu
4. Paste the `<elevenlabs-convai>` element from your embed snippet into the Code Block
5. Save the block
**Add the widget script site-wide**
1. Go to Settings > Advanced > Code Injection
2. Paste the `<script>` tag from your embed snippet into the Footer section
3. Save changes
4. Publish your site to see the changes
Note: The widget will only be visible on your live site, not in the editor preview.
## Troubleshooting
If the widget isn't appearing, verify:
* The embed code is complete and correctly placed
* Your Squarespace plan supports custom code
* No JavaScript conflicts with other scripts
# Conversational AI in Webflow
> Learn how to deploy a Conversational AI agent to Webflow
This tutorial will guide you through adding your ElevenLabs Conversational AI agent to your Webflow website.
## Guide
Visit the [ElevenLabs dashboard](https://elevenlabs.io/app/conversational-ai) and copy your agent's embed widget (the same `<elevenlabs-convai>` element and `<script>` tag as above).
**Add the widget to a page**
1. Open your Webflow project in Designer
2. Drag an Embed Element to your desired location
3. Paste the `<elevenlabs-convai>` element from your embed snippet into the Embed Element's code editor
4. Save & Close
**Add the widget script site-wide**
1. Go to Project Settings > Custom Code
2. Paste the `<script>` tag from your embed snippet into the Footer Code section
3. Save Changes
4. Publish your site to see the changes
Note: The widget will only be visible after publishing your site, not in the Designer.
## Troubleshooting
If the widget isn't appearing, verify:
* The embed code is complete and correctly placed
* Your site has been published after adding the code
* No JavaScript conflicts with other scripts
# Conversational AI in Wix
> Learn how to deploy a Conversational AI agent to Wix
This tutorial will guide you through adding your ElevenLabs Conversational AI agent to your Wix website.
## Guide
Visit the [ElevenLabs dashboard](https://elevenlabs.io/app/conversational-ai) and copy your agent's embed widget, then add it to your site:
1. Open your Wix site in the Editor
2. Click on Dev Mode in the top menu
3. If Dev Mode is not visible, ensure you're using the full Wix Editor, not Wix ADI
1. Go to Settings > Custom Code
2. Click + Add Custom Code
3. Paste your ElevenLabs embed snippet from step 1 with the agent-id attribute filled in correctly
4. Select the pages you would like to add the Conversational AI widget to (all pages, or specific pages)
5. Save and publish
## Troubleshooting
If the widget isn't appearing, verify:
* You're using a Wix Premium plan
* Your site's domain is properly configured in the ElevenLabs allowlist
* The code is added correctly in the Custom Code section
## Next steps
Now that you have added your Conversational AI agent to Wix, you can:
1. Customize the widget in the ElevenLabs dashboard to match your brand
2. Add additional languages
3. Add advanced functionality like tools & knowledge base
# Conversational AI in WordPress
> Learn how to deploy a Conversational AI agent to WordPress
This tutorial will guide you through adding your ElevenLabs Conversational AI agent to your WordPress website.
## Prerequisites
* An ElevenLabs Conversational AI agent created following [this guide](/docs/conversational-ai/docs/agent-setup)
* A WordPress website with either:
* WordPress.com Business/Commerce plan, or
* Self-hosted WordPress installation
## Guide
Visit the [ElevenLabs dashboard](https://elevenlabs.io/app/conversational-ai) and find your agent's embed widget.
```html
<elevenlabs-convai agent-id="YOUR_AGENT_ID"></elevenlabs-convai>
<!-- Include the accompanying <script> tag exactly as it appears in your dashboard embed snippet -->
```
**Add the widget to a page**
1. In WordPress, edit your desired page
2. Add a Custom HTML block
3. Paste the `<elevenlabs-convai>` element from your embed snippet into the block
4. Update/publish the page
Then add the widget `<script>` tag site-wide using one of these options:
**Option A: Using a plugin**
1. Install Header Footer Code Manager
2. Add the `<script>` tag from your embed snippet to the Footer section
3. Set it to run on All Pages
**Option B: Direct theme editing**
1. Go to Appearance > Theme Editor
2. Open footer.php
3. Paste the `<script>` tag just before the closing `</body>` tag
## Troubleshooting
If the widget isn't appearing, verify:
* The embed code is complete and correctly placed
* Your WordPress plan or hosting setup allows custom code
* No JavaScript conflicts with other plugins or scripts