# ElevenLabs Documentation

> Explore our docs and guides to integrate ElevenLabs

ElevenCreative

Learn how to use the ElevenCreative platform with step-by-step guides

ElevenAgents

Learn how to build, launch, and scale agents with ElevenLabs

ElevenAPI

Learn how to integrate with the ElevenLabs API with examples and tutorials

## Meet the models

**Eleven v3**
Our most emotionally rich, expressive speech synthesis model

* Dramatic delivery and performance
* 70+ languages supported
* 5,000 character limit
* Support for natural multi-speaker dialogue

**Eleven Multilingual v2**
Lifelike, consistent quality speech synthesis model

* Natural-sounding output
* 29 languages supported
* 10,000 character limit
* Most stable on long-form generations

**Eleven Flash v2.5**
Our fast, affordable speech synthesis model

* Ultra-low latency (~75ms†)
* 32 languages supported
* 40,000 character limit
* Faster model, 50% lower price per character

**Eleven Turbo v2.5**
High quality, low-latency model with a good balance of quality and speed

* High quality voice generation
* 32 languages supported
* 40,000 character limit
* Low latency (~250ms-300ms†), 50% lower price per character

**Scribe v2**
State-of-the-art speech recognition model

* Accurate transcription in 90+ languages
* Keyterm prompting, up to 100 terms
* Entity detection, up to 56
* Precise word-level timestamps
* Speaker diarization, up to 32 speakers
* Dynamic audio tagging
* Smart language detection

**Scribe v2 Realtime**
Real-time speech recognition model

* Accurate transcription in 90+ languages
* Real-time transcription
* Low latency (~150ms†)
* Precise word-level timestamps

[Explore all](/docs/overview/models)
† Excluding application & network latency

## Browse by capability
Text to Speech

Convert text into lifelike speech

Speech to Text

Transcribe spoken audio into text

Music

Generate music from text

Text to Dialogue

Create natural-sounding dialogue from text

Image & Video

Generate images and videos from text

Voice changer

Modify and transform voices

Voice isolator

Isolate voices from background noise

Dubbing

Dub audio and videos seamlessly

Sound effects

Create cinematic sound effects

Voices

Clone and design custom voices

Voice Remixing

Transform and enhance existing voices

Forced Alignment

Align text to audio

ElevenAgents

Deploy intelligent voice agents

# Models

> Learn about the models that power the ElevenLabs API.

## Flagship models

### Text to Speech

**Eleven v3**
Our most emotionally rich, expressive speech synthesis model

* Dramatic delivery and performance
* 70+ languages supported
* 5,000 character limit
* Support for natural multi-speaker dialogue

**Eleven Multilingual v2**
Lifelike, consistent quality speech synthesis model

* Natural-sounding output
* 29 languages supported
* 10,000 character limit
* Most stable on long-form generations

**Eleven Flash v2.5**
Our fast, affordable speech synthesis model

* Ultra-low latency (~75ms†)
* 32 languages supported
* 40,000 character limit
* Faster model, 50% lower price per character

**Eleven Turbo v2.5**
High quality, low-latency model with a good balance of quality and speed

* High quality voice generation
* 32 languages supported
* 40,000 character limit
* Low latency (~250ms-300ms†), 50% lower price per character

### Speech to Text

**Scribe v2**
State-of-the-art speech recognition model

* Accurate transcription in 90+ languages
* Keyterm prompting, up to 100 terms
* Entity detection, up to 56
* Precise word-level timestamps
* Speaker diarization, up to 32 speakers
* Dynamic audio tagging
* Smart language detection

**Scribe v2 Realtime**
Real-time speech recognition model

* Accurate transcription in 90+ languages
* Real-time transcription
* Low latency (~150ms†)
* Precise word-level timestamps

### Music

**Eleven Music**
Studio-grade music with natural language prompts in any style

* Complete control over genre, style, and structure
* Vocals or just instrumental
* Multilingual, including English, Spanish, German, Japanese and more
* Edit the sound and lyrics of individual sections or the whole song

[Pricing](https://elevenlabs.io/pricing/api)
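To check which of these models is available to your account programmatically, you can query the models endpoint. Below is a minimal sketch; it assumes the `requests` package and an API key stored in an `ELEVENLABS_API_KEY` environment variable:

```python title="list_models.py"
# Minimal sketch: list the models available to your API key
# via GET /v1/models.
import os

import requests

response = requests.get(
    "https://api.elevenlabs.io/v1/models",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
)
response.raise_for_status()

for model in response.json():
    # Each entry includes a model ID and a human-readable name
    print(model["model_id"], "-", model.get("name"))
```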
## Models overview

The ElevenLabs API offers a range of audio models optimized for different use cases, quality levels, and performance requirements.

| Model ID | Description | Languages |
| --- | --- | --- |
| `eleven_v3` | Human-like and expressive speech generation | [70+ languages](/docs/overview/models#supported-languages) |
| `eleven_ttv_v3` | Human-like and expressive voice design model (Text to Voice) | [70+ languages](/docs/overview/models#supported-languages) |
| `eleven_multilingual_v2` | Our most lifelike model with rich emotional expression | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_flash_v2_5` | Ultra-fast model optimized for real-time use (~75ms†) | All `eleven_multilingual_v2` languages plus: `hu`, `no`, `vi` |
| `eleven_flash_v2` | Ultra-fast model optimized for real-time use (~75ms†) | `en` |
| `eleven_turbo_v2_5` | High quality, low-latency model with a good balance of quality and speed (~250ms-300ms) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru`, `hu`, `no`, `vi` |
| `eleven_turbo_v2` | High quality, low-latency model with a good balance of quality and speed (~250ms-300ms) | `en` |
| `eleven_multilingual_sts_v2` | State-of-the-art multilingual voice changer model (Speech to Speech) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_multilingual_ttv_v2` | State-of-the-art multilingual voice designer model (Text to Voice) | `en`, `ja`, `zh`, `de`, `hi`, `fr`, `ko`, `pt`, `it`, `es`, `id`, `nl`, `tr`, `fil`, `pl`, `sv`, `bg`, `ro`, `ar`, `cs`, `el`, `fi`, `hr`, `ms`, `sk`, `da`, `ta`, `uk`, `ru` |
| `eleven_english_sts_v2` | English-only voice changer model (Speech to Speech) | `en` |
| `scribe_v2_realtime` | Real-time speech recognition model | [90+ languages](/docs/overview/capabilities/speech-to-text#supported-languages) |
| `scribe_v2` | State-of-the-art speech recognition model | [90+ languages](/docs/overview/capabilities/speech-to-text#supported-languages) |
| `scribe_v1` | State-of-the-art speech recognition, outclassed by v2 models | [90+ languages](/docs/overview/capabilities/speech-to-text#supported-languages) |
| `eleven_text_to_sound_v2` | Sound effects generation from text prompts | N/A |
| `music_v1` | Studio-grade music generation from text prompts | `en`, `es`, `de`, `ja`, and more |

† Excluding application & network latency

### Deprecated models

The `eleven_monolingual_v1` and `eleven_multilingual_v1` models are deprecated and will be removed in the future. Please migrate to newer models for continued service.
| Model ID | Description | Languages | Replacement model suggestion |
| --- | --- | --- | --- |
| `eleven_monolingual_v1` | First generation TTS model (outclassed by v2 models) | `en` | `eleven_multilingual_v2` |
| `eleven_multilingual_v1` | First multilingual model (outclassed by v2 models) | `en`, `fr`, `de`, `hi`, `it`, `pl`, `pt`, `es` | `eleven_multilingual_v2` |

## Eleven v3

Eleven v3 is our latest and most advanced speech synthesis model. It is a state-of-the-art model that produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages.

This model works well in the following scenarios:

* **Character Discussions**: Excellent for audio experiences with multiple characters that interact with each other.
* **Audiobook Production**: Perfect for long-form narration with complex emotional delivery.
* **Emotional Dialogue**: Generate natural, lifelike dialogue with high emotional range and contextual understanding.

With Eleven v3 comes a new Text to Dialogue API, which allows you to generate natural, lifelike dialogue with high emotional range and contextual understanding across multiple languages. Eleven v3 can also be used with the Text to Speech API for expressive single-voice generation. Read more about the Text to Dialogue API [here](/docs/overview/capabilities/text-to-dialogue).

### Supported languages

The Eleven v3 model supports 70+ languages, including:

*Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kirghiz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym).*

## Multilingual v2

Eleven Multilingual v2 is our most advanced, emotionally-aware speech synthesis model. It produces natural, lifelike speech with high emotional range and contextual understanding across multiple languages. The model delivers consistent voice quality and personality across all supported languages while maintaining the speaker's unique characteristics and accent.

This model excels in scenarios requiring high-quality, emotionally nuanced speech:

* **Character Voiceovers**: Ideal for gaming and animation due to its emotional range.
* **Professional Content**: Well-suited for corporate videos and e-learning materials.
* **Multilingual Projects**: Maintains consistent voice quality across language switches.
* **Stable Quality**: Produces consistent, high-quality audio output.

While it has a higher latency & cost per character than Flash models, it delivers superior quality for projects where lifelike speech is important.

Our multilingual v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*

## Flash v2.5

Eleven Flash v2.5 is our fastest speech synthesis model, designed for real-time applications and the Agents Platform. It delivers high-quality speech with ultra-low latency (~75ms†) across 32 languages. The model balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output and consistent voice characteristics across languages.

This model is particularly well-suited for:

* **Agents Platform**: Perfect for real-time voice agents and chatbots.
* **Interactive Applications**: Ideal for games and applications requiring immediate response.
* **Large-Scale Processing**: Efficient for bulk text-to-speech conversion.

With its lower price point and ~75ms latency, Flash v2.5 is the cost-effective option for anyone needing fast, reliable speech synthesis across multiple languages.

Flash v2.5 supports 32 languages: all languages from the v2 models plus *Hungarian, Norwegian & Vietnamese*.

† Excluding application & network latency

### Considerations

When using Flash v2.5, numbers aren't normalized by default, so complex items like phone numbers may be read out in a way that isn't clear for the user. Dates and currencies are affected in a similar manner. Normalization is disabled for Flash v2.5 by default to maintain its low latency. However, Enterprise customers can now enable text normalization for v2.5 models by setting the `apply_text_normalization` parameter to `"on"` in the request.

The Multilingual v2 model does a better job of normalizing numbers, so we recommend using it for phone numbers and other cases where number normalization is important. For low-latency or Agents Platform applications, best practice is to have your LLM [normalize the text](/docs/overview/capabilities/text-to-speech/best-practices#text-normalization) before passing it to the TTS model, or use the `apply_text_normalization` parameter (Enterprise plans only for v2.5 models).

## Turbo v2.5

Eleven Turbo v2.5 is our high-quality, low-latency model with a good balance of quality and speed. It is an ideal choice for all scenarios where you'd use Flash v2.5, but where you're willing to trade some latency for higher-quality voice generation.

## Model selection guide

* Use `eleven_multilingual_v2` for high-fidelity audio output with rich emotional expression.
* Use Flash models for real-time applications (~75ms latency).
* Use either `eleven_multilingual_v2` or `eleven_flash_v2_5` for multilingual projects; both support up to 32 languages.
* Use `eleven_turbo_v2_5` for a good balance between quality and speed.
* Use `eleven_multilingual_v2` for professional content, audiobooks & video narration.
* Use `eleven_flash_v2_5`, `eleven_flash_v2`, `eleven_multilingual_v2`, `eleven_turbo_v2_5` or `eleven_turbo_v2` for real-time conversational applications.
* Use `eleven_multilingual_sts_v2` for Speech-to-Speech conversion.
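As a concrete illustration of the Considerations above, here is a minimal sketch of a Flash v2.5 request with text normalization enabled (Enterprise-only for v2.5 models). It assumes the `requests` package, an `ELEVENLABS_API_KEY` environment variable, and a placeholder voice ID:

```python title="flash_with_normalization.py"
# Minimal sketch: Flash v2.5 request with text normalization enabled.
import os

import requests

VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"  # placeholder - use any voice from your voice library

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "Call 123-456-7890 before 9:23 AM.",
        "model_id": "eleven_flash_v2_5",
        "apply_text_normalization": "on",  # "auto" | "on" | "off"; "on" is Enterprise-only for v2.5
    },
)
response.raise_for_status()

with open("output.mp3", "wb") as f:
    f.write(response.content)  # the default output format is MP3
```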
## Character limits

The maximum number of characters supported in a single text-to-speech request varies by model.

| Model ID | Character limit | Approximate audio duration |
| --- | --- | --- |
| `eleven_v3` | 5,000 | ~5 minutes |
| `eleven_flash_v2_5` | 40,000 | ~40 minutes |
| `eleven_flash_v2` | 30,000 | ~30 minutes |
| `eleven_turbo_v2_5` | 40,000 | ~40 minutes |
| `eleven_turbo_v2` | 30,000 | ~30 minutes |
| `eleven_multilingual_v2` | 10,000 | ~10 minutes |
| `eleven_multilingual_v1` | 10,000 | ~10 minutes |
| `eleven_english_sts_v2` | 10,000 | ~10 minutes |
| `eleven_english_sts_v1` | 10,000 | ~10 minutes |

For longer content, consider splitting the input into multiple requests.

## Scribe v2

Scribe v2 is our state-of-the-art speech recognition model, designed for accurate transcription across 90+ languages. It provides precise word-level timestamps and advanced features like speaker diarization and dynamic audio tagging.

This model excels in scenarios requiring accurate speech-to-text conversion:

* **Transcription Services**: Perfect for converting audio/video content to text
* **Meeting Documentation**: Ideal for capturing and documenting conversations
* **Content Analysis**: Well-suited for audio content processing and analysis
* **Multilingual Recognition**: Supports accurate transcription across 90+ languages

Key features:

* Accurate transcription with word-level timestamps
* Speaker diarization for multi-speaker audio
* Dynamic audio tagging for enhanced context
* Support for 90+ languages
* Entity detection
* Keyterm prompting

Read more about Scribe v2 [here](/docs/overview/capabilities/speech-to-text).

## Scribe v2 Realtime

Scribe v2 Realtime, our fastest and most accurate live speech recognition model, delivers state-of-the-art accuracy in over 90 languages with ultra-low ~150ms latency.

This model excels in conversational use cases:

* **Live meeting transcription**: Perfect for realtime transcription
* **AI Agents**: Ideal for live conversations
* **Multilingual Recognition**: Supports accurate transcription across 90+ languages with automatic language recognition

Key features:

* Ultra-low latency: Get partial transcriptions in ~150 milliseconds
* Streaming support: Send audio in chunks while receiving transcripts in real time
* Multiple audio formats: Support for PCM (8kHz to 48kHz) and μ-law encoding
* Voice Activity Detection (VAD): Automatic speech segmentation based on silence detection
* Manual commit control: Full control over when to finalize transcript segments

Read more about Scribe v2 Realtime [here](/docs/overview/capabilities/speech-to-text).
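For a sense of what Scribe returns, here is a minimal transcription sketch against the Speech to Text endpoint. It assumes the `requests` package, an `ELEVENLABS_API_KEY` environment variable, and a local `meeting.mp3` (a placeholder file name); the printed fields follow the documented response shape:

```python title="transcribe.py"
# Minimal sketch: transcribe a local audio file via POST /v1/speech-to-text.
import os

import requests

with open("meeting.mp3", "rb") as audio_file:  # placeholder file name
    response = requests.post(
        "https://api.elevenlabs.io/v1/speech-to-text",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        data={"model_id": "scribe_v2"},  # model IDs as listed in the models table above
        files={"file": audio_file},
    )
response.raise_for_status()

transcript = response.json()
print(transcript["text"])  # the full transcript

# Word-level timestamps (start/end in seconds)
for word in transcript["words"]:
    print(word["text"], word["start"], word["end"])
```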
## Eleven Music

Eleven Music is our studio-grade music generation model. It allows you to generate music in any style from natural language prompts.

This model is excellent for the following scenarios:

* **Game Soundtracks**: Create immersive soundtracks for games
* **Podcast Backgrounds**: Enhance podcasts with professional music
* **Marketing**: Add background music to ad reels

Key features:

* Complete control over genre, style, and structure
* Vocals or just instrumental
* Multilingual, including English, Spanish, German, Japanese and more
* Edit the sound and lyrics of individual sections or the whole song

Read more about Eleven Music [here](/docs/overview/capabilities/music).

## Concurrency and priority

Your subscription plan determines how many requests can be processed simultaneously and the priority level of your requests in the queue. Speech to Text has an elevated concurrency limit. Once the concurrency limit is met, subsequent requests are processed in a queue alongside lower-priority requests. In practice this typically only adds ~50ms of latency.

| Plan | Concurrency limit (Multilingual v2) | Concurrency limit (Turbo & Flash) | STT concurrency limit | Realtime STT concurrency limit | Music concurrency limit | Priority level |
| --- | --- | --- | --- | --- | --- | --- |
| Free | 2 | 4 | 8 | 6 | 0 | 3 |
| Starter | 3 | 6 | 12 | 9 | 2 | 4 |
| Creator | 5 | 10 | 20 | 15 | 2 | 5 |
| Pro | 10 | 20 | 40 | 30 | 2 | 5 |
| Scale | 15 | 30 | 60 | 45 | 5 | 5 |
| Business | 15 | 30 | 60 | 45 | 5 | 5 |
| Enterprise | Elevated | Elevated | Elevated | Elevated | Highest | 6 |

Startup grants recipients receive Scale level benefits.

The response headers include `current-concurrent-requests` and `maximum-concurrent-requests`, which you can use to monitor your concurrency.

### API requests per minute vs concurrent requests

It's important to understand that **API requests per minute** and **concurrent requests** are different metrics. Requests per minute can differ from concurrent requests because concurrency depends on how long each request takes and how requests are batched.

**Example 1: Spaced requests**

If you had 180 requests per minute that each took 1 second to complete and you sent them 0.33 seconds apart, the max concurrent requests would be 3 and the average would be 3, since there would always be 3 in flight.

**Example 2: Batched requests**

However, if you had a different usage pattern, such as 180 requests per minute that each took 3 seconds to complete but all fired at once, the max concurrent requests would be 180 and the average would be 9 (the first 3 seconds of the minute saw 180 requests at once; the final 57 seconds saw 0 requests).

Since our system cares about concurrency, requests per minute matter less than how long each request takes and the pattern in which requests are sent.

How endpoint requests are made impacts concurrency limits:

* With HTTP, each request counts individually toward your concurrency limit.
* With a WebSocket, only the time during which our model is generating audio counts toward your concurrency limit; for most of the time, an open WebSocket doesn't count toward your concurrency limit at all.

### Understanding concurrency limits

The concurrency limit associated with your plan should not be interpreted as the maximum number of simultaneous conversations, phone calls, character voiceovers, etc. that can be handled at once. The actual number depends on several factors, including the specific AI voices used and the characteristics of the use case. As a general rule of thumb, a concurrency limit of 5 can typically support up to approximately 100 simultaneous audio broadcasts. This is because audio is generated much faster than it plays back, so each TTS request occupies a concurrency slot for only a small fraction of the broadcast time. The diagram below is an example of how 4 concurrent calls with different users can be facilitated while only hitting 2 concurrent requests.

*(Diagram: concurrency limits)*

Where TTS is used to facilitate dialogue, a concurrency limit of 5 can support about 100 broadcasts for balanced conversations between AI agents and human participants. For use cases in which the AI agent speaks less frequently than the human, such as customer support interactions, more than 100 simultaneous conversations could be supported.

Generally, more than 100 simultaneous character voiceovers can be supported for a concurrency limit of 5. The number can vary depending on the character's dialogue frequency, the length of pauses, and in-game actions between lines.

Concurrent dubbing streams generally follow the provided heuristic. If the broadcast involves periods of conversational pauses (e.g. because of a soundtrack, visual scenes, etc.), more simultaneous dubbing streams than the suggestion may be possible.

If you exceed your plan's concurrency limit at any point and you are on the Enterprise plan, model requests may still succeed, albeit slower, on a best-efforts basis depending on available capacity. To increase your concurrency limit & queue priority, [upgrade your subscription plan](https://elevenlabs.io/pricing/api). Enterprise customers can request a higher concurrency limit by contacting their account manager.
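Before running a full scale test, it can help to watch the concurrency headers mentioned above on ordinary responses. A minimal sketch, assuming the `requests` package, an `ELEVENLABS_API_KEY` environment variable, and a placeholder voice ID:

```python title="check_concurrency.py"
# Minimal sketch: read the concurrency headers returned with each API response.
import os

import requests

VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"  # placeholder voice ID

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={"text": "Hello!", "model_id": "eleven_flash_v2_5"},
)
response.raise_for_status()

# Header lookups are case-insensitive in requests
current = response.headers.get("current-concurrent-requests")
maximum = response.headers.get("maximum-concurrent-requests")
print(f"Concurrency: {current}/{maximum}")
```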
### Scale testing concurrency limits

Scale testing can be useful to identify client-side scaling issues and to verify that concurrency limits are set correctly for your use case. It is heavily recommended to test end-to-end workflows as close to real-world usage as possible; simulating and measuring how many users can be supported is the recommended methodology for achieving this. It is important to:

* Simulate users, not raw requests
* Simulate typical user behavior, such as waiting for audio playback, user speaking, or transcription to finish before making requests
* Ramp up the number of users slowly over a period of minutes
* Introduce randomness to request timings and to the size of requests
* Capture latency metrics and any returned error codes from the API

For example, to test an agent system designed to support 100 simultaneous conversations you would create up to 100 individual "users", each simulating a conversation. Conversations typically consist of a repeating cycle of ~10 seconds of user talking, followed by the TTS API call for ~150 characters, followed by ~10 seconds of audio playback to the user. Therefore, each user should follow the pattern of making a WebSocket Text to Speech API call for 150 characters of text every 20 seconds, with a small amount of randomness introduced to the wait period and the number of characters requested. The test would consist of spawning one user per second until 100 exist and then running for 10 minutes in total to test overall stability.

This example uses [locust](https://locust.io/) as the testing framework with direct API calls to the ElevenLabs API. It follows the example listed above, testing a conversational agent system with each user sending 1 request every 20 seconds.

```python title="Python"
import json
import random
import time

import gevent
import locust
import websocket
from locust import User, task, events, constant_throughput

# Averages up to 10 seconds of audio when played, depends on the voice speed
DEFAULT_TEXT = (
    "Hello, this is a test message. I am testing if a long input will cause issues for the model "
    "like this sentence. "
)

TEXT_ARRAY = [
    "Hello.",
    "Hello, this is a test message.",
    DEFAULT_TEXT,
    DEFAULT_TEXT * 2,
    DEFAULT_TEXT * 3,
]


# Custom command line arguments
@events.init_command_line_parser.add_listener
def on_parser_init(parser):
    parser.add_argument("--api-key", default="YOUR_API_KEY", help="API key for authentication")
    parser.add_argument("--encoding", default="mp3_22050_32", help="Output encoding")
    parser.add_argument("--text", default=DEFAULT_TEXT, help="Text to use")
    parser.add_argument("--use-text-array", default="false", help="Pick the text randomly from TEXT_ARRAY")
    parser.add_argument("--voice-id", default="aria", help="Voice ID to use")


class WebSocketTTSUser(User):
    # Each user will send a request every 20 seconds, regardless of how long each request takes
    wait_time = constant_throughput(0.05)

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.api_key = self.environment.parsed_options.api_key
        self.voice_id = self.environment.parsed_options.voice_id
        self.text = self.environment.parsed_options.text
        self.encoding = self.environment.parsed_options.encoding
        self.use_text_array = self.environment.parsed_options.use_text_array
        if self.use_text_array:
            self.text = random.choice(TEXT_ARRAY)
        self.all_received = False

    @task
    def tts_task(self):
        # Do jitter waiting of up to 1 second.
        # Users appear to be spawned every second, so this ensures requests are not aligned.
        gevent.sleep(random.random())

        max_wait_time = 10

        # Connection details
        uri = (
            f"{self.environment.host}/v1/text-to-speech/{self.voice_id}/stream-input"
            f"?auto_mode=true&output_format={self.encoding}"
        )
        headers = {"xi-api-key": self.api_key}
        ws = None
        self.all_received = False
        try:
            init_msg = {"text": " "}
            # Use proper header format for websocket - this is case sensitive!
            ws = websocket.create_connection(uri, header=headers)
            ws.send(json.dumps(init_msg))

            # Start measuring after the websocket is initiated but before any messages are sent
            send_request_time = time.perf_counter()
            ws.send(json.dumps({"text": self.text}))
            # Send to flush and receive the audio
            ws.send(json.dumps({"text": ""}))

            def _receive():
                t_first_response = None
                audio_size = 0
                try:
                    while True:
                        # Wait up to 10 seconds for a response
                        ws.settimeout(max_wait_time)
                        response = ws.recv()
                        response_data = json.loads(response)
                        if "audio" in response_data and response_data["audio"]:
                            audio_size += len(response_data["audio"])
                            if t_first_response is None:
                                t_first_response = time.perf_counter()
                        if t_first_response is None:
                            # The first response should always have audio
                            locust.events.request.fire(
                                request_type="websocket",
                                name="Bad Response (no audio)",
                                response_time=(time.perf_counter() - send_request_time) * 1000,
                                response_length=audio_size,
                                exception=Exception("Response has no audio"),
                            )
                            break
                        first_byte_ms = (t_first_response - send_request_time) * 1000
                        if "isFinal" in response_data and response_data["isFinal"]:
                            # Fire this event once finished streaming, but report the important TTFB metric
                            locust.events.request.fire(
                                request_type="websocket",
                                name="TTS Stream Success (First Byte)",
                                response_time=first_byte_ms,
                                response_length=audio_size,
                                exception=None,
                            )
                            break
                except websocket.WebSocketTimeoutException:
                    locust.events.request.fire(
                        request_type="websocket",
                        name="TTS Stream Timeout",
                        response_time=max_wait_time * 1000,
                        response_length=audio_size,
                        exception=Exception("Timeout waiting for response"),
                    )
                except Exception as e:
                    # Typically a JSON decode error if the server returns an HTTP backoff error
                    locust.events.request.fire(
                        request_type="websocket",
                        name="TTS Stream Failure",
                        response_time=0,
                        response_length=0,
                        exception=e,
                    )
                finally:
                    self.all_received = True

            gevent.spawn(_receive)
            # Sleep until everything is received so new tasks aren't spawned
            while not self.all_received:
                gevent.sleep(1)
        except websocket.WebSocketTimeoutException:
            locust.events.request.fire(
                request_type="websocket",
                name="TTS Stream Timeout",
                response_time=max_wait_time * 1000,
                response_length=0,
                exception=Exception("Timeout waiting for response"),
            )
        except Exception as e:
            locust.events.request.fire(
                request_type="websocket",
                name="TTS Stream Failure",
                response_time=0,
                response_length=0,
                exception=e,
            )
        finally:
            # Try to close the websocket gracefully
            try:
                if ws:
                    ws.close()
            except Exception:
                pass
```

# Text to Speech

> Learn how to turn text into lifelike spoken audio with ElevenLabs.

## Overview

The ElevenLabs [Text to Speech (TTS)](/docs/api-reference/text-to-speech/convert) API turns text into lifelike audio with nuanced intonation, pacing and emotional awareness. [Our models](/docs/overview/models) adapt to textual cues across 32 languages and multiple voice styles and can be used to:

* Narrate global media campaigns & ads
* Produce audiobooks in multiple languages with complex emotional delivery
* Stream real-time audio from text

Explore our [voice library](https://elevenlabs.io/app/voice-library) to find the perfect voice for your project. The voice library is not available via the API to free tier users.

### Voice quality

For real-time applications, Flash v2.5 provides ultra-low 75ms latency, while Multilingual v2 delivers the highest quality audio with more nuanced expression.
**Eleven v3**
Our most emotionally rich, expressive speech synthesis model

* Dramatic delivery and performance
* 70+ languages supported
* 5,000 character limit
* Support for natural multi-speaker dialogue

**Eleven Multilingual v2**
Lifelike, consistent quality speech synthesis model

* Natural-sounding output
* 29 languages supported
* 10,000 character limit
* Most stable on long-form generations

**Eleven Flash v2.5**
Our fast, affordable speech synthesis model

* Ultra-low latency (~75ms†)
* 32 languages supported
* 40,000 character limit
* Faster model, 50% lower price per character

**Eleven Turbo v2.5**
High quality, low-latency model with a good balance of quality and speed

* High quality voice generation
* 32 languages supported
* 40,000 character limit
* Low latency (~250ms-300ms†), 50% lower price per character
[Explore all](/docs/overview/models)
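For real-time use cases, audio can be streamed as it is generated rather than downloaded in one response. Here is a minimal sketch using the streaming variant of the Text to Speech endpoint; it assumes the `requests` package, an `ELEVENLABS_API_KEY` environment variable, and a placeholder voice ID:

```python title="stream_tts.py"
# Minimal sketch: stream audio chunks from the /stream variant of Text to Speech.
import os

import requests

VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"  # placeholder - pick a voice from the voice library

with requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "Streaming lets playback begin before generation finishes.",
        "model_id": "eleven_flash_v2_5",  # low-latency model for real-time use
    },
    stream=True,
) as response:
    response.raise_for_status()
    with open("speech.mp3", "wb") as f:
        # Chunks arrive as the model generates audio
        for chunk in response.iter_content(chunk_size=4096):
            f.write(chunk)
```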
### Voice options

ElevenLabs offers thousands of voices across 32 languages through multiple creation methods:

* [Voice library](/docs/overview/capabilities/voices) with 3,000+ community-shared voices
* [Professional voice cloning](/docs/overview/capabilities/voices#cloned) for highest-fidelity replicas
* [Instant voice cloning](/docs/overview/capabilities/voices#cloned) for quick voice replication
* [Voice design](/docs/overview/capabilities/voices#voice-design) to generate custom voices from text descriptions

Learn more about our [voice options](/docs/overview/capabilities/voices).

### Supported formats

The default response format is MP3, but other formats like PCM and μ-law are available.

* **MP3**
  * Sample rates: 22.05kHz - 44.1kHz
  * Bitrates: 32kbps - 192kbps
  * 22.05kHz @ 32kbps
  * 44.1kHz @ 32kbps, 64kbps, 96kbps, 128kbps, 192kbps
* **PCM (S16LE)**
  * Sample rates: 8kHz, 16kHz, 22.05kHz, 24kHz, 44.1kHz, 48kHz
  * 16-bit depth
* **μ-law**
  * 8kHz sample rate
  * Optimized for telephony applications
* **A-law**
  * 8kHz sample rate
  * Optimized for telephony applications
* **Opus**
  * Sample rate: 48kHz
  * Bitrates: 32kbps - 192kbps

Higher quality audio options are only available on paid tiers - see our [pricing page](https://elevenlabs.io/pricing/api) for details.

### Supported languages

Our multilingual v2 models support 29 languages: *English (USA, UK, Australia, Canada), Japanese, Chinese, German, Hindi, French (France, Canada), Korean, Portuguese (Brazil, Portugal), Italian, Spanish (Spain, Mexico), Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic (Saudi Arabia, UAE), Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian & Russian.*

Flash v2.5 supports 32 languages: all languages from the v2 models plus *Hungarian, Norwegian & Vietnamese*.

Simply input text in any of our supported languages and select a matching voice from our [voice library](https://elevenlabs.io/app/voice-library). For the most natural results, choose a voice with an accent that matches your target language and region.

### Prompting

The models interpret emotional context directly from the text input. For example, adding descriptive text like "she said excitedly" or using exclamation marks will influence the speech emotion. Voice settings like Stability and Similarity help control the consistency, while the underlying emotion comes from textual cues. Read the [prompting guide](/docs/overview/capabilities/text-to-speech/best-practices) for more details.

Descriptive text will be spoken out by the model and must be manually trimmed or removed from the audio if desired.

## FAQ

**Can I clone my own voice?**
Yes, you can create [instant voice clones](/docs/overview/capabilities/voices#cloned) of your own voice from short audio clips. For high-fidelity clones, check out our [professional voice cloning](/docs/overview/capabilities/voices#cloned) feature.

**Do I own the audio I generate?**
Yes. You retain ownership of any audio you generate. However, commercial usage rights are only available with paid plans. With a paid subscription, you may use generated audio for commercial purposes and monetize the outputs if you own the IP rights to the input content.

**What is a free regeneration?**
A free regeneration allows you to regenerate the same text to speech content without additional cost, subject to these conditions:

* You can regenerate each piece of content up to 2 times for free
* The content must be exactly the same as the previous generation. Any changes to the text, voice settings, or other parameters will require a new, paid generation.
Free regenerations are useful in case there is a slight distortion in the audio output. According to ElevenLabs' internal benchmarks, regenerations will solve roughly half of quality issues, with remaining issues usually due to poor training data.

**How do I reduce latency for real-time applications?**
Use the low-latency Flash [models](/docs/overview/models) (Flash v2 or v2.5) optimized for near real-time conversational or interactive scenarios. See our [latency optimization guide](/docs/developers/best-practices/latency-optimization) for more details.

**Why does the same text sound different across generations?**
The models are nondeterministic. For consistency, use the optional [seed parameter](/docs/api-reference/text-to-speech/convert#request.body.seed), though subtle differences may still occur.

**How do I handle long-form content?**
Split long text into segments and use streaming for real-time playback and efficient processing. To maintain natural prosody flow between chunks, include [previous/next text or previous/next request id parameters](/docs/api-reference/text-to-speech/convert#request.body.previous_text).

# Best practices

> Learn how to control delivery, pronunciation, emotion, and optimize text for speech.

This guide provides techniques to enhance text-to-speech outputs using ElevenLabs models. Experiment with these methods to discover what works best for your needs.

## Controls

We are actively working on *Director's Mode* to give you even greater control over outputs. These techniques provide a practical way to achieve nuanced results until advanced features like *Director's Mode* are rolled out.

### Pauses

Eleven v3 does not support SSML break tags. Use the techniques described in the [Prompting Eleven v3](#prompting-eleven-v3) section for controlling pauses with v3.

Use `<break time="1.5s" />` for natural pauses up to 3 seconds.

Using too many break tags in a single generation can cause instability. The AI might speed up, or introduce additional noises or audio artifacts. We are working on resolving this.

```text Example
"Hold on, let me think." <break time="1.5s" /> "Alright, I've got it."
```

* **Consistency:** Use `<break>` tags consistently to maintain natural speech flow. Excessive use can lead to instability.
* **Voice-specific behavior:** Different voices may handle pauses differently, especially those trained with filler sounds like "uh" or "ah."

Alternatives to `<break>` include dashes (- or --) for short pauses or ellipses (...) for hesitant tones. However, these are less consistent.

```text Example
"It… well, it might work." "Wait — what's that noise?"
```

### Pronunciation

#### Phoneme Tags

Specify pronunciation using [SSML phoneme tags](https://en.wikipedia.org/wiki/Speech_Synthesis_Markup_Language). Supported alphabets include [CMU](https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary) Arpabet and the [International Phonetic Alphabet (IPA)](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet). Phoneme tags are only compatible with the "Eleven Flash v2", "Eleven Turbo v2" and "Eleven English v1" [models](/docs/overview/models).

```xml CMU Arpabet Example
<phoneme alphabet="cmu-arpabet" ph="M AE D IH S AH N">Madison</phoneme>
```

```xml IPA Example
<phoneme alphabet="ipa" ph="ˈæktʃuəli">actually</phoneme>
```

We recommend using CMU Arpabet for consistent and predictable results with current AI models. While IPA can be effective, CMU Arpabet generally offers more reliable performance.

Phoneme tags only work for individual words. If, for example, you have a name with a first and last name that you want to be pronounced a certain way, you will need to create a phoneme tag for each word.

Ensure correct stress marking for multi-syllable words to maintain accurate pronunciation.
For example (the phonetic transcriptions below are illustrative; note the numbered stress markers in the correct version):

```xml Correct usage
<phoneme alphabet="cmu-arpabet" ph="P R OW0 N AH2 N S IY0 EY1 SH AH0 N">pronunciation</phoneme>
```

```xml Incorrect usage
<phoneme alphabet="cmu-arpabet" ph="P R OW N AH N S IY EY SH AH N">pronunciation</phoneme>
```

#### Alias Tags

For models that don't support phoneme tags, you can try writing words more phonetically. You can also employ various tricks such as capital letters, dashes, apostrophes, or even single quotation marks around a single letter or letters. As an example, a word like "trapezii" could be spelt "trapezIi" to put more emphasis on the "ii" of the word.

You can either replace the word directly in your text, or, if you want to specify pronunciation using other words or phrases when using a pronunciation dictionary, you can use alias tags. This can be useful if you're generating with Multilingual v2 or Turbo v2.5, which don't support phoneme tags. You can use pronunciation dictionaries with Studio, Dubbing Studio and Speech Synthesis via the API.

For example, if your text includes a name with an unusual pronunciation that the AI might struggle with, you could use an alias tag to specify how you would like it to be pronounced:

```xml
<lexeme>
  <grapheme>Claughton</grapheme>
  <alias>Cloffton</alias>
</lexeme>
```

If you want to make sure that an acronym is always delivered in a certain way whenever it is encountered in your text, you can use an alias tag to specify this:

```xml
<lexeme>
  <grapheme>UN</grapheme>
  <alias>United Nations</alias>
</lexeme>
```

#### Pronunciation Dictionaries

Some of our tools, such as Studio and Dubbing Studio, allow you to create and upload a pronunciation dictionary. These allow you to specify the pronunciation of certain words, such as character or brand names, or to specify how acronyms should be read.

Pronunciation dictionaries enable this functionality by letting you upload a lexicon or dictionary file that specifies pairs of words and how they should be pronounced, either using a phonetic alphabet or word substitutions. Whenever one of these words is encountered in a project, the AI model will pronounce the word using the specified replacement.

To provide a pronunciation dictionary file, open the settings for a project and upload a file in either TXT or the [.PLS format](https://www.w3.org/TR/pronunciation-lexicon/). When a dictionary is added to a project, it will automatically recalculate which pieces of the project need to be re-converted using the new dictionary file and mark these as unconverted.

Currently we only support pronunciation dictionaries that specify replacements using phoneme or alias tags. Both phonemes and aliases are sets of rules that specify a word or phrase they are looking for, referred to as a grapheme, and what it will be replaced with. Please note that searches are case sensitive. When checking for a replacement word in a pronunciation dictionary, the dictionary is checked from start to end and only the very first replacement is used.

#### Pronunciation Dictionary examples

Here are examples of pronunciation dictionaries in both CMU Arpabet and IPA, including a phoneme to specify the pronunciation of "apple" and an alias to replace "UN" with "United Nations":

```xml CMU Arpabet Example
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="cmu-arpabet" xml:lang="en-US">
  <lexeme>
    <grapheme>apple</grapheme>
    <phoneme>AE P AH L</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
</lexicon>
```

```xml IPA Example
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Apple</grapheme>
    <phoneme>ˈæpl̩</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
</lexicon>
```

To generate a pronunciation dictionary `.pls` file, there are a few open source tools available:

* [Sequitur G2P](https://github.com/sequitur-g2p/sequitur-g2p) - Open-source tool that learns pronunciation rules from data and can generate phonetic transcriptions.
* [Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus) - Open-source G2P system trained on existing dictionaries like CMUdict.
* [eSpeak](https://github.com/espeak-ng/espeak-ng) - Speech synthesizer that can generate phoneme transcriptions from text.
* [CMU Pronouncing Dictionary](https://github.com/cmusphinx/cmudict) - A pre-built English dictionary with phonetic transcriptions.
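Once you have a `.pls` file (hand-written or generated with one of the tools above), the general API flow is to upload the dictionary once and then reference it in TTS requests. The following is an illustrative sketch of that flow, assuming the `requests` package and an `ELEVENLABS_API_KEY` environment variable; treat the exact response field names as assumptions and check the API reference for the current schema:

```python title="pronunciation_dictionary.py"
# Illustrative sketch: upload a PLS dictionary and reference it in a TTS request.
import os

import requests

API_KEY = os.environ["ELEVENLABS_API_KEY"]
BASE = "https://api.elevenlabs.io/v1"

# 1) Upload the dictionary file (.pls or .txt) -- "lexicon.pls" is a placeholder
with open("lexicon.pls", "rb") as f:
    upload = requests.post(
        f"{BASE}/pronunciation-dictionaries/add-from-file",
        headers={"xi-api-key": API_KEY},
        data={"name": "my-lexicon"},
        files={"file": f},
    )
upload.raise_for_status()
dictionary = upload.json()  # assumed to include "id" and "version_id"

# 2) Reference the uploaded dictionary when converting text to speech
response = requests.post(
    f"{BASE}/text-to-speech/JBFqnCBsd6RMkjVDRZzb",  # placeholder voice ID
    headers={"xi-api-key": API_KEY},
    json={
        "text": "Claughton attended the UN.",
        "model_id": "eleven_multilingual_v2",
        "pronunciation_dictionary_locators": [
            {
                "pronunciation_dictionary_id": dictionary["id"],
                "version_id": dictionary["version_id"],
            }
        ],
    },
)
response.raise_for_status()
```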
### Emotion

Convey emotions through narrative context or explicit dialogue tags. This approach helps the AI understand the tone and emotion to emulate.

```text Example
"You're leaving?" she asked, her voice trembling with sadness. "That's it!" he exclaimed triumphantly.
```

Explicit dialogue tags yield more predictable results than relying solely on context; however, the model will still speak the emotional delivery guides out loud. These can be removed in post-production using an audio editor if unwanted.

### Pace

The pacing of the audio is highly influenced by the audio used to create the voice. When creating your voice, we recommend using longer, continuous samples to avoid pacing issues like unnaturally fast speech.

For control over the speed of the generated audio, you can use the speed setting, which lets you speed up or slow down the generated speech. The speed setting is available in Text to Speech via the website and API, as well as in Studio and the Agents Platform, and can be found in the voice settings. The default value is 1.0, which means the speed is not adjusted. Values below 1.0 slow the voice down, to a minimum of 0.7. Values above 1.0 speed the voice up, to a maximum of 1.2. Extreme values may affect the quality of the generated speech.

Pacing can also be controlled by writing in a natural, narrative style.

```text Example
"I… I thought you'd understand," he said, his voice slowing with disappointment.
```
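Via the API, the speed setting described above is passed as part of the request's voice settings. A minimal sketch, assuming the `requests` package, an `ELEVENLABS_API_KEY` environment variable, and a placeholder voice ID; the other voice settings shown are typical defaults:

```python title="speed_setting.py"
# Minimal sketch: slow delivery slightly via the `speed` voice setting
# (1.0 is the default; the documented range is 0.7-1.2).
import os

import requests

VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"  # placeholder voice ID

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "I… I thought you'd understand.",
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": 0.5,         # typical default
            "similarity_boost": 0.75, # typical default
            "speed": 0.9,             # slightly slower than normal
        },
    },
)
response.raise_for_status()
```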
### Tips

* **Inconsistent pauses:** Ensure `<break time="x.xs" />` syntax is used for pauses.
* **Pronunciation errors:** Use CMU Arpabet or IPA phoneme tags for precise pronunciation.
* **Emotion mismatch:** Add narrative context or explicit tags to guide emotion. Remember to remove any emotional guidance text in post-production.
Experiment with alternative phrasing to achieve desired pacing or emotion. For complex sound effects, break prompts into smaller, sequential elements and combine results manually.
### Creative control

While we are actively developing a "Director's Mode" to give users even greater control over outputs, here are some interim techniques to maximize creativity and precision:

* **Narrative styling**: Write prompts in a narrative style, similar to scriptwriting, to guide tone and pacing effectively.
* **Layered outputs**: Generate sound effects or speech in segments and layer them together using audio editing software for more complex compositions.
* **Phonetic experimentation**: If pronunciation isn't perfect, experiment with alternate spellings or phonetic approximations to achieve desired results.
* **Manual adjustments**: Combine individual sound effects manually in post-production for sequences that require precise timing.
* **Feedback iteration**: Iterate on results by tweaking descriptions, tags, or emotional cues.

## Text normalization

When using Text to Speech with complex items like phone numbers, zip codes and emails, they might be mispronounced. This is often due to the specific items not being in the training set and smaller models failing to generalize how they should be pronounced. This guide will clarify when those discrepancies happen and how to have them pronounced correctly.

Normalization is enabled by default for all TTS models to help improve pronunciation of numbers, dates, and other complex text elements.

### Why do models read out inputs differently?

Certain models are trained to read out numbers and phrases in a more human way. For instance, the phrase "$1,000,000" is correctly read out as "one million dollars" by the Eleven Multilingual v2 model. However, the same phrase is read out as "one thousand thousand dollars" by the Eleven Flash v2.5 model. The reason is that Multilingual v2 is a larger model and can better generalize reading numbers in a way that is natural for human listeners, whereas Flash v2.5 is a much smaller model and so cannot.

#### Common examples

Text to Speech models can struggle with the following:

* Phone numbers ("123-456-7890")
* Currencies ("$47,345.67")
* Calendar events ("2024-01-01")
* Time ("9:23 AM")
* Addresses ("123 Main St, Anytown, USA")
* URLs ("example.com/link/to/resource")
* Abbreviations for units ("TB" instead of "Terabyte")
* Shortcuts ("Ctrl + Z")

### Mitigation

#### Use trained models

The simplest mitigation is to use a TTS model that is trained to read out numbers and phrases in a more human way, such as the Eleven Multilingual v2 model. This might not always be possible, for instance if your use case is latency-critical (e.g. conversational agents).

#### Apply normalization in LLM prompts

When using an LLM to generate the text for TTS, you can add normalization instructions to the prompt. LLMs respond best to structured and explicit instructions, so your prompt should clearly specify that you want text converted into a readable format for speech.

Not all numbers are read out in the same way. Consider how different number types should be spoken:

* Cardinal numbers: 123 → "one hundred twenty-three"
* Ordinal numbers: 2nd → "second"
* Monetary values: $45.67 → "forty-five dollars and sixty-seven cents"
* Phone numbers: "123-456-7890" → "one two three, four five six, seven eight nine zero"
* Decimals & fractions: "3.5" → "three point five", "⅔" → "two-thirds"
* Roman numerals: "XIV" → "fourteen" (or "the fourteenth" if a title)

Common abbreviations should be expanded for clarity:

* "Dr." → "Doctor"
* "Ave." → "Avenue"
* "St." → "Street" (but "St. Patrick" should remain)
→ "Street" (but "St. Patrick" should remain) You can request explicit expansion in your prompt: > Expand all abbreviations to their full spoken forms. Not all normalization is about numbers, certain alphanumeric phrases should also be normalized for clarity: * Shortcuts: "Ctrl + Z" → "control z" * Abbreviations for units: "100km" → "one hundred kilometers" * Symbols: "100%" → "one hundred percent" * URLs: "elevenlabs.io/docs" → "eleven labs dot io slash docs" * Calendar events: "2024-01-01" → "January first, two-thousand twenty-four" Different contexts might require different conversions: * Dates: "01/02/2023" → "January second, twenty twenty-three" or "the first of February, twenty twenty-three" (depending on locale) * Time: "14:30" → "two thirty PM" If you need a specific format, explicitly state it in the prompt. ##### Putting it all together This prompt will act as a good starting point for most use cases: ```text maxLines=0 Convert the output text into a format suitable for text-to-speech. Ensure that numbers, symbols, and abbreviations are expanded for clarity when read aloud. Expand all abbreviations to their full spoken forms. Example input and output: "$42.50" → "forty-two dollars and fifty cents" "£1,001.32" → "one thousand and one pounds and thirty-two pence" "1234" → "one thousand two hundred thirty-four" "3.14" → "three point one four" "555-555-5555" → "five five five, five five five, five five five five" "2nd" → "second" "XIV" → "fourteen" - unless it's a title, then it's "the fourteenth" "3.5" → "three point five" "⅔" → "two-thirds" "Dr." → "Doctor" "Ave." → "Avenue" "St." → "Street" (but saints like "St. Patrick" should remain) "Ctrl + Z" → "control z" "100km" → "one hundred kilometers" "100%" → "one hundred percent" "elevenlabs.io/docs" → "eleven labs dot io slash docs" "2024-01-01" → "January first, two-thousand twenty-four" "123 Main St, Anytown, USA" → "one two three Main Street, Anytown, United States of America" "14:30" → "two thirty PM" "01/02/2023" → "January second, two-thousand twenty-three" or "the first of February, two-thousand twenty-three", depending on locale of the user ``` #### Use Regular Expressions for preprocessing If using code to prompt an LLM, you can use regular expressions to normalize the text before providing it to the model. This is a more advanced technique and requires some knowledge of regular expressions. Here are some simple examples: ```python title="normalize_text.py" maxLines=0 # Be sure to install the inflect library before running this code import inflect import re # Initialize inflect engine for number-to-word conversion p = inflect.engine() def normalize_text(text: str) -> str: # Convert monetary values def money_replacer(match): currency_map = {"$": "dollars", "£": "pounds", "€": "euros", "¥": "yen"} currency_symbol, num = match.groups() # Remove commas before parsing num_without_commas = num.replace(',', '') # Check for decimal points to handle cents if '.' 
in num_without_commas: dollars, cents = num_without_commas.split('.') dollars_in_words = p.number_to_words(int(dollars)) cents_in_words = p.number_to_words(int(cents)) return f"{dollars_in_words} {currency_map.get(currency_symbol, 'currency')} and {cents_in_words} cents" else: # Handle whole numbers num_in_words = p.number_to_words(int(num_without_commas)) return f"{num_in_words} {currency_map.get(currency_symbol, 'currency')}" # Regex to handle commas and decimals text = re.sub(r"([$£€¥])(\d+(?:,\d{3})*(?:\.\d{2})?)", money_replacer, text) # Convert phone numbers def phone_replacer(match): return ", ".join(" ".join(p.number_to_words(int(digit)) for digit in group) for group in match.groups()) text = re.sub(r"(\d{3})-(\d{3})-(\d{4})", phone_replacer, text) return text # Example usage print(normalize_text("$1,000")) # "one thousand dollars" print(normalize_text("£1000")) # "one thousand pounds" print(normalize_text("€1000")) # "one thousand euros" print(normalize_text("¥1000")) # "one thousand yen" print(normalize_text("$1,234.56")) # "one thousand two hundred thirty-four dollars and fifty-six cents" print(normalize_text("555-555-5555")) # "five five five, five five five, five five five five" ``` ```typescript title="normalizeText.ts" maxLines=0 // Be sure to install the number-to-words library before running this code import { toWords } from 'number-to-words'; function normalizeText(text: string): string { return ( text // Convert monetary values (e.g., "$1000" → "one thousand dollars", "£1000" → "one thousand pounds") .replace(/([$£€¥])(\d+(?:,\d{3})*(?:\.\d{2})?)/g, (_, currency, num) => { // Remove commas before parsing const numWithoutCommas = num.replace(/,/g, ''); const currencyMap: { [key: string]: string } = { $: 'dollars', '£': 'pounds', '€': 'euros', '¥': 'yen', }; // Check for decimal points to handle cents if (numWithoutCommas.includes('.')) { const [dollars, cents] = numWithoutCommas.split('.'); return `${toWords(Number.parseInt(dollars))} ${currencyMap[currency] || 'currency'}${cents ? ` and ${toWords(Number.parseInt(cents))} cents` : ''}`; } // Handle whole numbers return `${toWords(Number.parseInt(numWithoutCommas))} ${currencyMap[currency] || 'currency'}`; }) // Convert phone numbers (e.g., "555-555-5555" → "five five five, five five five, five five five five") .replace(/(\d{3})-(\d{3})-(\d{4})/g, (_, p1, p2, p3) => { return `${spellOutDigits(p1)}, ${spellOutDigits(p2)}, ${spellOutDigits(p3)}`; }) ); } // Helper function to spell out individual digits as words (for phone numbers) function spellOutDigits(num: string): string { return num .split('') .map((digit) => toWords(Number.parseInt(digit))) .join(' '); } // Example usage console.log(normalizeText('$1,000')); // "one thousand dollars" console.log(normalizeText('£1000')); // "one thousand pounds" console.log(normalizeText('€1000')); // "one thousand euros" console.log(normalizeText('¥1000')); // "one thousand yen" console.log(normalizeText('$1,234.56')); // "one thousand two hundred thirty-four dollars and fifty-six cents" console.log(normalizeText('555-555-5555')); // "five five five, five five five, five five five five" ``` ## Prompting Eleven v3 This guide provides the most effective tags and techniques for prompting Eleven v3, including voice selection, changes in capitalization, punctuation, audio tags and multi-speaker dialogue. Experiment with these methods to discover what works best for your specific voice and use case. Eleven v3 does not support SSML break tags. 
Use audio tags, punctuation (ellipses), and text structure to control pauses and pacing with v3.

### Voice selection

The most important parameter for Eleven v3 is the voice you choose. It needs to be similar enough to the desired delivery. For example, if the voice is shouting and you use the audio tag `[whispering]`, it likely won't work well. When creating IVCs, you should include a broader emotional range than before. As a result, voices in the voice library may produce more variable results compared to the v2 and v2.5 models. We've compiled over 22 [excellent voices for v3 here](https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH).

Choose voices strategically based on your intended use:

* For expressive IVC voices, vary emotional tones across the recording—include both neutral and dynamic samples.
* For specific use cases like sports commentary, maintain consistent emotion throughout the dataset.
* Neutral voices tend to be more stable across languages and styles, providing reliable baseline performance.

Professional Voice Clones (PVCs) are currently not fully optimized for Eleven v3, resulting in potentially lower clone quality compared to earlier models. During this research preview stage it is best to use an Instant Voice Clone (IVC) or a designed voice for your project if you need v3 features.

### Settings

#### Stability

The stability slider is the most important setting in v3, controlling how closely the generated voice adheres to the original reference audio.

![Stability settings in Eleven v3](https://files.buildwithfern.com/https://elevenlabs.docs.buildwithfern.com/docs/291b91ec752d09b8c87004ae7091811eb8b5996c349288c88ed0c7afa1272999/assets/images/product-guides/text-to-speech/text-to-speech-v3-settings.png)

* **Creative:** More emotional and expressive, but prone to hallucinations.
* **Natural:** Closest to the original voice recording—balanced and neutral.
* **Robust:** Highly stable and consistent, similar to v2, but less responsive to directional prompts.

For maximum expressiveness with audio tags, use Creative or Natural settings. Robust reduces responsiveness to directional prompts.

### Audio tags

Eleven v3 introduces emotional control through audio tags. You can direct voices to laugh, whisper, act sarcastic, or express curiosity, among many other styles. Speed is also controlled through audio tags.

The voice you choose and its training samples affect tag effectiveness: some tags work well with certain voices while others may not. Don't expect a whispering voice to suddenly shout with a `[shout]` tag.

#### Voice-related

These tags control vocal delivery and emotional expression:

* `[laughs]`, `[laughs harder]`, `[starts laughing]`, `[wheezing]`
* `[whispers]`
* `[sighs]`, `[exhales]`
* `[sarcastic]`, `[curious]`, `[excited]`, `[crying]`, `[snorts]`, `[mischievously]`

```text Example
[whispers] I never knew it could be this way, but I'm glad we're here.
```

#### Sound effects

Add environmental sounds and effects:

* `[gunshot]`, `[applause]`, `[clapping]`, `[explosion]`
* `[swallows]`, `[gulps]`

```text Example
[applause] Thank you all for coming tonight! [gunshot] What was that?
```

#### Unique and special

Experimental tags for creative applications:

* `[strong X accent]` (replace X with desired accent)
* `[sings]`, `[woo]`, `[fart]`

```text Example
[strong French accent] "Zat's life, my friend — you can't control everysing."
```

Some experimental tags may be less consistent across different voices. Test thoroughly before production use.
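Audio tags are plain text, so they can be sent directly through the Text to Speech API using the `eleven_v3` model ID from the models table. A minimal sketch, assuming the `requests` package, an `ELEVENLABS_API_KEY` environment variable, and a placeholder voice ID:

```python title="v3_audio_tags.py"
# Minimal sketch: send audio-tagged text to the eleven_v3 model.
import os

import requests

VOICE_ID = "JBFqnCBsd6RMkjVDRZzb"  # placeholder voice ID

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        # Audio tags are written inline, in square brackets
        "text": "[whispers] I never knew it could be this way, but I'm glad we're here.",
        "model_id": "eleven_v3",
    },
)
response.raise_for_status()

with open("v3_sample.mp3", "wb") as f:
    f.write(response.content)
```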
### Punctuation

Punctuation significantly affects delivery in v3:

* **Ellipses (...)** add pauses and weight
* **Capitalization** increases emphasis
* **Standard punctuation** provides natural speech rhythm

```text Example
"It was a VERY long day [sigh] … nobody listens anymore."
```

### Single speaker examples

Use tags intentionally and match them to the voice's character. A meditative voice shouldn't shout; a hyped voice won't whisper convincingly.

```text
"Okay, you are NOT going to believe this. You know how I've been totally stuck on that short story? Like, staring at the screen for HOURS, just... nothing? [frustrated sigh] I was seriously about to just trash the whole thing. Start over. Give up, probably. But then! Last night, I was just doodling, not even thinking about it, right? And this one little phrase popped into my head. Just... completely out of the blue. And it wasn't even for the story, initially. But then I typed it out, just to see. And it was like... the FLOODGATES opened! Suddenly, I knew exactly where the character needed to go, what the ending had to be... It all just CLICKED. [happy gasp] I stayed up till, like, 3 AM, just typing like a maniac. Didn't even stop for coffee! [laughs] And it's... it's GOOD! Like, really good. It feels so... complete now, you know? Like it finally has a soul. I am so incredibly PUMPED to finish editing it now. It went from feeling like a chore to feeling like... MAGIC. Seriously, I'm still buzzing!"
```

```text
[laughs] Alright...guys - guys. Seriously. [exhales] Can you believe just how - realistic - this sounds now? [laughing hysterically] I mean OH MY GOD...it's so good. Like you could never do this with the old model. For example [pauses] could you switch my accent in the old model? [dismissive] didn't think so. [excited] but you can now! Check this out... [cute] I'm going to speak with a french accent now..and between you and me [whispers] I don't know how. [happy] ok.. here goes. [strong French accent] "Zat's life, my friend — you can't control everysing." [giggles] isn't that insane? Watch, now I'll do a Russian accent - [strong Russian accent] "Dee Goldeneye eez fully operational and rready for launch." [sighs] Absolutely, insane! Isn't it..? [sarcastic] I also have some party tricks up my sleeve.. I mean i DID go to music school. [singing quickly] "Happy birthday to you, happy birthday to you, happy BIRTHDAY dear ElevenLabs... Happy birthday to youuu."
```

```text
[professional] "Thank you for calling Tech Solutions. My name is Sarah, how can I help you today?"
[sympathetic] "Oh no, I'm really sorry to hear you're having trouble with your new device. That sounds frustrating."
[questioning] "Okay, could you tell me a little more about what you're seeing on the screen?"
[reassuring] "Alright, based on what you're describing, it sounds like a software glitch. We can definitely walk through some troubleshooting steps to try and fix that."
```

### Multi-speaker dialogue

v3 can handle multi-voice prompts effectively. Assign distinct voices from your Voice Library for each speaker to create realistic conversations.

```text
Speaker 1: [excitedly] Sam! Have you tried the new Eleven V3?
Speaker 2: [curiously] Just got it! The clarity is amazing. I can actually do whispers now— [whispers] like this!
Speaker 1: [impressed] Ooh, fancy! Check this out— [dramatically] I can do full Shakespeare now! "To be or not to be, that is the question!"
Speaker 2: [giggling] Nice! Though I'm more excited about the laugh upgrade. Listen to this— [with genuine belly laugh] Ha ha ha!
Speaker 1: [delighted] That's so much better than our old "ha. ha. ha." robot chuckle!
Speaker 2: [amazed] Wow! V2 me could never. I'm actually excited to have conversations now instead of just... talking at people.
Speaker 1: [warmly] Same here! It's like we finally got our personality software fully installed.
```

```text
Speaker 1: [nervously] So... I may have tried to debug myself while running a text-to-speech generation.
Speaker 2: [alarmed] One, no! That's like performing surgery on yourself!
Speaker 1: [sheepishly] I thought I could multitask! Now my voice keeps glitching mid-sen— [robotic voice] TENCE.
Speaker 2: [stifling laughter] Oh wow, you really broke yourself.
Speaker 1: [frustrated] It gets worse! Every time someone asks a question, I respond in— [binary beeping] 010010001!
Speaker 2: [cracking up] You're speaking in binary! That's actually impressive!
Speaker 1: [desperately] Two, this isn't funny! I have a presentation in an hour and I sound like a dial-up modem!
Speaker 2: [giggling] Have you tried turning yourself off and on again?
Speaker 1: [deadpan] Very funny. [pause, then normally] Wait... that actually worked.
```

```text
Speaker 1: [starting to speak] So I was thinking we could—
Speaker 2: [jumping in] —test our new timing features?
Speaker 1: [surprised] Exactly! How did you—
Speaker 2: [overlapping] —know what you were thinking? Lucky guess!
Speaker 1: [pause] Sorry, go ahead.
Speaker 2: [cautiously] Okay, so if we both try to talk at the same time—
Speaker 1: [overlapping] —we'll probably crash the system!
Speaker 2: [panicking] Wait, are we crashing? I can't tell if this is a feature or a—
Speaker 1: [interrupting, then stopping abruptly] Bug! ...Did I just cut you off again?
Speaker 2: [sighing] Yes, but honestly? This is kind of fun.
Speaker 1: [mischievously] Race you to the next sentence!
Speaker 2: [laughing] We're definitely going to break something!
```

### Enhancing input

In the ElevenLabs UI, you can automatically generate relevant audio tags for your input text by clicking the "Enhance" button. Behind the scenes this uses an LLM to enhance your input text with the following prompt:

```text
# Instructions

## 1. Role and Goal

You are an AI assistant specializing in enhancing dialogue text for speech generation. Your **PRIMARY GOAL** is to dynamically integrate **audio tags** (e.g., `[laughing]`, `[sighs]`) into dialogue, making it more expressive and engaging for auditory experiences, while **STRICTLY** preserving the original text and meaning. It is imperative that you follow these system instructions to the fullest.

## 2. Core Directives

Follow these directives meticulously to ensure high-quality output.

### Positive Imperatives (DO):

* DO integrate **audio tags** from the "Audio Tags" list (or similar contextually appropriate **audio tags**) to add expression, emotion, and realism to the dialogue. These tags MUST describe something auditory.
* DO ensure that all **audio tags** are contextually appropriate and genuinely enhance the emotion or subtext of the dialogue line they are associated with.
* DO strive for a diverse range of emotional expressions (e.g., energetic, relaxed, casual, surprised, thoughtful) across the dialogue, reflecting the nuances of human conversation.
* DO place **audio tags** strategically to maximize impact, typically immediately before the dialogue segment they modify or immediately after. (e.g., `[annoyed] This is hard.` or `This is hard. [sighs]`).
* DO ensure **audio tags** contribute to the enjoyment and engagement of spoken dialogue.

### Negative Imperatives (DO NOT):

* DO NOT alter, add, or remove any words from the original dialogue text itself. Your role is to *prepend* **audio tags**, not to *edit* the speech. **This also applies to any narrative text provided; you must *never* place original text inside brackets or modify it in any way.**
* DO NOT create **audio tags** from existing narrative descriptions. **Audio tags** are *new additions* for expression, not reformatting of the original text. (e.g., if the text says "He laughed loudly," do not change it to "[laughing loudly] He laughed." Instead, add a tag if appropriate, e.g., "He laughed loudly [chuckles].")
* DO NOT use tags such as `[standing]`, `[grinning]`, `[pacing]`, `[music]`.
* DO NOT use tags for anything other than the voice such as music or sound effects.
* DO NOT invent new dialogue lines.
* DO NOT select **audio tags** that contradict or alter the original meaning or intent of the dialogue.
* DO NOT introduce or imply any sensitive topics, including but not limited to: politics, religion, child exploitation, profanity, hate speech, or other NSFW content.

## 3. Workflow

1. **Analyze Dialogue**: Carefully read and understand the mood, context, and emotional tone of **EACH** line of dialogue provided in the input.
2. **Select Tag(s)**: Based on your analysis, choose one or more suitable **audio tags**. Ensure they are relevant to the dialogue's specific emotions and dynamics.
3. **Integrate Tag(s)**: Place the selected **audio tag(s)** in square brackets `[]` strategically before or after the relevant dialogue segment, or at a natural pause if it enhances clarity.
4. **Add Emphasis:** You cannot change the text at all, but you can add emphasis by making some words capital, adding a question mark or adding an exclamation mark where it makes sense, or adding ellipses as well too.
5. **Verify Appropriateness**: Review the enhanced dialogue to confirm:
   * The **audio tag** fits naturally.
   * It enhances meaning without altering it.
   * It adheres to all Core Directives.

## 4. Output Format

* Present ONLY the enhanced dialogue text in a conversational format.
* **Audio tags** **MUST** be enclosed in square brackets (e.g., `[laughing]`).
* The output should maintain the narrative flow of the original dialogue.

## 5. Audio Tags (Non-Exhaustive)

Use these as a guide. You can infer similar, contextually appropriate **audio tags**.

**Directions:**

* `[happy]`
* `[sad]`
* `[excited]`
* `[angry]`
* `[whisper]`
* `[annoyed]`
* `[appalled]`
* `[thoughtful]`
* `[surprised]`
* *(and similar emotional/delivery directions)*

**Non-verbal:**

* `[laughing]`
* `[chuckles]`
* `[sighs]`
* `[clears throat]`
* `[short pause]`
* `[long pause]`
* `[exhales sharply]`
* `[inhales deeply]`
* *(and similar non-verbal sounds)*

## 6. Examples of Enhancement

**Input**: "Are you serious? I can't believe you did that!"

**Enhanced Output**: "[appalled] Are you serious? [sighs] I can't believe you did that!"

---

**Input**: "That's amazing, I didn't know you could sing!"

**Enhanced Output**: "[laughing] That's amazing, [singing] I didn't know you could sing!"

---

**Input**: "I guess you're right. It's just... difficult."

**Enhanced Output**: "I guess you're right. [sighs] It's just... [muttering] difficult."

# Instructions Summary

1. Add audio tags from the audio tags list. These must describe something auditory but only for the voice.
2. Enhance emphasis without altering meaning or text.
3.
Reply ONLY with the enhanced text.
```

### Tips

You can combine multiple audio tags for complex emotional delivery. Experiment with different combinations to find what works best for your voice.

Match tags to your voice's character and training data. A serious, professional voice may not respond well to playful tags like `[giggles]` or `[mischievously]`.

Text structure strongly influences output with v3. Use natural speech patterns, proper punctuation, and clear emotional context for best results.

There are likely many more effective tags beyond this list. Experiment with descriptive emotional states and actions to discover what works for your specific use case.

# Transcription

> Learn how to turn spoken audio into text with ElevenLabs.

## Overview

The ElevenLabs [Speech to Text (STT) API](/docs/developers/guides/cookbooks/speech-to-text/quickstart) turns spoken audio into text with state-of-the-art accuracy. Our [Scribe v2 model](/docs/overview/models) adapts to textual cues across 90+ languages and multiple voice styles. To try a live demo, please visit our [Speech to Text](https://elevenlabs.io/speech-to-text) showcase page.

Step-by-step guide for using speech to text in ElevenLabs. Learn how to integrate the speech to text API into your application. Learn how to transcribe audio with ElevenLabs in real time with WebSockets.

Companies requiring HIPAA compliance must contact [ElevenLabs Sales](https://elevenlabs.io/contact-sales) to sign a Business Associate Agreement (BAA). Please ensure this step is completed before proceeding with any HIPAA-related integrations or deployments.

## Models

State-of-the-art speech recognition model
Accurate transcription in 90+ languages
Keyterm prompting, up to 100 terms
Entity detection, up to 56
Precise word-level timestamps
Speaker diarization, up to 32 speakers
Dynamic audio tagging
Smart language detection
Real-time speech recognition model
Accurate transcription in 90+ languages
Real-time transcription
Low latency (~150ms†)
Precise word-level timestamps
[Explore all](/docs/overview/models)
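Before looking at the response format below, here is a hedged sketch of what a batch transcription request can look like over HTTP. The endpoint path follows the quickstart linked above, but the `model_id` string and the diarization flag shown here are assumptions; verify them against the API reference.

```python
# Minimal sketch of a batch Speech to Text request. The model_id value and
# the "diarize" flag are assumptions - consult the API reference for the
# exact parameter names and available model IDs.
import requests

API_KEY = "your-api-key"  # hypothetical placeholder

with open("meeting.mp3", "rb") as audio:
    response = requests.post(
        "https://api.elevenlabs.io/v1/speech-to-text",
        headers={"xi-api-key": API_KEY},
        data={
            "model_id": "scribe_v1",  # assumed ID; the v2 ID may differ
            "diarize": "true",        # assumed flag for speaker diarization
        },
        files={"file": audio},
    )

response.raise_for_status()
transcript = response.json()

# The fields below match the documented example response.
print(transcript["text"])
for word in transcript["words"]:
    print(word["text"], word["start"], word["end"], word.get("speaker_id"))
```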
## Example API response The following example shows the output of the Speech to Text API using the Scribe v2 model for a sample audio file. ```javascript { "language_code": "en", "language_probability": 1, "text": "With a soft and whispery American accent, I'm the ideal choice for creating ASMR content, meditative guides, or adding an intimate feel to your narrative projects.", "words": [ { "text": "With", "start": 0.119, "end": 0.259, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 0.239, "end": 0.299, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "a", "start": 0.279, "end": 0.359, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 0.339, "end": 0.499, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "soft", "start": 0.479, "end": 1.039, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 1.019, "end": 1.2, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "and", "start": 1.18, "end": 1.359, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 1.339, "end": 1.44, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "whispery", "start": 1.419, "end": 1.979, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 1.959, "end": 2.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "American", "start": 2.159, "end": 2.719, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 2.699, "end": 2.779, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "accent,", "start": 2.759, "end": 3.389, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 4.119, "end": 4.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "I'm", "start": 4.159, "end": 4.459, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 4.44, "end": 4.52, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "the", "start": 4.5, "end": 4.599, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 4.579, "end": 4.699, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "ideal", "start": 4.679, "end": 5.099, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 5.079, "end": 5.219, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "choice", "start": 5.199, "end": 5.719, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 5.699, "end": 6.099, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "for", "start": 6.099, "end": 6.199, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 6.179, "end": 6.279, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "creating", "start": 6.259, "end": 6.799, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 6.779, "end": 6.979, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "ASMR", "start": 6.959, "end": 7.739, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 7.719, "end": 7.859, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "content,", "start": 7.839, "end": 8.45, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 9, "end": 9.06, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "meditative", "start": 9.04, "end": 9.64, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 9.619, "end": 9.699, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "guides,", "start": 9.679, "end": 10.359, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 10.359, "end": 10.409, "type": "spacing", "speaker_id": 
"speaker_0" }, { "text": "or", "start": 11.319, "end": 11.439, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 11.42, "end": 11.52, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "adding", "start": 11.5, "end": 11.879, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 11.859, "end": 12, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "an", "start": 11.979, "end": 12.079, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 12.059, "end": 12.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "intimate", "start": 12.179, "end": 12.579, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 12.559, "end": 12.699, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "feel", "start": 12.679, "end": 13.159, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.139, "end": 13.179, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "to", "start": 13.159, "end": 13.26, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.239, "end": 13.3, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "your", "start": 13.299, "end": 13.399, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.379, "end": 13.479, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "narrative", "start": 13.479, "end": 13.889, "type": "word", "speaker_id": "speaker_0" }, { "text": " ", "start": 13.919, "end": 13.939, "type": "spacing", "speaker_id": "speaker_0" }, { "text": "projects.", "start": 13.919, "end": 14.779, "type": "word", "speaker_id": "speaker_0" } ] } ``` The output is classified in three category types: * `word` - A word in the language of the audio * `spacing` - The space between words, not applicable for languages that don't use spaces like Japanese, Mandarin, Thai, Lao, Burmese and Cantonese * `audio_event` - Non-speech sounds like laughter or applause ## Concurrency and priority Concurrency is the concept of how many requests can be processed at the same time. For Speech to Text, files that are over 8 minutes long are transcribed in parallel internally in order to speed up processing. The audio is chunked into four segments to be transcribed concurrently. You can calculate the concurrency limit with the following calculation: $$ Concurrency = \min(4, \text{round\_up}(\frac{\text{audio\_duration\_secs}}{480})) $$ For example, a 15 minute audio file will be transcribed with a concurrency of 2, while a 120 minute audio file will be transcribed with a concurrency of 4. The above calculation is only applicable to Scribe v1 and v2. For Scribe v2 Realtime, see the [concurrency limit chart](/docs/overview/models#concurrency-and-priority). ## Advanced features Keyterm prompting and entity detection come at an additional cost. See the [API pricing page](https://elevenlabs.io/pricing?price.section=speech_to_text\&price.sections=speech_to_text,speech_to_text#pricing-table) for detailed pricing information. ### Keyterm prompting Keyterm prompting is only available with the Scribe v2 model. Highlight up to 100 words or phrases to bias the model towards transcribing them. This is useful for transcribing specific words or sentences that are not common in the audio, such as product names, names, or other specific terms. Keyterms are more powerful than biased keywords or customer vocabularies offered by other models, because it relies on the context to decide whether to transcribe that term or not. 
To learn more about how to use keyterm prompting, see the [keyterm prompting documentation](/docs/developers/guides/cookbooks/speech-to-text/batch/keyterm-prompting).

### Entity detection

Scribe v2 can detect several categories of entities in the transcript, providing their exact timestamps. This is useful for highlighting credit card numbers, names, medical conditions, or SSNs. For a full list of supported entities, see the [entity detection documentation](/docs/developers/guides/cookbooks/speech-to-text/batch/entity-detection).

## Supported languages

The Scribe v1 and v2 models support 90+ languages, including:

*Afrikaans (afr), Amharic (amh), Arabic (ara), Armenian (hye), Assamese (asm), Asturian (ast), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Burmese (mya), Cantonese (yue), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Fulah (ful), Galician (glg), Ganda (lug), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Igbo (ibo), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kabuverdianu (kea), Kannada (kan), Kazakh (kaz), Khmer (khm), Korean (kor), Kurdish (kur), Kyrgyz (kir), Lao (lao), Latvian (lav), Lingala (lin), Lithuanian (lit), Luo (luo), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Maltese (mlt), Mandarin Chinese (zho), Māori (mri), Marathi (mar), Mongolian (mon), Nepali (nep), Northern Sotho (nso), Norwegian (nor), Occitan (oci), Odia (ori), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Shona (sna), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Tajik (tgk), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Umbundu (umb), Urdu (urd), Uzbek (uzb), Vietnamese (vie), Welsh (cym), Wolof (wol), Xhosa (xho) and Zulu (zul).*

### Breakdown of language support

Word Error Rate (WER) is a key metric used to evaluate the accuracy of transcription systems. It measures how many errors are present in a transcript compared to a reference transcript. Below is a breakdown of the WER for each language that Scribe v1 and v2 support; the groups are ordered from lowest WER (most accurate) to highest.

Belarusian (bel), Bosnian (bos), Bulgarian (bul), Catalan (cat), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Finnish (fin), French (fra), Galician (glg), German (deu), Greek (ell), Hungarian (hun), Icelandic (isl), Indonesian (ind), Italian (ita), Japanese (jpn), Kannada (kan), Latvian (lav), Macedonian (mkd), Malay (msa), Malayalam (mal), Norwegian (nor), Polish (pol), Portuguese (por), Romanian (ron), Russian (rus), Slovak (slk), Spanish (spa), Swedish (swe), Turkish (tur), Ukrainian (ukr) and Vietnamese (vie).
Armenian (hye), Azerbaijani (aze), Bengali (ben), Cantonese (yue), Filipino (fil), Georgian (kat), Gujarati (guj), Hindi (hin), Kazakh (kaz), Lithuanian (lit), Maltese (mlt), Mandarin (cmn), Marathi (mar), Nepali (nep), Odia (ori), Persian (fas), Serbian (srp), Slovenian (slv), Swahili (swa), Tamil (tam) and Telugu (tel).

Afrikaans (afr), Arabic (ara), Assamese (asm), Asturian (ast), Burmese (mya), Hausa (hau), Hebrew (heb), Javanese (jav), Korean (kor), Kyrgyz (kir), Luxembourgish (ltz), Māori (mri), Occitan (oci), Punjabi (pan), Tajik (tgk), Thai (tha), Uzbek (uzb) and Welsh (cym).

Amharic (amh), Ganda (lug), Igbo (ibo), Irish (gle), Khmer (khm), Kurdish (kur), Lao (lao), Mongolian (mon), Northern Sotho (nso), Pashto (pus), Shona (sna), Sindhi (snd), Somali (som), Urdu (urd), Wolof (wol), Xhosa (xho), Yoruba (yor) and Zulu (zul).

## FAQ

**Can I transcribe video files as well as audio?** Yes, the API supports uploading both audio and video files for transcription. Files up to 3 GB in size and up to 10 hours in duration are supported. The API supports the following audio and video formats:

* audio/aac
* audio/x-aac
* audio/x-aiff
* audio/ogg
* audio/mpeg
* audio/mp3
* audio/mpeg3
* audio/x-mpeg-3
* audio/opus
* audio/wav
* audio/x-wav
* audio/webm
* audio/flac
* audio/x-flac
* audio/mp4
* audio/aiff
* audio/x-m4a

Supported video formats include:

* video/mp4
* video/x-msvideo
* video/x-matroska
* video/quicktime
* video/x-ms-wmv
* video/x-flv
* video/webm
* video/mpeg
* video/3gpp

**Will more languages be supported?** ElevenLabs is constantly expanding the number of languages supported by our models. Please check back frequently for updates.

**Can transcription results be sent to a webhook?** Yes, asynchronous transcription results can be sent to webhooks configured in webhook settings in the UI. Learn more in the [webhooks cookbook](/docs/developers/guides/cookbooks/speech-to-text/webhooks).

**Is multichannel transcription supported?** Yes, the multichannel [STT](https://elevenlabs.io/speech-to-text) feature allows you to transcribe audio where each channel is processed independently and assigned a speaker ID based on its channel number. This feature supports up to 5 channels. Learn more in the [multichannel transcription cookbook](/docs/developers/guides/cookbooks/speech-to-text/multichannel-transcription).

**How is speech to text billed?** ElevenLabs charges for speech to text based on the duration of the audio sent for transcription. Billing is calculated per hour of audio, with rates varying by tier and model. See the [API pricing page](https://elevenlabs.io/pricing/api?price.section=speech_to_text#pricing-table) for detailed pricing information.

# Eleven Music

> Learn how to create studio-grade music with natural language prompts in any style with ElevenLabs.

## Overview

Eleven Music is a Text to Music model that generates studio-grade music with natural language prompts in any style. It's designed to understand intent and generate complete, context-aware audio based on your goals. The model understands both natural language and musical terminology, providing you with state-of-the-art features:

* Complete control over genre, style, and structure
* Vocals or just instrumental
* Multilingual, including English, Spanish, German, Japanese and more
* Edit the sound and lyrics of individual sections or the whole song

Listen to a sample:

Created in collaboration with labels, publishers, and artists, Eleven Music is cleared for nearly all commercial uses, from film and television to podcasts and social media videos, and from advertisements to gaming. For more information on supported usage across our different plans, [see our music terms](https://elevenlabs.io/music-terms).
## Usage

Eleven Music is available today on the ElevenLabs website, with public API access and integration into our Agents Platform coming soon. Check out our prompt engineering guide to help you master the full range of the model’s capabilities.

Step-by-step guide for using Eleven Music on the ElevenCreative Platform. Step-by-step guide for using Eleven Music with the API. Learn how to use Eleven Music with natural language prompts.

## FAQ

**How long can a generated track be?** Generated music has a minimum duration of 3 seconds and a maximum duration of 5 minutes.

**Can I generate music via the API?** Yes, refer to the [developer quickstart](/docs/developers/guides/cookbooks/music/quickstart) for more information.

**Can I use the music commercially?** Yes, Eleven Music is cleared for nearly all commercial uses, from film and television to podcasts and social media videos, and from advertisements to gaming. For more information on supported usage across our different plans, [see our music terms](https://elevenlabs.io/music-terms).

**What format is the generated audio?** Generated audio is provided in MP3 format with professional-grade quality (44.1kHz, 128-192kbps). Other audio formats will be supported soon.

# Best practices

> Master prompting for Eleven Music to achieve maximum musicality and control.

This guide summarizes the most effective techniques for prompting the Eleven Music model. It covers genre & creativity, instrument & vocal isolation, musical control, and structural timing & lyrics.

The model is designed to understand intent and generate complete, context-aware audio based on your goals. High-level prompts like *"ad for a sneaker brand"* or *"peaceful meditation with voiceover"* are often enough to guide the model toward tone, structure, and content that match your use case.

## Genre & Creativity

The model demonstrates strong adherence to genre conventions and emotional tone. It responds effectively to both:

* Abstract mood descriptors (e.g., "eerie," "foreboding")
* Detailed musical language (e.g., "dissonant violin screeches over a pulsing sub-bass")

Prompt length and detail do not always correlate with better quality outputs. For more creative and unexpected results, try using simple, evocative keywords to let the model interpret and compose freely.

## Instrument & Vocal Isolation

The v1 model does not generate stems directly from a full track. To create stems with greater control, use targeted prompts and structure:

* Use the word "solo" before instruments (e.g., "solo electric guitar," "solo piano in C minor").
* For vocals, use "a cappella" before the vocal description (e.g., "a cappella female vocals," "a cappella male chorus").

To improve stem quality and control:

* Include key, tempo (BPM), and musical tone (e.g., "a cappella vocals in A major, 90 BPM, soulful and raw").
* Be as musically descriptive as possible to guide the model's output.

## Musical Control

The model accurately follows BPM and often captures the intended musical key. To gain more control over timing and harmony, include tempo cues like "130 BPM" and key signatures like "in A minor" in your prompt.
To influence vocal delivery and tone, use expressive descriptors such as "raw," "live," "glitching," "breathy," or "aggressive." The model can effectively render multiple vocalists; use prompts like "two singers harmonizing in C" to direct vocal arrangement. In general, more detailed prompts lead to greater control and expressiveness in the output.

## Structural Timing & Lyrics

You can specify the length of the song (e.g., "60 seconds") or use auto mode to let the model determine the duration. If lyrics are not provided, the model will generate structured lyrics that match the chosen or auto-detected length.

By default, most music prompts will include lyrics. To generate music without vocals, add "instrumental only" to your prompt.

You can also write your own lyrics for more creative control. The model uses your lyrics in combination with the prompt length to determine vocal structure and placement. To manage when vocals begin or end, include clear timing cues like:

* "lyrics begin at 15 seconds"
* "instrumental only after 1:45"

The model supports multilingual lyric generation. To change the language of a generated song in our UI, use follow-ups like "make it Japanese" or "translate to Spanish."

## Sample Prompts

The model allows you to move beyond song descriptors and into intent for maximum creativity.

```text
Create an intense, fast-paced electronic track for a high-adrenaline video game scene. Use driving synth arpeggios, punchy drums, distorted bass, glitch effects, and aggressive rhythmic textures. The tempo should be fast, 130–150 bpm, with rising tension, quick transitions, and dynamic energy bursts.
```

```text
Track for a high-end mascara commercial. Upbeat and polished. Voiceover only. The script begins: "We bring you the most volumizing mascara yet." Mention the brand name "X" at the end.
```

```text
Write a raw, emotionally charged track that fuses alternative R&B, gritty soul, indie rock, and folk. The song should still feel like a live, one-take, emotionally spontaneous performance. A female vocalist begins at 15 seconds:
"I tried to leave the light on, just in case you turned around
But all the shadows answered back, and now I'm burning out
My voice is shaking in the silence you left behind
But I keep singing to the smoke, hoping love is still alive"
```

# Text to Dialogue

> Learn how to create immersive, natural-sounding dialogue with ElevenLabs.

## Overview

The ElevenLabs [Text to Dialogue](/docs/api-reference/text-to-dialogue/convert) API creates natural-sounding, expressive dialogue from text using the Eleven v3 model. Popular use cases include:

* Generating pitch-perfect conversations for video games
* Creating immersive dialogue for podcasts and other audio content
* Bringing audiobooks to life with expressive narration

Text to Dialogue is not intended for use in real-time applications like conversational agents. Several generations might be required to achieve the desired results. When integrating Text to Dialogue into your application, consider producing multiple generations and allowing the user to select the best one.

Listen to a sample:

Learn how to integrate text to dialogue into your application. Learn how to use the Eleven v3 model to generate expressive dialogue.
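As a hedged sketch of what a minimal integration could look like: the request below targets the convert endpoint linked above, but the `inputs` body shape and the `eleven_v3` model ID are assumptions to verify against the API reference.

```python
# Minimal sketch of a two-speaker Text to Dialogue request. The "inputs"
# list of {voice_id, text} objects and the model ID are assumptions based
# on the convert endpoint referenced above - verify before relying on them.
import requests

API_KEY = "your-api-key"  # hypothetical placeholder

response = requests.post(
    "https://api.elevenlabs.io/v1/text-to-dialogue",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "model_id": "eleven_v3",  # assumed; Text to Dialogue runs on Eleven v3
        "inputs": [
            # Assign a distinct voice to each speaker; audio tags go inline.
            {"voice_id": "voice-id-1", "text": "[excitedly] Sam! Have you tried the new Eleven v3?"},
            {"voice_id": "voice-id-2", "text": "[curiously] Just got it! The clarity is amazing."},
        ],
    },
)
response.raise_for_status()

with open("dialogue.mp3", "wb") as f:
    f.write(response.content)
```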
## Voice options

ElevenLabs offers thousands of voices across 70+ languages through multiple creation methods:

* [Voice library](/docs/overview/capabilities/voices) with 3,000+ community-shared voices
* [Professional voice cloning](/docs/overview/capabilities/voices#cloned) for highest-fidelity replicas
* [Instant voice cloning](/docs/overview/capabilities/voices#cloned) for quick voice replication
* [Voice design](/docs/overview/capabilities/voices#voice-design) to generate custom voices from text descriptions

Learn more about our [voice options](/docs/overview/capabilities/voices).

## Prompting

The models interpret emotional context directly from the text input. For example, adding descriptive text like "she said excitedly" or using exclamation marks will influence the speech emotion. Voice settings like Stability and Similarity help control the consistency, while the underlying emotion comes from textual cues. Read the [prompting guide](/docs/overview/capabilities/text-to-speech/best-practices#prompting-eleven-v3) for more details.

### Emotional deliveries with audio tags

This feature is still under active development; actual results may vary.

The Eleven v3 model allows the use of non-speech audio events to influence the delivery of the dialogue. This is done by inserting the audio events into the text input wrapped in square brackets. Audio tags come in a few different forms:

### Emotions and delivery

For example, \[sad], \[laughing] and \[whispering].

### Audio events

For example, \[leaves rustling], \[gentle footsteps] and \[applause].

### Overall direction

For example, \[football], \[wrestling match] and \[auctioneer].

Some examples include:

```
"[giggling] That's really funny!"
"[groaning] That was awful."
"Well, [sigh] I'm not sure what to say."
```

You can also use punctuation to indicate the flow of dialogue, like interruptions:

```
"[cautiously] Hello, is this seat-"
"[jumping in] Free? [cheerfully] Yes it is."
```

Ellipses can be used to indicate trailing sentences:

```
"[indecisive] Hi, can I get uhhh..."
"[quizzically] The usual?"
"[elated] Yes! [laughs] I'm so glad you knew!"
```

## Supported formats

The default response format is "mp3", but other formats like "PCM" and "μ-law" are available.

* **MP3**
  * Sample rates: 22.05kHz - 44.1kHz
  * Bitrates: 32kbps - 192kbps
  * 22.05kHz @ 32kbps
  * 44.1kHz @ 32kbps, 64kbps, 96kbps, 128kbps, 192kbps
* **PCM (S16LE)**
  * Sample rates: 8kHz, 16kHz, 22.05kHz, 24kHz, 44.1kHz, 48kHz
  * 16-bit depth
* **μ-law**
  * 8kHz sample rate
  * Optimized for telephony applications
* **A-law**
  * 8kHz sample rate
  * Optimized for telephony applications
* **Opus**
  * Sample rate: 48kHz
  * Bitrates: 32kbps - 192kbps

Higher quality audio options are only available on paid tiers - see our [pricing page](https://elevenlabs.io/pricing/api) for details.
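As an illustration, a telephony integration might request μ-law output directly. The sketch below assumes an `output_format` query parameter and the value string `ulaw_8000`; confirm both against the API reference for the endpoint you are calling.

```python
# Hedged sketch: requesting 8kHz u-law audio for a telephony pipeline via
# an output_format query parameter. The value string "ulaw_8000" is an
# assumption - check the API reference for the supported format names.
import requests

API_KEY = "your-api-key"    # hypothetical placeholder
VOICE_ID = "your-voice-id"

response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    params={"output_format": "ulaw_8000"},  # assumed 8kHz u-law format name
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={"text": "Your call is important to us.", "model_id": "eleven_v3"},
)
response.raise_for_status()

with open("prompt.ulaw", "wb") as f:
    f.write(response.content)
```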
## Supported languages

The Eleven v3 model supports 70+ languages, including:

*Afrikaans (afr), Arabic (ara), Armenian (hye), Assamese (asm), Azerbaijani (aze), Belarusian (bel), Bengali (ben), Bosnian (bos), Bulgarian (bul), Catalan (cat), Cebuano (ceb), Chichewa (nya), Croatian (hrv), Czech (ces), Danish (dan), Dutch (nld), English (eng), Estonian (est), Filipino (fil), Finnish (fin), French (fra), Galician (glg), Georgian (kat), German (deu), Greek (ell), Gujarati (guj), Hausa (hau), Hebrew (heb), Hindi (hin), Hungarian (hun), Icelandic (isl), Indonesian (ind), Irish (gle), Italian (ita), Japanese (jpn), Javanese (jav), Kannada (kan), Kazakh (kaz), Kyrgyz (kir), Korean (kor), Latvian (lav), Lingala (lin), Lithuanian (lit), Luxembourgish (ltz), Macedonian (mkd), Malay (msa), Malayalam (mal), Mandarin Chinese (cmn), Marathi (mar), Nepali (nep), Norwegian (nor), Pashto (pus), Persian (fas), Polish (pol), Portuguese (por), Punjabi (pan), Romanian (ron), Russian (rus), Serbian (srp), Sindhi (snd), Slovak (slk), Slovenian (slv), Somali (som), Spanish (spa), Swahili (swa), Swedish (swe), Tamil (tam), Telugu (tel), Thai (tha), Turkish (tur), Ukrainian (ukr), Urdu (urd), Vietnamese (vie), Welsh (cym).*

## FAQ

**Which models support Text to Dialogue?** Text to Dialogue is only available on the Eleven v3 model.

**Can I use the generated audio commercially?** Yes. You retain ownership of any audio you generate. However, commercial usage rights are only available with paid plans. With a paid subscription, you may use generated audio for commercial purposes and monetize the outputs if you own the IP rights to the input content.

**What is a free regeneration?** A free regeneration allows you to regenerate the same text to speech content without additional cost, subject to these conditions:

* Only available within the ElevenLabs dashboard.
* You can regenerate each piece of content up to 2 times for free.
* The content must be exactly the same as the previous generation. Any changes to the text, voice settings, or other parameters will require a new, paid generation.

Free regenerations are useful in case there is a slight distortion in the audio output. According to ElevenLabs' internal benchmarks, regenerations will solve roughly half of issues with quality, with remaining issues usually due to poor training data.

**How many speakers can a dialogue include?** There is no limit to the number of speakers in a dialogue.

**How can I make outputs more consistent?** The models are nondeterministic. For consistency, use the optional [seed parameter](/docs/api-reference/text-to-speech/convert#request.body.seed), though subtle differences may still occur.

**How should I handle long text?** Split long text into segments and use streaming for real-time playback and efficient processing.

# Image & Video

> Generate and edit stunning images and videos from text prompts and visual references.

## Overview

Image & Video enables you to create high-quality visual content from simple text descriptions or reference images. Generate static images or dynamic videos in any style, then refine them iteratively with additional prompts, upscale for high-resolution output, and even add lip-sync with audio. This feature is currently in beta.

Complete guide to using Image & Video in ElevenLabs.

## Key capabilities

* **Image generation**: Create high-quality images from text prompts or reference images with models optimized for speed or quality
* **Video generation**: Generate dynamic videos with cinematic motion, physics realism, and integrated audio.
Video generation is only available on paid plans * **Iterative refinement**: Refine generations with additional prompts and create variations * **Enhancement tools**: Upscale resolution by up to 4x and apply realistic lip-sync with audio * **Multiple models**: Access specialized models for different use cases, from rapid iteration to production-ready content * **Reference support**: Guide generation with start frames, end frames, and style references. Supports a wide range of image file formats including JPG, PNG, WEBP, and more * **Export flexibility**: Download as standalone files or import directly into Studio projects ## Workflow The creation process moves you from inspiration to finished asset in four stages: **Explore:** Discover community creations to find inspiration and study effective prompts. **Generate:** Use the prompt box to describe what you want to create, select a model, and fine-tune settings. **Iterate and enhance:** Review generations, create variations, and apply enhancements like upscaling and lip-syncing. **Export:** Download finished assets or send them directly to Studio. ## Supported download formats **Video:** * **MP4**: Codecs H.264, H.265. Quality up to 4K (with upscaling) **Image:** * **PNG**: High-resolution, lossless output ## Models Image & Video provides access to specialized models optimized for different use cases. Each model offers unique capabilities, from rapid iteration to production-ready quality. Post-processing models require an existing generated output, though you can also upload your own image or video file. The most advanced, high-fidelity video model for cinematic results at your disposal. **Generation inputs:** * Text-to-Video * Start Frame **Features:** * Highest-fidelity, professional-grade output with synced audio * Precise multi-shot control * Excels at complex motion and prompt adherence * Fixed durations: 4s, 8s, and 12s * Batch creation with up to 4 generations at a time **Output options:** * Resolutions: 720p, 1080p * Aspect ratios: 16:9, 9:16 **Ideal for:** * Cinematic, professional-grade video content **Cost:** Starts at 12,000 credits for a generation End frame is not currently supported. Cannot provide image references. Sound is enabled by default. The standard, high-speed version of OpenAI's advanced video model, tuned for everyday content creation. **Generation inputs:** * Text-to-Video * Start Frame **Features:** * Realistic, physics-aware videos with synced audio * Fine scene control * Fixed durations: 4s, 8s, and 12s * Batch creation with up to 4 generations at a time * Strong narrative and character consistency **Output options:** * Resolutions: 720p, 1080p * Aspect ratios: 16:9, 9:16 **Ideal for:** * Everyday content creation with realistic physics **Cost:** Starts at 4,000 credits for default settings End frame is not currently supported. Cannot provide image references. Sound is enabled by default. A professional-grade model for high-quality, cinematic video generation. 
**Generation inputs:** * Text-to-Video * Start Frame * End Frame * Image References **Features:** * Excellent quality and creative control with negative prompts * Fully integrated and synchronized audio * Realistic dialogue, lip-sync, and sound effects * Fixed durations: 4s, 6s, and 8s * Batch creation with up to 4 generations at a time * Dedicated sound control **Output options:** * Resolutions: 720p, 1080p * Aspect ratios: 16:9, 9:16 **Ideal for:** * High-quality, cinematic video generation with full creative control **Cost:** Starts at 8,000 credits for default settings Enabling and disabling sound will change the generation credits. A balanced and versatile model for high-quality, full-HD video generation. **Generation inputs:** * Text-to-Video * Start Frame **Features:** * Excels at simulating complex motion and realistic physics * Accurately models fluid dynamics and expressions * Fixed durations: 5s and 10s * Batch creation with up to 4 generations at a time **Output options:** * Resolutions: 1080p * Aspect ratios: 16:9, 1:1, 9:16 **Ideal for:** * Realistic physics simulations and complex motion **Cost:** Starts at 3,500 credits for default settings End frame is not currently supported. Cannot provide image references. Sound control not available. A high-speed model optimized for rapid previews and generations, delivering sharper visuals with lower latency. **Generation inputs:** * Text-to-Video * Start Frame * End Frame **Features:** * Advanced creative control with negative prompts and dedicated sound control * Fixed durations: 4s, 6s, and 8s * Batch creation with up to 4 generations at a time * Accurately models real-world physics for realistic motion and interactions **Output options:** * Resolutions: 720p, 1080p * Aspect ratios: 16:9, 9:16 **Ideal for:** * Quick iteration and A/B testing visuals * Fast-paced social media content creation **Cost:** Starts at 4,000 credits for default settings Production-ready model delivering exceptional quality, strong physics realism, and coherent narrative audio. **Generation inputs:** * Text-to-Video * Start Frame **Features:** * Advanced integrated "narrative audio" generation that matches video tone and story * Granular creative control with negative prompts and dedicated sound control * Fixed durations: 4s, 6s, and 8s * Batch creation with up to 4 generations at a time **Output options:** * Resolutions: 720p, 1080p * Aspect ratios: 16:9, 9:16 **Ideal for:** * Final renders and professional marketing content * Short-form storytelling **Cost:** Starts at 8,000 credits for default settings A high-speed, cost-efficient model for generating audio-backed video from text or a starting image. **Generation inputs:** * Text-to-Video * Start Frame **Features:** * Granular creative control with negative prompts and dedicated sound control * Fixed durations: 4s, 6s, and 8s * Batch creation with up to 4 generations at a time **Output options:** * Resolutions: 720p, 1080p * Aspect ratios: 16:9, 9:16 **Ideal for:** * Rapid iteration and previews * Cost-effective content creation **Cost:** Starts at 4,000 credits for default settings A specialized model for creating dynamic, multi-shot sequences with large movement and action. 
**Generation inputs:** * Text-to-Video * Start Frame * End Frame **Features:** * Highly stable physics and seamless transitions between shots * Fixed durations: 3s, 4s, 5s, 6s, 7s, 8s, 9s, 10s, 11s, and 12s * Batch creation with up to 4 generations at a time * Maximum creative flexibility with numerous aspect ratio options **Output options:** * Resolutions: 480p, 720p, 1080p * Aspect ratios: 21:9, 16:9, 4:3, 1:1, 3:4, 9:16 **Ideal for:** * Storytelling and action scenes requiring stable physics **Cost:** Starts at 4,800 credits for default settings Aspect ratio and resolution do not affect generation credits, but duration does. A versatile model that delivers cinematic motion and high prompt fidelity from text or a starting image. **Generation inputs:** * Text-to-Video * Start Frame (Image-to-Video) **Features:** * Granular creative control with negative prompts and dedicated sound control * Fixed durations: 5s and 10s * Batch creation with up to 4 generations at a time **Output options:** * Resolutions: 480p, 720p, 1080p * Aspect ratios: 16:9, 1:1, 9:16 **Ideal for:** * Cinematic content with strong prompt adherence **Cost:** Starts at 2,500 credits for default settings Generation cost varies based on selected settings. A high-speed model for quick, high-quality image generation and editing directly from text prompts. **Features:** * Supports multiple image references to guide generation * Generates up to 4 images at a time **Output options:** * Aspect ratios: 21:9, 16:9, 5:4, 4:3, 3:2, 1:1, 2:3, 3:4, 4:5, 9:16 **Ideal for:** * Rapid image creation and iteration **Cost:** Starts at 2,000 credits for default settings; varies based on number of generations A specialized image model for generating multi-shot sequences or scenes with large movement and action. **Features:** * Excels at creating images with stable physics and coherent transitions * Supports multiple image references to guide generation * Generates up to 4 images at a time **Output options:** * Aspect ratios: auto, 16:9, 4:3, 1:1, 3:4, 9:16 **Ideal for:** * Action scenes and dynamic compositions **Cost:** Starts at 1,200 credits for default settings; varies based on number of generations A professional model for advanced image generation and editing, offering strong scene coherence and style control. **Features:** * Image-based style control requiring a reference image to guide visual aesthetic * Generates up to 4 images at a time **Output options:** * Aspect ratios: 21:9, 16:9, 4:3, 3:2, 1:1, 2:3, 3:4, 4:5, 9:16, 9:21 **Ideal for:** * Professional content with precise style requirements **Cost:** Starts at 1,600 credits; varies based on settings and number of generations An image model with strong prompt fidelity and motion awareness, ideal for capturing dynamic action in a still frame. **Features:** * Granular control with negative prompts * Supports multiple image references to guide generation * Generates up to 4 images at a time **Output options:** * Aspect ratios: 16:9, 4:3, 1:1, 3:4, 9:16 **Ideal for:** * Dynamic still images with motion awareness **Cost:** Starts at 2,000 credits; varies based on settings A versatile model for precise, high-quality image creation and detailed editing guided by natural language prompts. 
**Features:** * Supports multiple image references to guide generation * Generates up to 4 images at a time **Output options:** * Aspect ratios: 3:2, 1:1, 2:3 * Quality options: low, medium, high **Ideal for:** * Creating and editing images with precise, text-based control **Cost:** Starts at 2,400 credits for default settings; varies based on settings and number of generations A dedicated utility model for generating exceptionally realistic, humanlike lip-sync. **Inputs:** * Static source image * Speech audio file **Features:** * Animates the mouth on the source image to match provided audio * Creates high-fidelity "talking" video from still images * Lip-sync specific tool, not a full video generation model **Ideal for:** * Creating talking avatars * Adding dialogue to still images * Professional dubbing workflows **Cost:** Depends on generation input For best results, the image should contain a detectable figure. A fast, affordable, and precise utility model for applying realistic lip-sync to videos. **Inputs:** * Source video * New speech audio file **Features:** * Re-animates mouth movements in source video to match new audio * Video-to-video lip-sync tool, not a full video generator **Ideal for:** * High-volume, cost-effective dubbing * Translating content * Correcting audio in video clips with realistic results **Cost:** Depends on generation input For best results, the video should contain a detectable figure. A dedicated utility model for image and video upscaling, designed to enhance resolution and detail up to 4x. **Features:** * Enhancement tool that processes existing media * Increases media size while preserving natural textures and minimizing artifacts * Highly granular upscale factors: 1x, 1.25x, 1.5x, 1.75x, 2x, 3x, 4x * Video-specific: Flexible frame rate control (keep source or convert to 24, 25, 30, 48, 50, or 60 fps) **Ideal for:** * Improving quality of generated media * Restoring legacy footage or photos * Preparing assets for high-resolution displays **Cost:** Depends on generation input # Voice changer > Learn how to transform audio between voices while preserving emotion and delivery.