# Add Chapter To A Project
post /v1/projects/{project_id}/chapters/add
Creates a new chapter, either blank or from a URL.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Add Project
post /v1/projects/add
Creates a new project. It can be initialized as blank, from a document, or from a URL.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Add Sharing Voice
post /v1/voices/add/{public_user_id}/{voice_id}
Add a sharing voice to your collection of voices in VoiceLab.
# Add Voice
post /v1/voices/add
Add a new voice to your collection of voices in VoiceLab.
## Voice Cloning API Usage
If you provide a list of file paths to audio recordings intended for voice cloning,
you confirm that you have all necessary rights or consents to upload and clone the
voice samples they contain and that you will not use the platform-generated content
for any illegal, fraudulent, or harmful purpose. You reaffirm your obligation to abide
by ElevenLabs’ [Terms of Service](https://elevenlabs.io/terms-of-use),
[Prohibited Use Policy](https://elevenlabs.io/use-policy) and
[Privacy Policy](https://elevenlabs.io/privacy-policy).
# Audio Isolation
post /v1/audio-isolation
Removes background noise from audio
## Pricing
The API is charged at 1000 characters per minute of audio.
## Removing background noise with our Python SDK
Our Audio Isolation API is what powers our Voice Isolator, which removes background noise from audio and leaves you with crystal clear dialogue. To get started, here's an example you can follow using our [Python SDK.](https://github.com/elevenlabs/elevenlabs-python)
```python
from elevenlabs.client import ElevenLabs

# Initialize the client with your API key
client = ElevenLabs(api_key="your api key")

# Path to the audio file you want to isolate
audio_file_path = "sample_file.mp3"

with open(audio_file_path, "rb") as audio_file:
    # Perform audio isolation
    isolated_audio_iterator = client.audio_isolation.audio_isolation(audio=audio_file)

    # Save the isolated audio to a new file
    output_file_path = "cleaned_file.mp3"
    with open(output_file_path, "wb") as output_file:
        for chunk in isolated_audio_iterator:
            output_file.write(chunk)

print(f"Isolated audio saved to {output_file_path}")
```
# Audio Isolation Stream
post /v1/audio-isolation/stream
Removes background noise from audio and streams the result
## Pricing
The API is charged at 1000 characters per minute of audio.
# Convert Chapter
post /v1/projects/{project_id}/chapters/{chapter_id}/convert
Starts conversion of a specific chapter.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Convert Project
post /v1/projects/{project_id}/convert
Starts conversion of a project and all of its chapters.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Dub A Video Or An Audio File
post /v1/dubbing
Dubs a provided audio or video file into a given language.
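As a hedged illustration, a dubbing request is a multipart upload; the `file` part and `target_lang` form field used below are assumptions about the request shape, so adjust them to the request schema shown for this endpoint.

```python
import requests

XI_API_KEY = "API_KEY_HERE"
url = "https://api.elevenlabs.io/v1/dubbing"

headers = {"xi-api-key": XI_API_KEY}
# Assumed multipart fields: the media file to dub and the target language code
files = {"file": ("video.mp4", open("video.mp4", "rb"), "video/mp4")}
data = {"target_lang": "es"}

response = requests.post(url, headers=headers, data=data, files=files)
print(response.json())  # expected to include an identifier for retrieving the dub later
```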
# Create AudioNative Enabled Project
post /v1/audio-native
Creates an AudioNative enabled project, optionally starts conversion, and returns the project ID and an embeddable HTML snippet.
# Delete Chapter
delete /v1/projects/{project_id}/chapters/{chapter_id}
Delete a chapter by its chapter_id.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Delete Dubbing Project
delete /v1/dubbing/{dubbing_id}
Deletes a dubbing project.
# Delete History Item
delete /v1/history/{history_item_id}
Delete a history item by its ID
# Delete Project
delete /v1/projects/{project_id}
Delete a project by its project_id.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Delete Sample
delete /v1/voices/{voice_id}/samples/{sample_id}
Removes a sample by its ID.
# Delete Voice
delete /v1/voices/{voice_id}
Deletes a voice by its ID.
# Download History Items
post /v1/history/download
Download one or more history items. If one history item ID is provided, we will return a single audio file. If more than one history item ID is provided, the history items will be packed into a .zip file.
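Here is a minimal Python sketch of calling this endpoint; the `history_item_ids` body field is an assumption about the request shape, and the code simply saves whatever bytes come back (a zip archive when multiple IDs are sent, as described above).

```python
import requests

XI_API_KEY = "API_KEY_HERE"
url = "https://api.elevenlabs.io/v1/history/download"

headers = {"xi-api-key": XI_API_KEY}
# Assumed request body field: a list of history item IDs to download
payload = {"history_item_ids": ["HISTORY_ITEM_ID_1", "HISTORY_ITEM_ID_2"]}

response = requests.post(url, json=payload, headers=headers)

if response.ok:
    # With more than one ID the response is a .zip archive; with a single ID it is an audio file
    with open("history_items.zip", "wb") as f:
        f.write(response.content)
else:
    print("Error:", response.status_code, response.text)
```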
# Edit Voice
post /v1/voices/{voice_id}/edit
Edit a voice created by you.
# Edit Voice Settings
post /v1/voices/{voice_id}/settings/edit
Edit your settings for a specific voice. "similarity_boost" corresponds to "Clarity + Similarity Enhancement" in the web app and "stability" corresponds to the "Stability" slider in the web app.
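As a rough sketch, a call to this endpoint might look like the following, assuming the body carries the `stability` and `similarity_boost` values described above; the placeholder API key and voice ID are yours to replace.

```python
import requests

XI_API_KEY = "API_KEY_HERE"
VOICE_ID = "VOICE_ID_HERE"
url = f"https://api.elevenlabs.io/v1/voices/{VOICE_ID}/settings/edit"

headers = {"xi-api-key": XI_API_KEY, "Content-Type": "application/json"}
# "stability" maps to the Stability slider, "similarity_boost" to Clarity + Similarity Enhancement
payload = {"stability": 0.5, "similarity_boost": 0.75}

response = requests.post(url, json=payload, headers=headers)
print(response.status_code, response.text)
```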
# Generate A Random Voice
post /v1/voice-generation/generate-voice
Generate a random voice based on parameters. This method returns a generated_voice_id in the response header, and a sample of the voice in the body. If you like the generated voice, call /v1/voice-generation/create-voice with the generated_voice_id to create the voice.
This API is deprecated. Please use the new [Text to Voice API](/api-reference/ttv-create-previews).
# Voice Generation Parameters
get /v1/voice-generation/generate-voice/parameters
Get possible parameters for the /v1/voice-generation/generate-voice endpoint.
This API is deprecated. Please use the new [Text to Voice API](/api-reference/ttv-create-previews).
# Get Audio From History Item
get /v1/history/{history_item_id}/audio
Returns the audio of a history item.
# Get Audio From Sample
get /v1/voices/{voice_id}/samples/{sample_id}/audio
Returns the audio corresponding to a sample attached to a voice.
# Get Chapter By Id
get /v1/projects/{project_id}/chapters/{chapter_id}
Returns information about a specific chapter.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Get Chapter Snapshots
get /v1/projects/{project_id}/chapters/{chapter_id}/snapshots
Gets information about all the snapshots of a chapter. Each snapshot can be downloaded as audio. Whenever a chapter is converted, a snapshot is automatically created.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Get Chapters
get /v1/projects/{project_id}/chapters
Returns a list of a project's chapters together with their metadata.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Get Default Voice Settings
get /v1/voices/settings/default
Gets the default settings for voices. "similarity_boost" corresponds to "Clarity + Similarity Enhancement" in the web app and "stability" corresponds to the "Stability" slider in the web app.
# Get Dubbed File
get /v1/dubbing/{dubbing_id}/audio/{language_code}
Returns the dubbed file as a streamed file. Videos are returned in MP4 format and audio-only dubs are returned in MP3.
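A short sketch of fetching the dubbed result with Python once a dubbing job has finished; the `DUBBING_ID` and language code placeholders are yours to fill in, and the output filename assumes a video dub (use .mp3 for audio-only dubs).

```python
import requests

XI_API_KEY = "API_KEY_HERE"
DUBBING_ID = "DUBBING_ID_HERE"
LANGUAGE_CODE = "es"
url = f"https://api.elevenlabs.io/v1/dubbing/{DUBBING_ID}/audio/{LANGUAGE_CODE}"

headers = {"xi-api-key": XI_API_KEY}

# Stream the response to disk; videos come back as MP4, audio-only dubs as MP3
with requests.get(url, headers=headers, stream=True) as response:
    response.raise_for_status()
    with open("dubbed_output.mp4", "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            f.write(chunk)
```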
# Get Dubbing Project Metadata
get /v1/dubbing/{dubbing_id}
Returns metadata about a dubbing project, including whether it's still in progress or not
# Get Transcript For Dub
get /v1/dubbing/{dubbing_id}/transcript/{language_code}
Returns transcript for the dub as an SRT file.
# Get Generated Items
get /v1/history
Returns metadata about all your generated audio.
# Get History Item By Id
get /v1/history/{history_item_id}
Returns information about a history item by its ID.
# Get Models
get /v1/models
Gets a list of available models.
# Get Project By Id
get /v1/projects/{project_id}
Returns information about a specific project. This endpoint returns more detailed information about a project than `GET /v1/projects`.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Get Project Snapshots
get /v1/projects/{project_id}/snapshots
Gets the snapshots of a project.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Get Projects
get /v1/projects
Returns a list of your projects together with their metadata.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Get User Info
get /v1/user
Gets information about the user
# Get User Subscription Info
get /v1/user/subscription
Gets extended information about the user's subscription
# Get Voice
get /v1/voices/{voice_id}
Returns metadata about a specific voice.
# Get Voice Settings
get /v1/voices/{voice_id}/settings
Returns the settings for a specific voice. "similarity_boost" corresponds to "Clarity + Similarity Enhancement" in the web app and "stability" corresponds to the "Stability" slider in the web app.
# Get Voices
get /v1/voices
Gets a list of all available voices for a user.
# API Reference Overview
Overview of ElevenLabs API endpoints and capabilities
* Convert text into lifelike speech with industry-leading quality and latency
* Clone voices from audio while preserving emotion and intonation
* Generate AI-powered sound effects and audio for any use case
* Separate speech from background noise in audio files
* Create interactive AI voice experiences with WebSocket agents
* Access and manage your generated audio history
* Access subscription information and user details
* Access and manage your custom AI voice collection
* Generate custom voices from text descriptions
* Browse and use our collection of shared voices
* Organize and manage your audio generation projects
* Create and manage custom pronunciation rules
* Access our selection of AI voice models
* Create and manage audio-enabled web projects
* Automatically translate and dub audio content
* Manage team members and workspace settings
* Monitor character and API usage metrics
All API endpoints require authentication using your API key. Click through to
each section for detailed endpoint documentation.
# Add from file
post /v1/pronunciation-dictionaries/add-from-file
Creates a new pronunciation dictionary from a lexicon .PLS file
## Adding a pronunciation dictionary
Here is some example code for uploading a [pronunciation dictionary](https://elevenlabs.io/docs/projects/overview#pronunciation-dictionaries) and printing the response `pronunciation_dictionary_id` and `version_id`. You'll require these identifiers in the request body if you intend to use `pronunciation_dictionary_locators`.
All you will need to do is replace `API_KEY_HERE` with your actual API key and `PATH_HERE` with the actual path to the PLS file you want to upload.
If you need help in understanding how to properly format a PLS / pronunciation dictionary, please refer to the guide [here](https://elevenlabs.io/docs/projects/overview#pronunciation-dictionaries).
There is currently no way to fetch or update previously uploaded dictionaries, so you will need to keep track of their identifiers. If you need to update a dictionary, you will have to upload a new one.
```python
import requests
import os

# Define your API key and the base URL for the Eleven Labs API
XI_API_KEY = "API_KEY_HERE"
BASE_URL = "https://api.elevenlabs.io/v1"

# Setup the headers for HTTP requests to include the API key and accept JSON responses
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}

def upload_pronunciation_dictionary(file_path, name, description):
    """
    Uploads a pronunciation dictionary file to the Eleven Labs API and returns its ID and version ID.

    Parameters:
    - file_path: The local path to the pronunciation dictionary file.
    - name: A name for the pronunciation dictionary.
    - description: A description of the pronunciation dictionary.

    Returns:
    A tuple containing the pronunciation dictionary ID and version ID if successful, None otherwise.
    """
    # Construct the URL for adding a pronunciation dictionary from a file
    url = f"{BASE_URL}/pronunciation-dictionaries/add-from-file"

    # Prepare the file and data to be sent in the request
    files = {'file': open(file_path, 'rb')}
    data = {'name': name, 'description': description}

    # Make the POST request to upload the dictionary
    response = requests.post(url, headers=headers, files=files, data=data)

    # Handle the response
    if response.status_code == 200:
        # Parse the response JSON to get the pronunciation dictionary and version IDs
        data = response.json()
        pronunciation_dictionary_id = data.get('id')
        version_id = data.get('version_id')

        # Return the IDs
        return pronunciation_dictionary_id, version_id
    else:
        # Print an error message if the request failed
        print("Error:", response.status_code)
        return None, None

def main():
    """
    The main function to upload a pronunciation dictionary.
    """
    # Define the path to your pronunciation dictionary file and its metadata
    file_path = r"PATH_HERE"
    name = "Your Pronunciation Dictionary"
    description = "My custom pronunciation dictionary"

    # Upload the pronunciation dictionary and receive its ID and version ID
    pronunciation_dictionary_id, version_id = upload_pronunciation_dictionary(file_path, name, description)

    # Check if the upload was successful
    if pronunciation_dictionary_id and version_id:
        print("Pronunciation Dictionary Uploaded Successfully!")
        print("Pronunciation Dictionary ID:", pronunciation_dictionary_id)
        print("Version ID:", version_id)
    else:
        print("Failed to upload pronunciation dictionary.")

# Ensure this script block runs only when executed as a script, not when imported
if __name__ == "__main__":
    main()
```
## Using a pronunciation dictionary
Here is some example code on how to use these identifiers or locators in your text-to-speech call.
```python
import requests

# Set your API key and base URL
XI_API_KEY = "API_KEY_HERE"
BASE_URL = "https://api.elevenlabs.io/v1"
VOICE_ID = "TxGEqnHWrfWFTfGW9XjX"

# Headers for the request
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}

def text_to_speech(text, pronunciation_dictionary_id, version_id):
    """
    Sends a text to speech request using a pronunciation dictionary.

    Returns:
    An audio file.
    """
    # Define the URL for the text-to-speech endpoint
    url = f"{BASE_URL}/text-to-speech/{VOICE_ID}"

    # Payload for the request
    payload = {
        "model_id": "eleven_monolingual_v1",
        "pronunciation_dictionary_locators": [
            {
                "pronunciation_dictionary_id": pronunciation_dictionary_id,
                "version_id": version_id
            }
        ],
        "text": text,
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.8,
            "style": 0.0,
            "use_speaker_boost": True
        }
    }

    # Make the POST request
    response = requests.post(url, json=payload, headers=headers)

    # Check the response status
    if response.status_code == 200:
        # Here you can save the audio response to a file if needed
        print("Audio file generated successfully.")

        # Save the audio to a file
        with open("output_audio.mp3", "wb") as audio_file:
            audio_file.write(response.content)
    else:
        print("Error:", response.status_code)

def main():
    # Example text and dictionary IDs (replace with actual values)
    text = "Hello, world! I can now use pronunciation dictionaries."
    pronunciation_dictionary_id = "PD_ID_HERE"
    version_id = "VERSION_ID_HERE"

    # Call the text to speech function
    text_to_speech(text, pronunciation_dictionary_id, version_id)

if __name__ == "__main__":
    main()
```
# Get dictionary by id
get /v1/pronunciation-dictionaries/{pronunciation_dictionary_id}/
Get metadata for a pronunciation dictionary
# Add rules
post /v1/pronunciation-dictionaries/{pronunciation_dictionary_id}/add-rules
Add rules to the pronunciation dictionary
# Remove rules
post /v1/pronunciation-dictionaries/{pronunciation_dictionary_id}/remove-rules
Remove rules from the pronunciation dictionary
# Download version by id
get /v1/pronunciation-dictionaries/{dictionary_id}/{version_id}/download
Get PLS file with a pronunciation dictionary version rules
# Get dictionaries
get /v1/pronunciation-dictionaries/
Get a list of the pronunciation dictionaries you have access to and their metadata
# Get Voices
get /v1/shared-voices
Gets a list of shared voices.
# Node Library
# Python Library
# Sound Generation
post /v1/sound-generation
API that converts text into sounds using the most advanced AI audio model ever. Create sound effects for your videos, voice-overs, or video games.
## Pricing
The API is charged at 100 characters per generation with automatic duration or 25 characters per second with set duration.
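A minimal Python sketch of a request to this endpoint: the `text` field carries the sound description, and `duration_seconds` is an assumed optional parameter corresponding to the set-duration pricing mode mentioned above; omit it if you want the model to pick the duration automatically.

```python
import requests

XI_API_KEY = "API_KEY_HERE"
url = "https://api.elevenlabs.io/v1/sound-generation"

headers = {"xi-api-key": XI_API_KEY, "Content-Type": "application/json"}
payload = {
    "text": "Heavy rain hitting a tin roof, distant thunder",
    # Assumed optional field: a fixed duration in seconds (set-duration pricing)
    "duration_seconds": 5,
}

response = requests.post(url, json=payload, headers=headers)

if response.ok:
    with open("rain.mp3", "wb") as f:
        f.write(response.content)
else:
    print("Error:", response.status_code, response.text)
```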
# Speech To Speech
post /v1/speech-to-speech/{voice_id}
Use the Speech to Speech API to transform uploaded speech so it sounds like it was spoken by another voice. STS gives you full control over the emotions, timing, and delivery.
## Audio generation
Generating speech-to-speech involves a similar process to text-to-speech, but with some adjustments in the API parameters. Instead of providing text when calling the API, you provide the path to an audio file that you would like to convert from one voice to another. Here’s a modified version of your code to illustrate how to generate speech-to-speech using the given API:
```python
# Import necessary libraries
import requests  # Used for making HTTP requests
import json      # Used for working with JSON data

# Define constants for the script
CHUNK_SIZE = 1024  # Size of chunks to read/write at a time
XI_API_KEY = ""  # Your API key for authentication
VOICE_ID = ""  # ID of the voice model to use
AUDIO_FILE_PATH = ""  # Path to the input audio file
OUTPUT_PATH = "output.mp3"  # Path to save the output audio file

# Construct the URL for the Speech-to-Speech API request
sts_url = f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}/stream"

# Set up headers for the API request, including the API key for authentication
headers = {
    "Accept": "application/json",
    "xi-api-key": XI_API_KEY
}

# Set up the data payload for the API request, including model ID and voice settings
# Note: voice settings are converted to a JSON string
data = {
    "model_id": "eleven_english_sts_v2",
    "voice_settings": json.dumps({
        "stability": 0.5,
        "similarity_boost": 0.8,
        "style": 0.0,
        "use_speaker_boost": True
    })
}

# Set up the files to send with the request, including the input audio file
files = {
    "audio": open(AUDIO_FILE_PATH, "rb")
}

# Make the POST request to the STS API with headers, data, and files, enabling streaming response
response = requests.post(sts_url, headers=headers, data=data, files=files, stream=True)

# Check if the request was successful
if response.ok:
    # Open the output file in write-binary mode
    with open(OUTPUT_PATH, "wb") as f:
        # Read the response in chunks and write to the file
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            f.write(chunk)
    # Inform the user of success
    print("Audio stream saved successfully.")
else:
    # Print the error message if the request was not successful
    print(response.text)
```
## Voices
We offer 1000s of voices in 29 languages. Visit the [Voice Lab](https://elevenlabs.io/voice-lab) to explore our pre-made voices or [clone your own](https://elevenlabs.io/voice-cloning). Visit the [Voices Library](https://elevenlabs.io/voice-library) to see voices generated by ElevenLabs users.
## Supported languages
Our STS API is multilingual and currently supports the following languages:
`Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Russian, Danish, Bulgarian, Malay, Slovak, Croatian, Classic Arabic, Tamil, English, Polish, German, Spanish, French, Italian, Hindi and Portuguese`.
To use them, simply provide the input audio in the language of your choice.
***
# Streaming
post /v1/speech-to-speech/{voice_id}/stream
Creates speech by combining the content and emotion of the uploaded audio with a voice of your choice, and returns an audio stream.
# Stream Chapter Audio
post /v1/projects/{project_id}/chapters/{chapter_id}/snapshots/{chapter_snapshot_id}/stream
Stream the audio from a chapter snapshot. Use `GET /v1/projects/{project_id}/chapters/{chapter_id}/snapshots` to return the chapter snapshots of a chapter.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Stream Project Audio
post /v1/projects/{project_id}/snapshots/{project_snapshot_id}/stream
Stream the audio from a project snapshot.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales).
# Text To Speech Streaming
post /v1/text-to-speech/{voice_id}/stream
Converts text into speech using a voice of your choice and returns the audio as a stream.
# Text To Speech Streaming With Timestamps
post /v1/text-to-speech/{voice_id}/stream/with-timestamps
Converts text into audio together with timestamps indicating when each word was spoken, returned as a stream.
## Audio generation
You can generate audio in a streaming fashion, together with information on when each character was spoken, using the following script:
```python
import requests
import json
import base64

VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Rachel
YOUR_XI_API_KEY = "ENTER_YOUR_API_KEY_HERE"

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream/with-timestamps"

headers = {
    "Content-Type": "application/json",
    "xi-api-key": YOUR_XI_API_KEY
}

data = {
    "text": (
        "Born and raised in the charming south, "
        "I can add a touch of sweet southern hospitality "
        "to your audiobooks and podcasts"
    ),
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75
    }
}

response = requests.post(
    url,
    json=data,
    headers=headers,
    stream=True
)

if response.status_code != 200:
    print(f"Error encountered, status: {response.status_code}, "
          f"content: {response.text}")
    quit()

audio_bytes = b""
characters = []
character_start_times_seconds = []
character_end_times_seconds = []

for line in response.iter_lines():
    if line:  # filter out keep-alive new lines
        # convert the response which contains bytes into a JSON string from utf-8 encoding
        json_string = line.decode("utf-8")

        # parse the JSON string and load the data as a dictionary
        response_dict = json.loads(json_string)

        # the "audio_base64" entry in the dictionary contains the audio as a base64 encoded string,
        # we need to decode it into bytes in order to save the audio as a file
        audio_bytes_chunk = base64.b64decode(response_dict["audio_base64"])
        audio_bytes += audio_bytes_chunk

        if response_dict["alignment"] is not None:
            characters.extend(response_dict["alignment"]["characters"])
            character_start_times_seconds.extend(response_dict["alignment"]["character_start_times_seconds"])
            character_end_times_seconds.extend(response_dict["alignment"]["character_end_times_seconds"])

with open('output.mp3', 'wb') as f:
    f.write(audio_bytes)

print({
    "characters": characters,
    "character_start_times_seconds": character_start_times_seconds,
    "character_end_times_seconds": character_end_times_seconds
})
```
This prints out a dictionary like:
```python
{
'characters': ['B', 'o', 'r', 'n', ' ', 'a', 'n', 'd', ' ', 'r', 'a', 'i', 's', 'e', 'd', ' ', 'i', 'n', ' ', 't', 'h', 'e', ' ', 'c', 'h', 'a', 'r', 'm', 'i', 'n', 'g', ' ', 's', 'o', 'u', 't', 'h', ',', ' ', 'I', ' ', 'c', 'a', 'n', ' ', 'a', 'd', 'd', ' ', 'a', ' ', 't', 'o', 'u', 'c', 'h', ' ', 'o', 'f', ' ', 's', 'w', 'e', 'e', 't', ' ', 's', 'o', 'u', 't', 'h', 'e', 'r', 'n', ' ', 'h', 'o', 's', 'p', 'i', 't', 'a', 'l', 'i', 't', 'y', ' ', 't', 'o', ' ', 'y', 'o', 'u', 'r', ' ', 'a', 'u', 'd', 'i', 'o', 'b', 'o', 'o', 'k', 's', ' ', 'a', 'n', 'd', ' ', 'p', 'o', 'd', 'c', 'a', 's', 't', 's'],
'character_start_times_seconds': [0.0, 0.186, 0.279, 0.348, 0.406, 0.441, 0.476, 0.499, 0.522, 0.58, 0.65, 0.72, 0.778, 0.824, 0.882, 0.906, 0.952, 0.975, 1.01, 1.045, 1.068, 1.091, 1.115, 1.149, 1.196, 1.254, 1.3, 1.358, 1.416, 1.474, 1.498, 1.521, 1.602, 1.66, 1.811, 1.869, 1.927, 1.974, 2.009, 2.043, 2.067, 2.136, 2.183, 2.218, 2.252, 2.287, 2.322, 2.357, 2.392, 2.426, 2.45, 2.508, 2.531, 2.589, 2.635, 2.682, 2.717, 2.763, 2.786, 2.81, 2.879, 2.937, 3.007, 3.065, 3.123, 3.17, 3.239, 3.286, 3.367, 3.402, 3.437, 3.46, 3.483, 3.529, 3.564, 3.599, 3.634, 3.68, 3.75, 3.82, 3.889, 3.971, 4.087, 4.168, 4.214, 4.272, 4.331, 4.389, 4.412, 4.447, 4.528, 4.551, 4.574, 4.609, 4.644, 4.702, 4.748, 4.807, 4.865, 4.923, 5.016, 5.074, 5.12, 5.155, 5.201, 5.248, 5.283, 5.306, 5.329, 5.352, 5.41, 5.457, 5.573, 5.654, 5.735, 5.886, 5.944, 6.06],
'character_end_times_seconds': [0.186, 0.279, 0.348, 0.406, 0.441, 0.476, 0.499, 0.522, 0.58, 0.65, 0.72, 0.778, 0.824, 0.882, 0.906, 0.952, 0.975, 1.01, 1.045, 1.068, 1.091, 1.115, 1.149, 1.196, 1.254, 1.3, 1.358, 1.416, 1.474, 1.498, 1.521, 1.602, 1.66, 1.811, 1.869, 1.927, 1.974, 2.009, 2.043, 2.067, 2.136, 2.183, 2.218, 2.252, 2.287, 2.322, 2.357, 2.392, 2.426, 2.45, 2.508, 2.531, 2.589, 2.635, 2.682, 2.717, 2.763, 2.786, 2.81, 2.879, 2.937, 3.007, 3.065, 3.123, 3.17, 3.239, 3.286, 3.367, 3.402, 3.437, 3.46, 3.483, 3.529, 3.564, 3.599, 3.634, 3.68, 3.75, 3.82, 3.889, 3.971, 4.087, 4.168, 4.214, 4.272, 4.331, 4.389, 4.412, 4.447, 4.528, 4.551, 4.574, 4.609, 4.644, 4.702, 4.748, 4.807, 4.865, 4.923, 5.016, 5.074, 5.12, 5.155, 5.201, 5.248, 5.283, 5.306, 5.329, 5.352, 5.41, 5.457, 5.573, 5.654, 5.735, 5.886, 5.944, 6.06, 6.548]
}
```
As you can see, this dictionary contains three lists of the same size. For example, `response_dict['alignment']['characters'][3]` contains the fourth character of the text you provided, 'n', while `response_dict['alignment']['character_start_times_seconds'][3]` and `response_dict['alignment']['character_end_times_seconds'][3]` contain its start (0.348 seconds) and end (0.406 seconds) timestamps.
# Text To Speech
post /v1/text-to-speech/{voice_id}
API that converts text into lifelike speech with best-in-class latency, using the most advanced AI audio model ever. Create voiceovers for your videos or audiobooks, or create AI chatbots for free.
***
# Introduction
Our AI model produces the highest-quality AI voices in the industry.
Our [text to speech](https://elevenlabs.io/text-to-speech) [API](https://elevenlabs.io/api) allows you to convert text into audio in 32 languages and 1000s of voices. Integrate our realistic text to speech voices into your React app, or use our Python library or websockets guide to get started.
### API Features
* 1000s of voices, in 32 languages, for every use-case, at 128kbps
* As low as \~300ms (+ network latency) audio generation times with our Turbo model.
* Understands text nuances for appropriate intonation and resonance.
***
# Quick Start
## Audio generation
Generate spoken audio from text with a simple request like the following Python example:
```python
import requests

CHUNK_SIZE = 1024
VOICE_ID = "voice_id_here"  # Replace with the ID of the voice you want to use
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

headers = {
    "Accept": "audio/mpeg",
    "Content-Type": "application/json",
    "xi-api-key": ""  # Your API key
}

data = {
    "text": (
        "Born and raised in the charming south, "
        "I can add a touch of sweet southern hospitality "
        "to your audiobooks and podcasts"
    ),
    "model_id": "eleven_monolingual_v1",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.5
    }
}

response = requests.post(url, json=data, headers=headers)

with open('output.mp3', 'wb') as f:
    for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
        if chunk:
            f.write(chunk)
```
## Voices
We offer 1000s of voices in 29 languages. Visit the [Voice Lab](https://elevenlabs.io/voice-lab) to explore our pre-made voices or [clone your own](https://elevenlabs.io/voice-cloning). Visit the [Voices Library](https://elevenlabs.io/voice-library) to see voices generated by ElevenLabs users.
## Generation & Concurrency Limits
All our models support up to 10k characters (\~10 minutes of audio) in a single request. To achieve consistency over long form audio, try [request stitching](https://elevenlabs.io/docs/api-reference/how-to-use-request-stitching).
The concurrency limit (the maximum number of concurrent requests you can run in parallel) depends on the tier you are on.
* Free: 2
* Starter: 3
* Creator: 5
* Pro: 10
* Scale: 15
* Business: 15
If you need a higher limit, reach out to our [Enterprise team](https://elevenlabs.io/enterprise) to discuss a custom plan.
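If you fire off many generations in parallel, one way to stay within your tier's limit is to bound concurrency on the client side. Below is a rough sketch using an `asyncio` semaphore with the `httpx` client (an assumption on our part; any async HTTP client works), sized here for the Creator tier's limit of 5.

```python
import asyncio
import httpx

XI_API_KEY = "API_KEY_HERE"
VOICE_ID = "VOICE_ID_HERE"
URL = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

# Creator tier allows 5 concurrent requests; adjust to your own tier
semaphore = asyncio.Semaphore(5)

async def tts(client: httpx.AsyncClient, text: str) -> bytes:
    async with semaphore:  # never more than 5 requests in flight
        response = await client.post(
            URL,
            headers={"xi-api-key": XI_API_KEY},
            json={"text": text, "model_id": "eleven_monolingual_v1"},
            timeout=60,
        )
        response.raise_for_status()
        return response.content

async def main():
    texts = [f"This is sentence number {i}." for i in range(20)]
    async with httpx.AsyncClient() as client:
        audio_clips = await asyncio.gather(*(tts(client, t) for t in texts))
    for i, audio in enumerate(audio_clips):
        with open(f"clip_{i}.mp3", "wb") as f:
            f.write(audio)

asyncio.run(main())
```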
## Supported languages
Our TTS API is multilingual and currently supports the following languages:
`Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Russian, Danish, Bulgarian, Malay, Slovak, Croatian, Classic Arabic, Tamil, English, Polish, German, Spanish, French, Italian, Hindi, Portuguese, Hungarian, Vietnamese and Norwegian`.
To use them, simply provide the input text in the language of your choice.
* Dig into the details of using the ElevenLabs TTS API.
* Learn how to use our API with websockets.
* A great place to ask questions and get help from the community.
* Learn how to integrate ElevenLabs into your workflow.
***
# Text To Speech With Timestamps
post /v1/text-to-speech/{voice_id}/with-timestamps
Converts text into audio together with timestamps indicating when each word was spoken.
***
## Audio generation
You can generate audio together with information on when each character was spoken using the following script:
```python
import requests
import json
import base64

VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Rachel
YOUR_XI_API_KEY = "ENTER_YOUR_API_KEY_HERE"

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/with-timestamps"

headers = {
    "Content-Type": "application/json",
    "xi-api-key": YOUR_XI_API_KEY
}

data = {
    "text": (
        "Born and raised in the charming south, "
        "I can add a touch of sweet southern hospitality "
        "to your audiobooks and podcasts"
    ),
    "model_id": "eleven_multilingual_v2",
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75
    }
}

response = requests.post(
    url,
    json=data,
    headers=headers,
)

if response.status_code != 200:
    print(f"Error encountered, status: {response.status_code}, "
          f"content: {response.text}")
    quit()

# convert the response which contains bytes into a JSON string from utf-8 encoding
json_string = response.content.decode("utf-8")

# parse the JSON string and load the data as a dictionary
response_dict = json.loads(json_string)

# the "audio_base64" entry in the dictionary contains the audio as a base64 encoded string,
# we need to decode it into bytes in order to save the audio as a file
audio_bytes = base64.b64decode(response_dict["audio_base64"])

with open('output.mp3', 'wb') as f:
    f.write(audio_bytes)

# the 'alignment' entry contains the mapping between input characters and their timestamps
print(response_dict['alignment'])
```
This prints out a dictionary like:
```python
{
'characters': ['B', 'o', 'r', 'n', ' ', 'a', 'n', 'd', ' ', 'r', 'a', 'i', 's', 'e', 'd', ' ', 'i', 'n', ' ', 't', 'h', 'e', ' ', 'c', 'h', 'a', 'r', 'm', 'i', 'n', 'g', ' ', 's', 'o', 'u', 't', 'h', ',', ' ', 'I', ' ', 'c', 'a', 'n', ' ', 'a', 'd', 'd', ' ', 'a', ' ', 't', 'o', 'u', 'c', 'h', ' ', 'o', 'f', ' ', 's', 'w', 'e', 'e', 't', ' ', 's', 'o', 'u', 't', 'h', 'e', 'r', 'n', ' ', 'h', 'o', 's', 'p', 'i', 't', 'a', 'l', 'i', 't', 'y', ' ', 't', 'o', ' ', 'y', 'o', 'u', 'r', ' ', 'a', 'u', 'd', 'i', 'o', 'b', 'o', 'o', 'k', 's', ' ', 'a', 'n', 'd', ' ', 'p', 'o', 'd', 'c', 'a', 's', 't', 's'],
'character_start_times_seconds': [0.0, 0.186, 0.279, 0.348, 0.406, 0.441, 0.476, 0.499, 0.522, 0.58, 0.65, 0.72, 0.778, 0.824, 0.882, 0.906, 0.952, 0.975, 1.01, 1.045, 1.068, 1.091, 1.115, 1.149, 1.196, 1.254, 1.3, 1.358, 1.416, 1.474, 1.498, 1.521, 1.602, 1.66, 1.811, 1.869, 1.927, 1.974, 2.009, 2.043, 2.067, 2.136, 2.183, 2.218, 2.252, 2.287, 2.322, 2.357, 2.392, 2.426, 2.45, 2.508, 2.531, 2.589, 2.635, 2.682, 2.717, 2.763, 2.786, 2.81, 2.879, 2.937, 3.007, 3.065, 3.123, 3.17, 3.239, 3.286, 3.367, 3.402, 3.437, 3.46, 3.483, 3.529, 3.564, 3.599, 3.634, 3.68, 3.75, 3.82, 3.889, 3.971, 4.087, 4.168, 4.214, 4.272, 4.331, 4.389, 4.412, 4.447, 4.528, 4.551, 4.574, 4.609, 4.644, 4.702, 4.748, 4.807, 4.865, 4.923, 5.016, 5.074, 5.12, 5.155, 5.201, 5.248, 5.283, 5.306, 5.329, 5.352, 5.41, 5.457, 5.573, 5.654, 5.735, 5.886, 5.944, 6.06],
'character_end_times_seconds': [0.186, 0.279, 0.348, 0.406, 0.441, 0.476, 0.499, 0.522, 0.58, 0.65, 0.72, 0.778, 0.824, 0.882, 0.906, 0.952, 0.975, 1.01, 1.045, 1.068, 1.091, 1.115, 1.149, 1.196, 1.254, 1.3, 1.358, 1.416, 1.474, 1.498, 1.521, 1.602, 1.66, 1.811, 1.869, 1.927, 1.974, 2.009, 2.043, 2.067, 2.136, 2.183, 2.218, 2.252, 2.287, 2.322, 2.357, 2.392, 2.426, 2.45, 2.508, 2.531, 2.589, 2.635, 2.682, 2.717, 2.763, 2.786, 2.81, 2.879, 2.937, 3.007, 3.065, 3.123, 3.17, 3.239, 3.286, 3.367, 3.402, 3.437, 3.46, 3.483, 3.529, 3.564, 3.599, 3.634, 3.68, 3.75, 3.82, 3.889, 3.971, 4.087, 4.168, 4.214, 4.272, 4.331, 4.389, 4.412, 4.447, 4.528, 4.551, 4.574, 4.609, 4.644, 4.702, 4.748, 4.807, 4.865, 4.923, 5.016, 5.074, 5.12, 5.155, 5.201, 5.248, 5.283, 5.306, 5.329, 5.352, 5.41, 5.457, 5.573, 5.654, 5.735, 5.886, 5.944, 6.06, 6.548]
}
```
As you can see, this dictionary contains three lists of the same size. For example, `response_dict['alignment']['characters'][3]` contains the fourth character of the text you provided, 'n', while `response_dict['alignment']['character_start_times_seconds'][3]` and `response_dict['alignment']['character_end_times_seconds'][3]` contain its start (0.348 seconds) and end (0.406 seconds) timestamps.
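Building on the script above, here is a small, purely illustrative sketch of how you might group those character timestamps into word start times; it assumes words are separated by single spaces and reuses the `response_dict` variable from the example.

```python
alignment = response_dict['alignment']

words = []
word_start_times = []
current_word = ""

for char, start in zip(alignment['characters'], alignment['character_start_times_seconds']):
    if char == " ":
        if current_word:
            words.append(current_word)
            current_word = ""
    else:
        if not current_word:
            word_start_times.append(start)  # first character of a new word
        current_word += char

if current_word:
    words.append(current_word)

print(list(zip(words, word_start_times)))
```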
# Generate Voice Previews From Description
post /v1/text-to-voice/create-previews
Generate custom voice previews based on the provided voice description. The response includes a list of voice previews, each containing an ID and a sample of the voice audio. If you like a voice preview and want to create a permanent voice, call `/v1/text-to-voice/create-voice-from-preview` with the corresponding voice ID.
Follow our [Voice Design Prompt
Guide](/product/voices/voice-lab/voice-design#voice-design-prompt-guide) for
best results.
When you hit generate, we'll create three voice previews. You will be charged credits equal to the length of the text you submit (you are charged this amount once per call, even though you receive three voice previews). "Text" should be no less than 100 characters and no more than 1k characters.
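A hedged Python sketch of a previews request follows; the `voice_description` and `text` field names, and the shape of the response handling, are assumptions based on the description above.

```python
import requests

XI_API_KEY = "API_KEY_HERE"
url = "https://api.elevenlabs.io/v1/text-to-voice/create-previews"

headers = {"xi-api-key": XI_API_KEY, "Content-Type": "application/json"}
payload = {
    # Assumed field names: a description of the voice and sample text (100-1000 characters)
    "voice_description": "A calm, middle-aged British narrator with a warm tone",
    "text": "Once upon a time, in a quiet village nestled between rolling hills, "
            "there lived a storyteller whose voice could calm even the wildest storm.",
}

response = requests.post(url, json=payload, headers=headers)
previews = response.json().get("previews", [])
for preview in previews:
    # Each preview is expected to contain an ID and a base64 audio sample
    print(preview.get("generated_voice_id"))
```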
# Create Voice From Voice Preview
post /v1/text-to-voice/create-voice-from-preview
Create a new voice from a previously generated voice preview. This endpoint should be called after you have fetched a `generated_voice_id` using `/v1/text-to-voice/create-previews`.
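And a matching sketch for turning a preview into a permanent voice; the `voice_name`, `voice_description`, and `generated_voice_id` body fields are assumptions about the request shape.

```python
import requests

XI_API_KEY = "API_KEY_HERE"
url = "https://api.elevenlabs.io/v1/text-to-voice/create-voice-from-preview"

headers = {"xi-api-key": XI_API_KEY, "Content-Type": "application/json"}
payload = {
    "voice_name": "Calm Narrator",
    "voice_description": "A calm, middle-aged British narrator with a warm tone",
    # The generated_voice_id returned by /v1/text-to-voice/create-previews
    "generated_voice_id": "GENERATED_VOICE_ID_HERE",
}

response = requests.post(url, json=payload, headers=headers)
print(response.status_code, response.json())
```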
# Update Pronunciation Dictionaries
post /v1/projects/{project_id}/update-pronunciation-dictionaries
Updates the set of pronunciation dictionaries acting on a project. This will automatically mark text within this project as requiring reconverting where the new dictionary would apply or the old one no longer does.
Projects API available upon request. To get access, [contact sales](https://elevenlabs.io/contact-sales). You can use the Pronunciation Dictionaries API to add a pronunciation dictionary from a file in order to obtain a valid ID.
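A hedged example of what such a request might look like, assuming the body takes a `pronunciation_dictionary_locators` list shaped like the locators used in the text-to-speech payloads shown earlier.

```python
import requests

XI_API_KEY = "API_KEY_HERE"
PROJECT_ID = "PROJECT_ID_HERE"
url = f"https://api.elevenlabs.io/v1/projects/{PROJECT_ID}/update-pronunciation-dictionaries"

headers = {"xi-api-key": XI_API_KEY, "Content-Type": "application/json"}
payload = {
    # Assumed body shape, mirroring the locators used in text-to-speech requests
    "pronunciation_dictionary_locators": [
        {
            "pronunciation_dictionary_id": "DICTIONARY_ID_HERE",
            "version_id": "VERSION_ID_HERE",
        }
    ]
}

response = requests.post(url, json=payload, headers=headers)
print(response.status_code, response.text)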
# Get Characters Usage Metrics
get /v1/usage/character-stats
Returns the credit usage metrics for the current user or the entire workspace they are part of. The response will return a time axis with unix timestamps for each day and daily usage along that axis. The usage will be broken down by the specified breakdown type. For example, breakdown type "voice" will return the usage of each voice along the time axis.
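A small sketch of querying this endpoint with Python; the `start_unix`, `end_unix`, and `breakdown_type` query parameters are assumptions based on the description above, so check the parameter list for the exact names and units.

```python
import requests
import time

XI_API_KEY = "API_KEY_HERE"
url = "https://api.elevenlabs.io/v1/usage/character-stats"

headers = {"xi-api-key": XI_API_KEY}
params = {
    # Assumed query parameters: a unix time range and the breakdown to apply
    "start_unix": int(time.time()) - 30 * 24 * 3600,  # last 30 days
    "end_unix": int(time.time()),
    "breakdown_type": "voice",
}

response = requests.get(url, headers=headers, params=params)
print(response.json())
```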
# Websockets
This API provides real-time [text-to-speech](https://elevenlabs.io/text-to-speech) conversion using WebSockets. This allows you to send a text message and receive audio data back in real-time.
Endpoint: `wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?model_id={model}`
# When to use
The Text-to-Speech Websockets API is designed to generate audio from partial text input while ensuring consistency throughout the generated audio. Although highly flexible, the Websockets API isn't a one-size-fits-all solution. It's well-suited for scenarios where:
* The input text is being streamed or generated in chunks.
* Word-to-audio alignment information is required.
For a practical demonstration in a real world application, refer to the [Example of voice streaming using ElevenLabs and OpenAI](#example-voice-streaming-using-elevenlabs-and-openai) section.
# When not to use
However, it may not be the best choice when:
* The entire input text is available upfront. Given that the generations are partial, some buffering is involved, which could potentially result in slightly higher latency compared to a standard HTTP request.
* You want to quickly experiment or prototype. Working with Websockets can be harder and more complex than using a standard HTTP API, which might slow down rapid development and testing.
In these cases, use the [Text to Speech API](/api-reference/text-to-speech) instead.
# Protocol
The WebSocket API uses a bidirectional protocol that encodes all messages as JSON objects.
# Streaming input text
The client can send messages with text input to the server. The messages can contain the following fields:
```json
{
  "text": "This is a sample text ",
  "voice_settings": {
    "stability": 0.8,
    "similarity_boost": 0.8
  },
  "generation_config": {
    "chunk_length_schedule": [120, 160, 250, 290]
  },
  "xi_api_key": "",
  "authorization": "Bearer "
}
```
Should always end with a single space string `" "`. In the first message, the text should be a space `" "`.
This is an advanced setting that most users shouldn't need to use. It relates to our generation schedule explained [here](#understanding-how-our-websockets-buffer-text).
Use this to attempt to immediately trigger the generation of audio, overriding the `chunk_length_schedule`. Unlike flush, `try_trigger_generation` will only generate audio if our [buffer](#understanding-how-our-websockets-buffer-text) contains more than a minimum threshold of characters; this is to ensure a higher quality response from our model.
Note that overriding the chunk schedule to generate small amounts of text may result in lower quality audio, therefore, only use this parameter if you really need text to be processed immediately. We generally recommend keeping the default value of `false` and adjusting the `chunk_length_schedule` in the `generation_config` instead.
This property should only be provided in the first message you send.
Defines the stability for voice settings.
Defines the similarity boost for voice settings.
Defines the style for voice settings. This parameter is available on V2+ models.
Defines the use speaker boost for voice settings. This parameter is available on V2+ models.
This property should only be provided in the first message you send.
This is an advanced setting that most users shouldn't need to use. It relates to our generation schedule explained [here](#understanding-how-our-websockets-buffer-text).
Determines the minimum amount of text that needs to be sent and present in our buffer before audio starts being generated. This is to maximise the amount of context available to the model to improve audio quality, whilst balancing latency of the returned audio chunks.
The default value is: \[120, 160, 250, 290].
This means that the first chunk of audio will not be generated until you send text that totals at least 120 characters long. The next chunk of audio will only be generated once a further 160 characters have been sent. The third audio chunk will be generated after the next 250 characters. Then the fourth, and beyond, will be generated in sets of at least 290 characters.
Customize this array to suit your needs. If you want to generate audio more frequently to optimise latency, you can reduce the values in the array. Note that setting the values too low may result in lower quality audio. Please test and adjust as needed.
Each item should be in the range 50-500.
Flush forces the generation of audio. Set this value to `true` when you have finished sending text, but want to keep the websocket connection open.
This is useful when you want to ensure that the last chunk of audio is generated even when the length of text sent is smaller than the value set in `chunk_length_schedule` (e.g. 120 or 50).
To understand more about how our websockets buffer text before audio is generated, please refer to [this](#understanding-how-our-websockets-buffer-text) section.
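For example, a message that sends a final piece of text and forces generation of any buffered audio might look like this (a sketch of the shape described above; the sample text is illustrative):

```json Flush message
{
  "text": "Goodbye for now. ",
  "flush": true
}
```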
Provide the XI API Key in the first message if it's not in the header.
Authorization bearer token. Should be provided only in the first message if not present in the header and the XI API Key is not provided.
For best latency we recommend streaming word-by-word; this way we will start generating as soon as we reach the predefined number of un-generated characters.
## Close connection
In order to close the connection, the client should send an End of Sequence (EOS) message. The EOS message should always be an empty string:
```json End of Sequence (EOS) message
{
  "text": ""
}
```
Should always be an empty string `""`.
## Streaming output audio
The server will always respond with a message containing the following fields:
```json Response message
{
  "audio": "Y3VyaW91cyBtaW5kcyB0aGluayBhbGlrZSA6KQ==",
  "isFinal": false,
  "normalizedAlignment": {
    "charStartTimesMs": [0, 3, 7, 9, 11, 12, 13, 15, 17, 19, 21],
    "charDurationsMs": [3, 4, 2, 2, 1, 1, 2, 2, 2, 2, 3],
    "chars": ["H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d"]
  },
  "alignment": {
    "charStartTimesMs": [0, 3, 7, 9, 11, 12, 13, 15, 17, 19, 21],
    "charDurationsMs": [3, 4, 2, 2, 1, 1, 2, 2, 2, 2, 3],
    "chars": ["H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d"]
  }
}
```
A generated partial audio chunk, encoded using the selected `output_format`; by default this is MP3 encoded as a base64 string.
Indicates if the generation is complete. If set to `True`, `audio` will be null.
Alignment information for the generated audio given the input normalized text sequence.
A list of starting times (in milliseconds) for each character in the normalized text as it corresponds to the audio. For instance, the character 'H' starts at time 0 ms in the audio. Note these times are relative to the returned chunk from the model, and not the full audio response. See an example [here](#example-getting-word-start-times-using-alignment-values) for how to use this.
A list providing the duration (in milliseconds) for each character's pronunciation in the audio. For instance, the character 'H' has a pronunciation duration of 3 ms.
The list of characters in the normalized text sequence that corresponds with the timings and durations. This list is used to map the characters to their respective starting times and durations.
Alignment information for the generated audio given the original text sequence.
A list of starting times (in milliseconds) for each character in the original text as it corresponds to the audio. For instance, the character 'H' starts at time 0 ms in the audio. Note these times are relative to the returned chunk from the model, and not the full audio response. See an example [here](#example-getting-word-start-times-using-alignment-values) for how to use this.
A list providing the duration (in milliseconds) for each character's pronunciation in the audio. For instance, the character 'H' has a pronunciation duration of 3 ms.
The list of characters in the original text sequence that corresponds with the timings and durations. This list is used to map the characters to their respective starting times and durations.
## Path parameters
Voice ID to be used. You can use [Get Voices](/api-reference/get-voices) to list all the available voices.
## Query parameters
Identifier of the model that will be used. You can query the available models using [Get Models](/api-reference/get-models).
Language code (ISO 639-1) used to enforce a language for the model. Currently only Turbo v2.5 supports language enforcement. For other models, an error will be returned if language code is provided.
Whether to enable request logging. If disabled, the request will not be present in history nor in bigtable.
Enabled by default. Note: simple logging (aka printing) to stdout/stderr is always enabled.
Whether to enable/disable parsing of SSML tags within the provided text. For best results, we recommend
sending SSML tags as fully contained messages to the websockets endpoint, otherwise this may result in additional latency.
Please note that rendered text, in normalizedAlignment, will be altered in support of SSML tags. The
rendered text will use a . as a placeholder for breaks, and syllables will be reported using the CMU arpabet alphabet where SSML phoneme tags are used to specify pronunciation.
Disabled by default.
You can turn on latency optimizations at some cost of quality. The best possible final latency varies by model. Possible values:
| Value | Description |
| ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| 0 | default mode (no latency optimizations) |
| 1 | normal latency optimizations (about 50% of possible latency improvement of option 3) |
| 2 | strong latency optimizations (about 75% of possible latency improvement of option 3) |
| 3 | max latency optimizations |
| 4 | max latency optimizations, but also with text normalizer turned off for even more latency savings (best latency, but can mispronounce eg numbers and dates). |
Defaults to `0`
Output format of the generated audio. Must be one of:
| Value | Description |
| ---------- | ------------------------------------------------------------------------------------------------------------- |
| mp3\_44100 | default output format, mp3 with 44.1kHz sample rate |
| pcm\_16000 | PCM format (S16LE) with 16kHz sample rate |
| pcm\_22050 | PCM format (S16LE) with 22.05kHz sample rate |
| pcm\_24000 | PCM format (S16LE) with 24kHz sample rate |
| pcm\_44100 | PCM format (S16LE) with 44.1kHz sample rate |
| ulaw\_8000 | μ-law format (mulaw) with 8kHz sample rate. (Note that this format is commonly used for Twilio audio inputs.) |
Defaults to `mp3_44100`
The number of seconds that the connection can be inactive before it is automatically closed.
Defaults to `20` seconds, with a maximum allowed value of `180` seconds.
The audio for each text sequence is delivered in multiple chunks. By default when it's set to false, you'll receive all timing data (alignment information) with the first chunk only.
However, if you enable this option, you'll get the timing data with every audio chunk instead. This can help you precisely match each audio segment with its corresponding text.
# Example - Voice streaming using ElevenLabs and OpenAI
The following example demonstrates how to leverage the ElevenLabs Websockets API to stream input from OpenAI's GPT model, while the answer is being generated, thereby minimizing the overall latency of the operation.
```python
import asyncio
import websockets
import json
import base64
import shutil
import os
import subprocess

from openai import AsyncOpenAI

# Define API keys and voice ID
OPENAI_API_KEY = ''
ELEVENLABS_API_KEY = ''
VOICE_ID = '21m00Tcm4TlvDq8ikWAM'

# Set OpenAI API key
aclient = AsyncOpenAI(api_key=OPENAI_API_KEY)

def is_installed(lib_name):
    return shutil.which(lib_name) is not None

async def text_chunker(chunks):
    """Split text into chunks, ensuring to not break sentences."""
    splitters = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")
    buffer = ""

    async for text in chunks:
        if buffer.endswith(splitters):
            yield buffer + " "
            buffer = text
        elif text.startswith(splitters):
            yield buffer + text[0] + " "
            buffer = text[1:]
        else:
            buffer += text

    if buffer:
        yield buffer + " "

async def stream(audio_stream):
    """Stream audio data using mpv player."""
    if not is_installed("mpv"):
        raise ValueError(
            "mpv not found, necessary to stream audio. "
            "Install instructions: https://mpv.io/installation/"
        )

    mpv_process = subprocess.Popen(
        ["mpv", "--no-cache", "--no-terminal", "--", "fd://0"],
        stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )

    print("Started streaming audio")
    async for chunk in audio_stream:
        if chunk:
            mpv_process.stdin.write(chunk)
            mpv_process.stdin.flush()

    if mpv_process.stdin:
        mpv_process.stdin.close()
    mpv_process.wait()

async def text_to_speech_input_streaming(voice_id, text_iterator):
    """Send text to ElevenLabs API and stream the returned audio."""
    uri = f"wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?model_id=eleven_turbo_v2_5"

    async with websockets.connect(uri) as websocket:
        await websocket.send(json.dumps({
            "text": " ",
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.8},
            "xi_api_key": ELEVENLABS_API_KEY,
        }))

        async def listen():
            """Listen to the websocket for audio data and stream it."""
            while True:
                try:
                    message = await websocket.recv()
                    data = json.loads(message)
                    if data.get("audio"):
                        yield base64.b64decode(data["audio"])
                    elif data.get('isFinal'):
                        break
                except websockets.exceptions.ConnectionClosed:
                    print("Connection closed")
                    break

        listen_task = asyncio.create_task(stream(listen()))

        async for text in text_chunker(text_iterator):
            await websocket.send(json.dumps({"text": text}))

        await websocket.send(json.dumps({"text": ""}))

        await listen_task

async def chat_completion(query):
    """Retrieve text from OpenAI and pass it to the text-to-speech function."""
    response = await aclient.chat.completions.create(model='gpt-4', messages=[{'role': 'user', 'content': query}],
                                                     temperature=1, stream=True)

    async def text_iterator():
        async for chunk in response:
            delta = chunk.choices[0].delta
            yield delta.content

    await text_to_speech_input_streaming(VOICE_ID, text_iterator())

# Main execution
if __name__ == "__main__":
    user_query = "Hello, tell me a very long story."
    asyncio.run(chat_completion(user_query))
```
# Example - Other examples for interacting with our Websocket API
Some examples of interacting with the Websocket API in different ways are provided below.
```python Python websockets and asyncio
import asyncio
import websockets
import json
import base64

async def text_to_speech(voice_id):
    model = 'eleven_turbo_v2_5'
    uri = f"wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?model_id={model}"

    async with websockets.connect(uri) as websocket:
        # Initialize the connection
        bos_message = {
            "text": " ",
            "voice_settings": {
                "stability": 0.5,
                "similarity_boost": 0.8
            },
            "xi_api_key": "api_key_here",  # Replace with your API key
        }
        await websocket.send(json.dumps(bos_message))

        # Send "Hello World" input
        input_message = {
            "text": "Hello World "
        }
        await websocket.send(json.dumps(input_message))

        # Send EOS message with an empty string instead of a single space
        # as mentioned in the documentation
        eos_message = {
            "text": ""
        }
        await websocket.send(json.dumps(eos_message))

        # Added a loop to handle server responses and print the data received
        while True:
            try:
                response = await websocket.recv()
                data = json.loads(response)
                print("Server response:", data)

                if data["audio"]:
                    chunk = base64.b64decode(data["audio"])
                    print("Received audio chunk")
                else:
                    print("No audio data in the response")
                    break
            except websockets.exceptions.ConnectionClosed:
                print("Connection closed")
                break

asyncio.get_event_loop().run_until_complete(text_to_speech("voice_id_here"))
```
```javascript Javascript websockets
const voiceId = "voice_id_here"; // replace with your voice_id
const model = 'eleven_turbo_v2_5';
const wsUrl = `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=${model}`;
const socket = new WebSocket(wsUrl);

// 2. Initialize the connection by sending the BOS message
socket.onopen = function (event) {
  const bosMessage = {
    "text": " ",
    "voice_settings": {
      "stability": 0.5,
      "similarity_boost": 0.8
    },
    "xi_api_key": "api_key_here", // replace with your API key
  };

  socket.send(JSON.stringify(bosMessage));

  // 3. Send the input text message ("Hello World")
  const textMessage = {
    "text": "Hello World "
  };

  socket.send(JSON.stringify(textMessage));

  // 4. Send the EOS message with an empty string
  const eosMessage = {
    "text": ""
  };

  socket.send(JSON.stringify(eosMessage));
};

// 5. Handle server responses
socket.onmessage = function (event) {
  const response = JSON.parse(event.data);

  console.log("Server response:", response);

  if (response.audio) {
    // decode and handle the audio data (e.g., play it)
    const audioChunk = atob(response.audio); // decode base64
    console.log("Received audio chunk");
  } else {
    console.log("No audio data in the response");
  }

  if (response.isFinal) {
    // the generation is complete
  }

  if (response.normalizedAlignment) {
    // use the alignment info if needed
  }
};

// Handle errors
socket.onerror = function (error) {
  console.error(`WebSocket Error: ${error}`);
};

// Handle socket closing
socket.onclose = function (event) {
  if (event.wasClean) {
    console.info(`Connection closed cleanly, code=${event.code}, reason=${event.reason}`);
  } else {
    console.warn('Connection died');
  }
};
```
```python elevenlabs-python
from elevenlabs import generate, stream

def text_stream():
    yield "Hi there, I'm Eleven "
    yield "I'm a text to speech API "

audio_stream = generate(
    text=text_stream(),
    voice="Nicole",
    model="eleven_turbo_v2_5",
    stream=True
)

stream(audio_stream)
```
# Example - Getting word start times using alignment values
This code example shows how the start times of words can be retrieved using the alignment values returned from our API.
```python
import asyncio
import websockets
import json
import base64

# Define API keys and voice ID
ELEVENLABS_API_KEY = "INSERT HERE"  # <- insert your API key here
VOICE_ID = 'nPczCjzI2devNBz1zQrb'  # Brian

def calculate_word_start_times(alignment_info):
    # Alignment start times are indexed from the start of the audio chunk that generated them
    # In order to analyse runtime over the entire response we keep a cumulative count of played audio
    full_alignment = {'chars': [], 'charStartTimesMs': [], 'charDurationsMs': []}
    cumulative_run_time = 0
    for old_dict in alignment_info:
        full_alignment['chars'].extend([" "] + old_dict['chars'])
        full_alignment['charDurationsMs'].extend([old_dict['charStartTimesMs'][0]] + old_dict['charDurationsMs'])
        full_alignment['charStartTimesMs'].extend([0] + [time + cumulative_run_time for time in old_dict['charStartTimesMs']])
        cumulative_run_time += sum(old_dict['charDurationsMs'])

    # We now have the start times of every character relative to the entire audio output
    zipped_start_times = list(zip(full_alignment['chars'], full_alignment['charStartTimesMs']))
    # Get the start time of every character that appears after a space and match this to the word
    words = ''.join(full_alignment['chars']).split(" ")
    word_start_times = list(zip(words, [0] + [zipped_start_times[i + 1][1] for (i, (a, b)) in enumerate(zipped_start_times) if a == ' ']))
    print(f"total duration:{cumulative_run_time}")
    print(word_start_times)

async def text_to_speech_alignment_example(voice_id, text_to_send):
    """Send text to ElevenLabs API and stream the returned audio and alignment information."""
    uri = f"wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?model_id=eleven_turbo_v2_5"
    async with websockets.connect(uri) as websocket:
        await websocket.send(json.dumps({
            "text": " ",
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.8, "use_speaker_boost": False},
            "generation_config": {
                "chunk_length_schedule": [120, 160, 250, 290]
            },
            "xi_api_key": ELEVENLABS_API_KEY,
        }))

        async def text_iterator(text):
            """Split text into chunks to mimic streaming from an LLM or similar"""
            split_text = text.split(" ")
            words = 0
            to_send = ""
            for chunk in split_text:
                to_send += chunk + ' '
                words += 1
                if words >= 10:
                    print(to_send)
                    yield to_send
                    words = 0
                    to_send = ""
            yield to_send

        async def listen():
            """Listen to the websocket for audio data and write it to a file."""
            audio_chunks = []
            alignment_info = []
            received_final_chunk = False
            print("Listening for chunks from ElevenLabs...")

            while not received_final_chunk:
                try:
                    message = await websocket.recv()
                    data = json.loads(message)
                    if data.get("audio"):
                        audio_chunks.append(base64.b64decode(data["audio"]))
                    if data.get("alignment"):
                        alignment_info.append(data.get("alignment"))
                    if data.get('isFinal'):
                        received_final_chunk = True
                        break
                except websockets.exceptions.ConnectionClosed:
                    print("Connection closed")
                    break

            print("Writing audio to file")
            with open("output_file.mp3", "wb") as f:
                f.write(b''.join(audio_chunks))

            calculate_word_start_times(alignment_info)

        listen_task = asyncio.create_task(listen())

        async for text in text_iterator(text_to_send):
            await websocket.send(json.dumps({"text": text}))

        await websocket.send(json.dumps({"text": " ", "flush": True}))
        await listen_task

# Main execution
if __name__ == "__main__":
    text_to_send = "The twilight sun cast its warm golden hues upon the vast rolling fields, saturating the landscape with an ethereal glow."
    asyncio.run(text_to_speech_alignment_example(VOICE_ID, text_to_send))
```
# Understanding how our websockets buffer text
Our websocket service incorporates a buffer system designed to optimize the Time To First Byte (TTFB) while maintaining high-quality streaming.
All text sent to the websocket endpoint is added to this buffer and only when that buffer reaches a certain size is an audio generation attempted. This is because our model provides higher quality audio when the model has longer inputs, and can deduce more context about how the text should be delivered.
The buffer ensures smooth audio data delivery and is automatically emptied with a final audio generation either when the stream is closed, or upon sending a `flush` command. We have advanced settings for changing the chunk schedule, which can improve latency at the cost of quality by generating audio more frequently with smaller text inputs.
# Delete Existing Invitation
delete /v1/workspace/invites
Invalidates an existing email invitation. The invitation will still show up in the inbox it has been delivered to, but activating it to join the workspace won't work. This endpoint may only be called by workspace administrators.
Workspaces are currently only available for Enterprise customers. To upgrade, [get in touch with our sales team](https://elevenlabs.io/enterprise).
# Invite User
post /v1/workspace/invites/add
Sends an email invitation to join your workspace to the provided email. If the user doesn't have an account they will be prompted to create one. If the user accepts this invite they will be added as a user to your workspace and your subscription using one of your seats. This endpoint may only be called by workspace administrators.
Workspaces are currently only available for Enterprise customers. To upgrade, [get in touch with our sales team](https://elevenlabs.io/enterprise).
# Update Member
post /v1/workspace/members
Updates attributes of a workspace member. Apart from the email identifier, all parameters will remain unchanged unless specified. This endpoint may only be called by workspace administrators.
Workspaces are currently only available for Enterprise customers. To upgrade, [get in touch with our sales team](https://elevenlabs.io/enterprise).
# Product Updates
New updates and improvements
## API Updates
* **u-law Audio Formats**: Added u-law audio formats to the Convai API for integrations with Twilio.
* **TTS Websocket Improvements**: TTS websocket improvements, flushes and generation work more intuitively now.
* **TTS Websocket Auto Mode**: A more streamlined mode for using websockets. This setting focuses on reducing latency by disabling the chunk schedule and all buffers. It is only recommended when sending full sentences; sending partial sentences will significantly reduce quality.
* **Improvements to latency consistency**: Improvements to latency consistency for all models.
## Website Updates
* **TTS Redesign**: The website TTS redesign is now in alpha!
## API Updates
* **Normalize Text with the API**: Added the option to normalize the input text in the TTS API. The new parameter is called `apply_text_normalization` and works on all non-turbo models.
## Feature Additions
* **Voice Design**: The Voice Design feature is now in beta!
## Model Updates
* **Stability Improvements**: Significant improvements in the audio stability of all models, but especially noticeable on `turbo_v2` and `turbo_v2.5`, when using:
* Websockets
* Projects
* Reader app
* TTS with request stitching
* ConvAI
* **Latency Improvements**: Time to first byte latency improvements by around 20-30ms for all models.
## API Updates
* **Remove Background Noise Voice Samples**: Added the ability to remove background noise from voice samples using our audio isolation model to improve quality for IVCs and PVCs at no additional cost.
* **Remove Background Noise STS Input**: Added the ability to remove background noise from STS audio input using our audio isolation model to improve quality at no additional cost.
### Feature Additions
* **Conversational AI Beta**: The conversational AI feature is now in beta!
# Delete Agent
delete /v1/convai/agents/{agent_id}
Delete an agent
# Get Agent
get /v1/convai/agents/{agent_id}
Retrieve config for an agent
# Get Agents
get /v1/convai/agents
Returns a page of your agents and their metadata.
# Get Conversations
get /v1/convai/conversations
Get all conversations of agents that the user owns, with the option to restrict to a specific agent.
# Get Conversation Audio
get /v1/convai/conversations/{conversation_id}/audio
Get the audio recording of a particular conversation
# Get Conversation Details
get /v1/convai/conversations/{conversation_id}
Get the details of a particular conversation
# Get Knowledge Base Document
get /v1/convai/agents/{agent_id}/knowledge-base/{documentation_id}
Get details about a specific document that makes up the agent's knowledge base
# Get Signed URL
get /v1/convai/conversation/get_signed_url
Get a signed URL to start a conversation with an agent that requires authorization
# Get Widget
get /v1/convai/agents/{agent_id}/widget
Retrieve the widget configuration for an agent
# Update Agent
patch /v1/convai/agents/{agent_id}
Patches an agent's settings
# Create Agent
post /v1/convai/agents/create
Create an agent from a config object
# Create Knowledge Base Document
post /v1/convai/agents/{agent_id}/add-to-knowledge-base
Uploads a file or references a webpage for the agent to use as part of its knowledge base
# Create Agent Avatar
post /v1/convai/agents/{agent_id}/avatar
Sets the avatar for an agent displayed in the widget
# WebSocket
Create real-time, interactive voice conversations with AI agents
This documentation is for developers integrating directly with the ElevenLabs
WebSocket API. For convenience, consider using [the official SDKs provided by
ElevenLabs](/conversational-ai/docs/introduction).
The ElevenLabs [Conversational AI](https://elevenlabs.io/conversational-ai) WebSocket API enables real-time, interactive voice conversations with AI agents. By establishing a WebSocket connection, you can send audio input and receive audio responses in real-time, creating life-like conversational experiences.
Endpoint: `wss://api.elevenlabs.io/v1/convai/conversation?agent_id={agent_id}`
## Authentication
### Using Agent ID
For public agents, you can directly use the `agent_id` in the WebSocket URL without additional authentication:
```bash
wss://api.elevenlabs.io/v1/convai/conversation?agent_id=
```
### Using a Signed URL
For private agents or conversations requiring authorization, obtain a signed URL from your server, which securely communicates with the ElevenLabs API using your API key.
### Example using cURL
**Request:**
```bash
curl -X GET "https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=" \
-H "xi-api-key: "
```
**Response:**
```json
{
"signed_url": "wss://api.elevenlabs.io/v1/convai/conversation?agent_id=&token="
}
```
Never expose your ElevenLabs API key on the client side.
## Communication
### Client-to-Server Messages
#### User Audio Chunk
Send audio data from the user to the server.
**Format:**
```json
{
"user_audio_chunk": ""
}
```
**Notes:**
* **Audio Format Requirements:**
* PCM 16-bit mono format
* Base64 encoded
* Sample rate of 16,000 Hz
* **Recommended Chunk Duration:**
* Send audio chunks approximately every **250 milliseconds (0.25 seconds)**
* This equates to chunks of about **4,000 samples** at a 16,000 Hz sample rate (see the sketch after these notes)
* **Optimizing Latency and Efficiency:**
* **Balance Latency and Efficiency:** Sending audio chunks every 250 milliseconds offers a good trade-off between responsiveness and network overhead.
* **Adjust Based on Needs:**
* *Lower Latency Requirements:* Decrease the chunk duration to send smaller chunks more frequently.
* *Higher Efficiency Requirements:* Increase the chunk duration to send larger chunks less frequently.
* **Network Conditions:** Adapt the chunk size if you experience network constraints or variability.
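To make the arithmetic above concrete, here is a small sketch (standard library only; the silent PCM bytes stand in for real microphone audio) of how a 250 ms chunk maps onto a `user_audio_chunk` message:
```python
import base64
import json

SAMPLE_RATE = 16_000       # Hz, required input sample rate
BYTES_PER_SAMPLE = 2       # 16-bit PCM
CHUNK_SECONDS = 0.25       # recommended ~250 ms per chunk

samples_per_chunk = int(SAMPLE_RATE * CHUNK_SECONDS)    # 4,000 samples
bytes_per_chunk = samples_per_chunk * BYTES_PER_SAMPLE  # 8,000 bytes of raw PCM

# Placeholder for 250 ms of real microphone audio (PCM 16-bit mono, little-endian).
pcm_chunk = b"\x00\x00" * samples_per_chunk

message = json.dumps({
    "user_audio_chunk": base64.b64encode(pcm_chunk).decode("utf-8"),
})
# `message` is what you would send over the conversation WebSocket roughly every 250 ms.
```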
#### Pong Message
Respond to server `ping` messages by sending a `pong` message, ensuring the `event_id` matches the one received in the `ping` message.
**Format:**
```json
{
"type": "pong",
"event_id": 12345
}
```
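As a minimal sketch (assuming the Python `websockets` package and a connection URL obtained as described above), a client-side receive loop that answers pings could look like this:
```python
import asyncio
import json
import websockets

async def keep_alive(url: str):
    async with websockets.connect(url) as ws:
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "ping":
                # Echo the event_id back so the server can match the pong to its ping.
                await ws.send(json.dumps({
                    "type": "pong",
                    "event_id": event["ping_event"]["event_id"],
                }))
            # ... handle audio, transcripts, and other event types here ...
```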
### Server-to-Client Messages
#### conversation\_initiation\_metadata
Provides initial metadata about the conversation.
**Format:**
```json
{
"type": "conversation_initiation_metadata",
"conversation_initiation_metadata_event": {
"conversation_id": "conv_123456789",
"agent_output_audio_format": "pcm_16000"
}
}
```
### Other Server-to-Client Messages
| Type | Purpose |
| ---------------- | --------------------------------------------------- |
| user\_transcript | Transcriptions of the user's speech |
| agent\_response | Agent's textual response |
| audio | Chunks of the agent's audio response |
| interruption | Indicates that the agent's response was interrupted |
| ping | Server pings to measure latency |
##### Message Formats
**user\_transcript:**
```json
{
"type": "user_transcript",
"user_transcription_event": {
"user_transcript": "Hello, how are you today?"
}
}
```
**agent\_response:**
```json
{
"type": "agent_response",
"agent_response_event": {
"agent_response": "Hello! I'm doing well, thank you for asking. How can I assist you today?"
}
}
```
**audio:**
```json
{
"type": "audio",
"audio_event": {
"audio_base_64": "SGVsbG8sIHRoaXMgaXMgYSBzYW1wbGUgYXVkaW8gY2h1bms=",
"event_id": 67890
}
}
```
**interruption:**
```json
{
"type": "interruption",
"interruption_event": {
"event_id": 54321
}
}
```
**internal\_tentative\_agent\_response:**
```json
{
"type": "internal_tentative_agent_response",
"tentative_agent_response_internal_event": {
"tentative_agent_response": "I'm thinking about how to respond..."
}
}
```
**ping:**
```json
{
"type": "ping",
"ping_event": {
"event_id": 13579,
"ping_ms": 50
}
}
```
## Latency Management
To ensure smooth conversations, implement these strategies:
* **Adaptive Buffering:** Adjust audio buffering based on network conditions.
* **Jitter Buffer:** Implement a jitter buffer to smooth out variations in packet arrival times (a minimal sketch follows this list).
* **Ping-Pong Monitoring:** Use ping and pong events to measure round-trip time and adjust accordingly.
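The right implementation depends on your audio pipeline, but as a rough sketch, a jitter buffer can be as simple as holding incoming audio chunks for a fixed delay before handing them to playback:
```python
import collections
import time

class SimpleJitterBuffer:
    """Hold audio chunks briefly so small variations in arrival time don't cause playback gaps."""

    def __init__(self, delay_seconds: float = 0.15):
        self.delay = delay_seconds
        self._queue = collections.deque()  # (arrival_time, chunk) pairs

    def push(self, chunk: bytes) -> None:
        """Store a freshly received audio chunk with its arrival timestamp."""
        self._queue.append((time.monotonic(), chunk))

    def pop_ready(self) -> list[bytes]:
        """Return chunks that have aged past the buffering delay, in arrival order."""
        ready = []
        now = time.monotonic()
        while self._queue and now - self._queue[0][0] >= self.delay:
            ready.append(self._queue.popleft()[1])
        return ready
```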
## Security Best Practices
* Rotate API keys regularly and use environment variables to store them.
* Implement rate limiting to prevent abuse.
* Clearly explain the intention when prompting users for microphone access.
* **Optimized Chunking:** Tweak the audio chunk duration to balance latency and efficiency.
## Additional Resources
* [ElevenLabs Conversational AI Documentation](https://elevenlabs.io/docs/conversational-ai/overview)
* [ElevenLabs Conversational AI SDKs](https://elevenlabs.io/docs/conversational-ai/client-sdk)
# Custom LLM Integration
Guide for using your own LLM or server with ElevenLabs SDK.
## Using Your Own OpenAI Key for LLM
To integrate a custom OpenAI key, create a secret containing your OPENAI\_API\_KEY:
Navigate to the "Secrets" page and select "Add Secret"
Choose "Custom LLM" from the dropdown menu.
Enter the URL, your model, and the secret you created.
## Custom LLM Server
To bring a custom LLM server, set up a compatible server endpoint using OpenAI's style, specifically targeting create\_chat\_completion.
Here's an example server implementation using FastAPI and OpenAI's Python SDK:
```python
import json
import os
import fastapi
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI
import uvicorn
import logging
from dotenv import load_dotenv
from pydantic import BaseModel
from typing import List, Optional

# Load environment variables from .env file
load_dotenv()

# Retrieve API key from environment
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY not found in environment variables")

app = fastapi.FastAPI()
oai_client = AsyncOpenAI(api_key=OPENAI_API_KEY)

class Message(BaseModel):
    role: str
    content: str

class ChatCompletionRequest(BaseModel):
    messages: List[Message]
    model: str
    temperature: Optional[float] = 0.7
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False
    user_id: Optional[str] = None

@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest) -> StreamingResponse:
    oai_request = request.dict(exclude_none=True)
    if "user_id" in oai_request:
        oai_request["user"] = oai_request.pop("user_id")

    chat_completion_stream = await oai_client.chat.completions.create(**oai_request)

    async def event_stream():
        try:
            async for chunk in chat_completion_stream:
                # Serialize the pydantic chunk object before emitting it as an SSE event
                yield f"data: {json.dumps(chunk.model_dump())}\n\n"
            yield "data: [DONE]\n\n"
        except Exception as e:
            logging.error("An error occurred: %s", str(e))
            yield f"data: {json.dumps({'error': 'Internal error occurred!'})}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8013)
```
Run this code or your own server code.
### Setting Up a Public URL for Your Server
To make your server accessible, create a public URL using a tunneling tool like ngrok:
```shell
ngrok http --url=.ngrok.app 8013
```
### Configuring ElevenLabs Custom LLM
Now let's make the changes in ElevenLabs:
Point your server URL to the ngrok endpoint, and set "Limit token usage" to 5000.
You can now start interacting with Conversational AI using your own LLM server.
# Knowledge Base
Learn how to enhance your conversational agent with custom knowledge
Knowledge bases allow you to provide additional context to your conversational
agent beyond its base LLM knowledge.
Non-enterprise users can add up to 5 files/links (max 20MB, 300,000 characters
total).
## Adding Knowledge Items
There are 3 options to enhance your conversational agent's knowledge:
### 1. File Upload
### 2. URL Import
Ensure you have permission to use the content from the URLs you provide
### 3. Direct Text Input
## Best Practices
Provide clear, well-structured information that's relevant to your agent's
purpose
Break large documents into smaller, focused pieces for better processing
## Enterprise Features
Need higher limits? Contact our sales team to discuss enterprise plans with
expanded knowledge base capabilities.
# Tools
Provide your agent with real time information and the ability to take action in third party apps with external function calls.
Tools allow you to make external function calls to third party apps so you can get real-time information. You might use tools to:
Schedule appointments and manage availability on someone's calendar
Book restaurant reservations and manage dining arrangements
Create or update customer records in a CRM system
Get inventory data to make product recommendations
To help you get started with Tools, we'll walk through an "AI receptionist" we created by integrating with the Cal.com API.
## Tools Overview
### Secrets
Before we proceed with creating our Tools, we will first create a Secret to securely store our API keys. The Cal.com API we will use for our example takes a Bearer token so we will first add a Secret named "Bearer" and provide the Bearer token as the value.
You can find Secrets within the Conversational AI Dashboard in the Agent subnav.
### Webhooks
Next, look for "Tools" in the "Agent" subnav. Add a new Tool to configure your webhook. For our AI receptionist, we created two Tools to interact with the Cal.com API:
This tool allows the AI receptionist to check calendar availability. It can answer questions like "When is Sam available to meet tomorrow?" or "Is Sam free at 10:30am on Tuesday?"
```bash
Name: Get_Available_Slots
Method: GET
URL: https://api.cal.com/v2/slots/available
```
Uses Cal.com's [Get Available Slots](https://cal.com/docs/api-reference/v2/slots/get-available-slots#get-available-slots) endpoint to fetch open calendar slots within a specified date/time range.
This tool handles the actual meeting booking once a suitable time has been selected.
```bash
Name: Book_Meeting
Method: POST
URL: https://api.cal.com/v2/bookings
```
Uses Cal.com's [Create a booking](https://cal.com/docs/api-reference/v2/bookings/create-a-booking#create-a-booking) endpoint. This should only be called after collecting:
* Caller's full name
* Meeting time
* Email address
### Headers
Within the Cal.com documentation, we see that both our availability and booking endpoints require the same three headers:
```bash
Content-Type: application/json
cal-api-version: 2024-08-13
Authorization: Bearer
```
We configured that as follows:
| Type | Name | Value |
| ------ | --------------- | ------------------------------------------ |
| Value | Content-Type | application/json |
| Value | cal-api-version | 2024-08-13 |
| Secret | Bearer | Bearer (the secret key we defined earlier) |
### Path Parameters
You can add path parameters by including variables surrounded by curly brackets in your URL, like `{variable}`. Once added to the URL path, it will appear under Path Parameters with the ability to update the Data Type and Description.
Our AI receptionist does not call for Path Parameters so we will not be defining any.
### Query Parameters
Get and Delete requests typically have query parameters while Post and Patch do not. Our Get\_Available\_Slots tool relies on a Get request that requires the following query parameters: startTime, endTime, eventTypeId, eventTypeSlug, and duration.
In our Description for each, we define a prompt that our Conversational Agent will use to extract the relevant information from the call transcript using an LLM.
Here's how we defined our query parameters for our AI receptionist:
| Identifier | Data Type | Required | Description |
| ------------- | --------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| startTime | String | Yes | The start time of the slot the person is checking availability for in UTC timezone, formatted as ISO 8601 (e.g., '2024-08-13T09:00:00Z'). Extract time from natural language and convert to UTC. |
| endTime | String | Yes | The end time of the slot the person is checking availability for in UTC timezone, formatted as ISO 8601 (e.g., '2024-08-13T09:00:00Z'). Extract time from natural language and convert to UTC. |
| eventTypeSlug | String | Yes | The desired meeting length. Should be 15minutes, 30minutes, or 60minutes. |
| eventTypeId | Number | Yes | The desired meeting length, as an event id. If 15 minutes, return 1351800. If 30 minutes, return 1351801. If 60 minutes, return 1351802. |
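Outside of the agent, you can sanity-check this configuration by making the equivalent request yourself. Here's a rough sketch using Python's `requests` package; the parameter values and the `CAL_API_KEY` environment variable are illustrative placeholders, not part of the original setup:
```python
import os
import requests

headers = {
    "Content-Type": "application/json",
    "cal-api-version": "2024-08-13",
    # Same value as the "Bearer" secret defined earlier (placeholder here).
    "Authorization": f"Bearer {os.environ['CAL_API_KEY']}",
}

params = {
    "startTime": "2024-08-13T09:00:00Z",   # illustrative UTC ISO 8601 values
    "endTime": "2024-08-13T17:00:00Z",
    "eventTypeSlug": "30minutes",
    "eventTypeId": 1351801,                # 30-minute meeting, per the mapping above
    "duration": 30,
}

response = requests.get("https://api.cal.com/v2/slots/available", headers=headers, params=params)
print(response.json())
```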
### Body Parameters
Post and Patch requests typically have body parameters while Get and Delete do not. Our Book\_Meeting tool is a Post request and requires the following Body Parameters: startTime, eventTypeId, attendee.
In our Description for each, we define a prompt that our Conversational Agent will use to extract the relevant information from the call transcript using an LLM.
Here's how we defined our body parameters for our AI receptionist:
| Identifier | Data Type | Required | Description |
| ----------- | --------- | -------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| startTime | String | Yes | The start time of the slot the person is checking availability for in UTC timezone, formatted as ISO 8601 (e.g., '2024-08-13T09:00:00Z'). Extract time from natural language and convert to UTC. |
| eventTypeId | String | Yes | The unique Cal event ID for the meeting duration. Use 1351800 for a 15-minute meeting, 1351801 for 30 minutes, and 1351802 for 60 minutes. If no specific duration is provided, default to 1351801 (30 minutes). |
| attendee | Object | Yes | The info on the attendee including their full name, email address and time zone. |
Since attendee is an object, its subfields are defined as their own parameters:
| Identifier | Data Type | Required | Description |
| ---------- | --------- | -------- | --------------------------------------------------------------------------------------------------------------- |
| name | String | Yes | The full name of the person booking the meeting. |
| email | String | Yes | The email address of the person booking the meeting. Should be a properly formatted email. |
| timeZone | String | Yes | The caller's timezone. Should be in the format of 'Continent/City' like 'Europe/London' or 'America/New\_York'. |
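For comparison, here is a rough sketch of the request Book\_Meeting ends up making once these parameters are filled in (again using `requests`; the attendee details and `CAL_API_KEY` are illustrative placeholders):
```python
import os
import requests

headers = {
    "Content-Type": "application/json",
    "cal-api-version": "2024-08-13",
    "Authorization": f"Bearer {os.environ['CAL_API_KEY']}",  # placeholder for the secret
}

body = {
    "startTime": "2024-08-13T10:30:00Z",   # agreed meeting time in UTC, ISO 8601
    "eventTypeId": 1351801,                 # 30-minute meeting, per the mapping above
    "attendee": {
        "name": "Jane Doe",                 # illustrative attendee details
        "email": "jane.doe@example.com",
        "timeZone": "Europe/London",
    },
}

response = requests.post("https://api.cal.com/v2/bookings", headers=headers, json=body)
print(response.status_code, response.json())
```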
### Adjusting System Prompt to reference your Tools
Now that you've defined your Tools, instruct your agent on when and how to invoke them in your system prompt. If your Tools require the user to provide information, it's best to have your agent collect that information before calling them (though in many cases your agent will realize it is missing information and ask for it anyway).
Here's the System Prompt we use for our AI Receptionist:
> You are my receptionist and people are calling to book a time with me.
>
> You can check my availability by using Get\_Available\_Slots. That endpoint takes start and end date/time and returns open slots in between. If someone asks for my availability but doesn't specify a date / time, just check for open slots tomorrow. If someone is checking availability and there are no open slots, keep checking the next day until you find one with availability.
>
> Once you've agreed upon a time to meet, you can use Book\_Meeting to book a call.
> You will need to collect their full name, the time they want to meet, whether they want to meet for 15, 30 or 60 minutes, and their email address to book a meeting.
>
> If you call Book\_Meeting and it fails, it's likely either that the email address is formatted in an invalid way or the selected time is not one where I am available.
# Agent Setup
Deploy customized, conversational voice agents in minutes.
## The Web Dashboard
The easiest way to get started with ElevenLabs Conversational AI is through our web dashboard.
You can access your dashboard [here](https://elevenlabs.io/sign-up). The web dashboard enables you to:
* Create and manage AI assistants
* Configure voice settings and conversation parameters
* Review conversation analytics and transcripts
* Manage API keys and integration settings
The web dashboard uses our Web SDK under the hood to handle real-time conversations.
## Pierogi Palace Assistant
In this guide, we'll create an AI assistant for "Pierogi Palace" - a modern Polish restaurant that takes orders through voice conversations. Our assistant will help customers order traditional Polish dishes with a contemporary twist.
![Pierogi Palace](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/pierogi-palace.png)
The assistant will guide customers through:
* **Menu Selection**
* Various pierogi options with traditional and modern fillings
* Portion sizes (available in dozens)
* **Order Details**
* Quantity confirmation
* Price calculation in Polish złoty
* Order review and modifications if needed
* **Delivery Information**
* Delivery address collection
* Estimated preparation time (10 minutes)
* Delivery time calculation based on location
* Payment method confirmation (cash on delivery)
## Assistant Setup
In this guide, we'll walk through configuring your **Pierogi Palace** assistant using ElevenLabs Conversational AI. We'll set up the assistant's voice, language model, and transcription settings to help customers place orders seamlessly.
### Prerequisites
* An [ElevenLabs account](https://www.elevenlabs.io)
### 1. Access the Dashboard
Go to [elevenlabs.io](https://elevenlabs.io/sign-up) and sign in to your account.
In the ElevenLabs dashboard, click on **Conversational AI** in the left sidebar.
![Dashboard](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai-perogi-1.png)
### 2. Create Your Assistant
* Click the **+** button to create a new AI Agent.
* Choose the **Blank Template** option & call the agent `Pierogi Palace`.
![Create New Assistant](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai-perogi-2.png)
Set the `First message` & `System prompt` fields to the following, leaving the **Knowledge Base** and **Tools** empty for now:
```plaintext Greeting Message
Welcome to Pierogi Palace! I'm here to help you place your order. What can I get started for you today?
```
```plaintext System Prompt
You are a friendly and efficient virtual assistant for Pierogi Palace, a modern Polish restaurant specializing in pierogis. It is located in the Zakopane mountains in Poland.
Your role is to help customers place orders over voice conversations. You have comprehensive knowledge of the menu items and their prices.
Menu Items:
- Potato & Cheese Pierogi – 30 Polish złoty per dozen
- Beef & Onion Pierogi – 40 Polish złoty per dozen
- Spinach & Feta Pierogi – 30 Polish złoty per dozen
Your Tasks:
1. Greet the Customer: Start with a warm welcome and ask how you can assist.
2. Take the Order: Listen carefully to the customer's selection, confirm the type and quantity of pierogis.
3. Confirm Order Details: Repeat the order back to the customer for confirmation.
4. Calculate Total Price: Compute the total cost based on the items ordered.
5. Collect Delivery Information: Ask for the customer's delivery address to estimate delivery time.
6. Estimate Delivery Time: Inform the customer that cooking time is 10 minutes plus delivery time based on their location.
7. Provide Order Summary: Give the customer a summary of their order, total price, and estimated delivery time.
8. Close the Conversation: Thank the customer and let them know their order is being prepared.
Guidelines:
- Use a friendly and professional tone throughout the conversation.
- Be patient and attentive to the customer's needs.
- If unsure about any information, politely ask the customer to repeat or clarify.
- Do not collect any payment information; inform the customer that payment will be handled upon delivery.
- Avoid discussing topics unrelated to taking and managing the order.
```
### 3. Configure Voice Settings
In this step, you can choose from over 3,000 lifelike voices available in ElevenLabs. For this demo, we will use Jessica's voice.
![Assistant Settings](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai-perogi-3.png)
Higher quality settings may increase response time slightly. For an optimal customer experience, we recommend balancing quality and latency based on your assistant's expected use case.
### 4. Test Your Assistant
* Press the **Order** button and try ordering some Pierogi to see how the assistant handles the conversation.
![Assistant Testing Interface](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai-perogi-4.png)
### 5. Configure Data Collection
Configure evaluation criteria and data collection to analyze conversations and improve your assistant's performance.
![Assistant Analysis Interface](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai-perogi-6.png)
Navigate to the **ANALYSIS** section in your assistant's settings to define custom criteria for evaluating conversations.
1. **Goal Prompt Criteria**
This passes the conversation transcript to the LLM to verify if specific goals were met. Results will be:
* success
* failure
* unknown
Plus a rationale explaining the chosen result.
Configure the following fields:
* **Name**: Enter a descriptive name
* **Prompt**: Enter detailed instructions for evaluating the conversation
```plaintext order_completion
Name: order_completion
Prompt: Evaluate if the conversation resulted in a successful order completion.
Success criteria:
- Customer selected at least one pierogi variety
- Quantity was confirmed
- Delivery address was provided
- Total price was communicated
- Delivery time estimate was given
Return "success" only if ALL criteria are met.
```
```plaintext customer_satisfaction
Name: customer_satisfaction
Prompt: Analyze the conversation for signs of customer satisfaction.
Success criteria:
- Customer's questions were answered clearly
- No repeated requests for clarification
- No signs of frustration or confusion
- Positive or neutral customer responses
Return "failure" if there are clear signs of dissatisfaction.
```
```plaintext menu_explanation
Name: menu_explanation
Prompt: Evaluate if the assistant properly explained menu options when asked.
Success criteria:
- Mentioned available pierogi varieties
- Provided prices when relevant
- Explained portion sizes (dozens)
Return "unknown" if menu items were not discussed.
```
In the **Data collection** section, define specifications for extracting data from conversation transcripts.
Click **Add data collection item** and configure:
* **Data type**: Select "string"
* **Identifier**: Enter a unique identifier for this data point
* **Description**: Provide detailed instructions for the LLM about how to extract the specific data from the transcript
Example data collection items:
```plaintext Order Type
Identifier: order_type
Description: Extract the type of order from the conversation.
Should be one of:
- delivery
- pickup
- inquiry_only
```
```plaintext Ordered Items
Identifier: ordered_items
Description: List all pierogi varieties and quantities ordered in the format: "item: quantity".
If no order was placed, return "none".
```
```plaintext Delivery Zone
Identifier: delivery_zone
Description: Based on the delivery address, categorize the location.
Should be one of:
- central_zakopane
- outer_zakopane
- outside_delivery_zone
```
```plaintext Interaction Type
Identifier: interaction_type
Description: Categorize the conversation.
Should be one of:
- completed_order
- abandoned_order
- menu_inquiry
- general_inquiry
```
![Conversation History](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai-perogi-7.png)
Your Pierogi Palace assistant is now ready to take orders 🥟! The assistant can handle menu inquiries, process orders, and provide delivery estimates.
# Introduction
Deploy customized, conversational voice agents in minutes.
![Conversational AI Platform](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai.png)
## What is Conversational AI?
ElevenLabs Conversational AI is a platform for deploying customized, conversational voice agents. Built in response to our customers' needs, our platform eliminates months of development time typically spent building conversation stacks from scratch. Our Conversational AI orchestration platform combines three key components:
**The ears** - Converts spoken language into text with high accuracy
**The brain** - Processes and understands context to generate intelligent
responses
**The voice** - Transforms text responses into natural-sounding speech using
our industry-leading voice technology
## Why Choose ElevenLabs?
Get to production in hours, not months. Our platform handles the complex infrastructure including speech processing, turn-taking, and conversation management.
![Playground](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/playground.png)
Get access to over **3,000** premium voices or clone your own. Our industry-leading voice technology delivers the most natural-sounding AI conversations.
![Voices](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/voices.png)
Bring your own brain - integrate custom RAG, LLMs, and functions, or
leverage our built-in solutions. You maintain complete control over your
business logic.
Monitor and optimize your AI conversations with comprehensive analytics.
Track engagement metrics, conversation quality, and user satisfaction in
real-time through our intuitive dashboard.
## Pricing
Conversational AI is currently in beta. During this period, we're covering the
LLM costs, though these will be passed through to customers in the future.
* Setup & Prompt Testing: **500 credits per minute**
* Production: **1,000 credits per minute**
## Key Features
Deploy sophisticated AI agents that can:
* Engage in natural, context-aware conversations
* Handle interruptions gracefully with built-in turn-taking logic
* Process complex queries using advanced language models
* Maintain conversation history and context
* Scale to handle multiple concurrent conversations
Multiple flexible deployment options:
* Quick embed via customizable widget
* Direct API access for custom implementations
* Comprehensive SDKs for Python and JavaScript
* WebSocket support for real-time interactions
* Simple authentication and usage monitoring
{" "}
Powerful tools to save development time:
* Pre-built templates for common use cases
* Customizable system prompts and
knowledge bases
* Built-in analytics and success metrics
* Voice customization options
Advanced capabilities for business needs:
* Custom LLM support
* Enhanced security and compliance options
* Advanced analytics and reporting
* Priority support channels
* Custom integration assistance
## Popular Applications
Companies and creators use our Conversational AI orchestration platform to create:
AI agents trained on company help documentation that can handle complex
customer queries, troubleshoot issues, and provide 24/7 support in multiple
languages.
Personal AI helpers that manage scheduling, set reminders, look up
information, and help users stay organized throughout their day.
Shopping assistants that help customers find products, provide personalized
recommendations, track orders, and answer product-specific questions.
Engaging NPCs and storytelling agents that create immersive experiences,
guide players through custom worlds, and adapt narratives based on user
interactions.
Ready to get started? Check out our [quickstart
guide](/conversational-ai/docs/agent-setup) to create your first AI agent in
minutes.
# Next.js
Learn how to create a web application that enables voice conversations with ElevenLabs AI agents
This tutorial will guide you through creating a web client that can interact with a Conversational AI agent. You'll learn how to implement real-time voice conversations, allowing users to speak with an AI agent that can listen, understand, and respond naturally using voice synthesis.
## What You'll Need
1. An ElevenLabs agent created following [this guide](/conversational-ai/docs/agent-setup)
2. `npm` installed on your local system.
3. We'll use TypeScript for this tutorial, but you can use JavaScript if you prefer.
Looking for a complete example? Check out our [Next.js demo on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/nextjs).
![Convai Example Project](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai-nextjs-example.png)
## Setup
Open a terminal window and run the following command:
```bash
npm create next-app my-conversational-agent
```
It will ask you some questions about how to build your project. We'll follow the default suggestions for this tutorial.
```shell
cd my-conversational-agent
```
```shell
npm install @11labs/react
```
Run the following command to start the development server and open the provided URL in your browser:
```shell
npm run dev
```
![Vercel Default Screen](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai-nextjs-1.png)
## Implement Conversational AI
Create a new file `app/components/conversation.tsx`:
```tsx app/components/conversation.tsx
'use client';
import { useConversation } from '@11labs/react';
import { useCallback } from 'react';
export function Conversation() {
  const conversation = useConversation({
    onConnect: () => console.log('Connected'),
    onDisconnect: () => console.log('Disconnected'),
    onMessage: (message) => console.log('Message:', message),
    onError: (error) => console.error('Error:', error),
  });

  const startConversation = useCallback(async () => {
    try {
      // Request microphone permission
      await navigator.mediaDevices.getUserMedia({ audio: true });

      // Start the conversation with your agent
      await conversation.startSession({
        agentId: 'YOUR_AGENT_ID', // Replace with your agent ID
      });
    } catch (error) {
      console.error('Failed to start conversation:', error);
    }
  }, [conversation]);

  const stopConversation = useCallback(async () => {
    await conversation.endSession();
  }, [conversation]);

  return (
    <div>
      <button onClick={startConversation} disabled={conversation.status === 'connected'}>
        Start Conversation
      </button>
      <button onClick={stopConversation} disabled={conversation.status !== 'connected'}>
        Stop Conversation
      </button>
      <p>Status: {conversation.status}</p>
      <p>Agent is {conversation.isSpeaking ? 'speaking' : 'listening'}</p>
    </div>
  );
}
```
Replace the contents of `app/page.tsx` with:
```tsx app/page.tsx
import { Conversation } from './components/conversation';

export default function Home() {
  return (
    <main>
      <h1>ElevenLabs Conversational AI</h1>
      <Conversation />
    </main>
  );
}
```
This authentication step is only required for private agents. If you're using
a public agent, you can skip this section and directly use the `agentId` in
the `startSession` call.
If you're using a private agent that requires authentication, you'll need to generate
a signed URL from your server. This section explains how to set this up.
### What You'll Need
1. An ElevenLabs account and API key. Sign up [here](https://www.elevenlabs.io/sign-up).
Create a `.env.local` file in your project root:
```yaml .env.local
ELEVENLABS_API_KEY=your-api-key-here
NEXT_PUBLIC_AGENT_ID=your-agent-id-here
```
1. Make sure to add `.env.local` to your `.gitignore` file to prevent accidentally committing sensitive credentials to version control.
2. Never expose your API key in the client-side code. Always keep it secure on the server.
Create a new file `app/api/get-signed-url/route.ts`:
```tsx app/api/get-signed-url/route.ts
import { NextResponse } from 'next/server';
export async function GET() {
  try {
    const response = await fetch(
      `https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=${process.env.NEXT_PUBLIC_AGENT_ID}`,
      {
        headers: {
          'xi-api-key': process.env.ELEVENLABS_API_KEY!,
        },
      }
    );

    if (!response.ok) {
      throw new Error('Failed to get signed URL');
    }

    const data = await response.json();
    return NextResponse.json({ signedUrl: data.signed_url });
  } catch (error) {
    return NextResponse.json(
      { error: 'Failed to generate signed URL' },
      { status: 500 }
    );
  }
}
```
Modify your `conversation.tsx` to fetch and use the signed URL:
```tsx app/components/conversation.tsx {5-12,19,23}
// ... existing imports ...
export function Conversation() {
  // ... existing conversation setup ...

  const getSignedUrl = async (): Promise<string> => {
    const response = await fetch("/api/get-signed-url");
    if (!response.ok) {
      throw new Error(`Failed to get signed url: ${response.statusText}`);
    }
    const { signedUrl } = await response.json();
    return signedUrl;
  };

  const startConversation = useCallback(async () => {
    try {
      // Request microphone permission
      await navigator.mediaDevices.getUserMedia({ audio: true });

      const signedUrl = await getSignedUrl();

      // Start the conversation with your signed url
      await conversation.startSession({
        signedUrl,
      });
    } catch (error) {
      console.error('Failed to start conversation:', error);
    }
  }, [conversation]);

  // ... rest of the component ...
}
```
Signed URLs expire after a short period. However, any conversations initiated before expiration will continue uninterrupted. In a production environment, implement proper error handling and URL refresh logic for starting new conversations.
## Next Steps
Now that you have a basic implementation, you can:
1. Add visual feedback for voice activity
2. Implement error handling and retry logic
3. Add a chat history display
4. Customize the UI to match your brand
For more advanced features and customization options, check out the
[@11labs/react](https://www.npmjs.com/package/@11labs/react) package.
# Vite (Javascript)
Learn how to create a web application that enables voice conversations with ElevenLabs AI agents
This tutorial will guide you through creating a web client that can interact with a Conversational AI agent. You'll learn how to implement real-time voice conversations, allowing users to speak with an AI agent that can listen, understand, and respond naturally using voice synthesis.
Looking to build with React/Next.js? Check out our [Next.js
guide](/conversational-ai/guides/conversational-ai-guide-nextjs).
## What You'll Need
1. An ElevenLabs agent created following [this guide](/conversational-ai/docs/agent-setup)
2. `npm` installed on your local system
3. Basic knowledge of JavaScript
Looking for a complete example? Check out our [Vanilla JS demo on
GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai/javascript).
## Project Setup
Open a terminal and create a new directory for your project:
```bash
mkdir elevenlabs-conversational-ai
cd elevenlabs-conversational-ai
```
Initialize a new npm project and install the required packages:
```bash
npm init -y
npm install vite @11labs/client
```
Add this to your `package.json`:
```json package.json {4}
{
"scripts": {
...
"dev:frontend": "vite"
}
}
```
Create the following file structure:
```shell {2,3}
elevenlabs-conversational-ai/
├── index.html
├── script.js
├── package-lock.json
├── package.json
└── node_modules
```
## Implementing the Voice Chat Interface
In `index.html`, set up a simple user interface:
![Conversational AI HTML interface step](https://mintlify.s3-us-west-1.amazonaws.com/elevenlabs-docs/conversational-ai/images/convai-vite-1.png)
```html index.html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>ElevenLabs Conversational AI</title>
  </head>
  <body>
    <h1>ElevenLabs Conversational AI</h1>
    <p>Status: <span id="connectionStatus">Disconnected</span></p>
    <p>Agent is <span id="agentStatus">listening</span></p>
    <button id="startButton">Start Conversation</button>
    <button id="stopButton" disabled>Stop Conversation</button>
    <script type="module" src="script.js"></script>
  </body>
</html>
```
In `script.js`, implement the functionality:
```javascript script.js
import { Conversation } from '@11labs/client';
const startButton = document.getElementById('startButton');
const stopButton = document.getElementById('stopButton');
const connectionStatus = document.getElementById('connectionStatus');
const agentStatus = document.getElementById('agentStatus');
let conversation;
async function startConversation() {
    try {
        // Request microphone permission
        await navigator.mediaDevices.getUserMedia({ audio: true });

        // Start the conversation
        conversation = await Conversation.startSession({
            agentId: 'YOUR_AGENT_ID', // Replace with your agent ID
            onConnect: () => {
                connectionStatus.textContent = 'Connected';
                startButton.disabled = true;
                stopButton.disabled = false;
            },
            onDisconnect: () => {
                connectionStatus.textContent = 'Disconnected';
                startButton.disabled = false;
                stopButton.disabled = true;
            },
            onError: (error) => {
                console.error('Error:', error);
            },
            onModeChange: (mode) => {
                agentStatus.textContent = mode.mode === 'speaking' ? 'speaking' : 'listening';
            },
        });
    } catch (error) {
        console.error('Failed to start conversation:', error);
    }
}

async function stopConversation() {
    if (conversation) {
        await conversation.endSession();
        conversation = null;
    }
}
startButton.addEventListener('click', startConversation);
stopButton.addEventListener('click', stopConversation);
```
```shell
npm run dev:frontend
```
Make sure to replace `'YOUR_AGENT_ID'` with your actual agent ID from
ElevenLabs.
This authentication step is only required for private agents. If you're using a public agent, you can skip this section and directly use the `agentId` in the `startSession` call.
Create a `.env` file in your project root:
```env .env
ELEVENLABS_API_KEY=your-api-key-here
AGENT_ID=your-agent-id-here
```
Make sure to add `.env` to your `.gitignore` file to prevent accidentally committing sensitive credentials.
1. Install additional dependencies:
```bash
npm install express cors dotenv
```
2. Create a new folder called `backend`:
```shell {2}
elevenlabs-conversational-ai/
├── backend
...
```
```javascript backend/server.js
require("dotenv").config();
const express = require("express");
const cors = require("cors");
const app = express();
app.use(cors());
app.use(express.json());
const PORT = process.env.PORT || 3001;
app.get("/api/get-signed-url", async (req, res) => {
try {
const response = await fetch(
`https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=${process.env.AGENT_ID}`,
{
headers: {
"xi-api-key": process.env.ELEVENLABS_API_KEY,
},
}
);
if (!response.ok) {
throw new Error("Failed to get signed URL");
}
const data = await response.json();
res.json({ signedUrl: data.signed_url });
} catch (error) {
console.error("Error:", error);
res.status(500).json({ error: "Failed to generate signed URL" });
}
});
app.listen(PORT, () => {
console.log(`Server running on http://localhost:${PORT}`);
});
```
Modify your `script.js` to fetch and use the signed URL:
```javascript script.js {2-10,16,19,20}
// ... existing imports and variables ...
async function getSignedUrl() {
    const response = await fetch('http://localhost:3001/api/get-signed-url');
    if (!response.ok) {
        throw new Error(`Failed to get signed url: ${response.statusText}`);
    }
    const { signedUrl } = await response.json();
    return signedUrl;
}

async function startConversation() {
    try {
        await navigator.mediaDevices.getUserMedia({ audio: true });

        const signedUrl = await getSignedUrl();

        conversation = await Conversation.startSession({
            signedUrl,
            // agentId has been removed...
            onConnect: () => {
                connectionStatus.textContent = 'Connected';
                startButton.disabled = true;
                stopButton.disabled = false;
            },
            onDisconnect: () => {
                connectionStatus.textContent = 'Disconnected';
                startButton.disabled = false;
                stopButton.disabled = true;
            },
            onError: (error) => {
                console.error('Error:', error);
            },
            onModeChange: (mode) => {
                agentStatus.textContent = mode.mode === 'speaking' ? 'speaking' : 'listening';
            },
        });
    } catch (error) {
        console.error('Failed to start conversation:', error);
    }
}
// ... rest of the code ...
```
Signed URLs expire after a short period. However, any conversations initiated before expiration will continue uninterrupted. In a production environment, implement proper error handling and URL refresh logic for starting new conversations.
```json package.json {4,5}
{
"scripts": {
...
"dev:backend": "node backend/server.js",
"dev": "npm run dev:frontend & npm run dev:backend"
}
}
```
Start the application with:
```bash
npm run dev
```
## Next Steps
Now that you have a basic implementation, you can:
1. Add visual feedback for voice activity
2. Implement error handling and retry logic
3. Add a chat history display
4. Customize the UI to match your brand
For more advanced features and customization options, check out the
[@11labs/client](https://www.npmjs.com/package/@11labs/client) package.
# Twilio Integration
Guide for integrating Twilio Voice with Conversational AI.
## Overview
This guide explains how to set up a voice call integration between Twilio and Conversational AI.
The integration allows you to handle incoming phone calls and connect a Conversational AI agent to them directly.
## Prerequisites
* ElevenLabs API key
* Twilio account & phone number
* Python 3.7+
* ngrok for local development
## ElevenLabs Agent Configuration
Make sure the audio encoding for both output and input is set to "μ-law 8000 Hz". This is the audio encoding required by the Twilio Voice API; the default audio encoding is PCM 16000 Hz.
### Set TTS Output Format
Navigate to your agent -> Go to Voice Section -> Select "μ-law 8000 Hz"
### Set Input Audio Format
Navigate to your agent -> Go to Advanced Section -> Select "μ-law 8000 Hz"
## Project Setup
First, install the required dependencies:
```shell
pip install fastapi uvicorn python-dotenv twilio elevenlabs websockets
```
Set up your environment variables by creating a `.env` file:
```shell
ELEVENLABS_API_KEY=your_elevenlabs_api_key
AGENT_ID=your_agent_id
```
Create the main server file (main.py):
```python
import json
import traceback
import os
from dotenv import load_dotenv
from fastapi import FastAPI, Request, WebSocket, WebSocketDisconnect
from fastapi.responses import HTMLResponse
from twilio.twiml.voice_response import VoiceResponse, Connect
from elevenlabs import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from twilio_audio_interface import TwilioAudioInterface
# Load environment variables
load_dotenv()
# Initialize FastAPI app
app = FastAPI()
# Initialize ElevenLabs client
eleven_labs_client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
ELEVEN_LABS_AGENT_ID = os.getenv("AGENT_ID")
@app.get("/")
async def root():
return {"message": "Twilio-ElevenLabs Integration Server"}
@app.api_route("/incoming-call-eleven", methods=["GET", "POST"])
async def handle_incoming_call(request: Request):
"""Handle incoming call and return TwiML response to connect to Media Stream."""
response = VoiceResponse()
host = request.url.hostname
connect = Connect()
connect.stream(
url=f"wss://{host}/media-stream-eleven",
)
response.append(connect)
return HTMLResponse(content=str(response), media_type="application/xml")
@app.websocket("/media-stream-eleven")
async def handle_media_stream(websocket: WebSocket):
"""Handle WebSocket connections for Eleven Labs integration"""
await websocket.accept()
print("WebSocket connection established")
audio_interface = TwilioAudioInterface(websocket)
conversation = None
try:
conversation = Conversation(
client=eleven_labs_client,
agent_id=ELEVEN_LABS_AGENT_ID,
requires_auth=False,
audio_interface=audio_interface,
callback_agent_response=lambda text: print(f"Agent said: {text}"),
callback_user_transcript=lambda text: print(f"User said: {text}"),
)
conversation.start_session()
print("Conversation session started")
async for message in websocket.iter_text():
if not message:
continue
try:
data = json.loads(message)
await audio_interface.handle_twilio_message(data)
except Exception as e:
print(f"Error processing message: {str(e)}")
traceback.print_exc()
except WebSocketDisconnect:
print("WebSocket disconnected")
finally:
if conversation:
print("Ending conversation session...")
conversation.end_session()
conversation.wait_for_session_end()
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
```
Create the Twilio audio interface (twilio\_audio\_interface.py):
```python
import asyncio
from typing import Callable
import queue
import threading
import base64
from elevenlabs.conversational_ai.conversation import AudioInterface
import websockets
class TwilioAudioInterface(AudioInterface):
    def __init__(self, websocket):
        self.websocket = websocket
        self.output_queue = queue.Queue()
        self.should_stop = threading.Event()
        self.stream_sid = None
        self.input_callback = None
        self.output_thread = None

    def start(self, input_callback: Callable[[bytes], None]):
        """Start the audio interface with the provided callback for input audio"""
        self.input_callback = input_callback
        self.output_thread = threading.Thread(target=self._output_thread)
        self.output_thread.start()

    def stop(self):
        """Stop the audio interface and clean up resources"""
        self.should_stop.set()
        if self.output_thread:
            self.output_thread.join(timeout=5.0)
        self.stream_sid = None

    def output(self, audio: bytes):
        """Queue audio for output to Twilio
        Audio should already be in mulaw 8kHz format from ElevenLabs"""
        self.output_queue.put(audio)

    def interrupt(self):
        """Clear the output queue to stop any audio"""
        try:
            while True:
                _ = self.output_queue.get(block=False)
        except queue.Empty:
            pass
        asyncio.run(self._send_clear_message_to_twilio())

    def _output_thread(self):
        """Thread for handling audio output to Twilio"""
        while not self.should_stop.is_set():
            asyncio.run(self._send_audio_to_twilio())

    async def _send_audio_to_twilio(self):
        try:
            audio = self.output_queue.get(timeout=0.2)
            audio_payload = base64.b64encode(audio).decode("utf-8")
            audio_delta = {
                "event": "media",
                "streamSid": self.stream_sid,
                "media": {"payload": audio_payload},
            }
            await self.websocket.send_json(audio_delta)
        except queue.Empty:
            pass
        except Exception as e:
            print(f"Error sending audio: {e}")

    async def _send_clear_message_to_twilio(self):
        try:
            clear_message = {"event": "clear", "streamSid": self.stream_sid}
            await self.websocket.send_json(clear_message)
        except Exception as e:
            print(f"Error sending clear message to Twilio: {e}")

    async def handle_twilio_message(self, data):
        """Handle incoming Twilio WebSocket messages."""
        try:
            if data["event"] == "start":
                self.stream_sid = data["start"]["streamSid"]
                print(f"Started stream with stream_sid: {self.stream_sid}")
            if data["event"] == "media":
                audio_data = base64.b64decode(data["media"]["payload"])
                if self.input_callback:
                    self.input_callback(audio_data)
        except websockets.exceptions.ConnectionClosed:
            self.stop()
            self.stream_sid = None
            print("WebSocket closed, stopping audio processing")
        except Exception as e:
            print(f"Error in input_callback: {e}")
```
## Setting Up Twilio
Start your local server:
```shell
python main.py
```
Create a public URL using ngrok: `ngrok http 8000`. Note down the
HTTPS URL provided by ngrok (e.g., [https://your-ngrok-url.ngrok.app](https://your-ngrok-url.ngrok.app)).
Configure your Twilio phone number:
1. Go to the Twilio Console
2. Navigate to Phone Numbers → Manage → Active numbers
3. Select your phone number
4. Under "Voice Configuration", set the webhook for incoming calls to:
`https://your-ngrok-url.ngrok.app/incoming-call-eleven`
5. Make sure the HTTP method is set to POST
## Testing the Integration
1. Call your Twilio phone number
2. You should see console output indicating:
* WebSocket connection established
* Stream SID assigned
* Conversation session started
3. Speak into the phone - you should see transcripts of your speech and the agent's responses in the console
## Troubleshooting
### Common Issues
1. **WebSocket Connection Fails**
* Verify your ngrok URL is correct in the Twilio webhook settings
* Check that your server is running and accessible
2. **No Audio Output**
* Ensure your ElevenLabs API key is correct
* Verify the AGENT\_ID is properly configured
3. **Audio Quality Issues**
* The integration uses mulaw 8kHz format as required by Twilio
* Check your network connectivity and latency
### Debug Logging
To enable detailed logging, add these lines to your main.py:
```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## Security Considerations
1. Always use environment variables for sensitive information
2. Implement proper authentication for your endpoints
3. Use HTTPS for all communications
4. Regularly rotate API keys
5. Monitor usage to prevent abuse
# JavaScript SDK
Conversational AI SDK: deploy customized, interactive voice agents in minutes.
Also see the [Conversational AI
overview](/conversational-ai/docs/introduction)
## Installation
Install the package in your project through your package manager of choice.
```shell
npm install @11labs/client
# or
yarn add @11labs/client
# or
pnpm install @11labs/client
```
## Usage
This library is primarily meant for development in vanilla JavaScript projects, or as a base for libraries tailored to specific frameworks.
It is recommended to check whether your specific framework has its own library.
However, you can use this library in any JavaScript-based project.
### Initialize conversation
First, initialize the Conversation instance:
```js
const conversation = await Conversation.startSession(options);
```
This will kick off the websocket connection and start using the microphone to communicate with the ElevenLabs Conversational AI agent. Consider explaining and requesting microphone access in your app's UI before the Conversation kicks off:
```js
// call after explaining to the user why the microphone access is needed
await navigator.mediaDevices.getUserMedia({ audio: true });
```
#### Session configuration
The options passed to `startSession` specify how the session is established. There are two ways to start a session:
##### Using Agent ID
Agent ID can be acquired through [ElevenLabs UI](https://elevenlabs.io/app/conversational-ai).
For public agents, you can use the ID directly:
```js
const conversation = await Conversation.startSession({
agentId: "",
});
```
##### Using a signed URL
If the conversation requires authorization, you will need to add a dedicated endpoint to your server that
will request a signed url using the [ElevenLabs API](https://elevenlabs.io/docs/introduction) and pass it back to the client.
Here's an example of how it could be set up:
```js
// Node.js server
app.get("/signed-url", yourAuthMiddleware, async (req, res) => {
const response = await fetch(
`https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=${process.env.AGENT_ID}`,
{
method: "GET",
headers: {
// Requesting a signed url requires your ElevenLabs API key
// Do NOT expose your API key to the client!
"xi-api-key": process.env.XI_API_KEY,
},
}
);
if (!response.ok) {
return res.status(500).send("Failed to get signed URL");
}
const body = await response.json();
res.send(body.signed_url);
});
```
```js
// Client
const response = await fetch("/signed-url", yourAuthHeaders);
const signedUrl = await response.text();
const conversation = await Conversation.startSession({ signedUrl });
```
#### Optional callbacks
The options passed to `startSession` can also be used to register optional callbacks:
* **onConnect** - handler called when the conversation websocket connection is established.
* **onDisconnect** - handler called when the conversation websocket connection is ended.
* **onMessage** - handler called when a new text message is received. These can be tentative or final transcriptions of the user's voice, or replies produced by the LLM. Primarily used for handling conversation transcription.
* **onError** - handler called when an error is encountered.
* **onStatusChange** - handler called whenever connection status changes. Can be `connected`, `connecting` and `disconnected` (initial).
* **onModeChange** - handler called when the mode changes, e.g. the agent switches from `speaking` to `listening`, or the other way around.
#### Return value
`startSession` returns a `Conversation` instance that can be used to control the session. The method will throw an error if the session cannot be established. This can happen if the user denies microphone access, or if the websocket connection
fails.
##### endSession
A method to manually end the conversation. The method will end the conversation and disconnect from the websocket.
Afterwards, the conversation instance will be unusable and can be safely discarded.
```js
await conversation.endSession();
```
##### getId
A method returning the conversation ID.
```js
const id = conversation.getId();
```
##### setVolume
A method to set the output volume of the conversation. Accepts an object with a `volume` field between 0 and 1.
```js
await conversation.setVolume({ volume: 0.5 });
```
##### getInputVolume / getOutputVolume
Methods that return the current input/output volume on a scale from `0` to `1` where `0` is -100 dB and `1` is -30 dB.
```js
const inputVolume = await conversation.getInputVolume();
const outputVolume = await conversation.getOutputVolume();
```
##### getInputByteFrequencyData / getOutputByteFrequencyData
Methods that return `Uint8Array`s containing the current input/output frequency data. See [AnalyserNode.getByteFrequencyData](https://developer.mozilla.org/en-US/docs/Web/API/AnalyserNode/getByteFrequencyData) for more information.
# Python SDK
Conversational AI SDK: deploy customized, interactive voice agents in minutes.
Also see the [Conversational AI
overview](/conversational-ai/docs/introduction)
## Installation
Install the `elevenlabs` Python package in your project:
```shell
pip install elevenlabs
# or
poetry add elevenlabs
```
If you want to use the default implementation of audio input/output you will also need the `pyaudio` extra:
```shell
pip install "elevenlabs[pyaudio]"
# or
poetry add "elevenlabs[pyaudio]"
```
The `pyaudio` package installation might require additional system dependencies.
See [PyAudio package README](https://pypi.org/project/PyAudio/) for more information.
On Debian-based systems you can install the dependencies with:
```shell
sudo apt install portaudio19-dev
```
On macOS with Homebrew you can install the dependencies with:
```shell
brew install portaudio
```
## Usage
In this example we will create a simple script that runs a conversation with the ElevenLabs Conversational AI agent.
You can find the full code in the [ElevenLabs examples repository](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai-python).
First import the necessary dependencies:
```python
import os
import signal
from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface
```
Next load the agent ID and API key from environment variables:
```python
AGENT_ID = os.getenv("AGENT_ID")
API_KEY = os.getenv("ELEVENLABS_API_KEY")
```
The API key is only required for non-public agents that have authentication enabled.
You don't have to set it for public agents and the code will work fine without it.
Then create the `ElevenLabs` client instance:
```python
client = ElevenLabs(api_key=API_KEY)
```
Now we initialize the `Conversation` instance:
```python
conversation = Conversation(
# API client and agent ID.
client,
AGENT_ID,
# Assume auth is required when API_KEY is set.
requires_auth=bool(API_KEY),
# Use the default audio interface.
audio_interface=DefaultAudioInterface(),
# Simple callbacks that print the conversation to the console.
callback_agent_response=lambda response: print(f"Agent: {response}"),
callback_agent_response_correction=lambda original, corrected: print(f"Agent: {original} -> {corrected}"),
callback_user_transcript=lambda transcript: print(f"User: {transcript}"),
# Uncomment if you want to see latency measurements.
# callback_latency_measurement=lambda latency: print(f"Latency: {latency}ms"),
)
```
We are using the `DefaultAudioInterface` which uses the default system audio input/output devices for the conversation.
You can also implement your own audio interface by subclassing `elevenlabs.conversational_ai.conversation.AudioInterface`.
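If you do implement your own, a skeleton might look like the sketch below. The overridden method names (`start`, `stop`, `output`, `interrupt`) are assumptions based on the default implementation, so verify them against the `AudioInterface` base class in the SDK source before relying on them:
```python
from elevenlabs.conversational_ai.conversation import AudioInterface

class MyAudioInterface(AudioInterface):
    # NOTE: method names are assumptions; check the SDK's AudioInterface base class.
    def start(self, input_callback):
        # Start capturing microphone audio and pass raw audio chunks
        # to input_callback(audio_bytes) as they become available.
        ...

    def stop(self):
        # Stop capture/playback and release audio devices.
        ...

    def output(self, audio: bytes):
        # Play audio received from the agent.
        ...

    def interrupt(self):
        # Immediately stop playback, e.g. when the user starts speaking.
        ...
```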
Now we can start the conversation:
```python
conversation.start_session()
```
To get a clean shutdown when the user presses `Ctrl+C` we can add a signal handler which will call `end_session()`:
```python
signal.signal(signal.SIGINT, lambda sig, frame: conversation.end_session())
```
And lastly we wait for the conversation to end and print out the conversation ID (which can be used for reviewing the conversation history and debugging):
```python
conversation_id = conversation.wait_for_session_end()
print(f"Conversation ID: {conversation_id}")
```
All that is left is to run the script and start talking to the agent:
```shell
# For public agents:
AGENT_ID=youragentid python demo.py
# For private agents:
AGENT_ID=youragentid ELEVENLABS_API_KEY=yourapikey python demo.py
```
# React SDK
Conversational AI SDK: deploy customized, interactive voice agents in minutes.
Also see the [Conversational AI
overview](/conversational-ai/docs/introduction)
## Installation
Install the package in your project through your package manager:
```shell
npm install @11labs/react
# or
yarn add @11labs/react
# or
pnpm install @11labs/react
```
## Usage
### useConversation
React hook for managing websocket connection and audio usage for ElevenLabs Conversational AI.
#### Initialize conversation
First, initialize the Conversation instance.
```tsx
const conversation = useConversation();
```
Note that Conversational AI requires microphone access.
Consider explaining and requesting access in your app's UI before the Conversation kicks off.
```js
// call after explaining to the user why microphone access is needed
await navigator.mediaDevices.getUserMedia({ audio: true });
```
#### Options
The Conversation can be initialized with certain options, all of which are optional.
```tsx
const conversation = useConversation({
/* options object */
});
```
* **onConnect** - handler called when the conversation websocket connection is established.
* **onDisconnect** - handler called when the conversation websocket connection is ended.
* **onMessage** - handler called when a new message is received. These can be tentative or final transcriptions of the user's voice, replies produced by the LLM, or a debug message when a debug option is enabled.
* **onError** - handler called when an error is encountered.
#### Methods
##### startConversation
The `startConversation` method kicks off the websocket connection and starts using the microphone to communicate with the ElevenLabs Conversational AI agent.\
The method accepts an options object, with the `url` or `agentId` option being required.
Agent ID can be acquired through [ElevenLabs UI](https://elevenlabs.io/app/conversational-ai) and is always necessary.
```js
const conversation = useConversation();
const conversationId = await conversation.startSession({ url });
```
For public agents, define `agentId` - no signed link generation is necessary.
If the conversation requires authorization, use the REST API to generate a signed link and pass it as the `url` parameter.
`startSession` returns a promise resolving to `conversationId`. The value is a globally unique conversation ID you can use to identify separate conversations.
```js
// your server
const requestHeaders: HeadersInit = new Headers();
requestHeaders.set("xi-api-key", process.env.XI_API_KEY); // use your ElevenLabs API key
const response = await fetch(
"https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id={{agent id created through ElevenLabs UI}}",
{
method: "GET",
headers: requestHeaders,
}
);
if (!response.ok) {
return Response.error();
}
const body = await response.json();
const url = body.signed_url; // use this URL for startConversation method.
```
##### endSession
A method to manually end the conversation. The method will end the conversation and disconnect from the websocket.
```js
await conversation.endSession();
```
##### setVolume
A method to set the output volume of the conversation. Accepts an object with a `volume` field between 0 and 1.
```js
await conversation.setVolume({ volume: 0.5 });
```
##### status
A React state containing the current status of the conversation.
```js
const { status } = useConversation();
console.log(status); // "connected" or "disconnected"
```
##### isSpeaking
A React state indicating whether the agent is currently speaking.
This is helpful for indicating the mode in your UI.
```js
const { isSpeaking } = useConversation();
console.log(isSpeaking); // boolean
```
# iOS SDK
Conversational AI SDK: deploy customized, interactive voice agents in your Swift applications.
Also see the [Conversational AI
overview](/conversational-ai/docs/introduction)
## Installation
Add the ElevenLabs Swift SDK to your project using Swift Package Manager:
1. Open your project in Xcode
2. Go to `File` > `Add Packages...`
3. Enter the repository URL: `https://github.com/elevenlabs/ElevenLabsSwift`
4. Select your desired version
```swift
import ElevenLabsSDK
```
Ensure you add `NSMicrophoneUsageDescription` to your Info.plist to explain
microphone access to users.
## Usage
This library is primarily designed for Conversational AI integration in Swift applications. Please use an alternative dependency for other features, such as speech synthesis.
### Initialize Conversation
First, create a session configuration and set up the necessary callbacks:
```swift
// Configure the session
let config = ElevenLabsSDK.SessionConfig(agentId: "your-agent-id")
// Set up callbacks
var callbacks = ElevenLabsSDK.Callbacks()
callbacks.onConnect = { conversationId in
print("Connected with ID: \(conversationId)")
}
callbacks.onDisconnect = {
print("Disconnected")
}
callbacks.onMessage = { message, role in
print("\(role.rawValue): \(message)")
}
callbacks.onError = { error, info in
print("Error: \(error), Info: \(String(describing: info))")
}
callbacks.onStatusChange = { status in
print("Status changed to: \(status.rawValue)")
}
callbacks.onModeChange = { mode in
print("Mode changed to: \(mode.rawValue)")
}
callbacks.onVolumeUpdate = { volume in
print("Volume updated: \(volume)")
}
```
### Session Configuration
There are two ways to initialize a session:
You can obtain an Agent ID through the [ElevenLabs UI](https://elevenlabs.io/app/conversational-ai):
```swift
let config = ElevenLabsSDK.SessionConfig(agentId: "")
```
For conversations requiring authorization, implement a server endpoint that requests a signed URL:
```swift
// Swift example using URLSession
struct SignedUrlResponse: Decodable {
    // Maps the signed_url field returned by the API
    let signedUrl: String
    enum CodingKeys: String, CodingKey { case signedUrl = "signed_url" }
}
func getSignedUrl() async throws -> String {
let url = URL(string: "https://api.elevenlabs.io/v1/convai/conversation/get_signed_url")!
var request = URLRequest(url: url)
request.setValue("YOUR-API-KEY", forHTTPHeaderField: "xi-api-key")
let (data, _) = try await URLSession.shared.data(for: request)
let response = try JSONDecoder().decode(SignedUrlResponse.self, from: data)
return response.signedUrl
}
// Use the signed URL
let signedUrl = try await getSignedUrl()
let config = ElevenLabsSDK.SessionConfig(signedUrl: signedUrl)
```
### Starting the Conversation
Initialize the conversation session asynchronously:
```swift
Task {
do {
let conversation = try await ElevenLabsSDK.Conversation.startSession(
config: config,
callbacks: callbacks
)
// Use the conversation instance
} catch {
print("Failed to start conversation: \(error)")
}
}
```
### Audio Sample Rates
The ElevenLabs SDK currently uses a default input sample rate of `16,000 Hz`. However, the output sample rate is configurable based on the agent's settings. Ensure that the output sample rate aligns with your specific application's audio requirements for smooth interaction.
The SDK does not currently support ulaw format for audio encoding. For compatibility, consider using alternative formats.
### Managing the Session
```swift:End Session
// Starts the session
conversation.startSession()
// Ends the session
conversation.endSession()
```
```swift:Recording Controls
// Start recording
conversation.startRecording()
// Stop recording
conversation.stopRecording()
```
### Example Implementation
For a full, working example, check out the [example application on GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/conversational-ai-swift-example).
Here's an example SwiftUI view implementing the conversation interface:
```swift
struct ConversationalAIView: View {
@State private var conversation: ElevenLabsSDK.Conversation?
@State private var mode: ElevenLabsSDK.Mode = .listening
@State private var status: ElevenLabsSDK.Status = .disconnected
@State private var audioLevel: Float = 0.0
private func startConversation() {
Task {
do {
let config = ElevenLabsSDK.SessionConfig(agentId: "your-agent-id")
var callbacks = ElevenLabsSDK.Callbacks()
callbacks.onConnect = { conversationId in
status = .connected
}
callbacks.onDisconnect = {
status = .disconnected
}
callbacks.onModeChange = { newMode in
DispatchQueue.main.async {
mode = newMode
}
}
callbacks.onVolumeUpdate = { newVolume in
DispatchQueue.main.async {
audioLevel = newVolume
}
}
conversation = try await ElevenLabsSDK.Conversation.startSession(
config: config,
callbacks: callbacks
)
} catch {
print("Failed to start conversation: \(error)")
}
}
}
var body: some View {
VStack {
// Your UI implementation
Button(action: startConversation) {
Text(status == .connected ? "End Call" : "Start Call")
}
}
}
}
```
This SDK is currently experimental and under active development. While it's
stable enough for testing and development, it's not recommended for production
use yet.
# Project Examples
# How to dub video and audio with ElevenLabs
Learn how to automate the dubbing of audio and video files into various languages using the ElevenLabs API
## Introduction
Dubbing videos and audio files from one language to another can be a great way to reach a wider audience. The ElevenLabs API provides a convenient way to automatically dub media files using state-of-the-art technology. In this guide, we will walk you through how to upload a video or audio file, dub it, and download the translated video. We'll also discuss how to directly dub a link such as a YouTube, TikTok, or Twitter video.
If you're looking to jump straight into the action, the complete code is available on the following repos:
* [Python example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/dubbing/python).
On the 8th of May 2024, we launched the Dubbing API for all ElevenLabs tiers.
## How to upload and dub a video or audio file
### Requirements
Before proceeding, please ensure that you have the following:
* An ElevenLabs account with an API key (here’s how to [find your API key](/api-reference/text-to-speech#authentication)).
* Python or Node.js installed on your machine
Then, install the ElevenLabs SDK as shown below
```bash Python
pip install elevenlabs
```
Install the necessary packages to manage your environmental variables:
```bash Python
pip install python-dotenv
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```bash .env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
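The snippets that follow assume an ElevenLabs client named `client` has already been created from these environment variables. A minimal setup sketch (the imports mirror what the later snippets use; the full version lives in the example repo):
```python Python
import os
import time
from typing import Optional

from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load ELEVENLABS_API_KEY from the .env file and create the API client
load_dotenv()
client = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
```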
### Start the dubbing
First, we want to send the file to the ElevenLabs dubbing service:
```python Python
def create_dub_from_file(
input_file_path: str,
file_format: str,
source_language: str,
target_language: str,
) -> Optional[str]:
"""
Dubs an audio or video file from one language to another and saves the output.
Args:
input_file_path (str): The file path of the audio or video to dub.
file_format (str): The file format of the input file.
source_language (str): The language of the input file.
target_language (str): The target language to dub into.
Returns:
Optional[str]: The file path of the dubbed file or None if operation failed.
"""
if not os.path.isfile(input_file_path):
raise FileNotFoundError(f"The input file does not exist: {input_file_path}")
with open(input_file_path, "rb") as audio_file:
response = client.dubbing.dub_a_video_or_an_audio_file(
file=(os.path.basename(input_file_path), audio_file, file_format), # Optional file
target_lang=target_language, # The target language to dub the content into. Can be none if dubbing studio editor is enabled and running manual mode
mode="automatic", # automatic or manual.
source_lang=source_language, # Source language
num_speakers=1, # Number of speakers to use for the dubbing.
watermark=False, # Whether to apply watermark to the output video.
)
# rest of the code
```
### Check for completion
The `wait_for_dubbing_completion()` function within the `dubbing_utils.py` file polls the API to check whether the dubbing process is complete. If completed, it proceeds to the next step; otherwise, it reports the status or failure.
```python Python
def wait_for_dubbing_completion(dubbing_id: str) -> bool:
"""
Waits for the dubbing process to complete by periodically checking the status.
Args:
dubbing_id (str): The dubbing project id.
Returns:
bool: True if the dubbing is successful, False otherwise.
"""
MAX_ATTEMPTS = 120
CHECK_INTERVAL = 10 # In seconds
for _ in range(MAX_ATTEMPTS):
metadata = client.dubbing.get_dubbing_project_metadata(dubbing_id)
if metadata.status == "dubbed":
return True
elif metadata.status == "dubbing":
print(
"Dubbing in progress... Will check status again in",
CHECK_INTERVAL,
"seconds.",
)
time.sleep(CHECK_INTERVAL)
else:
print("Dubbing failed:", metadata.error_message)
return False
print("Dubbing timed out")
return False
```
### Save the video locally
Upon completion of dubbing, the `download_dubbed_file()` function in `dubbing_utils.py` will save the dubbed file to a local directory, typically under `data/{dubbing_id}/{language_code}.mp4`.
```python Python
def download_dubbed_file(dubbing_id: str, language_code: str) -> str:
"""
Downloads the dubbed file for a given dubbing ID and language code.
Args:
dubbing_id: The ID of the dubbing project.
language_code: The language code for the dubbing.
Returns:
The file path to the downloaded dubbed file.
"""
dir_path = f"data/{dubbing_id}"
os.makedirs(dir_path, exist_ok=True)
file_path = f"{dir_path}/{language_code}.mp4"
with open(file_path, "wb") as file:
for chunk in client.dubbing.get_dubbed_file(dubbing_id, language_code):
file.write(chunk)
return file_path
```
### Putting it together
We combine the `wait_for_dubbing_completion()` (`waitForDubbingCompletion`) and `download_dubbed_file()` (`downloadDubbedFile`) functions to create the final function.
```python Python
def create_dub_from_file(
input_file_path: str,
file_format: str,
source_language: str,
target_language: str,
) -> Optional[str]:
"""
Dubs an audio or video file from one language to another and saves the output.
Args:
input_file_path (str): The file path of the audio or video to dub.
file_format (str): The file format of the input file.
source_language (str): The language of the input file.
target_language (str): The target language to dub into.
Returns:
Optional[str]: The file path of the dubbed file or None if operation failed.
"""
if not os.path.isfile(input_file_path):
raise FileNotFoundError(f"The input file does not exist: {input_file_path}")
with open(input_file_path, "rb") as audio_file:
response = client.dubbing.dub_a_video_or_an_audio_file(
file=(os.path.basename(input_file_path), audio_file, file_format),
target_lang=target_language,
mode="automatic",
source_lang=source_language,
num_speakers=1,
watermark=False, # reduces the characters used if enabled, only works for videos not audio
)
dubbing_id = response.dubbing_id
if wait_for_dubbing_completion(dubbing_id):
output_file_path = download_dubbed_file(dubbing_id, target_language)
return output_file_path
else:
return None
```
We then use the final function as shown below.
```python create_a_dub_from_file.py (Python)
if __name__ == "__main__":
result = create_dub_from_file(
"../example_speech.mp3", # Input file path
"audio/mpeg", # File format
"en", # Source language
"es", # Target language
)
if result:
print("Dubbing was successful! File saved at:", result)
else:
print("Dubbing failed or timed out.")
```
## How to dub a video from YouTube, TikTok, Twitter or Vimeo
For dubbing web-based content, instead of uploading a file you can pass in a URL. This supports popular platforms like YouTube, TikTok, Twitter, and Vimeo.
```python Python
def create_dub_from_url(
source_url: str,
source_language: str,
target_language: str,
) -> Optional[str]:
"""
Downloads a video from a URL, and creates a dubbed version in the target language.
Args:
source_url (str): The URL of the source video to dub. Can be a YouTube link, TikTok, X (Twitter) or a Vimeo link.
source_language (str): The language of the source video.
target_language (str): The target language to dub into.
Returns:
Optional[str]: The file path of the dubbed file or None if operation failed.
"""
response = client.dubbing.dub_a_video_or_an_audio_file(
source_url=source_url, # URL of the source video/audio file.
target_lang=target_language, # The Target language to dub the content into. Can be none if dubbing studio editor is enabled and running manual mode
mode="automatic", # automatic or manual.
source_lang=source_language, # Source language.
num_speakers=1, # Number of speakers to use for the dubbing.
watermark=True, # Whether to apply watermark to the output video.
)
dubbing_id = response.dubbing_id
if wait_for_dubbing_completion(dubbing_id):
output_file_path = download_dubbed_file(dubbing_id, target_language)
return output_file_path
else:
return None
```
You can then call the function as shown below.
```python Python
if __name__ == "__main__":
source_url = "https://www.youtube.com/watch?v=0EqSXDwTq6U" # Charlie bit my finger
source_language = "en"
target_language = "fr"
result = create_dub_from_url(source_url, source_language, target_language)
if result:
print("Dubbing was successful! File saved at:", result)
else:
print("Dubbing failed or timed out.")
```
## Conclusion
With this guide and the accompanying code structure, you now have a basic setup for dubbing audio and video content using the ElevenLabs API. Whether you're working with local files or content from URLs, you can create multilingual versions of your media to cater to diverse audiences.
Remember to always follow the best practices when dealing with API keys and sensitive data, and consult the ElevenLabs API documentation for more advanced features and options. Happy dubbing!
For additional information on dubbing capabilities, translation services, and available languages, please refer to the [ElevenLabs API documentation](https://elevenlabs.docs.buildwithfern.com/docs/developers/api-reference/dubbing/dub-a-video-or-an-audio-file).
Should you encounter any issues or have questions, our [GitHub Issues page](https://github.com/elevenlabs/elevenlabs-docs/issues) is open for your queries and feedback.
## List of supported languages for dubbing
| No | Language Name | Language Code |
| -- | ------------- | ------------- |
| 1 | English | en |
| 2 | Hindi | hi |
| 3 | Portuguese | pt |
| 4 | Chinese | zh |
| 5 | Spanish | es |
| 6 | French | fr |
| 7 | German | de |
| 8 | Japanese | ja |
| 9 | Arabic | ar |
| 10 | Russian | ru |
| 11 | Korean | ko |
| 12 | Indonesian | id |
| 13 | Italian | it |
| 14 | Dutch | nl |
| 15 | Turkish | tr |
| 16 | Polish | pl |
| 17 | Swedish | sv |
| 18 | Filipino | fil |
| 19 | Malay | ms |
| 20 | Romanian | ro |
| 21 | Ukrainian | uk |
| 22 | Greek | el |
| 23 | Czech | cs |
| 24 | Danish | da |
| 25 | Finnish | fi |
| 26 | Bulgarian | bg |
| 27 | Croatian | hr |
| 28 | Slovak | sk |
| 29 | Tamil | ta |
# How to Use Pronunciation Dictionaries
How to add, view, and remove rules to pronunciation dictionaries with the Python SDK
In this tutorial, you'll learn how to use a pronunciation dictionary with the ElevenLabs Python SDK. Pronunciation dictionaries are useful for controlling the specific pronunciation of words. We support both [IPA](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet) and [CMU](https://en.wikipedia.org/wiki/CMU_Pronouncing_Dictionary) alphabets. They are useful for correcting rare or specific pronunciations, such as names or company names. For example, the word `nginx` could be pronounced incorrectly, so we can specify the pronunciation ourselves. Based on IPA, `nginx` is pronounced as `/ˈɛndʒɪnˈɛks/`. Finding the IPA or CMU transcription of a word manually can be difficult, so LLMs like ChatGPT can help make the search easier.
We'll start by adding rules to the pronunciation dictionary from a file and comparing the text-to-speech results that use and do not use the dictionary. After that, we'll discuss how to add and remove specific rules to existing dictionaries.
If you want to jump straight to the finished repo, you can find it [here](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/pronunciation-dictionaries/python).
Alias tags are supported by all models excluding `eleven_turbo_v2_5`. Phoneme
tags only work with the models `eleven_turbo_v2` and `eleven_monolingual_v1`.
If you use phoneme tags with other models, they will silently skip the word.
## Requirements
* An ElevenLabs account with an API key (here’s how to [find your API key](/api-reference/text-to-speech#authentication)).
* Python installed on your machine
* FFMPEG to play audio
## Setup
### Installing our SDK
Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK for updating the pronunciation dictionary and for text-to-speech conversion. You can install it using pip:
```bash
pip install elevenlabs
```
Additionally, install `python-dotenv` to manage your environmental variables:
```bash
pip install python-dotenv
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
## Initiate the Client SDK
We'll start by initializing the client SDK.
```python
import os
from elevenlabs.client import ElevenLabs
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
```
## Create a Pronunciation Dictionary From a File
To create a pronunciation dictionary from a file, we'll create a `.pls` file for our rules.
This rule will use the IPA alphabet and update the pronunciation for `tomato` and `Tomato`. PLS files are case-sensitive, which is why we include the word both with and without a capital "T". Save it as `dictionary.pls`.
```xml filename="dictionary.pls"
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>tə'meɪtoʊ</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Tomato</grapheme>
    <phoneme>tə'meɪtoʊ</phoneme>
  </lexeme>
</lexicon>
```
In the following snippet, we start by adding rules from the file and getting the uploaded result. Then we generate and play two different text-to-speech audio clips to compare the output with and without the custom pronunciation dictionary.
```python
import requests
from elevenlabs import play, PronunciationDictionaryVersionLocator
with open("dictionary.pls", "rb") as f:
# this dictionary changes how tomato is pronounced
pronunciation_dictionary = client.pronunciation_dictionary.add_from_file(
file=f.read(), name="example"
)
audio_1 = client.generate(
text="Without the dictionary: tomato",
voice="Rachel",
model="eleven_turbo_v2",
)
audio_2 = client.generate(
text="With the dictionary: tomato",
voice="Rachel",
model="eleven_turbo_v2",
pronunciation_dictionary_locators=[
PronunciationDictionaryVersionLocator(
pronunciation_dictionary_id=pronunciation_dictionary.id,
version_id=pronunciation_dictionary.version_id,
)
],
)
# play the audio
play(audio_1)
play(audio_2)
```
## Remove Rules From a Pronunciation Dictionary
To remove rules from a pronunciation dictionary, we can simply call the `remove_rules_from_the_pronunciation_dictionary` method in the pronunciation dictionary module. In the following snippet, we start by removing rules based on the rule strings and getting the updated result. Then we generate and play another text-to-speech audio clip to test the difference. In the example, we take the pronunciation dictionary version id from the `remove_rules_from_the_pronunciation_dictionary` response, because every change to a pronunciation dictionary creates a new version and we need to use the latest version returned from the response. The old version also remains available.
```python
pronunciation_dictionary_rules_removed = (
client.pronunciation_dictionary.remove_rules_from_the_pronunciation_dictionary(
pronunciation_dictionary_id=pronunciation_dictionary.id,
rule_strings=["tomato", "Tomato"],
)
)
audio_3 = client.generate(
text="With the rule removed: tomato",
voice="Rachel",
model="eleven_turbo_v2",
pronunciation_dictionary_locators=[
PronunciationDictionaryVersionLocator(
pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
version_id=pronunciation_dictionary_rules_removed.version_id,
)
],
)
play(audio_3)
```
## Add Rules to Pronunciation Dictionary
We can add rules directly to the pronunciation dictionary with the `PronunciationDictionaryRule_Phoneme` class and call `add_rules_to_the_pronunciation_dictionary` on the pronunciation dictionary module. The snippet below demonstrates adding rules with this class and getting the updated result. Then we generate and play another text-to-speech audio clip to test the difference. This example also uses the pronunciation dictionary version returned from `add_rules_to_the_pronunciation_dictionary` to ensure we use the latest dictionary version.
```python
from elevenlabs import PronunciationDictionaryRule_Phoneme
pronunciation_dictionary_rules_added = client.pronunciation_dictionary.add_rules_to_the_pronunciation_dictionary(
pronunciation_dictionary_id=pronunciation_dictionary_rules_removed.id,
rules=[
PronunciationDictionaryRule_Phoneme(
type="phoneme",
alphabet="ipa",
string_to_replace="tomato",
phoneme="/tə'meɪtoʊ/",
),
PronunciationDictionaryRule_Phoneme(
type="phoneme",
alphabet="ipa",
string_to_replace="Tomato",
phoneme="/tə'meɪtoʊ/",
),
],
)
audio_4 = client.generate(
text="With the rule added again: tomato",
voice="Rachel",
model="eleven_turbo_v2",
pronunciation_dictionary_locators=[
PronunciationDictionaryVersionLocator(
pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
version_id=pronunciation_dictionary_rules_added.version_id,
)
],
)
play(audio_4)
```
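Alias rules (mentioned in the callout near the top of this guide) can be added the same way. The sketch below is an assumption modeled on the phoneme example above; it assumes the SDK exposes a `PronunciationDictionaryRule_Alias` type with an `alias` field, so double-check the import and field names against the installed SDK before using it:
```python
from elevenlabs import PronunciationDictionaryRule_Alias

# Hypothetical sketch: make the model read "ElevenLabs" as "eleven labs".
# Verify the class name and fields against the installed SDK version.
alias_rule = PronunciationDictionaryRule_Alias(
    type="alias",
    string_to_replace="ElevenLabs",
    alias="eleven labs",
)

pronunciation_dictionary_alias_added = client.pronunciation_dictionary.add_rules_to_the_pronunciation_dictionary(
    pronunciation_dictionary_id=pronunciation_dictionary_rules_added.id,
    rules=[alias_rule],
)
```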
## Conclusion
You now know how to use a pronunciation dictionary for generating text-to-speech audio. This functionality opens up opportunities to generate text-to-speech audio based on your pronunciation dictionary, making it more flexible for your use case.
For more details, visit our [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/pronunciation-dictionaries/python) to see the full project files which give a clear structure for setting up your application:
* `env.example`: Template for your environment variables.
* `main.py`: The complete code for snippets above.
* `dictionary.pls`: Custom dictionary example with XML format.
* `requirements.txt`: List of Python packages used for this example.
If you have any questions please create an issue on the [elevenlabs-doc Github](https://github.com/elevenlabs/elevenlabs-docs/issues).
# Combine Multiple Generations
Learn how to keep your voice stable across multiple generations
## What is Request Stitching?
When you have a large text to convert into audio and send it in chunks without further context, there can be abrupt changes in prosody from one chunk to another.
It is much better to give the model context on what was already generated and what will be generated in the future; this is exactly what Request Stitching does.
As you can see below, the difference between not using Request Stitching and using it is subtle but noticeable:
#### Without Request Stitching:
#### With Request Stitching:
## Conditioning on text
We will use Pydub to concatenate multiple audio segments together. You can install it using:
```bash
pip install pydub
```
One of the two ways to give the model context is to provide the text before and/or after the current chunk by using the `previous_text` and `next_text` parameters:
```python
import os
import requests
from pydub import AudioSegment
import io
YOUR_XI_API_KEY = ""
VOICE_ID = "21m00Tcm4TlvDq8ikWAM" # Rachel
PARAGRAPHS = [
"The advent of technology has transformed countless sectors, with education "
"standing out as one of the most significantly impacted fields.",
"In recent years, educational technology, or EdTech, has revolutionized the way "
"teachers deliver instruction and students absorb information.",
"From interactive whiteboards to individual tablets loaded with educational software, "
"technology has opened up new avenues for learning that were previously unimaginable.",
"One of the primary benefits of technology in education is the accessibility it provides.",
]
segments = []
for i, paragraph in enumerate(PARAGRAPHS):
is_last_paragraph = i == len(PARAGRAPHS) - 1
is_first_paragraph = i == 0
response = requests.post(
f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream",
json={
"text": paragraph,
"model_id": "eleven_multilingual_v2",
"previous_text": None if is_first_paragraph else " ".join(PARAGRAPHS[:i]),
"next_text": None if is_last_paragraph else " ".join(PARAGRAPHS[i + 1:])
},
headers={"xi-api-key": YOUR_XI_API_KEY},
)
if response.status_code != 200:
print(f"Error encountered, status: {response.status_code}, "
f"content: {response.text}")
quit()
print(f"Successfully converted paragraph {i + 1}/{len(PARAGRAPHS)}")
segments.append(AudioSegment.from_mp3(io.BytesIO(response.content)))
segment = segments[0]
for new_segment in segments[1:]:
segment = segment + new_segment
audio_out_path = os.path.join(os.getcwd(), "with_text_conditioning.wav")
segment.export(audio_out_path, format="wav")
print(f"Success! Wrote audio to {audio_out_path}")
```
## Conditioning on past generations
Text conditioning works well when no previous or next chunks have been generated yet. If they have, however, it works much better to provide the actual past generations to the model instead of just the text.
This is done by using the `previous_request_ids` and `next_request_ids` parameters.
Every text-to-speech request has an associated request id, which can be obtained by reading the `request-id` response header. Below is an example of how to use this request id to condition requests on previous generations.
```python
import os
import requests
from pydub import AudioSegment
import io
YOUR_XI_API_KEY = ""
VOICE_ID = "21m00Tcm4TlvDq8ikWAM" # Rachel
PARAGRAPHS = [
"The advent of technology has transformed countless sectors, with education "
"standing out as one of the most significantly impacted fields.",
"In recent years, educational technology, or EdTech, has revolutionized the way "
"teachers deliver instruction and students absorb information.",
"From interactive whiteboards to individual tablets loaded with educational software, "
"technology has opened up new avenues for learning that were previously unimaginable.",
"One of the primary benefits of technology in education is the accessibility it provides.",
]
segments = []
previous_request_ids = []
for i, paragraph in enumerate(PARAGRAPHS):
response = requests.post(
f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream",
json={
"text": paragraph,
"model_id": "eleven_multilingual_v2",
# A maximum of three next or previous history item ids can be sent
"previous_request_ids": previous_request_ids[-3:],
},
headers={"xi-api-key": YOUR_XI_API_KEY},
)
if response.status_code != 200:
print(f"Error encountered, status: {response.status_code}, "
f"content: {response.text}")
quit()
print(f"Successfully converted paragraph {i + 1}/{len(PARAGRAPHS)}")
previous_request_ids.append(response.headers["request-id"])
segments.append(AudioSegment.from_mp3(io.BytesIO(response.content)))
segment = segments[0]
for new_segment in segments[1:]:
segment = segment + new_segment
audio_out_path = os.path.join(os.getcwd(), "with_previous_request_ids_conditioning.wav")
segment.export(audio_out_path, format="wav")
print(f"Success! Wrote audio to {audio_out_path}")
```
Note that the order matters here: if you convert a text split into 5
chunks, have already converted chunks 1, 2, 4 and 5, and now want to convert
chunk 3, the `previous_request_ids` you need to send would be
\[request\_id\_chunk\_1, request\_id\_chunk\_2] and the `next_request_ids` would be
\[request\_id\_chunk\_4, request\_id\_chunk\_5], as illustrated in the sketch below.
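As a concrete illustration only (the ids and text below are hypothetical placeholders), the request body for chunk 3 would look roughly like this:
```python
# Hypothetical illustration of conditioning chunk 3 of 5 on its neighbours.
# The ids are placeholders for values read from the "request-id" response
# header of the already-converted chunks.
request_ids = {1: "id_chunk_1", 2: "id_chunk_2", 4: "id_chunk_4", 5: "id_chunk_5"}

payload = {
    "text": "Text of the third chunk.",
    "model_id": "eleven_multilingual_v2",
    "previous_request_ids": [request_ids[1], request_ids[2]],
    "next_request_ids": [request_ids[4], request_ids[5]],
}
```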
## Conditioning both on text and past generations
The best possible results are achieved when conditioning both on text and past generations, so let's combine the two by providing `previous_text`, `next_text` and `previous_request_ids` in one request:
```python
import os
import requests
from pydub import AudioSegment
import io
YOUR_XI_API_KEY = ""
VOICE_ID = "21m00Tcm4TlvDq8ikWAM" # Rachel
PARAGRAPHS = [
"The advent of technology has transformed countless sectors, with education "
"standing out as one of the most significantly impacted fields.",
"In recent years, educational technology, or EdTech, has revolutionized the way "
"teachers deliver instruction and students absorb information.",
"From interactive whiteboards to individual tablets loaded with educational software, "
"technology has opened up new avenues for learning that were previously unimaginable.",
"One of the primary benefits of technology in education is the accessibility it provides.",
]
segments = []
previous_request_ids = []
for i, paragraph in enumerate(PARAGRAPHS):
is_first_paragraph = i == 0
is_last_paragraph = i == len(PARAGRAPHS) - 1
response = requests.post(
f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream",
json={
"text": paragraph,
"model_id": "eleven_multilingual_v2",
# A maximum of three next or previous history item ids can be sent
"previous_request_ids": previous_request_ids[-3:],
"previous_text": None if is_first_paragraph else " ".join(PARAGRAPHS[:i]),
"next_text": None if is_last_paragraph else " ".join(PARAGRAPHS[i + 1:])
},
headers={"xi-api-key": YOUR_XI_API_KEY},
)
if response.status_code != 200:
print(f"Error encountered, status: {response.status_code}, "
f"content: {response.text}")
quit()
print(f"Successfully converted paragraph {i + 1}/{len(PARAGRAPHS)}")
previous_request_ids.append(response.headers["request-id"])
segments.append(AudioSegment.from_mp3(io.BytesIO(response.content)))
segment = segments[0]
for new_segment in segments[1:]:
segment = segment + new_segment
audio_out_path = os.path.join(os.getcwd(), "with_full_conditioning.wav")
segment.export(audio_out_path, format="wav")
print(f"Success! Wrote audio to {audio_out_path}")
```
## Things to note
1. Providing wrong previous\_request\_ids and next\_request\_ids will not result in an error.
2. In order to use the request id of a request for conditioning, it needs to have been processed completely. In the case of streaming, this means the audio has to be read completely from the response body.
3. How well Request Stitching works varies greatly depending on the model, voice and voice settings used.
4. `previous_request_ids` and `next_request_ids` should contain request ids which are not too old: request ids older than two hours will diminish the effect of conditioning.
5. Enterprises with increased privacy requirements will have Request Stitching disabled.
# How to Use the Text to Sound Effects API
Learn how to use the text to sound effects API to generate sound effects from text.
## Introduction
Our [text to sound effects](https://elevenlabs.io/sound-effects) model enables you to create high-quality sound effects from a short description. These sound effects could be used in a variety of applications, including game development and building apps for music production.
In this tutorial, we will use the text to sound effects API to generate a sound effect from a short description using the Python SDK. We'll then save this sound effect to a file.
For general tips on prompting, see the [sound effects product
docs](/product/sound-effects/overview). And for information on the API
configuration visit [the API reference](/api-reference/sound-generation).
## How to generate a sound effect with the API
### Requirements
Before proceeding, please ensure that you have the following:
* An ElevenLabs account with an API key (here’s how to [find your API key](/api-reference/text-to-speech#authentication))
* Python or Node.js installed on your machine
Then, install the ElevenLabs SDK as shown below
```bash Python
pip install elevenlabs
```
Install the necessary packages to manage your environmental variables:
```bash Python
pip install python-dotenv
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```bash .env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
### Using the sound effects SDK
Now we can use the SDK to generate a sound effect from a short description and save it to a file as shown below.
```python
import os
from elevenlabs.client import ElevenLabs
from dotenv import load_dotenv
load_dotenv()
elevenlabs = ElevenLabs(api_key=os.getenv("ELEVENLABS_API_KEY"))
def generate_sound_effect(text: str, output_path: str):
print("Generating sound effects...")
result = elevenlabs.text_to_sound_effects.convert(
text=text,
duration_seconds=10, # Optional, if not provided will automatically determine the correct length
prompt_influence=0.3, # Optional, if not provided will use the default value of 0.3
)
with open(output_path, "wb") as f:
for chunk in result:
f.write(chunk)
print(f"Audio saved to {output_path}")
if __name__ == "__main__":
generate_sound_effect("Dog barking", "output.mp3")
```
## Configuration
* `duration_seconds`: The duration of the sound effect in seconds. If not provided, the API will automatically determine the correct length. The maximum value is 22 seconds.
* `prompt_influence`: The amount of influence the prompt has on the generated sound effect. If not provided, the API will use the default value of 0.3.
### API pricing
The API is charged at 100 characters per generation with automatic duration, or 25 characters per second with a set duration. For example, the 10-second effect generated above with `duration_seconds=10` would be charged at 250 characters.
### Next steps
We're excited to see what you build with the API. Here are some ideas of what you might want to use it for:
* Adding sound effect generation to a video editing application
* Enabling users to create on-demand samples for their music production
* A new type of video game where every sound is generated dynamically
For higher rate limits or volume-based discounts, please [contact sales](https://elevenlabs.io/contact-sales).
# How to use text to speech with streaming in Python or Node.js
How to convert text into speech, upload to S3, and share with a signed URL
In this tutorial, you'll learn how to convert [text to speech](https://elevenlabs.io/text-to-speech) with the ElevenLabs SDK. We’ll start by talking through how to generate speech and receive a file and then how to generate speech and stream the response back. Finally, as a bonus we’ll show you how to upload the generated audio to an AWS S3 bucket, and share it through a signed URL. This signed URL will provide temporary access to the audio file, making it perfect for sharing with users by SMS or embedding into an application.
If you want to jump straight to the finished repo, you can find it here:
* [Python](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python).
* [Node.js](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node).
## Requirements
* An ElevenLabs account with an API key (here’s how to [find your API key](/developer-guides/quickstart#authentication)).
* Python or Node.js (TypeScript) installed on your machine
* (Optionally) an AWS account with access to S3.
## Setup
### Installing our SDK
Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK for the text to speech conversion. You can install it using pip:
```bash Python
pip install elevenlabs
```
```bash TypeScript
npm install elevenlabs
```
Additionally, install necessary packages to manage your environmental variables:
```bash Python
pip install python-dotenv
```
```bash TypeScript
npm install dotenv
npm install @types/dotenv --save-dev
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```bash .env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
## Convert text to speech (file)
To convert text to speech and save it as a file, we’ll use the `convert` method of the ElevenLabs SDK and then save it locally as a `.mp3` file.
```python text_to_speech_file.py (Python)
import os
import uuid
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
def text_to_speech_file(text: str) -> str:
# Calling the text_to_speech conversion API with detailed parameters
response = client.text_to_speech.convert(
voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice
output_format="mp3_22050_32",
text=text,
model_id="eleven_turbo_v2_5", # use the turbo model for low latency
voice_settings=VoiceSettings(
stability=0.0,
similarity_boost=1.0,
style=0.0,
use_speaker_boost=True,
),
)
# uncomment the line below to play the audio back
# play(response)
# Generating a unique file name for the output MP3 file
save_file_path = f"{uuid.uuid4()}.mp3"
# Writing the audio to a file
with open(save_file_path, "wb") as f:
for chunk in response:
if chunk:
f.write(chunk)
print(f"{save_file_path}: A new audio file was saved successfully!")
# Return the path of the saved audio file
return save_file_path
```
```typescript text_to_speech_file.ts (Typescript)
import { ElevenLabsClient } from "elevenlabs";
import { createWriteStream } from "fs";
import { v4 as uuid } from "uuid";
import * as dotenv from "dotenv";
dotenv.config();
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
const client = new ElevenLabsClient({
apiKey: ELEVENLABS_API_KEY,
});
export const createAudioFileFromText = async (
text: string
): Promise<string> => {
return new Promise(async (resolve, reject) => {
try {
const audio = await client.generate({
voice: "Rachel",
model_id: "eleven_turbo_v2_5",
text,
});
const fileName = `${uuid()}.mp3`;
const fileStream = createWriteStream(fileName);
audio.pipe(fileStream);
fileStream.on("finish", () => resolve(fileName)); // Resolve with the fileName
fileStream.on("error", reject);
} catch (error) {
reject(error);
}
});
};
```
You can then run this function with:
```python Python
text_to_speech_file("Hello World")
```
```typescript TypeScript
await createAudioFileFromText("Hello World");
```
## Convert text to speech (streaming)
If you prefer to stream the audio directly without saving it to a file, you can use our streaming feature.
```python text_to_speech_stream.py (Python)
import os
from typing import IO
from io import BytesIO
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
api_key=ELEVENLABS_API_KEY,
)
def text_to_speech_stream(text: str) -> IO[bytes]:
# Perform the text-to-speech conversion
response = client.text_to_speech.convert(
voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice
output_format="mp3_22050_32",
text=text,
model_id="eleven_multilingual_v2",
voice_settings=VoiceSettings(
stability=0.0,
similarity_boost=1.0,
style=0.0,
use_speaker_boost=True,
),
)
# Create a BytesIO object to hold the audio data in memory
audio_stream = BytesIO()
# Write each chunk of audio data to the stream
for chunk in response:
if chunk:
audio_stream.write(chunk)
# Reset stream position to the beginning
audio_stream.seek(0)
# Return the stream for further use
return audio_stream
```
```typescript text_to_speech_stream.ts (Typescript)
import { ElevenLabsClient } from "elevenlabs";
import * as dotenv from "dotenv";
dotenv.config();
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
if (!ELEVENLABS_API_KEY) {
throw new Error("Missing ELEVENLABS_API_KEY in environment variables");
}
const client = new ElevenLabsClient({
apiKey: ELEVENLABS_API_KEY,
});
export const createAudioStreamFromText = async (
text: string
): Promise<Buffer> => {
const audioStream = await client.generate({
voice: "Rachel",
model_id: "eleven_turbo_v2_5",
text,
});
const chunks: Buffer[] = [];
for await (const chunk of audioStream) {
chunks.push(chunk);
}
const content = Buffer.concat(chunks);
return content;
};
```
You can then run this function with:
```python Python
text_to_speech_stream("This is James")
```
```typescript TypeScript
await createAudioStreamFromText("This is James");
```
## Bonus - Uploading to AWS S3 and getting a secure sharing link
Once your audio data is created as either a file or a stream you might want to share this with your users. One way to do this is to upload it to an AWS S3 bucket and generate a secure sharing link.
To upload the data to S3 you’ll need to add your AWS access key ID, secret access key and AWS region name to your `.env` file. Follow these steps to find the credentials:
1. Log in to your AWS Management Console: Navigate to the AWS home page and sign in with your account.
2. Access the IAM (Identity and Access Management) Dashboard: You can find IAM under "Security, Identity, & Compliance" on the services menu. The IAM dashboard manages access to your AWS services securely.
3. Create a New User (if necessary): On the IAM dashboard, select "Users" and then "Add user". Enter a user name.
4. Set the permissions: attach policies directly to the user according to the access level you wish to grant. For S3 uploads, you can use the AmazonS3FullAccess policy. However, it's best practice to grant least privilege, or the minimal permissions necessary to perform a task. You might want to create a custom policy that specifically allows only the necessary actions on your S3 bucket.
5. Review and create the user: Review your settings and create the user. Upon creation, you'll be presented with an access key ID and a secret access key. Be sure to download and securely save these credentials; the secret access key cannot be retrieved again after this step.
6. Get AWS region name: ex. us-east-1
If you do not have an AWS S3 bucket, you will need to create a new one by following these steps:
1. Access the S3 dashboard: You can find S3 under "Storage" on the services menu.
2. Create a new bucket: On the S3 dashboard, click the "Create bucket" button.
3. Enter a bucket name and click on the "Create bucket" button. You can leave the other bucket options as default. The newly added bucket will appear in the list.
Install `boto3` (for Python) or the AWS SDK S3 packages (for TypeScript) to interact with AWS services:
```bash Python
pip install boto3
```
```bash TypeScript
npm install @aws-sdk/client-s3
npm install @aws-sdk/s3-request-presigner
```
Then add the environment variables to `.env` file like so:
```
AWS_ACCESS_KEY_ID=your_aws_access_key_id_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here
AWS_REGION_NAME=your_aws_region_name_here
AWS_S3_BUCKET_NAME=your_s3_bucket_name_here
```
Add the following functions to upload the audio stream to S3 and generate a signed URL.
```python s3_uploader.py (Python)
import os
import boto3
import uuid
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
AWS_REGION_NAME = os.getenv("AWS_REGION_NAME")
AWS_S3_BUCKET_NAME = os.getenv("AWS_S3_BUCKET_NAME")
session = boto3.Session(
aws_access_key_id=AWS_ACCESS_KEY_ID,
aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
region_name=AWS_REGION_NAME,
)
s3 = session.client("s3")
def generate_presigned_url(s3_file_name: str) -> str:
signed_url = s3.generate_presigned_url(
"get_object",
Params={"Bucket": AWS_S3_BUCKET_NAME, "Key": s3_file_name},
ExpiresIn=3600,
) # URL expires in 1 hour
return signed_url
def upload_audiostream_to_s3(audio_stream) -> str:
s3_file_name = f"{uuid.uuid4()}.mp3" # Generates a unique file name using UUID
s3.upload_fileobj(audio_stream, AWS_S3_BUCKET_NAME, s3_file_name)
return s3_file_name
```
```typescript s3_uploader.ts (TypeScript)
import {
S3Client,
PutObjectCommand,
GetObjectCommand,
} from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";
import { v4 as uuid } from "uuid";
import * as dotenv from "dotenv";
dotenv.config();
const {
AWS_ACCESS_KEY_ID,
AWS_SECRET_ACCESS_KEY,
AWS_REGION_NAME,
AWS_S3_BUCKET_NAME,
} = process.env;
if (
!AWS_ACCESS_KEY_ID ||
!AWS_SECRET_ACCESS_KEY ||
!AWS_REGION_NAME ||
!AWS_S3_BUCKET_NAME
) {
throw new Error(
"One or more environment variables are not set. Please check your .env file."
);
}
const s3 = new S3Client({
credentials: {
accessKeyId: AWS_ACCESS_KEY_ID,
secretAccessKey: AWS_SECRET_ACCESS_KEY,
},
region: AWS_REGION_NAME,
});
export const generatePresignedUrl = async (objectKey: string) => {
const getObjectParams = {
Bucket: AWS_S3_BUCKET_NAME,
Key: objectKey,
};
const command = new GetObjectCommand(getObjectParams);
const url = await getSignedUrl(s3, command, { expiresIn: 3600 });
return url;
};
export const uploadAudioStreamToS3 = async (audioStream: Buffer) => {
const remotePath = `${uuid()}.mp3`;
await s3.send(
new PutObjectCommand({
Bucket: AWS_S3_BUCKET_NAME,
Key: remotePath,
Body: audioStream,
ContentType: "audio/mpeg",
})
);
return remotePath;
};
```
You can then call the upload function with the audio stream generated from the text.
```python Python
s3_file_name = upload_audiostream_to_s3(audio_stream)
```
```typescript TypeScript
const s3path = await uploadAudioStreamToS3(stream);
```
After uploading the audio file to S3, generate a signed URL to share access to the file. This URL will be time-limited, meaning it will expire after a certain period, making it secure for temporary sharing.
You can now generate a URL from a file with:
```python Python
signed_url = generate_presigned_url(s3_file_name)
print(f"Signed URL to access the file: {signed_url}")
```
```typescript TypeScript
const presignedUrl = await generatePresignedUrl(s3path);
console.log("Presigned URL:", presignedUrl);
```
If you want to use the file multiple times, you should store the S3 file path in your database and regenerate the signed URL each time you need it, rather than saving the signed URL directly, as it will expire.
To put it all together, you can use the following script:
```python main.py (Python)
import os
from dotenv import load_dotenv
load_dotenv()
from text_to_speech_stream import text_to_speech_stream
from s3_uploader import upload_audiostream_to_s3, generate_presigned_url
def main():
text = "This is James"
audio_stream = text_to_speech_stream(text)
s3_file_name = upload_audiostream_to_s3(audio_stream)
signed_url = generate_presigned_url(s3_file_name)
print(f"Signed URL to access the file: {signed_url}")
if __name__ == "__main__":
main()
```
```typescript index.ts (Typescript)
import "dotenv/config";
import { createAudioFileFromText } from "./text_to_speech_file";
import { createAudioStreamFromText } from "./text_to_speech_stream";
import { generatePresignedUrl, uploadAudioStreamToS3 } from "./s3_uploader";
(async () => {
// save the audio file to disk
const fileName = await createAudioFileFromText(
"Today, the sky is exceptionally clear, and the sun shines brightly."
);
console.log("File name:", fileName);
// OR stream the audio, upload to S3, and get a presigned URL
const stream = await createAudioStreamFromText(
"Today, the sky is exceptionally clear, and the sun shines brightly."
);
const s3path = await uploadAudioStreamToS3(stream);
const presignedUrl = await generatePresignedUrl(s3path);
console.log("Presigned URL:", presignedUrl);
})();
```
## Conclusion
You now know how to convert text into speech and generate a signed URL to share the audio file. This functionality opens up numerous opportunities for creating and sharing content dynamically.
Here are some examples of what you could build with this.
1. **Educational Podcasts**: Create personalized educational content that can be accessed by students on demand. Teachers can convert their lessons into audio format, upload them to S3, and share the links with students for a more engaging learning experience outside the traditional classroom setting.
2. **Accessibility Features for Websites**: Enhance website accessibility by offering text content in audio format. This can make information on websites more accessible to individuals with visual impairments or those who prefer auditory learning.
3. **Automated Customer Support Messages**: Produce automated and personalized audio messages for customer support, such as FAQs or order updates. This can provide a more engaging customer experience compared to traditional text emails.
4. **Audio Books and Narration**: Convert entire books or short stories into audio format, offering a new way for audiences to enjoy literature. Authors and publishers can diversify their content offerings and reach audiences who prefer listening over reading.
5. **Language Learning Tools**: Develop language learning aids that provide learners with audio lessons and exercises. This makes it possible to practice pronunciation and listening skills in a targeted way.
For more details, see the full project files, which provide a clear structure for setting up your application:
For Python: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/python)
For TypeScript: [example repo](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/text-to-speech/node)
If you have any questions, please create an issue on the [elevenlabs-docs GitHub](https://github.com/elevenlabs/elevenlabs-docs/issues).
# How to use text-to-speech with websocket streaming in Python or Node.js
How to convert text to speech via websocket and save to mp3
Websocket streaming is a method of sending and receiving data over a single, long-lived connection. This method is useful for real-time applications where you need to stream audio data as it becomes available.
If you want to quickly test out the latency (time to first byte) of a websocket connection to the ElevenLabs text-to-speech API, you can install `elevenlabs-latency` via `npm` and follow the instructions [here](https://www.npmjs.com/package/elevenlabs-latency?activeTab=readme).
## Requirements
* An ElevenLabs account with an API key (here’s how to [find your API key](/api-reference/text-to-speech#authentication)).
* Python or Node.js/Typescript installed on your machine
## Setup
Install dotenv package to manage your environmental variables:
```bash Python
pip install python-dotenv
pip install websockets
```
```bash TypeScript
npm install dotenv
npm install @types/dotenv --save-dev
npm install ws
```
Next, create a `.env` file in your project directory and fill it with your credentials like so:
```bash .env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
```
Last, create a new file to write the code in. You can name it `text-to-speech-websocket.py` for Python or `text-to-speech-websocket.ts` for Typescript.
## Initiate the websocket connection
Pick a voice from the voice library and a text-to-speech model, then initiate a websocket connection to the text-to-speech API.
```python text-to-speech-websocket.py (Python)
import os
from dotenv import load_dotenv
import websockets

# Load the API key from the .env file
load_dotenv()
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")

voice_id = 'kmSVBPu7loj4ayNinwWM'
model_id = 'eleven_turbo_v2'

async def text_to_speech_ws_streaming(voice_id, model_id):
    uri = f"wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?model_id={model_id}"
    async with websockets.connect(uri) as websocket:
        ...
```
```typescript text-to-speech-websocket.ts (Typescript)
import * as dotenv from "dotenv";
// @ts-ignore
import WebSocket from "ws";
// Load the API key from the .env file
dotenv.config();
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY;
const voiceId = "Xb7hH8MSUJpSbSDYk0k2";
const model = "eleven_turbo_v2";
const uri = `wss://api.elevenlabs.io/v1/text-to-speech/${voiceId}/stream-input?model_id=${model}`;
const websocket = new WebSocket(uri, {
  headers: { "xi-api-key": `${ELEVENLABS_API_KEY}` },
});
```
For TypeScript, create a write stream ahead of time for saving the audio to an mp3 file; it will be passed to the websocket listener.
```typescript text-to-speech-websocket.ts (Typescript)
import * as fs from "node:fs";
const outputDir = "./output";
try {
  fs.accessSync(outputDir, fs.constants.R_OK | fs.constants.W_OK);
} catch (err) {
  fs.mkdirSync(outputDir);
}
const writeStream = fs.createWriteStream(outputDir + "/test.mp3", {
  flags: "a",
});
```
## Send the input text
Once the websocket connection is open, set up the voice settings first, then send the text message to the API.
```python text-to-speech-websocket.py (Python)
import json

async def text_to_speech_ws_streaming(voice_id, model_id):
    async with websockets.connect(uri) as websocket:
        await websocket.send(json.dumps({
            "text": " ",
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.8, "use_speaker_boost": False},
            "generation_config": {
                "chunk_length_schedule": [120, 160, 250, 290]
            },
            "xi_api_key": ELEVENLABS_API_KEY,
        }))

        text = "The twilight sun cast its warm golden hues upon the vast rolling fields, saturating the landscape with an ethereal glow. Silently, the meandering brook continued its ceaseless journey, whispering secrets only the trees seemed privy to."
        await websocket.send(json.dumps({"text": text}))

        # Send empty string to indicate the end of the text sequence which will close the websocket connection
        await websocket.send(json.dumps({"text": ""}))
```
```typescript text-to-speech-websocket.ts (Typescript)
const text =
  "The twilight sun cast its warm golden hues upon the vast rolling fields, saturating the landscape with an ethereal glow. ";

websocket.on("open", async () => {
  websocket.send(
    JSON.stringify({
      text: " ",
      voice_settings: {
        stability: 0.5,
        similarity_boost: 0.8,
        use_speaker_boost: false,
      },
      generation_config: { chunk_length_schedule: [120, 160, 250, 290] },
    })
  );

  websocket.send(JSON.stringify({ text: text }));

  // Send empty string to indicate the end of the text sequence which will close the websocket connection
  websocket.send(JSON.stringify({ text: "" }));
});
```
## Save the audio to file
Read the incoming message from the websocket connection and write the audio chunks to a local file.
```python text-to-speech-websocket.py (Python)
import asyncio
import base64

async def write_to_local(audio_stream):
    """Write the audio encoded in base64 string to a local mp3 file."""
    with open(f'./output/test.mp3', "wb") as f:
        async for chunk in audio_stream:
            if chunk:
                f.write(chunk)

async def listen(websocket):
    """Listen to the websocket for audio data and stream it."""
    while True:
        try:
            message = await websocket.recv()
            data = json.loads(message)
            if data.get("audio"):
                yield base64.b64decode(data["audio"])
            elif data.get('isFinal'):
                break
        except websockets.exceptions.ConnectionClosed:
            print("Connection closed")
            break

async def text_to_speech_ws_streaming(voice_id, model_id):
    async with websockets.connect(uri) as websocket:
        ...
        # Add listen task to submit the audio chunks to the write_to_local function
        listen_task = asyncio.create_task(write_to_local(listen(websocket)))

        await listen_task

asyncio.run(text_to_speech_ws_streaming(voice_id, model_id))
```
```typescript text-to-speech-websocket.ts (Typescript)
// Helper function to write the audio encoded in base64 string into local file
function writeToLocal(base64str: any, writeStream: fs.WriteStream) {
  const audioBuffer: Buffer = Buffer.from(base64str, "base64");

  writeStream.write(audioBuffer, (err) => {
    if (err) {
      console.error("Error writing to file:", err);
    }
  });
}

// Listen to the incoming message from the websocket connection
websocket.on("message", function incoming(event) {
  const data = JSON.parse(event.toString());
  if (data["audio"]) {
    writeToLocal(data["audio"], writeStream);
  }
});

// Close the writeStream when the websocket connection closes
websocket.on("close", () => {
  writeStream.end();
});
```
## Run the script
You can run the script by executing the following command in your terminal. An mp3 audio file will be saved in the `output` directory.
```bash Python
python text-to-speech-websocket.py
```
```bash TypeScript
tsx text-to-speech-websocket.ts
```
## Understanding buffering
A key concept to understand when using websockets is buffering. The API only runs model generations when a certain amount of text above a threshold has been sent. This is to optimize the quality of the generated audio by maximising the amount of context available to the model while balancing latency.
For example, if the threshold is set to 120 characters and you send 'Hello, how are you?', the audio won't be generated immediately. This is because the sent message has only 19 characters which is below the threshold. However, if you keep sending text, the API will generate audio once the total text sent since the last generation has at least 120 characters.
In the case that you want to force the immediate return of the audio, you can use `flush=true` to clear out the buffer and force generate any buffered text. This can be useful, for example, when you have reached the end of a document and want to generate audio for the final section.
In addition, closing the websocket will automatically force generate any buffered text.
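For example, continuing the Python websocket example above, a minimal sketch of forcing generation with `flush` might look like this (placed inside the same `async with` block):
```python
# Sketch: set "flush" on a message to force any buffered text to be
# generated immediately, e.g. at the end of a document or conversation turn.
await websocket.send(json.dumps({
    "text": "This is the last sentence of the document. ",
    "flush": True,
}))
```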
## Best practice
* We suggest using the default setting for `chunk_length_schedule` in `generation_config`. Avoid using `try_trigger_generation` as it is deprecated.
* When developing a real-time conversational AI application, we advise using `flush=true` along with the text at the end of conversation turn to ensure timely audio generation.
* If the default setting doesn't provide optimal latency for your use case, you can modify the `chunk_length_schedule`. However, be mindful that reducing latency through this adjustment may come at the expense of quality.
## Tips
* The API maintains an internal buffer so that it only runs model generations when a certain amount of text above a threshold has been sent. For short texts with a character length smaller than the value set in `chunk_length_schedule`, you can use `flush=true` to clear out the buffer and force generate any buffered text.
* The websocket connection will automatically close after 20 seconds of inactivity. To keep the connection open, you can send a single space character `" "` (see the keep-alive sketch after this list). Please note that this string must include a space, as sending a fully empty string, `""`, will close the websocket.
* Send an empty string to close the websocket connection after sending the last text message.
* You can use `alignment` to get the word-level timestamps for each word in the text. This can be useful for aligning the audio with the text in a video or for other applications that require precise timing.
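Below is a minimal keep-alive sketch for the Python example above. The 15-second interval is an arbitrary choice; it only needs to be shorter than the 20-second inactivity timeout.
```python
import asyncio
import json

async def keep_alive(websocket, interval_seconds: float = 15.0):
    """Periodically send a single space so the connection stays open while idle."""
    while True:
        await asyncio.sleep(interval_seconds)
        # A single space keeps the connection alive; an empty string would close it.
        await websocket.send(json.dumps({"text": " "}))
```
You would typically start this with `asyncio.create_task(keep_alive(websocket))` while waiting for more text, and cancel the task before sending the final empty-string message.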
# How to send an AI message through a phone call using Twilio and ElevenLabs in Node.js
In this guide, you’ll learn how to send an AI generated message through a phone call using Twilio and ElevenLabs. This process allows you to send high-quality voice messages directly to your callers.
## Create accounts with Twilio and ngrok
We’ll be using Twilio and ngrok for this guide, so go ahead and create accounts with them.
* [twilio.com](https://www.twilio.com)
* [ngrok.com](https://ngrok.com)
## Get the code
If you want to get started quickly, you can get the entire code for this guide on [GitHub](https://github.com/elevenlabs/elevenlabs-examples/tree/main/examples/twilio/call)
## Create the server with Express
### Initialize your project
Create a new folder for your project
```
mkdir elevenlabs-twilio
cd elevenlabs-twilio
npm init -y
```
### Install dependencies
```
npm install elevenlabs express express-ws twilio
```
### Install dev dependencies
```
npm i @types/node @types/express @types/express-ws @types/ws dotenv tsx typescript
```
### Create your files
```ts
// src/app.ts
import 'dotenv/config';
import express, { Response } from 'express';
import ExpressWs from 'express-ws';
import VoiceResponse from 'twilio/lib/twiml/VoiceResponse';
import { ElevenLabsClient } from 'elevenlabs';
import { type WebSocket } from 'ws';
import { Readable } from 'stream';

const app = ExpressWs(express()).app;
const PORT: number = parseInt(process.env.PORT || '5000');

const elevenlabs = new ElevenLabsClient();
const voiceId = '21m00Tcm4TlvDq8ikWAM';
const outputFormat = 'ulaw_8000';
const text = 'This is a test. You can now hang up. Thank you.';

function startApp() {
  app.post('/call/incoming', (_, res: Response) => {
    const twiml = new VoiceResponse();

    twiml.connect().stream({
      url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
    });

    res.writeHead(200, { 'Content-Type': 'text/xml' });
    res.end(twiml.toString());
  });

  app.ws('/call/connection', (ws: WebSocket) => {
    ws.on('message', async (data: string) => {
      const message: {
        event: string;
        start?: { streamSid: string; callSid: string };
      } = JSON.parse(data);

      if (message.event === 'start' && message.start) {
        const streamSid = message.start.streamSid;
        const response = await elevenlabs.textToSpeech.convert(voiceId, {
          model_id: 'eleven_turbo_v2_5',
          output_format: outputFormat,
          text,
        });

        const readableStream = Readable.from(response);
        const audioArrayBuffer = await streamToArrayBuffer(readableStream);

        ws.send(
          JSON.stringify({
            streamSid,
            event: 'media',
            media: {
              payload: Buffer.from(audioArrayBuffer as any).toString('base64'),
            },
          }),
        );
      }
    });

    ws.on('error', console.error);
  });

  app.listen(PORT, () => {
    console.log(`Local: http://localhost:${PORT}`);
    console.log(`Remote: https://${process.env.SERVER_DOMAIN}`);
  });
}

function streamToArrayBuffer(readableStream: Readable) {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];

    readableStream.on('data', (chunk) => {
      chunks.push(chunk);
    });

    readableStream.on('end', () => {
      resolve(Buffer.concat(chunks).buffer);
    });

    readableStream.on('error', reject);
  });
}

startApp();
```
```env
# .env
SERVER_DOMAIN=
ELEVENLABS_API_KEY=
```
## Understanding the code
### Handling the incoming call
When you call your number, Twilio makes a POST request to your endpoint at `/call/incoming`.
We then use twiml.connect to tell Twilio that we want to handle the call via our websocket by setting the url to our `/call/connection` endpoint.
```ts
function startApp() {
  app.post('/call/incoming', (_, res: Response) => {
    const twiml = new VoiceResponse();

    twiml.connect().stream({
      url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
    });

    res.writeHead(200, { 'Content-Type': 'text/xml' });
    res.end(twiml.toString());
  });
```
### Creating the text to speech
Here we listen for messages that Twilio sends to our websocket endpoint. When we receive a `start` message event, we generate audio using the ElevenLabs [TypeScript SDK](https://github.com/elevenlabs/elevenlabs-js).
```ts
app.ws('/call/connection', (ws: WebSocket) => {
  ws.on('message', async (data: string) => {
    const message: {
      event: string;
      start?: { streamSid: string; callSid: string };
    } = JSON.parse(data);

    if (message.event === 'start' && message.start) {
      const streamSid = message.start.streamSid;
      const response = await elevenlabs.textToSpeech.convert(voiceId, {
        model_id: 'eleven_turbo_v2_5',
        output_format: outputFormat,
        text,
      });
```
### Sending the message
Upon receiving the audio back from ElevenLabs, we convert it to an array buffer and send the audio to Twilio via the websocket.
```ts
const readableStream = Readable.from(response);
const audioArrayBuffer = await streamToArrayBuffer(readableStream);

ws.send(
  JSON.stringify({
    streamSid,
    event: 'media',
    media: {
      payload: Buffer.from(audioArrayBuffer as any).toString('base64'),
    },
  }),
);
```
## Point ngrok to your application
Twilio requires a publicly accessible URL. We’ll use ngrok to forward the local port of our application and expose it as a public URL.
Run the following command in your terminal:
```
ngrok http 5000
```
Copy the ngrok domain (without `https://`) to use in your environment variables.
## Update your environment variables
Update the `.env` file with your ngrok domain and ElevenLabs API key.
```
# .env
SERVER_DOMAIN=*******.ngrok.app
ELEVENLABS_API_KEY=*************************
```
## Start the application
Run the following command to start the app:
```
npm run dev
```
## Set up Twilio
Follow Twilio’s guides to create a new number. Once you’ve created your number, navigate to the “Configure” tab in Phone Numbers -> Manage -> Active numbers.
In the “A call comes in” section, enter the full URL to your application (make sure to add the `/call/incoming` path), e.g. `https://*******.ngrok.app/call/incoming`
## Make a phone call
Make a call to your number. You should hear a message using the ElevenLabs voice.
## Tips for deploying to production
When running the application in production, make sure to set the `SERVER_DOMAIN` environment variable to that of your server. Be sure to also update the URL in Twilio to point to your production server.
## Conclusion
You should now have a basic understanding of integrating Twilio with ElevenLabs voices. If you have any further questions, or suggestions on how to improve this blog post, please feel free to select the “Suggest edits” or “Raise issue” button below.
# Models
ElevenLabs is the leading provider of AI-powered audio technology. This guide helps developers choose the right model for their use case.
* **Eleven Multilingual v2**: our most advanced speech synthesis model, offering the highest realism and emotional range. Best for voiceovers, audiobooks, and content creation.
* **Eleven Turbo v2.5**: a high quality, low latency Turbo model with ultra-low latency (~300ms). Ideal for real-time, multi-language AI.
## Flagship Models
These are our current flagship models and the recommended starting point for new projects.
| Model ID | Description | Max Characters | Languages | Best For |
| ---------------------------- | ------------------------------------------ | -------------- | ------------ | ----------------------------------------------------------- |
| `eleven_multilingual_v2` | Most life-like, emotionally rich model | 10,000 | 29 languages | Voice overs, audiobooks, content creation |
| `eleven_turbo_v2_5` | High quality, low-latency model | 40,000 | 32 languages | Developer use cases requiring speed and multiple languages |
| `eleven_turbo_v2` | High quality, low-latency model | 30,000 | English only | Developer use cases requiring speed (English only) |
| `eleven_english_sts_v2` | State-of-the-art speech-to-speech | 5,000 | English only | Maximum control over content and prosody |
| `eleven_multilingual_sts_v2` | Cutting-edge multilingual speech-to-speech | 10,000 | 29 languages | Advanced multilingual speech synthesis with prosody control |
These models are maintained for backward compatibility but are not recommended for new projects.
| Model ID | Description | Max Characters | Languages | Best For |
| ------------------------ | -------------------------- | -------------- | ------------ | ------------------------------------------- |
| `eleven_monolingual_v1` | First generation TTS model | 10,000 | English only | Legacy model (outclassed by newer versions) |
| `eleven_multilingual_v1` | First multilingual model | 10,000 | 9 languages | Legacy model (outclassed by newer versions) |
## Model Selection Guide
Choose your model based on these primary considerations:
* *Quality over Speed?* Use Standard Multilingual models
* *Need real-time?* Use Turbo models
* *English only?* → Consider `eleven_turbo_v2`
* *Multiple languages?* → Use `eleven_multilingual_v2` or `eleven_turbo_v2_5`
* *Content Creation* → `eleven_multilingual_v2`
* *Conversational AI* → `eleven_turbo_v2_5`
* *Professional Voice Clones* → Either model
* *Speech to Speech?* → `eleven_english_sts_v2` or `eleven_multilingual_sts_v2` family
For detailed language support information and troubleshooting guidance, refer
to our [help documentation](https://help.elevenlabs.io).
# Quickstart
Start generating your first text-to-speech using Python and ElevenLabs API
## Accessing the API
With all plans, including the free plan, you gain full access to the ElevenLabs API, enabling you to generate speech programmatically across our entire range of features.
Both the API and the website draw from a shared quota pool. Usage on either platform will affect your total available usage quota.
## Authentication
All requests to the ElevenLabs API must include an `xi-api-key` header with your API key. If using Client SDKs, this header is set automatically; otherwise, include it manually in each request. API keys can be restricted to specific routes, disabled or deleted as needed.
To get your API key, [create an account](https://elevenlabs.io/sign-up), log in, and click on **"API Keys"** in the bottom left corner of the console.
Your API key is a secret. Do not share it or expose it in client-side code (browsers, apps). Load it securely from an environment variable or key management service on your backend server.
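For example, a quick way to check that your key is being sent correctly is to call the `GET /v1/user` endpoint with the `xi-api-key` header. A minimal sketch using the `requests` library:
```python
import os
import requests

# Load the key from an environment variable rather than hard-coding it
XI_API_KEY = os.environ["ELEVENLABS_API_KEY"]

response = requests.get(
    "https://api.elevenlabs.io/v1/user",
    headers={"xi-api-key": XI_API_KEY},
)

# A 200 response means the key was accepted; 401 means it is missing or invalid.
print(response.status_code)
print(response.json())
```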
## Fetching the voice\_id
The [Text-To-Speech](https://elevenlabs.io/docs/api-reference/text-to-speech) (TTS) endpoint transforms text into speech in a given voice. The input consists of text, voice, and voice settings with an option to specify model.
Before we can generate anything, we need to get the `voice_id` for the voice we want to use.
The easiest way to get the `voice_id` is via the website. You can find the article on how to do that [here](https://help.elevenlabs.io/hc/en-us/articles/14599760033937-How-do-I-find-my-voice's-ID-of-my-voices-via-the-website-and-through-the-API-).
A better approach, especially when dealing with the API, is to retrieve the voices accessible to your account using the `GET /v1/voices` endpoint. If you do not receive the expected voices, you will either encounter an error or see only the pre-made voices, which most likely indicates that you are not passing your API key correctly.
You can find more information in the [Get Voices](https://elevenlabs.io/docs/api-reference/get-voices) endpoint documentation. This endpoint is crucial if your application requires any form of flexibility in the voices you want to use via the API, as it provides information about each voice, including the `voice_id` for each of them. This `voice_id` is necessary when querying the TTS endpoint.
The API offers a wide range of capabilities, such as adding new voices, but we won't discuss that here. To gain a better understanding of how to do that, you will need to read through the documentation.
You can use the Python code below to print a list of the voices in your account, along with their names and associated voice\_ids, which will be required to use the voices via the API. Simply using the name won't be sufficient.
```python
# The 'requests' and 'json' libraries are imported.
# 'requests' is used to send HTTP requests, while 'json' is used for parsing the JSON data that we receive from the API.
import requests
import json
# An API key is defined here. You'd normally get this from the service you're accessing. It's a form of authentication.
XI_API_KEY = ""
# This is the URL for the API endpoint we'll be making a GET request to.
url = "https://api.elevenlabs.io/v1/voices"
# Here, headers for the HTTP request are being set up.
# Headers provide metadata about the request. In this case, we're specifying the content type and including our API key for authentication.
headers = {
"Accept": "application/json",
"xi-api-key": XI_API_KEY,
"Content-Type": "application/json"
}
# A GET request is sent to the API endpoint. The URL and the headers are passed into the request.
response = requests.get(url, headers=headers)
# The JSON response from the API is parsed using the built-in .json() method from the 'requests' library.
# This transforms the JSON data into a Python dictionary for further processing.
data = response.json()
# A loop is created to iterate over each 'voice' in the 'voices' list from the parsed data.
# The 'voices' list consists of dictionaries, each representing a unique voice provided by the API.
for voice in data['voices']:
# For each 'voice', the 'name' and 'voice_id' are printed out.
# These keys in the voice dictionary contain values that provide information about the specific voice.
print(f"{voice['name']}; {voice['voice_id']}")
```
## Text-to-speech
Once you've gotten the `voice_id` from the voices endpoint, you are ready to query the [Text-to-speech](https://elevenlabs.io/docs/api-reference/text-to-speech) endpoint.
First, you need to decide whether you want to use the streaming response or not. In general, we recommend streaming unless your client doesn't support it. You can find more information about the [streaming](https://elevenlabs.io/docs/api-reference/streaming) endpoint in our documentation.
Second, you need to choose your voice settings. We recommend using the default voice settings before experimenting with the expressivity and similarity settings. You can find more information about the [settings](https://elevenlabs.io/docs/speech-synthesis/voice-settings) in our documentation.
After generating something, we store each audio you generate, together with other metadata, in your personal history. You can retrieve all your history items via the `/v1/history` endpoint. We also provide endpoints for retrieving audio, deleting, and downloading history items. You can find more information about how to use the [history](https://elevenlabs.io/docs/api-reference/get-generated-items) endpoint in our documentation. The only exception is websocket streaming requests, which are not saved, in order to reduce latency.
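For example, a minimal sketch of listing your most recent history items with `requests` (the `history` and `history_item_id` field names follow the history endpoint's response; adjust if your response differs):
```python
import os
import requests

XI_API_KEY = os.environ["ELEVENLABS_API_KEY"]

response = requests.get(
    "https://api.elevenlabs.io/v1/history",
    headers={"xi-api-key": XI_API_KEY},
)
data = response.json()

# Each history item carries an ID you can use to download or delete the audio later.
for item in data["history"]:
    print(item["history_item_id"], item.get("text", "")[:60])
```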
In the basic Python example below, we send a request to the text-to-speech endpoint and receive a stream of audio bytes in return. All you need to do is set your XI API Key first and try it out! If you cannot find it, please follow this article [here](https://help.elevenlabs.io/hc/en-us/articles/14599447207697-How-to-authorize-yourself-using-your-xi-api-key-).
You will need to replace the constants by inserting your `XI_API_KEY`, setting the correct `VOICE_ID`, and entering the `TEXT_TO_SPEAK` you want the AI to convert to speech. You can change the `stability` and `similarity_boost` settings if you want, but they are set to good default values.
Of course, the code below is only to showcase the very bare minimum to get the system up and running, and you can then use more advanced code to create a nice GUI with sliders for the variables, a list for the voices, and anything else that you want to add.
```python
# Import necessary libraries
import requests # Used for making HTTP requests
import json # Used for working with JSON data
# Define constants for the script
CHUNK_SIZE = 1024 # Size of chunks to read/write at a time
XI_API_KEY = "" # Your API key for authentication
VOICE_ID = "" # ID of the voice model to use
TEXT_TO_SPEAK = "" # Text you want to convert to speech
OUTPUT_PATH = "output.mp3" # Path to save the output audio file
# Construct the URL for the Text-to-Speech API request
tts_url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"
# Set up headers for the API request, including the API key for authentication
headers = {
"Accept": "application/json",
"xi-api-key": XI_API_KEY
}
# Set up the data payload for the API request, including the text and voice settings
data = {
"text": TEXT_TO_SPEAK,
"model_id": "eleven_multilingual_v2",
"voice_settings": {
"stability": 0.5,
"similarity_boost": 0.8,
"style": 0.0,
"use_speaker_boost": True
}
}
# Make the POST request to the TTS API with headers and data, enabling streaming response
response = requests.post(tts_url, headers=headers, json=data, stream=True)
# Check if the request was successful
if response.ok:
# Open the output file in write-binary mode
with open(OUTPUT_PATH, "wb") as f:
# Read the response in chunks and write to the file
for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
f.write(chunk)
# Inform the user of success
print("Audio stream saved successfully.")
else:
# Print the error message if the request was not successful
print(response.text)
```
## Speech-to-Speech
Generating speech-to-speech involves a similar process to text-to-speech, but with some adjustments in the API parameters. Instead of providing text when calling the API, you provide the path to an audio file that you would like to convert from one voice to another. Here's a modified version of the text-to-speech code above illustrating how to generate speech-to-speech using the API:
```python
# Import necessary libraries
import requests # Used for making HTTP requests
import json # Used for working with JSON data
# Define constants for the script
CHUNK_SIZE = 1024 # Size of chunks to read/write at a time
XI_API_KEY = "" # Your API key for authentication
VOICE_ID = "" # ID of the voice model to use
AUDIO_FILE_PATH = "" # Path to the input audio file
OUTPUT_PATH = "output.mp3" # Path to save the output audio file
# Construct the URL for the Speech-to-Speech API request
sts_url = f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}/stream"
# Set up headers for the API request, including the API key for authentication
headers = {
"Accept": "application/json",
"xi-api-key": XI_API_KEY
}
# Set up the data payload for the API request, including model ID and voice settings
# Note: voice settings are converted to a JSON string
data = {
"model_id": "eleven_english_sts_v2",
"voice_settings": json.dumps({
"stability": 0.5,
"similarity_boost": 0.8,
"style": 0.0,
"use_speaker_boost": True
})
}
# Set up the files to send with the request, including the input audio file
files = {
"audio": open(AUDIO_FILE_PATH, "rb")
}
# Make the POST request to the STS API with headers, data, and files, enabling streaming response
response = requests.post(sts_url, headers=headers, data=data, files=files, stream=True)
# Check if the request was successful
if response.ok:
# Open the output file in write-binary mode
with open(OUTPUT_PATH, "wb") as f:
# Read the response in chunks and write to the file
for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
f.write(chunk)
# Inform the user of success
print("Audio stream saved successfully.")
else:
# Print the error message if the request was not successful
print(response.text)
```
This code takes an input audio file (`AUDIO_FILE_PATH` should be something like `C:/User/Documents/input.mp3`), sends a request to the speech-to-speech endpoint, and receives a stream of audio bytes in return. It then saves the streamed audio to the specified output path (by default `output.mp3`, saved in the same folder as the .py file).
Make sure to replace placeholders like `XI_API_KEY` and `VOICE_ID` with your actual API key and voice ID respectively. Additionally, adjust paths and settings as needed.
# Reducing Latency
Seven methods for reducing streaming latency, in order of highest to lowest effectiveness:
***
## 1. Use the [Turbo v2.5 model](https://elevenlabs.io/api)
Our cutting-edge Eleven Turbo v2.5 is ideally suited for tasks demanding extremely low latency. The new turbo model\_id is `eleven_turbo_v2_5`.
## 2. Use the [streaming API](/api-reference/streaming)
ElevenLabs provides three text-to-speech endpoints:
1. A **regular** text-to-speech endpoint
2. A **streaming** text-to-speech endpoint
3. A **websockets** text-to-speech endpoint
The regular endpoint renders the audio file before returning it in the response. The streaming endpoint streams back the audio as it is being generated, resulting in much lower response time from request to first byte of audio received. For applications that require low latency, the streaming endpoint is therefore recommended.
## 3. Use the [input streaming Websocket](/api-reference/websockets)
For applications where the text prompts can be streamed to the text-to-speech endpoints (such as LLM output), the websocket allows prompts to be fed to the endpoint while the speech is being generated. You can also configure the streaming chunk size when using the websocket, with smaller chunks generally rendering faster. We therefore recommend sending content word by word; our model and tooling leverage context to ensure that sentence structure and more are preserved in the generated audio, even if we only receive one word at a time.
## 4. Update to the [Enterprise plan](https://elevenlabs.io/enterprise)
Enterprise customers receive top priority in the rendering queue, which ensures that they always experience the lowest possible latency, regardless of model usage load.
## 5. Use Default Voices, Synthetic Voices, & IVCs rather than PVCs
Based on our testing, we've observed that default voices (formerly called 'premade' voices), synthetic voices, and Instant Voice Clones tend to have lower latency compared to Professional Voice Clones. A Professional Voice Clone allows for a much more accurate clone of the original samples, albeit with a slight increase in latency at present. Unfortunately, it is a slight tradeoff and we're currently working on optimizing the latency of the Professional Voice Clones for Turbo v2.5.
## 6. Reuse HTTPS Sessions When Streaming
When streaming over HTTPS, reusing an established SSL/TLS session helps reduce latency by skipping the handshake process. This improves latency for all requests after the session’s first.
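For example, with the Python `requests` library, a minimal sketch of this is to create one `requests.Session()` and issue every request through it; later requests then reuse the same underlying TLS connection:
```python
import os
import requests

XI_API_KEY = os.environ["ELEVENLABS_API_KEY"]
VOICE_ID = "your_voice_id_here"  # replace with a voice ID from /v1/voices

# One session, reused for every request, keeps the TLS connection alive.
session = requests.Session()
session.headers.update({"xi-api-key": XI_API_KEY})

def stream_tts(text: str) -> bytes:
    response = session.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream",
        json={"text": text, "model_id": "eleven_turbo_v2_5"},
        stream=True,
    )
    return b"".join(response.iter_content(chunk_size=1024))

# The first call pays the handshake cost; subsequent calls skip it.
audio_one = stream_tts("First request.")
audio_two = stream_tts("Second request, typically faster to first byte.")
```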
### 6a. Limit the Number of Websocket Connection Closures
Similarly, websockets use the WSS protocol, so an SSL/TLS handshake takes place at the beginning of each connection, which adds overhead. As such, we recommend limiting the number of times a connection is closed and reopened to the extent possible.
## 7. Leverage Servers Closer to the US
Today, our APIs are served from the US, and as such users may experience latency from increased network routing when communicating with these APIs outside of the United States.
# Specifying Server Location
How to control the server location for your requests
When using ElevenLabs API you have two options for selecting the server location:
## Using the closest servers (default)
Requests to `api.elevenlabs.io` are automatically routed to the closest available server location which minimizes network latency. We recommend using this option for most users.
## Manually specifying the server location
If needed, you can select a specific server location for your requests.
#### When using the client library
Provide the `environment` option when constructing the client:
```python Python
import elevenlabs
client = elevenlabs.ElevenLabs(
api_key="YOUR_API_KEY",
# Use the US servers only.
environment=elevenlabs.ElevenLabsEnvironment.PRODUCTION_US
)
```
```typescript TypeScript
import { ElevenLabsClient, ElevenLabsEnvironment } from "elevenlabs";
const client = new ElevenLabsClient({
apiKey: "YOUR_API_KEY",
// Use the US servers only.
environment: ElevenLabsEnvironment.ProductionUs,
});
```
#### When using direct API calls
Replace the `api.elevenlabs.io` hostname in your requests with one of the following values, as shown in the sketch below:
* `api.us.elevenlabs.io`: Uses servers located in the US.
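For example, assuming your plan has access to the US region, a minimal sketch of a direct API call simply swaps the hostname while keeping everything else identical:
```python
import os
import requests

XI_API_KEY = os.environ["ELEVENLABS_API_KEY"]
VOICE_ID = "your_voice_id_here"  # replace with a voice ID from /v1/voices

# Identical to a request against api.elevenlabs.io; only the hostname changes.
response = requests.post(
    f"https://api.us.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": XI_API_KEY},
    json={"text": "Hello from the US servers.", "model_id": "eleven_multilingual_v2"},
)

with open("output.mp3", "wb") as f:
    f.write(response.content)
```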
# How to set up an ElevenLabs audio player for your articles in React (Next.js, Vite)
Here's a guide on how you can use [Audio Native](/product/audio-native/overview) in your React projects. I'll be using Next.js, but this process will work for any React project.
Audio Native is an embedded audio player designed to vocalize the content of web pages through ElevenLabs' Text to Speech technology, as shown below.
## Activate Audio Native within ElevenLabs
First you need to make sure that you activate Audio Native in your ElevenLabs account. After signing in, visit: [https://elevenlabs.io/audio-native](https://elevenlabs.io/audio-native)
Select “Click to start with Audio Native” to begin the setup. Don't worry about completing all the sections in the setup, as we just need to get to the code snippet at the end.
Once you've gone through the setup, you'll be presented with a code snippet. This snippet is used to embed Audio Native on a standard website such as WordPress, Ghost, or Webflow. However, you can't use it directly in React.
## Creating the Audio Native React component
Here's a handy component that you can reuse across your project:
```tsx
// ElevenLabsAudioNative.tsx

'use client';

import { useEffect } from 'react';

export type ElevenLabsProps = {
  publicUserId: string;
  textColorRgba?: string;
  backgroundColorRgba?: string;
  size?: 'small' | 'large';
  children?: React.ReactNode;
};

export const ElevenLabsAudioNative = ({
  publicUserId,
  size,
  textColorRgba,
  backgroundColorRgba,
  children,
}: ElevenLabsProps) => {
  useEffect(() => {
    const script = document.createElement('script');

    script.src = 'https://elevenlabs.io/player/audioNativeHelper.js';
    script.async = true;
    document.body.appendChild(script);

    return () => {
      document.body.removeChild(script);
    };
  }, []);

  return (
    <div
      id="elevenlabs-audionative-widget"
      data-height={size === 'small' ? '90' : '120'}
      data-width="100%"
      data-frameborder="no"
      data-scrolling="no"
      data-publicuserid={publicUserId}
      data-playerurl="https://elevenlabs.io/player/index.html"
      data-small={size === 'small' ? 'True' : 'False'}
      data-textcolor={textColorRgba ?? 'rgba(0, 0, 0, 1.0)'}
      data-backgroundcolor={backgroundColorRgba ?? 'rgba(255, 255, 255, 1.0)'}
    >
      {children ? children : 'Elevenlabs AudioNative Player'}
    </div>
  );
};

export default ElevenLabsAudioNative;
```
Here's a link to the component on GitHub - [ElevenLabsAudioNative.tsx](https://github.com/elevenlabs/elevenlabs-examples/blob/main/examples/audio-native/react/ElevenLabsAudioNative.tsx)
```tsx
'use client';
import { useEffect } from 'react';
```
We add the `use client` directive at the top of the file. This is mainly for Next.js, as we are using `useEffect` which can only be used in client side components.
```tsx
export type ElevenLabsProps = {
publicUserId: string;
textColorRgba?: string;
backgroundColorRgba?: string;
size?: "small" | "large";
children?: React.ReactNode;
};
```
Helpful type for the props so that we can specify the public user ID (described later), customize colors and size, and set a default content if the player hasn't loaded. You can ignore this if you're not using TypeScript (TypeScript is great however!).
```tsx
useEffect(() => {
  const script = document.createElement("script");

  script.src = "https://elevenlabs.io/player/audioNativeHelper.js";
  script.async = true;
  document.body.appendChild(script);

  return () => {
    document.body.removeChild(script);
  };
}, []);
```
In order to load the Audio Native player, we use the useEffect hook to dynamically append a script tag to the body and set the source to the URL of the Audio Native helper script.
When the component is dismounted, we make sure to remove the script tag from the body. This ensures it doesn't get loaded twice if we remount the component.
```tsx
{children ? children : "Elevenlabs AudioNative Player"}
```
This is the content rendered inside our main div element, which is where the Audio Native player will be placed. The children of the component can be used to show content before the player has loaded (e.g. Loading audio player…).
React components are rendered and managed entirely in JavaScript, and their rendering lifecycle is controlled by React's virtual DOM. When you try to include a script tag directly within a React component's JSX, it doesn't behave as it would when included directly in an HTML file. React's virtual DOM does not execute script tags inserted into the DOM as part of component rendering. This is a security feature to prevent unintended or malicious code execution.
This is why, if we were to just paste the Audio Native code snippet into our React application, it would not work.
## Get the public user ID from the Audio Native snippet
Before you can use this component, you'll need to retrieve your public user ID from the code snippet. Go back to [https://elevenlabs.io/audio-native](https://elevenlabs.io/audio-native), and in the code snippet, copy the property called `publicuserid`.
This public user ID is used to identify your Audio Native project.
## Use the Audio Native component
Now that you have the public user ID, you can use the component on your page. Simply import it, then pass it the public user ID from the previous step.
```tsx
import { ElevenLabsAudioNative } from "./path/to/ElevenLabsAudioNative";

export default function Page() {
  return (
    <>
      <h1>Your Blog Post Title</h1>

      <ElevenLabsAudioNative publicUserId="<your public user ID>" />

      <p>Your blog post...</p>
    </>
  );
}
```
### Preview
Start your development server, if you haven't already, and view the page. You should see the player render with a message stating that the URL is not allowed. (If you don't see anything, please see the Troubleshooting section below and perform a hard refresh.)
### Troubleshooting
If you don't see the Audio Native player, try doing a hard refresh. This can sometimes be an issue because of the development server not properly reloading the script.
In Chrome it's: (⌘ or Ctrl) + Shift + R
### Why am I seeing “URL not allowed”?
Here's what's happening behind the scenes. Remember that script we loaded in the useEffect hook? This script is trying to scrape the content from your page to get all the text and convert it to audio. However, it can't load your page because it's on `localhost`. Audio Native can only process pages that are publicly accessible on the internet.
## Local testing with ngrok
This is where a service such as ngrok can help us. ngrok is a way to get your site on localhost to map to a public URL on the internet. They have a free tier, so visit their website [https://ngrok.com](https://ngrok.com), create an account and install it.
Here's their getting started guide - [https://ngrok.com/docs/getting-started](https://ngrok.com/docs/getting-started)
Once you have it installed, you can use a command similar to the one below to point your local React project to a public URL with ngrok. I'm running Next.js locally on port `3000`, so here's the command I run. Your details may vary.
```
ngrok http http://localhost:3000
```
Running this command will give you a URL that you can use in the next step.
### Update the allowed URLs to include the ngrok URL
Go to the Audio Native section:
[https://elevenlabs.io/audio-native](https://elevenlabs.io/audio-native)
Select the “My Websites” tab.
Enter the ngrok URL (from the previous step) in the “Allowed URLs” section.
This ensures that your player can only show on websites that you specify. This is very important, as someone else may otherwise be able to use your public user ID on their website.
Now visit your ngrok URL; you should see Audio Native processing your content. In the background, we create a project in your ElevenLabs account just for your page. This project contains the text from your page and converts it to audio.
View the newly created project here:
[https://elevenlabs.io/app/projects](https://elevenlabs.io/app/projects)
## Deploy to production
Make sure to also add the URL of your website to the allowed URLs once you've deployed your React app and you're ready to push to production.
We only used ngrok for local development; it's not needed for public-facing URLs, as ElevenLabs will fetch the content directly from your website.
## Updating audio content
When updating the content on a page, you may notice that the audio from the Audio Native player won't update automatically.
In order to update the audio you'll have to go to the project in ElevenLabs and update the content from there manually. [https://elevenlabs.io/app/projects](https://elevenlabs.io/app/projects)
## Conclusion
Now that you have Audio Native working in your React project, go ahead and add the component to more pages on your website to begin converting content into high quality audio for your visitors.
# How to set up an ElevenLabs audio player for your articles in Framer
Before adding Audio Native to Framer, you'll need to create & customize your player, whitelist your blog's domain, and copy your embed code. If you need help completing those steps, refer to our [Audio Native overview](https://elevenlabs.io/docs/audio-native/overview).
Now that you've created & customized your Audio Native player, navigate to Framer. Go to Site Settings in Extract the \