Getting Started
Start generating your first text-to-speech using Python and the ElevenLabs API
All plans, even the free plan, come with access to the API (Application Programming Interface), allowing you to use our technology programmatically and access its full range of features through code. The API is the fundamental component that applications built on ElevenLabs, or any other service, rely on.
In this section, we will provide a concise overview of how the API can be used to make requests to the endpoints: first to fetch the voices in your voice library, and then to generate your first text-to-speech.
Something that might be good to note is that the website is just a frontend for the API. So if you use Speech Synthesis or make calls directly to the API with code, there should be no difference in quality or output with the same settings. However, the AI is still non-deterministic, so there might be a difference in the delivery of the generated output.
The API and the website also share the same pool of quota when generating audio. So, if you have 10,000 characters in your account and you generate audio, the characters for that generation will be deducted from your account regardless of whether you use the website or the API. Given the extensive nature of the API and its virtually limitless possibilities, we will only do a very quick overview and showcase a few examples to at least get you up and running.
The Text-To-Speech (TTS) endpoint transforms text into speech in a given voice. The input consists of text, voice, and voice settings, with an option to specify the model.
Authentication
The ElevenLabs API uses API keys for authentication. You can create multiple API keys with restricted permissions for specific API routes (e.g., text-to-speech). API keys can be disabled or deleted as needed.
To get your API key, create an account, log in, and click on “API Keys” in the bottom left corner of the console.
Your API key is a secret. Do not share it or expose it in client-side code (browsers, apps). Load it securely from an environment variable or key management service on your backend server.
Include your API key in all API requests by adding it to the xi-api-key HTTP header.
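For example, here is a minimal sketch of an authenticated request using the third-party requests library; the environment-variable name ELEVENLABS_API_KEY is just an illustrative choice, not a requirement.

```python
import os
import requests

# Load the key from an environment variable; never hard-code it or expose it client-side.
# (The variable name ELEVENLABS_API_KEY is only an example.)
headers = {"xi-api-key": os.environ["ELEVENLABS_API_KEY"]}

# Any endpoint is authenticated the same way; here we simply check that the key is accepted.
response = requests.get("https://api.elevenlabs.io/v1/voices", headers=headers)
print(response.status_code)  # 200 means the key was accepted
```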
Fetching the voice_id
Before we can generate anything, we need to get the voice_id for the voice we want to use.
The easiest way to get the voice_id is via the website. You can find the article on how to do that here.
A better approach, especially when working with the API, is to retrieve the voices available to your account using the GET /v1/voices endpoint. If you do not receive the expected voices, you will either encounter an error or only see the pre-made voices, which most likely indicates that you are not passing your API key correctly.
You can find more information in the Get Voices endpoint documentation. This endpoint is crucial if your application requires any form of flexibility in the voices you want to use via the API, as it provides information about each voice, including the voice_id for each of them. This voice_id is necessary when querying the TTS endpoint.
The API offers a wide range of capabilities, such as adding new voices, but we won’t discuss that here. To gain a better understanding of how to do that, you will need to read through the documentation.
You can use the Python code below to print a list of the voices in your account, along with their names and associated voice_ids, which will be required to use the voices via the API. Simply using the name won’t be sufficient.
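A minimal sketch of such a script, using the requests library; the response field names follow the Get Voices endpoint documentation.

```python
import requests

XI_API_KEY = "<xi-api-key>"  # your API key

url = "https://api.elevenlabs.io/v1/voices"
headers = {"xi-api-key": XI_API_KEY}

response = requests.get(url, headers=headers)
response.raise_for_status()

# Print each voice's name alongside the voice_id required by the TTS endpoint.
for voice in response.json()["voices"]:
    print(f"{voice['name']}: {voice['voice_id']}")
```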
Text-to-speech
Once you’ve gotten the voice_id from the voices endpoint, you are ready to query the Text-to-speech endpoint.
First, you need to decide whether you want to use the streaming response or not. In general, we recommend streaming unless your client doesn’t support it. You can find more information about the streaming endpoint in our documentation.
Second, you need to choose your voice settings. We recommend using the default voice settings before experimenting with the expressivity and similarity settings. You can find more information about the settings in our documentation.
After generating audio, we store it together with other metadata in your personal history. You can retrieve all your history items via the /v1/history endpoint. We also provide endpoints for retrieving, deleting, and downloading history items. You can find more information about how to use the history endpoint in our documentation. The only exception is websocket streaming requests, which are not saved in order to reduce latency.
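For example, a minimal sketch of listing your history items with the requests library; treat the exact field names as assumptions to verify against the history endpoint documentation.

```python
import requests

headers = {"xi-api-key": "<xi-api-key>"}

response = requests.get("https://api.elevenlabs.io/v1/history", headers=headers)
response.raise_for_status()

# Each history item carries an id plus metadata such as the voice used and the original text.
for item in response.json()["history"]:
    print(item["history_item_id"], item.get("voice_name"), item.get("text"))
```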
In the basic Python example below, we send a request to the text-to-speech endpoint and receive a stream of audio bytes in return. All you need to do is set your XI API Key first and try it out! If you cannot find it, please follow this article here.
You will need to replace the constants by inserting your <xi-api-key>, setting the correct <voice-id>, and entering the <text> you want the AI to convert to speech. You can change the settings stability and similarity_boost if you want, but they are set to good default values.
Of course, the code below is only to showcase the very bare minimum to get the system up and running, and you can then use more advanced code to create a nice GUI with sliders for the variables, a list for the voices, and anything else that you want to add.
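Here is a minimal sketch of that bare minimum, using the requests library and the streaming endpoint; the model_id and the specific setting values shown are illustrative choices rather than requirements.

```python
import requests

XI_API_KEY = "<xi-api-key>"  # your API key
VOICE_ID = "<voice-id>"      # a voice_id from the /v1/voices endpoint
TEXT = "<text>"              # the text you want converted to speech

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"
headers = {
    "xi-api-key": XI_API_KEY,
    "Content-Type": "application/json",
    "Accept": "audio/mpeg",
}
payload = {
    "text": TEXT,
    "model_id": "eleven_multilingual_v2",  # optional; omit to use the default model
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
    },
}

response = requests.post(url, json=payload, headers=headers, stream=True)
response.raise_for_status()

# Write the streamed audio chunks to an MP3 file.
with open("output.mp3", "wb") as f:
    for chunk in response.iter_content(chunk_size=4096):
        if chunk:
            f.write(chunk)
```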
Speech-to-Speech
Generating speech-to-speech involves a similar process to text-to-speech, but with some adjustments in the API parameters. Instead of providing text when calling the API, you provide the path to an audio file that you would like to convert from one voice to another. Here’s a modified version of the previous code to illustrate how to generate speech-to-speech.
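The sketch below uses the requests library; the endpoint path and the eleven_english_sts_v2 model id follow the Speech-to-Speech documentation, but treat the exact multipart field names as assumptions to verify there.

```python
import requests

XI_API_KEY = "<xi-api-key>"  # your API key
VOICE_ID = "<voice-id>"      # the target voice to convert into
INPUT_PATH = "<path>"        # path to the input audio file, e.g. C:/User/Documents/input.mp3
OUTPUT_PATH = "output.mp3"   # where the converted audio will be saved

url = f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}/stream"
headers = {"xi-api-key": XI_API_KEY}
data = {
    "model_id": "eleven_english_sts_v2",  # illustrative speech-to-speech model id
}

# The input audio is sent as multipart form data.
with open(INPUT_PATH, "rb") as audio_file:
    response = requests.post(
        url,
        headers=headers,
        data=data,
        files={"audio": audio_file},
        stream=True,
    )
response.raise_for_status()

# Save the streamed audio bytes to the output file.
with open(OUTPUT_PATH, "wb") as f:
    for chunk in response.iter_content(chunk_size=4096):
        if chunk:
            f.write(chunk)
```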
This code takes an input audio file (<path> should be something like C:/User/Documents/input.mp3), sends a request to the speech-to-speech endpoint, and receives a stream of audio bytes in return. It then saves the streamed audio to the specified output path (output.mp3; if no full path is specified, the file will be saved in the same folder as the .py script).
Make sure to replace placeholders like <xi-api-key> and <voice-id> with your actual API key and voice ID, respectively. Additionally, adjust paths and settings as needed.