Quickstart
Start generating your first text-to-speech using Python and the ElevenLabs API
Accessing the API
With all plans, including the free plan, you gain full access to the ElevenLabs API, enabling you to generate speech programmatically across our entire range of features.
Both the API and the website draw from a shared quota pool. Usage on either platform will affect your total available usage quota.
Authentication
All requests to the ElevenLabs API must include an xi-api-key header with your API key. If you are using the Client SDKs, this header is set automatically; otherwise, include it manually in each request. API keys can be restricted to specific routes, disabled, or deleted as needed.
To get your API key, create an account, log in, and click on “API Keys” in the bottom left corner of the console.
Your API key is a secret. Do not share it or expose it in client-side code (browsers, apps). Load it securely from an environment variable or key management service on your backend server.
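As a minimal sketch of what that looks like in practice, the snippet below loads the key from an environment variable and passes it in the xi-api-key header when calling the user endpoint. The variable name ELEVENLABS_API_KEY is our choice for this example, not a requirement:

```python
import os
import requests

# Load the key from an environment variable rather than hard-coding it.
# The variable name ELEVENLABS_API_KEY is an example, not a requirement.
api_key = os.environ["ELEVENLABS_API_KEY"]

response = requests.get(
    "https://api.elevenlabs.io/v1/user",
    headers={"xi-api-key": api_key},
)
response.raise_for_status()
print(response.json())
```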
Fetching the voice_id
The Text-To-Speech (TTS) endpoint transforms text into speech in a given voice. The input consists of text, voice, and voice settings, with the option to specify a model.
Before we can generate anything, we need to get the voice_id for the voice we want to use.
The easiest way to get the voice_id is via the website. You can find the article on how to do that here.
A better approach, especially when dealing with the API, is to retrieve the voices accessible to your account using the GET /v1/voices endpoint. If you do not receive the expected voices, you will either encounter an error or see only the pre-made voices, which most likely indicates that you are not passing your API key correctly.
You can find more information in the Get Voices endpoint documentation. This endpoint is crucial if your application requires any form of flexibility in the voices you want to use via the API, as it provides information about each voice, including the voice_id for each of them. This voice_id is necessary when querying the TTS endpoint.
The API offers a wide range of capabilities, such as adding new voices, but we won’t discuss that here. To gain a better understanding of how to do that, you will need to read through the documentation.
You can use the Python code below to print a list of the voices in your account, along with their names and associated voice_ids, which will be required to use the voices via the API. Simply using the name won’t be sufficient.
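The example below is a minimal sketch using the requests library; it assumes the JSON response contains a voices array whose entries carry name and voice_id fields:

```python
import requests

XI_API_KEY = "<xi-api-key>"  # replace with your API key

response = requests.get(
    "https://api.elevenlabs.io/v1/voices",
    headers={"xi-api-key": XI_API_KEY},
)
response.raise_for_status()

# The response contains a "voices" array; each entry includes the
# voice's display name and the voice_id needed for the TTS endpoint.
for voice in response.json()["voices"]:
    print(f"{voice['name']}: {voice['voice_id']}")
```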
Text-to-speech
Once you've gotten the voice_id from the voices endpoint, you are ready to query the Text-to-speech endpoint.
First, you need to decide whether you want to use the streaming response or not. In general, we recommend streaming unless your client doesn’t support it. You can find more information about the streaming endpoint in our documentation.
Second, you need to choose your voice settings. We recommend using the default voice settings before experimenting with the expressivity and similarity settings. You can find more information about the settings in our documentation.
Each audio you generate is stored, together with other metadata, in your personal history. You can retrieve all your history items via the /v1/history endpoint. We also provide endpoints for retrieving audio, deletion, and download of history items. You can find more information about how to use the history endpoint in our documentation. The only requests that are not saved are websocket streaming queries, which are skipped to save latency.
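As a quick illustrative sketch (assuming the response body contains a history array whose items carry a history_item_id and the original text), you could list your recent generations like this:

```python
import requests

# Hedged sketch: list recent history items and their IDs.
headers = {"xi-api-key": "<xi-api-key>"}

response = requests.get("https://api.elevenlabs.io/v1/history", headers=headers)
response.raise_for_status()

# Assumption: each item carries a history_item_id and the original text.
for item in response.json()["history"]:
    print(item["history_item_id"], item.get("text", ""))
```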
In the basic Python example below, we send a request to the text-to-speech endpoint and receive a stream of audio bytes in return. All you need to do is set your XI API Key first and try it out! If you cannot find it, please follow this article here.
You will need to replace the constants by inserting your <xi-api-key>, setting the correct <voice-id>, and entering the <text> you want the AI to convert to speech. You can change the stability and similarity_boost settings if you want, but they are set to good default values.
Of course, the code below is only to showcase the very bare minimum to get the system up and running, and you can then use more advanced code to create a nice GUI with sliders for the variables, a list for the voices, and anything else that you want to add.
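Here is that bare-minimum sketch, built on the requests library. It assumes the streaming route /v1/text-to-speech/<voice-id>/stream and uses eleven_multilingual_v2 as the model_id, which you can swap for any model available to your account:

```python
import requests

# Replace these constants before running.
XI_API_KEY = "<xi-api-key>"  # your API key
VOICE_ID = "<voice-id>"      # a voice_id from the voices endpoint
TEXT = "<text>"              # the text you want converted to speech

url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}/stream"
headers = {
    "xi-api-key": XI_API_KEY,
    "Content-Type": "application/json",
    "Accept": "audio/mpeg",
}
payload = {
    "text": TEXT,
    "model_id": "eleven_multilingual_v2",  # assumption: swap in the model you prefer
    "voice_settings": {
        "stability": 0.5,
        "similarity_boost": 0.75,
    },
}

# Stream the audio bytes and write them to output.mp3 as they arrive.
with requests.post(url, headers=headers, json=payload, stream=True) as response:
    response.raise_for_status()
    with open("output.mp3", "wb") as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

print("Audio saved to output.mp3")
```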
Speech-to-Speech
Generating speech-to-speech involves a similar process to text-to-speech, but with some adjustments to the API parameters. Instead of providing text when calling the API, you provide the path to an audio file that you would like to convert from one voice to another. Here's a modified version of the previous code to illustrate how to generate speech-to-speech using the API:
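This is again a sketch: it assumes the streaming route /v1/speech-to-speech/<voice-id>/stream, a multipart upload with the file under the audio field, and eleven_english_sts_v2 as the model_id.

```python
import requests

XI_API_KEY = "<xi-api-key>"  # your API key
VOICE_ID = "<voice-id>"      # the target voice to convert into
AUDIO_PATH = "<path>"        # path to the input audio file

url = f"https://api.elevenlabs.io/v1/speech-to-speech/{VOICE_ID}/stream"
headers = {"xi-api-key": XI_API_KEY}
data = {
    "model_id": "eleven_english_sts_v2",  # assumption: swap for your preferred model
    # In a multipart request the settings travel as a JSON string.
    "voice_settings": '{"stability": 0.5, "similarity_boost": 0.75}',
}

# Upload the input audio and stream the converted audio back to disk.
with open(AUDIO_PATH, "rb") as audio_file:
    files = {"audio": audio_file}
    with requests.post(url, headers=headers, data=data, files=files, stream=True) as response:
        response.raise_for_status()
        with open("output.mp3", "wb") as out:
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:
                    out.write(chunk)

print("Audio saved to output.mp3")
```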
This code takes an input audio file (<path> should be something like C:/User/Documents/input.mp3), sends a request to the speech-to-speech endpoint, and receives a stream of audio bytes in return. It then saves the streamed audio to the specified output path (output.mp3; if no specific path is specified, it will be saved in the same folder as the .py code file).
Make sure to replace placeholders like <xi-api-key> and <voice-id> with your actual API key and voice ID, respectively. Additionally, adjust paths and settings as needed.