How to use text to speech with streaming in Python or Node.js

How to convert text into speech, upload to S3, and share with a signed URL

In this tutorial, you’ll learn how to convert text to speech with the ElevenLabs SDK. We’ll start by talking through how to generate speech and receive a file and then how to generate speech and stream the response back. Finally, as a bonus we’ll show you how to upload the generated audio to an AWS S3 bucket, and share it through a signed URL. This signed URL will provide temporary access to the audio file, making it perfect for sharing with users by SMS or embedding into an application.

If you want to jump straight to the finished code, see the example repos linked at the end of this tutorial.

Requirements

  • An ElevenLabs account with an API key (here’s how to find your API key).
  • Python (or Node.js with TypeScript) installed on your machine.
  • (Optionally) an AWS account with access to S3.

Setup

Installing our SDK

Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK for the text to speech conversion. You can install it using pip:

$ pip install elevenlabs

Additionally, install the package used to manage your environment variables:

$ pip install python-dotenv

Next, create a .env file in your project directory and fill it with your credentials like so:

.env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
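
If the key is missing, os.getenv quietly returns None, and the problem tends to surface later as a less obvious error when the first request is made. As a minimal sketch, you could fail fast with a clear message instead. The require_env helper below is hypothetical and written for this example; it is not part of the ElevenLabs SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty.

    Hypothetical helper for illustration; not part of the ElevenLabs SDK.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

Once your .env file has been loaded, calling require_env("ELEVENLABS_API_KEY") either returns the key or raises a RuntimeError that names the missing variable.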

Convert text to speech (file)

To convert text to speech and save it as a file, we’ll use the convert method of the ElevenLabs SDK and then save the result locally as an .mp3 file.

import os
import uuid

from dotenv import load_dotenv
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs

# Load ELEVENLABS_API_KEY from the .env file created during setup
load_dotenv()

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)


def text_to_speech_file(text: str) -> str:
    # Call the text-to-speech conversion API with detailed parameters
    response = client.text_to_speech.convert(
        voice_id="pNInz6obpgDQGcFmaJgB",  # Adam pre-made voice
        output_format="mp3_22050_32",
        text=text,
        model_id="eleven_turbo_v2_5",  # use the turbo model for low latency
        voice_settings=VoiceSettings(
            stability=0.0,
            similarity_boost=1.0,
            style=0.0,
            use_speaker_boost=True,
        ),
    )

    # To play the audio back, uncomment the line below. Note that play is
    # not imported above, so import it from the SDK first.
    # play(response)

    # Generate a unique file name for the output MP3 file
    save_file_path = f"{uuid.uuid4()}.mp3"

    # Write the audio to a file, one chunk at a time
    with open(save_file_path, "wb") as f:
        for chunk in response:
            if chunk:
                f.write(chunk)

    print(f"{save_file_path}: A new audio file was saved successfully!")

    # Return the path of the saved audio file
    return save_file_path

You can then run this function with:

text_to_speech_file("Hello World")

Convert text to speech (streaming)

If you prefer to work with the audio in memory rather than saving it to a file, you can read the response chunks into an in-memory byte stream and pass that stream on to the rest of your application.

import os
from io import BytesIO
from typing import IO

from dotenv import load_dotenv
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs

# Load ELEVENLABS_API_KEY from the .env file created during setup
load_dotenv()

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)


def text_to_speech_stream(text: str) -> IO[bytes]:
    # Perform the text-to-speech conversion
    response = client.text_to_speech.convert(
        voice_id="pNInz6obpgDQGcFmaJgB",  # Adam pre-made voice
        output_format="mp3_22050_32",
        text=text,
        model_id="eleven_multilingual_v2",
        voice_settings=VoiceSettings(
            stability=0.0,
            similarity_boost=1.0,
            style=0.0,
            use_speaker_boost=True,
        ),
    )

    # Create a BytesIO object to hold the audio data in memory
    audio_stream = BytesIO()

    # Write each chunk of audio data to the stream
    for chunk in response:
        if chunk:
            audio_stream.write(chunk)

    # Reset stream position to the beginning
    audio_stream.seek(0)

    # Return the stream for further use
    return audio_stream

You can then run this function with:

text_to_speech_stream("This is James")
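
The returned object is an ordinary in-memory binary stream, so anything that reads bytes can consume it. As an illustration, here is a sketch of reading such a stream in fixed-size chunks, for example to forward the audio in an HTTP response. The iter_stream_chunks helper and the 4096-byte chunk size are assumptions made for this example, and a stand-in BytesIO is used in place of real audio.

```python
from io import BytesIO
from typing import IO, Iterator


def iter_stream_chunks(stream: IO[bytes], chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted.

    Hypothetical helper for illustration; not part of the ElevenLabs SDK.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Stand-in for the stream returned by text_to_speech_stream(...)
fake_audio = BytesIO(b"\x00" * 10_000)
chunk_sizes = [len(chunk) for chunk in iter_stream_chunks(fake_audio)]
print(chunk_sizes)  # [4096, 4096, 1808]
```

Reading in chunks keeps the consumer's memory use bounded even though the full audio is already buffered in the BytesIO object.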

Once your audio data is created as either a file or a stream you might want to share this with your users. One way to do this is to upload it to an AWS S3 bucket and generate a secure sharing link.
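
The upload and signing step could look roughly like the sketch below. It assumes that boto3 is installed, that AWS credentials are already configured in your environment, and that the bucket exists. The function name upload_audio_and_sign, its parameters, and the one-hour default expiry are choices made for this example, not part of the ElevenLabs SDK.

```python
import uuid
from typing import IO, Any, Optional


def upload_audio_and_sign(
    audio_stream: IO[bytes],
    bucket_name: str,
    s3_client: Optional[Any] = None,
    expires_in: int = 3600,
) -> str:
    """Upload an audio stream to S3 and return a time-limited signed URL.

    Hypothetical helper for illustration. If no client is passed in, one is
    created with boto3, which must be installed and have credentials available.
    """
    if s3_client is None:
        import boto3  # assumption: boto3 is installed and configured

        s3_client = boto3.client("s3")

    # Use a unique object key so uploads never overwrite each other
    object_key = f"{uuid.uuid4()}.mp3"

    # Upload the file-like object to the bucket
    s3_client.upload_fileobj(audio_stream, bucket_name, object_key)

    # Create a URL that grants temporary read access to the uploaded object
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket_name, "Key": object_key},
        ExpiresIn=expires_in,
    )
```

With a real bucket, you would pass the stream returned by text_to_speech_stream together with your bucket name, then share the returned URL. Anyone holding that URL can download the audio until it expires, so choose an expiry that matches how the link will be used.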

Conclusion

You now know how to convert text into speech and generate a signed URL to share the audio file. This functionality opens up numerous opportunities for creating and sharing content dynamically.

Here are some examples of what you could build with this.

  1. Educational Podcasts: Create personalized educational content that can be accessed by students on demand. Teachers can convert their lessons into audio format, upload them to S3, and share the links with students for a more engaging learning experience outside the traditional classroom setting.

  2. Accessibility Features for Websites: Enhance website accessibility by offering text content in audio format. This can make information on websites more accessible to individuals with visual impairments or those who prefer auditory learning.

  3. Automated Customer Support Messages: Produce automated and personalized audio messages for customer support, such as FAQs or order updates. This can provide a more engaging customer experience compared to traditional text emails.

  4. Audio Books and Narration: Convert entire books or short stories into audio format, offering a new way for audiences to enjoy literature. Authors and publishers can diversify their content offerings and reach audiences who prefer listening over reading.

  5. Language Learning Tools: Develop language learning aids that provide learners with audio lessons and exercises. This makes it possible to practice pronunciation and listening skills in a targeted way.

For more details, see the full project files, which show a clear structure for setting up your application:

For Python: example repo

For TypeScript: example repo

If you have any questions, please create an issue on the elevenlabs-doc GitHub.
