Streaming text to speech — ElevenLabs Documentation

In this tutorial, you’ll learn how to convert text to speech with the ElevenLabs SDK. We’ll start by talking through how to generate speech and receive a file and then how to generate speech and stream the response back. Finally, as a bonus we’ll show you how to upload the generated audio to an AWS S3 bucket, and share it through a signed URL. This signed URL will provide temporary access to the audio file, making it perfect for sharing with users by SMS or embedding into an application.

If you want to jump straight to an example, you can find one in the Python and Node.js example repositories.

Requirements

An ElevenLabs account with an API key (here’s how to find your API key).
Python or Node.js installed on your machine.
(Optionally) an AWS account with access to S3.

Setup

Installing our SDK

Before you begin, make sure you have installed the necessary SDKs and libraries. You will need the ElevenLabs SDK for the text to speech conversion. You can install it using pip:

$ pip install elevenlabs

Additionally, install the package used to manage your environment variables:

$ pip install python-dotenv

Next, create a .env file in your project directory and fill it with your credentials like so:

.env

ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
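
As a minimal sketch of failing fast when a credential is missing, the helper below reads a variable from the environment and raises a clear error if it is unset or empty. The name require_env is hypothetical and not part of the ElevenLabs SDK. It assumes the .env file has already been loaded into the environment, for example with load_dotenv from python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing.

    Hypothetical helper for illustration; not part of any SDK.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Add it to your .env file and load it before creating the client."
        )
    return value
```

You could then call require_env("ELEVENLABS_API_KEY") in place of os.getenv so that a missing key is reported at startup instead of on the first API request.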

Convert text to speech (file)

To convert text to speech and save it as a file, we’ll use the convert method of the ElevenLabs SDK and then save the result locally as an .mp3 file.

import os
import uuid
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)


def text_to_speech_file(text: str) -> str:
    # Calling the text_to_speech conversion API with detailed parameters
    response = client.text_to_speech.convert(
        voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice
        output_format="mp3_22050_32",
        text=text,
        model_id="eleven_turbo_v2_5", # use the turbo model for low latency
        # Optional voice settings that allow you to customize the output
        voice_settings=VoiceSettings(
            stability=0.0,
            similarity_boost=1.0,
            style=0.0,
            use_speaker_boost=True,
            speed=1.0,
        ),
    )

    # Uncomment the line below to play the audio back
    # (play must be imported from the SDK first)
    # play(response)

    # Generating a unique file name for the output MP3 file
    save_file_path = f"{uuid.uuid4()}.mp3"

    # Writing the audio to a file
    with open(save_file_path, "wb") as f:
        for chunk in response:
            if chunk:
                f.write(chunk)

    print(f"{save_file_path}: A new audio file was saved successfully!")

    # Return the path of the saved audio file
    return save_file_path

You can then run this function with:

text_to_speech_file("Hello World")
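
The file-writing loop can be exercised without calling the API. The sketch below assumes only that the response is an iterable of byte chunks, which is how the loop above treats it. The helper name save_chunks and the sample chunks are hypothetical, and the sample data is not real MP3 audio.

```python
import os
import tempfile
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write non-empty byte chunks to path and return the number of bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                total += f.write(chunk)
    return total


# Stand-in for the chunks an API response would yield (placeholder bytes only)
fake_chunks = [b"abc", b"", b"defg"]
demo_path = os.path.join(tempfile.gettempdir(), "save_chunks_demo.bin")
bytes_written = save_chunks(fake_chunks, demo_path)
```

Returning the byte count gives the caller a cheap way to detect an empty response before sharing the file.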

Convert text to speech (streaming)

If you prefer to work with the audio as an in-memory stream rather than saving it to a file, you can collect the response chunks into a BytesIO object and return it.

import os
from typing import IO
from io import BytesIO
from elevenlabs import VoiceSettings
from elevenlabs.client import ElevenLabs

ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
client = ElevenLabs(
    api_key=ELEVENLABS_API_KEY,
)


def text_to_speech_stream(text: str) -> IO[bytes]:
    # Perform the text-to-speech conversion
    response = client.text_to_speech.convert(
        voice_id="pNInz6obpgDQGcFmaJgB", # Adam pre-made voice
        output_format="mp3_22050_32",
        text=text,
        model_id="eleven_multilingual_v2",
        # Optional voice settings that allow you to customize the output
        voice_settings=VoiceSettings(
            stability=0.0,
            similarity_boost=1.0,
            style=0.0,
            use_speaker_boost=True,
            speed=1.0,
        ),
    )

    # Create a BytesIO object to hold the audio data in memory
    audio_stream = BytesIO()

    # Write each chunk of audio data to the stream
    for chunk in response:
        if chunk:
            audio_stream.write(chunk)

    # Reset stream position to the beginning
    audio_stream.seek(0)

    # Return the stream for further use
    return audio_stream

You can then run this function with:

text_to_speech_stream("This is James")
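
The seek(0) call in the function above matters because writing leaves the stream position at the end, so a consumer that reads from the current position would receive no data. A small sketch with placeholder bytes (not real audio) shows the difference:

```python
from io import BytesIO

stream = BytesIO()
for chunk in [b"audio-", b"bytes"]:  # placeholder chunks, not real MP3 data
    stream.write(chunk)

# After writing, the position is at the end, so read() returns nothing
read_before_seek = stream.read()

# Rewinding makes the full content available to the next reader
stream.seek(0)
read_after_seek = stream.read()
```

Any code that consumes the stream from its current position depends on this rewind.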

Once your audio data is created as either a file or a stream you might want to share this with your users. One way to do this is to upload it to an AWS S3 bucket and generate a secure sharing link.

Creating your AWS credentials

To upload the data to S3 you’ll need to add your AWS access key ID, secret access key and AWS region name to your .env file. Follow these steps to find the credentials:

Access the IAM (Identity and Access Management) Dashboard: You can find IAM under “Security, Identity, & Compliance” on the services menu. The IAM dashboard manages access to your AWS services securely.

Create a New User (if necessary): On the IAM dashboard, select “Users” and then “Add user”. Enter a user name.

Set the permissions: attach policies directly to the user according to the access level you wish to grant. For S3 uploads, you can use the AmazonS3FullAccess policy. However, it’s best practice to grant least privilege, or the minimal permissions necessary to perform a task. You might want to create a custom policy that specifically allows only the necessary actions on your S3 bucket.

Review and create the user: Review your settings and create the user. Upon creation, you’ll be presented with an access key ID and a secret access key. Be sure to download and securely save these credentials; the secret access key cannot be retrieved again after this step.

Get your AWS region name: for example, us-east-1.
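
For the permissions step above, one possible starting point for a least-privilege policy is sketched below. It builds a policy document that allows only uploading and reading objects in a single bucket. The helper name build_s3_policy and the bucket name are hypothetical, and the actions you need may differ, so review the policy against your own use case before attaching it.

```python
import json


def build_s3_policy(bucket_name: str) -> dict:
    """Build an IAM policy document limited to object upload and download in one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # PutObject covers uploads; GetObject covers reads through signed URLs
                "Action": ["s3:PutObject", "s3:GetObject"],
                # Object-level actions apply to keys inside the bucket
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# "my-example-bucket" is a placeholder; substitute your own bucket name
policy_json = json.dumps(build_s3_policy("my-example-bucket"), indent=2)
```

The resulting JSON can be pasted into the IAM policy editor when creating a custom policy.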

If you do not have an AWS S3 bucket, you will need to create a new one by following these steps:

Access the S3 dashboard: You can find S3 under “Storage” on the services menu.

Create a new bucket: On the S3 dashboard, click the “Create bucket” button.

Enter a bucket name and click on the “Create bucket” button. You can leave the other bucket options as default. The newly added bucket will appear in the list.

Installing the AWS SDK and adding the credentials

Install boto3 for interacting with AWS services using pip:

$ pip install boto3

Then add the environment variables to your .env file like so:

AWS_ACCESS_KEY_ID=your_aws_access_key_id_here
AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key_here
AWS_REGION_NAME=your_aws_region_name_here
AWS_S3_BUCKET_NAME=your_s3_bucket_name_here

Uploading to AWS S3 and generating the signed URL

Add the following functions to upload the audio stream to S3 and generate a signed URL.

import os
import boto3
import uuid

AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY")
AWS_REGION_NAME = os.getenv("AWS_REGION_NAME")
AWS_S3_BUCKET_NAME = os.getenv("AWS_S3_BUCKET_NAME")

session = boto3.Session(
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION_NAME,
)
s3 = session.client("s3")


def generate_presigned_url(s3_file_name: str) -> str:
    signed_url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": AWS_S3_BUCKET_NAME, "Key": s3_file_name},
        ExpiresIn=3600,
    )  # URL expires in 1 hour
    return signed_url


def upload_audiostream_to_s3(audio_stream) -> str:
    s3_file_name = f"{uuid.uuid4()}.mp3"  # Generates a unique file name using UUID
    s3.upload_fileobj(audio_stream, AWS_S3_BUCKET_NAME, s3_file_name)

    return s3_file_name

You can then call the upload function with the audio stream generated from the text.

s3_file_name = upload_audiostream_to_s3(audio_stream)

After uploading the audio file to S3, generate a signed URL to share access to the file. This URL will be time-limited, meaning it will expire after a certain period, making it secure for temporary sharing.

You can now generate a URL from a file with:

signed_url = generate_presigned_url(s3_file_name)
print(f"Signed URL to access the file: {signed_url}")

If you want to use the file multiple times, store the S3 file name (the object key) in your database and regenerate the signed URL each time you need it. Do not save the signed URL itself, because it will expire.
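
One way to follow this advice is sketched below, using SQLite from the Python standard library as a stand-in for your database. The table name, the helper names and the placeholder signer are all hypothetical. In a real application you would pass generate_presigned_url from the section above as the sign argument.

```python
import sqlite3
from typing import Callable


def create_table(conn: sqlite3.Connection) -> None:
    """Create a table that stores only the S3 object key, never the signed URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS audio_files ("
        "id INTEGER PRIMARY KEY, s3_key TEXT NOT NULL)"
    )


def save_audio_key(conn: sqlite3.Connection, s3_key: str) -> int:
    """Persist the object key and return the id of the new row."""
    cur = conn.execute("INSERT INTO audio_files (s3_key) VALUES (?)", (s3_key,))
    conn.commit()
    return cur.lastrowid


def fresh_url(conn: sqlite3.Connection, audio_id: int, sign: Callable[[str], str]) -> str:
    """Look up the stored key and sign it again, so the URL is always unexpired."""
    row = conn.execute(
        "SELECT s3_key FROM audio_files WHERE id = ?", (audio_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"No audio file with id {audio_id}")
    return sign(row[0])


# Demo with an in-memory database and a placeholder signer (no AWS call is made)
conn = sqlite3.connect(":memory:")
create_table(conn)
audio_id = save_audio_key(conn, "example-key.mp3")
demo_url = fresh_url(conn, audio_id, lambda key: f"https://example.invalid/{key}?signed=1")
```

Passing the signer in as a function keeps the storage code independent of AWS and lets you test it without credentials.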

Putting it all together

To put it all together, you can use the following script:

import os

from dotenv import load_dotenv

# Load the .env file before importing the modules below,
# because they read their credentials at import time
load_dotenv()

from text_to_speech_stream import text_to_speech_stream
from s3_uploader import upload_audiostream_to_s3, generate_presigned_url


def main():
    text = "This is James"

    audio_stream = text_to_speech_stream(text)
    s3_file_name = upload_audiostream_to_s3(audio_stream)
    signed_url = generate_presigned_url(s3_file_name)

    print(f"Signed URL to access the file: {signed_url}")


if __name__ == "__main__":
    main()

Conclusion

You now know how to convert text into speech and generate a signed URL to share the audio file. This functionality opens up numerous opportunities for creating and sharing content dynamically.

Here are some examples of what you could build with this.

Educational Podcasts: Create personalized educational content that can be accessed by students on demand. Teachers can convert their lessons into audio format, upload them to S3, and share the links with students for a more engaging learning experience outside the traditional classroom setting.
Accessibility Features for Websites: Enhance website accessibility by offering text content in audio format. This can make information on websites more accessible to individuals with visual impairments or those who prefer auditory learning.
Automated Customer Support Messages: Produce automated and personalized audio messages for customer support, such as FAQs or order updates. This can provide a more engaging customer experience compared to traditional text emails.
Audio Books and Narration: Convert entire books or short stories into audio format, offering a new way for audiences to enjoy literature. Authors and publishers can diversify their content offerings and reach audiences who prefer listening over reading.
Language Learning Tools: Develop language learning aids that provide learners with audio lessons and exercises. This makes it possible to practice pronunciation and listening skills in a targeted way.

For more details, see the full project files, which show how the application is structured:

For Python: example repo

For TypeScript: example repo

If you have any questions, please create an issue on the elevenlabs-doc GitHub.
