Speech to Text quickstart

Learn how to convert spoken audio into text.

This guide will show you how to convert spoken audio into text using the Speech to Text API.

Use the ElevenLabs speech-to-text skill to transcribe audio from your AI coding assistant:

$ npx skills add elevenlabs/skills --skill speech-to-text

This tutorial uses the Batch Speech to Text API. For the Realtime Speech to Text API, see the Client-side streaming or Server-side streaming guides.

Using the Speech to Text API

Create an API key

Create an API key in the ElevenLabs dashboard. You'll use it to securely access the API.

Store the key as a managed secret and pass it to the SDKs either as an environment variable via an .env file, or directly in your app's configuration, depending on your preference.

.env

ELEVENLABS_API_KEY=<your_api_key_here>
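
Before creating the client, it can help to fail early when the key is missing. The following is a minimal sketch, assuming the variable name ELEVENLABS_API_KEY from the .env file above. The helper name `require_api_key` is hypothetical and not part of the SDK.

```python
import os


def require_api_key(env=os.environ, name="ELEVENLABS_API_KEY"):
    """Return the API key found in `env`, or raise if it is missing or blank.

    Illustrative helper only; it is not an SDK function.
    """
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

If you load the key from an .env file, call the helper after `load_dotenv()` so the value is already present in `os.environ`.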

Install the SDK

Install the ElevenLabs Python SDK. We'll also use the python-dotenv library to load the API key from the .env file into the environment.

pip install elevenlabs
pip install python-dotenv

Make the API request

Create a new file named example.py or example.mts, depending on your language of choice, and add the following code:

# example.py
import os
from io import BytesIO

import requests
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load ELEVENLABS_API_KEY from the .env file into the environment
load_dotenv()

elevenlabs = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

# Download a sample audio file and hold it in memory
audio_url = (
    "https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
)
response = requests.get(audio_url)
response.raise_for_status()  # Stop early if the download failed
audio_data = BytesIO(response.content)

transcription = elevenlabs.speech_to_text.convert(
    file=audio_data,
    model_id="scribe_v2",  # Model to use
    tag_audio_events=True,  # Tag audio events like laughter, applause, etc.
    language_code="eng",  # Language of the audio file. If set to None, the model will detect the language automatically.
    diarize=True,  # Whether to annotate who is speaking
)

print(transcription)
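
As a variation on the example above, the audio may already be on disk rather than at a URL. The sketch below reads a local file into the same kind of in-memory `BytesIO` object that the example passes as `file=`. The helper name `read_audio_file` and the path `recording.mp3` are hypothetical.

```python
from io import BytesIO
from pathlib import Path


def read_audio_file(path):
    """Read a local audio file fully into memory and return it as BytesIO.

    Illustrative helper only; it is not part of the SDK. The result can stand
    in for `audio_data` in the example above.
    """
    return BytesIO(Path(path).read_bytes())


# Hypothetical usage, replacing the download step:
# audio_data = read_audio_file("recording.mp3")
```

Reading the whole file into memory keeps the sketch simple. For very large recordings, you may prefer a different approach.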

Execute the code

python example.py

You should see the transcription of the audio file printed to the console.
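
With `diarize=True`, a common next step is to group the transcript by speaker. The sketch below works on plain dictionaries so that it runs without an API call. The shape it expects is an assumption for illustration: a list of word entries with "text" and "speaker_id" keys, with spacing included as separate entries. Check the API reference for the actual response format before adapting it. The helper name `speaker_turns` is hypothetical.

```python
def speaker_turns(words):
    """Merge consecutive entries from the same speaker into (speaker, text) turns.

    `words` is assumed to be a list of dicts with "text" and "speaker_id" keys.
    This shape is an assumption, not a documented contract.
    """
    turns = []
    for word in words:
        speaker = word.get("speaker_id")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous entry: extend the current turn
            turns[-1][1].append(word["text"])
        else:
            # Speaker changed (or first entry): start a new turn
            turns.append((speaker, [word["text"]]))
    return [(speaker, "".join(parts).strip()) for speaker, parts in turns]


# Hand-written sample data, not real API output
sample = [
    {"text": "Hello", "speaker_id": "speaker_0"},
    {"text": " ", "speaker_id": "speaker_0"},
    {"text": "there.", "speaker_id": "speaker_0"},
    {"text": " ", "speaker_id": "speaker_1"},
    {"text": "Hi!", "speaker_id": "speaker_1"},
]

for speaker, text in speaker_turns(sample):
    print(f"{speaker}: {text}")
```

If the response you receive does not include spacing entries, join the parts with a space instead of an empty string.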

Next steps

Batch transcription

Transcribe pre-recorded audio files with speaker diarization and event tagging

Realtime transcription

Stream audio and receive transcriptions in real time

API reference

Explore all Speech to Text parameters and response formats
