Speech to Text quickstart

Learn how to convert spoken audio into text.

This guide shows you how to transcribe an audio file using the Speech to Text API.

Using the Speech to Text API

1

Create an API key

Create an API key in the dashboard. You’ll use it to securely access the API.

Store the key as a managed secret and pass it to the SDKs either as an environment variable via an .env file or directly in your app’s configuration, depending on your preference.

.env
ELEVENLABS_API_KEY=<your_api_key_here>
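
A missing or unedited key is easy to overlook until the first API call fails. The following is a minimal sketch of a fail-fast check, assuming the variable name ELEVENLABS_API_KEY from the .env file above. `require_api_key` is a hypothetical helper written for illustration and is not part of the ElevenLabs SDK; it uses only the Python standard library.

```python
import os


def require_api_key(env=None, name="ELEVENLABS_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the ElevenLabs SDK.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    # Treat a missing value, an empty value, or the unedited
    # "<your_api_key_here>" placeholder as "not set".
    if not key or key.startswith("<"):
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment"
        )
    return key
```

Once the .env file has been loaded into the process environment (the dotenv library used later in this guide does that), calling `require_api_key()` with no arguments reads from `os.environ`. Passing a plain dictionary instead makes the check easy to exercise without touching real secrets.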
2

Install the SDK

We’ll also use the dotenv library to load our API key from an environment variable.

pip install elevenlabs
pip install python-dotenv
3

Make the API request

Create a new file named example.py or example.mts, depending on your language of choice, and add the following code (the Python version is shown here):

# example.py
import os
from dotenv import load_dotenv
from io import BytesIO
import requests
from elevenlabs.client import ElevenLabs

load_dotenv()

client = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

audio_url = (
    "https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
)
response = requests.get(audio_url)
audio_data = BytesIO(response.content)

transcription = client.speech_to_text.convert(
    file=audio_data,
    model_id="scribe_v1",  # Model to use, for now only "scribe_v1" is supported
    tag_audio_events=True,  # Tag audio events like laughter, applause, etc.
    language_code="eng",  # Language of the audio file. If set to None, the model will detect the language automatically.
    diarize=True,  # Whether to annotate who is speaking
)

print(transcription)
4

Execute the code

python example.py

You should see the transcription of the audio file printed to the console.
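
Because the request sets diarize=True and tag_audio_events=True, the printed result carries more than plain text. The sketch below shows one way to fold word-level entries into per-speaker turns. It assumes each entry exposes `text`, `type`, and `speaker_id` fields; those field names, the `group_by_speaker` helper, and the sample data are illustrative assumptions rather than confirmed SDK output, so check the API reference for the actual response shape. The helper works on plain dictionaries, and converting the SDK's response object into that form is left to the reader.

```python
def group_by_speaker(words):
    """Merge consecutive entries from the same speaker into (speaker_id, text) turns."""
    turns = []
    for word in words:
        # Skip tagged audio events such as laughter; keep words and spacing.
        if word.get("type") == "audio_event":
            continue
        speaker = word.get("speaker_id")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous entry: extend the current turn.
            turns[-1][1] += word["text"]
        else:
            # Speaker changed: start a new turn.
            turns.append([speaker, word["text"]])
    return [(speaker, text.strip()) for speaker, text in turns]


# Hand-written sample in the assumed shape, not real API output.
sample_words = [
    {"text": "Hello", "type": "word", "speaker_id": "speaker_0"},
    {"text": " ", "type": "spacing", "speaker_id": "speaker_0"},
    {"text": "there.", "type": "word", "speaker_id": "speaker_0"},
    {"text": "(laughter)", "type": "audio_event", "speaker_id": "speaker_1"},
    {"text": "Hi!", "type": "word", "speaker_id": "speaker_1"},
]

for speaker, text in group_by_speaker(sample_words):
    print(f"{speaker}: {text}")
```

Keeping the grouping logic separate from the API call means it can be checked against hand-written data before it is pointed at a real transcription.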

Next steps

Explore the API reference for more information on the Speech to Text API and its options.