Synchronous Speech to Text

Learn how to transcribe audio with ElevenLabs synchronously

In this tutorial, you’ll learn how to transcribe audio synchronously with the ElevenLabs SDK: you send an audio file and get the full transcript back in the response.

Requirements

  • An ElevenLabs account with an API key.
  • Python or Node.js installed on your machine.

Setup

Installing the SDK

Before you begin, make sure you have installed the ElevenLabs SDK.

pip install elevenlabs

Additionally, install python-dotenv so the script can load your API key from a .env file into environment variables:

pip install python-dotenv

Next, create a .env file in your project directory and add your API key:

.env
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
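If ELEVENLABS_API_KEY is missing or misspelled, os.getenv quietly returns None, and the mistake only shows up later as an authentication error from the API. Below is a minimal sketch of failing early instead. It assumes the variable name used above; require_api_key is a hypothetical helper, not part of the ElevenLabs SDK.

```python
import os


def require_api_key(name: str = "ELEVENLABS_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper: call it after load_dotenv() so values from .env are visible.
    """
    key = os.getenv(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file and call load_dotenv() first."
        )
    return key
```

You could then construct the client with ElevenLabs(api_key=require_api_key()).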

To convert an audio file to text, we’ll use the client.speech_to_text.convert method of the ElevenLabs SDK. It takes the audio as a file-like object and returns the transcription.
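The script that follows downloads a sample MP3 with requests and wraps the bytes in a BytesIO object. If you also want to transcribe recordings stored on disk, one option is a small loader that returns a BytesIO either way, so the convert call stays unchanged. This is a sketch under stated assumptions: load_audio is a hypothetical helper, and it uses the standard library's urllib instead of requests to stay dependency-free.

```python
from io import BytesIO
from pathlib import Path
from urllib.request import urlopen


def load_audio(source: str) -> BytesIO:
    """Load audio from an http(s) URL or a local file path into an in-memory buffer.

    Hypothetical helper: the returned BytesIO can be passed as file= to convert().
    """
    if source.startswith(("http://", "https://")):
        # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
        with urlopen(source) as response:
            data = response.read()
    else:
        data = Path(source).read_bytes()
    return BytesIO(data)
```

For example, audio_data = load_audio("meeting.mp3") would replace the requests download for a local file named meeting.mp3.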

Convert speech to text

import os
from io import BytesIO

import requests
from dotenv import load_dotenv
from elevenlabs.client import ElevenLabs

# Load ELEVENLABS_API_KEY from the .env file into the environment
load_dotenv()

client = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

# Download a sample MP3 and wrap its bytes in a file-like object
audio_url = (
    "https://storage.googleapis.com/eleven-public-cdn/audio/marketing/nicole.mp3"
)
response = requests.get(audio_url)
response.raise_for_status()  # Stop early if the download failed
audio_data = BytesIO(response.content)

transcription = client.speech_to_text.convert(
    file=audio_data,
    model_id="scribe_v1",  # Model to use, for now only "scribe_v1" is supported
    tag_audio_events=True,  # Tag audio events like laughter, applause, etc.
    language_code="eng",  # Language of the audio file. If set to None, the model will detect the language automatically.
    diarize=True,  # Whether to annotate who is speaking
)

print(transcription.text)

Save the code as speech_to_text.py and run it with:

python speech_to_text.py
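With diarize=True and tag_audio_events=True, transcription.text is a single string, so you lose who said what. The following sketch assumes that each entry in transcription.words exposes text, type ("word", "spacing" or "audio_event") and speaker_id attributes, mirroring the API's JSON response. Check the API reference for your SDK version before relying on these names. The sketch merges consecutive words into per-speaker turns and collects tagged events separately. The Word dataclass is a hypothetical stand-in used only so the sketch runs without an API call. The helpers rely on duck typing, so the SDK's own word objects should work as well.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Word:
    """Hypothetical stand-in for one entry of transcription.words (assumed fields)."""
    text: str
    type: str  # assumed values: "word", "spacing", "audio_event"
    speaker_id: Optional[str] = None


def speaker_turns(words: Iterable[Word]) -> List[Tuple[str, str]]:
    """Merge consecutive words from the same speaker into (speaker, text) turns."""
    turns: List[Tuple[str, List[str]]] = []
    for word in words:
        if word.type == "audio_event":
            continue  # events such as "(laughter)" are collected by audio_events()
        speaker = word.speaker_id or "unknown"
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(word.text)
        elif word.type == "word":
            turns.append((speaker, [word.text]))
        # Spacing that precedes a new speaker's first word is dropped
    # Collapse runs of whitespace left behind by skipped audio events
    return [(speaker, " ".join("".join(parts).split())) for speaker, parts in turns]


def audio_events(words: Iterable[Word]) -> List[str]:
    """Return the text of tagged audio events, in order of appearance."""
    return [word.text for word in words if word.type == "audio_event"]
```

In the script, you might print the turns with: for speaker, text in speaker_turns(transcription.words): print(f"{speaker}: {text}").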