Keyterm prompting

Learn how to use keyterm prompting with the Speech to Text API

Overview

Keyterm prompting is only available with the Scribe v2 model and comes at an additional cost. See the API pricing page for detailed pricing information.

Keyterm prompting lets you highlight up to 100 words or phrases to bias the model towards transcribing them. This is useful for words or phrases that are uncommon in everyday speech, such as product names, personal names, or other domain-specific terms. Keyterms are more powerful than the biased keywords or custom vocabularies offered by other models, because the model relies on context to decide whether to transcribe a term or not.
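The limits on a keyterm list can be checked locally before a request is sent. The following is a minimal sketch, not part of the SDK: `validate_keyterms` and the two constants are hypothetical names, the 100-keyterm limit comes from the paragraph above, and the 50-character limit is the one noted in the integration example on this page.

```python
MAX_KEYTERMS = 100        # maximum number of keyterms, as stated in this guide
MAX_KEYTERM_LENGTH = 50   # maximum characters per keyterm, as noted in the integration example


def validate_keyterms(keyterms: list[str]) -> list[str]:
    """Return the keyterms unchanged, or raise ValueError if a documented limit is exceeded.

    Hypothetical helper for illustration; the SDK does not provide it.
    """
    if len(keyterms) > MAX_KEYTERMS:
        raise ValueError(
            f"Got {len(keyterms)} keyterms; at most {MAX_KEYTERMS} are allowed."
        )
    too_long = [term for term in keyterms if len(term) > MAX_KEYTERM_LENGTH]
    if too_long:
        raise ValueError(
            f"Keyterms longer than {MAX_KEYTERM_LENGTH} characters: {too_long}"
        )
    return keyterms
```

Calling `validate_keyterms(["ElevenLabs"])` returns the list as-is, so the result can be passed straight to the `keyterms` parameter.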

For example, if your company name is not a common phrase or has a unique spelling or pronunciation, you can use keyterms to ensure the model transcribes it correctly. Take an audio clip in which the speaker names ElevenLabs as their employer.

Without keyterm prompting, the model might transcribe the above as:

I work at eleven labs.

This uses the wrong style for the company name. With keyterm prompting, you can ensure the model transcribes the above with the correct spelling and style:

I work at ElevenLabs.
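When evaluating transcripts, it can help to flag places where a term appears with the wrong casing or spacing, as in the "eleven labs" output above. The sketch below is a local heuristic only and does not call the API; `find_misstyled` is a hypothetical helper name.

```python
import re


def find_misstyled(transcript: str, keyterm: str) -> list[str]:
    """Return spans that spell the keyterm but differ in casing or spacing.

    Hypothetical helper for illustration. Matching ignores case and allows
    whitespace between any two characters of the keyterm.
    """
    chars = [c for c in keyterm if not c.isspace()]
    if not chars:
        return []
    # Allow optional whitespace between characters; require that the match
    # is not embedded inside a longer word.
    pattern = r"(?<!\w)" + r"\s*".join(re.escape(c) for c in chars) + r"(?!\w)"
    return [
        match.group(0)
        for match in re.finditer(pattern, transcript, flags=re.IGNORECASE)
        if match.group(0) != keyterm
    ]


print(find_misstyled("I work at eleven labs.", "ElevenLabs"))  # ['eleven labs']
print(find_misstyled("I work at ElevenLabs.", "ElevenLabs"))   # []
```

A hit is only a hint that a keyterm may be worth adding, since the matched words can also be meant literally.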

Context

The model uses context to decide whether a keyterm should be transcribed. With the keyterm “ElevenLabs” provided, the audio above transcribes as expected, yet the model can still correctly transcribe audio in which the words “eleven labs” are meant literally.

Such audio outputs the following transcription:

I've worked at many labs. In fact I've worked at eleven labs.

Integrating keyterm prompting

Keyterm prompting is integrated into the Speech to Text API by passing the keyterms parameter to the convert method.

import os
from dotenv import load_dotenv
from io import BytesIO
import requests
from elevenlabs.client import ElevenLabs

load_dotenv()

elevenlabs = ElevenLabs(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
)

audio_url = (
    "https://storage.googleapis.com/eleven-public-cdn/documentation_assets/audio/stt-keyterm-prompting.mp3"
)
response = requests.get(audio_url)
audio_data = BytesIO(response.content)

transcription = elevenlabs.speech_to_text.convert(
    file=audio_data,
    model_id="scribe_v2",  # Model to use
    # Keyterms to prompt the model with.
    # Up to 100 keyterms can be provided, with a maximum length of 50 characters each
    keyterms=["ElevenLabs"],
)

print(transcription)
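If keyterms come from a larger vocabulary list, it may be convenient to tidy them before the call. The following sketch wraps the same `convert` call shown above; `transcribe_with_keyterms` is a hypothetical helper, not part of the SDK, and it assumes `client` is an ElevenLabs client created as in the example.

```python
from typing import Any, BinaryIO, Iterable


def transcribe_with_keyterms(
    client: Any, audio: BinaryIO, keyterms: Iterable[str]
) -> Any:
    """Call speech_to_text.convert with a cleaned-up keyterm list.

    Hypothetical wrapper for illustration. `client` is assumed to expose
    speech_to_text.convert as in the example above.
    """
    # Strip surrounding whitespace, drop empty entries, and remove
    # duplicates while preserving the original order.
    stripped = (term.strip() for term in keyterms)
    cleaned = list(dict.fromkeys(term for term in stripped if term))
    return client.speech_to_text.convert(
        file=audio,
        model_id="scribe_v2",  # keyterm prompting requires Scribe v2
        keyterms=cleaned,
    )
```

With the client and audio from the example above, `transcribe_with_keyterms(elevenlabs, audio_data, ["ElevenLabs", " ElevenLabs "])` would send a single keyterm.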